NaturalIntelligence/path-expression-matcher

path-expression-matcher

Efficient path tracking and pattern matching for XML, JSON, YAML, or any other parser.

🎯 Purpose

path-expression-matcher provides two core classes for tracking and matching paths:

  • Expression: Parses and stores pattern expressions (e.g., "root.users.user[id]")
  • Matcher: Tracks current path during parsing and matches against expressions

Compatible with fast-xml-parser and similar tools.

📦 Installation

npm install path-expression-matcher

🚀 Quick Start

import { Expression, Matcher } from 'path-expression-matcher';

// Create expression (parse once, reuse many times)
const expr = new Expression("root.users.user");

// Create matcher (tracks current path)
const matcher = new Matcher();

matcher.push("root");
matcher.push("users");
matcher.push("user", { id: "123" });

// Match current path against expression
if (matcher.matches(expr)) {
  console.log("Match found!");
  console.log("Current path:", matcher.toString()); // "root.users.user"
}

📖 Pattern Syntax

Basic Paths

"root.users.user"           // Exact path match
"*.users.user"              // Wildcard: any parent
"root.*.user"               // Wildcard: any middle
"root.users.*"              // Wildcard: any child

Deep Wildcard

"..user"                    // user anywhere in tree
"root..user"                // user anywhere under root
"..users..user"             // users somewhere, then user below it

Attribute Matching

"user[id]"                  // user with "id" attribute
"user[type=admin]"          // user with type="admin" (current node only)
"root[lang]..user"          // user under root that has "lang" attribute

Position Selectors

"user:first"                // First user (counter=0)
"user:nth(2)"               // Third user (counter=2, zero-based)
"user:odd"                  // Odd-numbered users (counter=1,3,5...)
"user:even"                 // Even-numbered users (counter=0,2,4...)
"root.users.user:first"     // First user under users

Note: Position selectors use the counter (occurrence count of the tag name), not the position (child index). For example, in <root><a/><b/><a/></root>, the second <a/> has position=2 but counter=1.
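To make the counter-based semantics concrete, here is a small standalone sketch. It is not the library's code: `annotateSiblings` and `selectorMatches` are hypothetical helpers that assign a position and counter to a list of sibling tag names and then evaluate the four selectors against the counter, as the note above describes.

```javascript
// Hypothetical helpers for illustration; not part of path-expression-matcher.

// Assign a position (child index) and a counter (number of earlier siblings
// with the same tag name) to each sibling, in document order.
function annotateSiblings(tagNames) {
  const seen = new Map();
  return tagNames.map((tag, position) => {
    const counter = seen.get(tag) ?? 0;
    seen.set(tag, counter + 1);
    return { tag, position, counter };
  });
}

// Evaluate a position selector against the counter (not the position).
function selectorMatches(selector, counter) {
  if (selector === "first") return counter === 0;
  if (selector === "odd") return counter % 2 === 1;
  if (selector === "even") return counter % 2 === 0;
  const nth = /^nth\((\d+)\)$/.exec(selector);
  return nth !== null && counter === Number(nth[1]);
}

// <root><a/><b/><a/></root>
const siblings = annotateSiblings(["a", "b", "a"]);
console.log(siblings[2].position, siblings[2].counter);       // 2 1
console.log(selectorMatches("first", siblings[2].counter));   // false
console.log(selectorMatches("nth(1)", siblings[2].counter));  // true
```

The second `<a/>` is the third child overall, yet it only matches `:nth(1)` because selectors look at the counter.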

Combined Patterns

"..user[id]:first"          // First user with id, anywhere
"root..user[type=admin]"    // Admin user under root
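
One way to picture how a combined pattern breaks down into segments is the standalone sketch below. It is an illustration only and not how `Expression` is implemented: `parsePattern` and `parseSegment` are hypothetical names, and the sketch assumes that attribute values never contain the separator.

```javascript
// Hypothetical parser for illustration; not the library's Expression class.

// Split one segment such as "user[type=admin]:first" into its parts.
function parseSegment(token) {
  const m = /^([^\[\]:]+)(?:\[([^=\]]+)(?:=([^\]]*))?\])?(?::(.+))?$/.exec(token);
  if (m === null) throw new Error(`Invalid segment: ${token}`);
  const [, tag, attrName = null, attrValue = null, position = null] = m;
  return { type: "tag", tag, attrName, attrValue, position };
}

// Walk the pattern left to right. A doubled separator is a deep wildcard,
// a single separator just divides two segments.
function parsePattern(pattern, separator = ".") {
  const deep = separator + separator;
  const segments = [];
  let rest = pattern;
  while (rest.length > 0) {
    if (rest.startsWith(deep)) {
      segments.push({ type: "deep-wildcard" });
      rest = rest.slice(deep.length);
    } else if (rest.startsWith(separator)) {
      rest = rest.slice(separator.length);
    } else {
      const end = rest.indexOf(separator);
      const token = end === -1 ? rest : rest.slice(0, end);
      rest = end === -1 ? "" : rest.slice(end);
      segments.push(parseSegment(token));
    }
  }
  return segments;
}

const parsed = parsePattern("..user[id]:first");
console.log(parsed.length);   // 2
console.log(parsed[0].type);  // deep-wildcard
console.log(parsed[1].tag, parsed[1].attrName, parsed[1].position); // user id first
```

Parsing up front like this is what makes the "parse once, reuse many times" advice pay off: the per-tag matching step only has to walk an array of prepared segments.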

🔧 API Reference

Expression

Constructor

new Expression(pattern, options)

Parameters:

  • pattern (string): Pattern to parse
  • options.separator (string): Path separator (default: '.')

Example:

const expr1 = new Expression("root.users.user");
const expr2 = new Expression("root/users/user", { separator: '/' });

Methods

  • hasDeepWildcard() → boolean
  • hasAttributeCondition() → boolean
  • hasPositionSelector() → boolean
  • toString() → string

Matcher

Constructor

new Matcher(options)

Parameters:

  • options.separator (string): Default path separator (default: '.')

Path Tracking Methods

push(tagName, attrValues)

Add a tag to the current path. Position and counter are automatically calculated.

Parameters:

  • tagName (string): Tag name
  • attrValues (object, optional): Attribute key-value pairs (current node only)

Example:

matcher.push("user", { id: "123", type: "admin" });
matcher.push("item");  // No attributes

Position vs Counter:

  • Position: The child index in the parent (0, 1, 2, 3...)
  • Counter: How many times this tag name has already appeared at this level before the current node (0, 1, 2...)

Example:

<root>
  <a/>      <!-- position=0, counter=0 -->
  <b/>      <!-- position=1, counter=0 -->
  <a/>      <!-- position=2, counter=1 -->
</root>

pop()

Remove the last tag from the path.

matcher.pop();

updateCurrent(attrValues)

Update current node's attributes (useful when attributes are parsed after push).

matcher.push("user");  // Don't know values yet
// ... parse attributes ...
matcher.updateCurrent({ id: "123" });

reset()

Clear the entire path.

matcher.reset();

Query Methods

matches(expression)

Check if current path matches an Expression.

const expr = new Expression("root.users.user");
if (matcher.matches(expr)) {
  // Current path matches
}

getCurrentTag()

Get current tag name.

const tag = matcher.getCurrentTag(); // "user"

getAttrValue(attrName)

Get attribute value of current node.

const id = matcher.getAttrValue("id"); // "123"

hasAttr(attrName)

Check if current node has an attribute.

if (matcher.hasAttr("id")) {
  // Current node has "id" attribute
}

getPosition()

Get sibling position of current node (child index in parent).

const position = matcher.getPosition(); // 0, 1, 2, ...

getCounter()

Get repeat counter of current node (occurrence count of this tag name).

const counter = matcher.getCounter(); // 0, 1, 2, ...

getIndex() (deprecated)

Alias for getPosition(). Use getPosition() or getCounter() instead for clarity.

const index = matcher.getIndex(); // Same as getPosition()

getDepth()

Get current path depth.

const depth = matcher.getDepth(); // 3 for "root.users.user"

toString(separator?)

Get path as string.

const path = matcher.toString();     // "root.users.user"
const path2 = matcher.toString('/'); // "root/users/user"

toArray()

Get path as array.

const arr = matcher.toArray(); // ["root", "users", "user"]

State Management

snapshot()

Create a snapshot of current state.

const snapshot = matcher.snapshot();

restore(snapshot)

Restore from a snapshot.

matcher.restore(snapshot);
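
Snapshots are handy when a parser explores a branch speculatively and then needs to roll back. This README does not say what a snapshot contains, so the sketch below uses a hypothetical stand-in class, `MiniPathStack`, and assumes a snapshot is an independent copy of the path state. Treat it as an illustration of the pattern, not as the behavior of `Matcher`.

```javascript
// Hypothetical stand-in for illustration; not the library's Matcher.
class MiniPathStack {
  constructor() {
    this.path = [];
  }
  push(tag) {
    this.path.push(tag);
  }
  pop() {
    return this.path.pop();
  }
  toString(separator = ".") {
    return this.path.join(separator);
  }
  // Copy the path so later pushes and pops do not alter the snapshot.
  snapshot() {
    return { path: [...this.path] };
  }
  // Copy again on restore so the same snapshot can be reused.
  restore(snap) {
    this.path = [...snap.path];
  }
}

const stack = new MiniPathStack();
stack.push("root");
stack.push("users");

const saved = stack.snapshot();

// Speculatively descend into a branch...
stack.push("user");
stack.push("name");
console.log(stack.toString()); // "root.users.user.name"

// ...then roll back to where we were.
stack.restore(saved);
console.log(stack.toString()); // "root.users"
```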

💡 Usage Examples

Example 1: XML Parser with stopNodes

import { XMLParser } from 'fast-xml-parser';
import { Expression, Matcher } from 'path-expression-matcher';

class MyParser {
  constructor() {
    this.matcher = new Matcher();
    
    // Pre-compile stop node patterns
    this.stopNodeExpressions = [
      new Expression("html.body.script"),
      new Expression("html.body.style"),
      new Expression("..svg"),
    ];
  }
  
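  parseTag(tagName, attrs) {
    this.matcher.push(tagName, attrs);

    try {
      // Check if this is a stop node
      for (const expr of this.stopNodeExpressions) {
        if (this.matcher.matches(expr)) {
          // Don't parse children, read as raw text
          // (readRawContent is assumed to be defined elsewhere in the parser)
          return this.readRawContent();
        }
      }

      // Continue normal parsing (parseChildren is likewise assumed)
      this.parseChildren();
    } finally {
      // Always pop, including on the early stop-node return,
      // so the tracked path stays in sync with the document
      this.matcher.pop();
    }
  }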
}

Example 2: Conditional Processing

const matcher = new Matcher();
const userExpr = new Expression("..user[type=admin]");
const firstItemExpr = new Expression("..item:first");

function processTag(tagName, value, attrs) {
  matcher.push(tagName, attrs);
  
  if (matcher.matches(userExpr)) {
    value = enhanceAdminUser(value);
  }
  
  if (matcher.matches(firstItemExpr)) {
    value = markAsFirst(value);
  }
  
  matcher.pop();
  return value;
}

Example 3: Path-based Filtering

const patterns = [
  new Expression("data.users.user"),
  new Expression("data.posts.post"),
  new Expression("..comment[approved=true]"),
];

function shouldInclude(matcher) {
  return patterns.some(expr => matcher.matches(expr));
}

Example 4: Custom Separator

const matcher = new Matcher({ separator: '/' });
const expr = new Expression("root/config/database", { separator: '/' });

matcher.push("root");
matcher.push("config");
matcher.push("database");

console.log(matcher.toString()); // "root/config/database"
console.log(matcher.matches(expr)); // true

Example 5: Attribute Checking

const matcher = new Matcher();
matcher.push("root");
matcher.push("user", { id: "123", type: "admin", status: "active" });

// Check attribute existence (current node only)
console.log(matcher.hasAttr("id"));        // true
console.log(matcher.hasAttr("email"));     // false

// Get attribute value (current node only)
console.log(matcher.getAttrValue("type")); // "admin"

// Match by attribute
const expr1 = new Expression("user[id]");
console.log(matcher.matches(expr1));       // true

const expr2 = new Expression("user[type=admin]");
console.log(matcher.matches(expr2));       // true

Example 6: Position vs Counter

const matcher = new Matcher();
matcher.push("root");

// Mixed tags at same level
matcher.push("item");  // position=0, counter=0 (first item)
matcher.pop();

matcher.push("div");   // position=1, counter=0 (first div)
matcher.pop();

matcher.push("item");  // position=2, counter=1 (second item)

console.log(matcher.getPosition()); // 2 (third child overall)
console.log(matcher.getCounter());  // 1 (second "item" specifically)

// :first uses counter, not position
const expr = new Expression("root.item:first");
console.log(matcher.matches(expr)); // false (counter=1, not 0)

🏗️ Architecture

Data Storage Strategy

  • Ancestor nodes: Store only tag name, position, and counter (minimal memory)
  • Current node: Store tag name, position, counter, and attribute values

This design minimizes memory usage:

  • No attribute names stored (derived from values object when needed)
  • Attribute values only for current node, not ancestors
  • Attribute checking for ancestors is not supported (acceptable trade-off)
  • For 1M nodes with 3 attributes each, saves ~50MB vs storing attribute names
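
The storage strategy above can be pictured with a small standalone sketch. `MiniPathTracker` is a hypothetical stand-in, not the library's `Matcher`: it keeps only tag, position, and counter for every node on the path, holds attribute values for the current node alone, and derives position and counter from per-level sibling counts. Dropping attributes on `pop()` is an assumption made here to mirror the "no attribute data for ancestors" rule.

```javascript
// Hypothetical stand-in for illustration; not the library's Matcher.
class MiniPathTracker {
  constructor() {
    // One entry per open level; the first is a virtual container above the root.
    this.levels = [{ childCount: 0, tagCounts: new Map() }];
    this.path = [];           // { tag, position, counter } for each node
    this.currentAttrs = null; // attribute values of the current node only
  }

  push(tag, attrs = null) {
    const parent = this.levels[this.levels.length - 1];
    const position = parent.childCount++;           // child index in parent
    const counter = parent.tagCounts.get(tag) ?? 0; // earlier siblings with this tag
    parent.tagCounts.set(tag, counter + 1);

    this.path.push({ tag, position, counter });
    this.levels.push({ childCount: 0, tagCounts: new Map() });
    // The previous current node is now an ancestor; its attributes are replaced.
    this.currentAttrs = attrs;
  }

  pop() {
    if (this.path.length === 0) return undefined;
    this.levels.pop();
    this.currentAttrs = null; // assumption: ancestors keep no attribute values
    return this.path.pop();
  }

  current() {
    return this.path[this.path.length - 1];
  }
}

const tracker = new MiniPathTracker();
tracker.push("root");
tracker.push("item");
tracker.pop();
tracker.push("div");
tracker.pop();
tracker.push("item", { id: "7" });

console.log(tracker.current().position); // 2
console.log(tracker.current().counter);  // 1
console.log(tracker.currentAttrs.id);    // "7"
```

Both `push` and `pop` touch only the top of the stack and one `Map` entry, which is the kind of bookkeeping that keeps path tracking at constant time per operation.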

Matching Strategy

Matching is performed bottom-to-top (from current node toward root):

  1. Start at current node
  2. Match segments from pattern end to start
  3. Attribute checking only works for current node (ancestors have no attribute data)
  4. Position selectors use counter (occurrence count), not position (child index)
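
The bottom-to-top walk can be sketched in a few lines. This is an illustration under stated assumptions, not the library's algorithm: `matchesPath` is a hypothetical function, the path is a plain array of tag names, segments are plain strings with `"*"` for a single wildcard and `".."` for a deep wildcard, and a deep wildcard is assumed to skip zero or more ancestors. Attribute and position checks are left out to keep the walk itself visible.

```javascript
// Hypothetical matcher for illustration; not the library's implementation.
function matchesPath(path, segments) {
  // i indexes the path, j indexes the pattern; both walk from the end.
  function walk(i, j) {
    if (j < 0) return i < 0; // pattern consumed: the path must be consumed too
    const seg = segments[j];
    if (seg === "..") {
      // Deep wildcard: try skipping zero or more ancestors, then continue.
      for (let k = i; k >= -1; k--) {
        if (walk(k, j - 1)) return true;
      }
      return false;
    }
    if (i < 0) return false; // pattern needs a node but the path ran out
    if (seg !== "*" && seg !== path[i]) return false;
    return walk(i - 1, j - 1);
  }
  return walk(path.length - 1, segments.length - 1);
}

const currentPath = ["root", "users", "user"];
console.log(matchesPath(currentPath, ["root", "users", "user"])); // true
console.log(matchesPath(currentPath, ["..", "user"]));            // true
console.log(matchesPath(currentPath, ["root", "*", "user"]));     // true
console.log(matchesPath(currentPath, ["..", "item"]));            // false
```

Starting from the current node means a mismatch on the last segment, the most common case while streaming through a document, is rejected after a single comparison.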

Performance

  • Expression parsing: One-time cost when Expression is created
  • Expression analysis: Cached (hasDeepWildcard, hasAttributeCondition, hasPositionSelector)
  • Path tracking: O(1) for push/pop operations
  • Pattern matching: O(n*m) where n = path depth, m = pattern segments
  • Memory per ancestor node: ~40-60 bytes (tag, position, counter only)
  • Memory per current node: ~80-120 bytes (adds attribute values)

🎓 Design Patterns

Pre-compile Patterns (Recommended)

// ✅ GOOD: Parse once, reuse many times
const expr = new Expression("..user[id]");

for (let i = 0; i < 1000; i++) {
  if (matcher.matches(expr)) {
    // ...
  }
}

// ❌ BAD: Parse on every iteration
for (let i = 0; i < 1000; i++) {
  if (matcher.matches(new Expression("..user[id]"))) {
    // ...
  }
}

Batch Pattern Checking

// For multiple patterns, check all at once
const patterns = [
  new Expression("..user"),
  new Expression("..post"),
  new Expression("..comment"),
];

function matchesAny(matcher, patterns) {
  return patterns.some(expr => matcher.matches(expr));
}

🔗 Integration with fast-xml-parser

Basic integration:

import { XMLParser } from 'fast-xml-parser';
import { Expression, Matcher } from 'path-expression-matcher';

// Pre-compile once so the callback does not re-parse the pattern for every tag
const adminUserExpr = new Expression("..user[type=admin]");

const parser = new XMLParser({
  // Custom options using path-expression-matcher
  stopNodes: ["script", "style"].map(tag => new Expression(`..${tag}`)),

  tagValueProcessor: (tagName, value, jPath, hasAttrs, isLeaf, matcher) => {
    // matcher is available in callbacks
    if (matcher.matches(adminUserExpr)) {
      // enhanceValue is an application-defined function
      return enhanceValue(value);
    }
    return value;
  }
});

🧪 Testing

npm test

The suite has 77 tests covering:

  • Pattern parsing (exact, wildcards, attributes, position)
  • Path tracking (push, pop, update)
  • Pattern matching (all combinations)
  • Edge cases and error conditions

📄 License

MIT

🤝 Contributing

Issues and PRs welcome! This package is designed to be used by XML/JSON parsers like fast-xml-parser.
