glob.* and regex.globs_match builtins #210

@matajoh

Summary

Implement the glob.match, glob.quote_meta, and regex.globs_match built-in functions. All three are currently registered as placeholders (glob.match and glob.quote_meta in src/builtins/glob.cc, regex.globs_match in src/builtins/regex.cc) and return "glob not supported" at runtime.

OPA reference

  • glob.match: glob.match(pattern, delimiters, match) — returns true if match matches the glob pattern, using delimiters to split into segments. If delimiters is null, matches without delimiters; defaults to ["."] if unset.
  • glob.quote_meta: glob.quote_meta(pattern) — escapes all glob metacharacters in pattern so it can be used as a literal.
  • regex.globs_match: regex.globs_match(glob1, glob2) — returns true if the intersection of the two glob-style regular expressions matches a non-empty set of non-empty strings.
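
As an illustration of the glob.quote_meta behavior described above, here is a minimal C++ sketch. The function name quote_meta and its signature are hypothetical and not taken from the repository; the metacharacter set (*, ?, [, ], {, }, \) is the one listed in this issue and should be checked against OPA's behavior before use.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not the repository's actual API.
// Prefixes every glob metacharacter with a backslash so that the result
// matches the original text literally when used as a glob pattern.
std::string quote_meta(std::string_view pattern)
{
  // Assumed metacharacter set, taken from the list in this issue.
  constexpr std::string_view specials = "*?[]{}\\";

  std::string out;
  out.reserve(pattern.size() * 2);
  for (char c : pattern)
  {
    if (specials.find(c) != std::string_view::npos)
    {
      out.push_back('\\');
    }
    out.push_back(c);
  }
  return out;
}
```

For example, under these assumptions quote_meta("*.example.com") yields \*.example.com, and text without metacharacters is returned unchanged.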

OPA implementation reference

Current state

  • src/builtins/glob.cc: Placeholder declarations for glob.match and glob.quote_meta exist with full type metadata; both return BuiltInDef::placeholder(...).
  • src/builtins/regex.cc:656-674: Placeholder declaration for regex.globs_match exists; returns BuiltInDef::placeholder(...).
  • Dispatcher routing in both files is already wired up; only the implementation functions need to be written and swapped from placeholder to create.

Work items

  1. glob.match implementation — Implement a glob matching engine that supports * (match any sequence within a segment), ? (match a single character), character classes [abc] / [a-z], alternation {foo,bar}, and configurable path delimiters. Standard C/POSIX fnmatch does not support custom delimiters or OPA's full semantics, so this likely requires a small custom matching function or integrating a suitable C/C++ glob library.
  2. glob.quote_meta implementation — Escape all glob metacharacters (*, ?, [, ], {, }, \) in the input string. This is straightforward (~20 lines).
  3. regex.globs_match implementation — Convert both glob patterns to regular expressions and test whether their intersection is non-empty. OPA's approach converts globs to regexes and uses an approximation. A glob-to-regex converter can be shared with glob.match.
  4. Swap BuiltInDef::placeholder for BuiltInDef::create in all three factory functions.
  5. Update README.md — Remove glob.* and regex.globs_match from the unsupported builtins list (lines 150, 156).
  6. Tests — Add test cases covering:
    • Basic wildcard matching (*, ?)
    • Character classes and alternation
    • Delimiter-based segmentation (e.g., "a.b.c" with ["."])
    • Null delimiters (no segmentation)
    • quote_meta round-tripping
    • globs_match positive and negative cases
    • Edge cases: empty pattern, empty string, escaped metacharacters
    • Validate behavior matches OPA for the relevant OPA compliance tests
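
One possible shape for the matching engine in item 1 is a glob-to-regex translation evaluated with std::regex. The sketch below is only an illustration under stated assumptions: glob_to_regex, glob_match, and append_literal are hypothetical names, delimiters are assumed to be single-byte characters, an empty delimiter string stands for the "null delimiters" case (the caller would apply the ["."] default), and the handling of ** follows the gobwas/glob convention of matching across separators. It has not been validated against the OPA compliance tests.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration; names are not from the repository.

// Appends c to out, escaping it if it is special in an ECMAScript regex.
void append_literal(std::string& out, char c)
{
  constexpr std::string_view regex_specials = R"(\^$.|?*+()[]{}-)";
  if (regex_specials.find(c) != std::string_view::npos) out.push_back('\\');
  out.push_back(c);
}

// Translates a glob into ECMAScript regex source. '*' and '?' never match
// a character listed in delimiters; an empty list means no segmentation.
std::string glob_to_regex(std::string_view glob, std::string_view delimiters)
{
  std::string any = R"([\s\S])"; // any single character
  if (!delimiters.empty())
  {
    any = "[^";
    for (char d : delimiters) append_literal(any, d);
    any += "]";
  }

  std::string out;
  int brace_depth = 0;
  for (std::size_t i = 0; i < glob.size(); ++i)
  {
    const char c = glob[i];
    switch (c)
    {
      case '\\': // escaped metacharacter: take the next character literally
        append_literal(out, i + 1 < glob.size() ? glob[++i] : c);
        break;
      case '*':
        if (i + 1 < glob.size() && glob[i + 1] == '*')
        {
          out += R"([\s\S]*)"; // "**" crosses delimiters (gobwas convention)
          ++i;
        }
        else
        {
          out += any + "*";
        }
        break;
      case '?':
        out += any;
        break;
      case '[':
      {
        const std::size_t close = glob.find(']', i + 1);
        if (close == std::string_view::npos)
        {
          append_literal(out, c); // unterminated class: treat '[' literally
          break;
        }
        out += '[';
        std::size_t j = i + 1;
        if (j < close && glob[j] == '!') { out += '^'; ++j; } // [!a-z] negates
        for (; j < close; ++j)
        {
          if (glob[j] == '\\' || glob[j] == '^' || glob[j] == '[') out += '\\';
          out += glob[j];
        }
        out += ']';
        i = close;
        break;
      }
      case '{': out += "(?:"; ++brace_depth; break;
      case '}':
        if (brace_depth > 0) { out += ')'; --brace_depth; }
        else append_literal(out, c);
        break;
      case ',':
        if (brace_depth > 0) out += '|';
        else append_literal(out, c);
        break;
      default:
        append_literal(out, c);
    }
  }
  return out; // an unclosed '{' makes std::regex throw std::regex_error
}

// Returns true if the whole input matches the glob.
bool glob_match(
  std::string_view pattern, std::string_view delimiters, const std::string& input)
{
  const std::regex re(glob_to_regex(pattern, delimiters), std::regex::ECMAScript);
  return std::regex_match(input, re);
}
```

With "." as the delimiter, glob_match("*.github.com", ".", "api.github.com") is true while "api.cdn.github.com" is rejected, because * stays within one segment. A production implementation would likely compile the pattern once per call site and report malformed patterns as built-in errors instead of letting std::regex_error propagate.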

Notes

  • The glob metacharacter set and matching semantics should match OPA's behavior, which follows the gobwas/glob library conventions.
  • regex.globs_match is tightly coupled to the glob implementation (needs the same glob-to-regex converter), which is why it is included in this issue.

Metadata

Assignees

No one assigned

    Labels

    built-ins: Adding built-in functions
    opa-compat: Increasing compatibility with the upstream OPA implementation.
