
Conversation


codeflash-ai bot commented Feb 4, 2026

⚡️ This pull request contains optimizations for PR #1335

If you approve this dependent PR, these changes will be merged into the original PR branch gpu-flag.

This PR will be automatically closed if the original PR is merged.


📄 313% (3.13x) speedup for ReferenceFinder._find_references_in_file in codeflash/languages/javascript/find_references.py

⏱️ Runtime: 5.05 milliseconds → 1.22 milliseconds (best of 8 runs)

📝 Explanation and details

This optimization achieves a 313% speedup (from 5.05ms to 1.22ms) by eliminating redundant string decoding operations during AST traversal. The key improvements are:

What was optimized:

  1. Node text caching: Added `_node_text_cache` and `_node_bytes_cache` dictionaries to store decoded text and byte slices for each tree-sitter node, keyed by node ID
  2. Lazy decoding: Introduced `_get_node_text()` and `_get_node_bytes()` helper methods that cache results on first access (a sketch of these helpers follows this list)
  3. Byte-level comparisons: Changed identifier matching from string equality (`name == search_name`) to byte equality (`node_bytes == search_bytes`), avoiding UTF-8 decoding unless necessary
  4. Pre-encoded search term: The `search_name` is encoded once per file as `search_bytes` rather than repeatedly during comparisons
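
A minimal sketch of what these caching helpers could look like, assuming the py-tree-sitter bindings. Only the cache and helper names come from the description above; the surrounding class shape and the `_source_bytes` attribute are illustrative, not the actual implementation in `find_references.py`:

```python
# Sketch of per-node caching keyed by node ID (py-tree-sitter assumed).
from tree_sitter import Node


class ReferenceFinder:
    def __init__(self) -> None:
        self._node_text_cache: dict[int, str] = {}
        self._node_bytes_cache: dict[int, bytes] = {}
        self._source_bytes: bytes = b""  # set per file before traversal (assumed)

    def _get_node_bytes(self, node: Node) -> bytes:
        """Return the raw byte slice for a node, cached by node ID."""
        cached = self._node_bytes_cache.get(node.id)
        if cached is None:
            cached = self._source_bytes[node.start_byte:node.end_byte]
            self._node_bytes_cache[node.id] = cached
        return cached

    def _get_node_text(self, node: Node) -> str:
        """Decode a node's bytes to str at most once, caching the result."""
        cached = self._node_text_cache.get(node.id)
        if cached is None:
            cached = self._get_node_bytes(node).decode("utf-8")
            self._node_text_cache[node.id] = cached
        return cached
```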

Why this is faster:
The original code repeatedly sliced and decoded the same AST node text during recursive traversal. Line profiler shows `_find_identifier_references` spent 52.1% of time in `child_by_field_name("function")` and 13.9% checking node types, with additional time decoding node text multiple times. The optimization eliminates this redundancy—each node's text is decoded at most once and cached. Byte comparisons are faster than string comparisons in Python and skip decoding entirely when names don't match.
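
Continuing the class sketch above, the byte-level identifier check could look roughly like this; the traversal shape and names other than `search_bytes` are assumptions:

```python
# Match identifiers by byte equality; decode only when a match is confirmed.
def _find_identifier_references(self, node, search_bytes: bytes, matches: list) -> None:
    if node.type == "identifier" and self._get_node_bytes(node) == search_bytes:
        # Only a confirmed match ever pays for the UTF-8 decode.
        matches.append((node, self._get_node_text(node)))
    for child in node.children:
        self._find_identifier_references(child, search_bytes, matches)
```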

Impact:

  • The line profiler shows `_find_references_in_file` total time dropped from 21.5ms to 6.6ms (69% reduction)
  • The recursive `_find_identifier_references` becomes dramatically faster by avoiding repeated decode operations on the same nodes
  • Memory overhead is minimal—caches are cleared per file and only store node IDs and their decoded text
  • This optimization particularly benefits files with many function calls or deep AST nesting where the same parent/child nodes are accessed repeatedly

The caching strategy is safe because tree-sitter nodes are immutable within a parse tree, and the caches are explicitly cleared between files to prevent memory leaks or cross-file contamination.
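
A hedged sketch of the per-file flow this describes, with the caches reset around each file's traversal; the parser wiring and file-reading details are assumptions:

```python
from pathlib import Path


def _find_references_in_file(self, file_path: Path, search_name: str) -> list:
    self._source_bytes = Path(file_path).read_bytes()
    # Fresh caches for this file's parse tree.
    self._node_text_cache.clear()
    self._node_bytes_cache.clear()

    tree = self._parser.parse(self._source_bytes)   # parser configured elsewhere (assumed)
    search_bytes = search_name.encode("utf-8")      # encode the search term once per file

    matches: list = []
    self._find_identifier_references(tree.root_node, search_bytes, matches)

    # Clear again so cached node text cannot leak into the next file.
    self._node_text_cache.clear()
    self._node_bytes_cache.clear()
    return matches
```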

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 27 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

To edit these changes, run `git checkout codeflash/optimize-pr1335-2026-02-04T01.22.32` and push.


aseembits93 and others added 5 commits February 3, 2026 14:33
Add a `gpu` parameter to instrument tests with torch.cuda.Event timing
instead of time.perf_counter_ns() for measuring GPU kernel execution time.
Falls back to CPU timing when CUDA is not available/initialized.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
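
A hedged sketch of the timing pattern this commit describes; the helper name and structure are illustrative, not the instrumented code itself:

```python
import time

import torch


def time_call_ns(fn, *args, **kwargs) -> int:
    """Time a callable in nanoseconds, using CUDA events when available."""
    if torch.cuda.is_available() and torch.cuda.is_initialized():
        start_event = torch.cuda.Event(enable_timing=True)
        end_event = torch.cuda.Event(enable_timing=True)
        start_event.record()
        fn(*args, **kwargs)
        end_event.record()
        torch.cuda.synchronize()  # wait for queued kernels to finish
        return int(start_event.elapsed_time(end_event) * 1e6)  # ms -> ns
    # CPU fallback when CUDA is not available/initialized.
    start = time.perf_counter_ns()
    fn(*args, **kwargs)
    return time.perf_counter_ns() - start
```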
Fix unused variables, single-item membership tests, unnecessary lambdas,
and ternary expressions that can use `or` operator.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
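
For illustration, the kinds of rewrites this commit covers look like the following hypothetical before/after examples (not the actual diff):

```python
def process(x: str) -> str:
    return x.upper()

lang = "javascript"
provided = ""
default = "main"

# Single-item membership test -> direct equality
# before: if lang in ("javascript",):
if lang == "javascript":
    pass

# Unnecessary lambda -> direct reference
# before: callback = lambda x: process(x)
callback = process

# Ternary expression that can use `or`
# before: name = provided if provided else default
name = provided or default
```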
codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels Feb 4, 2026