
Conversation


codeflash-ai bot commented Feb 4, 2026

⚡️ This pull request contains optimizations for PR #1335

If you approve this dependent PR, these changes will be merged into the original PR branch gpu-flag.

This PR will be automatically closed if the original PR is merged.


📄 313% (3.13x) speedup for ReferenceFinder._find_references_in_file in codeflash/languages/javascript/find_references.py

⏱️ Runtime: 5.05 milliseconds → 1.22 milliseconds (best of 8 runs)

📝 Explanation and details

This optimization achieves a 313% speedup (from 5.05ms to 1.22ms) by eliminating redundant string decoding operations during AST traversal. The key improvements are:

What was optimized:

  1. Node text caching: Added `_node_text_cache` and `_node_bytes_cache` dictionaries to store decoded text and byte slices for each tree-sitter node, keyed by node ID
  2. Lazy decoding: Introduced `_get_node_text()` and `_get_node_bytes()` helper methods that cache results on first access (a sketch of these helpers follows this list)
  3. Byte-level comparisons: Changed identifier matching from string equality (`name == search_name`) to byte equality (`node_bytes == search_bytes`), avoiding UTF-8 decoding unless necessary
  4. Pre-encoded search term: The `search_name` is encoded once per file as `search_bytes` rather than repeatedly during comparisons
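
A minimal sketch of what these caching helpers could look like, assuming the py-tree-sitter bindings. Only the cache and helper names come from the description above; the surrounding class shape and the `_source_bytes` attribute are illustrative, not the actual implementation in `find_references.py`:

```python
# Sketch of per-node caching keyed by node ID (py-tree-sitter assumed).
from tree_sitter import Node


class ReferenceFinder:
    def __init__(self) -> None:
        self._node_text_cache: dict[int, str] = {}
        self._node_bytes_cache: dict[int, bytes] = {}
        self._source_bytes: bytes = b""  # set per file before traversal (assumed)

    def _get_node_bytes(self, node: Node) -> bytes:
        """Return the raw byte slice for a node, cached by node ID."""
        cached = self._node_bytes_cache.get(node.id)
        if cached is None:
            cached = self._source_bytes[node.start_byte:node.end_byte]
            self._node_bytes_cache[node.id] = cached
        return cached

    def _get_node_text(self, node: Node) -> str:
        """Decode a node's bytes to str at most once, caching the result."""
        cached = self._node_text_cache.get(node.id)
        if cached is None:
            cached = self._get_node_bytes(node).decode("utf-8")
            self._node_text_cache[node.id] = cached
        return cached
```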

Why this is faster:
The original code repeatedly sliced and decoded the same AST node text during recursive traversal. Line profiler shows `_find_identifier_references` spent 52.1% of time in `child_by_field_name("function")` and 13.9% checking node types, with additional time decoding node text multiple times. The optimization eliminates this redundancy—each node's text is decoded at most once and cached. Byte comparisons are faster than string comparisons in Python and skip decoding entirely when names don't match.
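
Continuing the class sketch above, the byte-level identifier check could look roughly like this; the traversal shape and names other than `search_bytes` are assumptions:

```python
# Match identifiers by byte equality; decode only when a match is confirmed.
def _find_identifier_references(self, node, search_bytes: bytes, matches: list) -> None:
    if node.type == "identifier" and self._get_node_bytes(node) == search_bytes:
        # Only a confirmed match ever pays for the UTF-8 decode.
        matches.append((node, self._get_node_text(node)))
    for child in node.children:
        self._find_identifier_references(child, search_bytes, matches)
```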

Impact:

  • The line profiler shows `_find_references_in_file` total time dropped from 21.5ms to 6.6ms (69% reduction)
  • The recursive `_find_identifier_references` becomes dramatically faster by avoiding repeated decode operations on the same nodes
  • Memory overhead is minimal—caches are cleared per file and only store node IDs and their decoded text
  • This optimization particularly benefits files with many function calls or deep AST nesting where the same parent/child nodes are accessed repeatedly

The caching strategy is safe because tree-sitter nodes are immutable within a parse tree, and the caches are explicitly cleared between files to prevent memory leaks or cross-file contamination.
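
A hedged sketch of the per-file flow this describes, with the caches reset around each file's traversal; the parser wiring and file-reading details are assumptions:

```python
from pathlib import Path


def _find_references_in_file(self, file_path: Path, search_name: str) -> list:
    self._source_bytes = Path(file_path).read_bytes()
    # Fresh caches for this file's parse tree.
    self._node_text_cache.clear()
    self._node_bytes_cache.clear()

    tree = self._parser.parse(self._source_bytes)   # parser configured elsewhere (assumed)
    search_bytes = search_name.encode("utf-8")      # encode the search term once per file

    matches: list = []
    self._find_identifier_references(tree.root_node, search_bytes, matches)

    # Clear again so cached node text cannot leak into the next file.
    self._node_text_cache.clear()
    self._node_bytes_cache.clear()
    return matches
```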

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 27 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

To edit these changes, run `git checkout codeflash/optimize-pr1335-2026-02-04T01.22.32` and push.


aseembits93 and others added 5 commits February 3, 2026 14:33
Add a `gpu` parameter to instrument tests with torch.cuda.Event timing
instead of time.perf_counter_ns() for measuring GPU kernel execution time.
Falls back to CPU timing when CUDA is not available/initialized.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
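
A hedged sketch of the timing pattern this commit describes; the helper name and structure are illustrative, not the instrumented code itself:

```python
import time

import torch


def time_call_ns(fn, *args, **kwargs) -> int:
    """Time a callable in nanoseconds, using CUDA events when available."""
    if torch.cuda.is_available() and torch.cuda.is_initialized():
        start_event = torch.cuda.Event(enable_timing=True)
        end_event = torch.cuda.Event(enable_timing=True)
        start_event.record()
        fn(*args, **kwargs)
        end_event.record()
        torch.cuda.synchronize()  # wait for queued kernels to finish
        return int(start_event.elapsed_time(end_event) * 1e6)  # ms -> ns
    # CPU fallback when CUDA is not available/initialized.
    start = time.perf_counter_ns()
    fn(*args, **kwargs)
    return time.perf_counter_ns() - start
```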
Fix unused variables, single-item membership tests, unnecessary lambdas,
and ternary expressions that can use `or` operator.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
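
For illustration, the kinds of rewrites this commit covers look like the following hypothetical before/after examples (not the actual diff):

```python
def process(x: str) -> str:
    return x.upper()

lang = "javascript"
provided = ""
default = "main"

# Single-item membership test -> direct equality
# before: if lang in ("javascript",):
if lang == "javascript":
    pass

# Unnecessary lambda -> direct reference
# before: callback = lambda x: process(x)
callback = process

# Ternary expression that can use `or`
# before: name = provided if provided else default
name = provided or default
```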
codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels Feb 4, 2026