
⚡️ Speed up function _analyze_imports_in_optimized_code by 39% in PR #1460 (call-graphee) #1463

Merged
KRRT7 merged 2 commits into call-graphee from codeflash/optimize-pr1460-2026-02-12T07.03.31 on Feb 12, 2026

Conversation


@codeflash-ai codeflash-ai bot commented Feb 12, 2026

⚡️ This pull request contains optimizations for PR #1460

If you approve this dependent PR, these changes will be merged into the original PR branch call-graphee.

This PR will be automatically closed if the original PR is merged.


📄 39% (0.39x) speedup for _analyze_imports_in_optimized_code in codeflash/context/unused_definition_remover.py

⏱️ Runtime: 10.3 milliseconds → 7.42 milliseconds (best of 13 runs)

📝 Explanation and details

The optimized code achieves a 38% runtime improvement (10.3ms → 7.42ms) by replacing the inefficient ast.walk() traversal with a targeted ast.NodeVisitor pattern.

Key Optimization:
The original code used ast.walk(optimized_ast) which visits every node in the AST (4,466 nodes in the profiled example), performing isinstance() checks on each one to find Import/ImportFrom nodes. This resulted in 18.77ms spent just traversing the tree (46.9% of total runtime).

The optimized version introduces an _ImportCollector class that uses Python's ast.NodeVisitor pattern to selectively visit only Import and ImportFrom nodes. By defining visit_Import() and visit_ImportFrom() methods, the collector automatically skips irrelevant nodes during traversal. This reduces the collection phase to just 2.88ms (12% of runtime), saving approximately 15.89ms.
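A minimal sketch of the visitor pattern described above (the actual `_ImportCollector` in the repository may differ in detail; the parsed snippet here is illustrative):

```python
import ast

class _ImportCollector(ast.NodeVisitor):
    """Collect Import/ImportFrom nodes; every other node type falls through
    to the inherited generic_visit, which keeps descending the tree."""

    def __init__(self) -> None:
        self.imports: list = []

    def visit_Import(self, node: ast.Import) -> None:
        self.imports.append(node)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        self.imports.append(node)

optimized_ast = ast.parse("import os\nfrom pathlib import Path\nx = 1 + 2")
collector = _ImportCollector()
collector.visit(optimized_ast)
# Only the two import statements are collected; the assignment is skipped.
assert len(collector.imports) == 2
```

The win comes from dispatch: `NodeVisitor.visit` looks up `visit_<ClassName>` once per node instead of running an `isinstance()` check in a hand-written loop over every node that `ast.walk()` yields.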

Performance Profile:

  • The line profiler shows the collector.visit() call takes 2.88ms vs. the original loop's 18.77ms
  • The subsequent processing loop over collected nodes runs faster (1.69k iterations vs. 4.42k), eliminating 62% of unnecessary isinstance() checks
  • All other operations (helper preprocessing, dictionary lookups, set operations) remain essentially unchanged

Test Case Behavior:
The optimization is most effective for:

  • Large ASTs with many nodes: The test_large_scale_many_import_statements_with_helpers shows 62.6% speedup (662μs → 407μs) when processing 200 import statements, demonstrating the benefit of selective traversal
  • Complex code with deep nesting: ASTs with more non-import nodes see greater relative gains

Smaller test cases show 30-40% slower runtimes due to the overhead of instantiating the collector class, but these are measuring microsecond differences (8-25μs) that are negligible in real-world usage where the function processes larger ASTs.

Practical Impact:
This function analyzes import statements in optimized code to map names to helper functions. Given its role in code optimization workflows, it likely processes many ASTs repeatedly. The 38% runtime reduction directly improves the optimization pipeline's throughput, especially when analyzing codebases with numerous import statements.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 10 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests
import ast  # used to create ASTs from code snippets
from pathlib import Path  # used to construct file_path values with .stem
from types import SimpleNamespace  # lightweight real class for helper objects

import pytest  # used for our unit tests
from codeflash.context.unused_definition_remover import (
    _analyze_imports_in_optimized_code,
)

def test_empty_ast_returns_empty_map():
    # An empty module (no imports) should produce an empty mapping.
    tree = ast.parse("")  # parse empty code -> no Import/ImportFrom nodes
    code_context = SimpleNamespace(helper_functions=[])  # no helpers provided
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 8.05μs -> 20.0μs (59.8% slower)
    assert result == {}

def test_from_import_matches_helper():
    # "from module import foo as bar" where a helper for module.foo exists
    src = "from mymodule import foo as bar"
    tree = ast.parse(src)
    # Create a helper whose file_path.stem is "mymodule" and only_function_name "foo"
    helper = SimpleNamespace(
        definition_type=None,  # non-class so it is included
        only_function_name="foo",
        file_path=Path("mymodule.py"),
        qualified_name="mymodule.foo_helper",
        fully_qualified_name="package.mymodule.foo_helper",
    )
    code_context = SimpleNamespace(helper_functions=[helper])
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 16.2μs -> 25.2μs (35.5% slower)
    assert "bar" in result  # the alias, not the original name, is the key

def test_from_import_without_alias_uses_original_name():
    # "from module import foo" should use 'foo' as the key (no alias)
    src = "from mod import foo"
    tree = ast.parse(src)
    helper = SimpleNamespace(
        definition_type=None,
        only_function_name="foo",
        file_path=Path("mod.py"),
        qualified_name="mod.foo_q",
        fully_qualified_name="pkg.mod.foo_q",
    )
    code_context = SimpleNamespace(helper_functions=[helper])
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 14.8μs -> 22.9μs (35.2% slower)
    assert "foo" in result

def test_import_module_creates_full_call_keys_and_placeholder():
    # "import somemod" should create a placeholder key "somemod.{func}" (empty set)
    # and a concrete key "somemod.<funcname>" that contains qualified names.
    src = "import somemod"
    tree = ast.parse(src)
    helper1 = SimpleNamespace(
        definition_type=None,
        only_function_name="alpha",
        file_path=Path("somemod.py"),
        qualified_name="somemod.alpha_q",
        fully_qualified_name="pkg.somemod.alpha_q",
    )
    helper2 = SimpleNamespace(
        definition_type=None,
        only_function_name="beta",
        file_path=Path("somemod.py"),
        qualified_name="somemod.beta_q",
        fully_qualified_name="pkg.somemod.beta_q",
    )
    code_context = SimpleNamespace(helper_functions=[helper1, helper2])
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 16.8μs -> 24.3μs (31.0% slower)
    assert result["somemod.{func}"] == set()  # placeholder key with an empty set
    assert "somemod.alpha" in result
    assert "somemod.beta" in result

def test_import_module_with_alias_uses_alias_in_keys():
    # "import somemod as sm" should use 'sm' as the module prefix in keys
    src = "import somemod as sm"
    tree = ast.parse(src)
    helper = SimpleNamespace(
        definition_type=None,
        only_function_name="run",
        file_path=Path("somemod.py"),
        qualified_name="somemod.run_q",
        fully_qualified_name="pkg.somemod.run_q",
    )
    code_context = SimpleNamespace(helper_functions=[helper])
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 15.1μs -> 21.8μs (30.5% slower)
    assert "sm.run" in result  # alias 'sm' is used as the module prefix

def test_from_import_with_no_module_is_ignored():
    # "from . import foo" has module == None in AST and should be ignored
    src = "from . import foo"
    tree = ast.parse(src)
    helper = SimpleNamespace(
        definition_type=None,
        only_function_name="foo",
        file_path=Path("whatever.py"),
        qualified_name="whatever.foo",
        fully_qualified_name="pkg.whatever.foo",
    )
    code_context = SimpleNamespace(helper_functions=[helper])
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 13.0μs -> 20.0μs (35.1% slower)
    assert result == {}

def test_helpers_with_definition_type_class_are_skipped():
    # Helpers whose definition_type == "class" should not be included in any mapping
    src = "from mod import foo"
    tree = ast.parse(src)
    class_helper = SimpleNamespace(
        definition_type="class",  # explicitly a class -> should be skipped
        only_function_name="foo",
        file_path=Path("mod.py"),
        qualified_name="mod.foo_class",
        fully_qualified_name="pkg.mod.foo_class",
    )
    code_context = SimpleNamespace(helper_functions=[class_helper])
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 9.99μs -> 17.0μs (41.1% slower)
    assert all("mod.foo_class" not in names for names in result.values())

def test_multiple_helpers_for_same_function_name_all_are_included():
    # When there are multiple helper functions with the same only_function_name in the same module,
    # all their qualified names should be present in the import mapping.
    src = "from dupmod import util"
    tree = ast.parse(src)
    helper_a = SimpleNamespace(
        definition_type=None,
        only_function_name="util",
        file_path=Path("dupmod.py"),
        qualified_name="dupmod.util_a",
        fully_qualified_name="pkg.dupmod.util_a",
    )
    helper_b = SimpleNamespace(
        definition_type=None,
        only_function_name="util",
        file_path=Path("dupmod.py"),
        qualified_name="dupmod.util_b",
        fully_qualified_name="pkg.dupmod.util_b",
    )
    code_context = SimpleNamespace(helper_functions=[helper_a, helper_b])
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 16.1μs -> 22.6μs (28.9% slower)
    # Both helpers' qualified and fully qualified names should be present
    for expected in (
        "dupmod.util_a",
        "pkg.dupmod.util_a",
        "dupmod.util_b",
        "pkg.dupmod.util_b",
    ):
        assert expected in result["util"]

def test_large_scale_many_helpers_for_single_module():
    # Create a large number of helpers (1000) in the same module and import that module.
    # This ensures scalability and that all helpers are processed deterministically.
    num_helpers = 1000  # boundary-scale test as requested (up to 1000)
    module_name = "bigmod"
    src = f"import {module_name}"
    tree = ast.parse(src)
    helpers = []
    # Construct many helpers with unique function names
    for i in range(num_helpers):
        func = f"func_{i}"
        helpers.append(
            SimpleNamespace(
                definition_type=None,
                only_function_name=func,
                file_path=Path(f"{module_name}.py"),
                qualified_name=f"{module_name}.{func}_q",
                fully_qualified_name=f"pkg.{module_name}.{func}_q",
            )
        )
    code_context = SimpleNamespace(helper_functions=helpers)
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 1.00ms -> 1.00ms (0.167% slower)
    # Expect exactly num_helpers concrete keys like "bigmod.func_i"
    # (plus the placeholder key counted above)
    concrete_keys = [k for k in result.keys() if k != f"{module_name}.{{func}}"]
    assert len(concrete_keys) == num_helpers
    # Spot-check a few entries to ensure their sets include expected qualified names
    for check_index in (0, 10, 999):
        key = f"{module_name}.func_{check_index}"
        assert f"{module_name}.func_{check_index}_q" in result[key]

def test_large_scale_many_import_statements_with_helpers():
    # Build an AST that imports many different modules, each having a single helper.
    # This tests scaling across many distinct modules and import-from handling.
    num_modules = 200  # moderate number to keep test time reasonable but sizable
    import_lines = "\n".join(f"from mod_{i} import h as alias_{i}" for i in range(num_modules))
    tree = ast.parse(import_lines)
    helpers = []
    for i in range(num_modules):
        helpers.append(
            SimpleNamespace(
                definition_type=None,
                only_function_name="h",
                file_path=Path(f"mod_{i}.py"),
                qualified_name=f"mod_{i}.h_q",
                fully_qualified_name=f"pkg.mod_{i}.h_q",
            )
        )
    code_context = SimpleNamespace(helper_functions=helpers)
    codeflash_output = _analyze_imports_in_optimized_code(tree, code_context); result = codeflash_output # 662μs -> 407μs (62.6% faster)
    # Each alias_{i} should be a key mapping to the two qualified names
    for i in range(num_modules):
        key = f"alias_{i}"
        assert result[key] == {f"mod_{i}.h_q", f"pkg.mod_{i}.h_q"}
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr1460-2026-02-12T07.03.31` and push.


@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels on Feb 12, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 12, 2026
2 tasks

claude bot commented Feb 12, 2026

PR Review Summary

Prek Checks

Passed — One formatting issue found (extra blank line at line 673) and auto-fixed. Committed and pushed as 0e284ad1.

Mypy

⚠️ 14 errors in unused_definition_remover.py — All are pre-existing on origin/main (e.g., attr-defined, var-annotated, arg-type, assignment errors). No new type errors introduced by this PR.

Code Review

No critical issues found. The optimization is correct and safe:

  • _ImportCollector (lines 641–654): Replaces ast.walk() + isinstance() with a targeted ast.NodeVisitor. This is functionally equivalent — visit_Import and visit_ImportFrom collect import nodes, while generic_visit (default for all other node types) ensures the full tree is still traversed. Imports nested in if TYPE_CHECKING: blocks or similar constructs are still found.
  • definition_type change (lines 633, 808): Accesses helper.definition_type (a str | None field on the model) instead of helper.jedi_definition.type if helper.jedi_definition else None. This matches the parent PR's model refactor.
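The nested-import claim above is easy to check with a small sketch (the snippet and class below are illustrative, not code from the repository): both traversal styles find an import placed inside an `if TYPE_CHECKING:` block.

```python
import ast

# Illustrative source: one import nested under TYPE_CHECKING, two at top level.
src = """
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from collections import OrderedDict
import os
"""
tree = ast.parse(src)

# Baseline: the ast.walk() + isinstance() approach the PR replaces.
walk_imports = [n for n in ast.walk(tree)
                if isinstance(n, (ast.Import, ast.ImportFrom))]

# Visitor approach: the inherited generic_visit still descends into the
# If node's body, so the nested ImportFrom is found as well.
class Collector(ast.NodeVisitor):
    def __init__(self) -> None:
        self.found = []
    def visit_Import(self, node: ast.Import) -> None:
        self.found.append(node)
    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        self.found.append(node)

collector = Collector()
collector.visit(tree)
assert len(collector.found) == len(walk_imports) == 3  # both see all 3 imports
```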

Test Coverage

Coverage for codeflash/context/unused_definition_remover.py: 475 statements, 42 missed, 91% covered
Missing lines: 113-115, 237, 310, 362, 424-449, 460, 511, 547-549, 558-561, 610-611, 701, 707, 712, 837-839
  • All lines changed by this PR are covered by existing tests (lines 633-634, 641-654, 808-809 are not in the missing list)
  • ℹ️ This is a dependent optimization PR on top of PR #1460 ("feat: add reference graph for Python", call-graphee). Coverage comparison against main is not meaningful for this PR since most file differences come from the parent branch.

Last updated: 2026-02-12

@KRRT7 KRRT7 merged commit 5ac33e1 into call-graphee Feb 12, 2026
25 of 28 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1460-2026-02-12T07.03.31 branch February 12, 2026 17:57
