⚡️ Speed up method JavaImportResolver._is_external_library by 148% in PR #1199 (omni-java) #1372
Open
codeflash-ai[bot] wants to merge 1 commit into omni-java
The optimized code achieves a **147% speedup** (from 584μs to 236μs) by introducing two key optimizations:

**1. Caching Previously-Seen Results**

The optimization adds `self._external_library_cache: dict[str, bool]` to memoize the results of previous `_is_external_library` calls. This is particularly effective because:

- Java projects often check the same imports repeatedly across multiple files.
- The test results show large speedups for repeated checks: 610% faster for external packages and 704% faster for internal packages when called 100 times.
- Within a single analysis pass, any package that is checked more than once is answered from the cache.

**2. Set-Based Membership Tests Instead of Linear String Scanning**

The original code used `for prefix in self.COMMON_EXTERNAL_PREFIXES` with `startswith()` checks, performing a linear scan and a string concatenation (`prefix + "."`) for every prefix on every call. The optimization:

- Converts `COMMON_EXTERNAL_PREFIXES` to a `frozenset` for average O(1) membership tests.
- Builds package prefixes incrementally (`"org"` → `"org.apache"` → `"org.apache.commons"`) and checks each against the set.
- Eliminates repeated string concatenations in the hot loop (the line profiler shows that the original `prefix + "."` operation consumed 61.7% of total time).

**Performance Characteristics by Test Case:**

- **Exact prefix matches**: 100-380% faster (e.g., `"org.apache"`, `"lombok"`) due to the early set lookup.
- **Short dotted paths**: 20-60% faster for 2-3 segment packages.
- **Nested paths with cache misses**: up to 71% slower on the first call due to prefix-building overhead, but subsequent calls are 100%+ faster via caching.
- **Batch operations**: 38-88% faster when processing many similar packages, demonstrating cache effectiveness.

The optimization is especially valuable in real-world scenarios where:

- Import resolution happens across multiple files in a project (cache hits accumulate).
- Common frameworks like Spring, JUnit, or Apache Commons are heavily used (high cache hit rate).
- The resolver is called repeatedly during code analysis or build processes.

The trade-off is small: the cache uses extra memory proportional to the number of unique imports seen, and deeply nested external packages pay a first-call overhead. Both costs are minor compared to the gains in repeated-check scenarios.
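A minimal sketch of the two changes described above, assuming `_is_external_library` receives a dotted package name and that the original check had the shape `package == prefix or package.startswith(prefix + ".")`. The class `JavaImportResolverSketch`, the `_is_external_library_linear` baseline, and the prefix list are hypothetical illustrations, not the actual code in `codeflash/languages/java/import_resolver.py`.

```python
from __future__ import annotations


class JavaImportResolverSketch:
    """Hypothetical stand-in for JavaImportResolver, reduced to _is_external_library."""

    # Illustrative prefixes only; the real COMMON_EXTERNAL_PREFIXES is not shown in the PR.
    COMMON_EXTERNAL_PREFIXES: frozenset[str] = frozenset(
        {"java", "javax", "org.apache", "org.springframework", "org.junit", "com.google", "lombok"}
    )

    def __init__(self) -> None:
        # Memoizes package name -> result, so repeated checks become one dict lookup.
        self._external_library_cache: dict[str, bool] = {}

    def _is_external_library_linear(self, package: str) -> bool:
        """Assumed shape of the original check: one concatenation and startswith() per prefix."""
        for prefix in self.COMMON_EXTERNAL_PREFIXES:
            if package == prefix or package.startswith(prefix + "."):
                return True
        return False

    def _is_external_library(self, package: str) -> bool:
        """Cached, set-based check; gives the same answer as the linear version."""
        cached = self._external_library_cache.get(package)
        if cached is not None:
            return cached

        prefixes = self.COMMON_EXTERNAL_PREFIXES
        # Exact match first, e.g. "lombok" or "org.apache".
        result = package in prefixes
        # Then every dot-delimited prefix: "org", "org.apache", "org.apache.commons", ...
        dot = package.find(".")
        while not result and dot != -1:
            result = package[:dot] in prefixes
            dot = package.find(".", dot + 1)

        self._external_library_cache[package] = result
        return result


resolver = JavaImportResolverSketch()
for name in ("org.apache.commons.lang3", "lombok", "com.example.app"):
    print(name, resolver._is_external_library(name))
```

Slicing at each `"."` yields exactly the dot-boundary prefixes that `startswith(prefix + ".")` would accept, so the two versions agree; the difference is that the set-based version does one hash lookup per package segment instead of one concatenation and comparison per known prefix. Because results are stored per package name, the cache grows with the number of distinct names seen, which is the memory trade-off mentioned above.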
⚡️ This pull request contains optimizations for PR #1199
If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

📄 **148% speedup (runtime ratio ≈2.5x) for `JavaImportResolver._is_external_library` in `codeflash/languages/java/import_resolver.py`**

⏱️ Runtime: 584 microseconds → 236 microseconds (best of 177 runs)

📝 Explanation and details
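For reference, the headline figures follow from the two measured runtimes: the percentage is the time saved relative to the new runtime, and the ratio is old runtime over new.

$$
\text{speedup} = \frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{new}}} = \frac{584 - 236}{236} \approx 1.47 \ (\approx 147\%),
\qquad
\frac{t_{\text{old}}}{t_{\text{new}}} = \frac{584}{236} \approx 2.47
$$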
✅ Correctness verification report:
🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-pr1199-2026-02-04T05.47.50` and push.