⚡️ Speed up function `contains_wildcards` by 106% #122
Open
codeflash-ai[bot] wants to merge 1 commit into main
The optimization replaces the `any()` generator expression with an explicit for-loop that exits early when a wildcard is found. This achieves a **106% speedup** (runtime reduced from 217μs to 105μs).

**What Changed:**
- Original: `return any(wildcard_character in pattern for wildcard_character in WILDCARD_CHARACTERS)`
- Optimized: Explicit for-loop with early `return True` when any wildcard is found

**Why It's Faster:**
1. **Reduced Function Call Overhead**: The original code involves calling the built-in `any()` function plus creating a generator object. The optimized version eliminates this overhead by using direct control flow.
2. **Earlier Short-Circuiting**: While both implementations support early termination when a wildcard is found, the explicit loop version short-circuits more efficiently. The line profiler shows the optimized version exits early in 64 out of 90 iterations (71%), demonstrating effective early-return behavior.
3. **Lower Per-Check Cost**: The optimized code shows a per-hit time of ~898ns for the loop iteration compared to 5062ns for the entire `any()` expression in the original, indicating more efficient memory and instruction patterns.

**Test Results:**
The optimization is particularly effective for:
- **Patterns with wildcards at the start** (e.g., `"*start"`): 127-150% faster due to immediate detection
- **Patterns with wildcards anywhere** (e.g., `"file*name"`): 113-140% faster
- **Long strings without wildcards** (e.g., 1000-char strings): 64-80% faster. The gain is smaller than in the wildcard cases, but the optimization still helps by avoiding generator overhead for the full iteration

Even in the worst case (checking all 3 wildcard characters on strings without wildcards), the optimized version is 65-80% faster due to eliminated generator and `any()` overhead.
This optimization is valuable for any code that frequently checks patterns for wildcards, especially in file system operations, glob pattern validation, or path processing hot paths where this function might be called repeatedly.
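For reference, the change described above can be sketched as follows. This is an illustrative reconstruction, not the actual diff: the function names `contains_wildcards_original` and `contains_wildcards_optimized` are hypothetical, and the value of `WILDCARD_CHARACTERS` is an assumption (the description only states that there are three wildcard characters).

```python
# Assumed value: the PR description only says there are 3 wildcard characters.
WILDCARD_CHARACTERS = "*[]"


def contains_wildcards_original(pattern: str) -> bool:
    # Original form quoted in the description: any() over a generator expression.
    return any(wildcard_character in pattern for wildcard_character in WILDCARD_CHARACTERS)


def contains_wildcards_optimized(pattern: str) -> bool:
    # Optimized form as described: explicit loop that returns on the first match.
    for wildcard_character in WILDCARD_CHARACTERS:
        if wildcard_character in pattern:
            return True
    return False
```

Both versions return the same result for every input; only the control flow differs, which is why the change is behavior-preserving.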
📄 106% (1.06x) speedup for `contains_wildcards` in `src/datasets/data_files.py`
⏱️ Runtime: 217 microseconds → 105 microseconds (best of 68 runs)
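A rough way to check figures like these locally is a small `timeit` comparison. The sketch below is not the harness used to produce the numbers in this PR; the helper names (`with_any`, `with_loop`, `compare`) are hypothetical, the `WILDCARD_CHARACTERS` value is assumed, and absolute timings will vary by machine and Python version.

```python
import timeit

# Assumed value: the PR description only says there are 3 wildcard characters.
WILDCARD_CHARACTERS = "*[]"


def with_any(pattern: str) -> bool:
    # any() over a generator expression, as in the original code.
    return any(c in pattern for c in WILDCARD_CHARACTERS)


def with_loop(pattern: str) -> bool:
    # Explicit loop with early return, as in the optimized code.
    for c in WILDCARD_CHARACTERS:
        if c in pattern:
            return True
    return False


def compare(pattern: str, number: int = 100_000, repeat: int = 5) -> dict:
    """Return the best per-call time in seconds for each variant."""
    results = {}
    for name, func in (("any", with_any), ("loop", with_loop)):
        # The lambda adds the same call overhead to both variants.
        timings = timeit.repeat(lambda: func(pattern), number=number, repeat=repeat)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    for pattern in ("*start", "file*name", "x" * 1000):
        print(repr(pattern[:12]), compare(pattern))
```

Taking the minimum over several repeats mirrors the "best of N runs" style of reporting and reduces noise from other processes.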
To edit these changes, run `git checkout codeflash/optimize-contains_wildcards-mlclg43u` and push.