
⚡️ Speed up function _is_supported_dill_version by 2,625% #130

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-_is_supported_dill_version-mlcvr2i6


codeflash-ai bot commented Feb 7, 2026

📄 2,625% speedup (about 27× faster) for _is_supported_dill_version in src/datasets/utils/_dill.py

⏱️ Runtime: 5.35 milliseconds → 196 microseconds (best of 108 runs)

📝 Explanation and details

This optimization achieves a roughly 27× speedup (2,625% faster) by removing work that was repeated on every function call: version parsing moves from call time to module import time.

What Changed:

  • Created a module-level set _SUPPORTED_DILL_RELEASES that pre-parses the five supported version strings once at import
  • Changed the function to perform a simple membership check against this precomputed set instead of parsing versions on every call

Why It's Faster:
The original implementation called version.parse() five times on every invocation, which is expensive:

  • Each version.parse() call takes ~9-11 μs (9,000-11,000 ns) under the line profiler
  • Across the profiled calls this totaled ~42 ms
  • The optimized version replaces all of this with a slice and a single set lookup (~800 ns per call)

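packaging's version.parse() performs string parsing, validation, and object construction, none of which is needed when the comparison targets are a fixed set of versions. Doing this work once at import time removes it from every call. The set's O(1) membership test matters little with only five entries; nearly all of the saving comes from not re-parsing. The line-profiler figures suggest 40-50 μs saved per call, but the profiler adds overhead of its own: the unprofiled timings in the generated tests put the original at roughly 14-26 μs per call.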
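To check the "parse once, look up many times" reasoning on your own machine, a rough `timeit` harness like the one below may help. It is a sketch under stated assumptions: `parse_release` is a hypothetical stand-in for `packaging.version.parse(v).release` (swap in the real call if packaging is installed), and because the stand-in is much cheaper than the real parser, the measured ratio will be smaller than the PR's.

```python
import timeit


def parse_release(text: str) -> tuple:
    # Hypothetical stand-in for packaging.version.parse(text).release.
    return tuple(int(part) for part in text.split("."))


SUPPORTED = ("0.3.6", "0.3.7", "0.3.8", "0.3.9", "0.4.0")

# Optimized shape: parse once at import time.
PRECOMPUTED = frozenset(parse_release(v) for v in SUPPORTED)


def check_per_call(release: tuple) -> bool:
    # Original shape: rebuild and re-parse the whole list on every call.
    return release[:3] in [parse_release(v) for v in SUPPORTED]


def check_precomputed(release: tuple) -> bool:
    # Optimized shape: a slice and a single hash lookup.
    return release[:3] in PRECOMPUTED


def best_per_call(func, release, repeat=5, number=10_000) -> float:
    # Time `number` calls per batch, repeat the batch, and keep the fastest
    # batch, in the spirit of the report's "best of N runs".
    batches = timeit.repeat(lambda: func(release), repeat=repeat, number=number)
    return min(batches) / number


if __name__ == "__main__":
    for release in [(0, 3, 7), (0, 3, 5)]:
        slow = best_per_call(check_per_call, release)
        fast = best_per_call(check_precomputed, release)
        print(f"{release}: {slow * 1e9:.0f} ns -> {fast * 1e9:.0f} ns ({slow / fast:.1f}x)")
```

Taking the minimum over several batches filters out scheduler noise, which is the usual reason benchmark reports quote a best-of figure rather than a mean.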

Test Performance:
The optimization shows roughly 13-37× speedups in every test that returns a value; the two AttributeError cases, which fail before any version is parsed, improve only 12-17%:

  • Basic supported/unsupported checks: 1650-2100% faster
  • Pre-release variants: 2300-3600% faster
  • Large-scale tests (200 iterations): 3400% faster
  • Boundary cases: 1600-2500% faster
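The percentages in this report are (old − new) / new × 100, so the runtime ratio is 1 + percent/100: 2,625% faster means the new code runs about 27.25× as fast. A small check of that arithmetic, using only figures quoted in this report:

```python
def pct_faster_to_ratio(pct_faster: float) -> float:
    # The report's "X% faster" is (old - new) / new * 100,
    # so the old/new runtime ratio is 1 + X / 100.
    return 1 + pct_faster / 100


def runtime_ratio(old_seconds: float, new_seconds: float) -> float:
    # How many times faster the new code is.
    return old_seconds / new_seconds


headline = pct_faster_to_ratio(2625)       # 27.25
measured = runtime_ratio(5.35e-3, 196e-6)  # about 27.3 (5.35 ms -> 196 μs)
slowest_case = pct_faster_to_ratio(1248)   # about 13.5
fastest_case = pct_faster_to_ratio(3636)   # about 37.4
```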

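Because the original parsed all five supported versions on every call before testing membership, supported and unsupported inputs paid the same cost, and the measured gains are similar for both (1824% and 2098% faster in the basic supported and unsupported tests). The saving therefore grows with call frequency, which makes the change most useful on hot paths that check the dill version repeatedly.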

Correctness verification report:

  • ⚙️ Existing Unit Tests: 🔘 None Found
  • 🌀 Generated Regression Tests: 333 Passed
  • ⏪ Replay Tests: 🔘 None Found
  • 🔎 Concolic Coverage Tests: 🔘 None Found
  • 📊 Tests Coverage: 100.0%
🌀 Generated Regression Tests
import pytest  # used for our unit tests
from packaging import version  # used to build real Version objects for the tests
from src.datasets import config  # the real config module read by the function
from src.datasets.utils._dill import _is_supported_dill_version


@pytest.fixture(autouse=True)
def _restore_dill_version():
    # The tests below assign config.DILL_VERSION directly; restore the real
    # value afterwards so the mutation does not leak into other tests.
    original = config.DILL_VERSION
    yield
    config.DILL_VERSION = original

def test_supported_versions_basic():
    """
    Basic scenario: the canonical supported versions should return True.
    We iterate through the explicitly listed supported versions and assert True.
    """
    supported = ["0.3.6", "0.3.7", "0.3.8", "0.3.9", "0.4.0"]  # supported version strings
    for v in supported:
        # set the config to a Version instance for the test
        config.DILL_VERSION = version.parse(v)
        # function should recognize the release as supported
        codeflash_output = _is_supported_dill_version() # 76.3μs -> 3.96μs (1824% faster)

def test_unsupported_versions_basic():
    """
    Basic scenario: clearly unsupported versions should return False.
    We test a selection of earlier and later versions that should not be recognized.
    """
    unsupported = ["0.3.5", "0.3.10", "0.4.1", "1.0.0"]  # versions outside supported set
    for v in unsupported:
        config.DILL_VERSION = version.parse(v)  # set config to the parsed version
        codeflash_output = _is_supported_dill_version() # 62.0μs -> 2.82μs (2098% faster)

def test_pre_release_and_post_release_variants():
    """
    Edge cases: pre-release, post-release, dev and local/build metadata variations
    should still be recognized as supported if their major/minor/patch release tuple
    matches one of the supported releases.
    """
    # various forms that should map their release tuple to (0,3,7)
    variants_supported = [
        "0.3.7a1",        # alpha pre-release
        "0.3.7rc1",       # release candidate
        "0.3.7.post1",    # post-release
        "0.3.7.dev1",     # dev release
        "0.3.7+local",    # local/build metadata
        "0.3.7.post1+abc" # combined post and local metadata
    ]
    for v in variants_supported:
        config.DILL_VERSION = version.parse(v)  # set config to parsed version with suffixes
        # all of these should be treated as supported because release[:3] == (0,3,7)
        codeflash_output = _is_supported_dill_version() # 92.1μs -> 3.76μs (2349% faster)

    # a dev release of another supported version (0.3.8.dev0) has release (0, 3, 8) -> supported
    config.DILL_VERSION = version.parse("0.3.8.dev0")
    codeflash_output = _is_supported_dill_version() # 14.3μs -> 382ns (3636% faster)

def test_short_and_long_release_tuples():
    """
    Edge cases with release tuples of different lengths:
    - A short release like '0.3' yields a shorter release tuple and should not match supported ones.
    - A longer release like '0.3.7.1' should be sliced to the first three components and matched accordingly.
    """
    # short release: missing the patch component -> release tuple (0,3) so not supported
    config.DILL_VERSION = version.parse("0.3")
    codeflash_output = _is_supported_dill_version() # 18.8μs -> 1.40μs (1248% faster)

    # longer release: 0.3.7.1 has release tuple (0,3,7,1) -> first three elements (0,3,7) -> supported
    config.DILL_VERSION = version.parse("0.3.7.1")
    codeflash_output = _is_supported_dill_version() # 14.9μs -> 770ns (1835% faster)

    # single zero release should not match any supported triple
    config.DILL_VERSION = version.parse("0")
    codeflash_output = _is_supported_dill_version() # 14.6μs -> 444ns (3182% faster)

def test_invalid_config_values_raise():
    """
    Edge cases where config.DILL_VERSION is not a Version-like object.
    The implementation expects an object with a 'release' attribute, so passing None
    or a plain string should raise an AttributeError when accessing .release.
    """
    # None: accessing .release will raise AttributeError
    config.DILL_VERSION = None
    with pytest.raises(AttributeError):
        _is_supported_dill_version() # 2.49μs -> 2.13μs (16.7% faster)

    # Plain string: also does not have .release attribute -> AttributeError
    config.DILL_VERSION = "0.3.7"
    with pytest.raises(AttributeError):
        _is_supported_dill_version() # 1.28μs -> 1.15μs (11.5% faster)

def test_large_scale_variants_are_classified_correctly():
    """
    Large-scale scenario: create a variety of version strings (within limits) to
    assert the function behaves correctly and efficiently for many inputs.
    We keep the total iterations below 1000 (here 200) to respect test constraints.
    """
    # precompute supported release tuples for quick membership checks
    supported_releases = {
        version.parse("0.3.6").release,
        version.parse("0.3.7").release,
        version.parse("0.3.8").release,
        version.parse("0.3.9").release,
        version.parse("0.4.0").release,
    }

    # generate up to 200 diverse version strings using different syntactic forms
    variants = []
    # produce variants anchored around supported and unsupported patch/minor numbers
    for i in range(200):  # safe loop count under 1000
        # cycle base patch among 6..9, and occasionally produce other minors/patches
        base_patch = 6 + (i % 4)  # yields 6,7,8,9 repeating
        minor = 3 if (i % 10) < 8 else 4  # mostly minor 3, occasionally minor 4
        # alternate between plain, post, dev, local, and extended patch forms
        if i % 5 == 0:
            variants.append(f"0.{minor}.{base_patch}")  # plain
        elif i % 5 == 1:
            variants.append(f"0.{minor}.{base_patch}.1")  # extended patch
        elif i % 5 == 2:
            variants.append(f"0.{minor}.{base_patch}.1+build{i}")  # extended + local
        elif i % 5 == 3:
            variants.append(f"0.{minor}.{base_patch}.post{i}")  # post releases
        else:
            variants.append(f"0.{minor}.{base_patch}.dev{i}")  # dev releases

    # iterate through generated variants and assert classification matches release[:3] membership
    for v in variants:
        parsed = version.parse(v)
        expected = parsed.release[:3] in supported_releases  # expected boolean
        config.DILL_VERSION = parsed  # set config to the parsed Version
        codeflash_output = _is_supported_dill_version(); result = codeflash_output # 2.82ms -> 80.8μs (3397% faster)
        assert result == expected
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from unittest.mock import MagicMock, patch

import pytest
from src.datasets import config
from src.datasets.utils._dill import _is_supported_dill_version

def test_supported_version_0_3_6():
    """Test that dill version 0.3.6 is recognized as supported."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 6)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 24.8μs -> 1.42μs (1652% faster)

def test_supported_version_0_3_7():
    """Test that dill version 0.3.7 is recognized as supported."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 7)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.9μs -> 1.39μs (1772% faster)

def test_supported_version_0_3_8():
    """Test that dill version 0.3.8 is recognized as supported."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 8)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.3μs -> 1.37μs (1755% faster)

def test_supported_version_0_3_9():
    """Test that dill version 0.3.9 is recognized as supported."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 9)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.9μs -> 1.42μs (1721% faster)

def test_supported_version_0_4_0():
    """Test that dill version 0.4.0 is recognized as supported."""
    mock_version = MagicMock()
    mock_version.release = (0, 4, 0)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.5μs -> 1.39μs (1733% faster)

def test_unsupported_version_0_3_5():
    """Test that dill version 0.3.5 (just before supported range) is unsupported."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 5)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.0μs -> 1.27μs (1875% faster)

def test_unsupported_version_0_4_1():
    """Test that dill version 0.4.1 (just after supported range) is unsupported."""
    mock_version = MagicMock()
    mock_version.release = (0, 4, 1)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.6μs -> 1.27μs (1910% faster)

def test_unsupported_version_0_2_0():
    """Test that dill version 0.2.0 (much earlier) is unsupported."""
    mock_version = MagicMock()
    mock_version.release = (0, 2, 0)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.5μs -> 1.24μs (1963% faster)

def test_unsupported_version_1_0_0():
    """Test that dill version 1.0.0 (much later) is unsupported."""
    mock_version = MagicMock()
    mock_version.release = (1, 0, 0)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.5μs -> 1.21μs (2011% faster)

def test_version_with_four_components():
    """Test that version tuples with more than 3 components are handled correctly.
    
    The function only checks the first 3 components, so (0, 3, 6, 1) should
    still match (0, 3, 6).
    """
    mock_version = MagicMock()
    mock_version.release = (0, 3, 6, 1)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.8μs -> 1.45μs (1679% faster)

def test_version_with_five_components():
    """Test that version tuples with five components are handled correctly."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 7, 2, 0)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 26.0μs -> 1.49μs (1648% faster)

def test_version_with_exactly_three_components_zero():
    """Test edge case where version is exactly (0, 0, 0)."""
    mock_version = MagicMock()
    mock_version.release = (0, 0, 0)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.2μs -> 1.23μs (1961% faster)

def test_version_with_large_numbers():
    """Test that versions with large numbers in unsupported ranges are rejected."""
    mock_version = MagicMock()
    mock_version.release = (999, 999, 999)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.4μs -> 1.24μs (1956% faster)

def test_version_0_3_6_with_zero_patch():
    """Test that (0, 3, 6) with any trailing zeros still matches."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 6, 0, 0, 0)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.2μs -> 1.48μs (1604% faster)

def test_version_between_0_3_6_and_0_3_7():
    """Test that intermediate versions like (0, 3, 6, 5) between supported versions
    still use only the first 3 components and match."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 6, 5)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.6μs -> 1.45μs (1662% faster)

def test_version_boundary_0_3_5_9():
    """Test version just below the first supported version: (0, 3, 5, 9)."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 5, 9)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.8μs -> 1.27μs (1930% faster)

def test_version_boundary_0_4_0_1():
    """Test version just above the last supported version: (0, 4, 0, 1)."""
    mock_version = MagicMock()
    mock_version.release = (0, 4, 0, 1)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.6μs -> 1.38μs (1754% faster)

def test_version_0_3_10():
    """Test that (0, 3, 10) is not in the supported list even though it's between
    0.3.9 and 0.4.0."""
    mock_version = MagicMock()
    mock_version.release = (0, 3, 10)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        codeflash_output = _is_supported_dill_version() # 25.3μs -> 1.21μs (1994% faster)

def test_version_0_3_5_vs_0_3_6():
    """Test the exact boundary: 0.3.5 (not supported) vs 0.3.6 (supported)."""
    # Test 0.3.5 - should be False
    mock_version_0_3_5 = MagicMock()
    mock_version_0_3_5.release = (0, 3, 5)
    
    with patch.object(config, 'DILL_VERSION', mock_version_0_3_5):
        codeflash_output = _is_supported_dill_version() # 26.0μs -> 1.27μs (1944% faster)
    
    # Test 0.3.6 - should be True
    mock_version_0_3_6 = MagicMock()
    mock_version_0_3_6.release = (0, 3, 6)
    
    with patch.object(config, 'DILL_VERSION', mock_version_0_3_6):
        codeflash_output = _is_supported_dill_version() # 18.8μs -> 879ns (2035% faster)

def test_version_0_3_9_vs_0_4_0():
    """Test the boundary between the supported versions and the next major version."""
    # Test 0.3.9 - should be True
    mock_version_0_3_9 = MagicMock()
    mock_version_0_3_9.release = (0, 3, 9)
    
    with patch.object(config, 'DILL_VERSION', mock_version_0_3_9):
        codeflash_output = _is_supported_dill_version() # 25.6μs -> 1.36μs (1775% faster)
    
    # Test 0.4.0 - should be True
    mock_version_0_4_0 = MagicMock()
    mock_version_0_4_0.release = (0, 4, 0)
    
    with patch.object(config, 'DILL_VERSION', mock_version_0_4_0):
        codeflash_output = _is_supported_dill_version() # 18.9μs -> 836ns (2156% faster)
    
    # Test 0.4.1 - should be False
    mock_version_0_4_1 = MagicMock()
    mock_version_0_4_1.release = (0, 4, 1)
    
    with patch.object(config, 'DILL_VERSION', mock_version_0_4_1):
        codeflash_output = _is_supported_dill_version() # 18.5μs -> 714ns (2489% faster)

def test_multiple_supported_versions_comprehensive():
    """Test all five supported versions in a batch to ensure consistency."""
    supported_versions = [
        (0, 3, 6),
        (0, 3, 7),
        (0, 3, 8),
        (0, 3, 9),
        (0, 4, 0),
    ]
    
    for version_tuple in supported_versions:
        mock_version = MagicMock()
        mock_version.release = version_tuple
        
        with patch.object(config, 'DILL_VERSION', mock_version):
            codeflash_output = _is_supported_dill_version(); result = codeflash_output
            assert result is True

def test_range_of_unsupported_versions_before():
    """Test a range of unsupported versions before the first supported version."""
    unsupported_versions = [
        (0, 0, 0),
        (0, 1, 0),
        (0, 2, 0),
        (0, 2, 5),
        (0, 3, 0),
        (0, 3, 1),
        (0, 3, 2),
        (0, 3, 3),
        (0, 3, 4),
        (0, 3, 5),
    ]
    
    for version_tuple in unsupported_versions:
        mock_version = MagicMock()
        mock_version.release = version_tuple
        
        with patch.object(config, 'DILL_VERSION', mock_version):
            codeflash_output = _is_supported_dill_version(); result = codeflash_output
            assert result is False

def test_range_of_unsupported_versions_after():
    """Test a range of unsupported versions after the last supported version."""
    unsupported_versions = [
        (0, 4, 1),
        (0, 4, 2),
        (0, 5, 0),
        (0, 6, 0),
        (1, 0, 0),
        (1, 1, 0),
        (2, 0, 0),
        (10, 0, 0),
        (100, 100, 100),
    ]
    
    for version_tuple in unsupported_versions:
        mock_version = MagicMock()
        mock_version.release = version_tuple
        
        with patch.object(config, 'DILL_VERSION', mock_version):
            codeflash_output = _is_supported_dill_version(); result = codeflash_output
            assert result is False

def test_versions_with_extended_components():
    """Test that versions with extended tuples are handled consistently.
    
    Covers release tuples of 4 to 8 components; only the first three should matter.
    """
    # Each supported version extended with various numbers of additional components
    test_cases = [
        ((0, 3, 6, 0), True),
        ((0, 3, 6, 1, 2, 3), True),
        ((0, 3, 7, 99, 99), True),
        ((0, 3, 8, 0, 0, 0, 0, 0), True),
        ((0, 3, 9, 5), True),
        ((0, 4, 0, 10, 20, 30, 40, 50), True),
        ((0, 3, 5, 99, 99, 99), False),
        ((0, 4, 1, 0, 0), False),
        ((0, 3, 10, 0, 0), False),
    ]
    
    for version_tuple, expected in test_cases:
        mock_version = MagicMock()
        mock_version.release = version_tuple
        
        with patch.object(config, 'DILL_VERSION', mock_version):
            codeflash_output = _is_supported_dill_version(); result = codeflash_output
            assert result == expected

def test_performance_with_many_checks():
    """Test that the function performs efficiently when called many times."""
    # This test ensures the function doesn't have any performance issues
    # when called in rapid succession
    iterations = 500
    
    mock_version = MagicMock()
    mock_version.release = (0, 3, 7)
    
    with patch.object(config, 'DILL_VERSION', mock_version):
        # Call the function many times and collect results
        results = [_is_supported_dill_version() for _ in range(iterations)]
        assert all(results)

def test_all_five_supported_with_extended_components():
    """Test all five supported versions with extended component tuples
    to verify the slicing [:3] works correctly."""
    
    supported_base = [
        (0, 3, 6),
        (0, 3, 7),
        (0, 3, 8),
        (0, 3, 9),
        (0, 4, 0),
    ]
    
    # Create extended versions with up to 10 additional components
    for base_version in supported_base:
        for num_extra in range(0, 11):
            extended_version = base_version + tuple(range(num_extra))
            
            mock_version = MagicMock()
            mock_version.release = extended_version
            
            with patch.object(config, 'DILL_VERSION', mock_version):
                codeflash_output = _is_supported_dill_version(); result = codeflash_output
                assert result is True
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-_is_supported_dill_version-mlcvr2i6 and push.


codeflash-ai bot requested a review from aseembits93 February 7, 2026 22:22
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Feb 7, 2026