
Conversation

@FBumann (Member) commented Dec 16, 2025

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

  • New Features

    • Improved clustering API with keyword-based parameters for enhanced flexibility in system optimization.
    • New example systems for district heating and operational workflow demonstrations.
  • Breaking Changes

    • Clustering parameter names updated for clarity: hours_per_period → cluster_duration, nr_of_periods → n_clusters, fix_storage_flows → include_storage, and additional semantic updates.
    • Migration guidance provided for upgrading to v5.1.0.
  • Documentation

    • Updated example notebooks and data generation utilities to reflect new clustering approach.


@coderabbitai bot (Contributor) commented Dec 16, 2025

Walkthrough

Introduces a new clustering API via flow_system.transform.cluster() with redesigned parameters and multi-dimensional clustering support. Renames ClusteringParameters fields (e.g., hours_per_period → cluster_duration, nr_of_periods → n_clusters). Updates example notebooks and utilities to use new data loading patterns. Deprecates the old ClusteredOptimization workflow.
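
For orientation, a minimal sketch of the new entry point described above; the cluster count and duration are placeholder values, not taken from the PR:

clustered_fs = flow_system.transform.cluster(
    n_clusters=8,          # was nr_of_periods
    cluster_duration=24,   # was hours_per_period
)
clustered_fs.optimize()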

Changes

Release Documentation (CHANGELOG.md)
Added upcoming v5.1.0 release notes detailing new cluster() API with keyword parameters and breaking changes to ClusteringParameters field names; includes migration examples.

Notebook Examples & Utilities (docs/notebooks/08a-aggregation.ipynb, docs/notebooks/08b-rolling-horizon.ipynb)
Refactored notebooks to load pre-built FlowSystem objects from NetCDF instead of raw CSV time-series; simplified visualization workflows and replaced cell-based system construction with FlowSystem-based approaches.

Example System Generation (docs/notebooks/data/generate_example_systems.py)
Added two new public helper functions, create_district_heating_system() and create_operational_system(), for generating example FlowSystem instances; updated module docstring and registry.

Example Usage (examples/03_Optimization_modes/example_optimization_modes.py)
Replaced old ClusteredOptimization with new flow_system.transform.cluster() API; introduced clustering configuration variables and internal ClusteredResult wrapper for backward compatibility.

Core Clustering Engine (flixopt/clustering.py)
Refactored Clustering to support optional per-segment segmentation; redesigned ClusteringParameters with new field names and multi-dimensional support in ClusteringModel; added segment-based constraint generation, flexibility penalties, and helper methods for parameter parsing and constraint management.

Optimization Framework (flixopt/optimization.py)
Updated deprecation messaging for ClusteredOptimization; migrated internal parameter names from old conventions to new ones; adjusted clustering validation and aggregation flag handling.

Transform Accessor (flixopt/transform_accessor.py)
Redesigned cluster() method signature to accept individual keyword parameters instead of a pre-built object; added internal _cluster_simple and _cluster_multi_dimensional helpers for single- and multi-dimensional clustering workflows; enhanced dataset aggregation and error handling.

Integration Tests (tests/deprecated/test_integration.py)
Updated ClusteringParameters initialization to use new field names: n_clusters, cluster_duration, include_storage, aggregate_data, flexibility_percent, flexibility_penalty.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant FlowSystem
    participant TransformAccessor
    participant ClusteringParameters
    participant Clustering
    participant ClusteringModel
    participant Optimizer

    User->>FlowSystem: transform.cluster(n_clusters, cluster_duration, ...)
    FlowSystem->>TransformAccessor: cluster(keyword args)
    TransformAccessor->>ClusteringParameters: Create from keywords
    ClusteringParameters-->>TransformAccessor: Instance
    
    alt Multi-dimensional clustering
        TransformAccessor->>TransformAccessor: _cluster_multi_dimensional()
        loop Per period/scenario
            TransformAccessor->>Clustering: Compute per-slice
            Clustering-->>TransformAccessor: Clustering results
        end
    else Single-dimension clustering
        TransformAccessor->>TransformAccessor: _cluster_simple()
        TransformAccessor->>Clustering: Compute for full dataset
        Clustering-->>TransformAccessor: Clustering results
    end
    
    TransformAccessor->>ClusteringModel: Apply constraints (multi-dim support)
    ClusteringModel->>ClusteringModel: do_modeling() with segments/flexibility
    ClusteringModel-->>TransformAccessor: Constraints added
    
    alt aggregate_data enabled
        TransformAccessor->>TransformAccessor: Aggregate time-series back
    end
    
    TransformAccessor-->>FlowSystem: Clustered FlowSystem
    FlowSystem-->>User: Return result
    User->>Optimizer: optimize()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • flixopt/clustering.py: Multi-dimensional clustering architecture shift, segment-based constraint generation, flexibility penalty logic, and parameter interactions between Clustering, ClusteringParameters, and ClusteringModel. Ensure correct variable scoping and constraint application across clusters and segments.
  • flixopt/transform_accessor.py: New _cluster_simple and _cluster_multi_dimensional branching logic; dataset aggregation behavior when aggregate_data=True; storage of clustering metadata in _clustering_info and interaction with per-dimension clustering results.
  • flixopt/clustering.py ClusteringModel.do_modeling(): Complex constraint generation with optional binary flexibility, multi-dimensional clustering support, and interaction between equality and flexibility penalty constraints. Verify correctness across cluster/segment combinations.
  • Parameter renaming consistency: Verify all references to old names (hours_per_period, nr_of_periods, etc.) have been migrated to new equivalents (cluster_duration, n_clusters, etc.) across the codebase.
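
The last point lends itself to a quick mechanical audit. A throwaway script along these lines (not part of the PR; hits on internal interfaces that intentionally keep the old names are expected) would surface leftover occurrences:

import pathlib
import re

OLD_NAMES = re.compile(r'hours_per_period|nr_of_periods|fix_storage_flows')

for path in pathlib.Path('flixopt').rglob('*.py'):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if OLD_NAMES.search(line):
            print(f'{path}:{lineno}: {line.strip()}')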

Poem

🐰 Hops of Joy

Clustering's refined with a hop and a bound,
Multi-dimensional patterns abound!
Old parameter names to the warren we wave,
New cluster() API comes forth, brave—
Segments and flexibilities dance in the code,
Transform the FlowSystem down the new road! 🌿

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

Description check: ⚠️ Warning
Explanation: The PR description is a template placeholder with no actual content filled in; no specific details about changes, breaking changes, migration paths, or testing are provided.
Resolution: Fill in the template with a substantive description of the clustering improvements, list breaking changes (e.g., the ClusteringParameters parameter renames), document migration guidance, and confirm testing was performed.

Title check: ❓ Inconclusive
Explanation: The title 'Improve clustering' is vague and generic, lacking specificity about what clustering improvements were made despite substantial API redesign and breaking changes.
Resolution: Provide a more specific title that describes the main improvement, such as 'Refactor clustering API with multi-dimensional support and segmentation' or 'Introduce new cluster() method and deprecate ClusteredOptimization'.

✅ Passed checks (1 passed)

Docstring Coverage: ✅ Passed
Explanation: Docstring coverage is 87.10%, which is sufficient. The required threshold is 80.00%.


@FBumann (Member, Author) commented Dec 16, 2025

@coderabbitai review

@coderabbitai bot (Contributor) commented Dec 16, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
docs/notebooks/data/generate_example_systems.py (1)

246-250: Add error handling for file loading operations.

Both create_district_heating_system() and create_operational_system() load CSV data without error handling. If the file is missing or the date range doesn't exist, users will see cryptic pandas errors rather than helpful messages.

Consider adding validation:

data_path = DATA_DIR / 'Zeitreihen2020.csv'
if not data_path.exists():
    raise FileNotFoundError(f"Required data file not found: {data_path}")
data = pd.read_csv(data_path, index_col=0, parse_dates=True).sort_index()
docs/notebooks/08a-aggregation.ipynb (1)

61-61: Consider handling the import error more gracefully.

The inline import from data.generate_example_systems import create_district_heating_system will raise an ImportError with a confusing message if the module doesn't exist. A more helpful error would guide users:

if not data_file.exists():
    try:
        from data.generate_example_systems import create_district_heating_system
    except ImportError:
        raise ImportError(
            "Example data file not found and generator module unavailable. "
            "Please download the example data or ensure data/generate_example_systems.py exists."
        )
    # ... rest of generation code
flixopt/transform_accessor.py (2)

256-269: Consider extracting validation to a shared helper.

The validation logic (lines 256-269) is duplicated from _cluster_simple() (lines 186-198). Extracting this to a private method would reduce duplication and ensure consistency.

def _validate_clustering_timesteps(self, params: ClusteringParameters) -> float:
    """Validate timesteps and return dt value."""
    dt_min = float(self._fs.hours_per_timestep.min().item())
    dt_max = float(self._fs.hours_per_timestep.max().item())
    if dt_min != dt_max:
        raise ValueError(...)
    ratio = params.cluster_duration_hours / dt_max
    if not np.isclose(ratio, round(ratio), atol=1e-9):
        raise ValueError(...)
    return dt_min
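
With such a helper in place, both call sites would reduce to a single line:

dt = self._validate_clustering_timesteps(params)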

363-371: Redundant keys in clustering_info dict.

Both 'clustering' and 'clustering_results' store the same clustering_results dict. If 'clustering' is required for backwards compatibility with _add_clustering_constraints, consider documenting this more explicitly or refactoring the consumer to use a single key.

         clustered_fs._clustering_info = {
             'parameters': params,
-            'clustering': clustering_results,  # Required by _add_clustering_constraints
-            'clustering_results': clustering_results,  # Dict of Clustering objects per dimension
+            'clustering': clustering_results,  # Dict of Clustering objects per (period, scenario)
             'components_to_clusterize': components_to_clusterize,
             'original_fs': self._fs,
             'has_periods': self._fs.periods is not None,
             'has_scenarios': self._fs.scenarios is not None,
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78969bd and 79db532.

📒 Files selected for processing (9)
  • CHANGELOG.md (1 hunks)
  • docs/notebooks/08a-aggregation.ipynb (19 hunks)
  • docs/notebooks/08b-rolling-horizon.ipynb (2 hunks)
  • docs/notebooks/data/generate_example_systems.py (4 hunks)
  • examples/03_Optimization_modes/example_optimization_modes.py (2 hunks)
  • flixopt/clustering.py (5 hunks)
  • flixopt/optimization.py (4 hunks)
  • flixopt/transform_accessor.py (5 hunks)
  • tests/deprecated/test_integration.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/transform_accessor.py (2)
flixopt/flow_system.py (4)
  • FlowSystem (49-2225)
  • to_dataset (581-633)
  • copy (856-881)
  • sel (2014-2040)
flixopt/core.py (2)
  • drop_constant_arrays (564-594)
  • to_dataarray (370-484)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md

62-62: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


93-93: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

🔇 Additional comments (22)
docs/notebooks/data/generate_example_systems.py (2)

236-348: LGTM! Well-structured helper function.

The create_district_heating_system() function is well-documented and follows consistent patterns with existing helper functions. The docstring clearly explains the purpose and usage.


351-448: LGTM! Operational system helper is well-designed.

The create_operational_system() function provides a clean example for operational planning without investments. Good separation of concerns between investment and operational optimization examples.

CHANGELOG.md (1)

54-112: LGTM! Comprehensive migration documentation.

The changelog clearly documents the clustering API improvements with:

  • Clear before/after migration examples
  • Parameter mapping table
  • Usage examples for the new API

This will help users migrate smoothly from the old to the new clustering API.

flixopt/optimization.py (4)

396-402: LGTM! Clear deprecation guidance with migration example.

The updated deprecation message provides a concrete example of the new API usage, making it easy for users to migrate from ClusteredOptimization to the new flow_system.transform.cluster() API.


444-449: LGTM! Consistent parameter naming.

The internal parameter name updates (cluster_duration_hours and error messages) align with the public API changes documented in the CHANGELOG.


462-463: LGTM! Clustering object initialization updated correctly.

The values passed to the Clustering constructor's hours_per_period and nr_of_periods arguments are now sourced from the renamed public parameters (cluster_duration and n_clusters), keeping the internal initialization consistent with the parameter renaming.


472-472: LGTM! Simplified parameter naming.

The change from aggregate_data_and_fix_non_binary_vars to aggregate_data improves readability while maintaining the same functionality.

tests/deprecated/test_integration.py (1)

284-293: LGTM! Test correctly updated for new clustering API.

The ClusteringParameters initialization has been properly migrated to use the new parameter names:

  • n_clusters (was nr_of_periods)
  • cluster_duration (was hours_per_period)
  • include_storage (was fix_storage_flows)
  • aggregate_data (was aggregate_data_and_fix_non_binary_vars)
  • flexibility_percent (was percentage_of_period_freedom)
  • flexibility_penalty (was penalty_of_period_freedom)

This matches the migration guide in CHANGELOG.md exactly.
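
Spelled out as code, the mapping looks roughly like this (placeholder values, not the ones used in the actual test):

# Old field names (deprecated)
ClusteringParameters(
    nr_of_periods=8,
    hours_per_period=24,
    fix_storage_flows=True,
    aggregate_data_and_fix_non_binary_vars=True,
    percentage_of_period_freedom=10,
    penalty_of_period_freedom=0.5,
)

# New field names
ClusteringParameters(
    n_clusters=8,
    cluster_duration=24,
    include_storage=True,
    aggregate_data=True,
    flexibility_percent=10,
    flexibility_penalty=0.5,
)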

docs/notebooks/08b-rolling-horizon.ipynb (1)

54-60: LGTM! Clear narrative update.

The markdown cell effectively communicates the shift to loading a pre-built FlowSystem, aligning with the PR's objective to adopt a FlowSystem-centric workflow.

examples/03_Optimization_modes/example_optimization_modes.py (2)

36-41: LGTM! Clear clustering configuration.

The new parameter names (n_clusters, cluster_duration, include_storage) are more intuitive than the previous hours_per_period and nr_of_periods.


192-204: Clean migration to the new transform.cluster() API.

The new API usage is clear and correctly passes the clustering configuration. The conditional peak time series assignment based on keep_extreme_periods is properly implemented.

docs/notebooks/08a-aggregation.ipynb (2)

69-83: Visualization code looks appropriate for the notebook.

The hardcoded component names ('HeatDemand', 'GridBuy') and timestep slice (672 = 7 days × 96 at 15-min resolution) are reasonable for a demonstration notebook where the example system is controlled.


360-386: Well-organized summary with clear learning outcomes.

The summary effectively captures the key takeaways and appropriately defers clustering details to the dedicated notebook (08c-clustering.ipynb).

flixopt/transform_accessor.py (2)

55-172: Well-designed public API with clear documentation.

The explicit parameter signature is more user-friendly than requiring a separate ClusteringParameters object. The docstring provides comprehensive examples covering basic clustering, segmentation, and multi-period scenarios.


174-243: Solid implementation with proper validation.

The timestep consistency check and cluster duration divisibility validation are correctly implemented. The use of np.isclose with a tight tolerance for the ratio check is appropriate for floating-point comparison.

flixopt/clustering.py (7)

89-121: Clean integration of segmentation support.

The logic correctly handles the case when nr_of_periods is None (segmentation-only mode) by falling back to total_periods. The tsam configuration properly sets segmentation and noSegments based on the n_segments parameter.


285-325: Well-implemented segment equation index generation.

The early return on lines 300-301 when n_segments is None correctly guards against accessing segmentDurationDict when segmentation is disabled. The loop logic properly generates index pairs to equate all timesteps within a segment to the first timestep.
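
As a self-contained illustration of that pairing logic (a sketch, not the actual flixopt code):

def segment_equation_indices(segment_durations):
    """Pair every timestep in a segment with that segment's first timestep."""
    pairs = []
    start = 0
    for duration in segment_durations:
        for offset in range(1, duration):
            pairs.append((start, start + offset))
        start += duration
    return pairs

print(segment_equation_indices([3, 2]))  # [(0, 1), (0, 2), (3, 4)]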


328-353: Clean utility function for duration parsing.

Using pd.Timedelta for parsing duration strings is appropriate and handles standard pandas duration formats correctly.
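
A simplified stand-in for such a parser, assuming it accepts either a plain number of hours or a pandas-style duration string:

import pandas as pd

def duration_to_hours(duration):
    # Hypothetical helper: '6h' or '1D' style strings, or plain numbers, to hours
    if isinstance(duration, str):
        return pd.Timedelta(duration).total_seconds() / 3600.0
    return float(duration)

print(duration_to_hours('6h'))  # 6.0
print(duration_to_hours(24))    # 24.0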


356-454: Well-documented parameter class with clear API.

The redesigned ClusteringParameters class provides a cleaner interface with intuitive parameter names. The convenience properties (use_extreme_periods, use_segmentation) improve code readability.


495-562: Efficient constraint generation with proper batching.

Batching continuous variables into a single constraint (lines 545-548) is a good optimization. Binary variables correctly get individual constraints to support the flexibility correction variables.


653-667: Correct penalty application for flexibility deviations.

The method properly sums correction variables over the appropriate dimension (eq_idx for equality constraints or time as fallback) and adds the penalty to the objective via PENALTY_EFFECT_LABEL.
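
Numerically, the contribution is just a priced sum of the corrections; with toy values:

import numpy as np

corrections = np.array([0.0, 1.5, 0.0, 0.5])  # per-timestep correction values (toy data)
flexibility_penalty = 2.0

penalty_contribution = flexibility_penalty * corrections.sum()
print(penalty_contribution)  # 4.0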


623-628: The self._model.variables.binaries attribute is a valid linopy API feature and requires no changes.

The .binaries property is a documented feature of linopy's Variables class, so the membership check on line 625 is correct and aligns with linopy's standard usage patterns.

Reviewed notebook cell, previous source:

# Load time series data (15-min resolution)
data = pd.read_csv('../../examples/resources/Zeitreihen2020.csv', index_col=0, parse_dates=True).sort_index()
data = data['2020-01-01':'2020-01-14 23:45:00']  # Two weeks
data.index.name = 'time'  # Rename index for consistency

timesteps = data.index

# Extract profiles
electricity_demand = data['P_Netz/MW'].to_numpy()
heat_demand = data['Q_Netz/MW'].to_numpy()
electricity_price = data['Strompr.€/MWh'].to_numpy()
gas_price = data['Gaspr.€/MWh'].to_numpy()

print(f'Timesteps: {len(timesteps)} ({len(timesteps) / 96:.0f} days at 15-min resolution)')
print(f'Heat demand: {heat_demand.min():.1f} - {heat_demand.max():.1f} MW')
print(f'Electricity price: {electricity_price.min():.1f} - {electricity_price.max():.1f} €/MWh')

New source:

from pathlib import Path

# Generate example data if not present (for local development)
data_file = Path('data/operational_system.nc4')
if not data_file.exists():
    from data.generate_example_systems import create_operational_system

    fs = create_operational_system()
    fs.optimize(fx.solvers.HighsSolver(log_to_console=False))
    fs.to_netcdf(data_file, overwrite=True)

# Load the operational system (real data from Zeitreihen2020.csv, two weeks)
flow_system = fx.FlowSystem.from_netcdf(data_file)

timesteps = flow_system.timesteps
print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
print(f'Components: {list(flow_system.components.keys())}')
@coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🔴 Critical

Ensure parent directory exists before writing NetCDF.

If the data/ directory doesn't exist, fs.to_netcdf(data_file, overwrite=True) will fail with a file system error. Additionally, there's no error handling for file I/O operations or optimization failures, which could lead to confusing errors for users.

Apply this diff to create the directory and add progress feedback:

 from pathlib import Path
 
 # Generate example data if not present (for local development)
 data_file = Path('data/operational_system.nc4')
 if not data_file.exists():
+    # Ensure parent directory exists
+    data_file.parent.mkdir(parents=True, exist_ok=True)
+    
     from data.generate_example_systems import create_operational_system
 
+    print("Generating example data (this may take a moment)...")
     fs = create_operational_system()
     fs.optimize(fx.solvers.HighsSolver(log_to_console=False))
     fs.to_netcdf(data_file, overwrite=True)
+    print(f"Example data saved to {data_file}")
 
-# Load the operational system (real data from Zeitreihen2020.csv, two weeks)
+# Load the operational system
 flow_system = fx.FlowSystem.from_netcdf(data_file)
 
 timesteps = flow_system.timesteps
 print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
 print(f'Components: {list(flow_system.components.keys())}')

Note: The comment mentioning "real data from Zeitreihen2020.csv" is misleading since the code may generate synthetic data if the file doesn't exist.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from pathlib import Path

# Generate example data if not present (for local development)
data_file = Path('data/operational_system.nc4')
if not data_file.exists():
    # Ensure parent directory exists
    data_file.parent.mkdir(parents=True, exist_ok=True)

    from data.generate_example_systems import create_operational_system

    print("Generating example data (this may take a moment)...")
    fs = create_operational_system()
    fs.optimize(fx.solvers.HighsSolver(log_to_console=False))
    fs.to_netcdf(data_file, overwrite=True)
    print(f"Example data saved to {data_file}")

# Load the operational system
flow_system = fx.FlowSystem.from_netcdf(data_file)

timesteps = flow_system.timesteps
print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
print(f'Components: {list(flow_system.components.keys())}')

Comment on lines +207 to 214
class ClusteredResult:
def __init__(self, name, fs):
self.name = name
self.flow_system = fs
self.durations = {'total': 0} # Placeholder

optimization = ClusteredResult('Clustered', clustered_fs)
optimizations.append(optimization)
@coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🟡 Minor

The durations placeholder will show misleading 0 seconds in the comparison chart.

Line 250 uses calc.durations for the speed comparison. This placeholder returns {'total': 0}, making the clustered optimization appear to take no time.

Consider capturing the actual duration:

+        import timeit
+        start = timeit.default_timer()
         clustered_fs = flow_system.copy().transform.cluster(
             n_clusters=n_clusters,
             cluster_duration=cluster_duration,
             include_storage=include_storage,
             time_series_for_high_peaks=time_series_for_high_peaks,
             time_series_for_low_peaks=time_series_for_low_peaks,
         )
         clustered_fs.optimize(fx.solvers.HighsSolver(0.01 / 100, 60))
+        elapsed = timeit.default_timer() - start

         # Wrap in a simple object for compatibility with comparison code
         class ClusteredResult:
             def __init__(self, name, fs):
                 self.name = name
                 self.flow_system = fs
-                self.durations = {'total': 0}  # Placeholder
+                self.durations = {'total': elapsed}

         optimization = ClusteredResult('Clustered', clustered_fs)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/03_Optimization_modes/example_optimization_modes.py around lines 207
to 214, the ClusteredResult currently sets durations to a hardcoded {'total':
0}, which misrepresents timing in the comparison chart; replace the placeholder
by measuring the actual runtime of the clustered optimization (e.g., record
start = time.perf_counter() before running the clustered flow system, run the
optimization, record end = time.perf_counter() afterwards) and set
optimization.durations['total'] = end - start (and any other duration keys to
match the shape used elsewhere) so the plotted speed comparison uses real
elapsed seconds.
