Improve clustering #533
base: feature/flow-system-first
Conversation
…egration # Conflicts: # flixopt/flow_system.py
Walkthrough

Introduces a new clustering API via `flow_system.transform.cluster()`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant FlowSystem
    participant TransformAccessor
    participant ClusteringParameters
    participant Clustering
    participant ClusteringModel
    participant Optimizer
    User->>FlowSystem: transform.cluster(n_clusters, cluster_duration, ...)
    FlowSystem->>TransformAccessor: cluster(keyword args)
    TransformAccessor->>ClusteringParameters: Create from keywords
    ClusteringParameters-->>TransformAccessor: Instance
    alt Multi-dimensional clustering
        TransformAccessor->>TransformAccessor: _cluster_multi_dimensional()
        loop Per period/scenario
            TransformAccessor->>Clustering: Compute per-slice
            Clustering-->>TransformAccessor: Clustering results
        end
    else Single-dimension clustering
        TransformAccessor->>TransformAccessor: _cluster_simple()
        TransformAccessor->>Clustering: Compute for full dataset
        Clustering-->>TransformAccessor: Clustering results
    end
    TransformAccessor->>ClusteringModel: Apply constraints (multi-dim support)
    ClusteringModel->>ClusteringModel: do_modeling() with segments/flexibility
    ClusteringModel-->>TransformAccessor: Constraints added
    alt aggregate_data enabled
        TransformAccessor->>TransformAccessor: Aggregate time-series back
    end
    TransformAccessor-->>FlowSystem: Clustered FlowSystem
    FlowSystem-->>User: Return result
    User->>Optimizer: optimize()
```
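The dispatch shown in the diagram can be sketched in plain Python. This is a simplified illustration of the assumed control flow only — the names mirror the diagram, not flixopt's actual implementation:

```python
class TransformAccessor:
    """Sketch of the clustering dispatch from the diagram (illustrative only)."""

    def __init__(self, has_periods=False, has_scenarios=False):
        self.has_periods = has_periods
        self.has_scenarios = has_scenarios

    def cluster(self, n_clusters, cluster_duration):
        # Multi-dimensional systems are clustered per (period, scenario) slice
        if self.has_periods or self.has_scenarios:
            return self._cluster_multi_dimensional(n_clusters, cluster_duration)
        return self._cluster_simple(n_clusters, cluster_duration)

    def _cluster_simple(self, n_clusters, cluster_duration):
        return {'mode': 'simple', 'n_clusters': n_clusters}

    def _cluster_multi_dimensional(self, n_clusters, cluster_duration):
        return {'mode': 'multi', 'n_clusters': n_clusters}
```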
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 2
🧹 Nitpick comments (4)
docs/notebooks/data/generate_example_systems.py (1)

246-250: Add error handling for file loading operations.

Both `create_district_heating_system()` and `create_operational_system()` load CSV data without error handling. If the file is missing or the date range doesn't exist, users will see cryptic pandas errors rather than helpful messages. Consider adding validation:

```python
data_path = DATA_DIR / 'Zeitreihen2020.csv'
if not data_path.exists():
    raise FileNotFoundError(f"Required data file not found: {data_path}")
data = pd.read_csv(data_path, index_col=0, parse_dates=True).sort_index()
```

docs/notebooks/08a-aggregation.ipynb (1)

61-61: Consider handling the import error more gracefully.

The inline import `from data.generate_example_systems import create_district_heating_system` will raise an ImportError with a confusing message if the module doesn't exist. A more helpful error would guide users:

```python
if not data_file.exists():
    try:
        from data.generate_example_systems import create_district_heating_system
    except ImportError:
        raise ImportError(
            "Example data file not found and generator module unavailable. "
            "Please download the example data or ensure data/generate_example_systems.py exists."
        )
    # ... rest of generation code
```

flixopt/transform_accessor.py (2)

256-269: Consider extracting validation to a shared helper.

The validation logic (lines 256-269) is duplicated from `_cluster_simple()` (lines 186-198). Extracting this to a private method would reduce duplication and ensure consistency.

```python
def _validate_clustering_timesteps(self, params: ClusteringParameters) -> float:
    """Validate timesteps and return dt value."""
    dt_min = float(self._fs.hours_per_timestep.min().item())
    dt_max = float(self._fs.hours_per_timestep.max().item())
    if dt_min != dt_max:
        raise ValueError(...)
    ratio = params.cluster_duration_hours / dt_max
    if not np.isclose(ratio, round(ratio), atol=1e-9):
        raise ValueError(...)
    return dt_min
```

363-371: Redundant keys in clustering_info dict.

Both `'clustering'` and `'clustering_results'` store the same `clustering_results` dict. If `'clustering'` is required for backwards compatibility with `_add_clustering_constraints`, consider documenting this more explicitly or refactoring the consumer to use a single key.

```diff
 clustered_fs._clustering_info = {
     'parameters': params,
-    'clustering': clustering_results,  # Required by _add_clustering_constraints
-    'clustering_results': clustering_results,  # Dict of Clustering objects per dimension
+    'clustering': clustering_results,  # Dict of Clustering objects per (period, scenario)
     'components_to_clusterize': components_to_clusterize,
     'original_fs': self._fs,
     'has_periods': self._fs.periods is not None,
     'has_scenarios': self._fs.scenarios is not None,
 }
```
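The `raise ValueError(...)` placeholders in the extracted-helper sketch above can be made concrete. A standalone version of the divisibility check (error messages are assumed, not flixopt's actual wording):

```python
import numpy as np


def validate_cluster_duration(cluster_duration_hours: float, timestep_hours: float) -> int:
    """Require the cluster duration to be an integer multiple of the timestep.

    Returns the number of timesteps per cluster.
    """
    ratio = cluster_duration_hours / timestep_hours
    # np.isclose with a tight tolerance avoids false negatives from
    # floating-point division (e.g. 24 / 0.25)
    if not np.isclose(ratio, round(ratio), atol=1e-9):
        raise ValueError(
            f'cluster_duration ({cluster_duration_hours} h) must be a '
            f'multiple of the timestep ({timestep_hours} h)'
        )
    return int(round(ratio))
```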
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)

- `CHANGELOG.md` (1 hunks)
- `docs/notebooks/08a-aggregation.ipynb` (19 hunks)
- `docs/notebooks/08b-rolling-horizon.ipynb` (2 hunks)
- `docs/notebooks/data/generate_example_systems.py` (4 hunks)
- `examples/03_Optimization_modes/example_optimization_modes.py` (2 hunks)
- `flixopt/clustering.py` (5 hunks)
- `flixopt/optimization.py` (4 hunks)
- `flixopt/transform_accessor.py` (5 hunks)
- `tests/deprecated/test_integration.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/transform_accessor.py (2)
flixopt/flow_system.py (4)
- `FlowSystem` (49-2225), `to_dataset` (581-633), `copy` (856-881), `sel` (2014-2040)

flixopt/core.py (2)

- `drop_constant_arrays` (564-594), `to_dataarray` (370-484)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md
62-62: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
93-93: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
🔇 Additional comments (22)
docs/notebooks/data/generate_example_systems.py (2)
236-348: LGTM! Well-structured helper function.

The `create_district_heating_system()` function is well-documented and follows consistent patterns with existing helper functions. The docstring clearly explains the purpose and usage.

351-448: LGTM! Operational system helper is well-designed.

The `create_operational_system()` function provides a clean example for operational planning without investments. Good separation of concerns between investment and operational optimization examples.

CHANGELOG.md (1)
54-112: LGTM! Comprehensive migration documentation.

The changelog clearly documents the clustering API improvements with:

- Clear before/after migration examples
- Parameter mapping table
- Usage examples for the new API

This will help users migrate smoothly from the old to the new clustering API.
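For scripted migrations, a parameter mapping table like the one in the changelog translates mechanically. A hypothetical helper (not part of flixopt) that renames old keyword arguments to the new names documented in this review:

```python
# Mapping of old ClusteringParameters keywords to their new names,
# as listed in the migration guide discussed above
RENAMES = {
    'nr_of_periods': 'n_clusters',
    'hours_per_period': 'cluster_duration',
    'fix_storage_flows': 'include_storage',
    'aggregate_data_and_fix_non_binary_vars': 'aggregate_data',
    'percentage_of_period_freedom': 'flexibility_percent',
    'penalty_of_period_freedom': 'flexibility_penalty',
}


def migrate_kwargs(old_kwargs: dict) -> dict:
    """Translate old keyword names to the new API; unknown keys pass through."""
    return {RENAMES.get(key, key): value for key, value in old_kwargs.items()}
```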
flixopt/optimization.py (4)
396-402: LGTM! Clear deprecation guidance with migration example.

The updated deprecation message provides a concrete example of the new API usage, making it easy for users to migrate from `ClusteredOptimization` to the new `flow_system.transform.cluster()` API.

444-449: LGTM! Consistent parameter naming.

The internal parameter name updates (`cluster_duration_hours` and error messages) align with the public API changes documented in the CHANGELOG.

462-463: LGTM! Clustering object initialization updated correctly.

The parameter names passed to the `Clustering` constructor have been updated to match the new API (`hours_per_period` and `nr_of_periods`), maintaining consistency with the parameter renaming.

472-472: LGTM! Simplified parameter naming.

The change from `aggregate_data_and_fix_non_binary_vars` to `aggregate_data` improves readability while maintaining the same functionality.

tests/deprecated/test_integration.py (1)
284-293: LGTM! Test correctly updated for new clustering API.

The `ClusteringParameters` initialization has been properly migrated to use the new parameter names:

- `n_clusters` (was `nr_of_periods`)
- `cluster_duration` (was `hours_per_period`)
- `include_storage` (was `fix_storage_flows`)
- `aggregate_data` (was `aggregate_data_and_fix_non_binary_vars`)
- `flexibility_percent` (was `percentage_of_period_freedom`)
- `flexibility_penalty` (was `penalty_of_period_freedom`)

This matches the migration guide in CHANGELOG.md exactly.

docs/notebooks/08b-rolling-horizon.ipynb (1)
docs/notebooks/08b-rolling-horizon.ipynb (1)
54-60: LGTM! Clear narrative update.

The markdown cell effectively communicates the shift to loading a pre-built FlowSystem, aligning with the PR's objective to adopt a FlowSystem-centric workflow.
examples/03_Optimization_modes/example_optimization_modes.py (2)
36-41: LGTM! Clear clustering configuration.

The new parameter names (`n_clusters`, `cluster_duration`, `include_storage`) are more intuitive than the previous `hours_per_period` and `nr_of_periods`.

192-204: Clean migration to the new transform.cluster() API.

The new API usage is clear and correctly passes the clustering configuration. The conditional peak time series assignment based on `keep_extreme_periods` is properly implemented.

docs/notebooks/08a-aggregation.ipynb (2)
69-83: Visualization code looks appropriate for the notebook.

The hardcoded component names ('HeatDemand', 'GridBuy') and timestep slice (672 = 7 days × 96 at 15-min resolution) are reasonable for a demonstration notebook where the example system is controlled.
360-386: Well-organized summary with clear learning outcomes.

The summary effectively captures the key takeaways and appropriately defers clustering details to the dedicated notebook (08c-clustering.ipynb).
flixopt/transform_accessor.py (2)
55-172: Well-designed public API with clear documentation.

The explicit parameter signature is more user-friendly than requiring a separate `ClusteringParameters` object. The docstring provides comprehensive examples covering basic clustering, segmentation, and multi-period scenarios.

174-243: Solid implementation with proper validation.

The timestep consistency check and cluster duration divisibility validation are correctly implemented. The use of `np.isclose` with a tight tolerance for the ratio check is appropriate for floating-point comparison.

flixopt/clustering.py (7)
89-121: Clean integration of segmentation support.

The logic correctly handles the case when `nr_of_periods` is `None` (segmentation-only mode) by falling back to `total_periods`. The tsam configuration properly sets `segmentation` and `noSegments` based on the `n_segments` parameter.
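That fallback and configuration logic can be sketched as follows. The keyword names follow what this comment describes for the tsam configuration; this is a simplified stand-in, not flixopt's actual code:

```python
def resolve_clustering_config(nr_of_periods, total_periods, n_segments):
    """Fall back to total_periods in segmentation-only mode and build the
    segmentation-related keyword arguments handed on to tsam."""
    config = {
        # nr_of_periods=None means segmentation-only: keep all periods
        'noTypicalPeriods': total_periods if nr_of_periods is None else nr_of_periods,
    }
    if n_segments is not None:
        config['segmentation'] = True
        config['noSegments'] = n_segments
    else:
        config['segmentation'] = False
    return config
```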
285-325: Well-implemented segment equation index generation.

The early return on lines 300-301 when `n_segments is None` correctly guards against accessing `segmentDurationDict` when segmentation is disabled. The loop logic properly generates index pairs to equate all timesteps within a segment to the first timestep.
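The index-pair generation described here can be sketched in plain Python (segment durations given in timesteps; illustrative, not the actual implementation):

```python
def segment_equation_indices(segment_durations):
    """Return (first, other) index pairs that equate every timestep in a
    segment to that segment's first timestep."""
    pairs = []
    start = 0
    for duration in segment_durations:
        # All timesteps after the first in this segment are tied to the first
        for offset in range(1, duration):
            pairs.append((start, start + offset))
        start += duration
    return pairs
```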
328-353: Clean utility function for duration parsing.

Using `pd.Timedelta` for parsing duration strings is appropriate and handles standard pandas duration formats correctly.
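A sketch of the assumed conversion, relying on `pd.Timedelta` as described (the function name and numeric fallback are illustrative, not the actual utility):

```python
import pandas as pd


def duration_to_hours(duration) -> float:
    """Convert a pandas-style duration string (or plain number of hours) to hours."""
    if isinstance(duration, (int, float)):
        return float(duration)
    return pd.Timedelta(duration).total_seconds() / 3600.0
```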
356-454: Well-documented parameter class with clear API.

The redesigned `ClusteringParameters` class provides a cleaner interface with intuitive parameter names. The convenience properties (`use_extreme_periods`, `use_segmentation`) improve code readability.
495-562: Efficient constraint generation with proper batching.

Batching continuous variables into a single constraint (lines 545-548) is a good optimization. Binary variables correctly get individual constraints to support the flexibility correction variables.
653-667: Correct penalty application for flexibility deviations.

The method properly sums correction variables over the appropriate dimension (`eq_idx` for equality constraints or `time` as fallback) and adds the penalty to the objective via `PENALTY_EFFECT_LABEL`.
623-628: The `self._model.variables.binaries` attribute is a valid linopy API feature and requires no changes.

The `.binaries` property is a documented feature of linopy's Variables class, so the membership check on line 625 is correct and aligns with linopy's standard usage patterns.
```diff
     }
   },
-  "source": "# Load time series data (15-min resolution)\ndata = pd.read_csv('../../examples/resources/Zeitreihen2020.csv', index_col=0, parse_dates=True).sort_index()\ndata = data['2020-01-01':'2020-01-14 23:45:00'] # Two weeks\ndata.index.name = 'time' # Rename index for consistency\n\ntimesteps = data.index\n\n# Extract profiles\nelectricity_demand = data['P_Netz/MW'].to_numpy()\nheat_demand = data['Q_Netz/MW'].to_numpy()\nelectricity_price = data['Strompr.€/MWh'].to_numpy()\ngas_price = data['Gaspr.€/MWh'].to_numpy()\n\nprint(f'Timesteps: {len(timesteps)} ({len(timesteps) / 96:.0f} days at 15-min resolution)')\nprint(f'Heat demand: {heat_demand.min():.1f} - {heat_demand.max():.1f} MW')\nprint(f'Electricity price: {electricity_price.min():.1f} - {electricity_price.max():.1f} €/MWh')",
+  "source": "from pathlib import Path\n\n# Generate example data if not present (for local development)\ndata_file = Path('data/operational_system.nc4')\nif not data_file.exists():\n    from data.generate_example_systems import create_operational_system\n\n    fs = create_operational_system()\n    fs.optimize(fx.solvers.HighsSolver(log_to_console=False))\n    fs.to_netcdf(data_file, overwrite=True)\n\n# Load the operational system (real data from Zeitreihen2020.csv, two weeks)\nflow_system = fx.FlowSystem.from_netcdf(data_file)\n\ntimesteps = flow_system.timesteps\nprint(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')\nprint(f'Components: {list(flow_system.components.keys())}')",
```
Ensure parent directory exists before writing NetCDF.
If the data/ directory doesn't exist, fs.to_netcdf(data_file, overwrite=True) will fail with a file system error. Additionally, there's no error handling for file I/O operations or optimization failures, which could lead to confusing errors for users.
Apply this diff to create the directory and add basic error handling:

```diff
 from pathlib import Path
 
 # Generate example data if not present (for local development)
 data_file = Path('data/operational_system.nc4')
 if not data_file.exists():
+    # Ensure parent directory exists
+    data_file.parent.mkdir(parents=True, exist_ok=True)
+
     from data.generate_example_systems import create_operational_system
 
+    print("Generating example data (this may take a moment)...")
     fs = create_operational_system()
     fs.optimize(fx.solvers.HighsSolver(log_to_console=False))
     fs.to_netcdf(data_file, overwrite=True)
+    print(f"Example data saved to {data_file}")
 
-# Load the operational system (real data from Zeitreihen2020.csv, two weeks)
+# Load the operational system
 flow_system = fx.FlowSystem.from_netcdf(data_file)
 
 timesteps = flow_system.timesteps
 print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 96:.0f} days at 15-min resolution)')
 print(f'Components: {list(flow_system.components.keys())}')
```

Note: The comment mentioning "real data from Zeitreihen2020.csv" is misleading since the code may generate synthetic data if the file doesn't exist.
```python
class ClusteredResult:
    def __init__(self, name, fs):
        self.name = name
        self.flow_system = fs
        self.durations = {'total': 0}  # Placeholder


optimization = ClusteredResult('Clustered', clustered_fs)
optimizations.append(optimization)
```
The durations placeholder will show misleading 0 seconds in the comparison chart.
Line 250 uses calc.durations for the speed comparison. This placeholder returns {'total': 0}, making the clustered optimization appear to take no time.
Consider capturing the actual duration:

```diff
+import timeit
+
+start = timeit.default_timer()
 clustered_fs = flow_system.copy().transform.cluster(
     n_clusters=n_clusters,
     cluster_duration=cluster_duration,
     include_storage=include_storage,
     time_series_for_high_peaks=time_series_for_high_peaks,
     time_series_for_low_peaks=time_series_for_low_peaks,
 )
 clustered_fs.optimize(fx.solvers.HighsSolver(0.01 / 100, 60))
+elapsed = timeit.default_timer() - start
 
 # Wrap in a simple object for compatibility with comparison code
 class ClusteredResult:
     def __init__(self, name, fs):
         self.name = name
         self.flow_system = fs
-        self.durations = {'total': 0}  # Placeholder
+        self.durations = {'total': elapsed}
 
 optimization = ClusteredResult('Clustered', clustered_fs)
```

Committable suggestion skipped: line range outside the PR's diff.
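A generic timing wrapper along the lines of the suggestion above, using only the standard library (the name `timed` is illustrative, not an existing helper):

```python
import timeit


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = timeit.default_timer()
    result = fn(*args, **kwargs)
    return result, timeit.default_timer() - start
```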
Description
Brief description of the changes in this PR.
Type of Change
Related Issues
Closes #(issue number)
Testing
Checklist
Summary by CodeRabbit
New Features
Breaking Changes
- `hours_per_period` → `cluster_duration`, `nr_of_periods` → `n_clusters`, `fix_storage_flows` → `include_storage`, and additional semantic updates.

Documentation