Fix utilsforecast evaluation compatibility #282
Conversation
Handle missing cutoff/id_col in evaluation output
Make cross-validation robust to utilsforecast schema changes
Add cutoff_col to mase and forward it to mae for correct evaluation
AzulGarza left a comment
thanks @Kushagra7777! since the pr introduced the new cutoff_col, we need to update the tests as well.
AzulGarza left a comment
thanks, @Kushagra7777! it seems the tests are working now, one remaining thing is fixing lint and styling checks.

i'm also adding copilot as reviewer to have its comments.
Pull request overview
This pull request makes the experiment handler robust to schema changes in the utilsforecast library, so that evaluation and cross-validation keep working when the utilsforecast output schema varies (e.g., missing cutoff or id_col columns).
Key changes:
- Introduced a `_zero_to_nan_pd` helper function to replace reliance on utilsforecast's `_zero_to_nan`
- Updated the `mase` function to defensively handle optional cutoff columns in evaluation output
- Added defensive column filtering in `evaluate_forecast_df` to handle missing cutoff/id_cutoff columns
```python
if "cutoff" not in eval_df.columns:
    if "id_cutoff" in eval_df.columns:
        eval_df = eval_df.merge(cutoffs, on="id_cutoff", how="left")
    else:
        pass

cols = ["unique_id", "cutoff", "metric"] + models
cols = [c for c in cols if c in eval_df.columns]
eval_df = eval_df[cols]
```
Copilot AI (Jan 8, 2026)
The new defensive logic for handling missing cutoff/id_cutoff columns (lines 271-275) and the column filtering (lines 277-279) lack test coverage. Consider adding test cases that verify the behavior when utilsforecast's evaluate() function returns DataFrames with different schema variations (e.g., missing cutoff column, missing id_cutoff column, or both).
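A hedged sketch of such a test, exercising the defensive filtering inline on synthetic DataFrames rather than importing the real evaluate_forecast_df (the column names and model list below are assumptions based on the diff above):

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "drop_cols",
    [["cutoff"], ["id_cutoff"], ["cutoff", "id_cutoff"]],
)
def test_defensive_column_filtering(drop_cols):
    # Simulate evaluate() outputs whose schema varies across
    # utilsforecast versions by dropping columns up front.
    eval_df = pd.DataFrame(
        {
            "unique_id": ["a", "a"],
            "cutoff": pd.to_datetime(["2024-01-01", "2024-01-02"]),
            "id_cutoff": ["a-0", "a-1"],
            "metric": ["mae", "mase"],
            "model": [1.0, 2.0],
        }
    ).drop(columns=drop_cols)
    models = ["model"]
    # The defensive filter from the diff: keep only columns that exist.
    cols = ["unique_id", "cutoff", "metric"] + models
    cols = [c for c in cols if c in eval_df.columns]
    out = eval_df[cols]
    # Columns that survived the schema change are kept, and the
    # missing ones never trigger a KeyError.
    assert {"unique_id", "metric", "model"} <= set(out.columns)
```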
```python
def _zero_to_nan_pd(s: pd.Series) -> pd.Series:
    s = s.astype(float).copy()
    s[s == 0] = np.nan
    return s
```
Copilot AI (Jan 8, 2026)
The new _zero_to_nan_pd helper function lacks direct test coverage. While it may be indirectly tested through the mase function, consider adding a direct unit test to verify its behavior, especially for edge cases like empty series, all-zero series, and series with mixed zero and non-zero values.
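A direct test along those lines could look like the following; the helper is restated from the diff above so the snippet runs standalone (in the repo it would import the real _zero_to_nan_pd instead):

```python
import numpy as np
import pandas as pd


def _zero_to_nan_pd(s: pd.Series) -> pd.Series:
    s = s.astype(float).copy()
    s[s == 0] = np.nan
    return s


def test_zero_to_nan_pd_edge_cases():
    # Empty series: no error, result stays empty.
    assert _zero_to_nan_pd(pd.Series([], dtype=float)).empty
    # All-zero series: every value becomes NaN.
    assert _zero_to_nan_pd(pd.Series([0, 0, 0])).isna().all()
    # Mixed series: only zeros are replaced; other values survive.
    out = _zero_to_nan_pd(pd.Series([0, 1.5, 0, -2]))
    assert out.isna().tolist() == [True, False, True, False]
    assert out.dropna().tolist() == [1.5, -2.0]
```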
```python
seasonality: int,
train_df: pd.DataFrame,
id_col: str = "unique_id",
time_col: str = "ds",
```
Copilot AI (Jan 8, 2026)
The time_col parameter is added to the function signature but is never used in the function body. Consider removing this parameter if it's not needed, or if it was added for future use, document its intended purpose.
We should work on adding the option to set a time column to timecopilot in general, but as suggested here, I'm not sure it should be added as an argument until the functionality behind the argument is also added.
The laziest way of doing this would be to add a time_col arg to calls like forecast() and then just rename the specified column in the dataframe to "ds".
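A minimal sketch of that rename-based idea, using a hypothetical _normalize_time_col helper (the name and placement are assumptions, not the actual timecopilot API):

```python
import pandas as pd


def _normalize_time_col(df: pd.DataFrame, time_col: str = "ds") -> pd.DataFrame:
    # Hypothetical helper: rename the user-specified time column to the
    # "ds" name the rest of the pipeline assumes; "ds" inputs pass through.
    if time_col != "ds":
        df = df.rename(columns={time_col: "ds"})
    return df


# Usage: a public forecast(df, h, time_col=...) entry point could call this
# first, so internals can keep assuming "ds" unchanged.
df = pd.DataFrame(
    {"timestamp": pd.date_range("2024-01-01", periods=3), "y": [1.0, 2.0, 3.0]}
)
assert "ds" in _normalize_time_col(df, time_col="timestamp").columns
```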
Update live tests to query using the actual unique_id from generated data instead of a hardcoded value. This avoids empty series selection and prevents flaky failures when checking queryability.
as suggested by copilot Co-authored-by: Copilot <[email protected]>
as suggested by copilot Co-authored-by: Copilot <[email protected]>
Remove series-specific query that caused invalid dataframe Keep test focused on queryable state only
Disable anomaly detection in test_is_queryable
This reverts commit d0829f7.
Why switch to _zero_to_nan_pd? It appears to achieve the same purpose as the imported function it's replacing. (See timecopilot/timecopilot/utils/experiment_handler.py, lines 28 to 31 in 1bad91c.)
The .copy() after the astype() call shouldn't be necessary; by default, astype() makes a copy.
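For illustration, a sketch of the helper with the redundant call dropped; astype(float) returns a new object by default (copy=True), so mutating the result cannot affect the caller's series:

```python
import numpy as np
import pandas as pd


def _zero_to_nan_pd(s: pd.Series) -> pd.Series:
    s = s.astype(float)  # astype copies by default, so no explicit .copy() needed
    s[s == 0] = np.nan
    return s
```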
Replaced the imported _zero_to_nan with a pandas-based implementation to avoid division-by-zero and index-alignment issues. Fixed the unnecessary .copy() usage. Removed the unused time_col: str = "ds" argument. Reverted the earlier notebook changes, which were unrelated and only attempted to address lint and styling errors, and also did the final lint fix.
Applied ruff-format changes to satisfy pre-commit checks. No functional changes.
@spolisar and about copy(), i am fixing it. removing this line as well --> time_col: str = "ds".
That makes sense, seems good overall.
```python
seasonality: int,
train_df: pd.DataFrame,
id_col: str = "unique_id",
time_col: str = "ds",
```
@Kushagra7777 should we use time_col inside the function? in mae?
spolisar left a comment
Two things I noticed when taking a look at the recent changes. First, replacing an internal function with an import means we might want to change an import in a test file that used the internal function. Second, my memory might be off, but from what I remember some of the lines removed in 726da08 were used to resolve the missing cutoff col issue.
Was there a chunk of code that was used to consistently recreate this problem for testing?
thanks for the review @spolisar! i think the problem appeared when we tried to maintain two versions of utilsforecast, but now that we are forcing the latest one, we are only worried about the cutoff col arg.
Sounds good then.
spolisar left a comment
Sounds like the issue is resolved and changes/questions have been addressed. Seems good to me.