Ensemble fix: decouple ID extraction from drop operation by fatemetkl · Pull Request #124 · VectorInstitute/midst-toolkit

fatemetkl · 2026-02-04T17:17:31Z

PR Type

Fix

Short Description

The Berka dataset has two id columns, "account_id" and "trans_id".
transaction IDs should be extracted and passed to blending_attacker.predict. Therefore, we should first extract the main ID column, then remove all ID columns. Previously, the code was performing the extraction and drop operation just for the main ID column.

fatemetkl · 2026-02-04T17:59:56Z

examples/ensemble_attack/configs/experiment_config.yaml


 # Dataset specific information used for processing in this example
 data_processing_config:
+  midst_data_path: /projects/midst-experiments/all_tabddpms/ # Used to collect the data (input)


One step towards config modularization.

coderabbitai · 2026-02-04T18:17:35Z

📝 Walkthrough

Walkthrough

This pull request consolidates configuration restructuring with API and function signature updates. The midst_data_path configuration is moved from the top-level data_paths section into data_processing_config, along with new dataset metadata including split specifications and file naming configurations. The data collection call in run_attack.py is updated to reference the new config path and passes population_splits and challenge_splits parameters. The test_attack_model.py file refactors the ID extraction function—renamed from extract_and_drop_id_column to extract_the_main_id_column with a changed return type from tuple to Series—and updates all usage sites accordingly. Additionally, the dataset partitioning for shadow model training is adjusted by changing split_folders from ["test"] to ["final"]. The TprAtFpr API parameter names are updated across two files, replacing max_fpr with fpr_threshold and predictions with predicted_membership.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The description covers the PR type (Fix) and provides a clear short description explaining the rationale (two ID columns in Berka dataset, need to extract main ID before removing all IDs), but the template requires a Tests Added section which is missing.	Add a Tests Added section describing what tests were added or modified to validate the ID extraction and drop decoupling changes.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: decoupling ID extraction from the drop operation, which aligns with the primary objective of handling multiple ID columns separately.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ft/ensemble_exp_rmia_abl

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/ensemble_attack/test_attack_model.py (1)

55-84: ⚠️ Potential issue | 🟡 Minor

Docstring describes wrong return type.

The docstring (lines 69-72) still describes the old return type as a tuple, but the function now returns only a pd.Series. This inconsistency could confuse future maintainers.

📝 Proposed fix for the docstring

     """
-    Extracts and returns the main IDs from the dataframe. The main ID column is identified based on
-    the data types JSON file with "id_column_name" key.
-    Main IDs are not repeated in the dataset.
-    For example, in the Berka dataset, "trans_id" is the main ID column, and "account_id" is not the main ID column.
+    Extracts and returns the main ID column values from the dataframe. The main ID column is
+    identified based on the data types JSON file with "id_column_name" key.
+    Main IDs are not repeated in the dataset.
+    For example, in the Berka dataset, "trans_id" is the main ID column, and "account_id" is not
+    the main ID column.
 
     Args:
         data_frame: Input dataframe.
         data_types_file_path: Path to the data types JSON file.
 
     Returns:
-        A tuple containing:
-            - The modified dataframe with ID columns dropped.
-            - A Series containing the extracted data of ID columns.
+        A Series containing the main ID column values.
     """

emersodb · 2026-02-04T18:49:21Z

examples/ensemble_attack/test_attack_model.py



-def extract_and_drop_id_column(
+def extract_the_main_id_column(


Super minor, but maybe "primary_id" instead of "main_id"? Primary keys are the unique keys in databases. So it might be more suggestive to a reader 🙂

Yeah, I knew there was a better name, but I couldn't remember at the time lol. Thanks!

sarakodeiri

Looks good to me!

sarakodeiri · 2026-02-04T19:24:22Z

examples/ensemble_attack/test_attack_model.py



-def extract_and_drop_id_column(
+def extract_the_main_id_column(


Suuuper nit-picky, but can we rename the function name to extract_main_id_column (drop the "the")? Sounds better

Good point!

fatemetkl added 5 commits February 4, 2026 11:22

error fix: seperated extraction and drop id

43b97f1

minor fixes

a67a0cf

deleted extra config

0ed74e9

midst_data_path goes under data_processing_config

007e33b

fixed a unit test

927eb47

fatemetkl commented Feb 4, 2026

View reviewed changes

fatemetkl changed the title ~~error fix: seperated extraction and drop id~~ Fix: Decouple ID extraction from drop operation Feb 4, 2026

fatemetkl changed the title ~~Fix: Decouple ID extraction from drop operation~~ Ensemble fix: Decouple ID extraction from drop operation Feb 4, 2026

fatemetkl changed the title ~~Ensemble fix: Decouple ID extraction from drop operation~~ Ensemble fix: decouple ID extraction from drop operation Feb 4, 2026

fatemetkl requested a review from sarakodeiri February 4, 2026 18:07

Merge branch 'main' into ft/ensemble_exp_rmia_abl

35fc199

fatemetkl marked this pull request as ready for review February 4, 2026 18:15

fatemetkl requested a review from emersodb February 4, 2026 18:17

coderabbitai bot reviewed Feb 4, 2026

View reviewed changes

codrabbitai comment: fix return type in docstring

697a592

emersodb approved these changes Feb 4, 2026

View reviewed changes

sarakodeiri reviewed Feb 4, 2026

View reviewed changes

sarakodeiri approved these changes Feb 4, 2026

View reviewed changes

A better name for the function

d2351cd

fatemetkl merged commit 65d1ac8 into main Feb 4, 2026
6 checks passed

fatemetkl deleted the ft/ensemble_exp_rmia_abl branch February 4, 2026 19:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensemble fix: decouple ID extraction from drop operation#124

Ensemble fix: decouple ID extraction from drop operation#124
fatemetkl merged 8 commits intomainfrom
ft/ensemble_exp_rmia_abl

fatemetkl commented Feb 4, 2026 •

edited

Loading

Uh oh!

fatemetkl Feb 4, 2026

Uh oh!

coderabbitai bot commented Feb 4, 2026

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

emersodb Feb 4, 2026

Uh oh!

fatemetkl Feb 4, 2026

Uh oh!

sarakodeiri left a comment

Uh oh!

sarakodeiri Feb 4, 2026

Uh oh!

fatemetkl Feb 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants



		def extract_and_drop_id_column(
		def extract_the_main_id_column(

Conversation

fatemetkl commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

Short Description

Uh oh!

fatemetkl Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Feb 4, 2026

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

emersodb Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

fatemetkl Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

sarakodeiri left a comment

Choose a reason for hiding this comment

Uh oh!

sarakodeiri Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

fatemetkl Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fatemetkl commented Feb 4, 2026 •

edited

Loading