Skip to content

Add phase 1 study plumbing#4386

Merged
pcnudde merged 16 commits intoNVIDIA:mainfrom
pcnudde:feature/multistudy-phase1-pr
Apr 2, 2026
Merged

Add phase 1 study plumbing#4386
pcnudde merged 16 commits intoNVIDIA:mainfrom
pcnudde:feature/multistudy-phase1-pr

Conversation

@pcnudde
Copy link
Copy Markdown
Collaborator

@pcnudde pcnudde commented Mar 31, 2026

Summary

  • add phase 1 study session plumbing across Flare API, ProdEnv, server job handling, and the admin CLI launch path
  • scope list_jobs by session study, persist study on submit, preserve the source study on clone, and normalize legacy jobs without study metadata to default
  • split the design docs into phase 1 and phase 2 documents and add focused unit/integration coverage, including admin terminal and legacy-client compatibility paths

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR implements Phase 1 of multi-study support for NVFlare — study metadata plumbing from user-facing APIs through to job persistence and runtime launchers, with full backward compatibility and no access-control changes. The study is established once at session login time, propagated via the signed session token, and used server-side to scope list_jobs and stamp submit_job/clone_job; legacy jobs without a study field are normalised to "default" transparently.

Key changes:

  • JobMetaKey.STUDY + DEFAULT_STUDY constant + get_job_meta_study() normalisation helper in job_def.py
  • Study regex validation (^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$) added to format_check.py; validated client-side and again server-side on login
  • study param threaded through Session, AdminAPI, AdminClient, ProdEnv, and new_session/new_secure_session/new_insecure_session factory functions
  • Study sent in cert-login headers; server's LoginModule validates and stores it in the Session object; session token carries it as key \"t\" for round-trip persistence
  • job_cmds.py stamps submitted jobs with the session's active study; clone_job preserves the source job's study (not the session's); list_jobs filters by session study; display methods normalise legacy jobs
  • fl_admin.sh updated to pass \"$@\" through, enabling --study flag; admin.py exits with code 1 for invalid study
  • Phase 1 and Phase 2 design documents split out clearly; good unit and integration test coverage across all new paths

One minor observation worth addressing: when the server rejects a login due to an invalid study name, the rejection is silent on the server side (no log) and the client receives the generic \"Incorrect user name or password\" error, which will mislead operators who typo'd their --study argument.

Confidence Score: 5/5

Safe to merge — implementation is complete, well-tested, and fully backward compatible; only a minor observability gap remains.

All remaining findings are P2 (observability/style). No P0 or P1 issues were found. The study validation is applied at both client and server. Backward compatibility is preserved for legacy jobs and legacy session tokens. The previously flagged issues (sys.exit(1) on invalid study, props-merge in _do_default) are both resolved. Test coverage is thorough across unit, integration, and admin-shell paths.

nvflare/fuel/hci/server/login.py — the invalid-study rejection path should add a server-side warning log and ideally a more descriptive client-facing error.

Important Files Changed

Filename Overview
nvflare/apis/job_def.py Adds DEFAULT_STUDY constant, JobMetaKey.STUDY enum value, and get_job_meta_study() helper that normalises missing/empty study to "default" for legacy jobs. Clean and correct.
nvflare/apis/utils/format_check.py Adds "study" regex pattern matching DNS label semantics (lowercase alphanumeric + hyphens, 1–63 chars, must start/end with alnum). Regex is correct and tests confirm edge cases.
nvflare/fuel/hci/server/sess.py Session object gains active_study field; token round-trip uses key "t" following the existing single-letter encoding convention. Backward compat via get("t", DEFAULT_STUDY). Clean implementation.
nvflare/fuel/hci/server/login.py Reads study from login headers, validates it server-side, and stores it on the created session. Correctly defaults to DEFAULT_STUDY. One P2 issue: rejection for invalid study produces a misleading "Incorrect user name or password" client error and no server-side log.
nvflare/private/fed/server/job_cmds.py Study plumbing for submit (stamped from session), clone (preserves source study via get_job_meta_study), list (filtered by session study), and get_job_meta (normalised for display). STUDY added to CLONED_META_KEYS; explicit assignment also handles legacy-job normalisation. Correct.
nvflare/fuel/flare_api/flare_api.py Session gains study param; validated client-side via name_check before reaching the server. study forwarded to AdminAPI for login-time session binding. new_session/new_secure_session/new_insecure_session all accept and propagate study. Clean.
nvflare/fuel/hci/client/api.py AdminAPI stores study and includes it in _user_login cert-login headers. _do_client_command updated to accept and apply props to CommandContext. Correct; only login path in the codebase, so all session types send the study header.
nvflare/fuel/hci/client/cli.py AdminClient receives study at construction and forwards it to AdminAPI. _do_default passes props from parse_command_line unchanged to do_command — no per-command study injection; study is session-scoped at login. Verified by test_list_jobs_preserves_existing_cmd_props.
nvflare/fuel/hci/tools/admin.py Adds --study CLI arg with validation; exits with code 1 on invalid study (addressing prior review thread). Passes study through to AdminClient. master_template.yml updated to forward "$@" so --study reaches the Python entry point.
nvflare/lighter/templates/master_template.yml fl_admin.sh now passes "$@" to the Python entry point so extra flags like --study reach the admin main() without modification. Simple one-line change.
nvflare/recipe/prod_env.py ProdEnv gains study param; Pydantic validator validates it after startup_kit_location check. study is stored and forwarded to SessionManager params. Clean integration with the recipe API.
nvflare/fuel/hci/server/constants.py Adds ConnProps.ACTIVE_STUDY constant "_activeStudy" for storing the active study on the connection after authentication. Follows existing naming conventions.
tests/integration_test/study_session_test.py Comprehensive end-to-end integration tests covering: default study tagging, legacy job normalisation, session scoping for list/submit, clone preserving source study, fl_admin.sh --study flag, and ProdEnv study propagation to runtime meta.

Sequence Diagram

sequenceDiagram
    participant User as Admin/API User
    participant Client as AdminAPI/AdminClient
    participant Server as LoginModule (Server)
    participant Session as SessionManager
    participant JobCmds as JobCommandModule

    User->>Client: new_secure_session(study="cancer-research")
    Client->>Client: validate study (name_check)
    Client->>Server: _cert_login headers={study: "cancer-research", ...}
    Server->>Server: validate study (name_check)
    Server->>Session: create_session(active_study="cancer-research")
    Session-->>Server: session w/ token (encodes study as "t")
    Server-->>Client: LOGIN OK + token
    Note over Client,Server: Session now scoped to "cancer-research"

    User->>Client: submit_job("/path/to/job")
    Client->>Server: submit_job command (uses existing token)
    Server->>Server: read active_study from conn props
    Server->>JobCmds: meta["study"] = "cancer-research"
    JobCmds-->>Client: job_id

    User->>Client: list_jobs()
    Client->>Server: list_jobs command
    Server->>Server: requested_study = conn.ACTIVE_STUDY ("cancer-research")
    Server->>JobCmds: filter: get_job_meta_study(job) == "cancer-research"
    JobCmds-->>Client: only "cancer-research" jobs

    Note over Server,JobCmds: Legacy job (no study field) → get_job_meta_study → "default"
    Note over Server,Session: Token decode: user.get("t", DEFAULT_STUDY) for backward compat
Loading

Reviews (12): Last reviewed commit: "Merge branch 'main' into feature/multist..." | Re-trigger Greptile

Comment thread nvflare/fuel/hci/tools/admin.py
Comment thread nvflare/fuel/hci/client/cli.py Outdated
@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 1, 2026

/build

@pcnudde pcnudde requested a review from chesterxgchen April 1, 2026 16:29
@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 1, 2026

/build

Copy link
Copy Markdown
Collaborator

@chesterxgchen chesterxgchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall,

but I did not see Job API changes, I thought by specify study, the deploy_map will be generated assign the app with associated study to specified sites, this might trigger some Job API change to reflect this requirements

@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 1, 2026

but I did not see Job API changes, I thought by specify study, the deploy_map will be generated assign the app with associated study to specified sites, this might trigger some Job API change to reflect this requirements

I was going to look into this in phase2. For this it assumes the same sites are part of all studies.

Comment thread docs/design/multistudy_phase1.md
Comment thread docs/design/multistudy_phase1.md
Comment thread docs/design/multistudy_phase1.md
Comment thread nvflare/private/fed/server/job_cmds.py
@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 1, 2026

/build

@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 1, 2026

/build

Comment thread docs/design/multistudy_phase2.md
Comment thread docs/design/multistudy_phase1.md
Comment thread docs/design/multistudy_phase1.md
@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

1 similar comment
@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

@pcnudde pcnudde enabled auto-merge (squash) April 2, 2026 17:08
@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

1 similar comment
@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

Copy link
Copy Markdown
Collaborator

@YuanTingHsieh YuanTingHsieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes LGTM

@pcnudde
Copy link
Copy Markdown
Collaborator Author

pcnudde commented Apr 2, 2026

/build

@pcnudde pcnudde merged commit 35b3a2b into NVIDIA:main Apr 2, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants