feat(middleware): add HITLMiddleware for human-in-the-loop function interception#2060
feat(middleware): add HITLMiddleware for human-in-the-loop function interception#2060ericevans-nv wants to merge 7 commits into
Conversation
…example Address HITL blocking failure, intermittent JSON parse errors, and several edge cases that caused crashes or silent data corruption when Jira credentials were missing, extraction had not run, or configuration was invalid. Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
… example Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…oop function interception Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR adds Human-in-the-Loop (HITL) middleware: a HITLMiddleware base class with configurable pre/post prompts, a HITLMiddlewareConfig, an InvocationAction.SKIP control in InvocationContext, skip handling in function middleware (invoke/stream), tests, and documentation. ChangesHITL Middleware Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…eature/hitl-middleware
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.py (1)
373-373:⚠️ Potential issue | 🟠 Major | ⚡ Quick winToken captured at registration time, not runtime.
The
headersdict is created when the generator runs (tool registration), capturingJIRA_TOKENat that moment. The credential validation on line 386-388 runs at_aruncall time, but theheadersalready contain whatever value the token had during registration—potentially empty or stale.Move
headersconstruction inside_arunafter validation to ensure the current token is used:🐛 Proposed fix
`@register_function`(config_type=GetJiraToolConfig) async def get_jira_tickets_tool(config: GetJiraToolConfig, builder: Builder): - headers = {"Authorization": f"Bearer {os.getenv('JIRA_TOKEN')}", "Accept": "application/json"} - # JIRA API endpoint to fetch issues api_endpoint = f"{config.jira_domain}/rest/api/2/search" # Query parameters to fetch all issues from the project query_params = { "jql": f"project={config.jira_project_key}", "maxResults": 100, # Adjust as needed "fields": "summary,issuetype,priority,customfield_10016,description,epic", # Modify if needed } async def _arun(input_text: str) -> str: cred_error: str | None = _validate_jira_credentials(require_userid=False) if cred_error: return cred_error + headers = {"Authorization": f"Bearer {os.getenv('JIRA_TOKEN')}", "Accept": "application/json"} + async with httpx.AsyncClient() as client: response = await client.get(api_endpoint, headers=headers, params=query_params, timeout=30)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.py` at line 373, The headers dict is being built at registration time using os.getenv('JIRA_TOKEN') (captured in the module/generator) so it can be empty/stale at runtime; move construction of headers into the _arun method after the existing credential validation (the token check around lines 386-388) so it reads the current os.getenv('JIRA_TOKEN') value, remove or stop using the prebuilt headers variable, and ensure _arun uses the freshly created headers for subsequent HTTP calls (reference the _arun function and the headers variable/JIRA_TOKEN env var).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.py`:
- Around line 127-168: The HITLMiddleware defines function_middleware_invoke
which duplicates the orchestration already implemented by the parent
FunctionMiddleware.function_middleware_invoke; remove the override to avoid
duplicate code and rely on the parent to create the InvocationContext and call
pre_invoke and post_invoke (HITLMiddleware.pre_invoke and
HITLMiddleware.post_invoke will still be invoked by the parent). Delete the
function_middleware_invoke method from the HITLMiddleware class so invoke
behavior is delegated to FunctionMiddleware.function_middleware_invoke, ensuring
consistency with function_middleware_stream.
---
Outside diff comments:
In
`@examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.py`:
- Line 373: The headers dict is being built at registration time using
os.getenv('JIRA_TOKEN') (captured in the module/generator) so it can be
empty/stale at runtime; move construction of headers into the _arun method after
the existing credential validation (the token check around lines 386-388) so it
reads the current os.getenv('JIRA_TOKEN') value, remove or stop using the
prebuilt headers variable, and ensure _arun uses the freshly created headers for
subsequent HTTP calls (reference the _arun function and the headers
variable/JIRA_TOKEN env var).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 155fff64-6a68-4ed7-a465-2d53d09dfae4
📒 Files selected for processing (12)
docs/source/build-workflows/advanced/middleware.mdexamples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/configs/config.ymlexamples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/extract_por_tool.pyexamples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/hitl_approval_tool.pyexamples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.pypackages/nvidia_nat_core/src/nat/front_ends/console/console_front_end_plugin.pypackages/nvidia_nat_core/src/nat/middleware/function_middleware.pypackages/nvidia_nat_core/src/nat/middleware/hitl/__init__.pypackages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.pypackages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware_config.pypackages/nvidia_nat_core/src/nat/middleware/middleware.pypackages/nvidia_nat_core/tests/nat/middleware/test_hitl_middleware.py
…rride in HITLMiddleware Delegate single-output invocation to FunctionMiddleware.function_middleware_invoke, which already builds the InvocationContext and calls pre_invoke/post_invoke. This drops duplicated orchestration and keeps invoke behavior consistent with the inherited function_middleware_stream. Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
94b27f0 to
edb04fc
Compare
|
This is a useful direction. The part I would make explicit is that HITL middleware is not just prompt plumbing; it becomes an invocation-scoped execution boundary. For pre-invoke SKIP, I would keep the contract visible to the caller:
For post-invoke prompts, I would keep a separate record: the function ran, produced an output, and that output was then accepted, replaced, or cleared. That avoids mixing "the callable did not run" with "the callable ran and its result was rejected or transformed." The regression tests I would add are:
That keeps the middleware small while making the control boundary clear: which exact invocation was proposed, who decided, what changed, and whether the callable actually ran. |
The middleware contract is transparent interception, the caller gets back the function's effective result and should never need to know a middleware ran. Returning SKIPPED leaks the middleware's existence into the caller's API contract. It also changes the effective return type of the intercepted function. Any value other than the function's declared return type breaks the contract the caller already depends on. HITLMiddleware is intentionally minimal: it defines the interception boundary and the decision contract, nothing more. Observability, audit logging, or any behavior around the decision belongs in the concrete subclass, where implementers can customize it to their requirements without affecting callers or the abstract interface. |
|
I agree with that caller-contract boundary. The function caller should not have to handle a new sentinel value just because middleware intercepted the invocation, and the declared return type should stay stable. The distinction I would preserve is control-plane visibility, not necessarily caller-visible return shape. For pre-invoke skip, the function should not run, but the skip can be represented as middleware state or an invocation record rather than as the function's return value. That gives three separate facts:
That also keeps post-invoke prompts separate. If the function ran and the output was replaced or cleared, the record should say function_ran=true and preserve original versus effective output. If pre-invoke denied or skipped, function_ran=false. The tests I would add:
That keeps HITLMiddleware minimal while still making the execution boundary inspectable: what invocation was proposed, who decided, whether args changed, whether the callable ran, and what effective result continued. |
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Any additional tracking or observability is a concrete subclass concern — the abstract interface stays clean. A concrete implementation can override the pre/post hooks to record decisions, track state, or enforce any binding logic it needs. |
|
Agree that the abstract interface should stay clean. If observability is owned by concrete subclasses, the key contract is not a new return type or audit API; it is stable hook context. The subclass needs enough context in the pre and post hooks to bind its own record without changing what the caller sees. For pre-invoke, that means it can consistently see:
For post-invoke, it should be able to record that the callable did run, plus original versus effective output when the output is replaced or cleared. That keeps the split clean:
The tests I would add are:
That gives implementers room to build audit or enforcement behavior in subclasses while keeping HITLMiddleware minimal and transparent to function callers. |
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
|
Keeping the caller contract clean is non-negotiable, and audit/binding logic lives in concrete subclasses. To close the loop on what the abstract interface actually needs to expose, here's a concrete proposal for the hook context objects: For
For
The abstract class never writes a record itself — it just guarantees the subclass sees these fields. Concrete classes can ignore what they don't need. This also covers the stale-args concern: since |
|
This field shape is close. I would add one constraint to make the stale-args boundary enforceable: prompt_id should identify the prompt or decision, but it should not be the whole replay boundary. For pre-invoke, the hook context should carry both:
Then a concrete subclass can enforce that modified_args only applies when the live invocation correlation key and prompt or decision id both match. If either side changes, it should fail closed or require a new prompt cycle. That avoids the subtle case where a prompt response is valid in isolation but gets replayed onto a later call with the same prompt shape. For post-invoke, I like keeping function_ran explicit. I would also attach original_output and effective_output to the same invocation correlation key so replacement or clearing cannot be confused with a different call's result. The regression tests I would add:
That keeps the abstract class clean, while giving concrete subclasses enough structure to make stale decisions non-replayable. |
|
|
||
| #### Configuration | ||
|
|
||
| `HITLMiddlewareConfig` is the base configuration class. It already declares `name="hitl"`, which maps to `_type: hitl` in YAML. Subclass it with a different `name` only if you need a distinct `_type` value. It provides two prompt fields and inherits all function-targeting fields from `DynamicMiddlewareConfig`. |
There was a problem hiding this comment.
Could we tighten this contract before merging? Since HITLMiddleware is abstract and there is no built-in hitl middleware registered in nat.middleware.register, documenting HITLMiddlewareConfig as mapping to _type: hitl makes that YAML type look runnable when it is not. I think the cleaner contract is either to remove name="hitl" from the abstract base and require concrete subclasses to declare their own name, or to add a concrete registered middleware for _type: hitl if that type is intended to work out of the box.
| ```python | ||
| from nat.data_models.interactive import InteractionResponse | ||
| from nat.middleware.hitl import HITLMiddleware, HITLMiddlewareConfig | ||
| from nat.middleware.middleware import InvocationAction, InvocationContext |
There was a problem hiding this comment.
Should we route these new middleware-authoring symbols through nat.plugin_api, or explicitly mark this as a specialized/provisional implementation API? The public plugin API docs tell external plugin authors to import authoring contracts from nat.plugin_api, but this example introduces HITLMiddleware, HITLMiddlewareConfig, and InvocationAction from implementation modules. If this is intended as the supported extension path, I would expect the new symbols to be exported, pinned in the plugin API tests, and added to the public API docs.
| context=function_context) | ||
| ] | ||
|
|
||
| assert chunks == [None, None] |
There was a problem hiding this comment.
Could we make this streaming contract explicit and align the test name/docstring with the assertion? Right now the test says clearing context.output yields no chunks, but the asserted behavior is [None, None] because function_middleware_stream always yields ctx.output after post_invoke. If None is meant to suppress a chunk, the implementation should skip yielding it; if None is meant to be a valid replacement chunk, the test should say that clearly so stream consumers do not get surprised by None values.
mnajafian-nv
left a comment
There was a problem hiding this comment.
LGTM, with conditional approval pending resolution of the inline suggestion, Thanks!
|
I agree with making these merge blockers explicit. I would treat the three comments as one contract boundary:
For HITL, I would prefer the first contract. A reviewer or subclass author should be able to tell whether the human decision suppressed output, replaced output, or passed it through without inferring that from an ambiguous None in the stream. That keeps the merge boundary tight: runnable YAML types match registry behavior, the extension surface is either public or explicitly provisional, and stream consumers do not have to reverse-engineer human decision semantics from emitted chunks. |
Summary
HITLMiddleware, an abstractDynamicFunctionMiddlewaresubclass that presents configurableHumanPrompts before and after any intercepted function callInvocationAction.SKIPtoInvocationContextso pre-invoke can signal the framework to bypass the function call; checked in bothfunction_middleware_invokeandfunction_middleware_streamHITLMiddlewareConfigprovides optionalpre_invoke_prompt/post_invoke_promptfields and inherits all function-targeting fields fromDynamicMiddlewareConfigSummary by CodeRabbit
New Features
Documentation
Tests