Skip to content

feat(middleware): add HITLMiddleware for human-in-the-loop function interception#2060

Open
ericevans-nv wants to merge 7 commits into
NVIDIA:developfrom
ericevans-nv:feature/hitl-middleware
Open

feat(middleware): add HITLMiddleware for human-in-the-loop function interception#2060
ericevans-nv wants to merge 7 commits into
NVIDIA:developfrom
ericevans-nv:feature/hitl-middleware

Conversation

@ericevans-nv

@ericevans-nv ericevans-nv commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Introduces HITLMiddleware, an abstract DynamicFunctionMiddleware subclass that presents configurable HumanPrompts before and after any intercepted function call
  • Adds InvocationAction.SKIP to InvocationContext so pre-invoke can signal the framework to bypass the function call; checked in both function_middleware_invoke and function_middleware_stream
  • HITLMiddlewareConfig provides optional pre_invoke_prompt / post_invoke_prompt fields and inherits all function-targeting fields from DynamicMiddlewareConfig

Summary by CodeRabbit

  • New Features

    • Human-in-the-Loop (HITL) middleware that can prompt users before and/or after function calls.
    • Middleware can explicitly skip execution and control streaming behavior; post-invoke decisions are applied per streamed chunk.
  • Documentation

    • Added a comprehensive guide for implementing, configuring, and registering HITL middleware (including YAML usage and examples).
  • Tests

    • Added tests covering prompt presentation, skip/proceed behavior, and streaming/post-invoke semantics.

…example

Address HITL blocking failure, intermittent JSON parse errors, and several
edge cases that caused crashes or silent data corruption when Jira credentials
were missing, extraction had not run, or configuration was invalid.

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
… example

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…oop function interception

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
@ericevans-nv ericevans-nv requested a review from a team as a code owner June 11, 2026 21:37
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 225a3a6b-1886-46b3-9dbf-fa3fb02f8f00

📥 Commits

Reviewing files that changed from the base of the PR and between 9a6ae02 and 35a3586.

📒 Files selected for processing (1)
  • packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.py

Walkthrough

This PR adds Human-in-the-Loop (HITL) middleware: a HITLMiddleware base class with configurable pre/post prompts, a HITLMiddlewareConfig, an InvocationAction.SKIP control in InvocationContext, skip handling in function middleware (invoke/stream), tests, and documentation.

Changes

HITL Middleware Feature

Layer / File(s) Summary
Invocation control mechanism
packages/nvidia_nat_core/src/nat/middleware/middleware.py
InvocationAction enum with SKIP added; InvocationContext extended with optional action field.
Skip action in existing middleware
packages/nvidia_nat_core/src/nat/middleware/function_middleware.py
Adds import and checks after pre_invoke in both single-invocation and streaming paths to return early when ctx.action == InvocationAction.SKIP.
HITL package entrypoint
packages/nvidia_nat_core/src/nat/middleware/hitl/__init__.py
Module-level docstring and exports exposing HITLMiddleware and HITLMiddlewareConfig.
HITLMiddlewareConfig
packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware_config.py
Defines HITLMiddlewareConfig with optional pre_invoke_prompt and post_invoke_prompt fields (`HumanPrompt
HITLMiddleware class and orchestration
packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.py
Adds HITLMiddleware extending DynamicFunctionMiddleware with abstract _on_pre_invoke_response/_on_post_invoke_response; pre_invoke/post_invoke conditionally prompt via Context.get().user_interaction_manager.prompt_user_input and delegate responses to hooks.
HITL middleware test suite
packages/nvidia_nat_core/tests/nat/middleware/test_hitl_middleware.py
Pytest suite with _StubHITLMiddleware covering prompt presentation, response forwarding, SKIP/proceed flows for invoke/stream, per-chunk post_invoke, and no-op behavior when prompts are absent.
HITL middleware user documentation
docs/source/build-workflows/advanced/middleware.md
New section documenting HITLMiddleware hooks, configuration, YAML _type usage, subclass/registration example, and single vs streaming invocation semantics (including SKIP and per-chunk post-invoke behavior).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 67.74% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title clearly and accurately describes the main feature being added (HITLMiddleware for human-in-the-loop function interception) using imperative mood.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@ericevans-nv ericevans-nv changed the title feat(middleware): add HITLMiddleware abstract base for human-in-the-loop function interception feat(middleware): add HITLMiddleware for human-in-the-loop function interception Jun 11, 2026
@ericevans-nv ericevans-nv self-assigned this Jun 11, 2026
@ericevans-nv ericevans-nv added feature request New feature or request non-breaking Non-breaking change labels Jun 11, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.py (1)

373-373: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Token captured at registration time, not runtime.

The headers dict is created when the generator runs (tool registration), capturing JIRA_TOKEN at that moment. The credential validation on line 386-388 runs at _arun call time, but the headers already contain whatever value the token had during registration—potentially empty or stale.

Move headers construction inside _arun after validation to ensure the current token is used:

🐛 Proposed fix
 `@register_function`(config_type=GetJiraToolConfig)
 async def get_jira_tickets_tool(config: GetJiraToolConfig, builder: Builder):

-    headers = {"Authorization": f"Bearer {os.getenv('JIRA_TOKEN')}", "Accept": "application/json"}
-
     # JIRA API endpoint to fetch issues
     api_endpoint = f"{config.jira_domain}/rest/api/2/search"

     # Query parameters to fetch all issues from the project
     query_params = {
         "jql": f"project={config.jira_project_key}",
         "maxResults": 100,  # Adjust as needed
         "fields": "summary,issuetype,priority,customfield_10016,description,epic",  # Modify if needed
     }

     async def _arun(input_text: str) -> str:
         cred_error: str | None = _validate_jira_credentials(require_userid=False)
         if cred_error:
             return cred_error

+        headers = {"Authorization": f"Bearer {os.getenv('JIRA_TOKEN')}", "Accept": "application/json"}
+
         async with httpx.AsyncClient() as client:
             response = await client.get(api_endpoint, headers=headers, params=query_params, timeout=30)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.py`
at line 373, The headers dict is being built at registration time using
os.getenv('JIRA_TOKEN') (captured in the module/generator) so it can be
empty/stale at runtime; move construction of headers into the _arun method after
the existing credential validation (the token check around lines 386-388) so it
reads the current os.getenv('JIRA_TOKEN') value, remove or stop using the
prebuilt headers variable, and ensure _arun uses the freshly created headers for
subsequent HTTP calls (reference the _arun function and the headers
variable/JIRA_TOKEN env var).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.py`:
- Around line 127-168: The HITLMiddleware defines function_middleware_invoke
which duplicates the orchestration already implemented by the parent
FunctionMiddleware.function_middleware_invoke; remove the override to avoid
duplicate code and rely on the parent to create the InvocationContext and call
pre_invoke and post_invoke (HITLMiddleware.pre_invoke and
HITLMiddleware.post_invoke will still be invoked by the parent). Delete the
function_middleware_invoke method from the HITLMiddleware class so invoke
behavior is delegated to FunctionMiddleware.function_middleware_invoke, ensuring
consistency with function_middleware_stream.

---

Outside diff comments:
In
`@examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.py`:
- Line 373: The headers dict is being built at registration time using
os.getenv('JIRA_TOKEN') (captured in the module/generator) so it can be
empty/stale at runtime; move construction of headers into the _arun method after
the existing credential validation (the token check around lines 386-388) so it
reads the current os.getenv('JIRA_TOKEN') value, remove or stop using the
prebuilt headers variable, and ensure _arun uses the freshly created headers for
subsequent HTTP calls (reference the _arun function and the headers
variable/JIRA_TOKEN env var).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 155fff64-6a68-4ed7-a465-2d53d09dfae4

📥 Commits

Reviewing files that changed from the base of the PR and between 7bc2deb and ce7824f.

📒 Files selected for processing (12)
  • docs/source/build-workflows/advanced/middleware.md
  • examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/configs/config.yml
  • examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/extract_por_tool.py
  • examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/hitl_approval_tool.py
  • examples/HITL/por_to_jiratickets/src/nat_por_to_jiratickets/jira_tickets_tool.py
  • packages/nvidia_nat_core/src/nat/front_ends/console/console_front_end_plugin.py
  • packages/nvidia_nat_core/src/nat/middleware/function_middleware.py
  • packages/nvidia_nat_core/src/nat/middleware/hitl/__init__.py
  • packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.py
  • packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware_config.py
  • packages/nvidia_nat_core/src/nat/middleware/middleware.py
  • packages/nvidia_nat_core/tests/nat/middleware/test_hitl_middleware.py

Comment thread packages/nvidia_nat_core/src/nat/middleware/hitl/hitl_middleware.py Outdated
…rride in HITLMiddleware

Delegate single-output invocation to FunctionMiddleware.function_middleware_invoke,
which already builds the InvocationContext and calls pre_invoke/post_invoke. This
drops duplicated orchestration and keeps invoke behavior consistent with the
inherited function_middleware_stream.

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
@ericevans-nv ericevans-nv force-pushed the feature/hitl-middleware branch from 94b27f0 to edb04fc Compare June 11, 2026 21:53
@rpelevin

Copy link
Copy Markdown

This is a useful direction. The part I would make explicit is that HITL middleware is not just prompt plumbing; it becomes an invocation-scoped execution boundary.

For pre-invoke SKIP, I would keep the contract visible to the caller:

  • the wrapped function is not called in either invoke or stream mode;
  • a skipped invocation is distinguishable from a function that legitimately returns None;
  • the terminal record carries function name, invocation id if available, prompt id or prompt version, decision outcome, responder source, decided_at, and skip reason;
  • modified args are bound to the same decision that allowed the invocation.

For post-invoke prompts, I would keep a separate record: the function ran, produced an output, and that output was then accepted, replaced, or cleared. That avoids mixing "the callable did not run" with "the callable ran and its result was rejected or transformed."

The regression tests I would add are:

  1. pre-invoke skip never calls the function and has the same terminal semantics in single and streaming paths.
  2. a real None return is distinguishable from a skipped invocation.
  3. modified args cannot be applied under a stale prompt response.
  4. post-invoke output replacement preserves original versus effective output state without reclassifying execution as skipped.

That keeps the middleware small while making the control boundary clear: which exact invocation was proposed, who decided, what changed, and whether the callable actually ran.

@ericevans-nv

Copy link
Copy Markdown
Contributor Author

This is a useful direction. The part I would make explicit is that HITL middleware is not just prompt plumbing; it becomes an invocation-scoped execution boundary.

For pre-invoke SKIP, I would keep the contract visible to the caller:

  • the wrapped function is not called in either invoke or stream mode;
  • a skipped invocation is distinguishable from a function that legitimately returns None;
  • the terminal record carries function name, invocation id if available, prompt id or prompt version, decision outcome, responder source, decided_at, and skip reason;
  • modified args are bound to the same decision that allowed the invocation.

For post-invoke prompts, I would keep a separate record: the function ran, produced an output, and that output was then accepted, replaced, or cleared. That avoids mixing "the callable did not run" with "the callable ran and its result was rejected or transformed."

The regression tests I would add are:

  1. pre-invoke skip never calls the function and has the same terminal semantics in single and streaming paths.
  2. a real None return is distinguishable from a skipped invocation.
  3. modified args cannot be applied under a stale prompt response.
  4. post-invoke output replacement preserves original versus effective output state without reclassifying execution as skipped.

That keeps the middleware small while making the control boundary clear: which exact invocation was proposed, who decided, what changed, and whether the callable actually ran.

This is a useful direction. The part I would make explicit is that HITL middleware is not just prompt plumbing; it becomes an invocation-scoped execution boundary.

For pre-invoke SKIP, I would keep the contract visible to the caller:

  • the wrapped function is not called in either invoke or stream mode;
  • a skipped invocation is distinguishable from a function that legitimately returns None;
  • the terminal record carries function name, invocation id if available, prompt id or prompt version, decision outcome, responder source, decided_at, and skip reason;
  • modified args are bound to the same decision that allowed the invocation.

For post-invoke prompts, I would keep a separate record: the function ran, produced an output, and that output was then accepted, replaced, or cleared. That avoids mixing "the callable did not run" with "the callable ran and its result was rejected or transformed."

The regression tests I would add are:

  1. pre-invoke skip never calls the function and has the same terminal semantics in single and streaming paths.
  2. a real None return is distinguishable from a skipped invocation.
  3. modified args cannot be applied under a stale prompt response.
  4. post-invoke output replacement preserves original versus effective output state without reclassifying execution as skipped.

That keeps the middleware small while making the control boundary clear: which exact invocation was proposed, who decided, what changed, and whether the callable actually ran.

The middleware contract is transparent interception, the caller gets back the function's effective result and should never need to know a middleware ran. Returning SKIPPED leaks the middleware's existence into the caller's API contract. It also changes the effective return type of the intercepted function. Any value other than the function's declared return type breaks the contract the caller already depends on.

HITLMiddleware is intentionally minimal: it defines the interception boundary and the decision contract, nothing more. Observability, audit logging, or any behavior around the decision belongs in the concrete subclass, where implementers can customize it to their requirements without affecting callers or the abstract interface.

@rpelevin

Copy link
Copy Markdown

I agree with that caller-contract boundary. The function caller should not have to handle a new sentinel value just because middleware intercepted the invocation, and the declared return type should stay stable.

The distinction I would preserve is control-plane visibility, not necessarily caller-visible return shape.

For pre-invoke skip, the function should not run, but the skip can be represented as middleware state or an invocation record rather than as the function's return value. That gives three separate facts:

  • caller-facing result follows the framework's chosen middleware contract;
  • the control/audit path can distinguish skipped invocation from a real None return;
  • the decision record captures proposed function, proposed args, prompt or decision id, outcome, responder source, decided_at, and skip reason.

That also keeps post-invoke prompts separate. If the function ran and the output was replaced or cleared, the record should say function_ran=true and preserve original versus effective output. If pre-invoke denied or skipped, function_ran=false.

The tests I would add:

  1. pre-invoke skip does not call the function in invoke or stream mode, while the caller-facing contract remains stable.
  2. a real None return and a skipped invocation are distinguishable in the middleware/control record.
  3. modified args are bound to the exact prompt response; a stale response cannot apply to a later invocation.
  4. post-invoke output replacement records that the function ran and preserves original versus effective output.
  5. a concrete subclass can emit audit/logging/export records without changing the abstract interface or caller return contract.

That keeps HITLMiddleware minimal while still making the execution boundary inspectable: what invocation was proposed, who decided, whether args changed, whether the callable ran, and what effective result continued.

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
@ericevans-nv

Copy link
Copy Markdown
Contributor Author

I agree with that caller-contract boundary. The function caller should not have to handle a new sentinel value just because middleware intercepted the invocation, and the declared return type should stay stable.

The distinction I would preserve is control-plane visibility, not necessarily caller-visible return shape.

For pre-invoke skip, the function should not run, but the skip can be represented as middleware state or an invocation record rather than as the function's return value. That gives three separate facts:

  • caller-facing result follows the framework's chosen middleware contract;
  • the control/audit path can distinguish skipped invocation from a real None return;
  • the decision record captures proposed function, proposed args, prompt or decision id, outcome, responder source, decided_at, and skip reason.

That also keeps post-invoke prompts separate. If the function ran and the output was replaced or cleared, the record should say function_ran=true and preserve original versus effective output. If pre-invoke denied or skipped, function_ran=false.

The tests I would add:

  1. pre-invoke skip does not call the function in invoke or stream mode, while the caller-facing contract remains stable.
  2. a real None return and a skipped invocation are distinguishable in the middleware/control record.
  3. modified args are bound to the exact prompt response; a stale response cannot apply to a later invocation.
  4. post-invoke output replacement records that the function ran and preserves original versus effective output.
  5. a concrete subclass can emit audit/logging/export records without changing the abstract interface or caller return contract.

That keeps HITLMiddleware minimal while still making the execution boundary inspectable: what invocation was proposed, who decided, whether args changed, whether the callable ran, and what effective result continued.

Any additional tracking or observability is a concrete subclass concern — the abstract interface stays clean. A concrete implementation can override the pre/post hooks to record decisions, track state, or enforce any binding logic it needs.

@rpelevin

Copy link
Copy Markdown

Agree that the abstract interface should stay clean. If observability is owned by concrete subclasses, the key contract is not a new return type or audit API; it is stable hook context.

The subclass needs enough context in the pre and post hooks to bind its own record without changing what the caller sees. For pre-invoke, that means it can consistently see:

  • invocation id or equivalent correlation key;
  • function name and proposed args, or a digest of them;
  • prompt or decision id;
  • decision outcome and responder source;
  • decided_at;
  • whether args were modified;
  • whether the callable ran;
  • skip or denial reason when it did not run.

For post-invoke, it should be able to record that the callable did run, plus original versus effective output when the output is replaced or cleared.

That keeps the split clean:

  • the abstract middleware defines interception and hook timing;
  • the concrete subclass owns records, logging, export, and binding rules;
  • the caller receives the normal effective result, not audit metadata.

The tests I would add are:

  1. a concrete subclass sees the same invocation correlation key through pre-invoke, invoke or stream, and post-invoke.
  2. pre-invoke skip lets the subclass record function_ran=false without returning a caller-visible sentinel.
  3. modified args are visible to the subclass as proposed versus effective args, or equivalent digests.
  4. post-invoke replacement records function_ran=true and preserves original versus effective output.
  5. disabling the concrete subclass leaves no audit side effects and does not change caller return semantics.

That gives implementers room to build audit or enforcement behavior in subclasses while keeping HITLMiddleware minimal and transparent to function callers.

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
@Yash-xoxo

Copy link
Copy Markdown

Keeping the caller contract clean is non-negotiable, and audit/binding logic lives in concrete subclasses.

To close the loop on what the abstract interface actually needs to expose, here's a concrete proposal for the hook context objects:

For _on_pre_invoke_response:

  • function_name: str
  • proposed_args: dict — snapshot of args at decision time
  • prompt_id: str — correlates the response back to the prompt version that produced it
  • decided_at: datetime
  • responder_source: str — human, system, timeout, etc.
  • outcome: Literal["proceed", "skip"]
  • modified_args: dict | None — set only when args were changed; None means unmodified, not skipped
  • skip_reason: str | None

For _on_post_invoke_response:

  • same correlation fields (function_name, prompt_id, decided_at, responder_source)
  • function_ran: bool — always True here, but explicit for subclass parity
  • original_output: Any
  • effective_output: Any — same object as original_output if unchanged
  • outcome: Literal["accepted", "replaced", "cleared"]

The abstract class never writes a record itself — it just guarantees the subclass sees these fields. Concrete classes can ignore what they don't need.

This also covers the stale-args concern: since modified_args is bound to a specific prompt_id, a subclass can enforce that it's only applied when the prompt response and the live invocation share the same correlation key.

@rpelevin

Copy link
Copy Markdown

This field shape is close. I would add one constraint to make the stale-args boundary enforceable: prompt_id should identify the prompt or decision, but it should not be the whole replay boundary.

For pre-invoke, the hook context should carry both:

  • invocation_id or equivalent per-call correlation key;
  • prompt_id or decision_id for the human or system response;
  • proposed_args or proposed_args_digest at prompt time;
  • effective_args or effective_args_digest after modification;
  • function_name;
  • outcome, responder_source, decided_at, and skip_reason.

Then a concrete subclass can enforce that modified_args only applies when the live invocation correlation key and prompt or decision id both match. If either side changes, it should fail closed or require a new prompt cycle. That avoids the subtle case where a prompt response is valid in isolation but gets replayed onto a later call with the same prompt shape.

For post-invoke, I like keeping function_ran explicit. I would also attach original_output and effective_output to the same invocation correlation key so replacement or clearing cannot be confused with a different call's result.

The regression tests I would add:

  1. modified_args from one invocation cannot be applied to a later invocation with a different correlation key.
  2. same prompt_id with changed proposed_args is rejected unless a new prompt or decision is issued.
  3. stream and non-stream paths expose the same correlation key to the subclass.
  4. pre-invoke skip records function_ran=false without changing caller return semantics.
  5. post-invoke replacement records function_ran=true and preserves original versus effective output under the same invocation key.

That keeps the abstract class clean, while giving concrete subclasses enough structure to make stale decisions non-replayable.


#### Configuration

`HITLMiddlewareConfig` is the base configuration class. It already declares `name="hitl"`, which maps to `_type: hitl` in YAML. Subclass it with a different `name` only if you need a distinct `_type` value. It provides two prompt fields and inherits all function-targeting fields from `DynamicMiddlewareConfig`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we tighten this contract before merging? Since HITLMiddleware is abstract and there is no built-in hitl middleware registered in nat.middleware.register, documenting HITLMiddlewareConfig as mapping to _type: hitl makes that YAML type look runnable when it is not. I think the cleaner contract is either to remove name="hitl" from the abstract base and require concrete subclasses to declare their own name, or to add a concrete registered middleware for _type: hitl if that type is intended to work out of the box.

```python
from nat.data_models.interactive import InteractionResponse
from nat.middleware.hitl import HITLMiddleware, HITLMiddlewareConfig
from nat.middleware.middleware import InvocationAction, InvocationContext

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we route these new middleware-authoring symbols through nat.plugin_api, or explicitly mark this as a specialized/provisional implementation API? The public plugin API docs tell external plugin authors to import authoring contracts from nat.plugin_api, but this example introduces HITLMiddleware, HITLMiddlewareConfig, and InvocationAction from implementation modules. If this is intended as the supported extension path, I would expect the new symbols to be exported, pinned in the plugin API tests, and added to the public API docs.

context=function_context)
]

assert chunks == [None, None]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this streaming contract explicit and align the test name/docstring with the assertion? Right now the test says clearing context.output yields no chunks, but the asserted behavior is [None, None] because function_middleware_stream always yields ctx.output after post_invoke. If None is meant to suppress a chunk, the implementation should skip yielding it; if None is meant to be a valid replacement chunk, the test should say that clearly so stream consumers do not get surprised by None values.

@mnajafian-nv mnajafian-nv left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with conditional approval pending resolution of the inline suggestion, Thanks!

@rpelevin

Copy link
Copy Markdown

I agree with making these merge blockers explicit. I would treat the three comments as one contract boundary:

  1. Abstract base versus runnable middleware. If HITLMiddleware is only an authoring base, the base config should not make _type: hitl look runnable. Either remove the concrete name from the abstract base and require concrete subclasses to declare their own _type, or add a real default registered middleware and test that _type: hitl loads end to end. The docs should not imply a YAML type exists unless the registry can resolve it.

  2. Public extension API. If external authors are expected to subclass this, the stable imports should come through nat.plugin_api and be pinned by public API tests: HITLMiddleware, HITLMiddlewareConfig, InvocationAction, and the context/response types needed to implement the hooks. If the surface is provisional, say that directly and keep implementation-module imports out of the public authoring example.

  3. Stream post-invoke semantics. I would avoid bare None doing double duty as both "replacement value" and "suppress output". The test and implementation should choose one explicit contract:

  • post hook returns a suppress/clear action and no chunk is yielded; or
  • None is a valid replacement chunk and the docs/test name say consumers may receive it.

For HITL, I would prefer the first contract. A reviewer or subclass author should be able to tell whether the human decision suppressed output, replaced output, or passed it through without inferring that from an ambiguous None in the stream.

That keeps the merge boundary tight: runnable YAML types match registry behavior, the extension surface is either public or explicitly provisional, and stream consumers do not have to reverse-engineer human decision semantics from emitted chunks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants