
refactor(core): device input primitives actions #2452

Open: yuyutaotao wants to merge 21 commits into main from codex/device-input-primitives-actions

@yuyutaotao
Collaborator

No description provided.

quanru and others added 7 commits May 9, 2026 10:15
Introduce a separate "native input surface" alongside `actionSpace()` so
that manual UI control (Studio's device preview pointer overlay) does not
have to ride on the AI action space:

- `actionSpace()` is the AI's extensible vocabulary (custom actions injected
  by users live there).
- `pointer` is the device's low-level input surface, fixed by what the
  underlying transport (WDA / ADB / HDC) can synthesize.

Defines `PointerPoint` and `PointerCapability` (tap, doubleClick, longPress,
swipe, dragAndDrop, keyboardPress, input; pinch optional). Adds the
optional `pointer?` field on `AbstractInterface`. No device implements it
yet — that comes in the per-platform follow-ups.
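
As a rough illustration of the split described above, the pointer surface might look like the following TypeScript sketch. The method names come from the commit message; every parameter list, the `{ x, y }` shape of `PointerPoint`, the trimmed `AbstractInterfaceSketch`, and the `canPinch` helper are assumptions, not the definitions in this PR.

```typescript
// Assumed shape: the commit message names the type but not its fields.
interface PointerPoint {
  x: number;
  y: number;
}

// Method names are from the commit message; all signatures are guesses.
interface PointerCapability {
  tap(point: PointerPoint): Promise<void>;
  doubleClick(point: PointerPoint): Promise<void>;
  longPress(point: PointerPoint, opts?: { duration?: number }): Promise<void>;
  swipe(from: PointerPoint, to: PointerPoint): Promise<void>;
  dragAndDrop(from: PointerPoint, to: PointerPoint): Promise<void>;
  keyboardPress(key: string): Promise<void>;
  input(text: string): Promise<void>;
  // Optional: only transports that can synthesize two-finger gestures provide it.
  pinch?(center: PointerPoint, scale: number): Promise<void>;
}

// Trimmed stand-in for AbstractInterface with only the fields relevant here.
interface AbstractInterfaceSketch {
  actionSpace(): unknown[]; // AI vocabulary, extensible by users
  pointer?: PointerCapability; // native input surface, fixed by the transport
}

// Hypothetical helper: callers feature-detect the surface and the optional gesture.
function canPinch(device: AbstractInterfaceSketch): boolean {
  return typeof device.pointer?.pinch === 'function';
}
```

Keeping `pointer` and `pinch` optional lets a caller distinguish "device has no native input surface" from "surface exists but lacks this gesture" without touching `actionSpace()`.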

`Scroll`, `ClearInput`, `CursorMove` are intentionally left out: they are
high-level AI vocabulary built on top of the basic gestures, not native
input primitives.

Wrap the existing low-level WDA gesture methods (`mouseClick`, `doubleTap`,
`longPress`, `swipe`, `pressKey`, `typeText`, `wdaBackend.pinch`) into a
`PointerCapability` field so manual UI control can drive iOS devices
without going through the AI action space.

The pointer's `pinch` reuses the same baseDistance / fingerDistance shape
that `normalizePinchParam` derives in the action callback, so AI-driven
and manual pinches stay visually identical.

Wrap the existing low-level ADB / yadb gesture methods (`mouseClick`,
`mouseDoubleClick`, `longPress`, `mouseDrag`, `keyboardPress`,
`keyboardType`, plus the inline yadb pinch shell) into a
`PointerCapability` field.

Coordinate adjustment (logical → physical) lives inside `mouseClick` /
`mouseDrag` already, so the wrapper does not duplicate it. Pinch does
inline its own coordinate-ratio scaling — same code path the action
callback uses.

Wrap the existing low-level HDC gesture methods (`tap`, `doubleTap`,
`longPress`, `hdc.swipe`, `hdc.drag`, `keyboardPress`, `inputText`) into
a `PointerCapability` field.

`pinch` is intentionally not implemented — HDC on HarmonyOS NEXT does
not currently support reliable two-finger gesture injection. Manual UI
pinch on Harmony devices will return 404 from /interact.

`longPress`'s duration option is accepted but ignored, mirroring the
underlying HDC longPress which hard-codes its press duration.

…treamHandler

Manual UI control no longer rides on `actionSpace()` — it talks to the
device's `pointer` capability directly. server.ts becomes plumbing.

/interact route
- Looks up `agent.interface.pointer` (404 if the device doesn't expose
  one — Manual control is for touch devices only).
- Hands the request body to `dispatchPointer` (in pointer-dispatch.ts).
- Dispatcher is a flat switch: parse + range-check fields, call the
  typed pointer method. No more in-server LocateResultElement fabrication
  or fake ExecutorContext.
- Validation errors → 400; unsupported actionType / capability → 404.
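
The status mapping in the list above can be sketched as a flat switch. This is a minimal illustration, not the contents of pointer-dispatch.ts: the request field names (`x`, `y`, `startX`, `endX`, `scale`), the `DispatchResult` type, and the `readPoint` helper are all hypothetical, and the pointer interface is trimmed to three methods.

```typescript
// Hypothetical, trimmed-down types; the real PointerCapability has more methods.
interface PointerPoint { x: number; y: number }
interface PointerCapability {
  tap(point: PointerPoint): Promise<void>;
  swipe(from: PointerPoint, to: PointerPoint): Promise<void>;
  pinch?(center: PointerPoint, scale: number): Promise<void>;
}

type Body = Record<string, unknown>;
type DispatchResult = { status: 200 } | { status: 400 | 404; error: string };

const isFiniteNumber = (v: unknown): v is number =>
  typeof v === 'number' && Number.isFinite(v);

// Parse and range-check one coordinate pair; undefined signals a validation failure.
function readPoint(body: Body, xKey: string, yKey: string): PointerPoint | undefined {
  const x = body[xKey];
  const y = body[yKey];
  return isFiniteNumber(x) && isFiniteNumber(y) && x >= 0 && y >= 0 ? { x, y } : undefined;
}

const badRequest = (error: string): DispatchResult => ({ status: 400, error });
const notFound = (error: string): DispatchResult => ({ status: 404, error });

// Flat switch: validate fields, then call the typed pointer method directly.
async function dispatchPointer(
  pointer: PointerCapability | undefined,
  body: Body,
): Promise<DispatchResult> {
  if (!pointer) return notFound('device exposes no pointer capability');
  switch (body.actionType) {
    case 'Tap': {
      const point = readPoint(body, 'x', 'y');
      if (!point) return badRequest('Tap requires numeric x and y');
      await pointer.tap(point);
      return { status: 200 };
    }
    case 'Swipe': {
      const from = readPoint(body, 'startX', 'startY');
      const to = readPoint(body, 'endX', 'endY');
      if (!from || !to) return badRequest('Swipe requires start and end points');
      await pointer.swipe(from, to);
      return { status: 200 };
    }
    case 'Pinch': {
      // Optional capability: a device without pinch answers 404, not 400.
      if (!pointer.pinch) return notFound('Pinch is not supported on this device');
      const center = readPoint(body, 'x', 'y');
      const scale = body.scale;
      if (!center || !isFiniteNumber(scale) || scale <= 0) {
        return badRequest('Pinch requires x, y and a positive scale');
      }
      await pointer.pinch(center, scale);
      return { status: 200 };
    }
    default:
      return notFound(`Unknown actionType: ${String(body.actionType)}`);
  }
}
```

Returning a result object instead of throwing keeps the route handler trivial in this sketch: it only has to copy `status` and `error` onto the HTTP response.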

/mjpeg route
- Native-probe / negative-cache / polling-fallback / native-comeback
  self-heal moved to a dedicated `MjpegStreamHandler` class. The route
  is ~10 lines.
- `setActiveAgent()` private setter funnels every write to
  `_activeConnection.agent`. Setter calls `_mjpegHandler.reset()` so
  reconnects to a different device drop the stale negative cache.
- Switches `console.log`/`console.warn` to `getDebug('playground:mjpeg')`
  per AGENTS.md's logger rule.
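
The setter-plus-reset arrangement can be sketched as follows. Only `setActiveAgent`, `_activeConnection.agent`, `_mjpegHandler`, `MjpegStreamHandler`, and `reset()` are names from the commit message; the `AgentLike` type, the single boolean standing in for the negative cache, and the other method names are assumptions made for illustration.

```typescript
// Hypothetical minimal agent; the real agent type is much richer.
interface AgentLike {
  deviceId: string;
}

// Stand-in for MjpegStreamHandler that models only the negative cache.
class MjpegStreamHandler {
  private nativeKnownUnavailable = false;

  // Called when a native MJPEG probe fails, so later requests skip the probe.
  recordNativeProbeFailure(): void {
    this.nativeKnownUnavailable = true;
  }

  shouldTryNative(): boolean {
    return !this.nativeKnownUnavailable;
  }

  // Forget everything learned about the previous device.
  reset(): void {
    this.nativeKnownUnavailable = false;
  }
}

class PlaygroundServerSketch {
  private readonly _activeConnection: { agent: AgentLike | null } = { agent: null };
  readonly _mjpegHandler = new MjpegStreamHandler();

  // Single write path: every change of agent also clears stream state.
  private setActiveAgent(agent: AgentLike | null): void {
    this._activeConnection.agent = agent;
    this._mjpegHandler.reset();
  }

  connect(agent: AgentLike): void {
    this.setActiveAgent(agent);
  }

  disconnect(): void {
    this.setActiveAgent(null);
  }

  get activeAgent(): AgentLike | null {
    return this._activeConnection.agent;
  }
}
```

Because no other method assigns `_activeConnection.agent`, a reconnect cannot leave the handler believing the new device lacks native MJPEG just because the old one did.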

Tests
- `server-interact.test.ts` rewritten to drive the route through a real
  pointer stub; covers Tap, Swipe, missing field, missing pointer,
  optional capability missing (Pinch on Harmony), unknown actionType.
- `build-interact-params.test.ts` removed — its subject (the in-server
  translation table) no longer exists.

cloudflare-workers-and-pages Bot commented May 11, 2026

Deploying midscene with Cloudflare Pages

Latest commit: 1d46fd7
Status: ✅  Deploy successful!
Preview URL: https://0f5b8885.midscene.pages.dev
Branch Preview URL: https://codex-device-input-primitive.midscene.pages.dev

@yuyutaotao yuyutaotao marked this pull request as ready for review May 12, 2026 03:48
@yuyutaotao yuyutaotao requested a review from quanru May 12, 2026 03:49

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 949e661516

Comment on lines +116 to +117
if (target && opts?.replace !== false) {
await this.clearInput(target);

P2: Preserve replace semantics without a locate target

When Input is called in the default replace mode without a locate target, opts.replace is still true but target is undefined, so this branch skips clearInput(undefined) and the focused field is not cleared before typing. The previous Android action always called clearInput(element) before typing unless the mode was typeOnly, which allowed replacing the currently focused input even when the planner omitted locate; now those calls append text instead.
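
To make the difference concrete, the decision can be reduced to a pure function that returns the ordered steps. This is a sketch of one possible fix, not the patch itself: `InputStep`, `InputOptions`, and both function names are hypothetical, and it assumes, as the comment above implies, that `clearInput` can clear the currently focused field when no target is given.

```typescript
type InputStep = 'clear' | 'type';

interface InputOptions {
  // Undefined means the default replace mode.
  replace?: boolean;
}

// Behavior flagged in the review: clearing also depends on having a target.
function stepsAsReviewed(target: object | undefined, opts?: InputOptions): InputStep[] {
  return target && opts?.replace !== false ? ['clear', 'type'] : ['type'];
}

// Possible fix: only the replace flag decides whether to clear, so the
// focused field is still cleared when the planner omits a locate target.
function stepsWithFix(_target: object | undefined, opts?: InputOptions): InputStep[] {
  return opts?.replace !== false ? ['clear', 'type'] : ['type'];
}
```

With no target and default options, `stepsAsReviewed` yields only `type` (text is appended), while `stepsWithFix` yields `clear` then `type`; with `replace: false` both variants skip the clear.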

)(requirePoint(body));
}

case 'LongPress': {

P2: Dispatch hover and right-click primitives

For interfaces that expose inputPrimitives (for example the computer/RDP devices with rightClick and hover primitives), /interact now always calls dispatchPointer and returns before the existing actionSpace fallback. Since this switch has no RightClick or Hover cases even though the server still builds params for those action types, manual /interact requests for them hit the default Unknown actionType path instead of invoking the supported primitive.
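
One way to avoid the dead end described above is to decide per request whether a primitive can serve the action and otherwise keep the actionSpace fallback. The sketch below shows only that routing decision; the `InputPrimitives` shape, the `PRIMITIVE_FOR_ACTION` table, and `routeInteract` are hypothetical names, and the actual fix could equally add `RightClick` and `Hover` cases to the existing switch.

```typescript
interface Point { x: number; y: number }
type PointHandler = (point: Point) => Promise<void>;

// Hypothetical primitive surface for a desktop-style device.
interface InputPrimitives {
  tap?: PointHandler;
  rightClick?: PointHandler;
  hover?: PointHandler;
}

// Maps a request actionType to the primitive that can serve it.
const PRIMITIVE_FOR_ACTION = new Map<string, keyof InputPrimitives>([
  ['Tap', 'tap'],
  ['RightClick', 'rightClick'],
  ['Hover', 'hover'],
]);

type Route =
  | { via: 'primitive'; name: keyof InputPrimitives }
  | { via: 'actionSpace' };

// Use a primitive only when the device actually implements it;
// anything else falls through to the actionSpace path.
function routeInteract(actionType: string, primitives?: InputPrimitives): Route {
  const name = PRIMITIVE_FOR_ACTION.get(actionType);
  if (primitives && name && typeof primitives[name] === 'function') {
    return { via: 'primitive', name };
  }
  return { via: 'actionSpace' };
}
```

Under this arrangement a `RightClick` request on a device exposing `rightClick` is served by the primitive, and an action with no matching primitive still reaches the actionSpace path instead of the Unknown actionType default.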
