refactor(core): device input primitives actions #2452
Introduce a separate "native input surface" alongside `actionSpace()` so that manual UI control (Studio's device preview pointer overlay) does not have to ride on the AI action space:
- `actionSpace()` is the AI's extensible vocabulary (custom actions injected by users live there).
- `pointer` is the device's low-level input surface, fixed by what the underlying transport (WDA / ADB / HDC) can synthesize.
Defines `PointerPoint` and `PointerCapability` (tap, doubleClick, longPress, swipe, dragAndDrop, keyboardPress, input; pinch optional). Adds the optional `pointer?` field on `AbstractInterface`. No device implements it yet; that comes in the per-platform follow-ups.
`Scroll`, `ClearInput`, `CursorMove` are intentionally left out: they are high-level AI vocabulary built on top of the basic gestures, not native input primitives.
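As a rough illustration of the split described above, the types might look something like the following. This is a sketch only: the method names come from the commit message, but every signature, the `DeviceInterfaceSketch` name, and the `supportsPinch` helper are assumptions, not the PR's actual definitions.

```typescript
// Hypothetical shapes; the real signatures in the PR may differ.
interface PointerPoint {
  x: number;
  y: number;
}

interface PointerCapability {
  tap(point: PointerPoint): Promise<void>;
  doubleClick(point: PointerPoint): Promise<void>;
  longPress(point: PointerPoint, opts?: { duration?: number }): Promise<void>;
  swipe(from: PointerPoint, to: PointerPoint): Promise<void>;
  dragAndDrop(from: PointerPoint, to: PointerPoint): Promise<void>;
  keyboardPress(keyName: string): Promise<void>;
  input(text: string): Promise<void>;
  // Optional: not every transport can synthesize two-finger gestures.
  pinch?(center: PointerPoint, scale: number): Promise<void>;
}

// Stand-in for AbstractInterface: the AI vocabulary and the native
// input surface sit side by side, and `pointer` is optional.
interface DeviceInterfaceSketch {
  actionSpace(): unknown[];
  pointer?: PointerCapability;
}

// Callers feature-detect instead of assuming a capability exists.
function supportsPinch(device: DeviceInterfaceSketch): boolean {
  return typeof device.pointer?.pinch === 'function';
}
```

Keeping `pointer` and `pinch` optional lets a caller such as the manual-control route decide what to do when a device cannot perform a gesture, rather than having each device throw in its own way.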
Wrap the existing low-level WDA gesture methods (`mouseClick`, `doubleTap`, `longPress`, `swipe`, `pressKey`, `typeText`, `wdaBackend.pinch`) into a `PointerCapability` field so manual UI control can drive iOS devices without going through the AI action space. The pointer's `pinch` reuses the same baseDistance / fingerDistance shape that `normalizePinchParam` derives in the action callback, so AI-driven and manual pinches stay visually identical.
Wrap the existing low-level ADB / yadb gesture methods (`mouseClick`, `mouseDoubleClick`, `longPress`, `mouseDrag`, `keyboardPress`, `keyboardType`, plus the inline yadb pinch shell) into a `PointerCapability` field. Coordinate adjustment (logical → physical) lives inside `mouseClick` / `mouseDrag` already, so the wrapper does not duplicate it. Pinch does inline its own coordinate-ratio scaling — same code path the action callback uses.
Wrap the existing low-level HDC gesture methods (`tap`, `doubleTap`, `longPress`, `hdc.swipe`, `hdc.drag`, `keyboardPress`, `inputText`) into a `PointerCapability` field. `pinch` is intentionally not implemented — HDC on HarmonyOS NEXT does not currently support reliable two-finger gesture injection. Manual UI pinch on Harmony devices will return 404 from /interact. `longPress`'s duration option is accepted but ignored, mirroring the underlying HDC longPress which hard-codes its press duration.
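The Harmony wrapper described above could be sketched roughly as follows. `HdcLikeDevice` and `makeHarmonyPointerSketch` are hypothetical names, and the low-level method signatures are assumed; only the behaviour (no `pinch`, `longPress` duration accepted but ignored) is taken from the commit message.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical stand-in for the low-level HDC device methods.
interface HdcLikeDevice {
  tap(x: number, y: number): Promise<void>;
  // The press duration is fixed by the transport, so no duration parameter.
  longPress(x: number, y: number): Promise<void>;
  inputText(text: string): Promise<void>;
}

function makeHarmonyPointerSketch(device: HdcLikeDevice) {
  return {
    tap: (p: Point) => device.tap(p.x, p.y),
    // `duration` is accepted for parity with other platforms but not forwarded.
    longPress: (p: Point, _opts?: { duration?: number }) =>
      device.longPress(p.x, p.y),
    input: (text: string) => device.inputText(text),
    // No `pinch` key at all, so callers can feature-detect its absence.
  };
}
```

Omitting the `pinch` key, rather than providing a method that throws, is what allows the route layer to answer with a uniform "unsupported" response.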
…treamHandler
Manual UI control no longer rides on `actionSpace()` — it talks to the
device's `pointer` capability directly. server.ts becomes plumbing.
/interact route
- Looks up `agent.interface.pointer` (404 if the device doesn't expose
one — Manual control is for touch devices only).
- Hands the request body to `dispatchPointer` (in pointer-dispatch.ts).
- Dispatcher is a flat switch: parse + range-check fields, call the
typed pointer method. No more in-server LocateResultElement fabrication
or fake ExecutorContext.
- Validation errors → 400; unsupported actionType / capability → 404.
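The /interact flow listed above (pointer lookup, flat switch, 400 for validation errors, 404 for anything unsupported) might be sketched as below. This is not the code in pointer-dispatch.ts: `DispatchError`, `requireNumber`, `interactStatus`, the body field names (`x`, `y`, `endX`, `endY`, `scale`), and the helper signatures are assumptions made for illustration.

```typescript
interface Point { x: number; y: number }

interface PointerSketch {
  tap(p: Point): Promise<void>;
  swipe(from: Point, to: Point): Promise<void>;
  pinch?(center: Point, scale: number): Promise<void>; // optional capability
}

type Body = Record<string, unknown>;

// Carries the HTTP status the route should answer with.
class DispatchError {
  readonly status: 400 | 404;
  readonly message: string;
  constructor(status: 400 | 404, message: string) {
    this.status = status;
    this.message = message;
  }
}

function requireNumber(body: Body, key: string): number {
  const value = body[key];
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new DispatchError(400, `"${key}" must be a finite number`);
  }
  return value;
}

function requirePoint(body: Body, xKey = 'x', yKey = 'y'): Point {
  const x = requireNumber(body, xKey);
  const y = requireNumber(body, yKey);
  if (x < 0 || y < 0) {
    throw new DispatchError(400, `"${xKey}"/"${yKey}" must be non-negative`);
  }
  return { x, y };
}

// Flat switch: validate the fields, then call the typed pointer method.
async function dispatchPointerSketch(pointer: PointerSketch, body: Body): Promise<void> {
  switch (body.actionType) {
    case 'Tap':
      return pointer.tap(requirePoint(body));
    case 'Swipe':
      return pointer.swipe(requirePoint(body), requirePoint(body, 'endX', 'endY'));
    case 'Pinch': {
      if (!pointer.pinch) throw new DispatchError(404, 'Pinch is not supported by this device');
      return pointer.pinch(requirePoint(body), requireNumber(body, 'scale'));
    }
    default:
      throw new DispatchError(404, `Unknown actionType: ${String(body.actionType)}`);
  }
}

// Route-level plumbing: map the outcome onto an HTTP status code.
async function interactStatus(pointer: PointerSketch | undefined, body: Body): Promise<number> {
  if (!pointer) return 404; // device exposes no pointer capability
  try {
    await dispatchPointerSketch(pointer, body);
    return 200;
  } catch (err) {
    if (err instanceof DispatchError) return err.status;
    throw err; // unexpected failures are left to the server's error handling
  }
}
```

With this shape the route itself only looks up the pointer and maps the result onto a status code, which matches the "server.ts becomes plumbing" intent.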
/mjpeg route
- Native-probe / negative-cache / polling-fallback / native-comeback
self-heal moved to a dedicated `MjpegStreamHandler` class. The route
is ~10 lines.
- `setActiveAgent()` private setter funnels every write to
`_activeConnection.agent`. Setter calls `_mjpegHandler.reset()` so
reconnects to a different device drop the stale negative cache.
- Switches `console.log`/`console.warn` to `getDebug('playground:mjpeg')`
per AGENTS.md's logger rule.
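The setter-funnel idea above can be illustrated with a minimal sketch. It assumes the negative cache is a simple "do not probe native MJPEG until time T" timestamp; the real `MjpegStreamHandler` state is not shown in this PR description, and every name below other than `setActiveAgent` and `reset` is hypothetical.

```typescript
// Hypothetical handler: remembers that the native stream recently failed.
class MjpegStreamHandlerSketch {
  private nativeUnavailableUntil = 0;

  // Record a failed native probe so later requests skip it for ttlMs.
  markNativeFailed(now: number, ttlMs: number): void {
    this.nativeUnavailableUntil = now + ttlMs;
  }

  shouldProbeNative(now: number): boolean {
    return now >= this.nativeUnavailableUntil;
  }

  // Drop all cached knowledge about the previous device.
  reset(): void {
    this.nativeUnavailableUntil = 0;
  }
}

class ServerSketch<Agent> {
  private activeAgent: Agent | undefined;
  readonly mjpeg = new MjpegStreamHandlerSketch();

  // Every write goes through this one setter, so the reset cannot be forgotten.
  private setActiveAgent(agent: Agent | undefined): void {
    this.activeAgent = agent;
    this.mjpeg.reset();
  }

  connect(agent: Agent): void {
    this.setActiveAgent(agent);
  }

  disconnect(): void {
    this.setActiveAgent(undefined);
  }

  get agent(): Agent | undefined {
    return this.activeAgent;
  }
}
```

Routing all writes through one private setter means a newly connected device never inherits the previous device's "native stream unavailable" verdict.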
Tests
- `server-interact.test.ts` rewritten to drive the route through a real
pointer stub; covers Tap, Swipe, missing field, missing pointer,
optional capability missing (Pinch on Harmony), unknown actionType.
- `build-interact-params.test.ts` removed — its subject (the in-server
translation table) no longer exists.
Deploying midscene with
| Latest commit: | 1d46fd7 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://0f5b8885.midscene.pages.dev |
| Branch Preview URL: | https://codex-device-input-primitive.midscene.pages.dev |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 949e661516
    if (target && opts?.replace !== false) {
      await this.clearInput(target);
Preserve replace semantics without a locate target
When Input is called in the default replace mode without a locate target, opts.replace is still true but target is undefined, so this branch skips clearInput(undefined) and the focused field is not cleared before typing. The previous Android action always called clearInput(element) before typing unless the mode was typeOnly, which allowed replacing the currently focused input even when the planner omitted locate; now those calls append text instead.
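One possible way to address this, sketched under the assumption that `clearInput` tolerates an undefined target and then clears the currently focused field (as the previous Android action is described as doing). `InputDeviceSketch` and `inputSketch` are hypothetical names, not the PR's code.

```typescript
interface InputOpts {
  replace?: boolean;
}

// Hypothetical device surface; clearInput(undefined) is assumed to clear
// whichever field currently has focus.
interface InputDeviceSketch<T> {
  clearInput(target?: T): Promise<void>;
  keyboardType(text: string): Promise<void>;
}

async function inputSketch<T>(
  device: InputDeviceSketch<T>,
  text: string,
  target?: T,
  opts?: InputOpts,
): Promise<void> {
  // Decide on clearing from `replace` alone, not from whether a target exists.
  if (opts?.replace !== false) {
    await device.clearInput(target);
  }
  await device.keyboardType(text);
}
```

The only change from the reviewed condition is that the presence of `target` no longer gates the clear, so the default replace mode behaves the same with or without a locate target.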
    )(requirePoint(body));
    }

    case 'LongPress': {
Dispatch hover and right-click primitives
For interfaces that expose inputPrimitives (for example the computer/RDP devices with rightClick and hover primitives), /interact now always calls dispatchPointer and returns before the existing actionSpace fallback. Since this switch has no RightClick or Hover cases even though the server still builds params for those action types, manual /interact requests for them hit the default Unknown actionType path instead of invoking the supported primitive.
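One way to read this suggestion is that the route should only short-circuit into the pointer dispatcher for action types the dispatcher actually handles, and otherwise fall through to the existing actionSpace path. A minimal sketch of that routing decision follows; the set contents and the `routeInteractSketch` name are assumptions, and adding explicit `RightClick` / `Hover` cases to the switch would be an equally valid fix.

```typescript
// Action types the hypothetical dispatcher knows how to handle.
const POINTER_ACTIONS: ReadonlySet<string> = new Set([
  'Tap',
  'DoubleClick',
  'LongPress',
  'Swipe',
  'DragAndDrop',
  'KeyboardPress',
  'Input',
  'Pinch',
]);

type InteractRoute = 'pointer' | 'actionSpace';

// Use the pointer path only when the device has one AND the dispatcher
// handles this action type; everything else keeps the old fallback.
function routeInteractSketch(actionType: string, hasPointer: boolean): InteractRoute {
  return hasPointer && POINTER_ACTIONS.has(actionType) ? 'pointer' : 'actionSpace';
}
```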