Feat/browse mobile #676

Open
TenzinDhonyoe wants to merge 35 commits into garrytan:main from TenzinDhonyoe:feat/browse-mobile

No description provided.

TenzinDhonyoe and others added 30 commits March 23, 2026 12:59
New module that implements the same HTTP command protocol as browse/
but backed by Appium WebDriver for mobile app automation. Enables
/qa to test Expo/React Native apps on iOS Simulator.

Key components:
- ref-system.ts: Parse Appium XML accessibility tree into @e refs
- mobile-driver.ts: WebDriverIO wrapper with click, fill, screenshot, snapshot
- server.ts: HTTP server (same protocol as browse — bearer auth, state file)
- cli.ts: CLI entry point + setup-check for dependency validation
- platform/ios.ts: iOS Simulator boot, device listing, app management

Tested against real Expo app (Gluco) — snapshot, click, fill, screenshot
all working. 43 tests passing, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
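
A minimal sketch of the ref assignment described for ref-system.ts above, assuming the Appium page source is XML whose elements carry `label` or `name` attributes. The `Ref` shape, the `assignRefs` and `readAttr` names, and the regex-based parsing are hypothetical and are not taken from the PR; a real implementation would likely use a proper XML parser.

```typescript
// Hypothetical sketch: give every labelled element in an Appium XML page
// source a sequential @e ref. Not the PR's actual ref-system.ts.
interface Ref {
  ref: string;   // e.g. "@e1"
  type: string;  // XML tag name, e.g. "XCUIElementTypeButton"
  label: string; // accessibility label, or the name attribute as a fallback
}

// Read one double-quoted attribute value out of a tag's attribute text.
function readAttr(attrs: string, name: string): string | undefined {
  const m = attrs.match(new RegExp(`\\b${name}="([^"]*)"`));
  return m ? m[1] : undefined;
}

function assignRefs(xml: string): Ref[] {
  const refs: Ref[] = [];
  // Matches opening and self-closing tags; closing tags and the XML
  // declaration are skipped because they do not start with a letter.
  const tagPattern = /<([A-Za-z][\w.]*)\b([^<>]*?)\/?>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(xml)) !== null) {
    const type = match[1];
    const attrs = match[2];
    const label = readAttr(attrs, "label") || readAttr(attrs, "name");
    if (!label) continue; // unlabelled containers get no ref
    refs.push({ ref: `@e${refs.length + 1}`, type, label });
  }
  return refs;
}
```

Elements without a label are skipped here, which mirrors the later commits' concern with React Native components that lack accessibility props.
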
QA skills now auto-detect Expo/React Native projects and switch to
mobile mode. When app.json is found and browse-mobile is available:
- Automatically starts Appium if not running
- Boots iOS Simulator if needed
- Builds/installs app if not on simulator
- Navigates through Expo dev launcher to actual app
- Uses $BM instead of $B for all browse commands
- Falls back to ~"Label" selector for RN components missing accessibilityRole
- Flags missing accessibility props as QA findings

Web QA behavior is completely unchanged — mobile branches are gated
on detection.

Files changed:
- scripts/gen-skill-docs.ts: BROWSE_MOBILE_SETUP placeholder + mobile
  detection in QA methodology + Expo/RN framework guidance
- qa/SKILL.md.tmpl: mobile setup block + platform parameter
- qa-only/SKILL.md.tmpl: same mobile additions (report-only)
- SKILL.md.tmpl: Mobile Testing section with $BM command reference
- TODOS.md: 3 new items from eng review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compiled binary is 58MB (bundles entire Bun runtime + webdriverio).
Same pattern as browse/dist/ which is already gitignored.
Users build it locally via: bun build --compile browse-mobile/src/cli.ts --outfile browse-mobile/dist/browse-mobile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compiled binary couldn't find server.ts when deployed outside the
gstack repo. Now the CLI spawns itself with --server flag to run the
server in-process, same pattern as browse/. Works both in dev mode
(bun run cli.ts) and as compiled binary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iled binary

Three fixes:
1. Switch from bun --compile (can't resolve webdriverio transitive deps)
   to bun build (JS bundle) + shell launcher script. 3.2MB bundle vs 58MB
   binary, and all npm deps resolve correctly at runtime.
2. Filter --server from process.argv in server.ts so bundle ID isn't
   clobbered when CLI spawns itself in server mode.
3. CLI finds the bundled cli.js relative to itself, works from any directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
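
Fix 2 above can be illustrated with a small sketch. It assumes the server reads its bundle ID from the first positional argument, which the commit message implies but does not state; `positionalArgs` is a hypothetical name.

```typescript
// Hypothetical sketch: drop the internal --server flag so the remaining
// positional arguments (for example a bundle ID) keep their positions.
function positionalArgs(argv: string[]): string[] {
  // argv[0] is the runtime and argv[1] is the script path.
  return argv.slice(2).filter((arg) => arg !== "--server");
}
```

Called with `process.argv`, this would leave the bundle ID at index 0 whether or not the CLI spawned itself in server mode.
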
Bug 1: handleCommand() threw immediately if not connected. Now it
auto-reconnects to Appium when the first command arrives, handling
the common case where WDA takes 30-60s to compile on first session.

Bug 2: CLI didn't pass BROWSE_MOBILE_BUNDLE_ID env var when spawning
the server subprocess. Now extracts bundle ID from goto app://... and
forwards it so the Appium session is created with the correct app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
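
The Bug 2 fix above can be sketched as follows. Only the `BROWSE_MOBILE_BUNDLE_ID` variable and the `goto app://...` form come from the commit message; the `extractBundleId` and `serverEnv` helpers are hypothetical illustrations of how the CLI might forward the ID.

```typescript
// Hypothetical helper: pull a bundle ID out of CLI args such as
// ["goto", "app://com.example.app"]. Returns undefined for anything else.
function extractBundleId(args: string[]): string | undefined {
  const command = args[0];
  const target = args[1];
  const prefix = "app://";
  if (command !== "goto" || target === undefined) return undefined;
  if (target.indexOf(prefix) !== 0) return undefined;
  const bundleId = target.slice(prefix.length);
  return bundleId.length > 0 ? bundleId : undefined;
}

// Build the environment for the server subprocess without mutating the
// parent's environment object.
function serverEnv(
  parentEnv: Record<string, string | undefined>,
  args: string[],
): Record<string, string | undefined> {
  const bundleId = extractBundleId(args);
  return bundleId
    ? { ...parentEnv, BROWSE_MOBILE_BUNDLE_ID: bundleId }
    : { ...parentEnv };
}
```
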
Rewrote mobile-driver.ts to use raw fetch() for all Appium WebDriver
protocol calls instead of webdriverio. This eliminates the transitive
dependency bundling problem permanently.

Results:
- Bundle: 119KB (was 3.2MB with webdriverio)
- Dependencies: 0 npm packages (was webdriverio + 230 transitive deps)
- All Appium commands work via W3C WebDriver REST protocol over HTTP

Also fixed:
- CLI timeout: 180s for goto (Appium connect), 60s for other commands
- Removed webdriverio from package.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/execute returns 404 on Appium — the correct W3C route is /execute/sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
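
For reference, the two commits above amount to building plain HTTP requests against the W3C WebDriver routes. A minimal sketch of the Execute Script call, assuming an Appium server at a caller-supplied base URL; the `executeSyncRequest` name and `WebDriverRequest` shape are hypothetical, while the `/session/{id}/execute/sync` route and the `{ script, args }` body are defined by the W3C WebDriver specification.

```typescript
// Hypothetical sketch: describe a W3C "Execute Script" request without
// sending it, so the caller can pass it to fetch().
interface WebDriverRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function executeSyncRequest(
  baseUrl: string,
  sessionId: string,
  script: string,
  args: unknown[] = [],
): WebDriverRequest {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    // W3C route is /execute/sync; a bare /execute is not a W3C endpoint.
    url: `${base}/session/${encodeURIComponent(sessionId)}/execute/sync`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ script, args }),
  };
}
```

A caller would then send it with something like `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. The base URL is an assumption of the deployment: Appium listens on port 4723 by default, and older servers expect a `/wd/hub` prefix.
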
When /qa detects a mobile project for the first time, it checks if
browse-mobile bash permissions exist in the user's settings.json.
If not, offers to add them — one-time setup that enables fully
automated mobile QA without per-command approval prompts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Expanded permission patterns to cover inline bash (SID=..., curl -X
   POST, JAVA_HOME=...) that the QA skill generates. Previous patterns
   only matched commands starting with $BM.

2. Added speed guidance: batch multiple $BM commands in single bash calls
   using && instead of separate tool calls. Take screenshots at milestones
   only, not after every tap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
browseDir is ~/.claude/skills/gstack/browse/dist — need ../../ to reach
the gstack root, not ../ which only goes up to browse/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p command

Three fixes:
1. Changed ~"Label" to label:Label syntax — the ~ was being interpreted
   by zsh as home directory expansion, breaking accessibility label clicks.
2. Added tap <x> <y> command for coordinate-based tapping when elements
   can't be found by ref or label.
3. Updated all skill templates and help text to use new label: syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
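
A sketch of how the new `label:` syntax might be told apart from `@e` refs. The `Selector` type and `parseSelector` name are hypothetical; only the `@e` ref form and the `label:Label` form come from the commits above.

```typescript
// Hypothetical sketch: classify a selector string as an @e ref or an
// accessibility label. Unlike ~"Label", the label: prefix contains no
// character that zsh expands.
type Selector =
  | { kind: "ref"; ref: string }
  | { kind: "label"; label: string };

function parseSelector(selector: string): Selector | null {
  if (/^@e\d+$/.test(selector)) {
    return { kind: "ref", ref: selector };
  }
  const prefix = "label:";
  if (selector.indexOf(prefix) === 0 && selector.length > prefix.length) {
    return { kind: "label", label: selector.slice(prefix.length) };
  }
  return null; // caller may fall back to coordinate-based tap <x> <y>
}
```
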
Adds Revyl as a second mobile QA backend alongside browse-mobile (Appium).
When Revyl is authenticated, /qa and /qa-only prefer cloud devices over
local simulator — no Xcode/Appium/Java setup needed.

Changes:
- Revyl auth detection in browse-mobile setup
- Full Revyl QA path: init → app detection → dev loop (with tunnel
  verification + 30s timeout) → static fallback → build caching →
  device provisioning → command mapping
- YAML validation + auto-fix after revyl init (known CLI bug)
- App-id auto-detection with AskUserQuestion for ambiguous matches
- Mobile auth strategy (sign-up attempt, credential request, Apple
  Sign-In scope limitation)
- Mobile exploration checklist (8 items: transitions, scroll, keyboard,
  back nav, empty/loading states, orientation, accessibility)
- Fix Rule 5 contradiction: scoped "never read source" to testing phases
- Batch re-verification for mobile fixes (rebuild once after all fixes)
- Mobile QA timing expectations in setup section
- 3 new TODOs: Revyl E2E test, /browse Revyl integration, Android support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revyl is available as MCP tools (start_device_session, screenshot,
device_tap, etc.), not a CLI binary. The bash-based `revyl auth status`
check always failed because there's no `revyl` in PATH.

Now the skill tells Claude to check for Revyl MCP tool availability
directly — if the tools exist in the conversation context, always
use Revyl for mobile QA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The revyl CLI is installed on the user's machine — detection should
check `command -v revyl` in bash. Previous commit wrongly switched
to MCP tool detection which doesn't work in bash context.

Now: if `revyl` CLI exists in PATH → REVYL_READY, always preferred
over Appium. Auth status printed for diagnostics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a mobile project is detected but revyl CLI isn't installed,
AskUserQuestion now tells the user how to install it and offers
three options: install now, use local Appium, or skip mobile QA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The skill templates used `revyl screenshot` but the actual CLI command
is `revyl device screenshot --out <path>`. All device interaction lives
under the `device` subcommand. Also adds --out flag for explicit output
path control.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Static mode fallback works perfectly — this is a DX improvement for
reusing an existing Metro process instead of starting a conflicting one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bundle IDs and simulator UDIDs are passed to shell commands via string
interpolation. Validate they don't contain shell metacharacters to
prevent command injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
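
One way to implement the validation described above is an allowlist rather than a metacharacter denylist. This is a stricter technique than the commit message describes, and the patterns and the `assertSafeShellArg` name are assumptions: bundle IDs are taken to be alphanumerics, dots, and hyphens, and UDIDs to be hexadecimal digits and hyphens.

```typescript
// Hypothetical sketch: allowlist validation for values that will be
// interpolated into shell commands.
const BUNDLE_ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9.-]*$/;
const UDID_PATTERN = /^[A-Fa-f0-9-]+$/;

function assertSafeShellArg(kind: "bundleId" | "udid", value: string): string {
  const pattern = kind === "bundleId" ? BUNDLE_ID_PATTERN : UDID_PATTERN;
  if (!pattern.test(value)) {
    // JSON.stringify makes whitespace and quotes visible in the message.
    throw new Error(`Invalid ${kind}: ${JSON.stringify(value)}`);
  }
  return value;
}
```

Because the function returns the validated value, it can wrap the argument at the point of interpolation. Passing arguments as an array to a spawn-style API would avoid shell parsing altogether.
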
…llowing

- DRY: pointer action construction was duplicated 4x (performClick,
  tapCoordinates, fill coordinate fallback, scroll). Extract tapAction()
  and swipeAction() helpers.
- findElement() now distinguishes "no such element" (returns null) from
  actual errors like timeouts and network failures (rethrows).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
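
A sketch of what the extracted helpers could look like. The `tapAction` and `swipeAction` names come from the commit above, but their signatures, the `finger1` pointer ID, the 50 ms pause, and the `isNoSuchElement` helper are assumptions. The action and error shapes follow the W3C WebDriver Actions and error-response formats.

```typescript
// Hypothetical sketch of W3C pointer-action builders and error triage.
type PointerStep =
  | { type: "pointerMove"; duration: number; x: number; y: number }
  | { type: "pointerDown"; button: number }
  | { type: "pointerUp"; button: number }
  | { type: "pause"; duration: number };

interface PointerAction {
  type: "pointer";
  id: string;
  parameters: { pointerType: "touch" };
  actions: PointerStep[];
}

function pointer(actions: PointerStep[]): PointerAction {
  return { type: "pointer", id: "finger1", parameters: { pointerType: "touch" }, actions };
}

function tapAction(x: number, y: number): PointerAction {
  return pointer([
    { type: "pointerMove", duration: 0, x: Math.round(x), y: Math.round(y) },
    { type: "pointerDown", button: 0 },
    { type: "pause", duration: 50 },
    { type: "pointerUp", button: 0 },
  ]);
}

function swipeAction(fromX: number, fromY: number, toX: number, toY: number, durationMs = 300): PointerAction {
  return pointer([
    { type: "pointerMove", duration: 0, x: Math.round(fromX), y: Math.round(fromY) },
    { type: "pointerDown", button: 0 },
    { type: "pointerMove", duration: durationMs, x: Math.round(toX), y: Math.round(toY) },
    { type: "pointerUp", button: 0 },
  ]);
}

// True only for the W3C "no such element" error; a findElement wrapper can
// return null in that case and rethrow everything else.
function isNoSuchElement(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const value = (body as { value?: unknown }).value;
  if (typeof value !== "object" || value === null) return false;
  return (value as { error?: unknown }).error === "no such element";
}
```

The result of either builder would be posted as `{ actions: [tapAction(x, y)] }` to the session's `/actions` route.
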
…Alive

- server.ts: tap command now validates args are valid numbers before
  passing to tapCoordinates, preventing silent NaN propagation.
- cli.ts: isPidAlive now returns true for EPERM (process exists but
  different user), false only for ESRCH (process doesn't exist).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
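
Both fixes above are small enough to sketch. The `isPidAlive` name comes from the commit; `parseTapArgs`, the `KillFn` type, and the injected `kill` parameter are hypothetical (the parameter keeps the sketch free of runtime globals; in practice it would be `process.kill`).

```typescript
// Hypothetical sketch: reject non-numeric tap coordinates up front instead
// of letting NaN flow into the pointer action.
function parseTapArgs(args: string[]): { x: number; y: number } {
  if (args.length !== 2 || args[0].trim() === "" || args[1].trim() === "") {
    throw new Error("Usage: tap <x> <y>");
  }
  const x = Number(args[0]);
  const y = Number(args[1]);
  if (!isFinite(x) || !isFinite(y)) {
    throw new Error(`tap: coordinates must be numbers, got "${args[0]}" "${args[1]}"`);
  }
  return { x, y };
}

type KillFn = (pid: number, signal: number) => unknown;

// Signal 0 sends nothing; it only checks whether the process can be signalled.
function isPidAlive(pid: number, kill: KillFn): boolean {
  try {
    kill(pid, 0);
    return true;
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    // EPERM means the process exists but belongs to another user.
    // Only ESRCH (no such process) counts as dead.
    return code !== "ESRCH";
  }
}
```
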
browse-mobile source changes now trigger QA evals and the new
browse-mobile-basic test category. Rebuilt dist with all fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- swipe: add --x 220 --y 500 (required start coordinates)
- type: add --target param (single command, no separate tap needed)
- dev loop: detect existing Metro on :8081, verify it's node/metro
  before killing to avoid port conflict with Revyl
- Update all command references across gen-skill-docs.ts and both
  qa/qa-only templates for consistency
- Add TODO for Revyl command table validation test (P2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-detect Revyl auth status and run `revyl auth login` if needed
  instead of passive prose instruction
- Add Revyl permissions to Claude Code settings (Step 0) so commands
  don't trigger 30-50 permission prompts per QA session
- Detect Xcode before attempting local build; try EAS cloud build as
  fallback; give clear guidance if neither is available
- Add cost/billing note for Revyl cloud device sessions
- Add TODO for headless/CI auth environments (P3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l QA

Real-world QA testing revealed 6 issues:
1. revyl dev start reports "ready" with broken tunnel — now parse HMR
   diagnostics and fall back to static mode if all checks fail
2. App loads from cached build with no hot reload — now detect and warn
3. Background process polling was undocumented — add explicit 5s poll loop
4. revyl dev stop doesn't exist — document kill procedure
5. Session times out during fix phases — add keepalive guidance
6. Permission check was weak (grep count) — now checks specific patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… cache

When HMR diagnostics fail but the app still launches, compare the on-device
build's git SHA against HEAD. If they differ, explicitly warn that testing
is on stale code and force static mode rebuild. This catches the most
dangerous failure mode: app appears to work but recent changes are invisible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cloudflare tunnel DNS is inherently racy — first attempt often fails.
Now the skill retries once (kill → wait 5s → restart) before falling
back to static mode. Also adds direct DNS resolution check via nslookup
before HTTP polling, which catches the root cause faster than waiting
for curl timeouts. The flow is now: attempt 1 → verify HMR + DNS →
if broken, retry → attempt 2 → if still broken, stale build check →
static fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace AskUserQuestion permission prompts with automatic setup.
Both /qa and /qa-only now auto-add a comprehensive set of allow
rules to ~/.claude/settings.json on first run, covering browse,
revyl, appium, git, curl, and all other commands used during QA.
Uses a marker comment to only run once. Also expanded the Revyl
permission list to include nslookup, xcode-select, npx eas, and
other commands added in recent fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
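
The commit above uses a marker to make the setup run only once. The sketch below swaps in a different technique, an idempotent merge that skips rules already present, so it is safe to run repeatedly. It assumes the settings file keeps its rules under `permissions.allow`; the `addAllowRules` name and the example rule strings in the usage note are hypothetical and are not the PR's actual list.

```typescript
// Hypothetical sketch: merge allow rules into a parsed settings object
// without duplicating entries or touching unrelated keys.
interface Settings {
  permissions?: { allow?: string[]; [key: string]: unknown };
  [key: string]: unknown;
}

function addAllowRules(settings: Settings, rules: string[]): Settings {
  const merged = (settings.permissions?.allow ?? []).slice();
  for (const rule of rules) {
    if (merged.indexOf(rule) === -1) merged.push(rule);
  }
  // Return a new object so the caller decides when to write it back to disk.
  return { ...settings, permissions: { ...settings.permissions, allow: merged } };
}
```

For example, `addAllowRules(current, ["Bash(revyl:*)", "Bash(nslookup:*)"])` would append only the rules that are missing.
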
…bugs

fix: Revyl mobile QA reliability — tunnel retry, permissions, first-time UX
Three fixes from live mobile QA testing:

1. Priority flip: check local simulator first (0s setup), then
   DerivedData Debug build (~30s), then Revyl cloud devices. Solo
   devs with the app already running skip Revyl entirely.

2. Fast-fail tunnel DNS: single 15s DNS check instead of 120s x2
   retry loop. If tunnel is dead, fall back immediately instead of
   burning 4+ minutes.

3. Debug builds instead of Release: much faster to build, likely
   already cached in DerivedData from normal dev work. Release
   builds are unnecessary for QA testing.

Net effect: mobile QA setup drops from ~10 min to ~30s for devs
with local tooling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TenzinDhonyoe and others added 5 commits March 27, 2026 18:36
Reverts priority flip (local sim first) — Revyl's AI-grounded targeting
is too valuable to skip. Keeps fast-fail DNS (15s) and Debug builds.

Also fixes ~/.claude/ path leaking into Codex-generated SKILL.md files:
- Settings path now transformed to ~/.codex/ during codex generation
- Browse-mobile permission uses ctx.paths.skillRoot
- Single host-aware cat permission entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The polling grep matched "failed" in HMR diagnostic lines like
"[hmr] Metro health: FAILED" and treated them as fatal errors,
killing a working dev loop that was still provisioning the device.

Now only fatal errors (panic, process died, ENOSPC) trigger
DEV_LOOP_FAILED. HMR warnings emit DEV_LOOP_HMR_WARNING instead —
the device continues provisioning and loads from the cached build.
Hot reload is degraded but QA testing can proceed immediately.

This was the root cause of the 10-minute wasted setup: the skill
killed the process twice over non-fatal warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
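
The fix above replaces one broad match on "failed" with two classes of pattern. A sketch of that classification, assuming line-by-line polling: the `DEV_LOOP_FAILED` and `DEV_LOOP_HMR_WARNING` markers and the example HMR line come from the commit message, while the exact regexes, the `OK` status, and the `classifyLogLine` name are hypothetical.

```typescript
// Hypothetical sketch: classify one line of dev-loop output.
type DevLoopStatus = "DEV_LOOP_FAILED" | "DEV_LOOP_HMR_WARNING" | "OK";

// Only these are treated as fatal; a bare "failed" no longer qualifies.
const FATAL_PATTERNS = [/panic/i, /process died/i, /ENOSPC/];
// HMR diagnostics such as "[hmr] Metro health: FAILED" are warnings.
const HMR_WARNING_PATTERN = /^\[hmr\].*\bFAILED\b/i;

function classifyLogLine(line: string): DevLoopStatus {
  if (FATAL_PATTERNS.some((pattern) => pattern.test(line))) {
    return "DEV_LOOP_FAILED";
  }
  if (HMR_WARNING_PATTERN.test(line.trim())) {
    return "DEV_LOOP_HMR_WARNING";
  }
  return "OK";
}
```

Checking the fatal patterns first means a line that is both an HMR diagnostic and a real crash still stops the loop.
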
The QA skill's auto-configure step was missing permissions for variable
assignments (METRO_PID=, TUNNEL_URL=, etc.), shell constructs (for, if,
[), and common tools (echo, ps, sed, head, etc.). Commands starting with
these prefixes would prompt for approval, breaking automation.

Added ~60 new permission patterns covering all commands used in the QA
and Revyl mobile flows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chains QA → Design Review in a Ralph Loop until convergence (zero issues).
Self-regulating with 3 layers: per-sub-skill heuristics, cross-iteration
risk scoring with flapping detection, and a hard iteration cap.

- garry-wiggum/SKILL.md.tmpl: skill template with Phases 0-5.5
- garry-wiggum/SKILL.md: generated from template
- test/helpers/touchfiles.ts: add garry-wiggum touchfile entry
- TODOS.md: add deferred expansions (parallel agents, diff-aware, E2E test)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
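
A rough sketch of the loop's stopping logic, covering only convergence, a simplified flapping check, and the hard cap; the per-sub-skill heuristics and risk scoring are not modelled. The `Verdict` type, the `nextStep` name, and the definition of flapping used here (no net improvement across the last two iterations) are assumptions, not the skill's actual rules.

```typescript
// Hypothetical sketch: decide what the loop should do next, given the issue
// count recorded after each completed iteration.
type Verdict = "converged" | "cap-reached" | "flapping" | "continue";

function nextStep(issueCounts: number[], maxIterations: number): Verdict {
  const n = issueCounts.length;
  if (n > 0 && issueCounts[n - 1] === 0) return "converged"; // zero issues
  if (n >= maxIterations) return "cap-reached"; // hard iteration cap
  if (n >= 3 && issueCounts[n - 1] >= issueCounts[n - 3]) {
    return "flapping"; // e.g. 5 -> 3 -> 5: fixes keep reintroducing issues
  }
  return "continue";
}
```

The cap is checked before flapping so that the loop always terminates with the same verdict once the limit is hit, whatever the trend looks like.
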
…kill

feat: add /garry-wiggum iterative perfection loop skill