fix(core): fall back to podman CLI when no API socket is found#1858

Open
russellb wants to merge 5 commits into NVIDIA:main from russellb:fix/podman-autodetect-cli-fallback

@russellb

Contributor

Summary

Auto-detection fails to discover Podman on macOS because the API socket symlink is not always present — it varies by Podman version, machine provider, and platform. This adds a podman info CLI fallback so auto-detection succeeds when Podman is functional but the socket isn't at a well-known path.

Related Issue

Closes #1834

Changes

  • Add a podman_cli_responds() fallback to is_podman_available() in the openshell-core config
  • When no socket candidate responds, try podman info, which uses Podman's own internal discovery
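The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: only the names podman_cli_responds() and is_podman_available() come from the description, while cli_responds(), socket_responds(), the candidate list parameter, and the connect-only probe are assumptions made for the example.

```rust
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};

/// Returns true if `program args...` can be spawned and exits with status 0.
/// A missing binary and a non-zero exit both count as "does not respond".
/// (Hypothetical helper, not taken from the PR.)
fn cli_responds(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Assumed body for the fallback named in the PR: `podman info` must succeed.
fn podman_cli_responds() -> bool {
    cli_responds("podman", &["info"])
}

/// Returns true if something accepts a connection on the Unix socket at `path`.
/// A real probe would likely also send an API request; this only connects.
fn socket_responds(path: &Path) -> bool {
    UnixStream::connect(path).is_ok()
}

/// Probes the candidate sockets first and only shells out when none answers.
fn is_podman_available(candidates: &[PathBuf]) -> bool {
    candidates.iter().any(|p| socket_responds(p)) || podman_cli_responds()
}
```

Note that a boolean result like this says nothing about which socket the driver should connect to afterwards.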

Testing

  • cargo test -p openshell-core — 164 tests pass
  • Verified on macOS (Apple Silicon) with Podman 5.8.2 / machine 5.7.1
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

…A#1834)

The Podman API socket symlink is not always present — it varies by
version, machine provider, and platform.  When none of the candidate
socket paths respond, try `podman info` as a fallback so auto-detection
succeeds on macOS setups where Podman is functional but the socket is
not at a well-known path.

Closes NVIDIA#1834

Signed-off-by: Russell Bryant <rbryant@redhat.com>

russellb requested review from a team, derekwaynecarr and mrunalp as code owners June 10, 2026 18:34
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

drew previously approved these changes Jun 10, 2026
@drew

drew commented Jun 10, 2026

Collaborator

/ok to test e068662

@elezar elezar left a comment

Member

One question I have: This seems to fix detection, but doesn't change how the gateway interacts with the podman driver? Why are no changes needed there?

@russellb

Contributor Author

One question I have: This seems to fix detection, but doesn't change how the gateway interacts with the podman driver? Why are no changes needed there?

It's a fair question. I have podman on mac, but I don't have this problem. I was hoping the reporter would test this and see if it was enough. It seems like a reasonable change on the detection side. We go from "podman not detected at all" to either:

  1. It works (seems doubtful, but let's see ...)
  2. We detected podman is present, but we aren't finding the socket for some reason

In either case, it's more consistent with Docker in this part of the code. The Docker path also falls back to a CLI check at this stage.

A next improvement could be to discover the socket using podman info --format '{{.Host.RemoteSocket.Path}}' if it's not in one of the paths we expected to find it.
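A rough sketch of that follow-up idea, assuming the command prints a single line such as unix:///run/user/501/podman/podman.sock. The function names parse_remote_socket() and discover_remote_socket() are hypothetical. With a remote podman machine the reported path may be the one inside the VM rather than a host-side socket, so this alone may not be enough on macOS.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Turns the CLI output into a filesystem path, dropping an optional
/// `unix://` scheme. Anything that is not an absolute path is rejected.
fn parse_remote_socket(output: &str) -> Option<PathBuf> {
    let trimmed = output.trim();
    let path = trimmed.strip_prefix("unix://").unwrap_or(trimmed);
    if path.starts_with('/') {
        Some(PathBuf::from(path))
    } else {
        None
    }
}

/// Asks the Podman CLI where its API socket is (hypothetical helper).
fn discover_remote_socket() -> Option<PathBuf> {
    let output = Command::new("podman")
        .args(&["info", "--format", "{{.Host.RemoteSocket.Path}}"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    parse_remote_socket(&String::from_utf8_lossy(&output.stdout))
}
```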

cc @r3v5, reporter of the issue

@r3v5

r3v5 commented Jun 10, 2026


Hey @russellb ! Sure, I will test your fix, no worries.

@russellb

Contributor Author

@r3v5 thanks. The output of that podman info command would be helpful too

@elezar

elezar commented Jun 11, 2026

Member

I have a couple of concerns / questions here.

The first is the one that I've already mentioned. This change checks that podman info works, but does not use the configured socket for actually constructing the driver. I would thus be surprised if this change actually allows the podman driver to be used. Due to the precedence of container engine detection, this also means that this is a breaking change for systems where Podman was running on a non-standard socket, but Docker was also installed and usable. On these systems, the gateway will now try to use the Podman driver and fail.

Then, although this seems to align Podman functionality with Docker, there are subtle differences between the two paths. podman info is a slower (and stronger) contract than running docker --version, which only checks that the CLI exists. Podman always uses the configured socket, whereas Docker includes logic to resolve the actual Docker connection later through the local client.

Although it is a slightly larger change than originally proposed, I think there is some benefit in trying to better align the detection paths for Podman and Docker. Ideally these would return a usable driver config (including, for example socket information) and not just a boolean. This config could then be used directly when instantiating the driver(s) instead of rediscovering the relevant config (as is done in the Docker case).
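One way to picture the suggestion: detection hands back the settings the driver needs instead of a boolean. This is only a sketch under assumptions. PodmanDetection, detect_podman(), and the injected probe closure are made-up names for illustration and are not part of the codebase.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical detection result: what the driver needs in order to connect,
/// rather than just "Podman was found".
#[derive(Debug, Clone, PartialEq)]
struct PodmanDetection {
    socket_path: PathBuf,
}

/// Returns the first candidate accepted by `probe`, packaged as driver config.
/// The probe is injected so the same logic works for a socket ping or a CLI check.
fn detect_podman<F>(candidates: &[PathBuf], probe: F) -> Option<PodmanDetection>
where
    F: Fn(&Path) -> bool,
{
    candidates
        .iter()
        .find(|p| probe(p.as_path()))
        .map(|p| PodmanDetection { socket_path: p.clone() })
}
```

The driver constructor could then take the returned value directly, so detection and connection cannot disagree about the socket.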

@russellb

Contributor Author

Thanks, @elezar. I'm happy to work on the changes you described.

@krishicks

Collaborator

I made a change to podman auto-detection recently (#1536), to avoid using just the existence of the CLI to determine that podman was available. In that change I made sure to align the auto-detection with the actual client socket choice mechanism.

Podman unfortunately behaves differently on macOS depending on how you install it, which I talked about in #1690 (comment)

This PR could supersede #1690 (which is scoped to documentation), but it needs to keep the auto-detection mechanism aligned with what the client uses to make the actual connection, as @elezar has raised.

@russellb

Contributor Author

Great feedback, thanks @krishicks. I'll iterate on this.

@r3v5

r3v5 commented Jun 12, 2026


@r3v5 thanks. The output of that podman info command would be helpful too

Hey @russellb ! I am coming back with results from local testing on my machine.

Tested on macOS (Apple Silicon, M3 Pro, 36 GB RAM), Podman 5.7.1 via Homebrew.

Detection fix works — gateway now finds podman (Using compute driver driver=podman). Without this PR, it crashes with:

Error:   × configuration error: no compute driver configured and auto-detection found
  │ no suitable driver; set --drivers or OPENSHELL_DRIVERS to kubernetes,
  │ podman, docker, or vm

I ran RUST_LOG=info ./target/release/openshell-gateway --disable-tls

Gateway output with the fix:

 2026-06-12T10:06:18.394079Z  WARN openshell_server::cli: TLS disabled — listening on plaintext HTTP
2026-06-12T10:06:18.394194Z  WARN openshell_server::cli: Neither mTLS user auth nor OIDC nor sandbox JWT auth is configured — the gateway has no authentication mechanism
2026-06-12T10:06:18.394200Z  INFO openshell_server::cli: Starting OpenShell server bind=127.0.0.1:17670
2026-06-12T10:06:18.690885Z  INFO openshell_server: Using compute driver driver=podman
2026-06-12T10:06:18.691059Z  WARN openshell_driver_podman::driver: Podman socket not found; is podman machine running? Try `podman machine start` or set OPENSHELL_PODMAN_SOCKET to override. path=/Users/ianmiller/.local/share/containers/podman/machine/podman.sock
2026-06-12T10:06:18.691323Z  WARN openshell_driver_podman::driver: Podman socket not ready, retrying attempt=1 max_retries=5 error=connection error: /Users/ianmiller/.local/share/containers/podman/machine/podman.sock: No such file or directory (os error 2)
2026-06-12T10:06:20.693409Z  WARN openshell_driver_podman::driver: Podman socket not ready, retrying attempt=2 max_retries=5 error=connection error: /Users/ianmiller/.local/share/containers/podman/machine/podman.sock: No such file or directory (os error 2)
2026-06-12T10:06:22.695718Z  WARN openshell_driver_podman::driver: Podman socket not ready, retrying attempt=3 max_retries=5 error=connection error: /Users/ianmiller/.local/share/containers/podman/machine/podman.sock: No such file or directory (os error 2)
2026-06-12T10:06:24.697982Z  WARN openshell_driver_podman::driver: Podman socket not ready, retrying attempt=4 max_retries=5 error=connection error: /Users/ianmiller/.local/share/containers/podman/machine/podman.sock: No such file or directory (os error 2)
2026-06-12T10:06:26.700892Z  WARN openshell_driver_podman::driver: Podman socket not ready, retrying attempt=5 max_retries=5 error=connection error: /Users/ianmiller/.local/share/containers/podman/machine/podman.sock: No such file or directory (os error 2)
Error:   × execution error: failed to create compute runtime: connection error: /Users/ianmiller/.local/share/containers/podman/machine/podman.sock: No such file or
  │ directory (os error 2)

Socket mismatch — after detection, driver construction fails because default_socket_path() returns ~/.local/share/containers/podman/machine/podman.sock which doesn't exist on my system. The actual host-side socket is at:

$ podman machine inspect | grep -A1 PodmanSocket
 "PodmanSocket": {
     "Path": "/var/folders/1q/jx7s14b928n8zvstgfk98lj00000gn/T/podman/podman-machine-default-api.sock"

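For illustration, the host-side path could be pulled out of the podman machine inspect output along these lines. This is a deliberately naive, std-only sketch: podman_socket_from_inspect() is a hypothetical name, the string scanning ignores JSON escapes, and a real implementation would more likely use a proper JSON parser.

```rust
use std::path::PathBuf;

/// Naive extraction of the "Path" value inside the "PodmanSocket" object.
/// Returns None if either key or the quoted value is missing.
fn podman_socket_from_inspect(json: &str) -> Option<PathBuf> {
    // Skip past the "PodmanSocket" key, then past the next "Path" key.
    let socket_key = "\"PodmanSocket\"";
    let after_socket = &json[json.find(socket_key)? + socket_key.len()..];
    let path_key = "\"Path\"";
    let after_path = &after_socket[after_socket.find(path_key)? + path_key.len()..];
    // The value is the next double-quoted string after the colon.
    let after_colon = &after_path[after_path.find(':')? + 1..];
    let start = after_colon.find('"')? + 1;
    let len = after_colon[start..].find('"')?;
    Some(PathBuf::from(&after_colon[start..start + len]))
}
```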
podman info output

Client:
  APIVersion: 5.7.1
  BuildOrigin: brew
  Built: 1765311063
  BuiltTime: Tue Dec  9 20:11:03 2025
  GitCommit: ""
  GoVersion: go1.25.5
  Os: darwin
  OsArch: darwin/arm64
  Version: 5.7.1
host:
  arch: arm64
  buildahVersion: 1.42.2
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.13-2.fc43.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.13, commit: '
  cpuUtilization:
    idlePercent: 98.71
    systemPercent: 0.48
    userPercent: 0.81
  cpus: 6
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: coreos
    version: "43"
  emulatedArchitectures:
  - linux/386
  - linux/amd64
  - linux/arm64be
  eventLogger: journald
  freeLocks: 2038
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 501
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.17.7-300.fc43.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 6541344768
  memTotal: 16718606336
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.17.0-1.fc43.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.17.0
    package: netavark-1.17.1-1.fc43.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.17.1
  ociRuntime:
    name: crun
    package: crun-1.24-1.fc43.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.24
      commit: 54693209039e5e04cbe3c8b1cd5fe2301219f0a1
      rundir: /run/user/501/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/sbin/pasta
    package: passt-0^20250919.g623dbf6-1.fc43.aarch64
    version: |
      pasta 0^20250919.g623dbf6-1.fc43.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: unix:///run/user/501/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/sbin/slirp4netns
    package: slirp4netns-1.3.1-3.fc43.aarch64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.9.1
      SLIRP_CONFIG_VERSION_MAX: 6
      libseccomp: 2.6.0
  swapFree: 0
  swapTotal: 0
  uptime: 28h 24m 39.00s (Approximately 1.17 days)
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 68464410624
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 523
  runRoot: /run/user/501/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 5.7.1
  BuildOrigin: 'Copr: packit/containers-podman-27732'
  Built: 1765238400
  BuiltTime: Tue Dec  9 00:00:00 2025
  GitCommit: f845d14e941889ba4c071f35233d09b29d363c75
  GoVersion: go1.25.4 X:nodwarf5
  Os: linux
  OsArch: linux/arm64
  Version: 5.7.1

@russellb

Contributor Author

Perfect, thanks. This confirms the non-standard socket location, and that discovery needs to determine which socket location to use.

russellb added 4 commits June 12, 2026 12:10
Auto-detection now queries `podman info --format json` to find the
actual API socket when no well-known socket path responds.  On macOS
(serviceIsRemote=true) it follows up with `podman machine inspect` to
get the host-side forwarded socket; on native Linux it uses
remoteSocket.path directly.  The discovered path is threaded into
PodmanComputeConfig so the driver connects to the right socket instead
of falling back to a default that may not exist.

Closes NVIDIA#1834

Signed-off-by: Russell Bryant <rbryant@redhat.com>

Unlike `podman info`, `podman machine inspect` outputs JSON by default.
Passing `--format json` is interpreted as a Go template literal, causing
it to output the string "json" instead of the JSON payload.

Signed-off-by: Russell Bryant <rbryant@redhat.com>

When the socket probe succeeds against a well-known candidate, return
that path so the driver uses the exact socket that was verified rather
than rediscovering it via default_socket_path().  This ensures detection
and driver connection always agree on the socket, regardless of whether
it was found via probe or CLI discovery.

Signed-off-by: Russell Bryant <rbryant@redhat.com>

Each variant carries only its own connection metadata — Podman gets a
socket_path, other drivers carry nothing.  Eliminates the generic
Optional field and makes the match arms self-documenting.

Signed-off-by: Russell Bryant <rbryant@redhat.com>

@russellb

Contributor Author

@r3v5 I've pushed follow-up commits that address the socket mismatch you confirmed. Changes:

  • Socket discovery during detection: When the well-known socket paths don't respond, we now query podman info --format json and check serviceIsRemote. If true (macOS), we run podman machine inspect to get the host-side forwarded socket. If false (native Linux), we use remoteSocket.path directly.
  • Detection and driver always agree on the socket: The discovered path is threaded into PodmanComputeConfig, so the driver connects to the exact socket that detection verified. This applies whether the socket was found via the probe or CLI discovery.
  • DetectedDriver is now an enum: Each driver variant carries its own connection metadata (Podman { socket_path }) rather than a generic optional field.
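The shape described in the bullets above might look something like this. It is a sketch, not the merged code: only DetectedDriver and Podman { socket_path } are named in the comment, while the other variant names, choose_socket(), and describe() are assumptions, and the two optional paths stand in for values already read from podman info and podman machine inspect.

```rust
use std::path::PathBuf;

/// Detection result where only Podman carries connection metadata.
/// Variant names other than `Podman` are guesses based on the driver list.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum DetectedDriver {
    Kubernetes,
    Podman { socket_path: PathBuf },
    Docker,
    Vm,
}

/// Picks the host-reachable socket: the forwarded machine socket when the
/// service is remote (macOS), otherwise the path reported by `podman info`.
fn choose_socket(
    service_is_remote: bool,
    remote_socket_path: Option<PathBuf>,
    machine_socket_path: Option<PathBuf>,
) -> Option<PathBuf> {
    if service_is_remote {
        machine_socket_path
    } else {
        remote_socket_path
    }
}

/// Shows how match arms read when each variant owns its own data.
fn describe(driver: &DetectedDriver) -> String {
    match driver {
        DetectedDriver::Kubernetes => "kubernetes".to_string(),
        DetectedDriver::Podman { socket_path } => {
            format!("podman via {}", socket_path.display())
        }
        DetectedDriver::Docker => "docker".to_string(),
        DetectedDriver::Vm => "vm".to_string(),
    }
}
```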

Precedence for the socket path is: OPENSHELL_PODMAN_SOCKET env var > config file > discovered socket.
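That precedence can be written as a simple fallback chain. The sketch below is illustrative: resolve_socket_path() and env_socket_override() are hypothetical names, and treating an empty OPENSHELL_PODMAN_SOCKET as unset is an assumption rather than documented behavior.

```rust
use std::path::PathBuf;

/// Applies the stated order: environment override, then config file,
/// then whatever detection discovered.
fn resolve_socket_path(
    env_override: Option<PathBuf>,
    config_file: Option<PathBuf>,
    discovered: Option<PathBuf>,
) -> Option<PathBuf> {
    env_override.or(config_file).or(discovered)
}

/// Reads the override from the environment, ignoring an empty value
/// (the empty-value handling is an assumption).
fn env_socket_override() -> Option<PathBuf> {
    std::env::var_os("OPENSHELL_PODMAN_SOCKET")
        .filter(|value| !value.is_empty())
        .map(PathBuf::from)
}
```

A caller would pass env_socket_override() as the first argument, so a discovered socket is only used when neither explicit setting is present.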

Could you re-test on your Homebrew Podman setup? The retry loop against the missing ~/.local/share/containers/podman/machine/podman.sock should be gone — it should now find your socket at /var/folders/.../podman-machine-default-api.sock.

Development

Successfully merging this pull request may close these issues.

[Bug] Gateway auto-detection fails to discover Podman on macOS

5 participants