Adding SGLang backend [with dependency fix] #491
Conversation
Signed-off-by: Dorin Geman <[email protected]>
…pport
Co-authored-by: Denys Zhak <[email protected]>
Co-authored-by: Dorin Geman <[email protected]>
Summary of Changes

Hello @dougyster, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates the SGLang backend into the model-runner, addressing critical dependency and compatibility issues encountered in a prior attempt. The changes involve updating the SGLang version, incorporating CUDA 13 for enhanced GPU support, and refining model name handling. This significantly expands the model-runner's capabilities, allowing it to leverage SGLang for efficient chat completions, particularly on modern NVIDIA GPUs like B200s and H200s, and provides new Docker targets for building and running SGLang-specific images.
Code Review
This pull request introduces support for the SGLang backend, including Dockerfile changes for building the SGLang environment, Makefile targets for new Docker images, and the necessary Go backend implementation. The changes are well-structured, using build tags for conditional compilation of vLLM to keep the SGLang image lean. A new UsesTCP method has been added to the Backend interface to support SGLang's TCP-only communication, with corresponding updates to the scheduler and other backends.
My review focuses on a few key areas:
- A critical issue in the SGLang memory estimation logic which could cause runtime failures.
- A potential incompatibility in the Dockerfile's CUDA setup.
- A minor optimization for the Docker image size.
Overall, this is a solid contribution that significantly extends the capabilities of the model runner. The code is clean and the approach for adding a new backend is robust.
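As a rough sketch of that interface extension (the type and method shown here are inferred from this description, not copied from the PR's code):

```go
package inference

// Backend abstracts a single inference engine (llama.cpp, vLLM, MLX, SGLang, ...).
// Only the method relevant to this review is shown; the rest is elided.
type Backend interface {
	// UsesTCP reports whether the backend can only be reached over TCP
	// (as SGLang can) rather than over a Unix domain socket, letting the
	// scheduler choose the address family accordingly.
	UsesTCP() bool
}
```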
```go
return inference.RequiredMemory{
	RAM:  1,
	VRAM: 1,
}, nil
```
The GetRequiredMemoryForModel function currently returns hardcoded placeholder values of 1 for both RAM and VRAM. This is dangerous as it will cause the scheduler to severely underestimate the memory required to run a model, likely leading to out-of-memory errors and scheduler failures.
If memory estimation is not yet implemented for SGLang, it would be safer to return an error, similar to how the MLX backend handles it. This would prevent the model from being scheduled until proper estimation is in place.
```go
// TODO: Implement accurate memory estimation for SGLang.
return inference.RequiredMemory{}, errors.New("memory estimation for sglang is not yet implemented")
```

```dockerfile
&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb \
&& dpkg -i cuda-keyring_1.1-1_all.deb \
&& apt update && apt install -y cuda-toolkit-13-0 \
```
The Dockerfile is configured to use an ubuntu24.04 base image (as defined in the Makefile), but it's fetching the CUDA keyring from the ubuntu2204 repository. This mismatch could lead to package installation issues or incompatibilities. Please use the repository corresponding to the base image's OS version (ubuntu2404).
```dockerfile
&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb \
&& dpkg -i cuda-keyring_1.1-1_all.deb \
&& apt update && apt install -y cuda-toolkit-13-0 \
```
```dockerfile
USER root

# Install CUDA toolkit 13 for nvcc (needed for flashinfer JIT compilation)
RUN apt update && apt install -y \
```
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues:
- Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code.
- Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code.
General comments:
- The SGLang Docker integration appears broken: `binaryPath()` points to `/opt/sglang-env/sglang`, but the Dockerfile only creates a Python venv with `sglang` installed (so the executable will be under `/opt/sglang-env/bin`), and `initFromDocker` never populates `pythonPath`, meaning `Run` will likely error out in the container; consider aligning `binaryPath` with the venv layout (e.g., `bin/sglang` or `bin/python3 -m sglang.launch_server`) and ensuring the Docker install path is actually used.
- The SGLang CUDA image build in `release.yml` targets both `linux/amd64` and `linux/arm64` while the Dockerfile installs `cuda-toolkit-13-0` from the `ubuntu2204/x86_64` repo, which will fail or be incorrect for arm64; consider gating the CUDA toolkit setup to amd64 only or using an appropriate arm64-compatible base/repo.
- The new TCP backend port selection in `runner.go` uses a fixed `tcpBackendBasePort` (30000) with no configuration hooks, which may conflict with existing services or other instances; it might be safer to make this base port configurable (e.g., via env var or config) or at least document the assumption about available port range. A minimal sketch follows this list.
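For the last point above, a minimal sketch of a configurable base port (the environment variable and package names are assumptions, not part of this PR):

```go
package scheduling

import (
	"os"
	"strconv"
)

// defaultTCPBackendBasePort mirrors the fixed value used in the PR.
const defaultTCPBackendBasePort = 30000

// tcpBackendBasePort lets a hypothetical environment variable override the
// default base port, falling back to 30000 when unset or invalid.
func tcpBackendBasePort() int {
	if v := os.Getenv("MODEL_RUNNER_TCP_BASE_PORT"); v != "" {
		if port, err := strconv.Atoi(v); err == nil && port > 0 && port < 65536 {
			return port
		}
	}
	return defaultTCPBackendBasePort
}
```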
## Individual Comments
### Comment 1
<location> `.github/workflows/release.yml:174` </location>
<code_context>
+ with:
+ file: Dockerfile
+ target: final-sglang
+ platforms: linux/amd64, linux/arm64
+ build-args: |
+ "LLAMA_SERVER_VERSION=${{ inputs.llamaServerVersion }}"
</code_context>
<issue_to_address>
**issue (bug_risk):** Building the SGLang CUDA image for arm64 is likely incompatible with the x86_64-only CUDA repo used in the Dockerfile.
This stage installs CUDA 13 from the `ubuntu2204/x86_64` repo, but the workflow also builds `linux/arm64` images. On arm64, this will likely fail during `apt` setup or pull the wrong architecture. Either limit this image to `linux/amd64` or update the Dockerfile to handle arm64 explicitly (separate repo/logic or skipping the toolchain on non‑x86).
</issue_to_address>
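One way to gate the toolchain, sketched here under the assumption that BuildKit's automatic `TARGETARCH` build argument is available (this is not code from the PR):

```dockerfile
# Sketch: install the x86_64-only CUDA toolchain on amd64 builds and skip it elsewhere.
ARG TARGETARCH
RUN if [ "$TARGETARCH" = "amd64" ]; then \
        wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb \
        && dpkg -i cuda-keyring_1.1-1_all.deb \
        && apt update && apt install -y cuda-toolkit-13-0 \
        && rm cuda-keyring_1.1-1_all.deb; \
    else \
        echo "Skipping CUDA toolkit install on $TARGETARCH"; \
    fi
```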
### Comment 2
<location> `Dockerfile:124-132` </location>
<code_context>
+
+USER root
+
+# Install CUDA toolkit 13 for nvcc (needed for flashinfer JIT compilation)
+RUN apt update && apt install -y \
+ python3 python3-venv python3-dev \
+ curl ca-certificates build-essential \
+ libnuma1 libnuma-dev numactl ninja-build \
+ wget gnupg \
+ && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb \
+ && dpkg -i cuda-keyring_1.1-1_all.deb \
+ && apt update && apt install -y cuda-toolkit-13-0 \
+ && rm cuda-keyring_1.1-1_all.deb \
+ && rm -rf /var/lib/apt/lists/*
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Installing CUDA toolkit 13.0 on top of a CUDA 12.9 runtime image may introduce version skew and significantly increase image size.
This stage installs `cuda-toolkit-13-0` and points `PATH`/`LD_LIBRARY_PATH` to `/usr/local/cuda-13.0`, while the base image is `nvidia/cuda:12.9.0-runtime-ubuntu24.04`. That can create conflicting runtime libraries between the base image and the toolkit. If CUDA 13 is required by SGLang/flashinfer, consider switching to a CUDA 13.x runtime base image. If you only need `nvcc`, a better pattern is to use a matching `devel` image in a build stage and keep the final runtime on a single, consistent CUDA version to avoid mismatch and image bloat.
Suggested implementation:
```
# --- SGLang variant ---
FROM llamacpp AS sglang
ARG SGLANG_VERSION=0.5.6
USER root
# Install CUDA toolkit 12.9 for nvcc (matching the CUDA 12.9 runtime)
RUN apt update && apt install -y \
python3 python3-venv python3-dev \
curl ca-certificates build-essential \
libnuma1 libnuma-dev numactl ninja-build \
wget gnupg \
&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb \
&& dpkg -i cuda-keyring_1.1-1_all.deb \
&& apt update && apt install -y cuda-toolkit-12-9 \
&& rm cuda-keyring_1.1-1_all.deb \
&& rm -rf /var/lib/apt/lists/*
RUN mkdir -p /opt/sglang-env && chown -R modelrunner:modelrunner /opt/sglang-env
USER modelrunner
# Set CUDA paths for nvcc (needed during flashinfer compilation)
ENV PATH=/usr/local/cuda-12.9/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64:$LD_LIBRARY_PATH
# Install uv and SGLang as modelrunner user
```
If SGLang/flashinfer explicitly requires CUDA 13.x features, a more robust solution would be:
1. Introduce a dedicated build stage based on a `nvidia/cuda:13.x-devel-ubuntu24.04` image to compile flashinfer with `nvcc`.
2. Copy only the built artifacts into the final runtime image (which should stay on a single CUDA version, e.g., 12.9 or 13.x, but not both).
Adapting to that pattern will require changes elsewhere in the Dockerfile to add the build stage and adjust `COPY` instructions.
</issue_to_address>
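A skeletal version of that two-stage pattern (the image tags, package spec, and paths here are assumptions for illustration, not this PR's Dockerfile):

```dockerfile
# Build stage: a devel image provides nvcc on a single, consistent CUDA version.
FROM nvidia/cuda:13.0.0-devel-ubuntu24.04 AS sglang-build
RUN apt update && apt install -y python3 python3-venv && rm -rf /var/lib/apt/lists/*
RUN python3 -m venv /opt/sglang-env \
    && /opt/sglang-env/bin/pip install "sglang[all]==0.5.6"

# Runtime stage: same CUDA version, runtime flavor only, keeping the image lean.
FROM nvidia/cuda:13.0.0-runtime-ubuntu24.04 AS final-sglang
COPY --from=sglang-build /opt/sglang-env /opt/sglang-env
ENV PATH=/opt/sglang-env/bin:$PATH
```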
### Comment 3
<location> `pkg/inference/backends/sglang/sglang.go:184-192` </location>
<code_context>
+ }
+
+ // Determine binary path - use Docker installation if available, otherwise use Python
+ binaryPath := s.binaryPath()
+ sandboxPath := sglangDir
+ if _, err := os.Stat(binaryPath); errors.Is(err, fs.ErrNotExist) {
+ // Use Python installation
+ if s.pythonPath == "" {
+ return fmt.Errorf("sglang: no docker binary at %s and no python runtime configured; did you forget to call Install?", binaryPath)
+ }
+ binaryPath = s.pythonPath
+ sandboxPath = ""
+ }
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The SGLang runner mixes assumptions about a standalone binary with Python `-m` invocation, which may not be compatible if a real binary is ever present.
When `/opt/sglang-env/sglang` exists you treat it as the binary but still pass `-m sglang.launch_server` and other Python-specific flags. A real compiled `sglang` binary at that path likely wouldn’t accept those arguments. If the contract is “always run as a Python module”, consider removing the `binaryPath()` override and always using `pythonPath`. Otherwise, branch the argument construction so Python invocations (`python3 -m ...`) and a native `sglang` binary receive appropriate, different args.
Suggested implementation:
```golang
// Determine binary path.
// If the Docker/native sglang binary exists, treat it as a standalone executable.
// Otherwise, fall back to invoking Python as a module runner.
usePython := false
binaryPath := s.binaryPath()
sandboxPath := sglangDir
if _, err := os.Stat(binaryPath); errors.Is(err, fs.ErrNotExist) {
// Use Python installation
if s.pythonPath == "" {
return fmt.Errorf("sglang: no docker binary at %s and no python runtime configured; did you forget to call Install?", binaryPath)
}
binaryPath = s.pythonPath
sandboxPath = ""
usePython = true
}
```
To fully implement the behavior your review comment describes, you will also need to:
1. **Branch command construction based on `usePython`:**
Somewhere later in this method (or in a helper) you will have something like:
```go
cmd := exec.Command(binaryPath, args...)
```
or similar. Update it to:
```go
var cmd *exec.Cmd
if usePython {
// Python module invocation: python -m sglang.launch_server ...
pyArgs := append([]string{"-m", "sglang.launch_server"}, args...)
cmd = exec.Command(binaryPath, pyArgs...)
} else {
// Native sglang binary invocation: sglang ...
cmd = exec.Command(binaryPath, args...)
}
```
Ensure any existing `"-m", "sglang.launch_server"` arguments are removed from the generic `args` slice and only added in the `usePython` branch.
2. **Update any other places that assume `binaryPath` is always Python:**
- If you have code that always sets `PYTHONPATH`, or otherwise assumes `binaryPath` is a Python interpreter, guard that logic with `if usePython { ... }`.
- If `usePython` needs to be referenced outside this function, consider adding it as a field on the relevant struct instead of a local variable.
These changes will make the contract clear: a real `sglang` binary is treated as a standalone executable with native-style arguments, while Python-based installs are invoked via `python -m sglang.launch_server`.
</issue_to_address>
### Comment 4
<location> `pkg/inference/backends/sglang/sglang_config_test.go:39-48` </location>
<code_context>
+func TestGetArgs(t *testing.T) {
</code_context>
<issue_to_address>
**suggestion (testing):** Extend TestGetArgs to cover socket parsing errors and non-default host/port
It’d be valuable to extend the table in `TestGetArgs` with two extra cases:
1. An invalid `socket` (e.g. `"127.0.0.1"` or `"bad-socket"`) that triggers the `net.SplitHostPort(socket)` error path, with `expectError: true`.
2. A non-default host/port (e.g. `"0.0.0.0:34567"`) that asserts the resulting `--host`/`--port` flags match those values.
These would close the gaps around socket parsing and help prevent regressions in the new TCP-based scheduling behavior.
Suggested implementation:
```golang
func TestGetArgs(t *testing.T) {
tests := []struct {
name string
config *inference.BackendConfiguration
bundle *mockModelBundle
mode inference.BackendMode
expected []string
expectError bool
}{
{
name: "empty safetensors path should error",
// rest of this case as previously defined (ensuring it expects an error)
},
{
name: "invalid socket returns error",
config: &inference.BackendConfiguration{
Socket: "bad-socket",
},
bundle: &mockModelBundle{},
mode: inference.BackendModeSingle,
expectError: true,
},
{
name: "non-default host and port from socket",
config: &inference.BackendConfiguration{
Socket: "0.0.0.0:34567",
},
bundle: &mockModelBundle{},
mode: inference.BackendModeSingle,
expected: []string{
"--host", "0.0.0.0",
"--port", "34567",
},
expectError: false,
},
```
Because only a fragment of `TestGetArgs` is visible, you’ll need to align a few details with the existing code:
1. **BackendConfiguration field name**:
- Replace `Socket` with the actual field name used by `getArgs` for the scheduler’s TCP socket (e.g. `SchedulerSocket`, `ListenAddr`, etc.).
2. **BackendMode value**:
- Replace `inference.BackendModeSingle` with whichever `BackendMode` constant is appropriate in the existing tests (e.g. `inference.BackendModeServer`, `inference.BackendModeSingle`, etc.).
3. **First test case body**:
- Remove the placeholder comment in the `"empty safetensors path should error"` case and restore its original struct fields (or ensure it is constructed so that `getArgs` errors due to the empty safetensors path and sets `expectError: true`).
4. **Argument expectations order/shape**:
- Confirm that `getArgs` indeed produces `--host <host> --port <port>` in that order, and adjust the `expected` slice accordingly if the flag names or ordering differ.
5. **Imports**:
- If these tests introduce any new types or constants not already imported (unlikely given the snippet), ensure the corresponding imports are present at the top of the file.
</issue_to_address>
### Comment 5
<location> `pkg/inference/backends/sglang/sglang.go:138` </location>
<code_context>
if err := exec.Command(pythonPath, "-c", "import sglang").Run(); err != nil {
</code_context>
<issue_to_address>
**security (go.lang.security.audit.dangerous-exec-command):** Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code.
*Source: opengrep*
</issue_to_address>
### Comment 6
<location> `pkg/inference/backends/sglang/sglang.go:144` </location>
<code_context>
output, err := exec.Command(pythonPath, "-c", "import sglang; print(sglang.__version__)").Output()
</code_context>
<issue_to_address>
**security (go.lang.security.audit.dangerous-exec-command):** Detected non-static command inside Command. Audit the input to 'exec.Command'. If unverified user data can reach this call site, this is a code injection vulnerability. A malicious actor can inject a malicious script to execute arbitrary code.
*Source: opengrep*
</issue_to_address>
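One way to satisfy both of these findings — a sketch only; the helper name is hypothetical and not part of this PR — is to resolve and sanity-check the interpreter path before it ever reaches `exec.Command`:

```go
package sglang

import (
	"fmt"
	"os"
	"path/filepath"
)

// verifiedPython resolves pythonPath to an absolute path and checks that it
// names an executable regular file, so the value passed to exec.Command is
// validated rather than arbitrary input.
func verifiedPython(pythonPath string) (string, error) {
	abs, err := filepath.Abs(pythonPath)
	if err != nil {
		return "", fmt.Errorf("sglang: resolving python path: %w", err)
	}
	info, err := os.Stat(abs)
	if err != nil {
		return "", fmt.Errorf("sglang: python interpreter not found at %s: %w", abs, err)
	}
	if info.IsDir() || info.Mode()&0o111 == 0 {
		return "", fmt.Errorf("sglang: %s is not an executable file", abs)
	}
	return abs, nil
}
```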
B200/H200 GPUs are x86_64 architecture only, and we install CUDA toolkit from the x86_64 repository. Building for arm64 would fail. ARM64 CUDA support can be added by the community if needed.
Force-pushed from 6bb4142 to dbfc078.
@dougyster I appreciate you were on this. I'm still on the PR, and I can include your commit as is, with you as the author, in the original PR, because I'm not going to merge a broken change anyway. I've also fixed the Sourcery issues, which I'm going to push soon.
Sounds good~ Please see the key fixes I made in the PR if they're helpful to you - mainly just upgrading SGLang to version 0.5.6 with CUDA 13.0 for the x86_64 architecture - thanks!
@dougyster It really is, thank you, appreciate your help!
Happy to help! I'll close this PR and you can cherry-pick this commit as needed - let me know if you need help testing on Linux NVIDIA hardware.
This is a fix of @denyszhak's PR adding the SGLang backend (see #477). Previously, the SGLang version was 0.4.0, and running a chat completion would result in dependency errors. Below are the fixes implemented:
The Docker image builds successfully on Linux with both B200s and H200s, and I'm able to load a model and run a chat completion with the following:
In one terminal, run:
```sh
make docker-run-sglang-cuda
```
In another:
Load the model with:
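The original command isn't captured in this thread; as a hypothetical illustration (the model name, port, and endpoint are assumptions), pulling a model through the model-runner's REST API might look like:

```sh
# Hypothetical example: the model name and port are illustrative only.
curl -X POST http://localhost:12434/models/create \
  -H "Content-Type: application/json" \
  -d '{"from": "ai/smollm2"}'
```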
Then make a completion API request:
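Again hypothetical, assuming an OpenAI-compatible chat-completions endpoint is exposed:

```sh
# Hypothetical example: the path and model name are illustrative only.
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```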