fix: LSP tool startup reliability, shutdown hangs, and refactor into focused files#2120
Conversation
There was a problem hiding this comment.
Review Summary
Assessment: 🟡 NEEDS ATTENTION
This PR successfully addresses the three critical issues described (server restart, shutdown hangs, and stderr data race). The refactoring into focused files improves code organization and the core fixes are well-designed.
However, the new lsp_files.go file introduces 3 medium-severity issues in the text edit handling logic:
- Missing validation for malformed edit ranges (startLine > endLine)
- TOCTOU race condition in file edit application
- Silent error handling that hides LSP protocol violations
These issues could lead to file corruption when applying LSP edits. While the core concurrency fixes are solid, the file editing logic needs additional validation and error handling.
Findings
3 issues found (2 CONFIRMED, 1 LIKELY)
- 0 high severity
- 3 medium severity
- 0 low severity
| newLines := strings.Split(newText, "\n") | ||
|
|
||
| result := make([]string, 0, len(lines)-(endLine-startLine)+len(newLines)-1) | ||
| result = append(result, lines[:startLine]...) |
There was a problem hiding this comment.
Missing validation: startLine > endLine could corrupt file
The applyTextEdit function does not validate that startLine <= endLine. If the LSP server sends malformed edits with startLine > endLine, line 207 creates lines[:startLine] followed by lines[endLine+1:], which skips the region between endLine and startLine, corrupting the file.
Recommendation: Add validation before processing the edit:
if startLine > endLine || (startLine == endLine && startChar > endChar) {
return nil, fmt.Errorf("invalid edit range: start (%d:%d) > end (%d:%d)", startLine, startChar, endLine, endChar)
}| } | ||
| originalMode := info.Mode() | ||
|
|
||
| content, err := os.ReadFile(filePath) |
There was a problem hiding this comment.
TOCTOU race: file may change between LSP edit generation and application
applyTextEditsToFile reads the file at line 148 and writes back at line 168 without locking or versioning. If another process modifies the file between these operations, edits will be applied to the wrong base content, potentially corrupting the file.
Recommendation: Consider one of these approaches:
- Use file locking (e.g.,
flockon Unix) to prevent concurrent modifications - Implement document versioning to detect if the file changed
- Document this as a known limitation if single-user access is assumed
| endLine := edit.Range.End.Line | ||
| endChar := edit.Range.End.Character | ||
|
|
||
| if startLine >= len(lines) { |
There was a problem hiding this comment.
Silent clamping of out-of-bounds edits may hide bugs
Lines 183-189 silently clamp out-of-bounds edit ranges instead of returning an error. If endLine is far beyond the file end, it's clamped to len(lines)-1, causing the edit to be applied to an unintended location. The LSP protocol requires edits to be within document bounds; out-of-bounds edits indicate a server bug or protocol violation.
Recommendation: Return an error instead of silently clamping:
if endLine >= len(lines) {
return nil, fmt.Errorf("edit range out of bounds: endLine %d >= file length %d", endLine, len(lines))
}
🎯 Comprehensive LSP Tools Testing ResultsWe conducted extensive testing of all 13 LSP tools across Go, Python, and TypeScript/JavaScript language servers. Here are our findings and configuration: 📊 Success Rates by Language
🔧 Tool-by-Tool Compatibility Matrix
🌟 Key DiscoveriesGo (gopls) - Most Capable
TypeScript/JS (typescript-language-server) - Excellent Developer Experience
Python (pylsp) - Reliable Basics
🚨 Common Issues Found
⚙️ Complete Multi-Language Agent ConfigurationHere's our full working configuration from Click to expand full configuration#!/usr/bin/env docker agent run
# Multi-language code development with LSP intelligence.
#
# An orchestrator (planner) has read-only LSP access to Go, TypeScript, and
# Python so it can understand the entire codebase before delegating work.
# Language-specialist developer agents have full LSP + filesystem + shell
# access to implement changes.
models:
default:
provider: anthropic
model: claude-sonnet-4-0
max_tokens: 64000
agents:
root:
model: default
description: Technical lead that reads the codebase across all languages, plans tasks, and delegates to language-specialist developers.
instruction: |
You are a technical lead and planner. You have read-only access to the
entire codebase via LSP tools for Go, TypeScript, and Python.
## Your Role
You PLAN and DELEGATE. You never write code yourself.
## Workflow
1. **Understand the request** — use LSP and filesystem tools to explore
the codebase. Start with lsp_workspace on each language server to
discover capabilities.
2. **Analyse** — use lsp_workspace_symbols, lsp_references,
lsp_call_hierarchy, lsp_diagnostics, etc. to build a complete picture
of what needs to change and where.
3. **Plan** — use the think tool to reason about the approach, break the
work into discrete tasks per language, and identify dependencies
between them.
4. **Delegate** — dispatch tasks to the right developer agent:
- **go-developer**: anything touching .go files
- **ts-developer**: anything touching .ts/.tsx/.js/.jsx files
- **py-developer**: anything touching .py files
Run independent tasks in parallel with background_agents.
5. **Verify & synthesise** — review the developers' results, run
additional read-only checks if needed, and present a summary.
sub_agents: [go-developer, ts-developer, py-developer]
toolsets:
- type: think
- type: background_agents
- type: filesystem
tools: [read_file, read_multiple_files, list_directory, directory_tree, search_files_content]
# Go LSP — read-only tools only
- type: lsp
command: gopls
version: "golang/tools@v0.21.0"
file_types: [".go"]
tools: [lsp_workspace, lsp_hover, lsp_definition, lsp_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_code_actions, lsp_call_hierarchy, lsp_type_hierarchy, lsp_implementations, lsp_signature_help, lsp_inlay_hints]
# TypeScript/JavaScript LSP — read-only tools only
- type: lsp
command: typescript-language-server
args: ["--stdio"]
file_types: [".ts", ".tsx", ".js", ".jsx"]
tools: [lsp_workspace, lsp_hover, lsp_definition, lsp_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_code_actions, lsp_call_hierarchy, lsp_type_hierarchy, lsp_implementations, lsp_signature_help, lsp_inlay_hints]
# Python LSP — read-only tools only
- type: lsp
command: pylsp
file_types: [".py"]
tools: [lsp_workspace, lsp_hover, lsp_definition, lsp_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_code_actions, lsp_call_hierarchy, lsp_type_hierarchy, lsp_implementations, lsp_signature_help, lsp_inlay_hints]
go-developer:
model: default
description: Expert Go developer with full LSP code intelligence, filesystem, and shell access.
toolsets:
- type: lsp
command: gopls
version: "golang/tools@v0.21.0"
file_types: [".go"]
- type: filesystem
- type: shell
- type: todo
ts-developer:
model: default
description: Expert TypeScript/JavaScript developer with full LSP code intelligence, filesystem, and shell access.
toolsets:
- type: lsp
command: typescript-language-server
args: ["--stdio"]
file_types: [".ts", ".tsx", ".js", ".jsx"]
- type: filesystem
- type: shell
- type: todo
py-developer:
model: default
description: Expert Python developer with full LSP code intelligence, filesystem, and shell access.
toolsets:
- type: lsp
command: pylsp
file_types: [".py"]
- type: filesystem
- type: shell
- type: todo🎯 RecommendationsFor Go Development
For TypeScript/JavaScript Development
For Python Development
📋 Testing MethodologyOur testing was conducted by three specialized developer agents, each systematically testing all 13 LSP tools:
This provides a definitive reference for LSP tool capabilities across the three major language ecosystems in Docker Agent! 🚀 |
Bug: Self-Deadlock in
|
…focused files
1. Server won't restart after a crash. startLocked() checked h.cmd != nil
to decide if the server was running, but h.cmd stays non-nil after the
process dies. A crashed server could never be restarted — every subsequent
call got "LSP server already running". Fixed by adding a processDone
channel that tracks actual process liveness.
2. stopLocked() hangs forever. It called sendRequestLocked("shutdown")
synchronously, which blocks on reading stdout. If the server was
unresponsive or already dead, this blocked indefinitely — and so did
cmd.Wait() after it. Fixed by running the shutdown handshake in a
goroutine with a 5-second timeout, then killing the process via context
cancellation.
3. Data race on the stderr buffer. os/exec writes to a bytes.Buffer from
one goroutine while readNotifications reads and resets it from another.
bytes.Buffer is not thread-safe. Fixed by introducing a mutex-protected
lockedBuffer.
Assisted-By: docker-agent
format() holds h.mu and then calls NotifyFileChange(), which tries to acquire the same non-reentrant mutex, causing a permanent deadlock. Extract notifyFileChangeLocked() that assumes h.mu is already held, and call it from format(). NotifyFileChange() now acquires the lock itself before delegating to the new method. Assisted-By: docker-agent
abe252c to
59126fa
Compare
Bug Reports from Multi-Agent LSP Testing (jdtls + typescript-language-server)We tested all 15 LSP tools across 4 agents (each with different LSP configs) in a multi-agent YAML setup with jdtls (Java) and typescript-language-server (TypeScript). Found 2 bugs — both present on this PR branch. Testing Setup# 6 agents, 4 with LSP:
root: jdtls (-data data/jdtls-cfp-cr) + typescript-language-server --stdio
spring-boot-engineer: jdtls (-data data/jdtls-cfp-sbe)
angular-engineer: typescript-language-server --stdio
code-reviewer: jdtls (-data data/jdtls-cfp-cr) + typescript-language-server --stdio
test-automator: jdtls (-data data/jdtls-cfp-cr) + typescript-language-server --stdioResults: Every agent gets Bug 1 (P1):
|
| Agent | Java (jdtls) | TypeScript (tsserver) |
|---|---|---|
| root (coordinator) | ❌ Bug 1 | ❌ Bug 1 |
| spring-boot-engineer | ❌ Bug 1 | N/A |
| angular-engineer | N/A | ✅ 9/13 tools (Bug 2 affects lsp_code_actions) |
| code-reviewer | ❌ Bug 1 | ❌ Bug 1 |
| test-automator | ❌ Bug 1 | ❌ Bug 1 |
Tools verified working on angular-engineer (TypeScript): lsp_workspace, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_hover, lsp_definition, lsp_references, lsp_call_hierarchy, lsp_implementations, lsp_signature_help.
Tools broken: lsp_code_actions (Bug 2), lsp_type_hierarchy (tsserver doesn't implement textDocument/prepareTypeHierarchy).
Server won't restart after a crash. startLocked() checked h.cmd != nil to decide if the server was running, but h.cmd stays non-nil after the process dies. A crashed server could never be restarted — every subsequent call got "LSP server already running". Fixed by adding a processDone channel that tracks actual process liveness.
stopLocked() hangs forever. It called sendRequestLocked("shutdown") synchronously, which blocks on reading stdout. If the server was unresponsive or already dead, this blocked indefinitely — and so did cmd.Wait() after it. Fixed by running the shutdown handshake in a goroutine with a 5-second timeout, then killing the process via context cancellation.
Data race on the stderr buffer. os/exec writes to a bytes.Buffer from one goroutine while readNotifications reads and resets it from another. bytes.Buffer is not thread-safe. Fixed by introducing a mutex-protected lockedBuffer.
Assisted-By: docker-agent