fix: LSP tool startup reliability, shutdown hangs, and refactor into focused files#2120
Conversation
There was a problem hiding this comment.
Review Summary
Assessment: 🟡 NEEDS ATTENTION
This PR successfully addresses the three critical issues described (server restart, shutdown hangs, and stderr data race). The refactoring into focused files improves code organization and the core fixes are well-designed.
However, the new lsp_files.go file introduces 3 medium-severity issues in the text edit handling logic:
- Missing validation for malformed edit ranges (startLine > endLine)
- TOCTOU race condition in file edit application
- Silent error handling that hides LSP protocol violations
These issues could lead to file corruption when applying LSP edits. While the core concurrency fixes are solid, the file editing logic needs additional validation and error handling.
Findings
3 issues found (2 CONFIRMED, 1 LIKELY)
- 0 high severity
- 3 medium severity
- 0 low severity
🎯 Comprehensive LSP Tools Testing ResultsWe conducted extensive testing of all 13 LSP tools across Go, Python, and TypeScript/JavaScript language servers. Here are our findings and configuration: 📊 Success Rates by Language
🔧 Tool-by-Tool Compatibility Matrix
🌟 Key DiscoveriesGo (gopls) - Most Capable
TypeScript/JS (typescript-language-server) - Excellent Developer Experience
Python (pylsp) - Reliable Basics
🚨 Common Issues Found
⚙️ Complete Multi-Language Agent ConfigurationHere's our full working configuration from Click to expand full configuration#!/usr/bin/env docker agent run
# Multi-language code development with LSP intelligence.
#
# An orchestrator (planner) has read-only LSP access to Go, TypeScript, and
# Python so it can understand the entire codebase before delegating work.
# Language-specialist developer agents have full LSP + filesystem + shell
# access to implement changes.
models:
default:
provider: anthropic
model: claude-sonnet-4-0
max_tokens: 64000
agents:
root:
model: default
description: Technical lead that reads the codebase across all languages, plans tasks, and delegates to language-specialist developers.
instruction: |
You are a technical lead and planner. You have read-only access to the
entire codebase via LSP tools for Go, TypeScript, and Python.
## Your Role
You PLAN and DELEGATE. You never write code yourself.
## Workflow
1. **Understand the request** — use LSP and filesystem tools to explore
the codebase. Start with lsp_workspace on each language server to
discover capabilities.
2. **Analyse** — use lsp_workspace_symbols, lsp_references,
lsp_call_hierarchy, lsp_diagnostics, etc. to build a complete picture
of what needs to change and where.
3. **Plan** — use the think tool to reason about the approach, break the
work into discrete tasks per language, and identify dependencies
between them.
4. **Delegate** — dispatch tasks to the right developer agent:
- **go-developer**: anything touching .go files
- **ts-developer**: anything touching .ts/.tsx/.js/.jsx files
- **py-developer**: anything touching .py files
Run independent tasks in parallel with background_agents.
5. **Verify & synthesise** — review the developers' results, run
additional read-only checks if needed, and present a summary.
sub_agents: [go-developer, ts-developer, py-developer]
toolsets:
- type: think
- type: background_agents
- type: filesystem
tools: [read_file, read_multiple_files, list_directory, directory_tree, search_files_content]
# Go LSP — read-only tools only
- type: lsp
command: gopls
version: "golang/tools@v0.21.0"
file_types: [".go"]
tools: [lsp_workspace, lsp_hover, lsp_definition, lsp_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_code_actions, lsp_call_hierarchy, lsp_type_hierarchy, lsp_implementations, lsp_signature_help, lsp_inlay_hints]
# TypeScript/JavaScript LSP — read-only tools only
- type: lsp
command: typescript-language-server
args: ["--stdio"]
file_types: [".ts", ".tsx", ".js", ".jsx"]
tools: [lsp_workspace, lsp_hover, lsp_definition, lsp_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_code_actions, lsp_call_hierarchy, lsp_type_hierarchy, lsp_implementations, lsp_signature_help, lsp_inlay_hints]
# Python LSP — read-only tools only
- type: lsp
command: pylsp
file_types: [".py"]
tools: [lsp_workspace, lsp_hover, lsp_definition, lsp_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_code_actions, lsp_call_hierarchy, lsp_type_hierarchy, lsp_implementations, lsp_signature_help, lsp_inlay_hints]
go-developer:
model: default
description: Expert Go developer with full LSP code intelligence, filesystem, and shell access.
toolsets:
- type: lsp
command: gopls
version: "golang/tools@v0.21.0"
file_types: [".go"]
- type: filesystem
- type: shell
- type: todo
ts-developer:
model: default
description: Expert TypeScript/JavaScript developer with full LSP code intelligence, filesystem, and shell access.
toolsets:
- type: lsp
command: typescript-language-server
args: ["--stdio"]
file_types: [".ts", ".tsx", ".js", ".jsx"]
- type: filesystem
- type: shell
- type: todo
py-developer:
model: default
description: Expert Python developer with full LSP code intelligence, filesystem, and shell access.
toolsets:
- type: lsp
command: pylsp
file_types: [".py"]
- type: filesystem
- type: shell
- type: todo🎯 RecommendationsFor Go Development
For TypeScript/JavaScript Development
For Python Development
📋 Testing MethodologyOur testing was conducted by three specialized developer agents, each systematically testing all 13 LSP tools:
This provides a definitive reference for LSP tool capabilities across the three major language ecosystems in Docker Agent! 🚀 |
Bug: Self-Deadlock in
|
abe252c to
59126fa
Compare
Bug Reports from Multi-Agent LSP Testing (jdtls + typescript-language-server)We tested all 15 LSP tools across 4 agents (each with different LSP configs) in a multi-agent YAML setup with jdtls (Java) and typescript-language-server (TypeScript). Found 2 bugs — both present on this PR branch. Testing Setup# 6 agents, 4 with LSP:
root: jdtls (-data data/jdtls-cfp-cr) + typescript-language-server --stdio
spring-boot-engineer: jdtls (-data data/jdtls-cfp-sbe)
angular-engineer: typescript-language-server --stdio
code-reviewer: jdtls (-data data/jdtls-cfp-cr) + typescript-language-server --stdio
test-automator: jdtls (-data data/jdtls-cfp-cr) + typescript-language-server --stdioResults: Every agent gets Bug 1 (P1):
|
| Agent | Java (jdtls) | TypeScript (tsserver) |
|---|---|---|
| root (coordinator) | ❌ Bug 1 | ❌ Bug 1 |
| spring-boot-engineer | ❌ Bug 1 | N/A |
| angular-engineer | N/A | ✅ 9/13 tools (Bug 2 affects lsp_code_actions) |
| code-reviewer | ❌ Bug 1 | ❌ Bug 1 |
| test-automator | ❌ Bug 1 | ❌ Bug 1 |
Tools verified working on angular-engineer (TypeScript): lsp_workspace, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_hover, lsp_definition, lsp_references, lsp_call_hierarchy, lsp_implementations, lsp_signature_help.
Tools broken: lsp_code_actions (Bug 2), lsp_type_hierarchy (tsserver doesn't implement textDocument/prepareTypeHierarchy).
… and simplify code - Fix self-deadlock in format(): NotifyFileChange acquires h.mu but format() already holds it. Extract notifyFileChangeLocked() and use it from both format() and applyWorkspaceEdit() - Fix codeActions() nil diagnostics slice producing JSON null instead of [] - Fix applyWorkspaceEdit() not notifying LSP server of file changes after rename, causing stale diagnostics/hover results - Simplify tool registration by replacing lspToolDef struct with lspTool() factory function - Deduplicate definition/implementations into shared locationRequest() helper - Extract initializeLocked() from ensureInitialized() for readability Assisted-By: docker-agent
59126fa to
46d222d
Compare
Code ReviewCI: ✅ All checks pass
|
| Area | Type | Verdict |
|---|---|---|
format() self-deadlock |
Bug fix (blocking) | ✅ Correct |
applyWorkspaceEdit() stale LSP state after rename |
Bug fix | ✅ Correct |
codeActions() sending JSON null instead of [] |
Protocol fix | ✅ Correct |
lspToolDef struct → lspTool() factory |
Refactor | ✅ Clean |
definition/implementations → locationRequest() |
Refactor (DRY) | ✅ Clean + bonus fix |
initializeLocked() extraction |
Refactor | ✅ Cleaner SRP |
Key Technical Findings
Fix 1 — format() Deadlock (Blocking bug):
format() held h.mu and then called NotifyFileChange, which also acquires h.mu → guaranteed deadlock on every format() call. Fixed correctly by introducing notifyFileChangeLocked(). Lock ordering (h.mu → h.openFilesMu) verified consistent throughout.
Fix 2 — applyWorkspaceEdit() stale state:
After a file rename, the LSP server was never notified of the content change. Fixed correctly using notifyFileChangeLocked() (caller already holds h.mu) gated on isFileOpen(). Thread-safety confirmed — no TOCTOU possible since stopLocked() also requires h.mu.
Fix 3 — codeActions() null diagnostics:
var []lspDiagnostic serializes as JSON null; changed to make([]lspDiagnostic, 0) → []. Correct per LSP spec.
🔴 Three Pre-Existing Issues Still Unaddressed
These were described in the PR title but are not present in the diff — they remain open for a follow-up PR:
- Server can't restart after crash —
h.cmd != nilcheck inensureInitialized()can't detect a dead process stopLocked()hangs forever — synchronoussendRequestLocked("shutdown")with no timeout- Data race on
stderrBuf—bytes.Bufferused concurrently byos/execandreadNotifications
Test Coverage
❌ No tests added for any of the three bug fixes. Missing regression tests for:
- The
format()deadlock didChangenotifications after rename- JSON
[]vsnullincodeActions()payload
Recommendation
✅ Approve the code — the actual changes fix real, significant bugs correctly.
Suggested before merge:
- Fix the PR title and description to match what's actually in the diff (the commit message "fix: LSP format deadlock, null diagnostics, stale state after rename, and simplify code" is accurate)
- Add regression tests for the three bug fixes (especially the
format()deadlock)
Follow-up PR needed for the three pre-existing issues the title promised but this diff doesn't deliver.
|
@dgageot using https://github.com/jdubois/jj-language-server for Java and https://www.npmjs.com/package/typescript-language-server for Typescript I am able to do complex settings with LSP configured for several languages in different agents About the 3 issues reported in the previous comment ( Three Pre-Existing Issues Still Unaddressed) I am not sure they have real impacts if confirmed However it could be great to add some tests to avoid regressions ... 📊 Complete LSP Tool Test ReportTest Setup
Cross-Agent × Cross-Language Results Matrix
Score Summary
Agent ConsistencyResults were consistent across all agents — the same tool at the same position returns the same result whether called from the coordinator, spring-boot-engineer, angular-engineer, or code-reviewer. No agent-specific differences detected. Key TakeawaysJava (
|
| Reliable ✅ | Unreliable ❌ |
|---|---|
lsp_workspace, lsp_references, lsp_diagnostics, lsp_code_actions, lsp_type_hierarchy, lsp_format |
lsp_definition, lsp_implementations, lsp_call_hierarchy, lsp_signature_help, lsp_inlay_hints |
Root cause:
jj-language-servercannot resolve symbols across files. Any tool requiring cross-file navigation fails. Usesearch_files_contentorgrepas fallback.
TypeScript (tsserver) — Strong
| Reliable ✅ | Unreliable ❌ |
|---|---|
lsp_workspace, lsp_hover, lsp_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_format, lsp_call_hierarchy, lsp_signature_help, lsp_implementations, lsp_inlay_hints |
lsp_type_hierarchy (unsupported), lsp_code_actions (crash bug in 5.5.4) |
Root cause:
lsp_type_hierarchyis unimplemented intypescript-language-server.lsp_code_actionscrashes due to a ts-server 5.5.4 assertion bug in the refactoring path.
Recommendations
- For Java: Prefer
lsp_hover+lsp_references+lsp_diagnostics+lsp_code_actionsfor in-file analysis. Usesearch_files_contentfor cross-file navigation. - For TypeScript: Use the full LSP toolset freely (except
type_hierarchyandcode_actions).lsp_call_hierarchyis especially powerful for understanding component dependencies. - For code-reviewer: Stick to read-only tools; the Java LSP still provides good diagnostics and code actions listing for review work.
Stack trace analysis of the shutdown hangI reproduced the hang while ending session Blocked goroutine chainGoroutine 1 (main) is stuck in Why
|
Server won't restart after a crash. startLocked() checked h.cmd != nil to decide if the server was running, but h.cmd stays non-nil after the process dies. A crashed server could never be restarted — every subsequent call got "LSP server already running". Fixed by adding a processDone channel that tracks actual process liveness.
stopLocked() hangs forever. It called sendRequestLocked("shutdown") synchronously, which blocks on reading stdout. If the server was unresponsive or already dead, this blocked indefinitely — and so did cmd.Wait() after it. Fixed by running the shutdown handshake in a goroutine with a 5-second timeout, then killing the process via context cancellation.
Data race on the stderr buffer. os/exec writes to a bytes.Buffer from one goroutine while readNotifications reads and resets it from another. bytes.Buffer is not thread-safe. Fixed by introducing a mutex-protected lockedBuffer.
Assisted-By: docker-agent