Commit 9ca7d6f

feat(langchaingo): add tests using testcontainers-go (#97)
This PR refactors the chat code to be more testable, using dependency injection of the OpenAI values (baseURL, key and model) in the form of function arguments. Therefore, it's possible to call the chat with different values (for production, or at test time). Once the code is more testable, this PR adds four different tests for the chat response:

- using string comparison: it checks whether the response contains a certain string. Simple, although not very reliable.
- using cosine similarity and embeddings: it calculates the embeddings (numerical vectors) for the answer and compares them, using cosine similarity, with the embeddings of a reference answer.
- using a vector database: it stores the "knowledge" or reference in the database, doing RAG to augment the prompt with the relevant docs obtained from the vector database.
- using an LLM-as-a-judge: it creates an Evaluator, another model with a strict prompt, that evaluates the quality of the chat LLM's response.

With these four tests, users can understand the different approaches to testing when building GenAI apps.
1 parent 09bc107 commit 9ca7d6f

File tree

9 files changed: +1024 -120 lines changed

langchaingo/README.md

Lines changed: 81 additions & 0 deletions
@@ -33,6 +33,83 @@ docker compose up
No setup, API keys, or additional configuration required.

### Test the project

```sh
go test -v ./...
```
This command runs all the tests in the project, using [Testcontainers Go] to spin up the different
containers needed for the tests:

1. [Docker Model Runner]: a socat container to forward the model runner's API to the test process.
   It allows the tests to talk to the local LLM models provided by [Docker Desktop].
2. [Docker MCP Gateway]: Docker's MCP Gateway container to facilitate access to the MCP servers and tools.
   It allows the tests to talk to the MCP servers provided by [Docker Desktop], in this case DuckDuckGo.

No port conflicts happen, thanks to the [Testcontainers Go] library, which automatically exposes the known ports
of the containers on random, free ports on the host. Therefore, you can run the tests as many times as you want,
even without stopping the Docker Compose application.

All containers started by [Testcontainers Go] are automatically cleaned up after the tests finish,
so you don't need to worry about removing them manually.
#### String comparison tests

This simple test checks whether the answer is correct by comparing it to a reference answer.
As you can imagine, given the non-deterministic nature of the LLM, this check is not very robust.

Run this test with:

```sh
go test -v -run TestChat_stringComparison ./...
```
#### Cosine similarity tests

This more robust test checks whether the answer is correct by using the cosine similarity
between the reference answer and the model's answer. To calculate the cosine similarity,
the test obtains the embeddings of both the reference answer and the model's answer,
and then computes the cosine similarity between them. If the result is greater than a threshold,
which is defined by the team, the test is considered to have passed.

Run this test with:

```sh
go test -v -run TestChat_embeddings ./...
```
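The metric itself is compact: dot(a, b) / (‖a‖ · ‖b‖) over the two embedding vectors. The `cosine` function below is a minimal stand-alone sketch of that formula, not the `cosineSimilarity` helper used by the actual test:

```go
package main

import (
	"fmt"
	"math"
)

// cosine computes dot(a, b) / (||a|| * ||b||) for two equal-length vectors.
// Values near 1 mean the two embeddings (and hence the two texts) are semantically close.
func cosine(a, b []float32) float32 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	if normA == 0 || normB == 0 {
		return 0 // avoid dividing by zero for all-zero vectors
	}
	return float32(dot / (math.Sqrt(normA) * math.Sqrt(normB)))
}

func main() {
	fmt.Println(cosine([]float32{1, 0}, []float32{1, 0})) // identical direction: 1
	fmt.Println(cosine([]float32{1, 0}, []float32{0, 1})) // orthogonal: 0
}
```

The test then compares the computed similarity against a team-chosen threshold (0.8 in this PR).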
#### RAG tests

This test checks whether the answer is correct by using the RAG (Retrieval-Augmented Generation) technique.
It creates a Weaviate store to hold the content that serves as a reference, and it uses the built-in mechanisms
of the vector database to obtain the documents most relevant to the question. Then, it includes
those relevant documents in the prompt of the LLM to answer the question.

Run this test with:

```sh
go test -v -run TestChat_rag ./...
```
#### Evaluator tests

This test uses the concept of [LLM-as-a-judge] to evaluate the accuracy of the answer. It creates an evaluator,
using another LLM, possibly a different, more specialised model, to evaluate the accuracy of the answer.
For that, it uses a strict system message and a user message that force the LLM to return a valid JSON object
with the following fields:

+ "provided_answer": the answer to the question
+ "is_correct": true if the answer is correct, false otherwise
+ "reasoning": the reasoning behind the answer

Run this test with:

```sh
go test -v -run TestChat_usingEvaluator ./...
```
# 🧠 Inference Options

By default, this project uses [Docker Model Runner] to handle LLM inference locally — no internet
@@ -105,13 +182,17 @@ flowchart TD
+ [Langchaingo]
+ [DuckDuckGo]
+ [Docker Compose]
+ [Testcontainers Go]

[DuckDuckGo]: https://duckduckgo.com
[Langchaingo]: https://github.com/tmc/langchaingo
[LLM-as-a-judge]: https://eugeneyan.com/writing/llm-evaluators/
[Testcontainers Go]: https://github.com/testcontainers/testcontainers-go
[Model Context Protocol's Go SDK]: https://github.com/modelcontextprotocol/go-sdk/
[Docker Compose]: https://github.com/docker/compose
[Docker Desktop]: https://www.docker.com/products/docker-desktop/
[Docker Engine]: https://docs.docker.com/engine/
[Docker Model Runner]: https://docs.docker.com/ai/model-runner/
[Docker MCP Gateway]: https://docs.docker.com/ai/mcp-gateway/
[Docker Model Runner requirements]: https://docs.docker.com/ai/model-runner/
[Docker Offload]: https://www.docker.com/products/docker-offload/

langchaingo/chat.go

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/modelcontextprotocol/go-sdk/mcp"
	"github.com/tmc/langchaingo/agents"
	"github.com/tmc/langchaingo/callbacks"
	"github.com/tmc/langchaingo/chains"
)

// chat is the main function that initializes the LLM, MCP tools, and runs the agent.
// It receives the question and the MCP gateway URL, returning the answer from the agent.
func chat(question string, mcpGatewayURL string, apiKey string, baseURL string, modelName string, agentOpts ...agents.Option) (string, error) {
	llm, err := initializeLLM(apiKey, baseURL, modelName)
	if err != nil {
		return "", fmt.Errorf("initialize LLM: %v", err)
	}

	// Create a new client, with no features.
	client := mcp.NewClient(&mcp.Implementation{Name: "mcp-client", Version: "v1.0.0"}, nil)

	toolBelt, err := initializeMCPTools(client, mcpGatewayURL)
	if err != nil {
		return "", fmt.Errorf("initialize MCP tools: %v", err)
	}

	if os.Getenv("DEBUG") == "true" {
		agentOpts = append(agentOpts, agents.WithCallbacksHandler(callbacks.LogHandler{}))
	}

	agent := agents.NewOneShotAgent(llm, toolBelt, agentOpts...)
	executor := agents.NewExecutor(agent)

	answer, err := chains.Run(context.Background(), executor, question)
	if err != nil {
		return "", fmt.Errorf("chains run: %v", err)
	}

	return answer, nil
}

langchaingo/chat_test.go

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"testing"

	"github.com/stretchr/testify/require"
	"github.com/testcontainers/testcontainers-go"
	dmcpg "github.com/testcontainers/testcontainers-go/modules/dockermcpgateway"
	"github.com/testcontainers/testcontainers-go/modules/dockermodelrunner"
	"github.com/tmc/langchaingo/agents"
	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores"
)

const (
	modelNamespace = "ai"
	modelName      = "gemma3-qat"
	modelTag       = "latest"
	fqModelName    = modelNamespace + "/" + modelName + ":" + modelTag
)

func TestChat_stringComparison(t *testing.T) {
	ctx := context.Background()

	// Docker Model Runner container, which talks to Docker Desktop's model runner
	dmrCtr, err := dockermodelrunner.Run(ctx, dockermodelrunner.WithModel(fqModelName))
	testcontainers.CleanupContainer(t, dmrCtr)
	require.NoError(t, err)

	// Docker MCP Gateway container, which talks to the MCP servers, in this case DuckDuckGo
	mcpgCtr, err := dmcpg.Run(
		ctx, "docker/mcp-gateway:latest",
		dmcpg.WithTools("duckduckgo", []string{"search", "fetch_content"}),
	)
	testcontainers.CleanupContainer(t, mcpgCtr)
	require.NoError(t, err)

	mcpGatewayURL, err := mcpgCtr.GatewayEndpoint(ctx)
	require.NoError(t, err)

	question := "Does Golang support the Model Context Protocol? Please provide some references."

	answer, err := chat(question, mcpGatewayURL, "no-apiKey", dmrCtr.OpenAIEndpoint(), fqModelName)
	require.NoError(t, err)
	require.NotEmpty(t, answer)
	require.Contains(t, answer, "https://github.com/modelcontextprotocol/go-sdk")
}

func TestChat_embeddings(t *testing.T) {
	embeddingModel, dmrBaseURL := buildEmbeddingsModel(t)

	embedder, err := embeddings.NewEmbedder(embeddingModel)
	require.NoError(t, err)

	reference := `Golang does have an official Go SDK for Model Context Protocol servers and clients, which is maintained in collaboration with Google.
Its URL is https://github.com/modelcontextprotocol/go-sdk`

	// calculate the embeddings for the reference answer
	referenceEmbeddings, err := embedder.EmbedDocuments(context.Background(), []string{reference})
	require.NoError(t, err)

	ctx := context.Background()

	// Docker MCP Gateway container, which talks to the MCP servers, in this case DuckDuckGo
	mcpgCtr, err := dmcpg.Run(
		ctx, "docker/mcp-gateway:latest",
		dmcpg.WithTools("duckduckgo", []string{"search", "fetch_content"}),
	)
	testcontainers.CleanupContainer(t, mcpgCtr)
	require.NoError(t, err)

	mcpGatewayURL, err := mcpgCtr.GatewayEndpoint(ctx)
	require.NoError(t, err)

	question := "Does Golang support the Model Context Protocol? Please provide some references."
	answer, err := chat(question, mcpGatewayURL, "no-apiKey", dmrBaseURL, fqModelName)
	require.NoError(t, err)
	require.NotEmpty(t, answer)

	t.Logf("answer: %s", answer)

	// calculate the embeddings for the answer of the model
	answerEmbeddings, err := embedder.EmbedDocuments(context.Background(), []string{answer})
	require.NoError(t, err)

	// calculate the cosine similarity between the reference and the answer
	similarity := cosineSimilarity(t, referenceEmbeddings[0], answerEmbeddings[0])
	t.Logf("cosine similarity: %f", similarity)

	// Define a threshold for the cosine similarity: this is a team decision to accept or reject the answer
	// within the given threshold.
	require.Greater(t, similarity, float32(0.8))
}

func TestChat_rag(t *testing.T) {
	const question = "Does Golang support the Model Context Protocol? Please provide some references."

	embeddingModel, dmrBaseURL := buildEmbeddingsModel(t)

	embedder, err := embeddings.NewEmbedder(embeddingModel)
	require.NoError(t, err)

	reference := `Golang does have an official Go SDK for Model Context Protocol servers and clients, which is maintained in collaboration with Google.
Its URL is https://github.com/modelcontextprotocol/go-sdk`

	// create a new Weaviate store to store the reference answer
	store, err := NewStore(t, embedder)
	require.NoError(t, err)

	_, err = store.AddDocuments(context.Background(), []schema.Document{
		{
			PageContent: reference,
		},
	})
	require.NoError(t, err)

	optionsVector := []vectorstores.Option{
		vectorstores.WithScoreThreshold(0.80), // use for precision, when you want to get only the most relevant documents
		vectorstores.WithEmbedder(embedder),   // use when you want to add documents or do a similarity search
	}

	relevantDocs, err := store.SimilaritySearch(context.Background(), question, 1, optionsVector...)
	require.NoError(t, err)
	require.NotEmpty(t, relevantDocs)

	ctx := context.Background()

	// Docker MCP Gateway container, which talks to the MCP servers, in this case DuckDuckGo
	mcpgCtr, err := dmcpg.Run(
		ctx, "docker/mcp-gateway:latest",
		dmcpg.WithTools("duckduckgo", []string{"search", "fetch_content"}),
	)
	testcontainers.CleanupContainer(t, mcpgCtr)
	require.NoError(t, err)

	mcpGatewayURL, err := mcpgCtr.GatewayEndpoint(ctx)
	require.NoError(t, err)

	answer, err := chat(
		question,
		mcpGatewayURL,
		"no-apiKey",
		dmrBaseURL,
		fqModelName,
		agents.WithPromptSuffix(fmt.Sprintf("Use the following relevant documents to answer the question: %s", relevantDocs[0].PageContent)),
	)
	require.NoError(t, err)
	require.NotEmpty(t, answer)

	t.Logf("answer: %s", answer)
}

func TestChat_usingEvaluator(t *testing.T) {
	ctx := context.Background()

	// Docker Model Runner container, which talks to Docker Desktop's model runner
	dmrCtr, err := dockermodelrunner.Run(ctx, dockermodelrunner.WithModel(fqModelName))
	testcontainers.CleanupContainer(t, dmrCtr)
	require.NoError(t, err)

	// Docker MCP Gateway container, which talks to the MCP servers, in this case DuckDuckGo
	mcpgCtr, err := dmcpg.Run(
		ctx, "docker/mcp-gateway:latest",
		dmcpg.WithTools("duckduckgo", []string{"search", "fetch_content"}),
	)
	testcontainers.CleanupContainer(t, mcpgCtr)
	require.NoError(t, err)

	mcpGatewayURL, err := mcpgCtr.GatewayEndpoint(ctx)
	require.NoError(t, err)

	question := "Does Golang support the Model Context Protocol? Please provide some references."

	answer, err := chat(question, mcpGatewayURL, "no-apiKey", dmrCtr.OpenAIEndpoint(), fqModelName)
	require.NoError(t, err)
	require.NotEmpty(t, answer)

	t.Logf("answer: %s", answer)

	// check the answer against the evaluator
	reference := `There is an official Go SDK for Model Context Protocol servers and clients, which is maintained in collaboration with Google.
Its URL is https://github.com/modelcontextprotocol/go-sdk`

	evaluator := NewEvaluator(question, fqModelName, "no-apiKey", dmrCtr.OpenAIEndpoint())
	evaluation, err := evaluator.Evaluate(ctx, question, answer, reference)
	require.NoError(t, err)
	t.Logf("evaluation: %#v", evaluation)

	type evalResponse struct {
		ProvidedAnswer string `json:"provided_answer"`
		IsCorrect      bool   `json:"is_correct"`
		Reasoning      string `json:"reasoning"`
	}

	var eval evalResponse
	err = json.Unmarshal([]byte(evaluation), &eval)
	require.NoError(t, err)

	t.Logf("evaluation: %#v", eval)
	require.True(t, eval.IsCorrect)
}
