-
Notifications
You must be signed in to change notification settings - Fork 610
feat(go): refactored formatters + added support for formatting streams #3905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant refactoring of the formatters to better support streaming and non-JSON formats. It introduces a new StreamingFormatHandler interface, which is a great improvement for handling streaming data parsing without in-place modifications. The changes are well-implemented across the various formatters (text, json, jsonl, array, enum). The addition of comprehensive unit tests for each formatter is also a major plus.
I have a couple of suggestions for improvement:
- For performance,
strings.Buildershould be used for accumulating text in streaming chunks across all formatters. - For consistency with the new design and other formatters, the legacy
ParseMessagemethod injsonHandlerandenumHandlershould be simplified to a no-op, as the parsing logic is now handled by the newParseOutputandParseChunkmethods.
| newParts = append(newParts, NewJSONPart(line)) | ||
| for _, part := range chunk.Content { | ||
| if part.IsText() { | ||
| a.accumulatedText += part.Text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance, it's better to use strings.Builder for accumulating text from chunks instead of string concatenation with +=. This avoids repeated memory allocations for the string.
To implement this, you could change the accumulatedText field in the arrayHandler struct to strings.Builder.
This would require a few adjustments:
- In the struct definition:
accumulatedText strings.Builder - When resetting:
a.accumulatedText.Reset()instead ofa.accumulatedText = "" - Here:
a.accumulatedText.WriteString(part.Text) - When reading:
base.ExtractItems(a.accumulatedText.String(), a.cursor)
This advice applies to the other format handlers (enumHandler, jsonHandler, jsonlHandler, textHandler) as well.
| func (j *jsonHandler) ParseMessage(m *Message) (*Message, error) { | ||
| if m == nil { | ||
| return nil, errors.New("message is empty") | ||
| } | ||
| if len(m.Content) == 0 { | ||
| return nil, errors.New("message has no content") | ||
| } | ||
|
|
||
| var nonTextParts []*Part | ||
| accumulatedText := strings.Builder{} | ||
|
|
||
| for _, part := range m.Content { | ||
| if !part.IsText() { | ||
| nonTextParts = append(nonTextParts, part) | ||
| } else { | ||
| accumulatedText.WriteString(part.Text) | ||
| } | ||
| } | ||
|
|
||
| newParts := []*Part{} | ||
| text := base.ExtractJSONFromMarkdown(accumulatedText.String()) | ||
| if text != "" { | ||
| if j.config.Schema != nil { | ||
| schemaBytes, err := json.Marshal(j.config.Schema) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("expected schema is not valid: %w", err) | ||
| } | ||
| if err = base.ValidateRaw([]byte(text), schemaBytes); err != nil { | ||
| return nil, err | ||
| } | ||
| } else { | ||
| if !base.ValidJSON(text) { | ||
| return nil, errors.New("message is not a valid JSON") | ||
| } | ||
| newParts := []*Part{} | ||
| text := base.ExtractJSONFromMarkdown(accumulatedText.String()) | ||
| if text != "" { | ||
| if j.config.Schema != nil { | ||
| schemaBytes, err := json.Marshal(j.config.Schema) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("expected schema is not valid: %w", err) | ||
| } | ||
| if err = base.ValidateRaw([]byte(text), schemaBytes); err != nil { | ||
| return nil, err | ||
| } | ||
| } else { | ||
| if !base.ValidJSON(text) { | ||
| return nil, errors.New("message is not a valid JSON") | ||
| } | ||
| newParts = append(newParts, NewJSONPart(text)) | ||
| } | ||
| newParts = append(newParts, NewJSONPart(text)) | ||
| } | ||
|
|
||
| newParts = append(newParts, nonTextParts...) | ||
| newParts = append(newParts, nonTextParts...) | ||
|
|
||
| m.Content = newParts | ||
| } | ||
| m.Content = newParts | ||
|
|
||
| return m, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ParseMessage implementation still modifies the message in-place, which seems to contradict the main goal of this refactoring. The PR description states that in-place modification was an error, and the new StreamingFormatHandler interface is meant to provide a better way to parse output.
For consistency with other updated formatters like arrayHandler and jsonlHandler, and to align with the // Legacy: ... comment in formatter.go, this method should probably be a no-op that just returns the message without modification. The actual parsing is now handled by ParseOutput and ParseChunk.
This also applies to enumHandler's ParseMessage implementation.
func (j *jsonHandler) ParseMessage(m *Message) (*Message, error) {
return m, nil
}
Originally, formatters were implemented as something that modifies the model response message in place which was different from how it was implemented in JS. This was done in error as this solution was not capable of handling non-JSON formats (e.g.
ModelResponse.Output()assumed that the content was JSON and since there was no formatter attached to the type, it was impossible to hook into the format-specific logic).In this PR, we introduce a new interface called
StreamingFormatHandlerwhich, despite the name, is the general format handler interface. It implementsParseOutput(*Message)andParseChunk(*ModelResponseChunk)which are used when the user callsModelResponse.Output(),ModelResponseChunk.Output(), or the convenience functionOutputFrom[T any]()without modifying content in place.Checklist (if applicable):