Skip to content

[bugfix] guard _repair_ms_bench against an empty messages list#9608

Open
he-yufeng wants to merge 1 commit into
modelscope:mainfrom
he-yufeng:fix/repair-ms-bench-empty-messages
Open

[bugfix] guard _repair_ms_bench against an empty messages list#9608
he-yufeng wants to merge 1 commit into
modelscope:mainfrom
he-yufeng:fix/repair-ms-bench-empty-messages

Conversation

@he-yufeng

Copy link
Copy Markdown

Summary

_repair_ms_bench (the repair_messages hook for the iic/ms_bench dataset) reads messages[0] right after ast.literal_eval, so a row whose messages are empty — e.g. the literal string "[]" — raises IndexError: list index out of range and aborts the whole ms_bench dataset load instead of just skipping that one row.

The function already returns None to skip rows it can't use (the MOSS case), and MessagesPreprocessor drops None rows, so returning None for an empty list is the same, consistent behaviour.

Fix

An early if not messages: return None guard right after the parse.

Test

Added tests/general/test_repair_ms_bench.py (pure unit tests, no network): empty "[]" / [] returns None, the default system message is stripped, a normal conversation passes through, and a MOSS row is skipped. The empty case raises IndexError without the fix.

Verified locally: pytest tests/general/test_repair_ms_bench.py (4 passed); flake8 / isort / yapf clean.

_repair_ms_bench reads messages[0] right after ast.literal_eval, so a row whose
messages are empty (e.g. the string "[]") raises IndexError and aborts the whole
ms_bench dataset load instead of just skipping that row. The function already
returns None to skip rows it can't use (the MOSS case), so do the same for an
empty list. Added pure unit tests for the empty, default-system, normal and
MOSS cases.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a safety check to the _repair_ms_bench function in swift/dataset/dataset/llm.py to return None when messages is empty, preventing potential index out of bounds errors. Additionally, a comprehensive suite of unit tests has been introduced in tests/general/test_repair_ms_bench.py to verify this behavior and other edge cases. There are no review comments, and the changes look solid.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant