[WIP] Fix OnlineDPO vLLM server completion handling by JohnGiorgi · Pull Request #5516 · huggingface/trl

JohnGiorgi · 2026-04-10T20:24:56Z

What does this PR do?

This removes an extra flattening step in OnlineDPOTrainer._generate_vllm_server() when using vLLM server mode.

trl/scripts/vllm_serve.py already returns completion_ids as one list[int] per completion. OnlineDPOTrainer flattened that result a second time, which split every multi-token completion into single-token completions before decoding.

This PR removes that second flatten and adds a regression test covering the server return shape.
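To illustrate the shape problem, here is a minimal sketch. It is not the trainer's actual code: the function names `flatten_again` and `keep_shape` and the token IDs are hypothetical, and the only thing taken from the description above is that the server returns one list[int] per completion.

```python
# Hypothetical server response: one list[int] per completion.
server_completion_ids = [[11, 12, 13], [21, 22]]


def flatten_again(completion_ids):
    """Illustrative buggy handling: treat every token as its own completion."""
    return [[token] for completion in completion_ids for token in completion]


def keep_shape(completion_ids):
    """Illustrative fixed handling: keep one list[int] per completion."""
    return [list(completion) for completion in completion_ids]


buggy = flatten_again(server_completion_ids)
fixed = keep_shape(server_completion_ids)

# The buggy path yields more "completions" than the server returned,
# each only one token long; the fixed path preserves the original lists.
print(len(server_completion_ids), len(buggy), len(fixed))
```

A regression check in this spirit would compare the number and length of the completions reaching decode against what the server returned.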

Validation:

make precommit
PYTHONPATH=/tmp/trl-upstream-fix /mnt/home/john/elms-ai/.venv/bin/python -m pytest tests/experimental/test_online_dpo_trainer.py -k preserves_completion_lists

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
OnlineDPOTrainer._generate_vllm_server() flattens vllm-serve completion_ids twice #5514
Did you make sure to update the documentation with your changes?
No documentation changes are needed for this internal bug fix.
Did you write any new necessary tests?

AI writing disclosure

No AI usage: the PR was written entirely by a human.
AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone familiar with OnlineDPOTrainer server-mode generation or the trl vllm-serve response shape.

JohnGiorgi changed the title ~~Fix OnlineDPO vLLM server completion handling~~ [WIP] Fix OnlineDPO vLLM server completion handling Apr 10, 2026

JohnGiorgi force-pushed the fix-online-dpo-vllm-server-completion-shape branch from fcf5d00 to a2ad425 Compare April 10, 2026 20:27

Fix OnlineDPO vLLM server completion handling

be038f1

JohnGiorgi force-pushed the fix-online-dpo-vllm-server-completion-shape branch from a2ad425 to be038f1 Compare April 13, 2026 22:49

JohnGiorgi wants to merge 1 commit into huggingface:main from JohnGiorgi:fix-online-dpo-vllm-server-completion-shape
