
[scheduler] Fix race condition in multi-scheduler environments. #2191

Merged
DiegoTavares merged 5 commits into AcademySoftwareFoundation:master from DiegoTavares:fix_host_memory_sched on Mar 9, 2026

DiegoTavares (Collaborator) commented Mar 4, 2026

When multiple scheduler instances or async tasks update the same host, a race condition could cause overbooking errors instead of the condition that triggered the race being handled gracefully.

This PR is intended to reduce the number of database errors like the following:

WARN cue_scheduler::pipeline::dispatcher::actor: Wasn't able to dispatch all frames: FailedToUpdateResources( × (57b07e8d-6813-4fc5-8f71-132e968f4e29) Failed to update host resources 
├─▶ error returned from database: unable to allocate additional memory 
╰─▶ unable to allocate additional memory
)
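The PR text does not show the fix itself, so the following is only a minimal sketch of the general idea behind avoiding this kind of overbooking race: the capacity check and the decrement must happen as one atomic step, and "not enough memory left" should come back as an ordinary outcome rather than an error. Here an in-process `AtomicU64` stands in for the host's free-memory value (in the real scheduler that state lives in the database), and the names `HostResources`, `try_reserve`, `Reservation` and `contend` are all hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a host's free-memory counter (the real value
/// is stored in the database, not in process memory).
pub struct HostResources {
    free_memory_kb: AtomicU64,
}

/// Outcome of a reservation attempt. Running out of memory is an expected
/// result when several dispatchers target the same host, not an error.
#[derive(Debug, PartialEq, Eq)]
pub enum Reservation {
    Reserved { remaining_kb: u64 },
    InsufficientMemory { available_kb: u64 },
}

impl HostResources {
    pub fn new(free_memory_kb: u64) -> Self {
        Self { free_memory_kb: AtomicU64::new(free_memory_kb) }
    }

    /// Checks capacity and subtracts `request_kb` in a single
    /// compare-and-swap, so two concurrent callers can never both
    /// succeed against the same free memory.
    pub fn try_reserve(&self, request_kb: u64) -> Reservation {
        let mut current = self.free_memory_kb.load(Ordering::Acquire);
        loop {
            if current < request_kb {
                return Reservation::InsufficientMemory { available_kb: current };
            }
            match self.free_memory_kb.compare_exchange_weak(
                current,
                current - request_kb,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Reservation::Reserved { remaining_kb: current - request_kb },
                // Another task changed the value first (or the weak CAS failed
                // spuriously): retry against the freshly observed value.
                Err(actual) => current = actual,
            }
        }
    }

    pub fn free_memory_kb(&self) -> u64 {
        self.free_memory_kb.load(Ordering::Acquire)
    }
}

/// Spawns `workers` threads that each try to reserve `request_kb` once on a
/// host with `total_kb` free; returns (successful reservations, memory left).
pub fn contend(total_kb: u64, request_kb: u64, workers: usize) -> (usize, u64) {
    let host = Arc::new(HostResources::new(total_kb));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let host = Arc::clone(&host);
            thread::spawn(move || {
                matches!(host.try_reserve(request_kb), Reservation::Reserved { .. })
            })
        })
        .collect();
    let successes = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|ok| *ok)
        .count();
    (successes, host.free_memory_kb())
}
```

With 10,000 KB free and eight workers each asking for 3,000 KB, exactly three reservations succeed and 1,000 KB remains, however the threads interleave. Returning an enum instead of an `Err` lets a dispatcher simply skip the host and try another, which is the "graceful" handling the description refers to.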

AI Disclosure

Claude Code Opus was used to brainstorm the root cause of this problem and to review the changes before the PR was pushed.

DiegoTavares marked this pull request as ready for review on March 4, 2026, 02:34.
These messages are being reported as warnings even though they do not indicate an abnormal condition.
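A minimal sketch of what treating this case as non-abnormal could look like: classify a host-resource update failure and log the expected "host is full" case below warning level. Everything here is hypothetical (`UpdateFailure`, `classify`, `log_level`, and the local `Level` enum, which stands in for whatever logging crate the scheduler uses). Only the message text comes from the log quoted in this PR, and matching on message text is brittle; a typed error or a database error code, if one is available, would be a sturdier signal.

```rust
/// Hypothetical log levels for this sketch; a real service would use its
/// logging crate's levels instead.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Level {
    Debug,
    Warn,
}

/// Message text returned by the database in the error quoted in this PR.
const HOST_FULL_MSG: &str = "unable to allocate additional memory";

/// Hypothetical classification of a failed host-resource update.
#[derive(Debug, PartialEq, Eq)]
pub enum UpdateFailure {
    /// Another scheduler or task consumed the memory first: expected under contention.
    HostFull,
    /// Any other failure, which still deserves a warning.
    Other(String),
}

/// Maps a raw database error message to a failure class.
pub fn classify(db_error: &str) -> UpdateFailure {
    if db_error.contains(HOST_FULL_MSG) {
        UpdateFailure::HostFull
    } else {
        UpdateFailure::Other(db_error.to_string())
    }
}

/// Expected contention is logged quietly; everything else stays a warning.
pub fn log_level(failure: &UpdateFailure) -> Level {
    match failure {
        UpdateFailure::HostFull => Level::Debug,
        UpdateFailure::Other(_) => Level::Warn,
    }
}
```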
ramonfigueiredo (Collaborator) commented:

Hi @DiegoTavares,

Thanks for your contribution. I’ve added a few comments with suggestions that could help improve the PR.

ramonfigueiredo self-requested a review on March 9, 2026, 22:36.
ramonfigueiredo (Collaborator) left a comment:

Approved! Thanks, @DiegoTavares

DiegoTavares merged commit f6ecd97 into AcademySoftwareFoundation:master on Mar 9, 2026.
22 checks passed