Does a Huge Number of Pending Workflows Block the Controller? #15087
Unanswered
cdangelo-kline
asked this question in
Q&A
Dear all,
We are working on a POC to understand whether Argo Workflows (3.7.4, with Argo Events 1.9.0) can be useful for our purposes.
We need to process computational workflows that receive input via Kafka. To allow greater parallelization, we have split the processing into chunks.
A test with current data results in the creation of more than 12,500 chunks. Each chunk is a Kafka message.
We have used Argo Events to define the source and the sensor. The sensor executes a workflow template, passing the Kafka message as input. The template allows a maximum of 10 concurrent workflows.
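To make the scale concrete, here is a rough toy model of the backlog this setup implies. It is only a sketch under stated assumptions, not a description of how the Argo controller actually schedules: it assumes every Kafka message becomes a workflow immediately, that at most 10 run at a time, and that each running workflow finishes in one "tick". The function name `simulate_backlog` and the tick abstraction are hypothetical, introduced only for illustration.

```python
from collections import deque


def simulate_backlog(total_messages, max_concurrent, ticks):
    """Toy model (assumption, not Argo's real scheduler).

    Every message is treated as an already-created workflow. On each tick,
    up to `max_concurrent` workflows are taken from the pending queue, run,
    and finish. Returns one (running, pending, completed_before) tuple per tick.
    """
    pending = deque(range(total_messages))
    completed = 0
    history = []
    for _ in range(ticks):
        # Start as many workflows as the concurrency limit allows.
        batch = min(max_concurrent, len(pending))
        running = [pending.popleft() for _ in range(batch)]
        history.append((len(running), len(pending), completed))
        # In this model, everything that was running finishes within the tick.
        completed += len(running)
    return history


# 12,500 chunks (the lower bound from our test) with a limit of 10:
print(simulate_backlog(12500, 10, 3)[0])  # (10, 12490, 0)
```

Under these assumptions, roughly 12,490 workflows are waiting while only 10 run, and it takes 1,250 rounds of 10 to drain the queue. Whether that many pending objects is what slows the controller down is exactly what we are trying to find out.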
Initially Argo starts well, instantiating 10 workflows and starting new ones as they finish. After a few minutes, Argo becomes extremely slow and shows workflows as “running” for tens of minutes even though they have already completed. Eventually, the system stops starting new workflows: on the web page we see 3 instances marked as running for 20 minutes that have actually finished, and nothing proceeds further.
On Kubernetes, we only see the pods of completed workflows and no attempt to start new pods.
I suspect that the high number of pending workflows is blocking something, or that some configuration/scaling is required.
I would like to understand:
Thanks in advance for your support.
Kind regards
Claudio