Explain mechanics and use of “Group batching” #28
Description
This comes from here, but I have slightly touched up the wording.
I don't get the description of “group batching” in the README.
Reading the text with my current (assumed) understanding in mind, it
does not seem incorrect, but it does not convey (to me) how “group
batching” works, how it is intended to be used, and how it relates to
“a "cache" functionality”.
By observation, I came up with my own assumption about how this
might be intended to work. I might be wrong, though!
Here's my take:
- `clc-stackage` tries to build all Haskell packages listed in a given
  stack snapshot. To speed things up, `clc-stackage` groups the packages
  into batches, then iterates over them, for each batch trying to build
  all contained packages. If one package fails, then the entire batch
  fails, and all packages in that batch are marked as failures.
- By default (i.e., without using `--batch`), `clc-stackage` will put
  all packages into one single batch. This is supposed to be quick, but
  one failing package makes the entire batch fail, thus reporting all
  packages as failed. So this gives you a yes/no answer, but is not
  enough to single out the failing packages. To do that, you will need
  to modify the batch sizes.
- With `--batch 100`, batches of 100 packages each are created (except
  for the last one, I guess). This should give you better performance
  than with `--batch 1`, but may still report excessive failures. If,
  say, 5 failing packages exist and are distributed over 2 batches, then
  `clc-stackage` still reports 200 failed packages.
- After running with batches of 100, choose a smaller batch size (say,
  `--batch 10`) and also add `--retry-failures`. This will only retry
  the 200 packages reported as failing from the previous run, rearranged
  into smaller batches. If the 5 failing packages are distributed over
  4 batches-of-10, then only their 40 packages are reported as failures
  now. We are narrowing down.
- Finally, run with `--batch 1 --retry-failures`, which will only retry
  the 40 failed packages from above, and report on them individually.

Obviously, one can try to optimise the number of steps, and the
batch sizes used.
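If this reading is right, the failure-attribution semantics can be modeled with a small sketch. This is a toy model of the behaviour described above, not clc-stackage's actual code; the package names, counts, and failure positions are made up for illustration:

```python
def report_failures(packages, bad, batch_size):
    """Split packages into consecutive batches; if any package in a
    batch fails, every package in that batch is reported as failed."""
    reported = []
    for i in range(0, len(packages), batch_size):
        batch = packages[i:i + batch_size]
        if any(p in bad for p in batch):
            reported.extend(batch)
    return reported

packages = [f"pkg-{n}" for n in range(500)]
# 5 genuinely failing packages, spread over 2 of the batches-of-100:
bad = {"pkg-10", "pkg-90", "pkg-250", "pkg-260", "pkg-270"}

# Round 1 (like --batch 100): the 5 failures sit in 2 batches,
# so 200 packages get reported as failed.
round1 = report_failures(packages, bad, 100)
print(len(round1))  # 200

# Round 2 (like --batch 10 --retry-failures): retry only those 200,
# now in batches of 10, so far fewer packages are dragged along.
round2 = report_failures(round1, bad, 10)
print(len(round2))  # 50

# Round 3 (like --batch 1 --retry-failures): each package is its own
# batch, so exactly the real failures remain.
round3 = report_failures(round2, bad, 1)
print(sorted(round3) == sorted(bad))  # True
```

Each round shrinks the over-reported set, which is the "narrowing down" the steps above describe.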
Is this correct?
Also, why is working in batches at all supposed to be faster than just
iterating over the packages?