
Explain mechanics and use of “Group batching” #28

@s5k6

This comes from here, but I have slightly touched up the wording.

I don't get the description of “group batching” in the README.

Reading the text with my current (assumed) understanding in mind, it
does not seem incorrect, but it does not convey (to me) how “group
batching” works, how it is intended to be used, and how it relates to
the “cache” functionality.

By observation, I came up with my own assumption about how this
might be intended to work. I might be wrong, though!

Here's my take:

clc-stackage tries to build all Haskell packages listed in a given
stack snapshot.

To speed things up, clc-stackage groups the packages into batches,
then iterates over them, for each batch trying to build all of its
packages. If one package fails, then the entire batch fails, and all
packages in that batch are marked as failed.

By default (i.e., without using --batch), clc-stackage will put
all packages into one single batch. This is supposed to be
quick, but one failing package makes the entire batch fail, thus
reporting all packages as failed. So this gives you a yes/no
answer, but is not enough to single out the failing packages. To do
that, you will need to modify the batch sizes.

With --batch 100, batches of 100 packages each are created (except
for the last one, I guess). This should give you better performance
than with --batch 1, but may still report excessive failures. If,
say, 5 failing packages exist and are distributed over 2 batches,
then clc-stackage still reports 200 failed packages.

After running with batches of 100, choose a smaller batch size (say,
--batch 10) and also add --retry-failures. This will only retry
the 200 packages reported as failing in the previous run,
rearranged into smaller batches. If the 5 failing packages are
distributed over 4 batches-of-10, then only their 40 packages are
reported as failed now. We are narrowing down.

Finally, run with --batch 1 --retry-failures, which will only retry
the 40 failed packages from above, and report on them individually.

Obviously, one can try to optimise the number of steps, and the
batch sizes used.
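
My understanding of the process above can be sketched as a small
simulation. This is only an illustration of my assumed semantics, not
the actual implementation: the package count, the set of failing
packages, and the rule that a whole batch is reported failed when any
member fails are all assumptions on my part; --retry-failures is
modeled by re-batching only the previously reported packages.

```python
def run_batches(packages, failing, batch_size):
    """Split `packages` into consecutive batches of `batch_size` and
    report every package of any batch that contains at least one
    failing package (the last batch may be shorter)."""
    reported = []
    for i in range(0, len(packages), batch_size):
        batch = packages[i:i + batch_size]
        if any(p in failing for p in batch):
            reported.extend(batch)  # whole batch marked as failed
    return reported

packages = list(range(500))          # 500 hypothetical packages
failing = {42, 55, 137, 138, 160}    # 5 hypothetical failures, spread
                                     # over 2 batches-of-100

# --batch 100: coarse pass over all packages
pass1 = run_batches(packages, failing, 100)

# --batch 10 --retry-failures: re-batch only the reported packages
pass2 = run_batches(pass1, failing, 10)

# --batch 1 --retry-failures: pinpoint the individual failures
pass3 = run_batches(pass2, failing, 1)

print(len(pass1), len(pass2), len(pass3))  # → 200 40 5
```

With these numbers the reported set shrinks exactly as in the text:
200 packages after the first pass, 40 after the second, and the 5
real failures after the last.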

Is this correct?

Also, why is working in batches at all supposed to be faster than just
iterating over the packages?
