
Explain mechanics and use of “Group batching” #28

@s5k6

This comes from here, but I have slightly touched up the wording.

I don't get the description of “group batching” in the README.

Reading the text with my current (assumed) understanding in mind, it
does not seem incorrect, but it does not convey (to me) how “group
batching” works, how it is intended to be used, and how it relates to
the “cache” functionality.

By observation, I came up with my own assumption about how this
might be intended to work. I might be wrong, though!

Here's my take:

clc-stackage tries to build all Haskell packages listed in a given
stack snapshot.

To speed things up, clc-stackage groups the packages into batches,
then iterates over them, for each batch trying to build all of its
packages. If one package fails, then the entire batch fails, and all
packages in that batch are marked as failed.

By default (i.e., without using --batch), clc-stackage will put
all packages into one single batch. This is supposed to be
quick, but one failing package makes the entire batch fail, thus
reporting all packages as failed. So this gives you a yes/no
answer, but is not enough to single out the failing packages. To do
that, you will need to modify the batch sizes.

With --batch 100, batches of 100 packages each are created (except
for the last one, I guess). This should give you better performance
than with --batch 1, but may still report excessive failures. If,
say, 5 failing packages exist and are distributed over 2 batches,
then clc-stackage still reports 200 failed packages.

After running with batches of 100, choose a smaller batch size (say,
--batch 10) and also add --retry-failures. This will only retry
the 200 packages reported as failing in the previous run,
rearranged into smaller batches. If the 5 failing packages are
distributed over 4 batches-of-10, then only their 40 packages are
reported as failed now. We are narrowing down.

Finally, run with --batch 1 --retry-failures, which will only retry
the 40 failed packages from above, and report on them individually.

Obviously, one can try to optimise the number of steps, and the
batch sizes used.
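
My understanding of the process above can be sketched as a small
simulation. This is only an illustration of my assumed semantics, not
the actual implementation: the package count, the set of failing
packages, and the rule that a whole batch is reported failed when any
member fails are all assumptions on my part; --retry-failures is
modeled by re-batching only the previously reported packages.

```python
def run_batches(packages, failing, batch_size):
    """Split `packages` into consecutive batches of `batch_size` and
    report every package of any batch that contains at least one
    failing package (the last batch may be shorter)."""
    reported = []
    for i in range(0, len(packages), batch_size):
        batch = packages[i:i + batch_size]
        if any(p in failing for p in batch):
            reported.extend(batch)  # whole batch marked as failed
    return reported

packages = list(range(500))          # 500 hypothetical packages
failing = {42, 55, 137, 138, 160}    # 5 hypothetical failures, spread
                                     # over 2 batches-of-100

# --batch 100: coarse pass over all packages
pass1 = run_batches(packages, failing, 100)

# --batch 10 --retry-failures: re-batch only the reported packages
pass2 = run_batches(pass1, failing, 10)

# --batch 1 --retry-failures: pinpoint the individual failures
pass3 = run_batches(pass2, failing, 1)

print(len(pass1), len(pass2), len(pass3))  # → 200 40 5
```

With these numbers the reported set shrinks exactly as in the text:
200 packages after the first pass, 40 after the second, and the 5
real failures after the last.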

Is this correct?

Also, why is working in batches at all supposed to be faster than just
iterating over the packages?
