Instead of pinning the snapshot to a specific date, we set it to
'nightly', which should allow more flexibility i.e. users can build
whichever ghc corresponds to the latest snapshot. This should
hopefully decrease the chance of users running into issues with
ghc minor version upgrades e.g. bounds errors.
Snapshot updates that upgrade the ghc major version will almost
certainly require manual intervention here, but failures will likely
be caught by the CI job that builds everything with --dry-run, so
there should be some warning.
README.md (70 additions & 6 deletions)
@@ -14,7 +14,7 @@ An impact assessment is due when
14
14
15
15
The procedure is as follows:
16
16
17
-
1. Rebase changes, mandated by your proposal, atop of `ghc-9.10` branch.
17
+
1. Rebase changes, mandated by your proposal, atop the ghc branch (or tag) that corresponds to the current [stackage nightly](https://www.stackage.org/nightly). For example, if the latest snapshot ghc is `ghc-9.12.3`, we would want to rebase our changes on the `ghc-9.12.3-release` tag.
18
18
19
19
2. Compile a patched GHC, say, `~/ghc/_build/stage1/bin/ghc`.
20
20
@@ -36,6 +36,7 @@ The procedure is as follows:
36
36
* You can interrupt `cabal` at any time and rerun again later.
37
37
* Consider setting `--jobs` to retain free CPU cores for other tasks.
38
38
* Full build requires roughly 7 Gb of free disk space.
39
+
* If the build fails with an error about the maximum number of arguments to `gcc`, run again with a smaller batch size; 250 worked well for me.
39
40
40
41
To get an idea of the current progress, we can run the following commands
41
42
on the log file:
@@ -59,6 +60,61 @@ The procedure is as follows:
59
60
60
61
8. When everything finally builds, get back to CLC with a list of packages affected and patches required.
61
62
63
+
### Troubleshooting
64
+
65
+
Because we build with `nightly` and are at the mercy of cabal's constraint solver, it is possible to run into solver / build issues that have nothing to do with our custom GHC. Some of the most common problems include:
66
+
67
+
- Nightly adds a new, problematic package `p` e.g.
68
+
69
+
- `p` requires a new system dependency (e.g. a C library).
70
+
- `p` is an executable.
71
+
- `p` depends on a package in [./excluded_pkgs.jsonc](excluded_pkgs.jsonc).
72
+
73
+
- A cabal flag is set in a way that breaks the build. For example, our snapshot requires that the `bson` library does *not* have its `_old-network` flag set, as this will cause a build error with our version of `network`. This flag is automatic, so we have to force it in `generated/cabal.project` with `constraints: bson -_old-network`.
74
+
75
+
- Nightly has many packages drop out for some reason, increasing the chance for solver non-determinism.
76
+
77
+
We attempt to mitigate such issues by:
78
+
79
+
- Writing most of the snapshot's exact package versions as cabal constraints to the generated `./generated/cabal.project.local`, which ensures we (transitively) build the same package version every time. Note that boot packages like `text` are deliberately excluded so that we can build a snapshot with multiple GHCs. Otherwise even a GHC minor version difference would fail because `ghc` is in the build plan.
80
+
81
+
- Ignoring bounds in `generated/cabal.project`:
82
+
83
+
```
84
+
allow-newer: *:*
85
+
allow-older: *:*
86
+
```
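
The pinning step described above can be sketched in shell as follows. This is a hypothetical illustration, not `clc-stackage`'s actual implementation: the `pins_to_constraints` function, the `name ==version` input format, the sample versions, and the short boot-package list are all assumptions made for the example.

```sh
# Hypothetical sketch: turn "name ==version" pins (one per line on stdin)
# into cabal.project-style constraint lines, skipping boot packages.
pins_to_constraints() {
  # Assumption: an illustrative, incomplete list of boot packages.
  boot=" base ghc text "
  while read -r name bound; do
    case "$boot" in
      *" $name "*) continue ;;  # boot package: leave it unpinned
    esac
    echo "constraints: $name $bound"
  done
}

# Made-up sample pins; real versions would come from the snapshot.
printf '%s\n' 'aeson ==2.2.3.0' 'text ==2.1.2' 'bson ==0.4.0.1' | pins_to_constraints
```

In this sketch `text` is skipped, so only `aeson` and `bson` receive exact pins, which mirrors why a snapshot can still be built with a slightly different GHC.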
87
+
88
+
Nevertheless, it is still possible for issues to slip through. When a package `p` fails to build for some reason, we should first:
89
+
90
+
- Verify that `p` is not in `excluded_pkgs.jsonc`. If it is, nightly probably pulled in some new reverse-dependency `q` that should be added to `excluded_pkgs.jsonc`.
91
+
92
+
- Verify that `p` does not have cabal flags that can affect dependencies / API.
93
+
94
+
- Verify that `p`'s version matches its version in the current snapshot (e.g. `https://www.stackage.org/nightly`). If it does not, either a package needs to be excluded or constraints need to be added.
95
+
96
+
In general, user mitigations for solver / build problems include:
97
+
98
+
- Adding `p` to `excluded_pkgs.jsonc`. Note that `p` will still be built if it is a (transitive) dependency of some other package in the snapshot, but will not have its exact bounds written to `cabal.project.local`.
99
+
100
+
- Manually downloading a snapshot (e.g. `https://www.stackage.org/nightly/cabal.config`), changing / removing the offending package(s), and supplying the file with the `--snapshot-path` param. As with `excluded_pkgs.jsonc`, take care that the problematic package is not a (transitive) dependency of something in the snapshot.
101
+
102
+
- Adding constraints to `generated/cabal.project` e.g. flags or version constraints like `constraints: filepath > 1.5`.
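
As a rough sketch of the "manually edit a snapshot" mitigation, the helper below drops one package's pin from snapshot text. It assumes the snapshot lists one `name ==version,` entry per indented continuation line; `drop_pkg` is a hypothetical name, the sample versions are made up, and the first and last entries are deliberately left alone so the surrounding `constraints:` field stays well-formed.

```sh
# Hypothetical helper: remove the pin for package $1 from snapshot text on stdin.
# Only indented continuation lines ending in a comma are deleted.
drop_pkg() {
  pkg="$1"
  sed -E "/^[[:space:]]+${pkg}[[:space:]]+==[^,]*,\$/d"
}

# Made-up sample in the assumed format; a real file would be downloaded first.
printf '%s\n' \
  'constraints: aeson ==2.2.3.0,' \
  '             bson ==0.4.0.1,' \
  '             network ==3.2.7.0' | drop_pkg bson
```

The filtered output could then be saved to a file and passed to `clc-stackage` with `--snapshot-path`.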
103
+
104
+
#### Misc
105
+
106
+
- Note that while a GHC minor version difference is usually okay, a GHC *major* difference will very likely lead to errors.
107
+
108
+
- The `flake.nix` line:
109
+
110
+
```nix
111
+
compiler = pkgs.haskell.packages.ghc<vers>;
112
+
```
113
+
114
+
can be a useful guide as to which GHC was last tested, as CI uses this ghc to build everything with `--dry-run`, which should report solver errors (e.g. bounds) at the very least.
115
+
116
+
- If you encounter an error that you think indicates a problem with the configuration here (e.g. new package needs to be excluded, new constraint added), please open an issue. While that is being resolved, the mitigations from the [previous section](#troubleshooting) may be useful.
117
+
62
118
### The clc-stackage exe
63
119
64
120
`clc-stackage` is an executable that will:
@@ -94,13 +150,21 @@ By default (`--write-logs save-failures`), the build logs are saved to the `./ou
94
150
95
151
#### Group batching
96
152
97
-
The `clc-stackage` exe allows for splitting the entire package set into subset groups of size `N` with the `--batch N` option. Each group is then built sequentially. Not only can this be useful for situations where building the entire package set in one go is infeasible, but it also provides a "cache" functionality, that allows us to interrupt the program at any point (e.g. `CTRL-C`), and pick up where we left off. For example:
153
+
By default, the `clc-stackage` exe tries to build all packages at once i.e. every package is written to `generated/generated.cabal`. This can cause problems e.g. we do not have enough memory to build everything simultaneously, or we receive an error that `gcc` has been given too many arguments. Hence we provide the `--batch N` option, which will split the package set into disjoint groups of size `N`. Each group is then built sequentially.
98
154
99
-
```sh
100
-
$ clc-stackage --batch 100
101
-
```
155
+
The default behavior is:
156
+
157
+
1. `clc-stackage` will try to build everything in the same group, even if some package fails (equivalent to cabal's `--keep-going` flag). If instead `--package-fail-fast` is enabled, the first failure will cause the entire group to immediately fail, and we will move on to the next group.
102
158
103
-
This will split the entire downloaded package set into groups of size 100. Each time a group finishes (success or failure), stdout/err will be updated, and then the next group will start. If the group failed to build and we have `--write-logs save-failures` (the default), then the logs and error output will be in `./output/logs/<pkg>/`, where `<pkg>` is the name of the first package in the group.
159
+
2. `clc-stackage` will try every group, even if some prior group fails. The `--group-fail-fast` option changes this so that the first failure will cause `clc-stackage` to exit.
160
+
161
+
Each time a group finishes (success or failure), stdout/err will be updated, and then the next group will start. If the group failed to build and we have `--write-logs save-failures` (the default), then the logs and error output will be in `./output/logs/<pkg>/`, where `<pkg>` is the name of the first package in the group.
162
+
163
+
When `clc-stackage` itself finishes (either on its own or via an interrupt like `CTRL-C`), the results are saved to a cache which records all successes, failures, and untested packages. This allows us to pick up where we left off with untested packages (including failures if the `--retry-failures` flag is active).
164
+
165
+
> [!IMPORTANT]
166
+
>
167
+
> The cache operates at the *batch group* level, so only packages that have been part of a successful group will be considered successes. Conversely, a package will be considered a failure if it is part of a failing group, even if it was built successfully. Therefore, to see which packages actually failed, we will want to check the logs directory. Alternatively, we can first run `clc-stackage` with a large `--batch` group (for maximum performance), then run it again with, say, `--batch 1`.
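
The group-level bookkeeping can be illustrated with a toy model. This is only a sketch of the semantics described in the note, not the real cache: `build_group` and `classify` are hypothetical names, and a group is assumed to "fail" exactly when it contains the made-up package `bad`.

```sh
# Hypothetical stand-in for building one group: fails if the group contains "bad".
build_group() {
  case " $* " in
    *" bad "*) return 1 ;;
  esac
}

# Split stdin (one package per line) into disjoint groups of size $1,
# then record every package with the status of its whole group.
classify() {
  xargs -n "$1" | while read -r group; do
    # $group is intentionally unquoted so it splits into package names.
    if build_group $group; then status=success; else status=failure; fi
    for pkg in $group; do
      echo "$pkg $status"
    done
  done
}

printf '%s\n' a bad c d | classify 2
```

With groups of size 2, package `a` is recorded as a failure only because it shares a group with `bad`; rerunning the same input through `classify 1` reports `a` as a success, which is the effect of the second pass with `--batch 1`.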
dev.md (13 additions & 18 deletions)
@@ -8,15 +8,15 @@ This project is organized into several libraries and a single executable. Roughl
8
8
2. Prune `s` based on packages we know we do not want (e.g. system deps).
9
9
3. Generate a custom `generated.cabal` file for the given package set, and try to build it.
10
10
11
-
Futhermore, we allow for building subsets of the entire stackage package set with the `--batch` feature. This will split the package set into disjoint groups, and build each group sequentially. The process can be interrupted at any time (e.g. `CTRL-C`), and progress will be saved in a "cache" (json file), so we can pick up where we left off.
11
+
Furthermore, we allow for building subsets of the entire stackage package set with the `--batch` feature. This will split the package set into disjoint groups, and build each group sequentially. The process can be interrupted at any time (e.g. `CTRL-C`), and progress will be saved in a "cache" (json file), so we can pick up where we left off.
12
12
13
13
## Components
14
14
15
15
The `clc-stackage` library is namespaced by functionality:
16
16
17
17
### utils
18
18
19
-
`CLC.Stackage.Utils` ontains common utilities e.g. logging and hardcoded file paths.
19
+
`CLC.Stackage.Utils` contains common utilities e.g. logging and hardcoded file paths.
20
20
21
21
### parser
22
22
@@ -73,30 +73,25 @@ The reason this logic is a library function and not the executable itself is for
73
73
74
74
The executable that actually runs. This is a very thin wrapper over `runner`, which merely sets up the logging handler.
75
75
76
-
## Updating to a new shapshot
76
+
## Updating to a new snapshot
77
77
78
-
1. Update to the desired snapshot:
78
+
`clc-stackage` is based on `nightly`, which changes automatically, meaning we do not necessarily have to do anything when a new (minor) snapshot is released. On the other hand, *major* snapshot updates will almost certainly bring in new packages that need to be excluded, so there are some general "update steps" we will want to take:
79
79
80
-
```haskell
81
-
-- CLC.Stackage.Parser.API
82
-
stackageSnapshot :: String
83
-
stackageSnapshot = "nightly-yyyy-mm-dd"
84
-
```
80
+
1. Modify [excluded_pkgs.json](excluded_pkgs.json) as needed. That is, updating the snapshot major version will probably bring in some new packages that we do not want. The update process is essentially trial-and-error i.e. run `clc-stackage` as normal, and later add any failing packages that should be excluded.
85
81
86
-
2. Update the `index-state` in [cabal.project](cabal.project) and [generated/cabal.project](generated/cabal.project).
82
+
2. Update `ghc-version` in [.github/workflows/ci.yaml](.github/workflows/ci.yaml).
87
83
88
-
3. Modify [excluded_pkgs.json](excluded_pkgs.json) as needed. That is, updating the snapshot will probably bring in some new packages that we do not want. The update process is essentially trial-and-error i.e. run `clc-stackage` as normal, and later add any failing packages that should be excluded.
84
+
3. Update functional tests as needed i.e. exact package versions in `*golden` and `test/functional/snapshot.txt`.
89
85
90
-
4. Update references to the current ghc e.g.
86
+
4. Optional: Update nix:
91
87
92
-
1. `ghc-version` in [.github/workflows/ci.yaml](.github/workflows/ci.yaml).
93
-
2. [README.md](README.md).
88
+
   - Inputs (`nix flake update`).
89
+
   - GHC: Update the `compiler = pkgs.haskell.packages.ghc<vers>;` line.
90
+
   - Add to the `flake.nix`'s `ldDeps` and `deps` as needed to have the `nix` CI job pass. System libs available on nix can be found here: https://search.nixos.org/packages?channel=unstable.
94
91
95
-
5. Update functional tests as needed i.e. exact package versions in `*golden` and `test/functional/snapshot.txt`.
92
+
This job builds everything with `--dry-run`, so its success is a useful proxy for `clc-stackage`'s health. In other words, if the nix job fails, there is almost certainly a general issue (i.e. either a package should be excluded or new system dep is required), but if it succeeds, the package set is in pretty good shape (there may still be sporadic issues e.g. a package does not properly declare its system dependencies at config time).