The active contributors and maintainers of Zig's linker code have
generally found the current linker test harness to be cumbersome. The
tests require a lot of maintenance, but do not provide a lot of
coverage, and when they fail it is painful to troubleshoot.
Furthermore, as part of working on #31691, I don't want to port over the
CheckObject step, because I don't like the code anyway.
The plan forward is to start enhancing `zig objdump` to assist in
linker development, as well as using it as the basis for snapshot
testing.
We absolutely need linker test coverage, but we need to try to improve
these things about the next attempt:
* less effort to create and maintain tests
* less CPU overhead - we should be able to add a lot of tests without
adding a lot of CI time.
* more helpful failures. A failed linker test should provide the next
steps a developer can take to understand why the test failed.
* a goal of porting over all of LLD's test suite, or at least the good
ones.
I'm not going to open an issue to track the lost linker test coverage,
because there was already so much lack of coverage for linker stuff.
However I will open issues to track this lost coverage:
* the deleted checks from test/standalone/glibc_compat/build.zig
* the deleted checks from test/standalone/compiler_rt_panic/build.zig
* the deleted checks from test/standalone/ios/build.zig
Importantly, adds ability to get Clock resolution, which may be zero.
This allows error.Unexpected and error.ClockUnsupported to be removed
from timeout and clock reading error sets.
This commit shows a proof-of-concept direction for std.Io.VTable to go,
which is to have general support for batching, timeouts, and
non-blocking.
I'm not sure if this is a good idea or not so I'm putting it up for
scrutiny.
This commit introduces `std.Io.operate`, `std.Io.Operation`, and
implements it experimentally for `FileReadStreaming`.
In `std.Io.Threaded`, the implementation is based on poll().
This commit shows how it can be used in `std.process.run` to collect
both stdout and stderr in a single-threaded program using
`std.Threaded.Io`.
It also demonstrates how to upgrade code that was previously using
`std.Io.poll` (*not* integrated with the interface!) using concurrency.
This may not be ideal since it makes the build runner no longer support
single-threaded mode. There is still a needed abstraction for
conveniently reading multiple File streams concurrently without
io.concurrent, but this commit demonstrates that such an API can be
built on top of the new `std.Io.operate` functionality.
The previous logic was made really messy by the fact that upon entry to
the step eval worker, the step may not be ready to run, we may be racing
with other workers doing the same check, and we had already acquired our
RSS requirement even though we might not run. It also required iterating
all dependencies each time we were called to check whether we were even
ready to run yet.
A much better strategy is for each step to have an atomic counter
representing how many of its dependencies are yet to complete. When a
step completes (successfully or otherwise), it decrements this value for
all of its dependants, and if it drops any to 0, it schedules that step
to run. This means each step is scheduled exactly once, and only when
all of its dependencies have finished, reducing redundant checks and
hence contention. If the step being scheduled needs to claim RSS which
isn't available, then it is instead added to `memory_blocked_steps`,
which is iterated by the step worker after a step with an RSS claim
finishes.
This logic is more concise than before, simpler to understand, generally
more efficient, and fixes a bug in the RSS tracking. Also, as a nice
side effect, it should also play a little bit nicer with `Io.Threaded`'s
scheduling strategy, because we no longer spawn extremely short-lived
tasks all the time as we previously did.
Resolves: https://codeberg.org/ziglang/zig/issues/30742
Remove the RemoveDir step with no replacement. This step had no valid
purpose. Mutating source files? That should be done with
UpdateSourceFiles step. Deleting temporary directories? That required
creating the tmp directories in the configure phase which is broken.
Deleting cached artifacts? That's going to cause problems.
Similarly, remove the `Build.makeTempPath` function. This was used to
create a temporary path in the configure place which, again, is the
wrong place to do it.
Instead, the WriteFile step has been updated with more functionality:
tmp mode: In this mode, the directory will be placed inside "tmp" rather
than "o", and caching will be skipped. During the `make` phase, the step
will always do all the file system operations, and on successful build
completion, the dir will be deleted along with all other tmp
directories. The directory is therefore eligible to be used for
mutations by other steps. `Build.addTempFiles` is introduced to
initialize a WriteFile step with this mode.
mutate mode: The operations will not be performed against a freshly
created directory, but instead act against a temporary directory.
`Build.addMutateFiles` is introduced to initialize a WriteFile step with
this mode.
`Build.tmpPath` is introduced, which is a shortcut for
`Build.addTempFiles` followed by `WriteFile.getDirectory`.
* give Cache a gpa rather than arena because that's what it asks for
this gets the build runner compiling again on linux
this work is incomplete; it only moves code around so that environment
variables can be wrangled properly. a future commit will need to audit
the cancelation and error handling of this moved logic.
Little-endian is what `std.zig.Server` expects, but the old logic just
send the raw bytes of the struct, so sent in native endian (causing a
crash on big-endian targets).
The build runner was previously forcing child processes to have their
stderr colorization match the build runner by setting `CLICOLOR_FORCE`
or `NO_COLOR`. This is a nice idea in some cases---for instance a simple
`Run` step which we just expect to exit with code 0 and whose stderr is
not being programmatically inspected---but is a bad idea in others, for
instance if there is a check on stderr or if stderr is captured, in
which case forcing color on the child could cause checks to fail.
Instead, this commit adds a field to `std.Build.Step.Run` which
specifies a behavior for the build runner to employ in terms of
assigning the `CLICOLOR_FORCE` and `NO_COLOR` environment variables. The
default behavior is to set `CLICOLOR_FORCE` if the build runner's output
is colorized and the step's stderr is not captured, and to set
`NO_COLOR` otherwise. Alternatively, colors can be always enabled,
always disabled, always match the build runner, or the environment
variables can be left untouched so they can be manually controlled
through `env_map`.
Notably, this fixes a failure when running `zig build test-cli` in a
TTY (or with colors explicitly enabled). GitHub CI hadn't caught this
because it does not request color, but Codeberg CI now does, and we were
seeing a failure in the `zig init` test because the actual output had
color escape codes in it due to 6d280dc.
This is a major refactor to `Step.Run` which adds new functionality,
primarily to the execution of Zig tests.
* All tests are run, even if a test crashes. This happens through the
same mechanism as timeouts where the test processes is repeatedly
respawned as needed.
* The build status output is more precise. For each unit test, it
differentiates pass, skip, fail, crash, and timeout. Memory leaks are
reported separately, as they do not indicate a test's "status", but
are rather an additional property (a test with leaks may still pass!).
* The number of memory leaks is tracked and reported, both per-test and
for a whole `Run` step.
* Reporting is made clearer when a step is failed solely due to error
logs (`std.log.err`) where every unit test passed.
For now, there is a flag to `zig build` called `--test-timeout-ms` which
accepts a value in milliseconds. If the execution time of any individual
unit test exceeds that number of milliseconds, the test is terminated
and marked as timed out.
In the future, we may want to increase the granularity of this feature
by allowing timeouts to be specified per-step or even per-test. However,
a global option is actually very useful. In particular, it can be used
in CI scripts to ensure that no individual unit test exceeds some
reasonable limit (e.g. 60 seconds) without having to assign limits to
every individual test step in the build script.
Also, individual unit test durations are now shown in the time report
web interface -- this was fairly trivial to add since we're timing tests
(to check for timeouts) anyway.
This commit makes progress on #19821, but does not close it, because
that proposal includes a more sophisticated mechanism for setting
timeouts.
Co-Authored-By: David Rubin <david@vortan.dev>
As well as the exact byte count, include a human-readable value so it's
clearer what the error is actually telling you. The exact byte count
might not be worth keeping, but I decided I would in case it's useful in
any scenario.