- New Features
-- Multiprocess Fuzzing
The fuzzer now is able to utilize multiple cores. This is controllable
with the `-j` build option. Limited fuzzing still uses one core.
-- Fuzzing Infinite Mode
When provided multiple tests, the fuzzer now switches between them and
prioritizes the most effective and interesting ones. Over time already
explored tests will become barely run compared to tests yielding new
inputs.
-- Crash Dumps
Crashing inputs are now saved to a file indicated by the crash message.
It is recommended to use these files to reproduce the crash using
`std.testing.FuzzInputOptions.corpus` and @embedFile.
- Design
Each fuzzing process is assigned an instance id which has the following
uses:
* In conjunction with the pc hash and running test index, they uniquely
identify input files in the case of a crash.
* It is combined with the test seed for a unique rng seed.
* Instance 0 is solely responsible for syncing the filesystem corpus.
When new inputs are found, they are sent to the build server. It then
distributes the new input to the other instances. Each instance has a
concurrent poller managed by the test runner which sends received
inputs to libfuzzer. (note that this is affected by #31718 and so can
(rarely) deadlock)
For fuzzing infinite mode, the test runner now receives a list of tests
from the build server. The fuzzer runs tests in batches of one second,
approximated in cycles by the previous batch's run speed. Tests finding
new inputs or with few runs are given a higher run chance. The baseline
run chance is based off the recency of the last find and the number of
pcs the test has hit.
Users can proceed by telling the build system how much memory to assume
the system has. I have improved the failure message to communicate this.
Partially reverts 87f8f47ba5
Even if we wanted to unrevert this reverted commit, it's not sufficient
to merely downgrade the failure to a warning, because the main
scheduling logic will fail to schedule steps that have a max_rss
exceeding the detected system value, causing some steps to never get
executed, eventually tripping an assert. Such a change would need to
also override the max_rss value to be equal to the greatest value across
all steps.
closes#31510
Modifies the `Allocator` implementation provided by `ArenaAllocator` to be
threadsafe using only atomics and no synchronization primitives locked
behind an `Io` implementation.
At its core this is a lock-free singly linked list which uses CAS loops to
exchange the head node. A nice property of `ArenaAllocator` is that the
only functions that can ever remove nodes from its linked list are `reset`
and `deinit`, both of which are not part of the `Allocator` interface and
thus aren't threadsafe, so node-related ABA problems are impossible.
There *are* some trade-offs: end index tracking is now per node instead of
per allocator instance. It's not possible to publish a head node and its
end index at the same time if the latter isn't part of the former.
Another compromise had to be made in regards to resizing existing nodes.
Annoyingly, `rawResize` of an arbitrary thread-safe child allocator can
of course never be guaranteed to be an atomic operation, so only one
`alloc` call can ever resize at the same time, other threads have to
consider any resizes they attempt during that time failed. This causes
slightly less optimal behavior than what could be achieved with a mutex.
The LSB of `Node.size` is used to signal that a node is being resized.
This means that all nodes have to have an even size.
Calls to `alloc` have to allocate new nodes optimistically as they can
only know whether any CAS on a head node will succeed after attempting it,
and to attempt the CAS they of course already need to know the address of
the freshly allocated node they are trying to make the new head.
The simplest solution to this would be to just free the new node again if
a CAS fails, however this can be expensive and would mean that in practice
arenas could only really be used with a GPA as their child allocator. To
work around this, this implementation keeps its own free list of nodes
which didn't make their CAS to be reused by a later `alloc` invocation.
To keep things simple and avoid ABA problems the free list is only ever
be accessed beyond its head by 'stealing' the head node (and thus the
entire list) with an atomic swap. This makes iteration and removal trivial
since there's only ever one thread doing it at a time which also owns all
nodes it's holding. When the thread is done it can just push its list onto
the free list again.
This implementation offers comparable performance to the previous one when
only being accessed by a single thread and a slight speedup compared to
the previous implementation wrapped into a `ThreadSafeAllocator` up to ~7
threads performing operations on it concurrently.
(measured on a base model MacBook Pro M1)
Importantly, adds ability to get Clock resolution, which may be zero.
This allows error.Unexpected and error.ClockUnsupported to be removed
from timeout and clock reading error sets.
- delete std.Thread.Futex
- delete std.Thread.Mutex
- delete std.Thread.Semaphore
- delete std.Thread.Condition
- delete std.Thread.RwLock
- delete std.once
std.Thread.Mutex.Recursive remains... for now. it will be replaced with
a special purpose mechanism used only by panic logic.
std.Io.Threaded exposes mutexLock and mutexUnlock for the advanced case
when you need to call them directly.
In Zig standard library, Dir means an open directory handle. path
represents a file system identifier string. This function is better
named after "current path" than "current dir". "get" and "working" are
superfluous.
The previous logic was made really messy by the fact that upon entry to
the step eval worker, the step may not be ready to run, we may be racing
with other workers doing the same check, and we had already acquired our
RSS requirement even though we might not run. It also required iterating
all dependencies each time we were called to check whether we were even
ready to run yet.
A much better strategy is for each step to have an atomic counter
representing how many of its dependencies are yet to complete. When a
step completes (successfully or otherwise), it decrements this value for
all of its dependants, and if it drops any to 0, it schedules that step
to run. This means each step is scheduled exactly once, and only when
all of its dependencies have finished, reducing redundant checks and
hence contention. If the step being scheduled needs to claim RSS which
isn't available, then it is instead added to `memory_blocked_steps`,
which is iterated by the step worker after a step with an RSS claim
finishes.
This logic is more concise than before, simpler to understand, generally
more efficient, and fixes a bug in the RSS tracking. Also, as a nice
side effect, it should also play a little bit nicer with `Io.Threaded`'s
scheduling strategy, because we no longer spawn extremely short-lived
tasks all the time as we previously did.
Resolves: https://codeberg.org/ziglang/zig/issues/30742
Remove the RemoveDir step with no replacement. This step had no valid
purpose. Mutating source files? That should be done with
UpdateSourceFiles step. Deleting temporary directories? That required
creating the tmp directories in the configure phase which is broken.
Deleting cached artifacts? That's going to cause problems.
Similarly, remove the `Build.makeTempPath` function. This was used to
create a temporary path in the configure place which, again, is the
wrong place to do it.
Instead, the WriteFile step has been updated with more functionality:
tmp mode: In this mode, the directory will be placed inside "tmp" rather
than "o", and caching will be skipped. During the `make` phase, the step
will always do all the file system operations, and on successful build
completion, the dir will be deleted along with all other tmp
directories. The directory is therefore eligible to be used for
mutations by other steps. `Build.addTempFiles` is introduced to
initialize a WriteFile step with this mode.
mutate mode: The operations will not be performed against a freshly
created directory, but instead act against a temporary directory.
`Build.addMutateFiles` is introduced to initialize a WriteFile step with
this mode.
`Build.tmpPath` is introduced, which is a shortcut for
`Build.addTempFiles` followed by `WriteFile.getDirectory`.
* give Cache a gpa rather than arena because that's what it asks for
This commit includes some API changes which I agreed with Andrew as a
follow-up to the recent `Io.Group` changes:
* `Io.Group.await` *does* propagate cancelation to group tasks; it then
waits for them to complete, and *also* returns `error.Canceled`. The
assertion that group tasks handle `error.Canceled` "correctly" means
this behavior is loosely analagous to how awaiting a future works. The
important thing is that the semantics of `Group.await` and
`Future.await` are similar, and `error.Canceled` will always be
visible to the caller (assuming correct API usage).
* `Io.Group.awaitUncancelable` is removed.
* `Future.await` calls `recancel` only if the "child" task (the future
being awaited) did not acknowledge cancelation. If it did, then it is
assumed that the future will propagate `error.Canceled` through
`await` as needed.
Rename `wait` to `await` to be consistent with Future API. The
convention here is that this set of functionality goes together:
* async/concurrent
* await/cancel
Also rename Select `wait` to `await` for the same reason.
`Group.await` now can return `error.Canceled`. Furthermore,
`Group.await` does not auto-propagate cancelation. Instead, users should
follow the pattern of `defer group.cancel(io);` after initialization,
and doing `try group.await(io);` at the end of the success path.
Advanced logic can choose to do something other than this pattern in the
event of cancelation.
Additionally, fixes a bug in `std.Io.Threaded` future await, in which it
swallowed an `error.Canceled`. Now if a task is canceled while awaiting
a future, after propagating the cancel request, it also recancels,
meaning that the awaiting task will properly detect its own cancelation
at the next cancelation point.
Furthermore, fixes a bug in the compiler where `error.Canceled` was
being swallowed in `dispatchPrelinkWork`.
Finally, fixes std.crypto code that inappropriately used
`catch unreachable` in response to cancelation without even so much as a
comment explaining why it was believed to be unreachable. Now, those
functions have `error.Canceled` in the error set and propagate
cancelation properly.
With this way of doing things, `Group.await` has a nice property: even if
all tasks in the group are CPU bound and without cancelation points, the
`Group.await` can still be canceled. In such case, the task that was
waiting for `await` wakes up with a chance to do some more resource
cleanup tasks, such as canceling more things, before entering the
deferred `Group.cancel` call at which point it has to suspend until the
canceled but uninterruptible CPU bound tasks complete.
closes#30601
* std.option allows overriding the debug Io instance
* if the default is used, start code initializes environ and argv0
also fix some places that needed recancel(), thanks mlugg!
See #30562
This is needed unfortunately for OpenBSD and Haiku for process
executable path.
I made it so that you can omit the options usually, but you get a
compile error if you omit the options on those targets.
use the application's Io implementation where possible. This correctly
makes writing to stderr cancelable, fallible, and participate in the
application's event loop. It also removes one more hard-coded
dependency on a secondary Io implementation.
instead, allow the user to set it as a field.
this fixes a bug where leak printing and error printing would run tty
config detection for stderr, and then emit a log, which is not necessary
going to print to stderr.
however, the nice defaults are gone; the user must explicitly assign the
tty_config field during initialization or else the logging will not have
color.
related: https://github.com/ziglang/zig/issues/24510
Move ensureUnusedCapacity inside the mutex call to prevent
a race condition where other worker threads could append to
memory_blocked_steps between checking the length and iterating.
repro branch: https://codeberg.org/erg/zig/src/branch/fix-30014-repro
check out that branch, and depending on if you have the fix-30014 patch or not,
it will either race condition or succeed
Fixes#30014
This was presumably unintentional, as the old behavior looked quite odd
when you noticed it. All of the line-drawing characters ought to be the
same color; only the step name/status is dimmed when a step is "reused"
(i.e. appears multiple times in the tree).
The build runner was previously forcing child processes to have their
stderr colorization match the build runner by setting `CLICOLOR_FORCE`
or `NO_COLOR`. This is a nice idea in some cases---for instance a simple
`Run` step which we just expect to exit with code 0 and whose stderr is
not being programmatically inspected---but is a bad idea in others, for
instance if there is a check on stderr or if stderr is captured, in
which case forcing color on the child could cause checks to fail.
Instead, this commit adds a field to `std.Build.Step.Run` which
specifies a behavior for the build runner to employ in terms of
assigning the `CLICOLOR_FORCE` and `NO_COLOR` environment variables. The
default behavior is to set `CLICOLOR_FORCE` if the build runner's output
is colorized and the step's stderr is not captured, and to set
`NO_COLOR` otherwise. Alternatively, colors can be always enabled,
always disabled, always match the build runner, or the environment
variables can be left untouched so they can be manually controlled
through `env_map`.
Notably, this fixes a failure when running `zig build test-cli` in a
TTY (or with colors explicitly enabled). GitHub CI hadn't caught this
because it does not request color, but Codeberg CI now does, and we were
seeing a failure in the `zig init` test because the actual output had
color escape codes in it due to 6d280dc.
`std.Io.tty.Config.detect` may be an expensive check (e.g. involving
syscalls), and doing it every time we need to print isn't really
necessary; under normal usage, we can compute the value once and cache
it for the whole program's execution. Since anyone outputting to stderr
may reasonably want this information (in fact they are very likely to),
it makes sense to cache it and return it from `lockStderrWriter`. Call
sites who do not need it will experience no significant overhead, and
can just ignore the TTY config with a `const w, _` destructure.