- New Features
-- Multiprocess Fuzzing
The fuzzer now is able to utilize multiple cores. This is controllable
with the `-j` build option. Limited fuzzing still uses one core.
-- Fuzzing Infinite Mode
When provided multiple tests, the fuzzer now switches between them and
prioritizes the most effective and interesting ones. Over time already
explored tests will become barely run compared to tests yielding new
inputs.
-- Crash Dumps
Crashing inputs are now saved to a file indicated by the crash message.
It is recommended to use these files to reproduce the crash using
`std.testing.FuzzInputOptions.corpus` and @embedFile.
- Design
Each fuzzing process is assigned an instance id which has the following
uses:
* In conjunction with the pc hash and running test index, they uniquely
identify input files in the case of a crash.
* It is combined with the test seed for a unique rng seed.
* Instance 0 is solely responsible for syncing the filesystem corpus.
When new inputs are found, they are sent to the build server. It then
distributes the new input to the other instances. Each instance has a
concurrent poller managed by the test runner which sends received
inputs to libfuzzer. (note that this is affected by #31718 and so can
(rarely) deadlock)
For fuzzing infinite mode, the test runner now receives a list of tests
from the build server. The fuzzer runs tests in batches of one second,
approximated in cycles by the previous batch's run speed. Tests finding
new inputs or with few runs are given a higher run chance. The baseline
run chance is based off the recency of the last find and the number of
pcs the test has hit.
Previously, each message requires an unseekable error to be returned
from a syscall before proceeding. Ideally, the code would just pass
around `*std.Io.Writer` instead of `std.Io.File` in the first place, but
even then, you could argue for saving a syscall with `writerStreaming`.
This generates zig ASTs from `testing.Smith` and is based off the
langref's PEG.
The choice to not build the Ast while generating and instead parsing it
afterwards makes the smith more versatile by not being tied to a single
implementation at a cost of efficiency.
Additionally, a new function `boolWeighted` was added to `Smith` due to
its frequent use in `AstSmith`.
The indexes can change between recompilation due to conditional
compilation and compiler quirks. While unit test names are still not a
perfect solution, they are better than indexes.
The tar format expects `/`, although some untar implementations do seem to handle Windows-style `\` path separators (7-Zip at least).
The tar.Writer API can't really enforce this, though, as doing so would effectively make `\` an illegal character when it's really not. So, it's up to the user to provide paths with the correct path separators.
`Build/WebServer.zig` will still output tars with `\` as a path separator on Windows, but that's currently only used during fuzzing which is not yet implemented on Windows.
Remove the `{D}` format specifier. It is moved into `std.Io.Duration` as
a format method.
Migration plan:
```diff
-writer.print("{D}", .{ns});
+writer.print("{f}", .{std.Io.Duration{ .nanoseconds = ns }});
```
All instances where `{D}` was used have been changed to use
`std.Io.Duration` and `{f}`.
Fixes#31281
-- On the standard library side:
The `input: []const u8` parameter of functions passed to `testing.fuzz`
has changed to `smith: *testing.Smith`. `Smith` is used to generate
values from libfuzzer or input bytes generated by libfuzzer.
`Smith` contains the following base methods:
* `value` as a generic method for generating any type
* `eos` for generating end-of-stream markers. Provides the additional
guarantee `true` will eventually by provided.
* `bytes` for filling a byte array.
* `slice` for filling part of a buffer and providing the length.
`Smith.Weight` is used for giving value ranges a higher probability of
being selected. By default, every value has a weight of zero (i.e. they
will not be selected). Weights can only apply to values that fit within
a u64. The above functions have corresponding ones that accept weights.
Additionally, the following functions are provided:
* `baselineWeights` which provides a set of weights containing every
possible value of a type.
* `eosSimpleWeighted` for unique weights for `true` and `false`
* `valueRangeAtMost` and `valueRangeLessThan` for weighing only a range
of values.
-- On the libfuzzer and abi side:
--- Uids
These are u32s which are used to classify requested values. This solves
the problem of a mutation causing a new value to be requested and
shifting all future values; for example:
1. An initial input contains the values 1, 2, 3 which are interpreted
as a, b, and c respectively by the test.
2. The 1 is mutated to a 4 which causes the test to request an extra
value interpreted as d. The input is now 4, 2, 3, 5 (new value) which
the test corresponds to a, d, b, c; however, b and c no longer
correspond to their original values.
Uids contain a hash component and type component. The hash component
is currently determined in `Smith` by taking a hash of the calling
`@returnAddress()` or via an argument in the corresponding `WithHash`
functions. The type component is used extensively in libfuzzer with its
hashmaps.
--- Mutations
At the start of a cycle (a run), a random number of values to mutate is
selected with less being exponentially more likely. The indexes of the
values are selected from a selected uid with a logarithmic bias to uids
with more values.
Mutations may change a single values, several consecutive values in a
uid, or several consecutive values in the uid-independent order they
were requested. They may generate random values, mutate from previous
ones, or copy from other values in the same uid from the same input or
spliced from another.
For integers, mutations from previous ones currently only generates
random values. For bytes, mutations from previous mix new random data
and previous bytes with a set number of mutations.
--- Passive Minimization
A different approach has been taken for minimizing inputs: instead of
trying a fixed set of mutations when a fresh input is found, the input
is instead simply added to the corpus and removed when it is no longer
valuable.
The quality of an input is measured based off how many unique pcs it
hit and how many values it needed from the fuzzer. It is tracked which
inputs hold the best qualities for each pc for hitting the minimum and
maximum unique pcs while needing the least values.
Once all an input's qualities have been superseded for the pcs it hit,
it is removed from the corpus.
-- Comparison to byte-based smith
A byte-based smith would be much more inefficient and complex than this
solution. It would be unable to solve the shifting problem that Uids
do. It is unable to provide values from the fuzzer past end-of-stream.
Even with feedback, it would be unable to act on dynamic weights which
have proven essential with the updated tests (e.g. to constrain values
to a range).
-- Test updates
All the standard library tests have been updated to use the new smith
interface. For `Deque`, an ad hoc allocator was written to improve
performance and remove reliance on heap allocation. `TokenSmith` has
been added to aid in testing Ast and help inform decisions on the smith
interface.
The end of the archive needs to also be aligned to a two-byte boundary,
not just the start of records. This was causing lld to reject archives.
Notably, this was happening with compiler_rt when rebuilding in fuzz
mode, which is why this commit is included in this patchset.
Importantly, adds ability to get Clock resolution, which may be zero.
This allows error.Unexpected and error.ClockUnsupported to be removed
from timeout and clock reading error sets.