Commit Graph

20 Commits

Author SHA1 Message Date
Kendall Condon 5d58306162 rework fuzz testing to be smith based
-- On the standard library side:

The `input: []const u8` parameter of functions passed to `testing.fuzz`
has changed to `smith: *testing.Smith`. `Smith` is used to generate
values from libfuzzer or input bytes generated by libfuzzer.

`Smith` contains the following base methods:
* `value` as a generic method for generating any type
* `eos` for generating end-of-stream markers. Provides the additional
  guarantee `true` will eventually by provided.
* `bytes` for filling a byte array.
* `slice` for filling part of a buffer and providing the length.

`Smith.Weight` is used for giving value ranges a higher probability of
being selected. By default, every value has a weight of zero (i.e. they
will not be selected). Weights can only apply to values that fit within
a u64. The above functions have corresponding ones that accept weights.
Additionally, the following functions are provided:
* `baselineWeights` which provides a set of weights containing every
  possible value of a type.
* `eosSimpleWeighted` for unique weights for `true` and `false`
* `valueRangeAtMost` and `valueRangeLessThan` for weighing only a range
  of values.

-- On the libfuzzer and abi side:

--- Uids

These are u32s which are used to classify requested values. This solves
the problem of a mutation causing a new value to be requested and
shifting all future values; for example:

1. An initial input contains the values 1, 2, 3 which are interpreted
as a, b, and c respectively by the test.

2. The 1 is mutated to a 4 which causes the test to request an extra
value interpreted as d. The input is now 4, 2, 3, 5 (new value) which
the test corresponds to a, d, b, c; however, b and c no longer
correspond to their original values.

Uids contain a hash component and type component. The hash component
is currently determined in `Smith` by taking a hash of the calling
`@returnAddress()` or via an argument in the corresponding `WithHash`
functions. The type component is used extensively in libfuzzer with its
hashmaps.

--- Mutations

At the start of a cycle (a run), a random number of values to mutate is
selected with less being exponentially more likely. The indexes of the
values are selected from a selected uid with a logarithmic bias to uids
with more values.

Mutations may change a single values, several consecutive values in a
uid, or several consecutive values in the uid-independent order they
were requested. They may generate random values, mutate from previous
ones, or copy from other values in the same uid from the same input or
spliced from another.

For integers, mutations from previous ones currently only generates
random values. For bytes, mutations from previous mix new random data
and previous bytes with a set number of mutations.

--- Passive Minimization

A different approach has been taken for minimizing inputs: instead of
trying a fixed set of mutations when a fresh input is found, the input
is instead simply added to the corpus and removed when it is no longer
valuable.

The quality of an input is measured based off how many unique pcs it
hit and how many values it needed from the fuzzer. It is tracked which
inputs hold the best qualities for each pc for hitting the minimum and
maximum unique pcs while needing the least values.

Once all an input's qualities have been superseded for the pcs it hit,
it is removed from the corpus.

-- Comparison to byte-based smith

A byte-based smith would be much more inefficient and complex than this
solution. It would be unable to solve the shifting problem that Uids
do. It is unable to provide values from the fuzzer past end-of-stream.
Even with feedback, it would be unable to act on dynamic weights which
have proven essential with the updated tests (e.g. to constrain values
to a range).

-- Test updates

All the standard library tests have been updated to use the new smith
interface. For `Deque`, an ad hoc allocator was written to improve
performance and remove reliance on heap allocation. `TokenSmith` has
been added to aid in testing Ast and help inform decisions on the smith
interface.
2026-02-13 22:12:19 -05:00
Andrew Kelley 2fee64ceb0 update init template for new main API 2026-01-04 00:27:09 -08:00
Andrew Kelley 33e302d67a update remaining calls to std.Io.Threaded.init 2025-12-23 22:15:12 -08:00
Andrew Kelley 7d955274bb update the init templates to new std API 2025-12-23 22:15:11 -08:00
Andrew Kelley 749f10af49 std.ArrayList: make unmanaged the default 2025-08-11 15:52:49 -07:00
Andrew Kelley 8c9dfcbd0f std.Io: remove BufferedWriter 2025-08-08 17:17:53 -07:00
Andrew Kelley 0e37ff0d59 std.fmt: breaking API changes
added adapter to AnyWriter and GenericWriter to help bridge the gap
between old and new API

make std.testing.expectFmt work at compile-time

std.fmt no longer has a dependency on std.unicode. Formatted printing
was never properly unicode-aware. Now it no longer pretends to be.

Breakage/deprecations:
* std.fs.File.reader -> std.fs.File.deprecatedReader
* std.fs.File.writer -> std.fs.File.deprecatedWriter
* std.io.GenericReader -> std.io.Reader
* std.io.GenericWriter -> std.io.Writer
* std.io.AnyReader -> std.io.Reader
* std.io.AnyWriter -> std.io.Writer
* std.fmt.format -> std.fmt.deprecatedFormat
* std.fmt.fmtSliceEscapeLower -> std.ascii.hexEscape
* std.fmt.fmtSliceEscapeUpper -> std.ascii.hexEscape
* std.fmt.fmtSliceHexLower -> {x}
* std.fmt.fmtSliceHexUpper -> {X}
* std.fmt.fmtIntSizeDec -> {B}
* std.fmt.fmtIntSizeBin -> {Bi}
* std.fmt.fmtDuration -> {D}
* std.fmt.fmtDurationSigned -> {D}
* {} -> {f} when there is a format method
* format method signature
  - anytype -> *std.io.Writer
  - inferred error set -> error{WriteFailed}
  - options -> (deleted)
* std.fmt.Formatted
  - now takes context type explicitly
  - no fmt string
2025-07-07 22:43:51 -07:00
Andrew Kelley 0b3f0124dc std.io: move getStdIn, getStdOut, getStdErr functions to fs.File
preparing to rearrange std.io namespace into an interface

how to upgrade:

std.io.getStdIn() -> std.fs.File.stdin()
std.io.getStdOut() -> std.fs.File.stdout()
std.io.getStdErr() -> std.fs.File.stderr()
2025-07-07 22:43:51 -07:00
Loris Cro 180e8442af zig init: simplify templating logic (#24170)
and also rename `advancedPrint` to `bufferedPrint` in the zig init templates

These are left overs from my previous changes to zig init.

The new templating system removes LITNAME because the new restrictions on package names make it redundant with NAME, and the use of underscores for marking templated identifiers lets us template variable names while still keeping zig fmt happy.
2025-06-13 22:31:29 +00:00
Loris Cro 041eedc1cf zig init: appease zig fmt check
last commit introduced a templated variable name that made zig fmt angry
2025-06-02 15:42:21 +02:00
Loris Cro 1116d88196 zig init: add new --strip flag and improve template files
This commit introduces a new flag to generate a new Zig project using
`zig init` without comments for users who are already familiar with the
Zig build system.

Additionally, the generated files are now different. Previously we would
generate a set of files that defined a static library and an executable,
which real-life experience has shown to cause confusion to newcomers.

The new template generates one Zig module and one executable both in
order to accommodate the two most common use cases, but also to suggest
that a library could use a CLI tool (e.g. a parser library could use a
CLI tool that provides syntax checking) and vice-versa a CLI tool might
want to expose its core functionality as a Zig module.

All references to C interoperability are removed from the template under
the assumption that if you're tall enough to do C interop, you're also
tall enough to find your way around the build system. Experienced users
will still be able to use the current template and adapt it with minimal
changes in order to perform more advanced operations. As an example, one
only needs to change `b.addExecutable` to `b.addLibrary` to switch from
generating an executable to a dynamic (or static) library.
2025-06-02 13:13:56 +02:00
Andrew Kelley 67904e925d zig init: adjust template lang to allow zig fmt passthrough 2025-02-26 11:42:04 -08:00
Andrew Kelley d6a88ed74d introduce package id and redo hash format again
Introduces the `id` field to `build.zig.zon`.

Together with name, this represents a globally unique package
identifier. This field should be initialized with a 16-bit random number
when the package is first created, and then *never change*. This allows
Zig to unambiguously detect when one package is an updated version of
another.

When forking a Zig project, this id should be regenerated with a new
random number if the upstream project is still maintained. Otherwise,
the fork is *hostile*, attempting to take control over the original
project's identity.

`0x0000` is invalid because it obviously means a random number wasn't
used.

`0xffff` is reserved to represent "naked" packages.

Tracking issue #14288

Additionally:

* Fix bad path in error messages regarding build.zig.zon file.
* Manifest validates that `name` and `version` field of build.zig.zon
  are maximum 32 bytes.
* Introduce error for root package to not switch to enum literal for
  name.
* Introduce error for root package to omit `id`.
* Update init template to generate `id`
* Update init template to populate `minimum_zig_version`.
* New package hash format changes:
  - name and version limited to 32 bytes via error rather than truncation
  - truncate sha256 to 192 bits rather than 40 bits
  - include the package id

This means that, given only the package hashes for a complete dependency
tree, it is possible to perform version selection and know the final
size on disk, without doing any fetching whatsoever. This prevents
wasted bandwidth since package versions not selected do not need to be
fetched.
2025-02-26 11:42:03 -08:00
Andrew Kelley d789f1e5cf fuzzer: write inputs to shared memory before running
breaking change to the fuzz testing API; it now passes a type-safe
context parameter to the fuzz function.

libfuzzer is reworked to select inputs from the entire corpus.

I tested that it's roughly as good as it was before in that it can find
the panics in the simple examples, as well as achieve decent coverage on
the tokenizer fuzz test.

however I think the next step here will be figuring out why so many
points of interest are missing from the tokenizer in both Debug and
ReleaseSafe modes.

does not quite close #20803 yet since there are some more important
things to be done, such as opening the previous corpus, continuing
fuzzing after finding bugs, storing the length of the inputs, etc.
2025-02-11 13:39:20 -08:00
mlugg afc77f0603 init template: expand slightly, migrate from deprecated std.Build APIs 2024-12-18 01:49:14 +05:00
Andrew Kelley 9dc75f03e2 fix init template for new fuzz testing API 2024-09-11 13:41:29 -07:00
Andrew Kelley cf9f8de661 update comment in init template
Unit tests are not run from the install step.

closes #21123
2024-08-18 14:23:49 -07:00
Karim Mk 3b3c9d2081 Fix typo in init files. 2024-07-26 14:33:59 -07:00
Andrew Kelley 5d3a1cfdf5 update init template
* add fuzz example
* explain that you might want to delete main.zig or root.zig
2024-07-26 12:18:23 -07:00
Andrew Kelley f645022d16 merge zig init-exe and zig init-lib into zig init
Instead of `zig init-lib` and `zig init-exe`, now there is only
`zig init`, which initializes any of the template files that do not
already exist, and makes a package that contains both an executable and
a static library. The idea is that the user can delete whatever they
don't want. In fact, I think even more things should be added to the
build.zig template.
2023-11-20 23:01:45 -07:00