79 Commits

Author SHA1 Message Date
Matthew Lugg fdac89d6cd remove uses of array multiplication
In preparation for its removal as accepted in
https://github.com/ziglang/zig/issues/24738.
2026-04-30 08:57:51 +01:00
Ryan Liptak 3252a05531 Prefer <err> => |e| return e over <err> => return <err>
Avoids the potential for a typo on the `return <err>` side of the prong
2026-04-20 18:03:14 -07:00
Kendall Condon d8ba173e5e multiprocess fuzzing
- New Features

-- Multiprocess Fuzzing

The fuzzer now is able to utilize multiple cores. This is controllable
with the `-j` build option. Limited fuzzing still uses one core.

-- Fuzzing Infinite Mode

When provided multiple tests, the fuzzer now switches between them and
prioritizes the most effective and interesting ones. Over time already
explored tests will become barely run compared to tests yielding new
inputs.

-- Crash Dumps

Crashing inputs are now saved to a file indicated by the crash message.
It is recommended to use these files to reproduce the crash using
`std.testing.FuzzInputOptions.corpus` and @embedFile.

- Design

Each fuzzing process is assigned an instance id which has the following
uses:
* In conjunction with the pc hash and running test index, they uniquely
  identify input files in the case of a crash.
* It is combined with the test seed for a unique rng seed.
* Instance 0 is solely responsible for syncing the filesystem corpus.

When new inputs are found, they are sent to the build server. It then
distributes the new input to the other instances. Each instance has a
concurrent poller managed by the test runner which sends received
inputs to libfuzzer. (note that this is affected by #31718 and so can
(rarely) deadlock)

For fuzzing infinite mode, the test runner now receives a list of tests
from the build server. The fuzzer runs tests in batches of one second,
approximated in cycles by the previous batch's run speed. Tests finding
new inputs or with few runs are given a higher run chance. The baseline
run chance is based off the recency of the last find and the number of
pcs the test has hit.
2026-04-03 12:27:34 +02:00
Kendall Condon 6e5a95bd7c implement proper deflate flush semantics
To end a flate stream, `finish` must now be called. `flush` now follows
regular semantics and byte-aligns the stream.

Byte-aligning the stream is done with empty fixed or store blocks. To
implement flush, a variable history length was added and it is tracked
if the final bytes of history have been hashed yet.
2026-03-11 02:28:19 +01:00
Andrew Kelley e0173c2ce0 Merge pull request 'rework fuzz testing to be smith based' (#31205) from gooncreeper/zig:integrated-smith into master
Reviewed-on: https://codeberg.org/ziglang/zig/pulls/31205
Reviewed-by: Andrew Kelley <andrew@ziglang.org>
2026-02-25 20:23:36 +01:00
Kendall Condon bb304796f4 optimize flate decompression
Matches now use memcpy and memset when possible.

Block loops have been rewritten to be more optimizer friendly.

Reworks Symbol and HuffmanDecoder
* Symbol now only includes the value and number of code bits.
  decodeSymbol returns only the value.
* HuffmanDecoder now takes the regular bits instead of the reversed.
* Code table construction now uses buckets instead of sorting.
* For linked codes, the value field of Symbol is now used as the next
  index. The actual value is the element index.
* InvalidCode is now detected only once with a special linked index.

Performance is 39.7% faster than before and 1.1% faster than gzip using
a sample created from compressing a tar of the src directory.
2026-02-25 20:05:48 +01:00
Kendall Condon 5d58306162 rework fuzz testing to be smith based
-- On the standard library side:

The `input: []const u8` parameter of functions passed to `testing.fuzz`
has changed to `smith: *testing.Smith`. `Smith` is used to generate
values from libfuzzer or input bytes generated by libfuzzer.

`Smith` contains the following base methods:
* `value` as a generic method for generating any type
* `eos` for generating end-of-stream markers. Provides the additional
  guarantee `true` will eventually by provided.
* `bytes` for filling a byte array.
* `slice` for filling part of a buffer and providing the length.

`Smith.Weight` is used for giving value ranges a higher probability of
being selected. By default, every value has a weight of zero (i.e. they
will not be selected). Weights can only apply to values that fit within
a u64. The above functions have corresponding ones that accept weights.
Additionally, the following functions are provided:
* `baselineWeights` which provides a set of weights containing every
  possible value of a type.
* `eosSimpleWeighted` for unique weights for `true` and `false`
* `valueRangeAtMost` and `valueRangeLessThan` for weighing only a range
  of values.

-- On the libfuzzer and abi side:

--- Uids

These are u32s which are used to classify requested values. This solves
the problem of a mutation causing a new value to be requested and
shifting all future values; for example:

1. An initial input contains the values 1, 2, 3 which are interpreted
as a, b, and c respectively by the test.

2. The 1 is mutated to a 4 which causes the test to request an extra
value interpreted as d. The input is now 4, 2, 3, 5 (new value) which
the test corresponds to a, d, b, c; however, b and c no longer
correspond to their original values.

Uids contain a hash component and type component. The hash component
is currently determined in `Smith` by taking a hash of the calling
`@returnAddress()` or via an argument in the corresponding `WithHash`
functions. The type component is used extensively in libfuzzer with its
hashmaps.

--- Mutations

At the start of a cycle (a run), a random number of values to mutate is
selected with less being exponentially more likely. The indexes of the
values are selected from a selected uid with a logarithmic bias to uids
with more values.

Mutations may change a single values, several consecutive values in a
uid, or several consecutive values in the uid-independent order they
were requested. They may generate random values, mutate from previous
ones, or copy from other values in the same uid from the same input or
spliced from another.

For integers, mutations from previous ones currently only generates
random values. For bytes, mutations from previous mix new random data
and previous bytes with a set number of mutations.

--- Passive Minimization

A different approach has been taken for minimizing inputs: instead of
trying a fixed set of mutations when a fresh input is found, the input
is instead simply added to the corpus and removed when it is no longer
valuable.

The quality of an input is measured based off how many unique pcs it
hit and how many values it needed from the fuzzer. It is tracked which
inputs hold the best qualities for each pc for hitting the minimum and
maximum unique pcs while needing the least values.

Once all an input's qualities have been superseded for the pcs it hit,
it is removed from the corpus.

-- Comparison to byte-based smith

A byte-based smith would be much more inefficient and complex than this
solution. It would be unable to solve the shifting problem that Uids
do. It is unable to provide values from the fuzzer past end-of-stream.
Even with feedback, it would be unable to act on dynamic weights which
have proven essential with the updated tests (e.g. to constrain values
to a range).

-- Test updates

All the standard library tests have been updated to use the new smith
interface. For `Deque`, an ad hoc allocator was written to improve
performance and remove reliance on heap allocation. `TokenSmith` has
been added to aid in testing Ast and help inform decisions on the smith
interface.
2026-02-13 22:12:19 -05:00
Andrew Kelley ee21a1f988 fetch: implement recompression
After fetching a package and applying the filter by deleting files that
are not part of the hash, creates a recompressed $GLOBAL_CACHE/p/$PKG_HASH.tar.gz

Checking this cache before fetching network URLs is not yet implemented.
2026-02-05 16:50:41 -08:00
Adrià Arrufat 02c5f05e2f std: replace usages of std.mem.indexOf with std.mem.find 2025-12-05 14:31:27 +01:00
Kendall Condon 8284da2f3d flate.Compress: simplify huffman node comparisons
Instead of comparing each field, nodes are now compared as 32-bit
values where `freq` is in the most significant bits.
2025-11-22 22:11:33 -08:00
Kendall Condon f50c647977 add deflate compression, simplify decompression
Implements deflate compression from scratch. A history window is kept in
the writer's buffer for matching and a chained hash table is used to
find matches. Tokens are accumulated until a threshold is reached and
then outputted as a block. Flush is used to indicate end of stream.

Additionally, two other deflate writers are provided:
* `Raw` writes only in store blocks (the uncompressed bytes). It
  utilizes data vectors to efficiently send block headers and data.
* `Huffman` only performs Huffman compression on data and no matching.

The above are also able to take advantage of writer semantics since they
do not need to keep a history.

Literal and distance code parameters in `token` have also been reworked.
Their parameters are now derived mathematically, however the more
expensive ones are still obtained through a lookup table (expect on
ReleaseSmall).

Decompression bit reading has been greatly simplified, taking advantage
of the ability to peek on the underlying reader. Additionally, a few
bugs with limit handling have been fixed.
2025-09-30 18:28:47 -07:00
Andrew Kelley 79f267f6b9 std.Io: delete GenericReader
and delete deprecated alias std.io
2025-08-29 17:14:26 -07:00
Andrew Kelley 30b41dc510 std.compress.zstd.Decompress fixes
* std.Io.Reader: appendRemaining no longer supports alignment and has
  different rules about how exceeding limit. Fixed bug where it would
  return success instead of error.StreamTooLong like it was supposed to.

* std.Io.Reader: simplify appendRemaining and appendRemainingUnlimited
  to be implemented based on std.Io.Writer.Allocating

* std.Io.Writer: introduce unreachableRebase

* std.Io.Writer: remove minimum_unused_capacity from Allocating. maybe
  that flexibility could have been handy, but let's see if anyone
  actually needs it. The field is redundant with the superlinear growth
  of ArrayList capacity.

* std.Io.Writer: growingRebase also ensures total capacity on the
  preserve parameter, making it no longer necessary to do
  ensureTotalCapacity at the usage site of decompression streams.

* std.compress.flate.Decompress: fix rebase not taking into account seek

* std.compress.zstd.Decompress: split into "direct" and "indirect" usage
  patterns depending on whether a buffer is provided to init, matching
  how flate works. Remove some overzealous asserts that prevented buffer
  expansion from within rebase implementation.

* std.zig: fix readSourceFileToAlloc returning an overaligned slice
  which was difficult to free correctly.

fixes #24608
2025-08-15 10:44:35 -07:00
Andrew Kelley af7e142485 std.Io.Writer: introduce rebase to the vtable
fixes #24814
2025-08-14 12:56:37 -07:00
Isaac Freund b8124d9c0b std.io.Writer.Allocating: rename getWritten() to written()
This "get" is useless noise and was copied from FixedBufferWriter.
Since this API has not yet landed in a release, now is a good time
to make the breaking change to fix this.
2025-08-13 01:43:52 -07:00
Andrew Kelley df46ee61c4 std.Io.Writer.Allocating: configurable bump amount 2025-08-08 19:22:08 -07:00
Andrew Kelley 91a81d3846 std.compress.flate.Decompress: fix buffer size in test 2025-08-08 15:03:05 -07:00
Ryan Liptak 23fff3442d flate: Handle invalid block type
Fixes `panic: invalid enum value` when the type bits had the u2 value of 3.

Contributes towards #24741
2025-08-08 12:27:25 -07:00
Igor Anić 6de2310035 flate change bit reader Bits to usize (#24719)
Don't see why byte returned from specialPeek needs to be shifted by
remaining_needed_bits.
I believe that decision in specialPeek should be done on the number of
the remaining bits not of the content of that bits.

Some test result are changed, but they are now consistent with the
original state as found in:
https://github.com/ziglang/zig/blame/5f790464b0d5da3c4c1a7252643e7cdd4c4b605e/lib/std/compress/flate/Decompress.zig

Changing Bits from usize to u32 or u64 now returns same results.

* flate: simplify peekBitsEnding

`peekBits` returns at most asked number of bits. Fails with EndOfStream
when there are no available bits. If there are less bits available than
asked still returns that available bits.
Hopefully this change better reflects intention. On first input stream
peek error we break the loop.
2025-08-07 14:40:08 -07:00
Igor Anić d2149106a6 flate zlib fix end of block reading
`n` is wanted number of bits to toss
`buffered_n` is actual number of bytes in `next_int`
2025-08-05 17:09:41 -07:00
Ian Johnson 96be6f6566 std.compress.flate.Decompress: return correct size for unbuffered decompression
Closes #24686

As a bonus, this commit also makes the `git.zig` "testing `main`" compile again.
2025-08-04 19:32:14 -07:00
Andrew Kelley a6f7927764 std.compress.flate.Decompress: use 64 buffered bits
will have to find out why usize doesn't work for 32 bit targets some
other time
2025-08-01 09:04:27 -07:00
Andrew Kelley 64814dc986 std.compress.flate.Decompress: respect stream limit 2025-07-31 22:10:11 -07:00
Andrew Kelley a7808892f7 std.compress.flate.Decompress: be in indirect or direct mode
depending on whether buffered
2025-07-31 22:10:11 -07:00
Andrew Kelley 6eac56caf7 std.compress.flate.Decompress: allow users to swap out Writer 2025-07-31 22:10:11 -07:00
Andrew Kelley c9ff068391 std.compress: fix discard impl and flate error detection 2025-07-31 22:10:11 -07:00
Andrew Kelley 111305678c std: match readVec fn prototype exactly
this is not necessary according to zig language, but works around a flaw
in the C backend
2025-07-31 22:10:11 -07:00
Andrew Kelley 84e4343b0c fix test failures by adding readVec 2025-07-31 22:10:11 -07:00
Andrew Kelley afe9f3a9ec std.compress.flate.Decompress: implement readVec and discard 2025-07-31 22:10:11 -07:00
Andrew Kelley 6bcced31a0 fix 32-bit compilation 2025-07-31 22:10:11 -07:00
Andrew Kelley 42b10f08cc std.compress.flate.Decompress: delete bad unit tests
if I remove the last input byte from "don't read past deflate stream's
end" (on master branch), the test fails with error.EndOfStream. what,
then, is it supposed to be testing?
2025-07-31 22:10:11 -07:00
Andrew Kelley 5f790464b0 std.compress.flate.Decompress: hashing is out of scope
This API provides the data; applications can verify their own checksums.
2025-07-31 22:10:11 -07:00
Andrew Kelley 4741a16d9a putting stuff back does not require mutation 2025-07-31 22:10:11 -07:00
Andrew Kelley 05ce1f99a6 compiler: update to new flate API 2025-07-31 22:10:11 -07:00
Andrew Kelley 2569f4ff85 simplify tossBitsEnding 2025-07-31 22:10:11 -07:00
Andrew Kelley 5bc63794cc fix takeBitsEnding 2025-07-31 22:10:11 -07:00
Andrew Kelley 63f496c4f9 make takeBits deal with integers only 2025-07-31 22:10:11 -07:00
Andrew Kelley c00fb86db6 fix peekBitsEnding 2025-07-31 22:10:11 -07:00
Andrew Kelley 8ab91a6fe9 error.EndOfStream disambiguation 2025-07-31 22:10:11 -07:00
Andrew Kelley f644f40702 implement tossBitsEnding 2025-07-31 22:10:11 -07:00
Andrew Kelley 2d8d0dd9b0 std.compress.flate.Decompress: unfuck the test suite 2025-07-31 22:10:11 -07:00
Andrew Kelley c684b21b4f simplify test cases 2025-07-31 22:10:11 -07:00
Andrew Kelley ac4fbb427b std.compress.flate.Decompress: don't compute checksums
These have no business being in-bound; simply provide the expected
values to user code for maximum flexibility.
2025-07-31 22:10:11 -07:00
Andrew Kelley 5f571f53d6 refactor gzip test cases
zig newbies love using for loops in unit tests
2025-07-31 22:10:11 -07:00
Andrew Kelley e73ca2444e std.compress.flate.Decompress: implement peekBitsEnding and writeMatch 2025-07-31 22:10:11 -07:00
Andrew Kelley 7bf91d705c fix bit read not at eof 2025-07-31 22:10:11 -07:00
Andrew Kelley 73e5594c78 std.compress.flate.Decompress: fix bit read at eof 2025-07-31 22:10:11 -07:00
Andrew Kelley 9c8cb777d4 std.compress.flate.Decompress: implement more bit reading 2025-07-31 22:10:11 -07:00
Andrew Kelley 6509fa1cf3 std.compress.flate.Decompress: passing basic test case 2025-07-31 22:10:11 -07:00
Andrew Kelley 88ca750209 std.compress.flate.Decompress: add rebase impl 2025-07-31 22:10:11 -07:00