I had this idea to make b.dupe() also intern the strings since they will
be ultimately serialized to Configuration. Unfortunately the idea does
not work, because although a process-lived arena is used for the
string_bytes ArrayList of the Configuration.Wip, when the ArrayList is
resized, Allocator.free() memsets the freed memory to undefined, even
though it still technically lives due to being in a process-scoped
arena. So this commit will need to be partially reverted. However, I
kept it for posterity, and there are some more changes which I will now
note below.
- dupePaths: don't rewrite backslashes to forward slashes. backslashes
are valid in filenames on non-windows systems.
- always compile configurer in single-threaded mode
- use arena allocator for everything, no gpa for anything
- construct the Configuration.Wip instance earlier, so some stuff can be
prepopulated as desired.
- don't forget to flush
Number of generated files is recorded in serialized Configuration. Maker
preallocates array of generated files so that loads and stores can be
synchronization-free (protected by the dependency tree ordering).
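As a rough, language-neutral illustration of the preallocation idea
(Python here, not the actual Maker code), each producing step writes
only its own slot in an array sized up front. The names `run_steps`,
`producers`, and `slots` are hypothetical, and joining the threads
stands in for the dependency-tree ordering:

```python
import threading


def run_steps(generated_file_count, producers):
    """producers: list of (slot_index, path) pairs, one per producing step."""
    # Preallocate one slot per generated file. In this sketch the count is
    # passed in; the text above says it comes from the serialized
    # Configuration.
    slots = [None] * generated_file_count

    def produce(index, path):
        # Each producer stores only to its own slot, so no lock is taken.
        slots[index] = path

    threads = [threading.Thread(target=produce, args=p) for p in producers]
    for t in threads:
        t.start()
    # Joining stands in for the dependency edge: consumers only run after
    # the steps they depend on have finished.
    for t in threads:
        t.join()
    return slots
```

Because the array never grows after creation, no step can observe a
slot being moved while another step is writing a different slot.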
More progress on Compile Step Zig CLI lowering.
next thing to do is figure out how LazyPath is supposed to work now.
something like this:
* each Step that provides LazyPath objects has a setLazyPath and
getLazyPath function which takes a tagged union identifying which one
to access
* steps that fulfill LazyPath objects can freely call setLazyPath
without obtaining a lock because the dependency graph prevents
simultaneous access.
* similarly, steps that access LazyPath results can freely call
getLazyPath without obtaining a lock, because once the value has been
set, any simultaneous accesses from dependent steps are read-only.
* a fulfilled LazyPath object is a read-only std.Build.Cache.Path.
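A minimal sketch of what that interface could look like, written in
Python only to show the shape. `Output`, `CompileStep`, and the tag
names are hypothetical, and an immutable `PurePosixPath` stands in for
a read-only std.Build.Cache.Path:

```python
from enum import Enum, auto
from pathlib import PurePosixPath


class Output(Enum):
    # Hypothetical tags; stands in for the tagged union that identifies
    # which LazyPath of a step is meant.
    EMITTED_BIN = auto()
    EMITTED_HEADER = auto()


class CompileStep:
    def __init__(self, name):
        self.name = name
        self._results = {}

    def set_lazy_path(self, which, path):
        # Called by the step that fulfills the LazyPath while it runs.
        # No lock: the dependency graph keeps readers out until then.
        self._results[which] = PurePosixPath(path)

    def get_lazy_path(self, which):
        # Called by dependent steps after this step has finished; the
        # returned object is immutable, so concurrent reads are safe.
        return self._results[which]
```

A dependent step would call `get_lazy_path(Output.EMITTED_BIN)` only
after the producing step has completed, which is what makes the
lock-free access sound.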
`zig build` CLI kicks off async task to compile optimized make runner
executable, does fetch, compiles configure process in debug mode, then
checks cache for the CLI options that affect configuration only. On hit,
skips building/running the configure script. On miss, runs it, saves
result in cache.
The cached artifact is a "configuration" file - a serialized build step
graph, which also includes unlazy package dependencies and additional
file system dependencies.
Next, awaits task for compiling optimized make runner executable, passes
configuration file to it. Make runner is responsible for the CLI after
that point.
For the use case of detecting when `git describe` needs to be rerun, we
can allow the configure process to manually add file system mtime
dependencies; in this case they would be on `.git/index` and `.git/HEAD`.
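One way to picture an mtime-only dependency, sketched in Python rather
than the real cache code: record each file's modification time when
the configuration is produced, and treat the cached result as stale if
any of them changes or disappears. `record_mtimes` and `is_stale` are
hypothetical helper names:

```python
import os


def record_mtimes(paths):
    # Remember only metadata (mtime in nanoseconds), never file contents.
    return {p: os.stat(p).st_mtime_ns for p in paths}


def is_stale(recorded):
    # Stale if any recorded file has a different mtime or no longer exists.
    for path, mtime_ns in recorded.items():
        try:
            if os.stat(path).st_mtime_ns != mtime_ns:
                return True
        except FileNotFoundError:
            return True
    return False
```

With `.git/index` and `.git/HEAD` recorded this way, a new commit or
checkout would invalidate the cached configuration without hashing
either file.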
This will enable two optimizations:
1. The bulk of the build system will not be rebuilt when user changes
their configure script.
2. The user logic can be completely bypassed when the CLI options
provided do not affect the configure phase - even if they affect the
make phase.
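The hit/miss flow and the second optimization can be sketched as
follows. This is an illustration in Python, not the `zig build`
implementation: the split in `CONFIGURE_OPTIONS`, the SHA-256 key, and
the one-file-per-key cache layout are all assumptions made for the
example:

```python
import hashlib
import json
import os

# Hypothetical split: only these options influence the configure phase.
CONFIGURE_OPTIONS = {"target", "optimize"}


def configure_key(cli_options):
    # Options that only affect the make phase are left out of the key.
    relevant = {k: v for k, v in cli_options.items() if k in CONFIGURE_OPTIONS}
    blob = json.dumps(relevant, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def load_configuration(cache_dir, cli_options, run_configure):
    """Return (configuration_bytes, cache_hit)."""
    path = os.path.join(cache_dir, configure_key(cli_options))
    if os.path.exists(path):
        # Hit: the user's configure logic is skipped entirely.
        with open(path, "rb") as f:
            return f.read(), True
    # Miss: run the configure process and save its serialized result.
    data = run_configure(cli_options)
    with open(path, "wb") as f:
        f.write(data)
    return data, False
```

Changing an option outside `CONFIGURE_OPTIONS` (for example a job
count) leaves the key unchanged, so the cached configuration is reused
and only the make phase sees the new value.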
Remaining tasks in the branch:
* some stuff in `zig build` CLI is `@panic("TODO")`.
* configure runner needs to implement serialization of build graph using
std.zig.Configuration
* build runner needs to be transformed into make runner, consuming
configuration file as input and deserializing the step graph.
* introduce depending only on a file's metadata and *not* its contents
into the cache system, and add a std.Build API for using it.
Implements a thread-safe allocator with the following guarantees:
* `deinit` reports all leaks and frees all backing memory.
* All allocation mismatches result in either a panic or segmentation
fault.
* Allocations from other `SafeAllocator` instances cause a panic (if
`Options.canary` differs).
* Double frees and races between operations (resize, remap, and free)
result in a panic or segmentation fault.
If the backing allocator does not reuse memory, this allocator does not
reuse memory either, and:
* Most writes after free either cause a segmentation fault or are
eventually detected and cause a panic.
`std.heap.DebugAllocator` has been deprecated (I have also deprecated
`std.heap.Check` since this was its last usage and returning a `usize`
leak count is a much cleaner approach).
- General Design
Every allocation is trailed by an `AllocFooter` which contains metadata
for the allocation and stack traces. It is protected by a checksum to
catch corruption from allocation overwrites and report canary
mismatches. An allocation's memory has a minimum alignment of
`AllocFooter` so that the footer is at a fixed offset determined from
the allocation size. An allocation's memory is stored either:
* Inside linearly-filled buckets for small allocations.
* Inside an allocation directly from the backing allocator.
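To make the footer idea concrete, here is a small Python model. It is
not the real `AllocFooter`: the field layout, the CRC32 checksum, and
the 8-byte alignment are assumptions for illustration, and stack
traces are left out. Rounding the size up to the footer alignment is
what puts the footer at an offset computable from the size alone:

```python
import struct
import zlib

# Hypothetical footer layout: size (u64), canary (u64), checksum (u32),
# followed by 4 padding bytes. The real AllocFooter also stores stack traces.
FOOTER = struct.Struct("<QQI4x")
FOOTER_ALIGN = 8  # assumed alignment of the footer


def footer_offset(size):
    # Round the allocation size up to the footer alignment, so the footer
    # position depends only on the size.
    return (size + FOOTER_ALIGN - 1) // FOOTER_ALIGN * FOOTER_ALIGN


def alloc(size, canary):
    off = footer_offset(size)
    block = bytearray(off + FOOTER.size)
    checksum = zlib.crc32(struct.pack("<QQ", size, canary))
    FOOTER.pack_into(block, off, size, canary, checksum)
    return block


def check(block, size, canary):
    f_size, f_canary, f_checksum = FOOTER.unpack_from(block, footer_offset(size))
    if zlib.crc32(struct.pack("<QQ", f_size, f_canary)) != f_checksum:
        return "corrupt footer"   # e.g. a write past the end of the allocation
    if f_canary != canary:
        return "canary mismatch"  # allocation belongs to another instance
    if f_size != size:
        return "size mismatch"    # freed with the wrong length
    return "ok"
```

In this model, a write that runs past the end of the allocation damages
the footer and fails the checksum, while an intact footer with a
different canary is reported separately.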
To track allocations, each thread maintains a table of backing
allocations. The table may be modified by other threads in the case of
a producer-consumer operation, so the table is a linked list only
expanded by creating new segments. Each thread maintains a linked list
of free entries, which may contain entries from other threads' tables.
In the case of producer-consumer operations, acquire/release ordering
is assumed to be provided externally. This is also assumed by all other
thread-safe allocators that reuse memory as otherwise there would be
data races on reuse of allocated memory.
- Fuzz Tests
Two fuzz tests have also been added for the allocator. They check that
there is no memory reuse, that returned memory is writable, and that
it is not overwritten. The multi-threaded fuzz test spawns a number of
worker threads which are used for all the test runs. I have run these
tests extensively under TSAN.
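The three properties being checked can be pictured with a toy,
single-threaded harness. This is only a Python sketch of the kind of
check described, not the actual fuzz tests; `BumpAllocator` is a
hypothetical stand-in that never reuses memory:

```python
import random


class BumpAllocator:
    """Toy stand-in allocator: never reuses memory; free is a no-op."""

    def __init__(self, capacity):
        self.memory = bytearray(capacity)
        self.end = 0

    def alloc(self, size):
        start = self.end
        if start + size > len(self.memory):
            raise MemoryError("toy allocator exhausted")
        self.end = start + size
        return start

    def free(self, start, size):
        pass


def fuzz(seed, steps=200):
    rng = random.Random(seed)
    a = BumpAllocator(64 * 1024)
    live = {}  # start -> (size, fill byte) for allocations not yet freed
    seen = []  # every (start, size) range ever returned
    for _ in range(steps):
        if live and rng.random() < 0.4:
            start = rng.choice(sorted(live))
            size, fill = live.pop(start)
            # Not overwritten: the pattern written earlier is still there.
            assert a.memory[start:start + size] == bytes([fill]) * size
            a.free(start, size)
        else:
            size = rng.randint(1, 128)
            start = a.alloc(size)
            # No reuse: the new range overlaps nothing returned before.
            for s, n in seen:
                assert start + size <= s or s + n <= start
            # Writable: fill the whole allocation with a known byte.
            fill = rng.randrange(256)
            a.memory[start:start + size] = bytes([fill]) * size
            live[start] = (size, fill)
            seen.append((start, size))
    for start, (size, fill) in live.items():
        assert a.memory[start:start + size] == bytes([fill]) * size
    return len(seen)
```

Each seed drives a different interleaving of allocations and frees; the
return value is the number of allocations performed.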
- Performance Measurements
Building the standard library tests with a ReleaseSafe compiler build
and `-Ddebug-allocator`:
```
Benchmark 1 (3 runs): ./master-out/bin/zig test --zig-lib-dir lib lib/std/std.zig -femit-bin=test --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          29.4s  ± 157ms     29.2s  … 29.5s           0 ( 0%)        0%
  peak_rss           2.24GB ± 3.49MB    2.23GB … 2.24GB          0 ( 0%)        0%
  cpu_cycles          143G  ± 999M       142G  …  144G           0 ( 0%)        0%
  instructions        268G  ± 5.22M      268G  …  268G           0 ( 0%)        0%
  cache_references   13.1G  ± 88.8M     13.0G  … 13.2G           0 ( 0%)        0%
  cache_misses       2.38G  ± 30.7M     2.35G  … 2.41G           0 ( 0%)        0%
  branch_misses       634M  ± 6.22M      629M  …  641M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./branch-out/bin/zig test --zig-lib-dir lib lib/std/std.zig -femit-bin=test --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          22.1s  ± 88.6ms    22.0s  … 22.2s           0 ( 0%)        ⚡- 24.7% ±  1.0%
  peak_rss           1.11GB ±  799KB    1.11GB … 1.11GB          0 ( 0%)        ⚡- 50.3% ±  0.3%
  cpu_cycles          136G  ±  480M      136G  …  137G           0 ( 0%)        ⚡-  4.4% ±  1.2%
  instructions        273G  ± 2.07M      273G  …  273G           0 ( 0%)        💩+  1.6% ±  0.0%
  cache_references   12.3G  ± 71.3M     12.2G  … 12.4G           0 ( 0%)        ⚡-  6.0% ±  1.4%
  cache_misses       2.02G  ± 11.5M     2.01G  … 2.03G           0 ( 0%)        ⚡- 14.9% ±  2.2%
  branch_misses       569M  ± 2.65M      567M  …  572M           0 ( 0%)        ⚡- 10.2% ±  1.7%
```
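As a sanity check on the delta column, the relative change can be
recomputed from the displayed means. The sketch below uses the rounded
values from the table, so it lands within about 0.1 percentage points
of the reported -24.7% and -50.3%; presumably the benchmark tool works
from unrounded measurements (an assumption on my part):

```python
def relative_delta(baseline, candidate):
    # Percentage change of the candidate relative to the baseline.
    return (candidate - baseline) / baseline * 100.0


# Means as displayed in the table above (rounded by the tool).
wall_time_delta = relative_delta(29.4, 22.1)  # seconds
peak_rss_delta = relative_delta(2.24, 1.11)   # GB
```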