After https://codeberg.org/ziglang/zig/pulls/30103, `raw_c_allocator` is
redundant. It existed to avoid overhead when you could assert that all
of your `Allocator` usage was going to be compatible with the C `malloc`
API, but the standard `c_allocator` is now able to avoid that overhead
*in the case* that your usage is compatible (and use the less efficient
path in the rare case where it's not), so there's no need for the raw
version anymore. Leaving it in `std.heap` at this point seems like it
would just be a footgun.
This was presumably unintentional, as the old behavior looked quite odd
when you noticed it. All of the line-drawing characters ought to be the
same color; only the step name/status is dimmed when a step is "reused"
(i.e. appears multiple times in the tree).
https://github.com/ziglang/zig/issues/26027#issuecomment-3571227050
tracked some bad performance in `DebugAllocator` on macOS down to a
function in dyld which `std.debug.SelfInfo` was calling into. It turns
out `dladdr`'s symbol lookup logic is horrendously slow (looking at its
source code, it appears to be doing a *linear scan* over all symbols in
the image?!). However, we don't actually need the symbol, so we want to
try and avoid this logic.
Luckily, dyld has more precise APIs for what we need! Unluckily, Apple,
in their infinite wisdom, decided they should be deprecated in favour of
`dladdr`, despite the latter being several times slower (and by "several
times", I have measured a 50x slowdown on repeated calls to `dladdr`
compared to the other API). But luckily again, the deprecated APIs are
still exposed.
So, after a careful analysis of the situation (reading dyld code and
cursing Apple engineers), I think it makes sense to just use these
deprecated APIs for now. If they ever go away, we can write our own
cache for this data to bypass Apple's awfully slow code, but I suspect
these functions will stick around for the foreseeable future.
Uh, and if `_dyld_get_image_header_containing_address` goes away,
there's also `dyld_image_header_containing_address`, which is a
seemingly identical function, exported by dyld just the same, but with a
separate (functionally identical) implementation, and not documented in
the public header file. Apple work in mysterious ways, I guess.
The main goal here was to avoid allocating padding and header space if
`malloc` already guarantees the alignment we need via `max_align_t`.
Previously, the compiler was using `std.heap.raw_c_allocator` as its GPA
in some cases depending on `std.c.max_align_t`, but that's pretty
fragile (it meant we had to encode our alignment requirements into
`src/main.zig`!). Perhaps more importantly, that solution is
unnecessarily restrictive: since Zig's `Allocator` API passes the
`Alignment` not only to `alloc`, but also to `free` etc, we are able to
use a different strategy depending on its value. So `c_allocator` can
simply compare the requested align to `Alignment.of(std.c.max_align_t)`,
and use a raw `malloc` call (no header needed!) if it will guarantee a
suitable alignment (which, in practice, will be true the vast majority
of the time).
So in short, this makes `std.heap.c_allocator` more memory efficient,
and probably removes any incentive to use `std.heap.raw_c_allocator`.
I also refactored the `c_allocator` implementation while doing this,
just to neaten things up a little.
This avoids pessimizing concurrency on all machines due to e.g. the macOS
machine having high memory usage across the board due to 16K page size.
This also adds max_rss to test-unit and test-c-abi since those tend to eat a
decent chunk of memory too.
If ECANCELED occurs, it's from pthread_cancel which will *permanently*
set that thread to be in a "canceling" state, which means the cancel
cannot be ignored. That means it cannot be retried, like EINTR. It must
be acknowledged.
Unfortunately, trying again until the cancellation request is
acknowledged has been observed to incur a large amount of overhead,
and usually strong cancellation guarantees are not needed, so the
race condition is not handled here. Users who want to avoid this
have this menu of options instead:
* Use no libc, in which case Zig std lib can avoid the race (tracking
issue: https://codeberg.org/ziglang/zig/issues/30049)
* Use musl libc
* Use `std.Io.Evented`. But this is not implemented yet. Tracked by
- https://codeberg.org/ziglang/zig/issues/30050
- https://codeberg.org/ziglang/zig/issues/30051
glibc + threaded is the only problematic combination.
On a heavily loaded Linux 6.17.5, I observed a maximum of 20 attempts
not acknowledged before the timeout (including exponential backoff) was
sufficient, despite the heavy load.
The time wasted here sleeping is mitigated by the fact that, later on,
the system will likely wait for the canceled task, causing it to
indefinitely yield until the canceled task finishes, and the task must
acknowledge the cancel before it proceeds to that point.