After https://codeberg.org/ziglang/zig/pulls/30103, `raw_c_allocator` is
redundant. It existed to avoid overhead when you could assert that all
of your `Allocator` usage was going to be compatible with the C `malloc`
API, but the standard `c_allocator` is now able to avoid that overhead
*in the case* that your usage is compatible (and use the less efficient
path in the rare case where it's not), so there's no need for the raw
version anymore. Leaving it in `std.heap` at this point seems like it
would just be a footgun.
https://github.com/ziglang/zig/issues/26027#issuecomment-3571227050
tracked some bad performance in `DebugAllocator` on macOS down to a
function in dyld which `std.debug.SelfInfo` was calling into. It turns
out `dladdr`'s symbol lookup logic is horrendously slow (looking at its
source code, it appears to be doing a *linear scan* over all symbols in
the image?!). However, we don't actually need the symbol, so we want to
try to avoid this logic.
Luckily, dyld has more precise APIs for what we need! Unluckily, Apple,
in their infinite wisdom, decided they should be deprecated in favour of
`dladdr`, despite the latter being several times slower (and by "several
times", I have measured a 50x slowdown on repeated calls to `dladdr`
compared to the other API). But luckily again, the deprecated APIs are
still exposed.
So, after a careful analysis of the situation (reading dyld code and
cursing Apple engineers), I think it makes sense to just use these
deprecated APIs for now. If they ever go away, we can write our own
cache for this data to bypass Apple's awfully slow code, but I suspect
these functions will stick around for the foreseeable future.
Uh, and if `_dyld_get_image_header_containing_address` goes away,
there's also `dyld_image_header_containing_address`, which is a
seemingly identical function, exported by dyld just the same, but with a
separate (functionally identical) implementation, and not documented in
the public header file. Apple work in mysterious ways, I guess.
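For reference, binding the deprecated API directly looks roughly like
this (a hypothetical sketch, not the actual `std.debug.SelfInfo` code;
the extern signature mirrors the declaration in `<mach-o/dyld.h>`):

```zig
const std = @import("std");
const macho = std.macho;

// Deprecated but still exported by dyld: returns the header of the
// image containing `addr`, skipping `dladdr`'s slow symbol lookup.
extern "c" fn _dyld_get_image_header_containing_address(
    addr: ?*const anyopaque,
) ?*const macho.mach_header;

fn imageHeaderContaining(addr: *const anyopaque) ?*const macho.mach_header {
    return _dyld_get_image_header_containing_address(addr);
}
```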
The main goal here was to avoid allocating padding and header space if
`malloc` already guarantees the alignment we need via `max_align_t`.
Previously, the compiler was using `std.heap.raw_c_allocator` as its GPA
in some cases depending on `std.c.max_align_t`, but that's pretty
fragile (it meant we had to encode our alignment requirements into
`src/main.zig`!). Perhaps more importantly, that solution is
unnecessarily restrictive: since Zig's `Allocator` API passes the
`Alignment` not only to `alloc`, but also to `free`, etc., we are able to
use a different strategy depending on its value. So `c_allocator` can
simply compare the requested alignment to `Alignment.of(std.c.max_align_t)`,
and use a raw `malloc` call (no header needed!) if it will guarantee a
suitable alignment (which, in practice, will be true the vast majority
of the time).
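A minimal sketch of that comparison (not the actual `std.heap` code,
and the over-aligned fallback path is elided):

```zig
const std = @import("std");
const Alignment = std.mem.Alignment;

const malloc_alignment = Alignment.of(std.c.max_align_t);

fn allocSketch(len: usize, alignment: Alignment) ?[*]u8 {
    if (@intFromEnum(alignment) <= @intFromEnum(malloc_alignment)) {
        // `malloc` already guarantees this alignment: no header needed,
        // and `free` can pass the pointer straight through to C `free`.
        return @ptrCast(std.c.malloc(len));
    }
    // Over-aligned case: over-allocate and store a header so the
    // original pointer can be recovered on free (elided in this sketch).
    return null;
}
```

Since `@intFromEnum` on an `Alignment` is the log2 of the alignment in
bytes, the integer comparison is exactly an alignment comparison.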
So in short, this makes `std.heap.c_allocator` more memory efficient,
and probably removes any incentive to use `std.heap.raw_c_allocator`.
I also refactored the `c_allocator` implementation while doing this,
just to neaten things up a little.
If ECANCELED occurs, it's from pthread_cancel, which *permanently* puts
the thread into a "canceling" state, which means the cancel cannot be
ignored. Unlike EINTR, it cannot simply be retried; it must be
acknowledged.
Unfortunately, trying again until the cancellation request is
acknowledged has been observed to incur a large amount of overhead,
and usually strong cancellation guarantees are not needed, so the
race condition is not handled here. Users who want to avoid this
have this menu of options instead:
* Use no libc, in which case Zig std lib can avoid the race (tracking
issue: https://codeberg.org/ziglang/zig/issues/30049)
* Use musl libc
* Use `std.Io.Evented`. But this is not implemented yet. Tracked by
- https://codeberg.org/ziglang/zig/issues/30050
- https://codeberg.org/ziglang/zig/issues/30051
glibc + threaded is the only problematic combination.
On a heavily loaded Linux 6.17.5, I observed a maximum of 20
unacknowledged attempts; the timeout (including exponential backoff) was
sufficient despite the heavy load.
The time wasted here sleeping is mitigated by the fact that, later on,
the system will likely wait for the canceled task, causing it to
indefinitely yield until the canceled task finishes, and the task must
acknowledge the cancel before it proceeds to that point.
Now, before a syscall is entered, beginSyscall is called, which may
return error.Canceled. After the syscall returns, whether with an error
or success, endSyscall is called. If the syscall returns EINTR, then
checkCancel is called.
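In sketch form (a hypothetical wrapper; the hook names follow the
description above, and their bodies here are stubs):

```zig
const std = @import("std");
const posix = std.posix;

// Stubs standing in for the real hooks described above.
fn beginSyscall() error{Canceled}!void {} // may observe a pending cancel request
fn endSyscall() void {}
fn checkCancel() error{Canceled}!void {} // checks whether the EINTR was a cancellation

fn cancelableRead(fd: posix.fd_t, buf: []u8) !usize {
    while (true) {
        try beginSyscall();
        const rc = std.c.read(fd, buf.ptr, buf.len);
        endSyscall();
        switch (posix.errno(rc)) {
            .SUCCESS => return @intCast(rc),
            .INTR => try checkCancel(), // retry, unless a cancel arrived
            else => |e| return posix.unexpectedErrno(e),
        }
    }
}
```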
`cancelRequested` is removed from the std.Io VTable for now, with plans
to replace it with a more powerful API that allows protection against
cancellation requests.
closes #25751
The inverse MixColumns operation is already used internally for
AES decryption, but it wasn’t exposed in the public API because
it didn’t seem necessary at the time.
Since then, several new AES-based block ciphers and permutations
(such as Vistrutah and Areion) have been developed, and they require
this operation to be implementable in Zig.
Rewrite `Reader.takeLeb128` to not use `takeMultipleOf7Leb128` and
instead (see the sketch after this list):
* Use byte-aligned integers
* Turn the main reading loop into an inlined loop of static length
* Special-case small integers (<= 7 bits)
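A hedged sketch of the unsigned path (not the actual implementation,
which also handles signed integers and the <= 7 bit fast path):

```zig
const std = @import("std");

fn takeUleb128Sketch(comptime T: type, r: *std.Io.Reader) !T {
    // Byte-aligned accumulator, even for odd-sized integer types.
    const U = if (@bitSizeOf(T) < 8) u8 else T;
    const max_bytes = comptime (@bitSizeOf(U) + 6) / 7;
    var result: U = 0;
    // Inlined loop of static length: max_bytes is comptime-known.
    inline for (0..max_bytes) |i| {
        const b = try r.takeByte();
        const payload: U = b & 0x7f;
        // On the final byte, bits that don't fit indicate overflow.
        result |= std.math.shlExact(U, payload, 7 * i) catch return error.Overflow;
        if (b & 0x80 == 0) return std.math.cast(T, result) orelse return error.Overflow;
    }
    return error.Overflow; // continuation bit still set past max_bytes
}
```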
Notably, signed and unsigned 32-bit integers see a 5x to 12x(!)
performance improvement.
Outside of that:
For u8, u16, and u64, performance increases ~1.5x to ~6x.
For i8, i16, and i64, performance increases ~1.5x to ~3.5x.
For integers whose bit width is a multiple of 7, performance is roughly
equal, within the margin of error.
Also expand test coverage.
Microbenchmark: https://zigbin.io/242cb1
Rewrite `writeLeb128` to no longer use `writeMultipleOf7Leb128` and
instead (see the sketch after this list):
* Make use of byte-aligned ints
* Special-case small numbers (fitting inside 7 bits)
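A hedged sketch of the unsigned path (not the actual implementation;
values under 128 naturally hit the single-byte case on the first
iteration):

```zig
const std = @import("std");

fn writeUleb128Sketch(w: *std.Io.Writer, value: anytype) !void {
    const T = @TypeOf(value); // assumed to be an unsigned runtime integer type
    // Byte-aligned accumulator, even for odd-sized integer types.
    var remaining: if (@bitSizeOf(T) < 8) u8 else T = value;
    while (true) {
        const low7: u8 = @truncate(remaining & 0x7f);
        remaining >>= 7;
        if (remaining == 0) return w.writeByte(low7);
        try w.writeByte(low7 | 0x80); // continuation bit: more bytes follow
    }
}
```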
Amongst u8, u16, u32, and u64, performance gains are between ~1.5x and ~2x.
Amongst i8, i16, i32, and i64, performance gains are between ~2x and >4x.
Additionally, add test coverage for written encodings.
Microbenchmark: https://zigbin.io/7ed5fe
Hybrid KEMs combine a post-quantum secure KEM with a traditional
elliptic curve Diffie-Hellman key exchange.
The hybrid construction provides security against both classical and quantum
adversaries: even if one component is broken, the combined scheme remains
secure as long as the other component holds.
The implementation follows the IETF CFRG draft specification for concrete
hybrid KEMs:
https://datatracker.ietf.org/doc/draft-irtf-cfrg-concrete-hybrid-kems/
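For intuition, the combining step boils down to hashing both shared
secrets together with context that binds the ciphertexts and public
keys. A rough conceptual sketch only, not the draft's exact
construction (the draft pins down the KDF, input ordering, and labels
per scheme):

```zig
const std = @import("std");

fn combineSharedSecrets(
    pq_ss: []const u8, // shared secret from the post-quantum KEM
    ec_ss: []const u8, // shared secret from the elliptic-curve DH
    context: []const u8, // binds ciphertexts and public keys
) [32]u8 {
    var out: [32]u8 = undefined;
    var h = std.crypto.hash.sha3.Sha3_256.init(.{});
    h.update(pq_ss);
    h.update(ec_ss);
    h.update(context);
    h.final(&out);
    return out;
}
```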