Commit Graph

260 Commits

Author SHA1 Message Date
Mason Remaley e2c3920fb1 Renames buffer first allocator in compiler and std 2026-04-18 14:51:49 -07:00
Mason Remaley 6d40d374d8 Merges together the two buffer first allocator implementations 2026-04-18 14:51:49 -07:00
jmcaine 73ecc6333f std: implement heap.StackFirstAllocator
second attempt
2026-04-18 14:51:25 -07:00
Mason Remaley c2cbb944ba Further improvements to stack trace type 2026-04-12 04:01:29 -07:00
Mason Remaley 6bf583c4ba Further separation of stack trace and error return trace 2026-04-12 04:01:29 -07:00
Mason Remaley 94ff38af87 Separates error return traces from stack traces
Doesn't commit the changes to stage1; we can generate those at the end,
once we're no longer making changes to it, to avoid wasting storage.
2026-04-12 04:01:29 -07:00
Mason Remaley 156f54d8f0 Adds includes_inlined_frames option to builtin.StackTrace
This will be relevant once #31605 is merged.

In general, stack traces do *not* contain unique addresses for inlined
frames, but for error return traces, they will after the above PR. This
bool indicates that code printing the trace should not try to resolve
inline frames since they're explicitly encoded into the instruction
addresses.

This is set as state on the stack trace rather than passed into the
formatting methods as an argument, as it's not really a formatting
option--whether or not it's correct to resolve inlines is decided at the
time of capture!
2026-04-12 04:01:29 -07:00
Justus Klausecker ce3f254526 std.heap.ArenaAllocator: do not cmpxchg in hot path when it would be a noop
The cmpxchg is there to recover alignment padding that isn't needed (which
can only be determined after the fetch-and-add that reserves it as allocated
memory). As cmpxchg tends to be a very expensive operation, it is actually
faster to introduce an additional branch here that checks if the cmpxchg
would be a noop (because all of the reserved alignment padding was in fact
necessary) and skips it if that's the case.

This does not measurably regress performance if the arena is only accessed
by a single thread and yields slight performance benefits for multi-threaded
usage. If the arena is commonly used for unaligned allocations, the perf
benefits are quite significant.

Co-authored-by: Jacob Young <amazingjacob@gmail.com>
2026-04-02 23:00:26 +02:00
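The pattern the commit above describes can be sketched in C with `<stdatomic.h>` (an illustrative sketch with invented names; the real implementation is Zig and differs in detail): reserve worst-case space with one fetch-and-add, then recover unused alignment padding with a cmpxchg only when there is actually padding to recover.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative sketch: `bump_alloc` and `end_index` are invented names. */
static _Atomic size_t end_index;

size_t bump_alloc(size_t len, size_t alignment) {
    size_t reserve = len + alignment - 1; /* worst-case padding */
    size_t base = atomic_fetch_add_explicit(&end_index, reserve,
                                            memory_order_relaxed);
    size_t aligned = (base + alignment - 1) & ~(alignment - 1);
    size_t new_end = aligned + len;
    size_t reserved_end = base + reserve;
    /* The extra branch: skip the expensive cmpxchg when it would be a
     * noop because all of the reserved padding was in fact necessary. */
    if (new_end != reserved_end) {
        /* Best effort: if another thread bumped end_index in the
         * meantime, the padding simply stays reserved. */
        atomic_compare_exchange_strong_explicit(
            &end_index, &reserved_end, new_end,
            memory_order_relaxed, memory_order_relaxed);
    }
    return aligned;
}
```

The branch is cheap and well-predicted for arenas that mostly serve already-aligned requests, which is where the commit reports the biggest wins.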
Andrew Kelley e9df86aed0 Merge pull request 'std.heap.ArenaAllocator: decrease fuzz test workload per run' (#31596) from justusk/zig:fuzz-arena-2 into master
Reviewed-on: https://codeberg.org/ziglang/zig/pulls/31596
Reviewed-by: Andrew Kelley <andrew@ziglang.org>
2026-04-02 15:57:41 +02:00
Justus Klausecker 5363a81a57 std.heap.FixedBufferAllocator: fix end_index memory ordering
This prevents a race between `alloc` and `free` where T1 receives memory
from `alloc` that is semantically about to be freed by T2 and still being
accessed, but the `free` is already visible to T1. Using acquire-release
here guarantees that any `free` is only published after all accesses to
the memory being freed have already happened.

Co-authored-by: Jacob Young <amazingjacob@gmail.com>
2026-03-25 11:48:45 +01:00
Justus Klausecker 3af5f81e11 std.heap.ArenaAllocator: fix end_index memory ordering
This prevents a race between `alloc` and `free` where T1 receives memory
from `alloc` that is semantically about to be freed by T2 and still being
accessed, but the `free` is already visible to T1. Using acquire-release
here guarantees that any `free` is only published after all accesses to
the memory being freed have already happened.

Co-authored-by: Jacob Young <amazingjacob@gmail.com>
2026-03-25 11:48:43 +01:00
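The release/acquire pairing described in the two commits above can be sketched in C (invented names; the real code is Zig): `free` publishes the lowered end index with release ordering only after all accesses to the freed bytes, and `alloc` uses acquire ordering so a thread that reuses the region also observes that those accesses have completed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative sketch: `fba_end`, `fba_free_last`, `fba_alloc` are
 * invented names standing in for a fixed-buffer-style end index. */
static _Atomic size_t fba_end;

/* Give back the most recent allocation, if it still is the most recent. */
int fba_free_last(size_t index, size_t len) {
    size_t expected = index + len;
    return atomic_compare_exchange_strong_explicit(
        &fba_end, &expected, index,
        memory_order_release,  /* publish: accesses to the bytes are done */
        memory_order_relaxed);
}

size_t fba_alloc(size_t len) {
    /* acquire pairs with the release above, so reusing the region
     * happens-after the old accesses to it */
    return atomic_fetch_add_explicit(&fba_end, len, memory_order_acquire);
}
```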
Justus Klausecker 9bfe827ade Revert "std.heap.ArenaAllocator: Make resize and free check whether allocation is within current node more rigorously"
This reverts commit 589bcb2544.

The scenario presented in the reverted commit cannot actually happen.
Even if there are two contiguous arena nodes N1 and N2 and the `end_index`
of N1 points to somewhere in N2, a `resize` can never lead to an increase
of the `end_index` of N1 since it checks whether it's `<= size` first.
A `resize`/`free` *can* decrease `end_index`, but even if it is wrongly
assumed that some allocation that belongs to N2 actually belongs to N1
based on the `end_index` of N1, it can only ever be decreased to the start
of the buffer of N2. That's because a valid allocation of N2 logically
cannot be at any lower address than N2 itself. And any point still in N2
can never also be in N1, so there's no danger of overwriting any other
allocations of N1.
2026-03-25 11:20:21 +01:00
Justus Klausecker 589bcb2544 std.heap.ArenaAllocator: Make resize and free check whether allocation is within current node more rigorously
This prevents the following scenario where an allocation is wrongly assumed
to be part of the current head node (`node0`):

```
| node0 - - - - | node1 - - - - - - - - - - - - |
          |   |         |   |           |
          |   |         |   end_index0  end_index1
          |   |         |   |
          alloc0        alloc1

free(alloc1):
    load node0
    buf0.ptr + end_index0 == alloc1.ptr + alloc1.len ? yes!
    end_index0 -= alloc1.len

| node0 - - - - | node1 - - - - - - - - - - - |
          | | |                         |
          | end_index0                  end_index1
          |   |
          alloc0
```

which could move `end_index0` *into* `alloc0` and make it possible for any
subsequent calls to `alloc` to overwrite its contents!
2026-03-25 00:54:44 +01:00
Justus Klausecker d78f096c49 zig fmt 2026-03-20 18:09:01 +01:00
Justus Klausecker 591bc39e57 std.heap.ArenaAllocator: decrease fuzz test workload per run
At smaller workloads the overhead of setting up a new `std.Io.Threaded`
for every run to reset thread-local state becomes more noticeable, so this
commit also switches from thread-local storage to a shared atomic variable
for keeping track of the most recent allocation. This has the side-effect
of simplifying the overall implementation a bit.
2026-03-20 17:32:17 +01:00
Justus Klausecker 4d6bef538e std.heap.ArenaAllocator: relax memory ordering for stealing free list
We only need acquire instead of acq_rel here: since we're always swapping
in `null`, there's no node whose content we'd need to release.
2026-03-12 21:02:43 +01:00
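The 'steal the whole list' pattern from the commit above can be sketched in C (invented names; the real code is Zig): nodes are pushed with a release CAS loop, and a consumer takes the entire list at once by swapping in NULL, which only needs acquire ordering because the swap publishes nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative sketch: `FreeNode`, `push_free`, `steal_all` are
 * invented names for the arena's intrusive free list. */
typedef struct FreeNode { struct FreeNode *next; } FreeNode;
static _Atomic(FreeNode *) free_list;

void push_free(FreeNode *node) {
    FreeNode *head = atomic_load_explicit(&free_list, memory_order_relaxed);
    do {
        node->next = head;
    } while (!atomic_compare_exchange_weak_explicit(
        &free_list, &head, node,
        memory_order_release, memory_order_relaxed));
}

FreeNode *steal_all(void) {
    /* acquire, not acq_rel: swapping in NULL releases no node contents.
     * The stealing thread now owns every node in the returned list. */
    return atomic_exchange_explicit(&free_list, NULL, memory_order_acquire);
}
```

Because only one thread ever owns the stolen list at a time, iteration and removal need no further synchronization, which is what makes ABA problems impossible here.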
Justus Klausecker 7649868663 std.heap.ArenaAllocator/std.heap.FixedBufferAllocator: make shrinking always succeed
Shrinking allocations should always succeed with these allocators, even
if the allocation in question is the most recent one and `resize` didn't
manage to decrement the end index of its buffer successfully.
2026-03-06 13:08:37 +01:00
Justus Klausecker 2ba8c94df6 std.heap.ArenaAllocator: add fuzz test
The fuzz test consists of a planning phase where the fuzzing smith is used
to generate a list of actions to be executed and an execution phase where
the actions are all executed by multiple threads at the same time. Each
action is only executed exactly once and is performed on an `ArenaAllocator`
and on a `FixedBufferAllocator` (for reference). The arena is backed by a
special allocator that purposely introduces spurious allocation failures.
After all actions are executed, the contents of all allocation pairs are
compared to each other.
2026-03-06 10:09:06 +01:00
Justus Klausecker 0e348d415f std.heap.ArenaAllocator: clean up some yucky bits
and add a bunch of asserts. No functional changes.
2026-03-06 10:09:06 +01:00
Justus Klausecker 7b9865b046 std.heap.FixedBufferAllocator: complete thread-safe implementation
`FixedBufferAllocator.threadSafeAllocator()` already provided a thread-safe
`alloc` implementation, but all other functions were no-ops. This commit
implements the remaining `Allocator` functions and tightens up the memory
orderings in `alloc` a bit; `monotonic` is good enough here.
2026-03-06 10:09:05 +01:00
Justus Klausecker 46c72ed970 std.heap.ArenaAllocator: do not retry failed CAS in resize/free
If we use `@cmpxchgStrong` instead of `@cmpxchgWeak` to adjust the `end_index`
in `resize` and `free`, the only reason the CAS can fail is that another
thread has changed `end_index` in the meantime. If that's happened, the
allocation we were trying to resize/free isn't the most recent allocation
anymore and there's no point in retrying, so we can get rid of the loop.
2026-03-06 10:09:05 +01:00
Justus Klausecker f09386cce9 std.heap.ArenaAllocator: optimize aligned index calculation
The `alignedIndex` function is very hot (literally every single `alloc`
call invokes it at least once) and `std.mem.alignPointerOffset` seems to
be very slow, so this commit replaces this function with a custom
implementation that doesn't do any unnecessary validation and doesn't have
any branches as a result of that. The validation `std.mem.alignPointerOffset`
does isn't necessary anyway; we're not actually calculating an offset that
we plan to apply to a pointer directly, but an offset into a valid buffer
that we only apply to a pointer if the result is inside of that buffer.

This leads to a ~4% speedup in a synthetic benchmark that just puts a lot
of concurrent load on an `ArenaAllocator`.
2026-03-06 10:09:05 +01:00
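The branchless computation the commit above describes can be sketched in C for power-of-two alignments (an illustrative sketch; `aligned_index` is an invented name, and the real Zig code differs):

```c
#include <assert.h>
#include <stddef.h>

/* Round `index` up to the next multiple of `alignment` (a power of two).
 * No validation is needed here: the result is only an offset into a
 * buffer, and it is only applied to a pointer after a bounds check
 * against that buffer. */
size_t aligned_index(size_t index, size_t alignment) {
    return (index + alignment - 1) & ~(alignment - 1);
}
```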
Justus Klausecker bbc77df3eb std.heap: delete ThreadSafeAllocator
We can keep ourselves safe from those threads perfectly well without you, thanks!
2026-02-26 21:20:34 +01:00
Justus Klausecker de41123957 std.heap.ArenaAllocator: fix reset creating undersized nodes
Previously resetting with `retain_capacity < @sizeOf(Node)` would create
an invalid node. This is now fixed, plus `Node.size` now has its own `Size`
type that provides additional safety via assertions to prevent bugs like
this in the future.
2026-02-26 15:40:48 +01:00
Justus Klausecker 2fa2300ba4 std.heap.ArenaAllocator: Get rid of cmpxchg loop in hot path
This is achieved by bumping `end_index` by a large enough amount so that
a suitably aligned region of memory can always be provided. The potential
wasted space this creates is then recovered by a single cmpxchg. This is
always successful for single-threaded arenas which means that this version
still behaves exactly the same as the old single-threaded implementation
when only being accessed by one thread at a time. It can however fail when
another thread bumps `end_index` in the meantime. The observed failure
rates under extreme load are:

2 Threads: 4-5%
3 Threads: 13-15%
4 Threads: 15-17%
5 Threads: 17-18%
6 Threads: 19-20%
7 Threads: 18-21%

This version offers ~25% faster performance under extreme load from 7 threads,
with diminishing speedups for fewer threads. The performance for 1 and 2
threads is nearly identical.
2026-02-26 15:30:55 +01:00
Justus Klausecker a3a9dc111d std.heap.ArenaAllocator: make it threadsafe
Modifies the `Allocator` implementation provided by `ArenaAllocator` to be
threadsafe using only atomics and no synchronization primitives locked
behind an `Io` implementation.

At its core this is a lock-free singly linked list which uses CAS loops to
exchange the head node. A nice property of `ArenaAllocator` is that the
only functions that can ever remove nodes from its linked list are `reset`
and `deinit`, both of which are not part of the `Allocator` interface and
thus aren't threadsafe, so node-related ABA problems are impossible.

There *are* some trade-offs: end index tracking is now per node instead of
per allocator instance. It's not possible to publish a head node and its
end index at the same time if the latter isn't part of the former.

Another compromise had to be made in regards to resizing existing nodes.
Annoyingly, `rawResize` of an arbitrary thread-safe child allocator can
of course never be guaranteed to be an atomic operation, so only one
`alloc` call can ever resize at a time; other threads have to consider
any resizes they attempt during that window to have failed. This causes
slightly less optimal behavior than what could be achieved with a mutex.
The LSB of `Node.size` is used to signal that a node is being resized.
This means that all nodes have to have an even size.

Calls to `alloc` have to allocate new nodes optimistically as they can
only know whether any CAS on a head node will succeed after attempting it,
and to attempt the CAS they of course already need to know the address of
the freshly allocated node they are trying to make the new head.
The simplest solution to this would be to just free the new node again if
a CAS fails; however, this can be expensive and would mean that in practice
arenas could only really be used with a GPA as their child allocator. To
work around this, this implementation keeps its own free list of nodes
which didn't make their CAS, to be reused by a later `alloc` invocation.
To keep things simple and avoid ABA problems, the free list is only ever
accessed beyond its head by 'stealing' the head node (and thus the
entire list) with an atomic swap. This makes iteration and removal trivial
since there's only ever one thread doing it at a time which also owns all
nodes it's holding. When the thread is done it can just push its list onto
the free list again.

This implementation offers comparable performance to the previous one when
only being accessed by a single thread and a slight speedup compared to
the previous implementation wrapped into a `ThreadSafeAllocator` up to ~7
threads performing operations on it concurrently.
(measured on a base model MacBook Pro M1)
2026-02-25 19:12:35 +01:00
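The LSB trick from the commit above can be sketched in C (invented names; the real code is Zig): because every node size is kept even, the lowest bit of the size is free to serve as a 'node is being resized' flag that at most one thread can hold at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sketch: `node_size`, `try_begin_resize`, `end_resize`
 * are invented names. */
static _Atomic size_t node_size;

bool try_begin_resize(void) {
    size_t s = atomic_load_explicit(&node_size, memory_order_relaxed);
    if (s & 1)
        return false; /* another thread is already resizing this node */
    /* claim the flag by setting the LSB */
    return atomic_compare_exchange_strong_explicit(
        &node_size, &s, s | 1,
        memory_order_acquire, memory_order_relaxed);
}

void end_resize(size_t new_size) {
    assert(new_size % 2 == 0); /* keep the flag bit available */
    atomic_store_explicit(&node_size, new_size, memory_order_release);
}
```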
Alex Rønne Petersen b5bcbf2a62 std.heap.DebugAllocator: make BucketHeader.fromPage() use wrapping arithmetic
If we've allocated the very last page in the address space then these operations
will overflow and underflow respectively - which is fine.
2026-02-21 23:39:34 +01:00
Matthew Lugg a9d18c4a0c std.heap.PageAllocator: avoid mremaps which may reserve potential stack space
Linux's approach to mapping the main thread's stack is quite odd: it essentially
tries to select an mmap address (assuming unhinted mmap calls) which do not
cover the region of virtual address space into which the stack *would* grow
(based on the stack rlimit), but it doesn't actually *prevent* those pages from
being mapped. It also doesn't try particularly hard: it's been observed that the
first (unhinted) mmap call in a simple application is usually put at an address
which is within a gigabyte or two of the stack, which is close enough to make
issues somewhat likely. In particular, if we get an address which is close-ish
to the stack, and then `mremap` it without the MAY_MOVE flag, we are *very*
likely to map pages in this "theoretical stack region". This is particularly a
problem on loongarch64, where the initial mmap address is empirically only
around 200 megabytes from the stack (whereas on most other 64-bit targets it's
closer to a gigabyte).

To work around this, we just need to avoid mremap in some cases. Unfortunately,
this system call isn't used too heavily by musl or glibc, so design issues like
this can and do exist without being caught. So, when `PageAllocator.resize` is
called, let's not try to `mremap` to grow the pages. We can still call `mremap`
in the `PageAllocator.remap` path, because in that case we can set the
`MAY_MOVE` flag, which empirically appears to make the Linux kernel avoid the
problematic "theoretical stack region".
2026-02-21 23:39:34 +01:00
Alex Rønne Petersen c8dd050305 std.heap.PageAllocator: hint mmaps in the same direction as stack growth
The old logic was fine for targets where the stack grows up (so, literally just
hppa), but problematic on targets where it grows down, because we could hint
that we wanted an allocation to happen in an area of the address space that the
kernel expects to be able to expand the stack into. The kernel is happy to
satisfy such a hint despite the obvious problems this leads to later down the
road.

Co-authored-by: rpkak <rpkak@noreply.codeberg.org>
2026-02-21 23:39:20 +01:00
Andrew Kelley 0957761d5c std.heap.BrkAllocator: fix incorrect assumptions 2026-02-12 16:30:27 -08:00
Andrew Kelley 6ccabbd4e5 std: brk allocator for single-threaded mode 2026-02-12 13:14:51 -08:00
Andrew Kelley 6744160211 zig libc: implement malloc 2026-02-12 13:14:51 -08:00
Andrew Kelley 5c59a46238 std.heap.PageAllocator: fix not respecting alignments
in remap and resize, alignments larger than page size were incorrectly ignored.
2026-02-12 13:14:51 -08:00
rpkak 184c8f9545 std.heap.PageAllocator: align hint 2026-02-03 20:27:28 +01:00
Andrew Kelley 550da1b676 std: migrate remaining sync primitives to Io
- delete std.Thread.Futex
- delete std.Thread.Mutex
- delete std.Thread.Semaphore
- delete std.Thread.Condition
- delete std.Thread.RwLock
- delete std.once

std.Thread.Mutex.Recursive remains... for now. it will be replaced with
a special purpose mechanism used only by panic logic.

std.Io.Threaded exposes mutexLock and mutexUnlock for the advanced case
when you need to call them directly.
2026-02-02 18:57:17 -08:00
Andrew Kelley 255aeb57b2 std: introduce atomic.Mutex and use it in heap.SmpAllocator
This allocator implementation uses only lock-free operations.
2026-02-02 18:36:40 -08:00
Brian Orora 4e3fadd90e std.heap.DebugAllocator: fix accounting of total_requested_bytes in resizeSmall 2026-01-27 00:09:48 +01:00
Andrew Kelley 4d6d2922b8 std: move memory locking and memory protection to process
and introduce type safety for posix.PROT (mmap, mprotect)

progress towards #6600
2026-01-09 13:52:00 -08:00
Andrew Kelley e3b7cad81e std.heap.DebugAllocator: disable already flaky test
tracked by #22731

counterpart to ef1ddbe2f0
2026-01-04 07:29:35 -08:00
Andrew Kelley ef1ddbe2f0 std.heap.DebugAllocator: disable already flaky test
tracked by #22731
2026-01-04 00:27:09 -08:00
Andrew Kelley b243e8f8cc std: integrate DebugAllocator with terminal mode
by adding a new std.Option for log.terminalMode

this is an alternative to the approach that was deleted in
aa57793b68
2025-12-26 19:58:56 -08:00
Andrew Kelley ffcbd48a12 std: rework TTY detection and printing
This commit sketches an idea for how to deal with detection of file
streams as being terminals.

When a File stream is a terminal, writes through the stream should have
their escapes stripped unless the programmer explicitly enables terminal
escapes. Furthermore, the programmer needs a convenient API for
intentionally outputting escapes into the stream. In particular it
should be possible to set colors that are silently discarded when the
stream is not a terminal.

This commit makes `Io.File.Writer` track the terminal mode in the
already-existing `mode` field, making it the appropriate place to
implement escape stripping.

`Io.lockStderrWriter` returns a `*Io.File.Writer` with terminal
detection already done by default. This is a higher-level application
layer stream for writing to stderr.

Meanwhile, `std.debug.lockStderrWriter` also returns a `*Io.File.Writer`
but a lower-level one that is hard-coded to use a static single-threaded
`std.Io.Threaded` instance. This is the same instance that is used for
collecting debug information and iterating the unwind info.
2025-12-23 22:15:09 -08:00
Andrew Kelley bee8005fe6 std.heap.DebugAllocator: never detect TTY config
instead, allow the user to set it as a field.

this fixes a bug where leak printing and error printing would run tty
config detection for stderr, and then emit a log, which is not necessarily
going to print to stderr.

however, the nice defaults are gone; the user must explicitly assign the
tty_config field during initialization or else the logging will not have
color.

related: https://github.com/ziglang/zig/issues/24510
2025-12-23 22:15:08 -08:00
Jacob Young c13857e504 windows: type safety improvements and more ntdll functions 2025-12-12 01:58:21 -05:00
Linus Groh 39fa831947 std: Remove a handful of things deprecated during the 0.15 release cycle
- std.Build.Step.Compile.root_module mutators -> std.Build.Module
- std.Build.Step.Compile.want_lto -> std.Build.Step.Compile.lto
- std.Build.Step.ConfigHeader.getOutput -> std.Build.Step.ConfigHeader.getOutputFile
- std.Build.Step.Run.max_stdio_size -> std.Build.Step.Run.stdio_limit
- std.enums.nameCast -> @field(E, tag_name) / @field(E, @tagName(tag))
- std.Io.tty.detectConfig -> std.Io.tty.Config.detect
- std.mem.trimLeft -> std.mem.trimStart
- std.mem.trimRight -> std.mem.trimEnd
- std.meta.intToEnum -> std.enums.fromInt
- std.meta.TagPayload -> @FieldType(U, @tagName(tag))
- std.meta.TagPayloadByName -> @FieldType(U, tag_name)
2025-11-27 20:17:04 +00:00
Justus Klausecker 4187d0e8fe MemoryPool: add unmanaged variants and make them the default 2025-11-15 09:30:57 +00:00
Andrew Kelley 10b1eef2d3 std: fix compilation errors on Windows 2025-10-29 06:20:50 -07:00
Adrian 4e9dd099c5 std.heap.debug_allocator outdated doc (#25634)
Fixed a relatively small outdated doc string, referring to the bucket linked list.
2025-10-28 10:26:04 +01:00
mlugg e4456d03f3 std.Build.Step.Run: many enhancements
This is a major refactor to `Step.Run` which adds new functionality,
primarily to the execution of Zig tests.

* All tests are run, even if a test crashes. This happens through the
  same mechanism as timeouts, where the test process is repeatedly
  respawned as needed.
* The build status output is more precise. For each unit test, it
  differentiates pass, skip, fail, crash, and timeout. Memory leaks are
  reported separately, as they do not indicate a test's "status", but
  are rather an additional property (a test with leaks may still pass!).
* The number of memory leaks is tracked and reported, both per-test and
  for a whole `Run` step.
* Reporting is made clearer when a step is failed solely due to error
  logs (`std.log.err`) where every unit test passed.
2025-10-18 09:28:41 +01:00
mlugg a18fd41064 std: rework/remove ucontext_t
Our usage of `ucontext_t` in the standard library was kind of
problematic. We unnecessarily mimicked libc-specific structures, and our
`getcontext` implementation was overkill for our use case of stack
tracing.

This commit introduces a new namespace, `std.debug.cpu_context`, which
contains "context" types for various architectures (currently x86,
x86_64, ARM, and AARCH64) containing the general-purpose CPU registers;
the ones needed in practice for stack unwinding. Each implementation has
a function `current` which populates the structure using inline
assembly. The structure is user-overrideable, though that should only be
necessary if the standard library does not have an implementation for
the *architecture*: that is to say, none of this is OS-dependent.

Of course, in POSIX signal handlers, we get a `ucontext_t` from the
kernel. The function `std.debug.cpu_context.fromPosixSignalContext`
converts this to a `std.debug.cpu_context.Native` with a big ol' target
switch.

This functionality is not exposed from `std.c` or `std.posix`, and
neither are `ucontext_t`, `mcontext_t`, or `getcontext`. The rationale
is that these types and functions do not conform to a specific ABI, and
in fact tend to get updated over time based on CPU features and
extensions; in addition, different libcs use different structures which
are "partially compatible" with the kernel structure. Overall, it's a
mess, but all we need is the kernel context, so we can just define a
kernel-compatible structure as long as we don't claim C compatibility by
putting it in `std.c` or `std.posix`.

This change resulted in a few nice `std.debug` simplifications, but
nothing too noteworthy. However, the main benefit of this change is that
DWARF unwinding---sometimes necessary for collecting stack traces
reliably---now requires far less target-specific integration.

Also fix a bug I noticed in `PageAllocator` (I found this due to a bug
in my distro's QEMU distribution; thanks, broken QEMU patch!) and I
think a couple of minor bugs in `std.debug`.

Resolves: #23801
Resolves: #23802
2025-09-30 13:44:54 +01:00