Commit Graph

52 Commits

Author SHA1 Message Date
Matthew Lugg 97fe49a80f Elf2: rework the symtab, and fix a bunch of stuff
Sorry for the mega-commit, this diff got a little out of control.

The main thing here is a complete rework of how Elf2 handles the symbol
table. I messed around with the design for a while and landed on
something which is fairly memory-efficient (in particular the overhead
for STB_LOCAL symbols is as low as possible) and fulfils some of the
more awkward constraints of the ELF format. The main such constraint is
that all STB_LOCAL symbols in a symbol table are required to appear
before any STB_GLOBAL/STB_WEAK symbols. This is further complicated by
the fact that when producing a DSO, symbols with STV_HIDDEN or
STV_INTERNAL visibility are required to have STB_LOCAL binding in the
symbol table, even though they are global symbols from the perspective
of the link editor. Plus, when combining multiple symbols with the same
name, the resulting visibility is the strictest of all of the inputs, so
it is possible at any point in compilation to discover an extern/export
symbol which forces an existing STB_GLOBAL symbol to become STB_LOCAL
and therefore requires it to move to an earlier symtab index. Dealing
with all of this was quite awkward.
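
The ordering constraint described above can be sketched in a few lines. This is a minimal illustration in Python, not the Elf2 implementation: the `Sym` class and the `merge_visibility`, `output_binding`, and `layout_symtab` helpers are hypothetical names, and only the `STB_*`/`STV_*` values and the strictness order come from the ELF specification.

```python
from dataclasses import dataclass

# Constants from the ELF specification.
STB_LOCAL, STB_GLOBAL, STB_WEAK = 0, 1, 2
STV_DEFAULT, STV_INTERNAL, STV_HIDDEN, STV_PROTECTED = 0, 1, 2, 3

# Strictness ranking, least constraining first.
_STRICTNESS = {STV_DEFAULT: 0, STV_PROTECTED: 1, STV_HIDDEN: 2, STV_INTERNAL: 3}


def merge_visibility(a, b):
    """Return the stricter of two STV_* visibilities."""
    return a if _STRICTNESS[a] >= _STRICTNESS[b] else b


@dataclass
class Sym:  # hypothetical stand-in for a linker symbol
    name: str
    binding: int
    visibility: int = STV_DEFAULT


def output_binding(sym, emitting_dso):
    """Binding the symbol receives in the output symbol table."""
    if emitting_dso and sym.visibility in (STV_HIDDEN, STV_INTERNAL):
        return STB_LOCAL  # global to the link editor, local in the DSO
    return sym.binding


def layout_symtab(syms, emitting_dso):
    """Stable-sort symbols so all STB_LOCAL entries come first.

    Returns the ordered list and the number of local entries.
    The mandatory null symbol at index 0 is left out for brevity.
    """
    def is_non_local(sym):
        return output_binding(sym, emitting_dso) != STB_LOCAL

    ordered = sorted(syms, key=is_non_local)
    local_count = sum(1 for sym in ordered if not is_non_local(sym))
    return ordered, local_count
```

Re-running `layout_symtab` after `merge_visibility` tightens a symbol to `STV_HIDDEN` moves that symbol into the local range, which is the index churn the paragraph above describes.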

But I got there! I also implemented a lot of features in the process. I
don't remember everything perfectly, but here's a vague list:

* Multiple definitions of and/or unresolved references to symbols are
  now combined correctly in all cases

* `.bss` sections from inputs are correctly lowered (we don't actually
  emit a `.bss` section of our own yet, but I was able to put that data
  into the `.data` section so that the functionality is correct)

* Relocations in link inputs are now always processed (previously they
  would be silently ignored in most cases)

* Linker errors are triggered if a supported input section has a
  relocation which targets an unsupported input section (previously
  the unsupported section's symbol was dropped and associated
  relocations would be silently ignored)

* When linking a static executable, an error is emitted if a required
  symbol (i.e. an undefined reference with strong linkage) was never
  defined

* Duplicate symbol errors now work correctly

* When emitting a relocatable, the offsets of relocation entries are now
  correct (previously the offsets written were relative to a symbol
  rather than a section, meaning that e.g. almost all text relocations
  were just in a single function)
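
To illustrate the last item in the list above: in a relocatable file, a relocation's offset is measured from the start of its section, so a symbol-relative offset has to be rebased by the symbol's own position. A minimal Python sketch, in which the function names, addresses, and the `section_relative_offset` helper are all made up for illustration:

```python
def section_relative_offset(symbol_offset, offset_in_symbol):
    """Offset of a relocation from the start of its section."""
    return symbol_offset + offset_in_symbol


# Two hypothetical functions in .text, each with one relocation 0x10 bytes in.
functions = {"foo": 0x00, "bar": 0x40}
relocs = [("foo", 0x10), ("bar", 0x10)]

# Buggy form: symbol-relative offsets written as if they were section-relative.
symbol_relative = [offset for _, offset in relocs]

# Fixed form: rebase each offset by the position of its containing symbol.
section_relative = [
    section_relative_offset(functions[name], offset) for name, offset in relocs
]
```

In the buggy form both relocations collapse onto the same offset near the start of the section, matching the "single function" symptom described above.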

The changes in all of the other linkers and codegen backends are some
added type-safety at the codegen-linker API boundary. There are now
distinct `u32`-backed types for identifying an "atom" (the thing we're
codegenning) and a "symbol" (the thing which a relocation targets).
Linker implementations can use a couple of private helper functions to
convert between this implementation-agnostic type and their specific
type; for instance, `Elf2` can convert between a `Symbol.Id` and a
`link.File.SymbolId` with `Symbol.Id.fromTypeErased` and
`Symbol.Id.toTypeErased`. I didn't implement this nicely for any other
linker, so right now there's a lot of `@enumFromInt`/`@intFromEnum`
sprinkled all over the place, particularly with the legacy ELF and
Mach-O linkers.

I tested that I could still perform incremental updates to the Zig
compiler using this commit. In terms of the new behaviors, the most
interesting stuff is symbol and relocation resolution, so I ran a few
tests involving building a "Hello World" binary in various different
ways:

* `build-exe` correctly succeeds

* `build-exe -fno-compiler-rt` correctly reports undefined symbols

* `build-obj` linked with `build-exe` correctly succeeds

* `build-obj` linked with `build-exe -fno-compiler-rt` correctly reports
  undefined symbols

* `build-obj -fcompiler-rt` linked with `build-exe -fno-compiler-rt`
  correctly succeeds

* `build-obj -fcompiler-rt` linked with `build-exe` correctly succeeds
  (the compiler-rt symbols are weak so the global symbols are
  arbitrarily resolved to one of the two implementations)

I also manually verified with `readelf` that symbol tables were always
ordered correctly (before this PR, `readelf -s` would usually emit
warnings about incorrectly-ordered symtabs!), and verified that various
visibility attributes worked as expected.

No actual test coverage is added due to the current lack of a useful
linker test harness. Once a good test harness is available I will be
willing to write some tests.
2026-05-17 18:55:26 +01:00
Matthew Lugg 4c330e053b compiler: use 'std.lang' instead of 'std.builtin' 2026-05-03 12:23:30 +01:00
Matthew Lugg e133f793ee compiler: depend on 'std.lang' instead of 'std.builtin' 2026-05-03 12:23:29 +01:00
Matthew Lugg fdac89d6cd remove uses of array multiplication
In preparation for its removal as accepted in
https://github.com/ziglang/zig/issues/24738.
2026-04-30 08:57:51 +01:00
Matthew Lugg e67c344fc0 compiler,tests,tools: remove uses of capturing errdefer
In preparation for its removal, as accepted in
https://github.com/ziglang/zig/issues/23734.
2026-04-29 23:27:58 +01:00
Jacob Young cb1c7319b5 llvm: fix aarch64 c abi HFA detection
Aggregate types do not count as Homogeneous Aggregates if they have
padding gaps between fields or at the end due to field alignments.
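
A rough Python sketch of the rule, assuming the aggregate has already been flattened into leaf members; the `is_hfa` helper and the `(offset, size, kind)` tuples are hypothetical and do not reflect the LLVM backend's actual data structures:

```python
FLOAT_KINDS = {"f16", "f32", "f64", "f128"}  # illustrative labels only


def is_hfa(members, aggregate_size):
    """members: (offset, size, kind) leaf members, sorted by offset."""
    if not 1 <= len(members) <= 4:
        return False
    kinds = {kind for _, _, kind in members}
    if len(kinds) != 1 or not kinds <= FLOAT_KINDS:
        return False
    expected = 0
    for offset, size, _ in members:
        if offset != expected:  # padding gap before this member
            return False
        expected = offset + size
    return expected == aggregate_size  # rejects tail padding
```

For example, two `f32` members at offsets 0 and 4 in an 8-byte aggregate qualify, while the same two members with the second over-aligned to offset 16 do not.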
2026-04-25 12:01:14 -04:00
Ryan Liptak 3252a05531 Prefer <err> => |e| return e over <err> => return <err>
Avoids the potential for a typo on the `return <err>` side of the prong
2026-04-20 18:03:14 -07:00
Pavel Verigo d840583458 remove AIR .bool_or/.bool_and 2026-04-19 21:49:51 +02:00
rpkak f564a7733c remove code which is only reached if c_longdouble is 16 or 32 bits wide 2026-04-16 07:05:31 +02:00
nektro e73257dec2 lib/std: BitSet,EnumSet: replace initEmpty/initFull with decl literals (#31469)
Reviewed-on: https://codeberg.org/ziglang/zig/pulls/31469
Reviewed-by: Andrew Kelley <andrew@ziglang.org>
Co-authored-by: nektro <hello@nektro.net>
Co-committed-by: nektro <hello@nektro.net>
2026-04-05 05:12:13 +02:00
Matthew Lugg fb224178aa Air: change misleading instruction tag name 2026-03-28 16:47:02 +00:00
Matthew Lugg c0f3a23831 llvm: get rid of a bunch of PerThread usages
Also, notably, remove `Air.value`! The `onePossibleValue` check was
actually dead code, because it is a bug if Sema ever emits code which
considers a value of OPV type to be runtime-known---and at that point
`Air.value` is just a thin wrapper around `Air.Ref.toInterned`.
2026-03-28 16:46:59 +00:00
Matthew Lugg 5d215838a7 InternPool.Nav: fix race, refactor
I've realised that the cause of at least some of our weird CI flakiness
was a bug in how `Nav` values were resolved. Consider this scenario: the
frontend resolves the type of a `Nav`, and then sends a function to the
backend, which requires the backend to lower a pointer to that `Nav`.
The backend calls `InternPool.getNav` to determine the `Nav`'s type.
However, this races with the frontend resolving the *value* of that
`Nav`. This involves writing separately to two fields, `bits` and
`type_or_value`. If only one of these changes is observed, then the
backend will incorrectly interpret the type as the value or vice versa,
leading to a crash or even a miscompilation. (Of course, there's also
the straightforward issue that the racing loads were non-atomic, making
them illegal).
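
The torn read can be modelled deterministically by stopping the writer halfway through. This Python sketch only models the field layout, not memory ordering or atomics; the class names, the `read`/`read_type` methods, and the string placeholders are hypothetical, and only the field names `bits`, `type_or_value`, `type`, and `value` come from the description above.

```python
class OldNav:
    """Old layout: `bits` says how to interpret the shared payload."""

    def __init__(self, ty):
        self.bits = "type_resolved"
        self.type_or_value = ty

    def read(self):
        kind = "type" if self.bits == "type_resolved" else "value"
        return kind, self.type_or_value


class NewNav:
    """New layout: separate fields, so resolving the value never touches the type."""

    def __init__(self, ty):
        self.type = ty
        self.value = None  # stands in for `.none` (not yet resolved)

    def read_type(self):
        return self.type


# The frontend has performed only the first of its two writes.
old = OldNav("u32")
old.type_or_value = "42"  # the matching `bits` update has not happened yet
torn = old.read()  # the backend now treats the value as a type

new = NewNav("u32")
new.value = "42"  # a single write; `type` is untouched
stable = new.read_type()
```

Here `torn` pairs the stale "type" tag with the freshly written value, which is the misinterpretation described above, while `stable` is unaffected by the concurrent value resolution.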

The only good solution to this was to make `Nav` 4 bytes bigger, giving
it separate `type` and `value` fields. In theory that's a quite small
change, but it ended up having a bunch of nice consequences which led to
this diff being a bit bulkier than expected:

* `Nav.Repr.Bits` was simplified, because it no longer has to track
  "resolution status": we can use `.none` for that. This frees up some
  bits to make things more consistent between the "type resolved" and
  "fully resolved" states.

* This consistency allowed the `Nav.status` union to be replaced with a
  simpler field `Nav.resolved`, which is a bit nicer to work with.

* Most of the "getter" functions were able to be removed from `Nav`
  because the state they were fetching had been moved to simple fields
  on `Nav.resolved`.

* There were still a handful of free bits in `Nav.Repr.Bits`, which
  could be used to represent the "const" and "threadlocal" flags rather
  than these being stored on `Key.Extern` and `Key.Variable`. This is a
  bit more convenient for linkers.

* With those bits gone, `Key.Variable` is a trivial wrapper around a
  type and an initial value, and the fact that a declaration is mutable
  can be represented solely through the "const" flag. Therefore,
  `Key.Variable` no longer served a purpose, and could be eliminated
  entirely in favour of storing the variable's initial value directly in
  the "value" field of the `Nav`.

So, I'm quite pleased with this refactor! But anyway, regarding the bug
fix which actually motivated this: if I've done my job correctly, this
should solve some crashes, such as these (which were what tipped me off
to this bug in the first place):

https://codeberg.org/ziglang/zig/actions/runs/2306/jobs/7/attempt/1
https://codeberg.org/ziglang/zig/actions/runs/2173/jobs/6/attempt/1

...and, who knows, perhaps even the random SIGSEGVs we've seen on some
targets! Probably not, but one can hope.
2026-03-15 11:47:14 +00:00
Matthew Lugg b27c56fe50 compiler: get everything building
Several backends are crashing right now. I'll need to fix at least the C
backend before this branch is ready to PR.
2026-03-10 10:26:12 +00:00
Matthew Lugg f7a1ccfc56 compiler: fix up LLVM backend, and improve its debug info
The LLVM backend can now run the behavior tests and standard library
tests, like the x86_64 backend can. This commit required me to make a
lot of changes to how the LLVM backend lowers debug information, and
while I was doing that, I improved a few things:

* `anyerror` is now an enum type (and other error sets just wrap it), so
  error values appear by name in debuggers

* Fixed broken lowering for tagged unions with zero-width payloads

* Associate container types with source locations in all cases

* Avoid depending on the order of type resolution (using the new
  `DebugConstPool` abstraction), so debug information will contain all
  available type information rather than just the subset which happens
  to be resolved when the backend lowers that debug type
2026-03-10 10:26:12 +00:00
Matthew Lugg 187fef209f compiler: rework OPV and noreturn-like types 2026-03-10 10:26:08 +00:00
Matthew Lugg b19074d252 compiler: represent bitpacks as their backing integer
Now that https://github.com/ziglang/zig/issues/24657 has been
implemented, the compiler can simplify its internal representation of
comptime-known `packed struct` and `packed union` values. Instead of
storing them field-wise, we can simply store their backing integer
value. This simplifies many operations and improves efficiency in some
cases.
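
As a rough illustration of the two representations, here is a Python sketch that converts field-wise values to a backing integer and back. The helper names are hypothetical; the only layout assumption is Zig's rule that the first field of a `packed struct` occupies the least significant bits.

```python
def pack_fields(fields):
    """fields: list of (value, bit_width), first field in the lowest bits."""
    backing = 0
    shift = 0
    for value, width in fields:
        assert 0 <= value < (1 << width), "value does not fit its field"
        backing |= value << shift
        shift += width
    return backing


def unpack_field(backing, widths, index):
    """Extract field `index` from the backing integer."""
    shift = sum(widths[:index])
    return (backing >> shift) & ((1 << widths[index]) - 1)


# Models `packed struct { a: u3 = 5, b: u1 = 1, c: u4 = 9 }`.
backing = pack_fields([(5, 3), (1, 1), (9, 4)])
```

With the backing integer as the stored form, operations such as equality or bit-casting work on one integer instead of walking a list of fields.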
2026-03-10 10:26:08 +00:00
Matthew Lugg 3086c7977b type resolution progress 2026-03-10 10:26:07 +00:00
Matthew Lugg 510ea6f61f type resolution progress 2026-03-10 10:26:07 +00:00
Mathieu Suen 36b65ab59e Air: add "unwrap" functions for loading extra data 2026-02-06 13:06:49 +00:00
Andrew Kelley a5b719e9eb compiler: fix build failures from std.Io-fs 2025-12-23 22:15:10 -08:00
Andrew Kelley 608145c2f0 fix more fallout from locking stderr 2025-12-23 22:15:10 -08:00
Andrew Kelley 1925e0319f update lockStderrWriter sites
use the application's Io implementation where possible. This correctly
makes writing to stderr cancelable, fallible, and participate in the
application's event loop. It also removes one more hard-coded
dependency on a secondary Io implementation.
2025-12-23 22:15:09 -08:00
Ali Cheraghi dec1163fbb all: replace all @Type usages
Co-authored-by: Matthew Lugg <mlugg@mlugg.co.uk>
2025-11-22 22:42:38 +00:00
Benjamin Jurk 4b5351bc0d update deprecated ArrayListUnmanaged usage (#25958) 2025-11-20 14:46:23 -08:00
Andrew Kelley a9568ed296 Merge pull request #25898 from jacobly0/elfv2-progress
Elf2: more progress
2025-11-20 04:33:04 -08:00
Matthew Lugg bc78d8efdb Legalize: implement soft-float legalizations
A new `Legalize.Feature` tag is introduced for each float bit width
(16/32/64/80/128). When e.g. `soft_f16` is enabled, all arithmetic and
comparison operations on `f16` are converted to calls to the appropriate
compiler_rt function using the new AIR tag `.legalize_compiler_rt_call`.
This includes casts where the source *or* target type is `f16`, or
integer<=>float conversions to or from `f16`. Occasionally, operations
are legalized to blocks because there is extra code required; for
instance, legalizing `@floatFromInt` where the integer type is larger
than 64 bits requires calling an arbitrary-width integer conversion
function which accepts a pointer to the integer, so we need to use
`alloc` to create such a pointer, and store the integer there (after
possibly zero-extending or sign-extending it).
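
The core of the rewrite for simple arithmetic can be sketched as a lookup from operation and bit width to a compiler-rt routine. This Python sketch is an illustration, not the `Legalize` pass itself: the `legalize_float_binop` helper and its tuple result are hypothetical, the feature names other than `soft_f16` are assumed to follow the same pattern, and the routine names follow the usual compiler-rt convention (`hf`, `sf`, `df`, `xf`, `tf` for 16, 32, 64, 80, and 128 bits).

```python
_SUFFIX = {16: "hf", 32: "sf", 64: "df", 80: "xf", 128: "tf"}
_ARITH = {"add", "sub", "mul", "div"}


def legalize_float_binop(op, bits, enabled_features):
    """Return (air_tag, callee) for a float binary operation.

    If the matching soft-float feature is enabled, the operation becomes a
    compiler-rt call; otherwise it is left as the original instruction.
    """
    if op not in _ARITH:
        raise ValueError(f"unsupported op: {op}")
    if f"soft_f{bits}" in enabled_features:
        return ("legalize_compiler_rt_call", f"__{op}{_SUFFIX[bits]}3")
    return (op, None)
```

For instance, an `f16` addition with `soft_f16` enabled maps to a call to `__addhf3`, while the same addition with no features enabled is left alone. The wider-than-64-bit `@floatFromInt` case described above needs the extra `alloc` and store, which this sketch does not model.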

No backend currently uses these new legalizations (and as such, no
backend currently needs to implement `.legalize_compiler_rt_call`).
However, for testing purposes, I tried modifying the self-hosted x86_64
backend to enable all of the soft-float features (and implement the AIR
instruction). This modified backend was able to pass all of the behavior
tests (except for one `@mod` test where the LLVM backend has a bug
resulting in incorrect compiler-rt behavior!), including the tests
specific to the self-hosted x86_64 backend.

`f16` and `f80` legalizations are likely of particular interest to
backend developers, because most architectures do not have instructions
to operate on these types. However, enabling *all* of these legalization
passes can be useful when developing a new backend to hit the ground
running and pass a good number of tests more easily.
2025-11-15 09:49:01 +00:00
Matthew Lugg 69f39868b4 Air.Legalize: revert to loops for scalarizations
I had tried unrolling the loops to avoid requiring the
`vector_store_elem` instruction, but it's arguably a problem to generate
O(N) code for an operation on `@Vector(N, T)`. In addition, that
lowering emitted a lot of `.aggregate_init` instructions, which is
itself a quite difficult operation to codegen.

This requires reintroducing runtime vector indexing internally. However,
I've put it in a couple of instructions which are intended only for use
by `Air.Legalize`, named `legalize_vec_elem_val` (like `array_elem_val`,
but for indexing a vector with a runtime-known index) and
`legalize_vec_store_elem` (like the old `vector_store_elem`
instruction). These are explicitly documented as *not* being emitted by
Sema, so need only be implemented by backends if they actually use an
`Air.Legalize.Feature` which emits them (otherwise they can be marked as
`unreachable`).
2025-11-12 16:00:16 +00:00
Matthew Lugg c091e27aac compiler: spring cleaning
I started this diff trying to remove a little dead code from the C
backend, but ended up finding a bunch of dead code sprinkled all over
the place:

* `packed` handling in the C backend which was made dead by `Legalize`
* Representation of pointers to runtime-known vector indices
* Handling for the `vector_store_elem` AIR instruction (now removed)
* Old tuple handling from when they used the InternPool repr of structs
* Straightforward unused functions
* TODOs in the LLVM backend for features which Zig just does not support
2025-11-12 16:00:15 +00:00
Jacob Young 8647e4d311 aarch64: cleanup register lock 2025-11-11 01:47:27 -05:00
Jacob Young 32779a7c73 aarch64: fix macho external references 2025-10-30 09:31:30 +00:00
Jacob Young 402c14f86a aarch64: implement optional comparisons 2025-10-30 09:31:30 +00:00
Matthew Lugg 74931fe25c std.debug.lockStderrWriter: also return ttyconf
`std.Io.tty.Config.detect` may be an expensive check (e.g. involving
syscalls), and doing it every time we need to print isn't really
necessary; under normal usage, we can compute the value once and cache
it for the whole program's execution. Since anyone outputting to stderr
may reasonably want this information (in fact they are very likely to),
it makes sense to cache it and return it from `lockStderrWriter`. Call
sites which do not need it will experience no significant overhead, and
can just ignore the TTY config with a `const w, _` destructure.
2025-10-30 09:31:28 +00:00
Jacob Young 1fa11e0954 Coff: delete 2025-10-02 17:44:52 -04:00
Jacob Young f58200e3f2 Elf2: create a new linker from scratch
This iteration already has significantly better incremental support.

Closes #24110
2025-09-21 14:09:14 -07:00
Frank Denis bdc31c9561 aarch64/zonCast: don't return a pointer to a stack element
Elements are computed at comptime, so don't declare them as "var".
2025-09-21 05:01:41 -07:00
Jacob Young 5144f10ec9 aarch64: fix behavior failures 2025-09-20 18:33:01 -07:00
Jacob Young f12c4f86fc aarch64: implement ptr_slice_*_ptr 2025-09-20 18:33:00 -07:00
Jacob Young 56d62395d1 aarch64: more assembler instructions
Closes #24848
2025-08-15 06:12:45 -04:00
Jacob Young d625158354 aarch64: implement more assembler instructions 2025-08-11 15:47:51 -07:00
David Rubin d6c74a95fd remove usages of .alignment = 0 2025-08-01 14:57:16 -07:00
Jacob Young 3fbdd58a87 aarch64: implement scalar @mod 2025-07-28 22:23:19 -07:00
Jacob Young c334956a54 aarch64: workaround some optional/union issues 2025-07-28 09:03:17 -07:00
Jacob Young b26e732bd0 aarch64: fix error union constants 2025-07-27 08:01:07 -04:00
Jacob Young 771523c675 aarch64: implement var args 2025-07-27 06:59:38 -04:00
Jacob Young 7894703ee7 aarch64: implement more optional/error union/union support 2025-07-26 21:39:50 -04:00
Jacob Young 69abc945e4 aarch64: implement some safety checks
Closes #24553
2025-07-26 17:31:04 -04:00
Jacob Young 1274254c48 aarch64: implement stack probing 2025-07-26 16:08:40 -04:00
Jacob Young 7c349da49c aarch64: implement complex switch prongs 2025-07-26 16:08:40 -04:00
Jacob Young 869ef00602 aarch64: more progress
- factor out `loadReg`
- support all general system control registers in inline asm
- fix asserts after iterating field offsets
- fix typo in `slice_elem_val`
- fix translation of argument locations
2025-07-25 14:20:23 -04:00