The `zig build` CLI kicks off an async task to compile the optimized make
runner executable, does the fetch, compiles the configure process in debug
mode, then checks the cache, keyed only on the CLI options that affect
configuration. On a hit, it skips building and running the configure
script. On a miss, it runs the script and saves the result in the cache.
The cached artifact is a "configuration" file: a serialized build step
graph, which also includes non-lazy package dependencies and additional
file system dependencies.
Next, it awaits the task compiling the optimized make runner executable
and passes the configuration file to it. From that point on, the make
runner is responsible for the CLI.
For the use case of detecting when `git describe` needs to be rerun, we
can allow the configure process to manually add file system mtime
dependencies; in this case, on `.git/index` and `.git/HEAD`.
This design will enable two optimizations:
1. The bulk of the build system will not be rebuilt when the user changes
their configure script.
2. The user logic can be completely bypassed when the CLI options
provided do not affect the configure phase, even if they affect the
make phase.
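As a rough illustration of the second optimization, the configure cache key can be thought of as a hash over only the CLI options that feed the configure phase. This is a hypothetical sketch: `CliOption`, `affects_configure`, and `configureCacheKey` are illustrative names rather than the actual `zig build` code, and which options count as configure-affecting is an assumption.

```zig
const std = @import("std");

/// Hypothetical model of one CLI option. Whether an option affects the
/// configure phase is assumed to be known up front for this sketch.
const CliOption = struct {
    text: []const u8,
    affects_configure: bool,
};

/// Hashes only the configure-affecting options, so changing a make-only
/// option still yields the same key (a cache hit), and the configure
/// process can be skipped entirely.
fn configureCacheKey(options: []const CliOption) u64 {
    var hasher = std.hash.Wyhash.init(0);
    for (options) |opt| {
        if (!opt.affects_configure) continue;
        hasher.update(opt.text);
        // Separator so that ("ab", "c") and ("a", "bc") hash differently.
        hasher.update("\x00");
    }
    return hasher.final();
}
```

On a hit, the stored configuration file would be handed straight to the make runner without running any user logic.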
Remaining tasks in the branch:
* Some parts of the `zig build` CLI are still `@panic("TODO")`.
* The configure runner needs to implement serialization of the build
graph using `std.zig.Configuration`.
* The build runner needs to be transformed into the make runner,
consuming the configuration file as input and deserializing the step
graph.
* Add support to the cache system for depending only on a file's
metadata and *not* its contents, and add a `std.Build` API for using it.
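A metadata-only dependency, which would also cover the `git describe` case via `.git/index` and `.git/HEAD`, might be modeled roughly as follows. This is a hypothetical sketch (`Check`, `FileDep`, and `isStale` are invented names, not the cache system's API); it keeps the file system out of the picture by taking the current mtime and, optionally, the current contents as arguments.

```zig
const std = @import("std");

/// How the cache validates a recorded file dependency (hypothetical).
const Check = enum { contents, metadata };

const FileDep = struct {
    path: []const u8,
    check: Check,
    /// Modification time recorded when the configure process ran.
    mtime_ns: i128,
    /// Content hash recorded when the configure process ran; ignored for
    /// metadata-only dependencies.
    content_hash: u64,
};

/// Returns true if `dep` invalidates the cached configuration. A
/// metadata-only dependency compares mtimes, so callers can pass `null`
/// for `current_contents` and never read the file at all.
fn isStale(dep: FileDep, current_mtime_ns: i128, current_contents: ?[]const u8) bool {
    switch (dep.check) {
        .metadata => return current_mtime_ns != dep.mtime_ns,
        .contents => {
            // Without the contents we cannot prove the file is unchanged.
            const bytes = current_contents orelse return true;
            return std.hash.Wyhash.hash(0, bytes) != dep.content_hash;
        },
    }
}
```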
This logic existed when actually analyzing a `nav_ty` unit directly; it
was just missing in the code path which resolves a `nav_ty` unit due to
a `nav_val` being resolved.
Resolves: #35307
I should have realised what was going on here sooner, because it was
really simple! We had a file offset which was being flushed in
`flushMoved` instead of `flushFileOffset`, and since `flushMoved` does
not bubble down to the PHDR segment from the "parent" read-only LOAD
segment, we weren't updating `ehdr.phoff` if the rodata segment had to
move. The tricky part, and the reason I didn't catch this sooner, is
that the bug didn't reproduce on every filesystem, because the behavior
of `link.MappedFile` differs depending on the capabilities of the target
filesystem.
Resolves: https://codeberg.org/ziglang/zig/issues/32123
Resolves: https://codeberg.org/ziglang/zig/issues/35367
* The `llvm-ints` subcommand hasn't been useful for a while, since we now
just use hardcoded data layout strings based on the target rather than
the old approach of building them up piecemeal. We additionally have the
`tools/generate_c_size_and_align_checks.zig` script for catching C ABI
mismatches. So this libLLVM dependency is not justified anymore.
* The `detect-cpu` subcommand was at least somewhat useful for comparing
CPU detection results between Zig and LLVM. However, its usefulness
hinged on running it natively on every relevant CPU, which we were not
actually doing anyway. Besides, I make a point of porting CPU detection
changes in LLVM to our CPU detection code on every LLVM upgrade, and in
some cases we now even do it more correctly than LLVM. So the libLLVM
dependency brought by this subcommand also isn't really justified
anymore.
Sorry for the mega-commit; this diff got a little out of control.
The main thing here is a complete rework of how Elf2 handles the symbol
table. I messed around with the design for a while and landed on
something which is fairly memory-efficient (in particular the overhead
for STB_LOCAL symbols is as low as possible) and fulfils some of the
more awkward constraints of the ELF format. The main such constraint is
that all STB_LOCAL symbols in a symbol table are required to appear
before any STB_GLOBAL/STB_WEAK symbols. This is further complicated by
the fact that when producing a DSO, symbols with STV_HIDDEN or
STV_INTERNAL visibility are required to have STB_LOCAL binding in the
symbol table, even though they are global symbols from the perspective
of the link editor. Plus, when combining multiple symbols with the same
name, the resulting visibility is the strictest of all of the inputs, so
it is possible at any point in compilation to discover an extern/export
symbol which forces an existing STB_GLOBAL symbol to become STB_LOCAL
and therefore requires it to move to an earlier symtab index. Dealing
with all of this was quite awkward.
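A naive sketch of those constraints, using a hypothetical in-memory model (`Binding`, `Visibility`, `mergeVisibility`, `outputBinding`, and `orderSymtab` are illustrative names, not Elf2's API). Unlike Elf2, which has to move a symbol to an earlier index when it is demoted to STB_LOCAL mid-compilation, this sketch simply recomputes the whole order from scratch.

```zig
const std = @import("std");

const Binding = enum { local, global, weak };

/// Visibilities ordered from least to most constraining. These are NOT
/// the numeric STV_* values from the ELF specification.
const Visibility = enum(u2) { default, protected, hidden, internal };

/// Combining two symbols with the same name keeps the strictest visibility.
fn mergeVisibility(a: Visibility, b: Visibility) Visibility {
    return if (@intFromEnum(a) >= @intFromEnum(b)) a else b;
}

/// When producing a DSO, hidden and internal symbols are emitted with
/// STB_LOCAL binding even though the linker treats them as global.
fn outputBinding(binding: Binding, vis: Visibility, output_is_dso: bool) Binding {
    if (output_is_dso and (vis == .hidden or vis == .internal)) return .local;
    return binding;
}

/// Writes symbol indices into `order` so that every local symbol precedes
/// every global/weak one, and returns the number of locals. That count is
/// the index of the first non-local symbol, which is what the symtab's
/// `sh_info` records (index 0, the null symbol, is itself local).
fn orderSymtab(bindings: []const Binding, order: []u32) u32 {
    std.debug.assert(order.len == bindings.len);
    var n: u32 = 0;
    for (bindings, 0..) |b, i| {
        if (b == .local) {
            order[n] = @intCast(i);
            n += 1;
        }
    }
    const first_non_local = n;
    for (bindings, 0..) |b, i| {
        if (b != .local) {
            order[n] = @intCast(i);
            n += 1;
        }
    }
    return first_non_local;
}
```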
But I got there! I also implemented a lot of features in the process. I
don't remember everything perfectly, but here's a vague list:
* Multiple definitions of and/or unresolved references to symbols are
now combined correctly in all cases
* `.bss` sections from inputs are correctly lowered (we don't actually
emit a `.bss` section of our own yet, but I was able to put that data
into the `.data` section so that the functionality is correct)
* Relocations in link inputs are now always processed (previously they
would be silently ignored in most cases)
* Linker errors are triggered if a supported input section has a
relocation which targets an unsupported input section (previously
the unsupported section's symbol was dropped and associated
relocations would be silently ignored)
* When linking a static executable, an error is emitted if a required
symbol (i.e. an undefined reference with strong linkage) was never
defined
* Duplicate symbol errors now work correctly
* When emitting a relocatable, the offsets of relocation entries are now
correct (previously the offsets written were relative to a symbol
rather than a section, meaning that, e.g., almost all text relocations
appeared to fall within a single function)
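The resolution behaviors listed above follow the usual ELF rules, which can be sketched as a pairwise combine step. This is a hypothetical model (`SymState`, `combine`, and `checkDefined` are invented names, not the Elf2 implementation); in particular, keeping the first of two weak definitions is just one valid choice for what the text calls arbitrary resolution.

```zig
const std = @import("std");

/// Hypothetical per-name resolution state; `input` identifies the link
/// input (e.g. object file) that supplied the current candidate.
const SymState = struct {
    defined: bool,
    weak: bool,
    input: u32,
};

/// Combines two symbols with the same name: a definition beats an
/// undefined reference, strong beats weak, and two strong definitions
/// are a duplicate symbol error.
fn combine(a: SymState, b: SymState) error{DuplicateSymbol}!SymState {
    if (a.defined and b.defined) {
        if (!a.weak and !b.weak) return error.DuplicateSymbol;
        // Strong beats weak; between two weak definitions, keep the first.
        return if (b.weak) a else b;
    }
    if (a.defined) return a;
    if (b.defined) return b;
    // Two undefined references: the result is a strong (required)
    // reference unless both references are weak.
    return .{ .defined = false, .weak = a.weak and b.weak, .input = a.input };
}

/// In a static executable, a strong undefined reference that was never
/// defined is an error; weak undefined references are allowed.
fn checkDefined(s: SymState, static_exe: bool) error{UndefinedSymbol}!void {
    if (static_exe and !s.defined and !s.weak) return error.UndefinedSymbol;
}
```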
The changes to all of the other linkers and codegen backends add some
type safety at the codegen-linker API boundary. There are now
distinct `u32`-backed types for identifying an "atom" (the thing we're
codegenning) and a "symbol" (the thing which a relocation targets).
Linker implementations can use a couple of private helper functions to
convert between this implementation-agnostic type and their specific
type; for instance, `Elf2` can convert between a `Symbol.Id` and a
`link.File.SymbolId` with `Symbol.Id.fromTypeErased` and
`Symbol.Id.toTypeErased`. I didn't implement this nicely for any other
linker, so right now there's a lot of `@enumFromInt`/`@intFromEnum`
sprinkled all over the place, particularly with the legacy ELF and
Mach-O linkers.
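A minimal sketch of that pattern using non-exhaustive enums. Only the names `Symbol.Id`, `fromTypeErased`, and `toTypeErased` come from the description above; the stand-in `SymbolId` type (playing the role of `link.File.SymbolId`) and the exact definitions are assumptions.

```zig
const std = @import("std");

/// Implementation-agnostic symbol ID used at the codegen-linker boundary
/// (a stand-in for `link.File.SymbolId` in this sketch).
const SymbolId = enum(u32) { _ };

/// A linker-specific symbol type (heavily simplified).
const Symbol = struct {
    /// Linker-specific ID. Being a distinct type, it cannot be mixed up
    /// with `SymbolId` without an explicit conversion.
    const Id = enum(u32) {
        _,

        fn fromTypeErased(id: SymbolId) Id {
            return @enumFromInt(@intFromEnum(id));
        }

        fn toTypeErased(id: Id) SymbolId {
            return @enumFromInt(@intFromEnum(id));
        }
    };
};
```

Keeping the conversions in one place is what avoids scattering raw `@enumFromInt`/`@intFromEnum` calls across the linker.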
I tested that I could still perform incremental updates to the Zig
compiler using this commit. In terms of the new behaviors, the most
interesting stuff is symbol and relocation resolution, so I ran a few
tests involving building a "Hello World" binary in various different
ways:
* `build-exe` correctly succeeds
* `build-exe -fno-compiler-rt` correctly reports undefined symbols
* `build-obj` linked with `build-exe` correctly succeeds
* `build-obj` linked with `build-exe -fno-compiler-rt` correctly reports
undefined symbols
* `build-obj -fcompiler-rt` linked with `build-exe -fno-compiler-rt`
correctly succeeds
* `build-obj -fcompiler-rt` linked with `build-exe` correctly succeeds
(the compiler-rt symbols are weak, so each global symbol is
arbitrarily resolved to one of the two implementations)
I also manually verified with `readelf` that symbol tables were always
ordered correctly (before this PR, `readelf -s` would usually emit
warnings about incorrectly-ordered symtabs!), and verified that various
visibility attributes worked as expected.
No actual test coverage is added due to the current lack of a useful
linker test harness. Once a good test harness is available I will be
willing to write some tests.
I'm not sure what the basis was for the old logic here, but it was
incorrect and caused an assertion failure in some cases. The
dependencies on `maybe_interp` and `any_non_single_threaded` are already
correctly modeled by `phnum`, so they do not need to be accounted for a
second time.
On Linux, the OSABI field can be either ELFOSABI_GNU or ELFOSABI_NONE
(aka ELFOSABI_SYSV). Therefore, even if we have chosen ELFOSABI_GNU, we
still need to accept ELFOSABI_NONE in link inputs.
Then, since we now have to check the ident componentwise anyway, we may
as well give more precise error messages on mismatch.
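A componentwise ident check along these lines might look like the sketch below. The `e_ident` indices and OSABI values are from the ELF specification; `checkIdent` and the error names are hypothetical, and the sketch only encodes the one relaxation described above (accepting ELFOSABI_NONE when ELFOSABI_GNU was chosen).

```zig
const std = @import("std");

// e_ident indices and OSABI values from the ELF specification.
const EI_CLASS = 4;
const EI_DATA = 5;
const EI_OSABI = 7;
const ELFOSABI_NONE: u8 = 0; // aka ELFOSABI_SYSV
const ELFOSABI_GNU: u8 = 3;

const IdentError = error{ BadMagic, ClassMismatch, EndiannessMismatch, OsAbiMismatch };

/// Compares a link input's e_ident against the one chosen for the output,
/// field by field, so each kind of mismatch gets its own error.
fn checkIdent(chosen: [16]u8, input: [16]u8) IdentError!void {
    if (!std.mem.eql(u8, input[0..4], "\x7fELF")) return error.BadMagic;
    if (input[EI_CLASS] != chosen[EI_CLASS]) return error.ClassMismatch;
    if (input[EI_DATA] != chosen[EI_DATA]) return error.EndiannessMismatch;
    const want = chosen[EI_OSABI];
    const got = input[EI_OSABI];
    // Having chosen ELFOSABI_GNU, still accept ELFOSABI_NONE inputs.
    const osabi_ok = got == want or (want == ELFOSABI_GNU and got == ELFOSABI_NONE);
    if (!osabi_ok) return error.OsAbiMismatch;
}
```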
Fixes a regression where converting a pointer to an opaque type into a
slice crashes. When the conversion was still possible, it would produce
a slice with a length of zero, which doesn't
really make a lot of sense either. There's no way to determine the length
of the destination slice from a pointer to an opaque type, so it's a compile
error now. Users should just cast to a many-item pointer and slice to the
desired length manually instead.
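The suggested workaround looks roughly like this in user code. `bytesFromOpaque` is a hypothetical helper, and the caller is assumed to know the length from elsewhere, since the opaque pointer itself carries none.

```zig
const std = @import("std");

/// Views `len` bytes behind an opaque pointer as a slice: first cast to a
/// many-item pointer, then slice to the length supplied by the caller.
fn bytesFromOpaque(ptr: *const anyopaque, len: usize) []const u8 {
    const many: [*]const u8 = @ptrCast(ptr);
    return many[0..len];
}
```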
Since `packed` containers are now internally represented by a `bitpack`,
they need special handling on initialization: they need to be either
bitpacked or bitcasted to their backing integer. `Sema` already did this,
but `LowerZon` didn't yet.
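For context, this is the user-visible equivalence that initialization has to preserve: a `packed` value and its backing integer are interchangeable via `@bitCast`. The sketch shows only the language semantics, not `LowerZon` itself, and `Flags` is an arbitrary example type. A ZON literal such as `.{ .read = true, .write = false, .exec = true }` loaded as `Flags` is expected to produce the same bits as the equivalent Zig initializer.

```zig
const std = @import("std");

/// Example packed struct backed by a u8. Fields are laid out starting
/// from the least significant bit, so `read` is bit 0.
const Flags = packed struct(u8) {
    read: bool,
    write: bool,
    exec: bool,
    _padding: u5 = 0,
};
```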
This PR merges the functionality of the `getLastOrNull` method into
`getLast`, which improves consistency with methods like `front`, `back`,
and `peek` in the `Deque` and `PriorityQueue` containers, on which it is
based.
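Assuming the merged `getLast` now returns an optional (the old `getLastOrNull` behavior), its semantics can be sketched as a free function over a slice. This is a hypothetical stand-in; the real method lives on the container and its exact signature may differ.

```zig
const std = @import("std");

/// Hypothetical stand-in for the merged accessor: returns the last
/// element, or null when there is none, instead of asserting.
fn getLast(comptime T: type, items: []const T) ?T {
    if (items.len == 0) return null;
    return items[items.len - 1];
}
```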
Reviewed-on: https://codeberg.org/ziglang/zig/pulls/32008
Reviewed-by: Andrew Kelley <andrew@ziglang.org>