Mirrors/zig - zig - Gitea @ Femelysm.ru

mirror of https://codeberg.org/ziglang/zig.git synced 2026-05-31 05:15:29 +03:00

Author	SHA1	Message	Date
Matthew Lugg	97fe49a80f	Elf2: rework the symtab, and fix a bunch of stuff Sorry for the mega-commit, this diff got a little out of control. The main thing here is a complete rework of how Elf2 handles the symbol table. I messed around with the design for a while and landed on something which is fairly memory-efficient (in particular the overhead for STB_LOCAL symbols is as low as possible) and fulfils some of the more awkward constraints of the ELF format. The main such constraint is that all STB_LOCAL symbols in a symbol table are required to appear before any STB_GLOBAL/STB_WEAK symbols. This is further complicated by the fact that when producing a DSO, symbols with STV_HIDDEN or STV_INTERNAL visibility are required to have STB_LOCAL binding in the symbol table, even though they are global symbols from the perspective of the link editor. Plus, when combining multiple symbols with the same name, the resulting visibility is the strictest of all of the inputs, so it is possible at any point in compilation to discover an extern/export symbol which forces an existing STB_GLOBAL symbol to become STB_LOCAL and therefore requires it to move to an earlier symtab index. Dealing with all of this was quite awkward. But I got there! I also implemented a lot of features in the process. 
I don't remember everything perfectly, but here's a vague list: * Multiple definitions of and/or unresolved references to symbols are now combined correctly in all cases * `.bss` sections from inputs are correctly lowered (we don't actually emit a `.bss` section of our own yet, but I was able to put that data into the `.data` section so that the functionality is correct) * Relocations in link inputs are now always processed (previously they would be silently ignored in most cases) * Linker errors are triggered if a supported input section has a relocation which targets an unsupported input section (previously the unsupported section's symbol was dropped and associated relocations would be silently ignored) * When linking a static executable, an error is emitted if a required symbol (i.e. an undefined reference with strong linkage) was never defined * Duplicate symbol errors now work correctly * When emitting a relocatable, the offsets of relocation entries are now correct (previously the offsets written were relative to a symbol rather than a section, meaning that e.g. almost all text relocations were just in a single function) The changes in all of the other linkers and codegen backends are some added type-safety at the codegen-linker API boundary. There are now distinct `u32`-backed types for identifying an "atom" (the thing we're codegenning) and a "symbol" (the thing which a relocation targets). Linker implementations can use a couple of private helper functions to convert between this implementation-agnostic type and their specific type; for instance, `Elf2` can convert between a `Symbol.Id` and a `link.File.SymbolId` with `Symbol.Id.fromTypeErased` and `Symbol.Id.toTypeErased`. I didn't implement this nicely for any other linker, so right now there's a lot of `@enumFromInt`/`@intFromEnum` sprinkled all over the place, particularly with the legacy ELF and Mach-O linkers. 
I tested that I could still perform incremental updates to the Zig compiler using this commit. In terms of the new behaviors, the most interesting stuff is symbol and relocation resolution, so I ran a few tests involving building a "Hello World" binary in various different ways: * `build-exe` correctly succeeds * `build-exe -fno-compiler-rt` correctly reports undefined symbols * `build-obj` linked with `build-exe` correctly succeeds * `build-obj` linked with `build-exe -fno-compiler-rt` correctly reports undefined symbols * `build-obj -fcompiler-rt` linked with `build-exe -fno-compiler-rt` correctly succeeds * `build-obj -fcompiler-rt` linked with `build-exe` correctly succeeds (the compiler-rt symbols are weak so the global symbols are arbitrarily resolved to one of the two implementations) I also manually verified with `readelf` that symbol tables were always ordered correctly (before this PR, `readelf -s` would usually emit warnings about incorrectly-ordered symtabs!), and verified that various visibility attributes worked as expected. No actual test coverage is added due to the current lack of a useful linker test harness. Once a good test harness is available I will be willing to write some tests.	2026-05-17 18:55:26 +01:00
Matthew Lugg	4c330e053b	compiler: use 'std.lang' instead of 'std.builtin'	2026-05-03 12:23:30 +01:00
Pavel Verigo	22945fbbdc	stage2-wasm: vector, std tests	2026-04-22 00:19:46 +02:00
Ryan Liptak	3252a05531	Prefer `<err> => |e| return e` over `<err> => return <err>`. Avoids the potential for a typo on the `return <err>` side of the prong.	2026-04-20 18:03:14 -07:00
Matthew Lugg	5d215838a7	InternPool.Nav: fix race, refactor I've realised that the cause of at least some of our weird CI flakiness was a bug in how `Nav` values were resolved. Consider this scenario: the frontend resolves the type of a `Nav`, and then sends a function to the backend, which requires the backend to lower a pointer to that `Nav`. The backend calls `InternPool.getNav` to determine the `Nav`'s type. However, this races with the frontend resolving the value of that `Nav`. This involves writing separately to two fields, `bits` and `type_or_value`. If only one of these changes is observed, then the backend will incorrectly interpret the type as the value or vice versa, leading to a crash or even a miscompilation. (Of course, there's also the straightforward issue that the racing loads were non-atomic, making them illegal). The only good solution to this was to make `Nav` 4 bytes bigger, giving it separate `type` and `value` fields. In theory that's a quite small change, but it ended up having a bunch of nice consequences which led to this diff being a bit bulkier than expected: * `Nav.Repr.Bits` was simplified, because it no longer has to track "resolution status": we can use `.none` for that. This frees up some bits to make things more consistent between the "type resolved" and "fully resolved" states. * This consistency allowed the `Nav.status` union to be replaced with a simpler field `Nav.resolved`, which is a bit nicer to work with. * Most of the "getter" functions were able to be removed from `Nav` because the state they were fetching had been moved to simple fields on `Nav.resolved`. * There were still a handful of free bits in `Nav.Repr.Bits`, which could be used to represent the "const" and "threadlocal" flags rather than these being stored on `Key.Extern` and `Key.Variable`. This is a bit more convenient for linkers. 
* With those bits gone, `Key.Variable` is a trivial wrapper around a type and an initial value, and the fact that a declaration is mutable can be represented solely through the "const" flag. Therefore, `Key.Variable` no longer served a purpose, and could be eliminated entirely in favour of storing the variable's initial value directly in the "value" field of the `Nav`. So, I'm quite pleased with this refactor! But anyway, regarding the bug fix which actually motivated this: if I've done my job correctly, this should solve some crashes, such as these (which were what tipped me off to this bug in the first place): https://codeberg.org/ziglang/zig/actions/runs/2306/jobs/7/attempt/1 https://codeberg.org/ziglang/zig/actions/runs/2173/jobs/6/attempt/1 ...and, who knows, perhaps even the random SIGSEGVs we've seen on some targets! Probably not, but one can hope.	2026-03-15 11:47:14 +00:00
Matthew Lugg	bcb1a6bdf3	compiler: make Dwarf and self-hosted x86_64 happy Introduces a small abstraction, `link.DebugConstPool`, to deal with lowering type/value information into debug info when it may not be known until type resolution (which in some cases will never happen). It is currently only used by self-hosted DWARF logic, but it will also be of use to the LLVM backend (which is my next focus).	2026-03-10 10:26:11 +00:00
Matthew Lugg	187fef209f	compiler: rework OPV and noreturn-like types	2026-03-10 10:26:08 +00:00
Matthew Lugg	b19074d252	compiler: represent bitpacks as their backing integer Now that https://github.com/ziglang/zig/issues/24657 has been implemented, the compiler can simplify its internal representation of comptime-known `packed struct` and `packed union` values. Instead of storing them field-wise, we can simply store their backing integer value. This simplifies many operations and improves efficiency in some cases.	2026-03-10 10:26:08 +00:00
Matthew Lugg	3086c7977b	type resolution progress	2026-03-10 10:26:07 +00:00
Matthew Lugg	6e49697ef5	backend progress The x86_64 backend now compiles and, with `-fstrip`, kinda works!	2026-03-10 10:26:07 +00:00
Matthew Lugg	510ea6f61f	type resolution progress	2026-03-10 10:26:07 +00:00
Jacob Young	459f3b7ede	codegen: fix tuple padding Closes #25797	2025-11-04 06:04:30 -05:00
Jacob Young	1fa11e0954	Coff: delete	2025-10-02 17:44:52 -04:00
Jacob Young	e1f3fc6ce2	Coff2: create a new linker from scratch	2025-10-02 17:44:52 -04:00
Alex Rønne Petersen	86077fe6bd	compiler: move self-hosted backends from src/arch to src/codegen	2025-09-26 02:02:07 +02:00
Jacob Young	f58200e3f2	Elf2: create a new linker from scratch This iteration already has significantly better incremental support. Closes #24110	2025-09-21 14:09:14 -07:00
Andrew Kelley	2151b10a41	more updates to not use GenericWriter	2025-08-28 18:30:57 -07:00
Justus Klausecker	d0586da18e	Sema: Improve comptime arithmetic undef handling This commit expands on the foundations laid by https://github.com/ziglang/zig/pull/23177 and moves even more `Sema`-only functionality from `Value` to `Sema.arith`. Specifically, all shift and bitwise operations, `@truncate`, `@bitReverse` and `@byteSwap` have been moved and adapted to the new rules around `undefined`. The comptime shift operations in particular have been basically rewritten, fixing many open issues in the process. New rules applied to operators: * `<<`, `@shlExact`, `@shlWithOverflow`, `>>`, `@shrExact`: compile error if any operand is undef * `<<|`, `~`, `^`, `@truncate`, `@bitReverse`, `@byteSwap`: return undef if any operand is undef * `&`, `|`: return undef if both operands are undef, turn undef into actual `0xAA` bytes otherwise Additionally, this commit canonicalizes the representation of aggregates with all-undefined members in the `InternPool` by disallowing them and enforcing the usage of a single typed `undef` value instead. This reduces the number of edge cases and fixes a bunch of bugs related to partially undefined vecs. List of operations directly affected by this patch: * `<<`, `<<|`, `@shlExact`, `@shlWithOverflow` * `>>`, `@shrExact` * `&`, `|`, `~`, `^` and their atomic rmw + reduce counterparts * `@truncate`, `@bitReverse`, `@byteSwap`	2025-08-12 16:33:57 +02:00
Andrew Kelley	0b3c3c02e3	linker: delete plan9 support This experimental target was never fully completed. The operating system is not that interesting or popular anyway, and the maintainer is no longer around. Not worth the maintenance burden. This code can be resurrected later if it is worth it. In such case it will be subject to greater scrutiny.	2025-08-11 10:56:20 -07:00
Ali Cheraghi	246e1de554	Watch: do not fail when file is removed. Before this, we would get a crash.	2025-08-03 13:16:49 +03:30
Ali Cheraghi	31de2c873f	spirv: refactor	2025-08-02 04:16:01 +03:30
Andrew Kelley	e9b9a27a52	codegen: prevent AnyMir from bloating zig1.wasm	2025-07-22 19:43:47 -07:00
Jacob Young	5060ab99c9	aarch64: add new from scratch self-hosted backend	2025-07-22 19:43:47 -07:00
Andrew Kelley	30c2921eb8	compiler: update a bunch of format strings	2025-07-07 22:43:52 -07:00
Andrew Kelley	a13f0d40eb	compiler: delete arm backend this backend was abandoned before it was completed, and it is not worth salvaging.	2025-07-02 14:50:41 -07:00
Andrew Kelley	20a543097b	compiler: delete aarch64 backend this backend was abandoned before it was completed, and it is not worth salvaging.	2025-07-02 14:42:20 -07:00
Andrew Kelley	80a9b8f326	compiler: delete powerpc backend stub nobody is currently working on this	2025-07-02 14:35:13 -07:00
Ali Cheraghi	1df79ab895	remove `spirv` cpu arch	2025-06-23 06:03:03 +02:00
Jacob Young	1f98c98fff	x86_64: increase passing test coverage on windows Now that codegen has no references to linker state this is much easier. Closes #24153	2025-06-19 18:41:12 -04:00
Jacob Young	917640810e	Target: pass and use locals by pointer instead of by value This struct is larger than 256 bytes and code that copies it consistently shows up in profiles of the compiler.	2025-06-19 11:45:06 -04:00
Ali Cheraghi	872f68c9cb	rename spirv backend name `stage2_spirv64` -> `stage2_spirv`	2025-06-16 13:22:19 +03:30
Jacob Young	afa07f723f	x86_64: implement coff relocations	2025-06-12 17:51:30 +01:00
Jacob Young	d312dfc1f2	codegen: make threadlocal logic consistent	2025-06-12 17:51:29 +01:00
Jacob Young	ba53b14028	x86_64: remove linker references from codegen	2025-06-12 13:55:41 +01:00
Jacob Young	c95b1bf2d3	x86_64: remove air references from mir	2025-06-12 13:55:41 +01:00
mlugg	c0df707066	wasm: get self-hosted compiling, and supporting `separate_thread` My original goal here was just to get the self-hosted Wasm backend compiling again after the pipeline change, but it turned out that from there it was pretty simple to entirely eliminate the shared state between `codegen.wasm` and `link.Wasm`. As such, this commit not only fixes the backend, but makes it the second backend (after CBE) to support the new 1:N:1 threading model.	2025-06-12 13:55:40 +01:00
mlugg	5ab307cf47	compiler: get most backends compiling again As of this commit, every backend other than self-hosted Wasm and self-hosted SPIR-V compiles and (at least somewhat) functions again. Those two backends are currently disabled with panics. Note that `Zcu.Feature.separate_thread` is not enabled for the fixed backends. Avoiding linker references from codegen is a non-trivial task, and can be done after this branch.	2025-06-12 13:55:40 +01:00
mlugg	9eb400ef19	compiler: rework backend pipeline to separate codegen and link The idea here is that, instead of the linker calling into codegen, codegen should run before we touch the linker; once MIR is produced, it is sent to the linker. Aside from simplifying the call graph (by preventing N linkers from each calling into M codegen backends!), this has the huge benefit that it is possible to parallelize codegen separately from linking. The threading model can look like this: * 1 semantic analysis thread, which generates AIR * N codegen threads, which process AIR into MIR * 1 linker thread, which emits MIR to the binary The codegen threads are also responsible for `Air.Legalize` and `Air.Liveness`; it's more efficient to do this work here instead of blocking the main thread for this trivially parallel task. I have repurposed the `Zcu.Feature.separate_thread` backend feature to indicate support for this 1:N:1 threading pattern. This commit makes the C backend support this feature, since it was relatively easy to divorce from `link.C`: it just required eliminating some shared buffers. Other backends don't currently support this feature. In fact, they don't even compile -- the next few commits will fix them back up.	2025-06-12 13:55:40 +01:00
Jacob Young	0bf8617d96	x86_64: add support for pie executables	2025-06-06 23:42:14 -07:00
Jacob Young	77e6513030	cbe: implement `stdbool.h` reserved identifiers Also remove the legalize pass from zig1.	2025-05-31 18:54:28 -04:00
Jacob Young	6198f7afb7	Sema: remove `all_vector_instructions` logic Backends can instead ask legalization on a per-instruction basis.	2025-05-31 18:54:28 -04:00
mlugg	b4a0a082dc	codegen: fix accidental stack UAF	2025-05-31 18:54:28 -04:00
Jacob Young	b483defc5a	Legalize: implement scalarization of binary operations	2025-05-31 18:54:28 -04:00
Jacob Young	c04be630d9	Legalize: introduce a new pass before liveness Each target can opt into different sets of legalize features. By performing these transformations before liveness, instructions that become unreferenced will have up-to-date liveness information.	2025-05-29 03:57:48 -04:00
mlugg	92c63126e8	compiler: tlv pointers are not comptime-known Pointers to thread-local variables do not have their addresses known until runtime, so it is nonsensical for them to be comptime-known. There was logic in the compiler which was essentially attempting to treat them as not being comptime-known despite the pointer being an interned value. This was a bit of a mess, the check was frequent enough to actually show up in compiler profiles, and it was very awkward for backends to deal with, because they had to grapple with the fact that a "constant" they were lowering might actually require runtime operations. So, instead, do not consider these pointers to be comptime-known in any way. Never intern such a pointer; instead, when the address of a threadlocal is taken, emit an AIR instruction which computes the pointer at runtime. This avoids lots of special handling for TLVs across basically all codegen backends; of all somewhat-functional backends, the only one which wasn't improved by this change was the LLVM backend, because LLVM pretends this complexity around threadlocals doesn't exist. This change simplifies Sema and codegen, avoids a potential source of bugs, and potentially improves Sema performance very slightly by avoiding a non-trivial check on a hot path.	2025-05-27 19:23:11 +01:00
Alex Rønne Petersen	999777e73a	compiler: Scaffold stage2_powerpc backend. Nothing interesting here; literally just the bare minimum so I can work on this on and off in a branch without worrying about merge conflicts in the non-backend code.	2025-05-20 10:23:16 +02:00
mlugg	37a9a4e0f1	compiler: refactor `Zcu.File` and path representation This commit makes some big changes to how we track state for Zig source files. In particular, it changes: * How `File` tracks its path on-disk * How AstGen discovers files * How file-level errors are tracked * How `builtin.zig` files and modules are created The original motivation here was to address incremental compilation bugs with the handling of files, such as #22696. To fix this, a few changes are necessary. Just like declarations may become unreferenced on an incremental update, meaning we suppress analysis errors associated with them, it is also possible for all imports of a file to be removed on an incremental update, in which case file-level errors for that file should be suppressed. As such, after AstGen, the compiler must traverse files (starting from analysis roots) and discover the set of "live files" for this update. Additionally, the compiler's previous handling of retryable file errors was not very good; the source location the error was reported as was based only on the first discovered import of that file. This source location also disappeared on future incremental updates. So, as a part of the file traversal above, we also need to figure out the source locations of imports which errors should be reported against. Another observation I made is that the "file exists in multiple modules" error was not implemented in a particularly good way (I get to say that because I wrote it!). It was subject to races, where the order in which different imports of a file were discovered affects both how errors are printed, and which module the file is arbitrarily assigned, with the latter in turn affecting which other files are considered for import. The thing I realised here is that while the AstGen worker pool is running, we cannot know for sure which module(s) a file is in; we could always discover an import later which changes the answer. So, here's how the AstGen workers have changed. 
We initially ensure that `zcu.import_table` contains the root files for all modules in this Zcu, even if we don't know any imports for them yet. Then, the AstGen workers do not need to be aware of modules. Instead, they simply ignore module imports, and only spin off more workers when they see a by-path import. During AstGen, we can't use module-root-relative paths, since we don't know which modules files are in; but we don't want to unnecessarily use absolute paths either, because those are non-portable and can make `error.NameTooLong` more likely. As such, I have introduced a new abstraction, `Compilation.Path`. This type is a way of representing a filesystem path which has a canonical form. The path is represented relative to one of a few special directories: the lib directory, the global cache directory, or the local cache directory. As a fallback, we use absolute (or cwd-relative on WASI) paths. This is kind of similar to `std.Build.Cache.Path` with a pre-defined list of possible `std.Build.Cache.Directory`, but has stricter canonicalization rules based on path resolution to make sure deduplicating files works properly. A `Compilation.Path` can be trivially converted to a `std.Build.Cache.Path` from a `Compilation`, but is smaller, has a canonical form, and has a digest which will be consistent across different compiler processes with the same lib and cache directories (important when we serialize incremental compilation state in the future). `Zcu.File` and `Zcu.EmbedFile` both contain a `Compilation.Path`, which is used to access the file on-disk; module-relative sub paths are used quite rarely (`EmbedFile` doesn't even have one now for simplicity). After the AstGen workers all complete, we know that any file which might be imported is definitely in `import_table` and up-to-date. So, we perform a single-threaded graph traversal; similar to the role `resolveReferences` plays for `AnalUnit`s, but for files instead. 
We figure out which files are alive, and which module each file is in. If a file turns out to be in multiple modules, we set a field on `Zcu` to indicate this error. If a file is in a different module to a prior update, we set a flag instructing `updateZirRefs` to invalidate all dependencies on the file. This traversal also discovers "import errors"; these are errors associated with a specific `@import`. With Zig's current design, there is only one possible error here: "import outside of module root". This must be identified during this traversal instead of during AstGen, because it depends on which module the file is in. I tried also representing "module not found" errors in this same way, but it turns out to be much more useful to report those in Sema, because of use cases like optional dependencies where a module import is behind a comptime-known build option. For simplicity, `failed_files` now just maps to `?[]u8`, since the source location is always the whole file. In fact, this allows removing `LazySrcLoc.Offset.entire_file` completely, slightly simplifying some error reporting logic. File-level errors are now directly built in the `std.zig.ErrorBundle.Wip`. If the payload is not `null`, it is the message for a retryable error (i.e. an error loading the source file), and will be reported with a "file imported here" note pointing to the import site discovered during the single-threaded file traversal. The last piece of fallout here is how `Builtin` works. Rather than constructing "builtin" modules when creating `Package.Module`s, they are now constructed on-the-fly by `Zcu`. The map `Zcu.builtin_modules` maps from digests to `Package.Module`s. These digests are abstract hashes of the `Builtin` value; i.e. all of the options which are placed into "builtin.zig". During the file traversal, we populate `builtin_modules` as needed, so that when we see these imports in Sema, we just grab the relevant entry from this map. 
This eliminates a bunch of awkward state tracking during construction of the module graph. It's also now clearer exactly what options the builtin module has, since previously it inherited some options arbitrarily from the first-created module with that "builtin" module! The user-visible effects of this commit are: * retryable file errors are now consistently reported against the whole file, with a note pointing to a live import of that file * some theoretical bugs where imports are wrongly considered distinct (when the import path moves out of the cwd and then back in) are fixed * some consistency issues with how file-level errors are reported are fixed; these errors will now always be printed in the same order regardless of how the AstGen pass assigns file indices * incremental updates do not print retryable file errors differently between updates or depending on file structure/contents * incremental updates support files changing modules * incremental updates support files becoming unreferenced Resolves: #22696	2025-05-18 17:37:02 +01:00
Alex Rønne Petersen	837e0f9c37	std.Target: Remove ObjectFormat.nvptx (and associated linker code). Textual PTX is just assembly language like any other. And if we do ever add support for emitting PTX object files after reverse engineering the bytecode format, we'd be emitting ELF files like the CUDA toolchain. So there's really no need for a special ObjectFormat tag here, nor linker code that treats it as a distinct format.	2025-05-10 12:21:57 +02:00
Jacob Young	6705cbd5eb	codegen: fix packed byte-aligned relocations Closes #23131	2025-03-23 18:35:34 -04:00
Andrew Kelley	eb3c7f5706	zig build fmt	2025-02-22 17:09:20 -08:00
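
The symbol-table constraints described in commit 97fe49a80f (all `STB_LOCAL` entries before any `STB_GLOBAL`/`STB_WEAK` entries, `STV_HIDDEN`/`STV_INTERNAL` symbols written as `STB_LOCAL` in a DSO, and the strictest visibility winning when symbols are combined) can be illustrated with a small Python sketch. This is not how Elf2 is implemented: the `Sym` record and the helper names are hypothetical, chosen only for illustration, and only the numeric `STB_*`/`STV_*` values come from the ELF specification.

```python
from dataclasses import dataclass

# ELF constants from the generic ABI (same values as in <elf.h>).
STB_LOCAL, STB_GLOBAL, STB_WEAK = 0, 1, 2
STV_DEFAULT, STV_INTERNAL, STV_HIDDEN, STV_PROTECTED = 0, 1, 2, 3

# Rank visibilities from least to most constraining.
_STRICTNESS = {STV_DEFAULT: 0, STV_PROTECTED: 1, STV_HIDDEN: 2, STV_INTERNAL: 3}


def merge_visibility(a, b):
    """Return the stricter of two STV_* values."""
    return a if _STRICTNESS[a] >= _STRICTNESS[b] else b


@dataclass
class Sym:  # hypothetical record, not the Elf2 representation
    name: str
    binding: int
    visibility: int = STV_DEFAULT


def output_binding(sym, is_dso):
    """Binding written to the symtab: hidden/internal become local in a DSO."""
    if is_dso and sym.visibility in (STV_HIDDEN, STV_INTERNAL):
        return STB_LOCAL
    return sym.binding


def layout_symtab(syms, is_dso):
    """Order symbols locals-first and return (ordered, first_nonlocal_index).

    The mandatory null symbol at index 0 is left out of this sketch.
    """
    locals_ = [s for s in syms if output_binding(s, is_dso) == STB_LOCAL]
    others = [s for s in syms if output_binding(s, is_dso) != STB_LOCAL]
    return locals_ + others, len(locals_)
```

For example, if a later input declares `helper` as hidden, `merge_visibility` tightens its visibility, and `layout_symtab(syms, is_dso=True)` then places it among the local symbols, ahead of every global. That is the "move to an earlier symtab index" situation the commit message describes.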


588 Commits
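
The 1:N:1 threading model from commit 9eb400ef19 (one semantic analysis thread producing AIR, N codegen threads turning AIR into MIR, one linker thread consuming MIR) has the shape of a two-stage producer/consumer pipeline. The Python sketch below shows only that shape. The function names, the tuple "AIR"/"MIR" placeholders, and the `None` stop markers are assumptions made for illustration and do not reflect the compiler's actual data structures or scheduling.

```python
import queue
import threading


def run_pipeline(functions, n_codegen=4):
    """Push each function name through sema -> codegen workers -> linker."""
    air_q = queue.Queue()  # sema -> codegen
    mir_q = queue.Queue()  # codegen -> linker
    binary = []            # written by the single linker thread only

    def sema():
        for name in functions:
            air_q.put(("air", name))
        for _ in range(n_codegen):
            air_q.put(None)  # one stop marker per codegen worker

    def codegen():
        while True:
            item = air_q.get()
            if item is None:
                mir_q.put(None)  # tell the linker this worker is done
                return
            _, name = item
            mir_q.put(("mir", name))

    def linker():
        finished = 0
        while finished < n_codegen:
            item = mir_q.get()
            if item is None:
                finished += 1
            else:
                binary.append(item[1])

    threads = [threading.Thread(target=sema), threading.Thread(target=linker)]
    threads += [threading.Thread(target=codegen) for _ in range(n_codegen)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return binary
```

Because the codegen workers run concurrently, the order in which the linker receives MIR is not fixed; `sorted(run_pipeline(["a", "b", "c"]))` recovers the input set. Only the linker thread touches `binary`, mirroring the commit's point that codegen no longer shares state with the linker.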