173 Commits

Author SHA1 Message Date
Luuk de Gram 8033767082 wasm-linker: Implement linker tests (#12006)
* test/link: initial wasm support

This adds basic parsing and dumping of wasm sections so they
can be tested using the new linker-test infrastructure.

* test/link: all wasm sections parsing and dumping

We now parse and dump all sections for the wasm binary format.
Currently, this only dumps the name of a custom section.
Later this should also dump symbol table, name, linking metadata and relocations.
All of those live within the custom sections.

* Add wasm linker test

This also fixes a parser mistake in reading the flags.

* test/link: implement linker tests wasm & fixes

Adds several test cases to test the wasm self-hosted linker.
This also introduces fixes that were caught during the implementation
of those tests.

* test-runner: obey omit_stage2 for standalone

When a standalone test requires stage2, but stage2 is omitted
from the compiler, that test case will not be included as part
of the test suite being run. This is to support CIs
where we omit stage2 to lower the memory usage.
2022-07-12 14:36:33 +02:00
Jakub Konka efc5c97bff macho: implement -dead_strip_dylibs linker flag 2022-06-27 19:53:38 +02:00
Jakub Konka 08459ff1c2 Merge pull request #11933 from Luukdegram/wasm-link-bss
stage2: wasm-linker - Place decls in the correct segment and order segments
2022-06-25 22:51:06 +02:00
Luuk de Gram 140bac6395 link/wasm: Sort data segments
We now ensure the "bss" section is last, which allows us to not
emit this section and let the runtime initialize the memory with 0's instead.
This allows for smaller binaries.
The order of the other segments is arbitrary and does not matter; this may
change in the future.
2022-06-25 18:36:56 +02:00
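The ordering rule from 140bac6395 can be sketched roughly as below. This is a Python illustration only, not the linker's Zig code; `order_segments` and `emitted_segments` are hypothetical names, and the segment names are taken from the commit message.

```python
def order_segments(names):
    """Return segment names with "bss" moved to the end.

    sorted() is stable, so the other segments keep their input order
    (the commit notes that their order is arbitrary anyway).
    """
    return sorted(names, key=lambda name: name == "bss")


def emitted_segments(names):
    """Segments that would be written to the binary in this sketch.

    A trailing "bss" is skipped, on the assumption that the runtime
    zero-initializes that memory.
    """
    ordered = order_segments(names)
    if ordered and ordered[-1] == "bss":
        return ordered[:-1]
    return ordered
```

Because "bss" is last, dropping it does not shift the offsets of any segment that is actually emitted.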
Jakub Konka 589bf67635 macho: implement -headerpad_max_install_names 2022-06-25 18:04:40 +02:00
Jakub Konka dfdb807543 cache setting macho search strategy flags 2022-06-25 10:50:00 +02:00
Luuk de Gram e32a5ba78b link/wasm: Put decls into the correct segments
Decls will now be put into their respective segment.
e.g. a constant decl will be inserted into the "rodata" segment,
whereas an uninitialized decl will be put in the "bss" segment instead.
2022-06-24 22:01:41 +02:00
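A minimal sketch of the segment choice described in e32a5ba78b, written in Python for illustration. The `Decl` fields and `segment_for` are hypothetical, and placing mutable initialized decls in "data" is an assumption; the commit only names "rodata" and "bss".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decl:
    name: str
    is_const: bool
    init: Optional[bytes]  # None models an uninitialized decl


def segment_for(decl: Decl) -> str:
    """Pick the segment a decl is placed in."""
    if decl.init is None:
        return "bss"     # uninitialized: zero-filled at runtime
    if decl.is_const:
        return "rodata"  # constant data
    return "data"        # assumption: mutable, initialized data
```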
Luuk de Gram 7c87f9c828 link: clarification & enable MachO getGlobalSymbol
This adds clarification to the getGlobalSymbol doc comments,
as well as renames the `addExternFn` function for MachO to `getGlobalSymbol`.
This function will now be called from 'src/link.zig' as well.

Finally, this also enables compiling zig's libc using LLVM even though
the `-fno-LLVM` flag is given.
2022-06-24 08:12:17 +02:00
Luuk de Gram 6ae898b244 wasm: more f16 support and cleanup of intrinsics
`genFunctype` now accepts calling convention, param types, and return type
as part of its function signature rather than `fnData`. This means
we no longer have to create a dummy for our intrinsic call abstraction.
This also adds support for f16 division and builtins such as `@ceil` & more.
2022-06-24 08:12:17 +02:00
Luuk de Gram a6747d328c stage2: Enable compiler-rt when LLVM is existent
Rather than checking if the user wants to use LLVM for the current compilation,
check for the existence of LLVM as part of the compiler. This is temporary,
until other backends gain the ability to compile compiler-rt themselves.
This means that when a user passed `-fno-LLVM` we will use the native
backend for the user's code, but use LLVM for compiler-rt.

This also fixes emitting names for symbols in the Wasm linker,
by deduplicating symbol names when multiple symbols point to the same object.
2022-06-24 08:12:17 +02:00
Luuk de Gram c9f929a18b fix memory leaks 2022-06-24 08:12:17 +02:00
Luuk de Gram 16daf3f3bc wasm-link: Discard old symbols correctly
When a new symbol is resolved to an existing symbol where
it doesn't overwrite the existing symbol, we now add this symbol
to the discarded list. This is required so that when a relocation points
to the discarded symbol, we can retrieve the symbol it was resolved to instead.
2022-06-24 08:12:17 +02:00
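The discarded list from 16daf3f3bc can be pictured as a map from a discarded symbol location to the location that won. The Python sketch below is illustrative: `resolve` and `final_location` are hypothetical names, locations are modelled as `(file_index, symbol_index)` tuples, and "first definition wins" stands in for the real resolution rules.

```python
def resolve(symbols, discarded, name, location):
    """Record `location` as a candidate definition of `name`.

    In this sketch the first definition wins. A later definition does
    not overwrite it and is added to `discarded`, mapped to the winner.
    """
    existing = symbols.get(name)
    if existing is None:
        symbols[name] = location
    elif existing != location:
        discarded[location] = existing


def final_location(discarded, location):
    """Follow a relocation target to the symbol it was resolved to."""
    return discarded.get(location, location)
```

A relocation that targets a discarded location is redirected through `final_location` before it is applied.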
Luuk de Gram cb28fc2e63 wasm-linker: Resolve symbols from archives
Lazily load object files by default, and only load the object file
when an unresolved symbol has been found within an archive.
2022-06-24 08:12:17 +02:00
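A rough Python sketch of the lazy loading described in cb28fc2e63. The archive is modelled as a dict from member name to `(defined, undefined)` symbol sets, which is an assumption made for illustration; `load_from_archive` is a hypothetical name.

```python
def load_from_archive(undefined, archive):
    """Load only the archive members needed to resolve `undefined`.

    Returns (loaded member names in load order, symbols still undefined).
    """
    loaded = []
    defined = set()
    pending = set(undefined)
    progress = True
    while progress:
        progress = False
        for member, (defs, undefs) in archive.items():
            if member in loaded:
                continue
            # Load a member only when it defines something still unresolved.
            if pending & defs:
                loaded.append(member)
                defined |= defs
                # The new object may itself reference further symbols.
                pending |= undefs
                pending -= defined
                progress = True
    return loaded, pending
```

Members that define no needed symbol are never loaded, which is the point of resolving lazily.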
Luuk de Gram 4d3715d89f wasm-linker: de-duplicate functions+atom sorting
Multiple symbols can point to the same function. This means that when we loop over
the symbol list, we must deduplicate so that those functions are not added twice.
Additionally, when we append a new type and set the type
index on a function, we must not do this again for the same function.

This commit also implements sorting of code atoms so that their order matches
the order of the function section, ensuring that each function signature matches
its function body.
2022-06-24 08:12:17 +02:00
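The two steps in 4d3715d89f can be sketched as follows. This is a Python illustration with hypothetical names (`collect_functions`, `sort_code_atoms`); symbols are modelled as `(symbol_name, function_id)` pairs.

```python
def collect_functions(symbols):
    """Return each function id once, in first-seen order."""
    seen = set()
    functions = []
    for _name, func in symbols:
        if func not in seen:
            seen.add(func)
            functions.append(func)
    return functions


def sort_code_atoms(atoms, functions):
    """Order code atoms (function id -> body) like the function section."""
    return [atoms[func] for func in functions]
```

With both lists in the same order, entry `i` of the function section and body `i` of the code section refer to the same function.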
Luuk de Gram 8d03e4fc6b link: Implement API to get global symbol index 2022-06-24 08:12:17 +02:00
Luuk de Gram 359b61aec3 wasm: Create compiler-rt symbols and lowering
Implements the creation of an undefined symbol for a compiler-rt intrinsic.
Also implements the building of the function call to said compiler-rt intrinsic.
2022-06-24 08:12:17 +02:00
Motiejus Jakštys 98138ba78c [MachO] add -pagezero_size
Pass `-pagezero_size` to the MachO linker. This is the final
"unsupported linker arg" that I could chase that CGo uses. After this
and #11874 we may be able to fail on an "unsupported linker arg" instead
of emitting a warning.

Test case:

    zig=/code/zig/build/zig
    CGO_ENABLED=1 GOOS=darwin GOARCH=amd64 CC="$zig cc -target x86_64-macos" CXX="$zig c++ -target x86_64-macos" go build -a -ldflags "-s -w" cgo.go

I compiled a trivial CGo program and executed it on an amd64 Darwin
host.

To be honest, I am not entirely sure what this is doing. This feels
right after reading what this argument does in LLVM sources, but I am by
no means qualified to make MachO pull requests. Will take feedback.
2022-06-20 13:39:33 +02:00
Jakub Konka 2259d629d3 compiler_rt: use single cache for libcompiler_rt.a static lib 2022-06-17 16:38:59 -07:00
Jakub Konka 80790be309 compiler_rt: compile each unit separately for improved archiving 2022-06-17 16:38:59 -07:00
Ali Chraghi 58943fc627 wasm-linker: add -mwasm64 linker parameter for wasm64 target 2022-05-20 08:26:41 +02:00
Motiejus Jakštys 1d532f12b5 [Elf] add -z nocopyreloc
Warnings about non-implemented `-z nocopyreloc` are common when
compiling go code (including Go's tests themselves). Let's just
make it stop complaining.
2022-05-19 20:21:07 -04:00
Luuk de Gram 62453496ba wasm: Write nops for padding debug info 2022-05-09 18:51:46 +02:00
Luuk de Gram 2ae2ac33d9 wasm: Emit debug sections
This commit adds the ability to emit the following debug sections:
.debug_info
.debug_abbrev
.debug_line
.debug_str

Line information and files are now being loaded correctly by browser debuggers.
2022-05-09 18:51:46 +02:00
Luuk de Gram 9b6b7034c2 wasm: Flush debug information + commit decl
This implements parts to commit a decl's debug information into
a linear memory buffer. The goal is to write this buffer at once
after we finished linking.
2022-05-09 18:51:46 +02:00
Luuk de Gram 33b2f4f382 wasm: Implement debug info for parameters 2022-05-09 18:51:46 +02:00
Luuk de Gram 8e1c220be2 wasm: Add basic debug info references 2022-05-09 18:51:46 +02:00
Jimmi Holst Christensen a0a2ce92ca std: Do not allocate the result for ChildProcess.init
Instead, just return ChildProcess directly. This structure does not
require a stable address, so we can put it on the stack just fine. If
someone wants it on the heap, they can do the following:

  const proc = try allocator.create(ChildProcess);
  proc.* = ChildProcess.init(args, allocator);
2022-04-29 22:50:34 -04:00
Andrew Kelley 31758f79db link: Wasm: don't assume we have a zig module 2022-04-20 18:14:38 -07:00
Andrew Kelley f7596ae942 stage2: use indexes for Decl objects
Rather than allocating Decl objects with an Allocator, we instead allocate
them with a SegmentedList. This provides four advantages:
 * Stable memory so that one thread can access a Decl object while another
   thread allocates additional Decl objects from this list.
 * It allows us to use u32 indexes to reference Decl objects rather than
   pointers, saving memory in Type, Value, and dependency sets.
 * Using integers to reference Decl objects rather than pointers makes
   serialization trivial.
 * It provides a unique integer to be used for anonymous symbol names,
   avoiding multi-threaded contention on an atomic counter.
2022-04-20 17:37:35 -07:00
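The idea in f7596ae942 of referencing objects by integer index into an append-only segmented list can be sketched in Python. `SegmentedList` here is a hypothetical class written for illustration, not Zig's implementation; doubling the segment sizes is an assumption of the sketch.

```python
class SegmentedList:
    """Append-only list stored in segments that are never moved or resized."""

    def __init__(self, first_segment_len=4):
        self._first = first_segment_len
        self._segments = []
        self._len = 0

    def _locate(self, index):
        # Segment k holds first * 2**k items, so it starts at
        # first * (2**k - 1).
        k = (index // self._first + 1).bit_length() - 1
        return k, index - self._first * ((1 << k) - 1)

    def append(self, item):
        """Store item and return its index, used in place of a pointer."""
        index = self._len
        seg, off = self._locate(index)
        if seg == len(self._segments):
            # Growing adds a new segment; existing segments stay put.
            self._segments.append([None] * (self._first << seg))
        self._segments[seg][off] = item
        self._len += 1
        return index

    def __getitem__(self, index):
        if not 0 <= index < self._len:
            raise IndexError(index)
        seg, off = self._locate(index)
        return self._segments[seg][off]
```

The returned index is a small integer that can be stored, serialized, or reused as a unique number for an anonymous symbol name.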
Luuk de Gram be08d2bdbd wasm: Fix unreachable paths
When the last instruction is a debug instruction, the type of it is void.
Similarly, for 'noreturn' we emit an 'unreachable' instruction to tell the wasm validator
that the path cannot be reached.

Also respect the '--strip' flag in the self-hosted wasm linker and not emit a 'name' section
when the flag is set to `true`.
2022-04-19 19:58:49 +02:00
Andrew Kelley a7c05c06be stage2: expose progress bar API to linker backends
This gives us insight as to what is happening when we are waiting for
things such as LLVM emit object and LLD linking.
2022-04-17 04:09:35 -07:00
Luuk de Gram d66c61a2cf wasm-linker: Prevent overalignment for segments
Previously, the data segments were being aligned twice.
This caused us to overalign the segment and therefore allocate a much larger
size for each segment than was required. This fix ensures we align and set the size
just once, ensuring semantically correct binaries as well as smaller binaries.
2022-04-14 22:53:13 +02:00
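A minimal Python sketch of laying out segments with a single alignment step, as d66c61a2cf describes. The commit does not say where the second alignment happened, so this only shows the intended result; `align_forward` and `layout` are hypothetical names.

```python
def align_forward(offset, alignment):
    """Round offset up to a multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(segments):
    """Assign offsets to (name, size, alignment) tuples.

    Each segment's start is aligned exactly once and its size is used
    as-is. Returns (offsets by name, total size).
    """
    offsets = {}
    cursor = 0
    for name, size, alignment in segments:
        cursor = align_forward(cursor, alignment)
        offsets[name] = cursor
        cursor += size
    return offsets, cursor
```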
Luuk de Gram cf37101108 wasm-linker: Add function table indexes
When linking with an object file, verify if a relocation is a table index relocation.
If that's the case, add the relocation target to the function table.
2022-04-14 22:53:13 +02:00
Andrew Kelley 2587474717 stage2: progress towards stage3
 * The `@bitCast` workaround is removed in favor of `@ptrCast` properly
   doing element casting for slice element types. This required an
   enhancement both to stage1 and stage2.
 * stage1 incorrectly accepts `.{}` instead of `{}`. stage2 code that
   abused this is fixed.
 * Make some parameters comptime to support functions in switch
   expressions (as opposed to making them function pointers).
 * Avoid relying on local temporaries being mutable.
 * Workarounds for when stage1 and stage2 disagree on function pointer
   types.
 * Workaround recursive formatting bug with a `@panic("TODO")`.
 * Remove unreachable `else` prongs for some inferred error sets.

All in effort towards #89.
2022-04-14 10:12:45 -07:00
Luuk de Gram 97448e4d5f wasm: Only generate import when referenced
Rather than creating an import for externs on updateDecl, we now
generate them when they're referenced. This is required so using @TypeOf(extern_fn())
will not emit the import into the binary (causing an incorrect function type index
as it won't be fully analyzed).
2022-03-26 21:20:29 +01:00
Luuk de Gram 49051c0651 wasm: Implement @errorName
This implements the `error_name` instruction, which is emitted for runtime `@errorName` callsites.

The implementation works by creating 2 symbols and corresponding atoms.
The initial symbol contains a table in which each element consists of a slice where the ptr field
points towards the error name, and the len field contains the error name length without the sentinel.

The secondary symbol contains a list of all error names from the global error set.

During the error_name instruction, we first get a pointer to the first symbol.
Then based on the operand we perform pointer arithmetic, to get the correct index into this table.
e.g. error index 2 = ptr + (2 * ptr size). The result of this will be stored in a local
and then returned as instruction result.

During `flush()` we populate the error names table by looping over the global error set
and creating a relocation for each error name. This relocation is appended to the table symbol.
Then finally, this name is written to the names list itself.

Finally, both symbols' atoms are allocated within the rest of the binary.
When no error name is referenced, the `error_name_symbol` is never set, and therefore
no error name table will be emitted into the final binary.
2022-03-23 21:40:32 +01:00
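The two-symbol layout from 49051c0651 can be sketched in Python. Everything here is illustrative: `build_error_tables` and `error_name` are hypothetical names, pointers are modelled as 32-bit offsets instead of relocations, and the sketch strides by the full slice size (ptr plus len), which is an assumption about the entry layout.

```python
import struct

# One table entry: a slice as (ptr, len), two little-endian u32 fields.
ENTRY = struct.Struct("<II")


def build_error_tables(error_names, names_base=0):
    """Return (table_bytes, names_bytes) for the given error names."""
    table = bytearray()
    names = bytearray()
    for name in error_names:
        encoded = name.encode("utf-8")
        # ptr points into the names list; len excludes the 0 sentinel.
        table += ENTRY.pack(names_base + len(names), len(encoded))
        names += encoded + b"\x00"
    return bytes(table), bytes(names)


def error_name(table, names, index, names_base=0):
    """Look up an error name by index, as the instruction would at runtime."""
    ptr, length = ENTRY.unpack_from(table, index * ENTRY.size)
    start = ptr - names_base
    return names[start:start + length].decode("utf-8")
```

In the linker the `ptr` field would be filled in by a relocation during `flush()`; the sketch writes the offset directly.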
Luuk de Gram c7e4c711fc wasm: Fix incremental compilation
- atoms may have relocations, so freeing them when we update the parent
atom will cause segfaults.
- Not all declarations will live in symbol_atom
2022-03-06 23:33:50 +01:00
Jakub Konka 27c084065a Merge pull request #11070 from Luukdegram/wasm-unify
stage2: wasm - unify codegen with other backends
2022-03-06 20:44:51 +01:00
Luuk de Gram 6d84f22fa0 stage2: Fix wasm linker for llvm backend
This fixes 2 entrypoints within the self-hosted wasm linker that would be called
for the llvm backend, whereas we should simply call into the llvm backend to perform such action.
i.e. not allocate a decl index when we have an llvm object, and when flushing a module,
we should be calling it on llvm's object, rather than have the wasm linker perform the operation.

Also, this fixes the wasm intrinsics for wasm.memory.size and wasm.memory.grow.
Lastly, this commit ensures that when an extern function is being resolved, we tell LLVM how
to import such function.
2022-03-06 14:17:36 -05:00
Luuk de Gram 13fca53b92 wasm: Unify function generation
Like decl code generation, also unify the wasm backend and the wasm linker to call into
the general purpose `codegen.zig` to generate the code for a function.
2022-03-06 19:38:53 +01:00
Luuk de Gram 70fc6e3776 wasm: call into generateSymbol when lowering
This also unifies the wasm backend to use `generateSymbol` when lowering a constant
that cannot be lowered to an immediate value.
As both decls and constants are now refactored, the old `genTypedValue` is removed.
2022-03-06 19:38:53 +01:00
Luuk de Gram 5a45fe2dba wasm: Call generateSymbol for updateDecl
To unify the wasm backend with the other backends, we will now call `generateSymbol` to
lower a Decl into bytes. This means we also have to change some function signatures
to comply with the linker interface.

Since the general purpose generateSymbol is less featureful than wasm's, some tests are
temporarily disabled.
2022-03-06 19:38:50 +01:00
Luuk de Gram f5a31cb0d6 wasm-linker: Intern globals, exports & imports
Symbols that have globals used to have their lookup key be the symbol name.
This key is now the offset into the string table.

Imports have both a module name (library name) and a name (of the symbol); those strings are now
also being interned. This can save us up to 24 bytes per import that has both its module name and name de-duplicated.
Module names are almost entirely the same for all imports, giving us a good chance of saving at least 12 bytes.

Just like imports, exports can also have a different name than the internal symbol name. Rather than storing the slice,
we now store the offset of this string instead.
2022-03-01 08:35:20 +01:00
Luuk de Gram b1159ab7ae wasm-linker: Intern all symbol names
All symbols read from object files, as well as those generated from Zig code,
will now be interned and have their offset into the string table saved on the `Symbol` instead.

Besides interning, local symbols now also use a decl's fully qualified name.
When a decl/symbol is extern/to-be-imported, the name of the decl itself will be used for symbol resolving.
Similarly, symbols that will be exported will have their 'export name' set.
2022-03-01 08:35:20 +01:00
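The interning used in f5a31cb0d6 and b1159ab7ae amounts to storing each distinct string once and referring to it by offset. A minimal Python sketch, with `StringTable` as a hypothetical class and 0-terminated storage as an assumption:

```python
class StringTable:
    """Stores each distinct string once and hands out its byte offset."""

    def __init__(self):
        self._buffer = bytearray()
        self._offsets = {}

    def intern(self, text):
        """Return the offset of text, appending it only if it is new."""
        offset = self._offsets.get(text)
        if offset is None:
            offset = len(self._buffer)
            self._buffer += text.encode("utf-8") + b"\x00"
            self._offsets[text] = offset
        return offset

    def get(self, offset):
        """Read back the 0-terminated string stored at offset."""
        end = self._buffer.index(0, offset)
        return self._buffer[offset:end].decode("utf-8")
```

Two imports from the same module then share one offset for the module name instead of each holding its own slice.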
Luuk de Gram 49f01c0a0c wasm-object: Use given allocator rather than arena
This is preliminary work for string interning in the wasm linker.
Using an arena would defeat the purpose of de-duplicating strings as we wouldn't be able to free memory
of duplicated strings.
This change also means we can simplify wasm binary parsing, by creating a general purpose parser that
parses the binary into its sections, but untyped. Doing this, allows us to re-use the base of that, for
object file, but also debug info parsing.
2022-03-01 08:35:20 +01:00
Luuk de Gram f4adb53bcf wasm: Refactor lowerUnnamedConst
Rather than ping-ponging between codegen and the linker to generate the symbols/atoms
for a local constant and its relocations, we now create all necessary objects within the linker.

This simplifies the code as we can now simply call `lowerUnnamedConst` from anywhere in codegen,
allowing us to further improve lowering constants into .rodata so we do not have to sacrifice
lowering certain types such as decl_ref's where its type is a slice.
2022-02-25 09:33:15 +01:00
Luuk de Gram acec06cfaf wasm-linker: Implement updateDeclExports
We now correctly implement exporting decls. This means it is possible to export
a decl with a different name than the decl that is doing the export.
This also sets the symbols with the correct flags, so when we emit a relocatable
object file, a linker can correctly resolve symbols and/or export the symbol to the host environment.

This commit also includes fixes to ensure relocations have the correct offset to how other
linkers will expect the offset, rather than what we use internally.
Other linkers accept the offset, relative to the section.
Internally we use an offset relative to the atom.
2022-02-23 16:07:36 +01:00
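The offset conversion mentioned in acec06cfaf can be sketched as below. This Python illustration assumes atoms are laid out back to back in a section with no padding or size prefixes, which the real section layout may not satisfy; `emit_relocation_offsets` is a hypothetical name.

```python
def emit_relocation_offsets(atoms):
    """Convert atom-relative relocation offsets to section-relative ones.

    `atoms` is a list of (atom_size, [offsets relative to the atom]) in
    section order.
    """
    emitted = []
    atom_start = 0
    for size, relocs in atoms:
        emitted.extend(atom_start + offset for offset in relocs)
        atom_start += size
    return emitted
```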
Luuk de Gram 0a48a763fd wasm-linker: Emit relocations for object files
When generating a relocatable object file, we now emit a custom "reloc.CODE" and "reloc.DATA" section
which will contain the relocations for each section.

Using a new symbol location -> Atom mapping, we can now easily find the corresponding `Atom` from a symbol.
This can be used to construct the symbol table, as well as easier access to a target atom when performing
a relocation for a data symbol.
2022-02-23 16:07:36 +01:00
Luuk de Gram 2b0431a8d3 wasm-linker: Do not merge data segments for obj
When creating a relocatable object file, we no longer perform the following actions:
- Merge data segments
- Calculate stack size
- Relocations

We now also make the stack pointer symbol `undefined` for this use case as well as add the symbol
as an import.
2022-02-23 16:07:36 +01:00
Luuk de Gram daf741318e wasm-linker: Emit segment info
When creating a relocatable object file, also emit the segment information
2022-02-23 16:07:36 +01:00