This implements a new unstable compiler flag `-Zannotate-moves` that makes
move and copy operations visible in profilers by creating synthetic debug
information. This is achieved with zero runtime cost by manipulating debug
info scopes to make moves/copies appear as calls to `compiler_move<T, SIZE>`
and `compiler_copy<T, SIZE>` marker functions in profiling tools.
This allows developers to identify expensive move/copy operations in their
code using standard profiling tools, without requiring specialized tooling
or runtime instrumentation.
The implementation works at codegen time. When processing MIR operands
(`Operand::Move` and `Operand::Copy`), the codegen creates an `OperandRef`
with an optional `move_annotation` field containing an `Instance` of the
appropriate profiling marker function. When storing the operand,
`store_with_annotation()` wraps the store operation in a synthetic debug
scope that makes it appear inlined from the marker.
Two marker functions (`compiler_move` and `compiler_copy`) are defined
in `library/core/src/profiling.rs`. These are never actually called -
they exist solely as debug info anchors.
Operations are only annotated if the type:
- Meets the size threshold (default: 65 bytes, configurable via
`-Zannotate-moves=SIZE`)
- Has a non-scalar backend representation (scalars use registers,
not memcpy)
This has a very small size impact on object file size. With the default
limit it's well under 0.1%, and even with a very small limit of 8 bytes
it's still ~1.5%. This could be enabled by default.
Make sure that compiler and linker don't optimize the section's contents
away by adding the global holding the data to "llvm.used". The volatile
load in the main shim is retained because "llvm.used", which translates
to SHF_GNU_RETAIN on ELF targets, requires a reasonably recent linker;
emitting the volatile load ensures compatibility with older linkers, at
least when libstd is used.
Pretty printers in dylib dependencies are now emitted by the main crate
instead of the dylib; apart from matching how rlibs are handled, this
approach has the advantage that `omit_gdb_pretty_printer_section` keeps
working with dylib dependencies.
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
This is continuation of https://github.com/rust-lang/rust/pull/132282 .
I'm pretty sure I did everything right. In particular, I searched all occurrences of `Lrc` in submodules and made sure that they don't need replacement.
There are other possibilities, through.
We can define `enum Lrc<T> { Rc(Rc<T>), Arc(Arc<T>) }`. Or we can make `Lrc` a union and on every clone we can read from special thread-local variable. Or we can add a generic parameter to `Lrc` and, yes, this parameter will be everywhere across all codebase.
So, if you think we should take some alternative approach, then don't merge this PR. But if it is decided to stick with `Arc`, then, please, merge.
cc "Parallel Rustc Front-end" ( https://github.com/rust-lang/rust/issues/113349 )
r? SparrowLii
`@rustbot` label WG-compiler-parallel
Revert most of #133194 (except the test and the comment fixes). Then refix
not emitting locations at all when the correct location discriminator value
exceeds LLVM's capacity.
The maximum discriminator value LLVM can currently encode is 2^12. If macro use
results in more than 2^12 calls to the same function attributed to the same
callsite, and those calls are MIR-inlined, we will require more than the maximum
discriminator value to completely represent the debug information. Once we reach
that point drop the debug info instead.
Supertraits of `BuilderMethods` are all called `XyzBuilderMethods`.
Supertraits of `CodegenMethods` are all called `XyzMethods`. This commit
changes the latter to `XyzCodegenMethods`, for consistency.
Because constants are currently emitted *before* the prologue, leaving the
debug location on the IRBuilder spills onto other instructions in the prologue
and messes up both line numbers as well as the point LLVM chooses to be the
prologue end.
Example LLVM IR (irrelevant IR elided):
Before:
define internal { i64, i64 } @_ZN3tmp3Foo18var_return_opt_try17he02116165b0fc08cE(ptr align 8 %self) !dbg !347 {
start:
%self.dbg.spill = alloca [8 x i8], align 8
%_0 = alloca [16 x i8], align 8
%residual.dbg.spill = alloca [0 x i8], align 1
#dbg_declare(ptr %residual.dbg.spill, !353, !DIExpression(), !357)
store ptr %self, ptr %self.dbg.spill, align 8, !dbg !357
#dbg_declare(ptr %self.dbg.spill, !350, !DIExpression(), !358)
After:
define internal { i64, i64 } @_ZN3tmp3Foo18var_return_opt_try17h00b17d08874ddd90E(ptr align 8 %self) !dbg !347 {
start:
%self.dbg.spill = alloca [8 x i8], align 8
%_0 = alloca [16 x i8], align 8
%residual.dbg.spill = alloca [0 x i8], align 1
#dbg_declare(ptr %residual.dbg.spill, !353, !DIExpression(), !357)
store ptr %self, ptr %self.dbg.spill, align 8
#dbg_declare(ptr %self.dbg.spill, !350, !DIExpression(), !358)
Note in particular how !357 from %residual.dbg.spill's dbg_declare no longer
falls through onto the store to %self.dbg.spill. This fixes argument values
at entry when the constant is a ZST (e.g. <Option as Try>::Residual). This
fixes#130003 (but note that it does *not* fix issues with argument values and
non-ZST constants, which emit their own stores that have debug info on them,
like #128945).
Before this commit all vtables would have the same name "vtable" in
debuginfo. Now they get a name that identifies the implementing type
and the trait that is being implemented.