From f3d0e1826478d6da609a78e2a4f05fa5b8b2eecc Mon Sep 17 00:00:00 2001 From: Scott Olson Date: Tue, 12 Apr 2016 18:42:28 -0600 Subject: [PATCH] report: Fill in most of the language support section, plus data layout and determinism. --- tex/report/miri-report.tex | 218 ++++++++++++++++++++++++++++++++++--- 1 file changed, 201 insertions(+), 17 deletions(-) diff --git a/tex/report/miri-report.tex b/tex/report/miri-report.tex index e588291f4e8e..9655dcdaddb0 100644 --- a/tex/report/miri-report.tex +++ b/tex/report/miri-report.tex @@ -20,10 +20,9 @@ \begin{document} \title{Miri: \\ \smaller{An interpreter for Rust's mid-level intermediate representation}} -% \subtitle{test} \author{Scott Olson\footnote{\href{mailto:scott@solson.me}{scott@solson.me}} \\ \smaller{Supervised by Christopher Dutchyn}} -\date{April 8th, 2016} +\date{April 12th, 2016} \maketitle %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -155,14 +154,15 @@ fundamentally impossible. \section{Current implementation} -Roughly halfway through my time working on Miri, Rust compiler team member Eduard -Burtescu\footnote{\href{https://www.rust-lang.org/team.html\#Compiler}{The Rust compiler team}} made -a post on Rust's internal -forums\footnote{\href{https://internals.rust-lang.org/t/mir-constant-evaluation/3143/31}{Burtescu's -``Rust Abstract Machine'' forum post}} about a ``Rust Abstract Machine'' specification which could -be used to implement more powerful compile-time function execution, similar to what is supported by -C++14's \mintinline{cpp}{constexpr} feature. After clarifying some of the details of the abstract -machine's data layout with Burtescu via IRC, I started implementing it in Miri. 
+Roughly halfway through my time working on Miri, Eduard +Burtescu\footnote{\href{https://github.com/eddyb}{Eduard Burtescu on GitHub}} from the Rust compiler +team\footnote{\href{https://www.rust-lang.org/team.html\#Compiler}{The Rust compiler team}} made a +post on Rust's internal forums about a ``Rust Abstract Machine'' +specification\footnote{\href{https://internals.rust-lang.org/t/mir-constant-evaluation/3143/31}{Burtescu's +``Rust Abstract Machine'' forum post}} which could be used to implement more powerful compile-time +function execution, similar to what is supported by C++14's \mintinline{cpp}{constexpr} feature. +After clarifying some of the details of the abstract machine's data layout with Burtescu via IRC, I +started implementing it in Miri. \subsection{Raw value representation} @@ -224,7 +224,7 @@ comparatively trivial. See \autoref{fig:undef} for an example undefined byte, represented by underscores. Note that there would still be a value for the second byte in the byte array, but we don't care what it is. The -bitmask would be $10_2$, i.e. \rust{[true, false]}. +bitmask would be $10_2$, i.e.\ \rust{[true, false]}. \begin{figure}[hb] \begin{minted}[autogobble]{rust} @@ -237,12 +237,179 @@ bitmask would be $10_2$, i.e. \rust{[true, false]}. \label{fig:undef} \end{figure} -% TODO(tsion): Find a place for this text. -% Making Miri work was primarily an implementation problem. Writing an interpreter which models values -% of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some -% unconventional techniques compared to many interpreters. Miri's execution remains safe even while -% simulating execution of unsafe code, which allows it to detect when unsafe code does something -% invalid. 
+\subsection{Computing data layout} + +Currently, the Rust compiler's data layout computations used in translation from MIR to LLVM IR are +hidden from Miri, so I do my own basic data layout computation which doesn't generally match what +translation does. In the future, the Rust compiler may be modified so that Miri can use the exact +same data layout. + +Miri's data layout calculation is a relatively simple transformation from Rust types to a basic +structure with constant size values for primitives and sets of fields with offsets for aggregate +types. These layouts are cached for performance. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section{Deterministic execution} +\label{sec:deterministic} + +In order to be effective as a compile-time evaluator, Miri must have \emph{deterministic execution}, +as explained by Burtescu in the ``Rust Abstract Machine'' post. That is, given a function and +arguments to that function, Miri should always produce identical results. This is important for +coherence in the type checker when constant evaluations are involved in types, such as for sizes of +array types: + +\begin{minted}[autogobble,mathescape]{rust} + const fn get_size() -> usize { /* $\ldots$ */ } + let array: [i32; get_size()]; +\end{minted} + +Since Miri allows execution of unsafe code\footnote{In fact, the distinction between safe and unsafe +doesn't exist at the MIR level.}, it is specifically designed to remain safe while interpreting +potentially unsafe code. When Miri encounters an unrecoverable error, it reports it via the Rust +compiler's usual error reporting mechanism, pointing to the part of the original code where the +error occurred. 
For example: + +\begin{minted}[autogobble]{rust} + let b = Box::new(42); + let p: *const i32 = &*b; + drop(b); + unsafe { *p } + // ~~ error: dangling pointer + // was dereferenced +\end{minted} +\label{dangling-pointer} + +There are more examples in Miri's +repository.\footnote{\href{https://github.com/tsion/miri/blob/master/test/errors.rs}{Miri's error +tests}} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section{Language support} + +In its current state, Miri supports a large proportion of the Rust language, with a few major +exceptions such as the lack of support for FFI\footnote{Foreign Function Interface, e.g.\ calling +functions defined in assembly, C, or C++.}, which eliminates possibilities like reading and writing +files, user input, graphics, and more. The following is a tour of what is currently supported. + +\subsection{Primitives} + +Miri supports booleans and integers of various sizes and signedness (i.e.\ \rust{i8}, \rust{i16}, +\rust{i32}, \rust{i64}, \rust{isize}, \rust{u8}, \rust{u16}, \rust{u32}, \rust{u64}, \rust{usize}), +as well as unary and binary operations over these types. The \rust{isize} and \rust{usize} types +will be sized according to the target machine's pointer size just like in compiled Rust. The +\rust{char} and float types (\rust{f32}, \rust{f64}) are not supported yet, but there are no known +barriers to supporting them. + +When examining a boolean in an \rust{if} condition, Miri will report an error if it is not precisely +0 or 1, since this is undefined behaviour in Rust. The \rust{char} type has similar restrictions to +check for once it is implemented. + +\subsection{Pointers} + +Both references and raw pointers are supported, with essentially no difference between them in Miri. +It is also possible to do basic pointer comparisons and math. However, a few operations are +considered errors and a few require special support.
+ +Firstly, pointers into the same allocation may be compared for ordering, but pointers into +different allocations are considered unordered and Miri will complain if you attempt this. The +reasoning is that different allocations may have different orderings in the global address space at +runtime, making this non-deterministic. However, pointers into different allocations \emph{may} be +compared for direct equality (they are always unequal). + +Finally, for things like null pointer checks, abstract pointers (the kind represented using +relocations) may be compared against pointers cast from integers (e.g.\ \rust{0 as *const i32}). +To handle these cases, Miri has a concept of ``integer pointers'' which are always unequal to +abstract pointers. Integer pointers can be compared and operated upon freely. However, note that it +is impossible to go from an integer pointer to an abstract pointer backed by a relocation. It is not +valid to dereference an integer pointer. + +\subsubsection{Slice pointers} + +Rust supports pointers to ``dynamically-sized types'' such as \rust{[T]} and \rust{str} which +represent arrays of indeterminate size. Pointers to such types contain an address \emph{and} the +length of the referenced array. Miri supports these fully. + +\subsubsection{Trait objects} + +Rust also supports pointers to ``trait objects'' which represent some type that implements a trait, +with the specific type unknown at compile time. These are implemented using virtual dispatch with a +vtable, similar to virtual methods in C++. Miri does not currently support this at all. + +\subsection{Aggregates} + +Aggregates include types declared as \rust{struct} or \rust{enum} as well as tuples, arrays, and +closures\footnote{Closures are essentially structs with a field for each variable captured by the +closure.}. Miri supports all common usage of all of these types.
The main missing piece is to handle +\texttt{\#[repr(..)]} annotations which adjust the layout of a \rust{struct} or \rust{enum}. + +\subsection{Control flow} + +All of Rust's standard control flow features, including \rust{loop}, \rust{while}, \rust{for}, +\rust{if}, \rust{if let}, \rust{while let}, \rust{match}, \rust{break}, \rust{continue}, and +\rust{return}, are supported. In fact, supporting these was quite easy since the Rust compiler +reduces them all down to a comparatively small set of control-flow graph primitives in MIR. + +\subsection{Closures} + +Closures are like structs containing a field for each captured variable, but closures also have an +associated function. Supporting closure function calls required some extra machinery to get the +necessary information from the compiler, but it is all supported except for one edge case on my to-do +list\footnote{The edge case is calling a closure that takes a reference to its captures via a +closure interface that passes the captures by value.}. + +\subsection{Intrinsics} + +To support unsafe code, and in particular the unsafe code used to implement Rust's standard library, +it became clear that Miri would have to support calls to compiler +intrinsics\footnote{\href{https://doc.rust-lang.org/stable/std/intrinsics/index.html}{Rust +intrinsics documentation}}. Intrinsics are function calls which cause the Rust compiler to produce +special-purpose code instead of a regular function call. Miri simply recognizes intrinsic calls by +their unique ABI\footnote{Application Binary Interface, which defines calling conventions. Includes +``C'', ``Rust'', and ``rust-intrinsic''.} and name, and runs special-purpose code to handle them. + +An example of an important intrinsic is \rust{size_of} which will cause Miri to write the size of +the type in question to the return value location. The Rust standard library uses intrinsics heavily +to implement various data structures, so this was a major step toward supporting them.
So far, I've +been implementing intrinsics on a case-by-case basis as I write test cases which require missing +ones, so I haven't yet exhaustively implemented them all. + +\subsection{Heap allocations} + +The next piece of the puzzle for supporting interesting programs (and the standard library) was heap +allocations. There are two main interfaces for heap allocation in Rust: the built-in \rust{Box} +rvalue in MIR and a set of C ABI foreign functions including \rust{__rust_allocate}, +\rust{__rust_reallocate}, and \rust{__rust_deallocate}. These correspond approximately to +\mintinline{c}{malloc}, \mintinline{c}{realloc}, and \mintinline{c}{free} in C. + +The \rust{Box} rvalue allocates enough space for a single value of a given type. This was easy to +support in Miri. It simply creates a new abstract allocation in the same manner as for +stack-allocated values, since there's no major difference between them in Miri. + +The allocator functions, which are used to implement things like Rust's standard \rust{Vec} type, +were a bit trickier. Rust declares them as \rust{extern "C" fn} so that different allocator +libraries can be linked in at the user's option. Since Miri doesn't actually support FFI and we want +full control of allocations for safety, Miri ``cheats'' and recognizes these allocator functions in +essentially the same way it recognizes compiler intrinsics. Then, a call to \rust{__rust_allocate} +simply creates another abstract allocation with the requested size and \rust{__rust_reallocate} +grows an existing one. + +In the future, Miri should also track which allocations came from \rust{__rust_allocate} so it can +reject reallocate or deallocate calls on stack allocations. + +\subsection{Destructors} + +Miri doesn't yet support calling user-defined destructors, but it has most of the machinery in place +to do so already and it's next on my to-do list. There \emph{is} support for dropping \rust{Box} +types, including deallocating their associated allocations.
This is enough to properly execute the +dangling pointer example in \autoref{sec:deterministic}. + +\subsection{Standard library} +\blindtext + +\section{Unsupported} +\blindtext \begin{figure}[t] \begin{minted}[autogobble]{rust} @@ -280,6 +447,12 @@ bitmask would be $10_2$, i.e. \rust{[true, false]}. \section{Future work} +\subsection{Finishing the implementation} + +\blindtext + +\subsection{Alternative applications} + Other possible uses for Miri include: \begin{itemize} @@ -299,6 +472,17 @@ Other possible uses for Miri include: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section{Final thoughts} + +% TODO(tsion): Reword this. +Making Miri work was primarily an implementation problem. Writing an interpreter which models values +of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some +unconventional techniques compared to many interpreters. Miri's execution remains safe even while +simulating execution of unsafe code, which allows it to detect when unsafe code does something +invalid. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + \section{Thanks} Eduard Burtescu, Niko Matsakis, and Christopher Dutchyn.