Diffstat (limited to 'docs/InAlloca.rst')
-rw-r--r-- | docs/InAlloca.rst | 160 |
1 files changed, 160 insertions, 0 deletions
diff --git a/docs/InAlloca.rst b/docs/InAlloca.rst
new file mode 100644
index 0000000..c7609cd
--- /dev/null
+++ b/docs/InAlloca.rst
@@ -0,0 +1,160 @@
+==========================================
+Design and Usage of the InAlloca Attribute
+==========================================
+
+Introduction
+============
+
+The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
+taking the address of an aggregate argument that is being passed by
+value through memory. Primarily, this feature is required for
+compatibility with the Microsoft C++ ABI. Under that ABI, class
+instances that are passed by value are constructed directly into
+argument stack memory. Prior to the addition of inalloca, calls in LLVM
+were indivisible instructions. There was no way to perform intermediate
+work, such as object construction, between the first stack adjustment
+and the final control transfer. With inalloca, all arguments passed in
+memory are modelled as a single alloca, which can be stored to prior to
+the call. Unfortunately, this complicated feature comes with a large
+set of restrictions designed to bound the lifetime of the argument
+memory around the call.
+
+For now, it is recommended that frontends and optimizers avoid producing
+this construct, primarily because it forces the use of a base pointer.
+This feature may grow in the future to allow general mid-level
+optimization, but for now, it should be regarded as less efficient than
+passing by value with a copy.
+
+Intended Usage
+==============
+
+The example below is the intended LLVM IR lowering for some C++ code
+that passes two default-constructed ``Foo`` objects to ``g`` in the
+32-bit Microsoft C++ ABI.
+
+.. code-block:: c++
+
+    // Foo is non-trivial.
+    struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
+    void g(Foo a, Foo b);
+    void f() {
+      g(Foo(), Foo());
+    }
+
+.. code-block:: llvm
+
+    %struct.Foo = type { i32, i32 }
+    declare void @Foo_ctor(%struct.Foo* %this)
+    declare void @Foo_dtor(%struct.Foo* %this)
+    declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
+
+    define void @f() {
+    entry:
+      %base = call i8* @llvm.stacksave()
+      %memargs = alloca <{ %struct.Foo, %struct.Foo }>
+      %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
+      call void @Foo_ctor(%struct.Foo* %b)
+
+      ; If a's ctor throws, we must destruct b.
+      %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
+      invoke void @Foo_ctor(%struct.Foo* %a)
+          to label %invoke.cont unwind label %invoke.unwind
+
+    invoke.cont:
+      call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
+      call void @llvm.stackrestore(i8* %base)
+      ...
+
+    invoke.unwind:
+      call void @Foo_dtor(%struct.Foo* %b)
+      call void @llvm.stackrestore(i8* %base)
+      ...
+    }
+
+To avoid stack leaks, the frontend saves the current stack pointer with
+a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
+argument stack space with alloca and calls the default constructor. The
+default constructor could throw an exception, so the frontend has to
+create a landing pad. The frontend has to destroy the already
+constructed argument ``b`` before restoring the stack pointer. If the
+constructor does not unwind, ``g`` is called. In the Microsoft C++ ABI,
+``g`` will destroy its arguments, and then the stack is restored in
+``f``.
+
+Design Considerations
+=====================
+
+Lifetime
+--------
+
+The biggest design consideration for this feature is object lifetime.
+We cannot model the arguments as static allocas in the entry block,
+because all calls need to use the memory at the top of the stack to pass
+arguments. We cannot vend pointers to that memory at function entry
+because after code generation they will alias.
+
+The rule against allocas between argument allocations and the call site
+avoids this problem, but it creates a cleanup problem. Cleanup and
+lifetime are handled explicitly with stack save and restore calls. In
+the future, we may want to introduce a new construct such as ``freea``
+or ``afree`` to make it clear that this stack-adjusting cleanup is less
+powerful than a full stack save and restore.
+
+Nested Calls and Copy Elision
+-----------------------------
+
+We also want to be able to support copy elision into these argument
+slots. This means we have to support multiple live argument
+allocations.
+
+Consider the evaluation of:
+
+.. code-block:: c++
+
+    // Foo is non-trivial.
+    struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
+    Foo bar(Foo b);
+    int main() {
+      bar(bar(Foo()));
+    }
+
+In this case, we want to be able to elide copies into ``bar``'s argument
+slots. That means we need to have more than one set of argument frames
+active at the same time. First, we need to allocate the frame for the
+outer call so we can pass it in as the hidden struct return pointer to
+the inner call. Then we do the same for the inner call, allocating a
+frame and passing its address to ``Foo``'s default constructor. By
+wrapping the evaluation of the inner ``bar`` with stack save and
+restore, we can have multiple overlapping active call frames.
+
+Callee-cleanup Calling Conventions
+----------------------------------
+
+Another wrinkle is the existence of callee-cleanup conventions. On
+Windows, all methods and many other functions adjust the stack to clear
+the memory used to pass their arguments. In some sense, this means that
+the allocas are automatically cleared by the call. However, LLVM
+models this as a write of undef to all of the inalloca values passed
+to the call rather than as a stack adjustment. Frontends should
+still restore the stack pointer to avoid a stack leak.
+
+Exceptions
+----------
+
+There is also the possibility of an exception. If argument evaluation
+or copy construction throws an exception, the landing pad must do
+cleanup, which includes adjusting the stack pointer to avoid a stack
+leak. This means the cleanup of the stack memory cannot be tied to the
+call itself. There needs to be a separate IR-level instruction that can
+perform independent cleanup of arguments.
+
+Efficiency
+----------
+
+Eventually, it should be possible to generate efficient code for this
+construct. In particular, using inalloca should not require a base
+pointer. If the backend can prove that all points in the CFG only have
+one possible stack level, then it can address the stack directly from
+the stack pointer. While this is not yet implemented, the plan is that
+the inalloca attribute should not change much, but the frontend IR
+generation recommendations may change.
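
Reader's note, not part of the committed file: the cleanup ordering that the "Intended Usage" and "Exceptions" sections describe can be imitated in portable C++. The sketch below is only a simulation under stated assumptions. A local buffer (`MemArgs`, a hypothetical name) stands in for the inalloca'd stack memory, placement new stands in for the constructor calls, and the `trace` string and `throw_in_next_ctor` flag are illustration-only helpers. It is not what a frontend emits, and it does not model the stack pointer itself.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Illustration-only helpers (not from the document): a log of events and a
// switch that makes the next Foo constructor throw.
static std::string trace;
static bool throw_in_next_ctor = false;

// Stand-in for the document's non-trivial Foo.
struct Foo {
  int a = 0, b = 0;
  Foo() {
    if (throw_in_next_ctor) throw std::runtime_error("ctor failed");
    trace += "ctor;";
  }
  ~Foo() { trace += "dtor;"; }
};

// Mirrors <{ %struct.Foo, %struct.Foo }>: one block of raw storage that
// holds both arguments. The name MemArgs is hypothetical.
struct MemArgs {
  alignas(Foo) unsigned char a[sizeof(Foo)];
  alignas(Foo) unsigned char b[sizeof(Foo)];
};

// Callee-cleanup, as the document describes for the Microsoft C++ ABI:
// g destroys its own arguments.
void g(MemArgs *args) {
  trace += "g;";
  reinterpret_cast<Foo *>(args->a)->~Foo();
  reinterpret_cast<Foo *>(args->b)->~Foo();
}

// Simulates f(): construct b, then a, then call g. If a's constructor
// throws, the already constructed b is destroyed before the storage goes away.
void f(bool fail_on_a) {
  MemArgs memargs;                 // plays the role of the alloca
  Foo *b = new (memargs.b) Foo();  // the plain call to the constructor
  throw_in_next_ctor = fail_on_a;
  try {
    new (memargs.a) Foo();         // the invoke: this one may unwind
  } catch (...) {
    throw_in_next_ctor = false;
    b->~Foo();                     // landing pad: destroy b
    trace += "unwind;";
    return;                        // storage is released on return
  }
  g(&memargs);                     // normal path: g cleans up both arguments
}
```

Calling `f(false)` leaves `trace` as `ctor;ctor;g;dtor;dtor;`, and calling `f(true)` on a cleared `trace` leaves `ctor;dtor;unwind;`. In both paths the storage is released exactly once, which is the property the stack save and restore pair provides in the IR.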