Diffstat (limited to 'docs/InAlloca.rst')
-rw-r--r-- docs/InAlloca.rst 160
1 files changed, 160 insertions, 0 deletions
diff --git a/docs/InAlloca.rst b/docs/InAlloca.rst
new file mode 100644
index 0000000..c7609cd
--- /dev/null
+++ b/docs/InAlloca.rst
@@ -0,0 +1,160 @@
+==========================================
+Design and Usage of the InAlloca Attribute
+==========================================
+
+Introduction
+============
+
+The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
+taking the address of an aggregate argument that is being passed by
+value through memory. Primarily, this feature is required for
+compatibility with the Microsoft C++ ABI. Under that ABI, class
+instances that are passed by value are constructed directly into
+argument stack memory. Prior to the addition of inalloca, calls in LLVM
+were indivisible instructions. There was no way to perform intermediate
+work, such as object construction, between the first stack adjustment
+and the final control transfer. With inalloca, all arguments passed in
+memory are modelled as a single alloca, which can be stored to prior to
+the call. Unfortunately, this complicated feature comes with a large
+set of restrictions designed to bound the lifetime of the argument
+memory around the call.
+
+For now, it is recommended that frontends and optimizers avoid producing
+this construct, primarily because it forces the use of a base pointer.
+This feature may grow in the future to allow general mid-level
+optimization, but for now, it should be regarded as less efficient than
+passing by value with a copy.
+
+Intended Usage
+==============
+
+The example below is the intended LLVM IR lowering for some C++ code
+that passes two default-constructed ``Foo`` objects to ``g`` in the
+32-bit Microsoft C++ ABI.
+
+.. code-block:: c++
+
+  // Foo is non-trivial.
+  struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
+  void g(Foo a, Foo b);
+  void f() {
+    g(Foo(), Foo());
+  }
+
+.. code-block:: llvm
+
+  %struct.Foo = type { i32, i32 }
+  declare void @Foo_ctor(%struct.Foo* %this)
+  declare void @Foo_dtor(%struct.Foo* %this)
+  declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
+
+  define void @f() {
+  entry:
+    %base = call i8* @llvm.stacksave()
+    %memargs = alloca <{ %struct.Foo, %struct.Foo }>
+    %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
+    call void @Foo_ctor(%struct.Foo* %b)
+
+    ; If a's ctor throws, we must destruct b.
+    %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
+    invoke void @Foo_ctor(%struct.Foo* %a)
+        to label %invoke.cont unwind label %invoke.unwind
+
+  invoke.cont:
+    call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
+    call void @llvm.stackrestore(i8* %base)
+    ...
+
+  invoke.unwind:
+    call void @Foo_dtor(%struct.Foo* %b)
+    call void @llvm.stackrestore(i8* %base)
+    ...
+  }
+
+To avoid stack leaks, the frontend saves the current stack pointer with
+a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
+argument stack space with alloca and constructs ``b`` in place. The
+default constructor for ``a`` could throw an exception, so the frontend
+has to invoke it with a landing pad. The landing pad has to destroy the
+already constructed argument ``b`` before restoring the stack pointer.
+If the constructor does not unwind, ``g`` is called. In the Microsoft
+C++ ABI, ``g`` will destroy its arguments, and then the stack is
+restored in ``f``.
+
+Design Considerations
+=====================
+
+Lifetime
+--------
+
+The biggest design consideration for this feature is object lifetime.
+We cannot model the arguments as static allocas in the entry block,
+because all calls need to use the memory at the top of the stack to pass
+arguments. We cannot vend pointers to that memory at function entry
+because after code generation they will alias.
+
+The rule against allocas between argument allocations and the call site
+avoids this problem, but it creates a cleanup problem. Cleanup and
+lifetime are handled explicitly with stack save and restore calls. In
+the future, we may want to introduce a new construct such as ``freea``
+or ``afree`` to make it clear that this stack adjusting cleanup is less
+powerful than a full stack save and restore.
+
+Nested Calls and Copy Elision
+-----------------------------
+
+We also want to be able to support copy elision into these argument
+slots. This means we have to support multiple live argument
+allocations.
+
+Consider the evaluation of:
+
+.. code-block:: c++
+
+  // Foo is non-trivial.
+  struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
+  Foo bar(Foo b);
+  int main() {
+    bar(bar(Foo()));
+  }
+
+In this case, we want to be able to elide copies into ``bar``'s argument
+slots. That means we need to have more than one set of argument frames
+active at the same time. First, we need to allocate the frame for the
+outer call so we can pass it in as the hidden struct return pointer to
+the inner call. Then we do the same for the inner call, allocating a
+frame and passing its address to ``Foo``'s default constructor. By
+wrapping the evaluation of the inner ``bar`` with stack save and
+restore, we can have multiple overlapping active call frames.
+
+Callee-cleanup Calling Conventions
+----------------------------------
+
+Another wrinkle is the existence of callee-cleanup conventions. On
+32-bit Windows, all methods and many other functions adjust the stack
+to clear the memory used to pass their arguments. In some sense, this
+means that the allocas are automatically cleared by the call. However,
+LLVM models this as a write of undef to all of the inalloca values
+passed to the call rather than as a stack adjustment. Frontends should
+still restore the stack pointer to avoid a stack leak.
+
+Exceptions
+----------
+
+There is also the possibility of an exception. If argument evaluation
+or copy construction throws an exception, the landing pad must do
+cleanup, which includes adjusting the stack pointer to avoid a stack
+leak. This means the cleanup of the stack memory cannot be tied to the
+call itself. There needs to be a separate IR-level instruction that can
+perform independent cleanup of arguments.
+
+Efficiency
+----------
+
+Eventually, it should be possible to generate efficient code for this
+construct. In particular, using inalloca should not require a base
+pointer. If the backend can prove that all points in the CFG only have
+one possible stack level, then it can address the stack directly from
+the stack pointer. While this is not yet implemented, the plan is that
+the inalloca attribute should not change much, but the frontend IR
+generation recommendations may change.