author Chris Lattner <sabre@nondot.org> 2004-06-04 00:16:02 +0000
committer Chris Lattner <sabre@nondot.org> 2004-06-04 00:16:02 +0000
commit ec94f80b077ada57fd16e673d4bdab2bb0728e71
tree b964533f456b553450011088342236f56717cd01
parent 994d7ae649f6b1d11015d1c295bd656772c5b734
Fix PR356: [doc] lib/Target/X86/README.txt needs update
Also add some documentation about how instructions work

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14006 91177308-0d34-0410-b5e6-96231b3b80d8
-rw-r--r-- docs/CodeGenerator.html 281
1 files changed, 280 insertions, 1 deletions
diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html
index 2145449..0dddad6 100644
--- a/docs/CodeGenerator.html
+++ b/docs/CodeGenerator.html
@@ -30,6 +30,9 @@
</ul>
</li>
<li><a href="#codegendesc">Machine code description classes</a>
+ <ul>
+ <li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
+ </ul>
</li>
<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
</li>
@@ -61,7 +64,7 @@
suite of reusable components for translating the LLVM internal representation to
the machine code for a specified target -- either in assembly form (suitable for
a static compiler) or in binary machine code format (usable for a JIT compiler).
-The LLVM target-independent code generator consists of four main components:</p>
+The LLVM target-independent code generator consists of five main components:</p>
<ol>
<li><a href="#targetdesc">Abstract target description</a> interfaces which
@@ -84,6 +87,11 @@ the components provided by LLVM, and can optionally provide custom
target-specific passes, to build complete code generators for a specific target.
Target descriptions live in <tt>lib/Target/</tt>.</li>
+<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
+completely target independent (it uses the <tt>TargetJITInfo</tt> structure to
+handle target-specific issues). The code for the target-independent
+JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
+
</ol>
<p>
@@ -345,7 +353,278 @@ href="TableGenFundamentals.html">TableGen</a> description of the register file.
</div>
<!-- *********************************************************************** -->
+<div class="doc_text">
+
+<p>
+At a high level, LLVM code is translated to a machine-specific representation
+formed out of MachineFunction, MachineBasicBlock, and <a
+href="#machineinstr"><tt>MachineInstr</tt></a> instances
+(defined in include/llvm/CodeGen). This representation is completely target
+agnostic, representing instructions in their most abstract form: an opcode and a
+series of operands. This representation is designed to support both an SSA
+form for machine code and a register-allocated, non-SSA form.
+</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="machineinstr">The <tt>MachineInstr</tt> class</a>
+</div>
+
+<div class="doc_text">
+
+<p>Target machine instructions are represented as instances of the
+<tt>MachineInstr</tt> class. This class is an extremely abstract way of
+representing machine instructions. In particular, all it keeps track of is
+an opcode number and some number of operands.</p>
+
+<p>The opcode number is a simple unsigned number that only has meaning to a
+specific backend. All of the instructions for a target should be defined in
+the <tt>*InstrInfo.td</tt> file for the target, and the opcode enum values
+are autogenerated from this description. The <tt>MachineInstr</tt> class does
+not have any information about how to interpret the instruction (i.e., what the
+semantics of the instruction are): for that you must refer to the
+<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
+
+<p>The operands of a machine instruction can be of several different types:
+they can be a register reference, constant integer, basic block reference, etc.
+In addition, a machine operand should be marked as a def or a use of the value
+(though only registers are allowed to be defs).</p>
+
+<p>By convention, the LLVM code generator orders instruction operands so that
+all register definitions come before the register uses, even on architectures
+that are normally printed in other orders. For example, the SPARC add
+instruction "<tt>add %i1, %i2, %i3</tt>" adds the "%i1" and "%i2" registers
+and stores the result into the "%i3" register. In the LLVM code generator,
+the operands should be stored as "<tt>%i3, %i1, %i2</tt>", with the destination
+first.</p>
+
+<p>Keeping destination operands at the beginning of the operand list has several
+advantages. In particular, if the first operand is a def, the debugging printer
+will print the instruction like this:</p>
+
+<pre>
+ %i3 = add %i1, %i2
+</pre>
+
+<p>It is also easier to <a
+href="#buildmi">create instructions</a> whose only def is the first
+operand.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
+</div>
+
+<div class="doc_text">
+
+<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
+located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
+<tt>BuildMI</tt> functions make it easy to build arbitrary machine
+instructions. Usage of the <tt>BuildMI</tt> functions looks like this:
+</p>
+
+<pre>
+ // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
+ // instruction. The '1' specifies how many operands will be added.
+ MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
+
+ // Create the same instr, but insert it at the end of a basic block.
+ MachineBasicBlock &amp;MBB = ...
+ BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
+
+ // Create the same instr, but insert it before a specified iterator point.
+ MachineBasicBlock::iterator MBBI = ...
+ BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
+
+ // Create a 'cmp Reg, 0' instruction, no destination reg.
+ MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
+ // Create an 'sahf' instruction which takes no operands and stores nothing.
+ MI = BuildMI(X86::SAHF, 0);
+
+ // Create a self looping branch instruction.
+ BuildMI(MBB, X86::JNE, 1).addMBB(&amp;MBB);
+</pre>
+
+<p>
+The key thing to remember with the <tt>BuildMI</tt> functions is that you have
+to specify the number of operands that the machine instruction will take
+(allowing efficient memory allocation). Also, operands default to being uses
+of values, not definitions. If you need to add a definition operand (other
+than the optional destination register), you must explicitly mark it as such.
+</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="fixedregs">Fixed (aka preassigned) registers</a>
+</div>
+
+<div class="doc_text">
+
+<p>One important issue that the code generator needs to be aware of is the
+presence of fixed registers. In particular, there are often places in the
+instruction stream where the register allocator <em>must</em> arrange for a
+particular value to be in a particular register. This can occur due to
+limitations in the instruction set (e.g., the X86 can only do a 32-bit divide
+with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like calling
+conventions. In any case, the instruction selector should emit code that
+copies a virtual register into or out of a physical register when needed.</p>
+
+<p>For example, consider this simple LLVM example:</p>
+
+<pre>
+ int %test(int %X, int %Y) {
+ %Z = div int %X, %Y
+ ret int %Z
+ }
+</pre>
+
+<p>The X86 instruction selector produces this machine code for the div
+and ret (use
+"<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to get this):</p>
+
+<pre>
+ ;; Start of div
+ %EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
+ %reg1027 = sar %reg1024, 31
+ %EDX = mov %reg1027 ;; Sign extend X into EDX
+ idiv %reg1025 ;; Divide by Y (in reg1025)
+ %reg1026 = mov %EAX ;; Read the result (Z) out of EAX
+
+ ;; Start of ret
+ %EAX = mov %reg1026 ;; 32-bit return value goes in EAX
+ ret
+</pre>
+
+<p>By the end of code generation, the register allocator has coalesced
+the registers and deleted the resultant identity moves, producing the
+following code:</p>
+
+<pre>
+ ;; X is in EAX, Y is in ECX
+ mov %EAX, %EDX
+ sar %EDX, 31
+ idiv %ECX
+ ret
+</pre>
+
+<p>This approach is extremely general (if it can handle the X86 architecture,
+it can handle anything!) and allows all of the target specific
+knowledge about the instruction stream to be isolated in the instruction
+selector. Note that physical registers should have a short lifetime for good
+code generation, and all physical registers are assumed dead on entry and
+exit of basic blocks (before register allocation). Thus if you need a value
+to be live across basic block boundaries, it <em>must</em> live in a virtual
+register.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ssa">Machine code SSA form</a>
+</div>
+
+<div class="doc_text">
+<p><tt>MachineInstr</tt>s are initially selected in SSA form, and
+are maintained in SSA form until register allocation happens. For the most
+part, this is trivially simple since LLVM is already in SSA form: LLVM PHI nodes
+become machine code PHI nodes, and virtual registers are only allowed to have a
+single definition.</p>
+
+<p>After register allocation, machine code is no longer in SSA-form, as there
+are no virtual registers left in the code.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section">
+ <a name="targetimpls">Target description implementations</a>
+</div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>This section of the document explains any features or design decisions that
+are specific to the code generator for a particular target.</p>
+
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="x86">The X86 backend</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
+code generator currently targets a generic P6-like processor. As such, it
+produces a few P6-and-above instructions (like conditional moves), but it does
+not make use of newer features like MMX or SSE. In the future, the X86 backend
+will have subtarget support added for specific processor families and
+implementations.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The x86 has a very, uhm, flexible way of accessing memory. It is capable of
+forming memory addresses of the following form directly in integer
+instructions (which use ModR/M addressing):</p>
+
+<pre>
+ Base+[1,2,4,8]*IndexReg+Disp32
+</pre>
+
+<p>Wow, that's crazy. In order to represent this, LLVM tracks no fewer than four
+operands for each memory operand of this form. This means that the "load" form
+of 'mov' has the following "Operands" in this order:</p>
+
+<pre>
+Index:        0     |    1        2       3           4
+Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement
+OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm
+</pre>
+
+<p>Stores and all other instructions treat the four memory operands in the same
+way, in the same order.</p>
+
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="x86_names">Instruction naming</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+An instruction name consists of the base name, a default operand size, and
+one character per operand, each optionally followed by a special size. For
+example:</p>
+
+<p>
+<tt>ADD8rr</tt> -&gt; add, 8-bit register, 8-bit register<br>
+<tt>IMUL16rmi</tt> -&gt; imul, 16-bit register, 16-bit memory, 16-bit immediate<br>
+<tt>IMUL16rmi8</tt> -&gt; imul, 16-bit register, 16-bit memory, 8-bit immediate<br>
+<tt>MOVSX32rm16</tt> -&gt; movsx, 32-bit register, 16-bit memory
+</p>
+
+</div>
<!-- *********************************************************************** -->
<hr>
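
As a rough illustration of the four-operand memory reference that the patch describes (BaseReg, Scale, IndexReg, Displacement), the following standalone C++ sketch computes the address such a group of operands denotes. It is not LLVM code: the `X86MemOperands` struct and the `isValidScale` and `effectiveAddress` helpers are hypothetical names invented for this example, and register operands are modelled simply by the values they would hold at run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four operands describing one X86 memory
// reference (operand indices 1..4 of the "load" form of 'mov').
// This is NOT an LLVM type; all names here are illustrative assumptions.
struct X86MemOperands {
  uint32_t BaseRegValue;   // contents of the base register
  uint32_t Scale;          // unsigned immediate: 1, 2, 4, or 8
  uint32_t IndexRegValue;  // contents of the index register
  int32_t  Disp;           // sign-extended 32-bit displacement
};

// Returns true when Scale is one of the values the addressing mode allows.
bool isValidScale(uint32_t Scale) {
  return Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8;
}

// Effective address = Base + Scale*Index + Disp, wrapping at 32 bits.
uint32_t effectiveAddress(const X86MemOperands &M) {
  return M.BaseRegValue + M.Scale * M.IndexRegValue +
         static_cast<uint32_t>(M.Disp);
}
```

For instance, with a base of 0x1000, a scale of 4, an index of 3, and a displacement of -8, `effectiveAddress` yields 0x1004.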