author | dgarrett@chromium.org <dgarrett@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98> | 2013-08-28 23:09:07 +0000 |
---|---|---|
committer | dgarrett@chromium.org <dgarrett@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98> | 2013-08-28 23:09:07 +0000 |
commit | fd0a76f109c0fa3f9dd973ee6921957501f317e5 | |
tree | f3d89cbcac08539def3855726bdedb211c083112 /courgette/description.md | |
parent | 1517a2d2f05d29426704a151b1a94b5f5f854bed | |
Added documentation for Courgette internals.
It consists of a markdown file, png diagrams, and the generated html output.
BUG=274829
R=benchan@chromium.org, dgarrett@chromium.org
Review URL: https://codereview.chromium.org/23003015
git-svn-id: svn://svn.chromium.org/chrome/trunk/src@220113 0039d316-1c4b-4281-b951-d872f2087c98
Diffstat (limited to 'courgette/description.md')
-rw-r--r-- | courgette/description.md | 157 |
1 files changed, 157 insertions, 0 deletions
diff --git a/courgette/description.md b/courgette/description.md
new file mode 100644
index 0000000..f79f99e
--- /dev/null
+++ b/courgette/description.md
@@ -0,0 +1,157 @@
+Courgette Internals
+===================
+
+Patch Generation
+----------------
+
+![Patch Generation](generation.png)
+
+- courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch
+  generation by calling ensemble\_create.cc:GenerateEnsemblePatch
+
+- The files are read in by courgette:SourceStream objects
+
+- ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which
+  uses MakeGenerator to create
+  patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 classes.
+
+- PatchGeneratorX86\_32's Transform method transforms the input file
+  using Courgette's core techniques that make the bsdiff delta
+  smaller. The steps it takes are the following:
+
+  - _disassemble_ the old and new binaries into AssemblyProgram
+    objects,
+
+  - _adjust_ the new AssemblyProgram object, and
+
+  - _encode_ the AssemblyProgram object back into raw bytes.
+
+### Disassemble
+
+- The input is a pointer to a buffer containing the raw bytes of the
+  input file.
+
+- Disassembly converts certain machine instructions that reference
+  addresses to Courgette instructions. It is not actually
+  disassembly, but this is the term the code-base uses. Specifically,
+  it detects instructions that use absolute addresses given by the
+  binary file's relocation table, and relative addresses used in
+  relative branches.
+
+- Done by disassemble:ParseDetectedExecutable, which selects the
+  appropriate Disassembler subclass by looking at the binary file's
+  headers.
+
+  - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler
+
+  - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler
+
+  - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler
+
+- The Disassembler replaces the relocation table with a Courgette
+  instruction that can regenerate the relocation table.
+
+- The Disassembler builds a list of addresses referenced by the
+  machine code, numbering each one.
+
+- The Disassembler replaces each address used in machine instructions
+  with its index number.
+
+- The output is an assembly\_program.h:AssemblyProgram object, which
+  contains a list of instructions, machine or Courgette, and a mapping
+  of indices to actual addresses.
+
+### Adjust
+
+- This step takes the AssemblyProgram for the new file and reassigns
+  the indices that map to actual addresses. It is performed by
+  adjustment_method.cc:Adjust().
+
+- The goal is to match the indices from the old program to the new
+  program as closely as possible.
+
+- When matched correctly, machine instructions that jump to the same
+  function in both the new and old binary will look the same to
+  bsdiff, even if the function is located in a different part of the
+  binary.
+
+### Encode
+
+- This step takes an AssemblyProgram object and encodes both the
+  instructions and the mapping of indices to addresses as byte
+  vectors. This format can be written to a file directly, and is also
+  more appropriate for bsdiffing. It is done by
+  AssemblyProgram.Encode().
+
+- encoded_program.h:EncodedProgram defines the binary format and a
+  WriteTo method that writes to a file.
+
+### bsdiff
+
+- simple_delta.cc:GenerateSimpleDelta
+
+Patch Application
+-----------------
+
+![Patch Application](application.png)
+
+- courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch application
+  by calling ensemble\_apply.cc:ApplyEnsemblePatch
+
+- ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the
+  patch's header, then calls the overloaded version of
+  ensemble\_apply.cc:ApplyEnsemblePatch.
+
+- The patch is read into an ensemble_apply.cc:EnsemblePatchApplication
+  object, which generates a set of patcher_x86_32.h:PatcherX86_32
+  objects for the sections in the patch.
+
+- The original file is disassembled and encoded via a call to
+  EnsemblePatchApplication.TransformUp, which in turn calls
+  patcher_x86_32.h:PatcherX86_32.Transform.
+
+- The transformed file is then bspatched via
+  EnsemblePatchApplication.SubpatchTransformedElements, which calls
+  EnsemblePatchApplication.SubpatchStreamSets, which calls
+  simple_delta.cc:ApplySimpleDelta, Courgette's built-in
+  implementation of bspatch.
+
+- Finally, EnsemblePatchApplication.TransformDown assembles the
+  patched binary data, i.e., reverses the encoding and disassembly.
+  This is done by calling PatcherX86_32.Reform, which in turn calls
+  the global function encoded_program.cc:Assemble, which calls
+  EncodedProgram.AssembleTo.
+
+
+Glossary
+--------
+
+**Adjust**: Reassign address indices in the new program to match more
+  closely those from the old.
+
+**Assembly program**: The output of _disassembly_. Contains a list of
+  _Courgette instructions_ and an index of branch target addresses.
+
+**Assemble**: Convert an _assembly program_ back into an object file
+  by evaluating the _Courgette instructions_ and leaving the machine
+  instructions in place.
+
+**Courgette instruction**: Replaces machine instructions in the
+  program. Courgette instructions replace branches with an index to
+  the target addresses and replace part of the relocation table.
+
+**Disassembler**: Takes a binary file and produces an _assembly
+  program_.
+
+**Encode**: Convert an _assembly program_ into an _encoded program_ by
+  serializing its data structures into byte vectors more appropriate
+  for storage in a file.
+
+**Encoded Program**: The output of encoding.
+
+**Ensemble**: A Courgette-style patch containing sections for the list
+  of branch addresses and the encoded program. It supports patching
+  multiple object files at once.
+
+**Opcode**: The number corresponding to either a machine or _Courgette
+  instruction_.
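
The disassemble step described in the diff above can be illustrated with a toy model. The sketch below is not part of the commit and does not use Courgette's real data structures: the instruction tuples, the `call_label` opcode, and the `disassemble`/`assemble` helpers are all hypothetical names chosen for illustration. It only shows why replacing addresses with indices can leave two instruction streams identical when functions move, so that the difference ends up confined to the address table.

```python
from typing import Dict, List, Tuple

# Toy "machine code": (opcode, operand) pairs. The operand of a "call" is an
# absolute address; every other instruction is treated as opaque.
Instruction = Tuple[str, int]


def disassemble(program: List[Instruction]) -> Tuple[List[Instruction], List[int]]:
    """Replace each call target with an index into a table of addresses."""
    index_of: Dict[int, int] = {}
    addresses: List[int] = []
    out: List[Instruction] = []
    for opcode, operand in program:
        if opcode == "call":
            if operand not in index_of:
                # Number each distinct address in order of first use.
                index_of[operand] = len(addresses)
                addresses.append(operand)
            out.append(("call_label", index_of[operand]))
        else:
            out.append((opcode, operand))
    return out, addresses


def assemble(instructions: List[Instruction], addresses: List[int]) -> List[Instruction]:
    """Inverse of disassemble: look each index back up in the address table."""
    return [
        ("call", addresses[operand]) if opcode == "call_label" else (opcode, operand)
        for opcode, operand in instructions
    ]


old = [("push", 1), ("call", 0x1000), ("call", 0x2000), ("call", 0x1000)]
# In the "new" binary both functions moved by 0x40 bytes; nothing else changed.
new = [("push", 1), ("call", 0x1040), ("call", 0x2040), ("call", 0x1040)]

old_ins, old_addrs = disassemble(old)
new_ins, new_addrs = disassemble(new)

# Number of instructions that differ before and after the transformation.
raw_differences = sum(a != b for a, b in zip(old, new))
transformed_differences = sum(a != b for a, b in zip(old_ins, new_ins))
```

In this toy case the indices line up on their own because both programs use their addresses in the same order. The adjust step described in the diff exists for the general case, where the new program's indices have to be reassigned to match the old program's before the two streams look alike to bsdiff.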