Courgette Internals
===================

Patch Generation
----------------

![Patch Generation](generation.png)

- courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch generation by calling ensemble\_create.cc:GenerateEnsemblePatch.
- The files are read into courgette:SourceStream objects.
- ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which uses MakeGenerator to create patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 objects.
- PatchGeneratorX86\_32's Transform method transforms the input file using Courgette's core techniques that make the bsdiff delta smaller. The steps it takes are the following:
  - _disassemble_ the old and new binaries into AssemblyProgram objects,
  - _adjust_ the new AssemblyProgram object, and
  - _encode_ the AssemblyProgram object back into raw bytes.

### Disassemble

- The input is a pointer to a buffer containing the raw bytes of the input file.
- Disassembly converts certain machine instructions that reference addresses to Courgette instructions. It is not actually disassembly, but this is the term the code-base uses. Specifically, it detects instructions that use absolute addresses given by the binary file's relocation table, and relative addresses used in relative branches.
- This is done by disassemble:ParseDetectedExecutable, which selects the appropriate Disassembler subclass by looking at the binary file's headers.
  - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler
  - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler
  - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler
- The Disassembler replaces the relocation table with a Courgette instruction that can regenerate the relocation table.
- The Disassembler builds a list of addresses referenced by the machine code, numbering each one.
- The Disassembler replaces each address used in machine instructions with its index number.
- The output is an assembly\_program.h:AssemblyProgram object, which contains a list of instructions, machine or Courgette, and a mapping of indices to actual addresses.

### Adjust

- This step takes the AssemblyProgram for the new file and reassigns the indices that map to actual addresses, using the old program as the reference. It is performed by adjustment_method.cc:Adjust().
- The goal is to match the indices from the old program to the new program as closely as possible.
- When matched correctly, machine instructions that jump to the same function in both the new and old binary will look the same to bsdiff, even if the function is located in a different part of the binary.

### Encode

- This step takes an AssemblyProgram object and encodes both the instructions and the mapping of indices to addresses as byte vectors. This format can be written to a file directly, and is also more appropriate for bsdiffing. It is done by AssemblyProgram.Encode().
- encoded_program.h:EncodedProgram defines the binary format and a WriteTo method that writes to a file.

### bsdiff

- Done by simple_delta.cc:GenerateSimpleDelta.

Patch Application
-----------------

![Patch Application](application.png)

- courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch application by calling ensemble\_apply.cc:ApplyEnsemblePatch.
- ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the patch's header, then calls the overloaded version of ensemble\_apply.cc:ApplyEnsemblePatch.
- The patch is read into an ensemble_apply.cc:EnsemblePatchApplication object, which generates a set of patcher_x86_32.h:PatcherX86_32 objects for the sections in the patch.
- The original file is disassembled and encoded via a call to EnsemblePatchApplication.TransformUp, which in turn calls patcher_x86_32.h:PatcherX86_32.Transform.
- The transformed file is then bspatched via EnsemblePatchApplication.SubpatchTransformedElements, which calls EnsemblePatchApplication.SubpatchStreamSets, which calls simple_delta.cc:ApplySimpleDelta, Courgette's built-in implementation of bspatch.
- Finally, EnsemblePatchApplication.TransformDown assembles the patched binary data, i.e., reverses the encoding and disassembly. This is done by calling PatcherX86_32.Reform, which in turn calls the global function encoded_program.cc:Assemble, which calls EncodedProgram.AssembleTo.

Glossary
--------

**Adjust**: Reassign address indices in the new program to match more closely those from the old.

**Assembly program**: The output of _disassembly_. Contains a list of _Courgette instructions_ and an index of branch target addresses.

**Assemble**: Convert an _assembly program_ back into an object file by evaluating the _Courgette instructions_ and leaving the machine instructions in place.

**Courgette instruction**: Replaces machine instructions in the program. Courgette instructions replace branches with an index to the target addresses and replace part of the relocation table.

**Disassembler**: Takes a binary file and produces an _assembly program_.

**Encode**: Convert an _assembly program_ into an _encoded program_ by serializing its data structures into byte vectors more appropriate for storage in a file.

**Encoded Program**: The output of encoding.

**Ensemble**: A Courgette-style patch containing sections for the list of branch addresses and the encoded program. It supports patching multiple object files at once.

**Opcode**: The number corresponding to either a machine or _Courgette instruction_.
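
The disassemble and adjust steps described under Patch Generation can be illustrated with a toy Python sketch. It is not Courgette's algorithm: the function names `disassemble` and `adjust`, the `(opcode, address)` instruction tuples, the numbering of labels by sorted address, and the pairing of instructions by position are all assumptions made for illustration. It only shows why renumbering indices makes a moved function look unchanged to a byte-level differ.

```python
import itertools


def disassemble(instructions):
    """Toy 'disassembly': swap each address operand for a label index.

    `instructions` is a list of (opcode, address_or_None) pairs.
    Assumption: labels are numbered in ascending address order.
    Returns (program, labels) where labels[i] is the address of index i.
    """
    labels = sorted({addr for _, addr in instructions if addr is not None})
    index_of = {addr: i for i, addr in enumerate(labels)}
    program = [(op, None if addr is None else index_of[addr])
               for op, addr in instructions]
    return program, labels


def adjust(old_program, new_program, new_labels):
    """Toy 'adjust': renumber the new program's indices to mirror the old.

    Assumption: instructions are paired by position, which only works
    when both programs have the same shape.
    Returns (adjusted_program, adjusted_labels) with labels as a dict.
    """
    remap = {}     # new index -> adjusted index
    taken = set()  # adjusted indices already handed out
    for (old_op, old_idx), (new_op, new_idx) in zip(old_program, new_program):
        if old_op != new_op or old_idx is None or new_idx is None:
            continue
        if new_idx not in remap and old_idx not in taken:
            remap[new_idx] = old_idx
            taken.add(old_idx)
    # Unmatched labels receive the lowest indices that are still free.
    free = (i for i in itertools.count() if i not in taken)
    for new_idx in range(len(new_labels)):
        if new_idx not in remap:
            remap[new_idx] = next(free)
    adjusted_program = [(op, None if idx is None else remap[idx])
                        for op, idx in new_program]
    adjusted_labels = {remap[i]: addr for i, addr in enumerate(new_labels)}
    return adjusted_program, adjusted_labels


# In the new binary the called function moved from 0x1000 to 0x3000,
# past the jump target, so address-ordered numbering swaps the indices.
old_code = [("call", 0x1000), ("jmp", 0x2000), ("ret", None), ("call", 0x1000)]
new_code = [("call", 0x3000), ("jmp", 0x2040), ("ret", None), ("call", 0x3000)]

old_program, old_labels = disassemble(old_code)
new_program, new_labels = disassemble(new_code)
adjusted_program, adjusted_labels = adjust(old_program, new_program, new_labels)
```

In this example `new_program` differs from `old_program` before adjustment, while `adjusted_program` is identical to it. The only remaining difference sits in the index-to-address mapping, which is the kind of input that lets bsdiff produce a small delta.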