:title: Git Hosting: Reducing Server Load with Bundles
:date: 2016-12-18 23:40
:tags: self-hosting, Git, free software
:category: General
:author: Wolfgang Wiedmeyer
:status: published
:summary: Cloning large repositories can result in quite some load on the server side. A possible solution is the use of bundles. Git can package a certain revision in an archive. The client can fetch the bundle and set up a clone locally based on the bundle.

Cloning large repositories can put considerable load on the server. Depending on the server specs, the server may run out of RAM or the CPU load may increase heavily. In my case, the limiting factor is the CPU. Too much load can even result in fatal errors that make it impossible to freshly clone a repository.

A possible solution is the use of bundles. Git can package a given revision, together with its history, into a single archive file. The client can fetch the bundle and set up a clone locally based on it. The `Git documentation <https://git-scm.com/docs/git-bundle>`_ describes how this works. The server then only has to serve the bundle as a static file, which causes almost no load. Once the client has set up the clone from the bundle, subsequent pull or fetch requests cause far less server load because the server only needs to handle the difference between the revision archived in the bundle and the revision that is currently being fetched.
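
To make the client side more concrete, here is a minimal sketch that can be run locally without any server. It assumes a scratch directory created with ``mktemp`` and uses a local bare repository as a stand-in for the real remote; the paths, the demo identity and the commit messages are made up for illustration. On a real setup, the bundle would first be downloaded over HTTP and the remote URL would be the usual clone URL of the server.

.. code-block:: shell

   #!/bin/sh
   set -eu

   # Scratch area; "server.git" is a local stand-in for the real remote.
   work=$(mktemp -d)
   server="$work/server.git"
   client="$work/client"

   # Hypothetical demo identity so that commits work on a fresh machine.
   demo_git() { git -c user.name=demo -c user.email=demo@example.org "$@"; }

   # Server side: a bare repository with one commit and a bundle of it.
   git init --quiet "$work/src"
   demo_git -C "$work/src" commit --quiet --allow-empty -m "first commit"
   git clone --quiet --bare "$work/src" "$server"
   branch=$(git -C "$server" symbolic-ref --short HEAD)
   git -C "$server" bundle create clone.bundle HEAD "$branch"

   # The repository moves on after the bundle has been created.
   demo_git -C "$work/src" commit --quiet --allow-empty -m "second commit"
   git -C "$work/src" push --quiet "$server" HEAD

   # Client side: clone from the bundle file, then point origin at the
   # real repository and fetch only what the bundle does not contain.
   git clone --quiet "$server/clone.bundle" "$client"
   git -C "$client" remote set-url origin "$server"
   git -C "$client" fetch --quiet origin

After the last step, the remote-tracking branch in the client matches the server, although the initial clone only ever read the bundle file.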

The Linux kernel project uses bundles on their Git hosting servers and `they recommend fetching the bundle directly with wget if you have connection problems <https://www.kernel.org/cloning-linux-from-a-bundle.html>`_. The `repo tool <https://code.google.com/p/git-repo/>`_, which manages the various Git repositories of Android-based operating systems, by default even expects a bundle named ``clone.bundle`` to be present in every repository on the server during the initial sync. The repo tool automatically fetches the bundles and uses them to set up the individual Git repositories.

Creating bundles on the server
##############################

Bundles are easily created inside a Git repository with the command ``git bundle create clone.bundle $REVISION``. ``$REVISION`` can be a branch or a tag. If you have many Git repositories and all of them are in the same directory, running the following command in the parent directory creates bundles in all of them:

.. code-block:: shell

   for i in *.git; do ( echo "$i"; cd "$i" && git bundle create clone.bundle "$REVISION" ); done
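
Since a broken bundle would affect every initial clone that relies on it, it may be worth checking each bundle right after creating it. The following is only a sketch under assumptions: a scratch directory with a single demo repository stands in for the real parent directory, and ``HEAD`` is bundled instead of a specific branch or tag. ``git bundle verify`` and ``git bundle list-heads`` are the checks Git itself provides.

.. code-block:: shell

   #!/bin/sh
   set -eu

   # Hypothetical parent directory; a scratch directory with one demo
   # repository stands in for the real one here.
   parent=$(mktemp -d)
   git init --quiet "$parent/src"
   git -C "$parent/src" -c user.name=demo -c user.email=demo@example.org \
       commit --quiet --allow-empty -m "initial commit"
   git clone --quiet --bare "$parent/src" "$parent/demo.git"

   checked=0
   for repo in "$parent"/*.git; do
       # Bundle whatever HEAD points to, then let Git check the result.
       git -C "$repo" bundle create clone.bundle HEAD
       git -C "$repo" bundle verify clone.bundle
       # Show which references a client would get from the bundle.
       git -C "$repo" bundle list-heads clone.bundle
       checked=$((checked + 1))
   done
   echo "verified $checked bundle(s)"

Because of ``set -e``, the loop stops at the first repository whose bundle fails verification.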

Making the bundles accessible
#############################

If you do your own Git hosting, you probably have a web server like Apache running, with software like `cgit <https://git.zx2c4.com/cgit/>`_ serving as the Git web frontend behind it. As I'm using `Gitolite <http://gitolite.com/gitolite/>`_ to manage access to my repositories, all repositories reside in the directory ``/var/lib/gitolite3/repositories``.

First, Apache needs to be told where it can find the bundles:

.. code-block:: apache

   AliasMatch ^/(.*)\.git/clone\.bundle$ /var/lib/gitolite3/repositories/$1.git/clone.bundle
   AliasMatch ^/(.*)/clone\.bundle$ /var/lib/gitolite3/repositories/$1.git/clone.bundle

These directives make sure that Apache finds the corresponding ``*.git`` folder regardless of whether the URL contains the ``.git`` suffix.

Then clients need to be allowed to access the bundles in the git repositories:

.. code-block:: apache

   <Directory /var/lib/gitolite3/repositories/>
       Require all denied
       <FilesMatch "^clone\.bundle$">
           Require all granted
       </FilesMatch>
   </Directory>

This makes sure that only files named ``clone.bundle`` are accessible.

I hope that having bundles available causes far fewer issues when syncing with my `Replicant 6.0 <https://blog.replicant.us/2016/08/replicant-6-early-work-upstream-work-and-f-droid-issue/>`_ repositories or with my `Replicant 4.2 mirror <https://replicantmirror.fossencdi.org/>`_.