GitPython Tutorial¶

GitPython provides object model access to your git repository. This tutorial is composed of multiple sections, most of which explains a real-life usecase.

All code presented here originated from test_docs.py to assure correctness. Knowing this should also allow you to more easily run the code for your own testing purposes, all you need is a developer installation of git-python.

Meet the Repo type¶

The first step is to create a git.Repo object to represent your repository.

        from git import Repo
        join = os.path.join

        # rorepo is a Repo instance pointing to the git-python repository.
        # For all you know, the first argument to Repo is a path to the repository
        # you want to work with
        repo = Repo(self.rorepo.working_tree_dir)
        assert not repo.bare

In the above example, the directory self.rorepo.working_tree_dir equals /Users/mtrier/Development/git-python and is my working repository which contains the .git directory. You can also initialize GitPython with a bare repository.

        bare_repo = Repo.init(join(rw_dir, 'bare-repo'), bare=True)
        assert bare_repo.bare

A repo object provides high-level access to your data, it allows you to create and delete heads, tags and remotes and access the configuration of the repository.

        repo.config_reader()             # get a config reader for read-only access
        cw = repo.config_writer()        # get a config writer to change configuration
        cw.release()                     # call release() to be sure changes are written and locks are released

Query the active branch, query untracked files or whether the repository data has been modified.

        assert not bare_repo.is_dirty()  # check the dirty state
        repo.untracked_files             # retrieve a list of untracked files
        # ['my_untracked_file']

Clone from existing repositories or initialize new empty ones.

        cloned_repo = repo.clone(join(rw_dir, 'to/this/path'))
        assert cloned_repo.__class__ is Repo     # clone an existing repository
        assert Repo.init(join(rw_dir, 'path/for/new/repo')).__class__ is Repo

Archive the repository contents to a tar file.

        repo.archive(open(join(rw_dir, 'repo.tar'), 'wb'))

Advanced Repo Usage¶

And of course, there is much more you can do with this type, most of the following will be explained in greater detail in specific tutorials. Don’t worry if you don’t understand some of these examples right away, as they may require a thorough understanding of gits inner workings.

Query relevant repository paths ...

        assert os.path.isdir(cloned_repo.working_tree_dir)                    # directory with your work files
        assert cloned_repo.git_dir.startswith(cloned_repo.working_tree_dir)   # directory containing the git repository
        assert bare_repo.working_tree_dir is None                             # bare repositories have no working tree

Heads Heads are branches in git-speak. References are pointers to a specific commit or to other references. Heads and Tags are a kind of references. GitPython allows you to query them rather intuitively.

        assert repo.head.ref == repo.heads.master                   # head is a symbolic reference pointing to master
        assert repo.tags['0.3.5'] == repo.tag('refs/tags/0.3.5')    # you can access tags in various ways too
        assert repo.refs.master == repo.heads['master']             # .refs provides access to all refs, i.e. heads ...
        assert repo.refs['origin/master'] == repo.remotes.origin.refs.master  # ... remotes ...
        assert repo.refs['0.3.5'] == repo.tags['0.3.5']             # ... and tags

You can also create new heads ...

        new_branch = cloned_repo.create_head('feature')               # create a new branch ...
        assert cloned_repo.active_branch != new_branch                # which wasn't checked out yet ...
        assert new_branch.commit == cloned_repo.active_branch.commit  # and which points to the checked-out commit
        # It's easy to let a branch point to the previous commit, without affecting anything else
        # Each reference provides access to the git object it points to, usually commits
        assert new_branch.set_commit('HEAD~1').commit == cloned_repo.active_branch.commit.parents[0]

... and tags ...

        past = cloned_repo.create_tag('past', ref=new_branch,
                                      message="This is a tag-object pointing to %s" % new_branch.name)
        assert past.commit == new_branch.commit        # the tag points to the specified commit
        assert past.tag.message.startswith("This is")  # and its object carries the message provided

        now = cloned_repo.create_tag('now')            # This is a tag-reference. It may not carry meta-data
        assert now.tag is None

You can traverse down to git objects through references and other objects. Some objects like commits have additional meta-data to query.

        assert now.commit.message != past.commit.message
        # You can read objects directly through binary streams, no working tree required
        assert (now.commit.tree / 'VERSION').data_stream.read().decode('ascii').startswith('2')

        # You can traverse trees as well to handle all contained files of a particular commit
        file_count = 0
        tree_count = 0
        tree = past.commit.tree
        for item in tree.traverse():
            file_count += item.type == 'blob'
            tree_count += item.type == 'tree'
        assert file_count and tree_count                        # we have accumulated all directories and files
        assert len(tree.blobs) + len(tree.trees) == len(tree)   # a tree is iterable itself to traverse its children

Remotes allow to handle fetch, pull and push operations, while providing optional real-time progress information to progress delegates.

        from git import RemoteProgress

        class MyProgressPrinter(RemoteProgress):
            def update(self, op_code, cur_count, max_count=None, message=''):
                print(op_code, cur_count, max_count, cur_count / (max_count or 100.0), message or "NO MESSAGE")
        # end

        assert len(cloned_repo.remotes) == 1                    # we have been cloned, so there should be one remote
        assert len(bare_repo.remotes) == 0                      # this one was just initialized
        origin = bare_repo.create_remote('origin', url=cloned_repo.working_tree_dir)
        assert origin.exists()
        for fetch_info in origin.fetch(progress=MyProgressPrinter()):
            print("Updated %s to %s" % (fetch_info.ref, fetch_info.commit))
        # create a local branch at the latest fetched master. We specify the name statically, but you have all
        # information to do it programatically as well.
        bare_master = bare_repo.create_head('master', origin.refs.master)
        bare_repo.head.set_reference(bare_master)
        assert not bare_repo.delete_remote(origin).exists()
        # push and pull behave very similarly

The index is also called stage in git-speak. It is used to prepare new commits, and can be used to keep results of merge operations. Our index implementation allows to stream date into the index, which is useful for bare repositories that do not have a working tree.

        assert new_branch.checkout() == cloned_repo.active_branch     # checking out a branch adjusts the working tree
        assert new_branch.commit == past.commit                       # Now the past is checked out

        new_file_path = os.path.join(cloned_repo.working_tree_dir, 'my-new-file')
        open(new_file_path, 'wb').close()                             # create new file in working tree
        cloned_repo.index.add([new_file_path])                        # add it to the index
        # Commit the changes to deviate masters history
        cloned_repo.index.commit("Added a new file in the past - for later merege")

        # prepare a merge
        master = cloned_repo.heads.master                         # right-hand side is ahead of us, in the future
        merge_base = cloned_repo.merge_base(new_branch, master)   # allwos for a three-way merge
        cloned_repo.index.merge_tree(master, base=merge_base)     # write the merge result into index
        cloned_repo.index.commit("Merged past and now into future ;)",
                                 parent_commits=(new_branch.commit, master.commit))

        # now new_branch is ahead of master, which probably should be checked out and reset softly.
        # note that all these operations didn't touch the working tree, as we managed it ourselves.
        # This definitely requires you to know what you are doing :) !
        assert os.path.basename(new_file_path) in new_branch.commit.tree  # new file is now in tree
        master.commit = new_branch.commit            # let master point to most recent commit
        cloned_repo.head.reference = master          # we adjusted just the reference, not the working tree or index

Submodules represent all aspects of git submodules, which allows you query all of their related information, and manipulate in various ways.

        # create a new submodule and check it out on the spot, setup to track master branch of `bare_repo`
        # As our GitPython repository has submodules already that point to github, make sure we don't
        # interact with them
        for sm in cloned_repo.submodules:
            assert not sm.remove().exists()                   # after removal, the sm doesn't exist anymore
        sm = cloned_repo.create_submodule('mysubrepo', 'path/to/subrepo', url=bare_repo.git_dir, branch='master')

        # .gitmodules was written and added to the index, which is now being committed
        cloned_repo.index.commit("Added submodule")
        assert sm.exists() and sm.module_exists()             # this submodule is defintely available
        sm.remove(module=True, configuration=False)           # remove the working tree
        assert sm.exists() and not sm.module_exists()         # the submodule itself is still available

        # update all submodules, non-recursively to save time, this method is very powerful, go have a look
        cloned_repo.submodule_update(recursive=False)
        assert sm.module_exists()                             # The submodules working tree was checked out by update

Examining References¶

References are the tips of your commit graph from which you can easily examine the history of your project.

        import git
        repo = git.Repo.clone_from(self._small_repo_url(), os.path.join(rw_dir, 'repo'), branch='master')

        heads = repo.heads
        master = heads.master       # lists can be accessed by name for convenience
        master.commit               # the commit pointed to by head called master
        master.rename('new_name')   # rename heads
        master.rename('master')

Tags are (usually immutable) references to a commit and/or a tag object.

        tags = repo.tags
        tagref = tags[0]
        tagref.tag                  # tags may have tag objects carrying additional information
        tagref.commit               # but they always point to commits
        repo.delete_tag(tagref)     # delete or
        repo.create_tag("my_tag")   # create tags using the repo for convenience

A symbolic reference is a special case of a reference as it points to another reference instead of a commit.

        head = repo.head            # the head points to the active branch/ref
        master = head.reference     # retrieve the reference the head points to
        master.commit               # from here you use it as any other reference

Access the reflog easily.

        log = master.log()
        log[0]                      # first (i.e. oldest) reflog entry
        log[-1]                     # last (i.e. most recent) reflog entry

Modifying References¶

You can easily create and delete reference types or modify where they point to.

        new_branch = repo.create_head('new')     # create a new one
        new_branch.commit = 'HEAD~10'            # set branch to another commit without changing index or working trees
        repo.delete_head(new_branch)             # delete an existing head - only works if it is not checked out

Create or delete tags the same way except you may not change them afterwards.

        new_tag = repo.create_tag('my_new_tag', message='my message')
        # You cannot change the commit a tag points to. Tags need to be re-created
        self.failUnlessRaises(AttributeError, setattr, new_tag, 'commit', repo.commit('HEAD~1'))
        repo.delete_tag(new_tag)

Change the symbolic reference to switch branches cheaply (without adjusting the index or the working tree).

        new_branch = repo.create_head('another-branch')
        repo.head.reference = new_branch

Understanding Objects¶

An Object is anything storable in git’s object database. Objects contain information about their type, their uncompressed size as well as the actual data. Each object is uniquely identified by a binary SHA1 hash, being 20 bytes in size, or 40 bytes in hexadecimal notation.

Git only knows 4 distinct object types being Blobs, Trees, Commits and Tags.

In GitPython, all objects can be accessed through their common base, can be compared and hashed. They are usually not instantiated directly, but through references or specialized repository functions.

        hc = repo.head.commit
        hct = hc.tree
        hc != hct
        hc != repo.tags[0]
        hc == repo.head.reference.commit

Common fields are ...

        assert hct.type == 'tree'           # preset string type, being a class attribute
        assert hct.size > 0                 # size in bytes
        assert len(hct.hexsha) == 40
        assert len(hct.binsha) == 20

Index objects are objects that can be put into git’s index. These objects are trees, blobs and submodules which additionally know about their path in the file system as well as their mode.

        assert hct.path == ''                  # root tree has no path
        assert hct.trees[0].path != ''         # the first contained item has one though
        assert hct.mode == 0o40000              # trees have the mode of a linux directory
        assert hct.blobs[0].mode == 0o100644   # blobs have a specific mode though comparable to a standard linux fs

Access blob data (or any object data) using streams.

        hct.blobs[0].data_stream.read()        # stream object to read data from
        hct.blobs[0].stream_data(open(os.path.join(rw_dir, 'blob_data'), 'wb'))  # write data to given stream

The Commit object¶

Commit objects contain information about a specific commit. Obtain commits using references as done in Examining References or as follows.

Obtain commits at the specified revision

        repo.commit('master')
        repo.commit('v0.8.1')
        repo.commit('HEAD~10')

Iterate 50 commits, and if you need paging, you can specify a number of commits to skip.

        fifty_first_commits = list(repo.iter_commits('master', max_count=50))
        assert len(fifty_first_commits) == 50
        # this will return commits 21-30 from the commit list as traversed backwards master
        ten_commits_past_twenty = list(repo.iter_commits('master', max_count=10, skip=20))
        assert len(ten_commits_past_twenty) == 10
        assert fifty_first_commits[20:30] == ten_commits_past_twenty

A commit object carries all sorts of meta-data

        headcommit = repo.head.commit
        assert len(headcommit.hexsha) == 40
        assert len(headcommit.parents) > 0
        assert headcommit.tree.type == 'tree'
        assert headcommit.author.name == 'Sebastian Thiel'
        assert isinstance(headcommit.authored_date, int)
        assert headcommit.committer.name == 'Sebastian Thiel'
        assert isinstance(headcommit.committed_date, int)
        assert headcommit.message != ''

Note: date time is represented in a seconds since epoch format. Conversion to human readable form can be accomplished with the various time module methods.

        import time
        time.asctime(time.gmtime(headcommit.committed_date))
        time.strftime("%a, %d %b %Y %H:%M", time.gmtime(headcommit.committed_date))

You can traverse a commit’s ancestry by chaining calls to parents

        assert headcommit.parents[0].parents[0].parents[0] == repo.commit('master^^^')

The above corresponds to master^^^ or master~3 in git parlance.

The Tree object¶

A tree records pointers to the contents of a directory. Let’s say you want the root tree of the latest commit on the master branch

        tree = repo.heads.master.commit.tree
        assert len(tree.hexsha) == 40

Once you have a tree, you can get its contents

        assert len(tree.trees) > 0          # trees are subdirectories
        assert len(tree.blobs) > 0          # blobs are files
        assert len(tree.blobs) + len(tree.trees) == len(tree)

It is useful to know that a tree behaves like a list with the ability to query entries by name

        assert tree['smmap'] == tree / 'smmap'          # access by index and by sub-path
        for entry in tree:                                         # intuitive iteration of tree members
            print(entry)
        blob = tree.trees[0].blobs[0]                              # let's get a blob in a sub-tree
        assert blob.name
        assert len(blob.path) < len(blob.abspath)
        assert tree.trees[0].name + '/' + blob.name == blob.path   # this is how the relative blob path is generated
        assert tree[blob.path] == blob                             # you can use paths like 'dir/file' in tree[...]

There is a convenience method that allows you to get a named sub-object from a tree with a syntax similar to how paths are written in a posix system

        assert tree / 'smmap' == tree['smmap']
        assert tree / blob.path == tree[blob.path]

You can also get a commit’s root tree directly from the repository

        # This example shows the various types of allowed ref-specs
        assert repo.tree() == repo.head.commit.tree
        past = repo.commit('HEAD~5')
        assert repo.tree(past) == repo.tree(past.hexsha)
        assert repo.tree('v0.8.1').type == 'tree'               # yes, you can provide any refspec - works everywhere

As trees allow direct access to their intermediate child entries only, use the traverse method to obtain an iterator to retrieve entries recursively

        assert len(tree) < len(list(tree.traverse()))

Note

If trees return Submodule objects, they will assume that they exist at the current head’s commit. The tree it originated from may be rooted at another commit though, that it doesn’t know. That is why the caller would have to set the submodule’s owning or parent commit using the set_parent_commit(my_commit) method.

The Index Object¶

The git index is the stage containing changes to be written with the next commit or where merges finally have to take place. You may freely access and manipulate this information using the IndexFile object. Modify the index with ease

        index = repo.index
        # The index contains all blobs in a flat list
        assert len(list(index.iter_blobs())) == len([o for o in repo.head.commit.tree.traverse() if o.type == 'blob'])
        # Access blob objects
        for (path, stage), entry in index.entries.items():
            pass
        new_file_path = os.path.join(repo.working_tree_dir, 'new-file-name')
        open(new_file_path, 'w').close()
        index.add([new_file_path])                                             # add a new file to the index
        index.remove(['LICENSE'])                                              # remove an existing one
        assert os.path.isfile(os.path.join(repo.working_tree_dir, 'LICENSE'))  # working tree is untouched

        assert index.commit("my commit message").type == 'commit'              # commit changed index
        repo.active_branch.commit = repo.commit('HEAD~1')                      # forget last commit

        from git import Actor
        author = Actor("An author", "author@example.com")
        committer = Actor("A committer", "committer@example.com")
        # commit by commit message and author and committer
        index.commit("my commit message", author=author, committer=committer)

Create new indices from other trees or as result of a merge. Write that result to a new index file for later inspection.

        from git import IndexFile
        # loads a tree into a temporary index, which exists just in memory
        IndexFile.from_tree(repo, 'HEAD~1')
        # merge two trees three-way into memory
        merge_index = IndexFile.from_tree(repo, 'HEAD~10', 'HEAD', repo.merge_base('HEAD~10', 'HEAD'))
        # and persist it
        merge_index.write(os.path.join(rw_dir, 'merged_index'))

Handling Remotes¶

Remotes are used as alias for a foreign repository to ease pushing to and fetching from them

        empty_repo = git.Repo.init(os.path.join(rw_dir, 'empty'))
        origin = empty_repo.create_remote('origin', repo.remotes.origin.url)
        assert origin.exists()
        assert origin == empty_repo.remotes.origin == empty_repo.remotes['origin']
        origin.fetch()                  # assure we actually have data. fetch() returns useful information
        # Setup a local tracking branch of a remote branch
        empty_repo.create_head('master', origin.refs.master).set_tracking_branch(origin.refs.master)
        origin.rename('new_origin')   # rename remotes
        # push and pull behaves similarly to `git push|pull`
        origin.pull()
        origin.push()
        # assert not empty_repo.delete_remote(origin).exists()     # create and delete remotes

You can easily access configuration information for a remote by accessing options as if they where attributes. The modification of remote configuration is more explicit though.

        assert origin.url == repo.remotes.origin.url
        cw = origin.config_writer
        cw.set("pushurl", "other_url")
        cw.release()

        # Please note that in python 2, writing origin.config_writer.set(...) is totally safe.
        # In py3 __del__ calls can be delayed, thus not writing changes in time.

You can also specify per-call custom environments using a new context manager on the Git command, e.g. for using a specific SSH key. The following example works with git starting at v2.3:

ssh_cmd = 'ssh -i id_deployment_key'
with repo.git.custom_environment(GIT_SSH_COMMAND=ssh_cmd):
    repo.remotes.origin.fetch()

This one sets a custom script to be executed in place of ssh, and can be used in git prior to v2.3:

ssh_executable = os.path.join(rw_dir, 'my_ssh_executable.sh')
with repo.git.custom_environment(GIT_SSH=ssh_executable):
    repo.remotes.origin.fetch()

Here’s an example executable that can be used in place of the ssh_executable above:

#!/bin/sh
ID_RSA=/var/lib/openshift/5562b947ecdd5ce939000038/app-deployments/id_rsa
exec /usr/bin/ssh -o StrictHostKeyChecking=no -i $ID_RSA "$@"

Please note that the script must be executable (i.e. chomd +x script.sh). StrictHostKeyChecking=no is used to avoid prompts asking to save the hosts key to ~/.ssh/known_hosts, which happens in case you run this as daemon.

You might also have a look at Git.update_environment(...) in case you want to setup a changed environment more permanently.

Submodule Handling¶

Submodules can be conveniently handled using the methods provided by GitPython, and as an added benefit, GitPython provides functionality which behave smarter and less error prone than its original c-git implementation, that is GitPython tries hard to keep your repository consistent when updating submodules recursively or adjusting the existing configuration.

        repo = self.rorepo
        sms = repo.submodules

        assert len(sms) == 1
        sm = sms[0]
        assert sm.name == 'gitdb'                         # git-python has gitdb as single submodule ...
        assert sm.children()[0].name == 'smmap'           # ... which has smmap as single submodule

        # The module is the repository referenced by the submodule
        assert sm.module_exists()                         # the module is available, which doesn't have to be the case.
        assert sm.module().working_tree_dir.endswith('gitdb')
        # the submodule's absolute path is the module's path
        assert sm.abspath == sm.module().working_tree_dir
        assert len(sm.hexsha) == 40                       # Its sha defines the commit to checkout
        assert sm.exists()                                # yes, this submodule is valid and exists
        # read its configuration conveniently
        assert sm.config_reader().get_value('path') == sm.path
        assert len(sm.children()) == 1                    # query the submodule hierarchy

In addition to the query functionality, you can move the submodule’s repository to a different path <move(...)>, write its configuration <config_writer().set_value(...).release()>, update its working tree <update(...)>, and remove or add them <remove(...), add(...)>.

If you obtained your submodule object by traversing a tree object which is not rooted at the head’s commit, you have to inform the submodule about its actual commit to retrieve the data from by using the set_parent_commit(...) method.

The special RootModule type allows you to treat your master repository as root of a hierarchy of submodules, which allows very convenient submodule handling. Its update(...) method is reimplemented to provide an advanced way of updating submodules as they change their values over time. The update method will track changes and make sure your working tree and submodule checkouts stay consistent, which is very useful in case submodules get deleted or added to name just two of the handled cases.

Additionally, GitPython adds functionality to track a specific branch, instead of just a commit. Supported by customized update methods, you are able to automatically update submodules to the latest revision available in the remote repository, as well as to keep track of changes and movements of these submodules. To use it, set the name of the branch you want to track to the submodule.$name.branch option of the .gitmodules file, and use GitPython update methods on the resulting repository with the to_latest_revision parameter turned on. In the latter case, the sha of your submodule will be ignored, instead a local tracking branch will be updated to the respective remote branch automatically, provided there are no local changes. The resulting behaviour is much like the one of svn::externals, which can be useful in times.

Obtaining Diff Information¶

Diffs can generally be obtained by subclasses of Diffable as they provide the diff method. This operation yields a DiffIndex allowing you to easily access diff information about paths.

Diffs can be made between the Index and Trees, Index and the working tree, trees and trees as well as trees and the working copy. If commits are involved, their tree will be used implicitly.

        hcommit = repo.head.commit
        hcommit.diff()                  # diff tree against index
        hcommit.diff('HEAD~1')          # diff tree against previous tree
        hcommit.diff(None)              # diff tree against working tree

        index = repo.index
        index.diff()                    # diff index against itself yielding empty diff
        index.diff(None)                # diff index against working copy
        index.diff('HEAD')              # diff index against current HEAD tree

The item returned is a DiffIndex which is essentially a list of Diff objects. It provides additional filtering to ease finding what you might be looking for.

        # Traverse added Diff objects only
        for diff_added in hcommit.diff('HEAD~1').iter_change_type('A'):
            print(diff_added)

Use the diff framework if you want to implement git-status like functionality.

A diff between the index and the commit’s tree your HEAD points to

use repo.index.diff(repo.head.commit)

A diff between the index and the working tree

use repo.index.diff(None)

A list of untracked files

use repo.untracked_files

Switching Branches¶

To switch between branches similar to git checkout, you effectively need to point your HEAD symbolic reference to the new branch and reset your index and working copy to match. A simple manual way to do it is the following one

        # Reset our working tree 10 commits into the past
        past_branch = repo.create_head('past_branch', 'HEAD~10')
        repo.head.reference = past_branch
        assert not repo.head.is_detached
        # reset the index and working tree to match the pointed-to commit
        repo.head.reset(index=True, working_tree=True)

        # To detach your head, you have to point to a commit directy
        repo.head.reference = repo.commit('HEAD~5')
        assert repo.head.is_detached
        # now our head points 15 commits into the past, whereas the working tree
        # and index are 10 commits in the past

The previous approach would brutally overwrite the user’s changes in the working copy and index though and is less sophisticated than a git-checkout. The latter will generally prevent you from destroying your work. Use the safer approach as follows.

        # checkout the branch using git-checkout. It will fail as the working tree appears dirty
        self.failUnlessRaises(git.GitCommandError, repo.heads.master.checkout)
        repo.heads.past_branch.checkout()

Initializing a repository¶

In this example, we will initialize an empty repository, add an empty file to the index, and commit the change.

        import git

        repo_dir = os.path.join(rw_dir, 'my-new-repo')
        file_name = os.path.join(repo_dir, 'new-file')

        r = git.Repo.init(repo_dir)
        # This function just creates an empty file ...
        open(file_name, 'wb').close()
        r.index.add([file_name])
        r.index.commit("initial commit")

Please have a look at the individual methods as they usually support a vast amount of arguments to customize their behavior.

Using git directly¶

In case you are missing functionality as it has not been wrapped, you may conveniently use the git command directly. It is owned by each repository instance.

        git = repo.git
        git.checkout('HEAD', b="my_new_branch")         # create a new branch
        git.branch('another-new-one')
        git.branch('-D', 'another-new-one')             # pass strings for full control over argument order
        git.for_each_ref()                              # '-' becomes '_' when calling it

The return value will by default be a string of the standard output channel produced by the command.

Keyword arguments translate to short and long keyword arguments on the command-line. The special notion git.command(flag=True) will create a flag without value like command --flag.

If None is found in the arguments, it will be dropped silently. Lists and tuples passed as arguments will be unpacked recursively to individual arguments. Objects are converted to strings using the str(...) function.

Object Databases¶

git.Repo instances are powered by its object database instance which will be used when extracting any data, or when writing new objects.

The type of the database determines certain performance characteristics, such as the quantity of objects that can be read per second, the resource usage when reading large data files, as well as the average memory footprint of your application.

GitDB¶

The GitDB is a pure-python implementation of the git object database. It is the default database to use in GitPython 0.3. Its uses less memory when handling huge files, but will be 2 to 5 times slower when extracting large quantities small of objects from densely packed repositories:

repo = Repo("path/to/repo", odbt=GitDB)

GitCmdObjectDB¶

The git command database uses persistent git-cat-file instances to read repository information. These operate very fast under all conditions, but will consume additional memory for the process itself. When extracting large files, memory usage will be much higher than the one of the GitDB:

repo = Repo("path/to/repo", odbt=GitCmdObjectDB)

Git Command Debugging and Customization¶

Using environment variables, you can further adjust the behaviour of the git command.

GIT_PYTHON_TRACE

If set to non-0, all executed git commands will be logged using a python logger.

if set to full, the executed git command and its output on stdout and stderr will be logged using a python logger.

GIT_PYTHON_GIT_EXECUTABLE

If set, it should contain the full path to the git executable, e.g. c:\Program Files (x86)\Git\bin\git.exe on windows or /usr/bin/git on linux.

And even more ...¶

There is more functionality in there, like the ability to archive repositories, get stats and logs, blame, and probably a few other things that were not mentioned here.

Check the unit tests for an in-depth introduction on how each function is supposed to be used.