Git for Hackers - Part 1, A Basic Understanding

Continuing along the "Tools for Hackers" theme, today we'll be looking at Git.

Git is an often misused and ill-understood tool. Many developers who use it daily barely understand what they're doing and just parrot a few commands, and hackers who only use it periodically tend to fare even worse.

In these guides we're going to take a look at Git and go over some useful tips and tricks for hackers. We'll skip most of the collaboration intricacies, since we're usually just cloning a repository to use a tool or pushing to our own repositories that few others can commit to, and focus on the Version Control System (VCS) aspects.

In Part 1 we're going to get to grips with the fundamentals of how Git works, as understanding this will make us a real power user - quick, efficient and able to get out of any trouble we land ourselves in.

So What is Git?

Git is a fast, lightweight and open source distributed version control system. A Version Control System, or VCS, is a tool that allows you to manage changes to files by recording a new version each time you change them. You can then switch back to an earlier version of the files if you want to backtrack, or share the versions with other people.

The distributed part doesn't really matter to us, but it essentially means that every repository holds a full copy of the contents and history, rather than the history living only on a server somewhere.

The Fundamentals

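In Git you make changes to your content, and then commit these changes. Each commit records a snapshot of your files at that point, plus a reference to the commit that came before it (its parent), so commits are chained. Files that haven't changed are shared between snapshots rather than copied, which keeps this cheap. Commits are identified by their hash, but have other metadata such as an author, message, timestamp etc.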
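
To make "identified by their hash" concrete, here is a minimal Python sketch of how Git derives an object ID, assuming the default SHA-1 object format (newer Git versions can also be set up to use SHA-256). The function names and the author identity are made up for illustration; only the "type size NUL body" layout and the commit field order come from Git itself.

```python
import hashlib
from typing import Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 ID for an object of the given type."""
    # Git hashes a short header ("<type> <size>" plus a NUL byte) and then the body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: Optional[str], message: str) -> bytes:
    """Build the text of a commit object (hypothetical fixed identity and time)."""
    ident = "Alice <alice@example.com> 0 +0000"  # made up for this sketch
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # the link that chains commits together
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


# A tree with no entries, used here as the snapshot for every commit.
empty_tree = git_object_id("tree", b"")

first = git_object_id("commit", commit_body(empty_tree, None, "Initial commit"))
second = git_object_id("commit", commit_body(empty_tree, first, "Second commit"))
# Same snapshot and message, but no parent: the ID comes out different.
orphan = git_object_id("commit", commit_body(empty_tree, None, "Second commit"))

print(git_object_id("blob", b"hello\n"))
print(first, second, orphan, sep="\n")
```

The blob ID printed for "hello" plus a newline should match what git hash-object reports for a file with the same contents. Because the parent's ID is part of the text being hashed, changing anything earlier in the chain changes every ID after it.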

The chain of commits forms a branch, with master being the (you guessed it) master branch that you start with by default (some setups name it main instead). The branch name is actually just a pointer to the commit at the head of the chain, so moving this around (if you want to backtrack for example) is fast and easy. You can move up and down this chain, reverting the state of things to earlier commits if you don't like the changes you've made, or if things "suddenly stop compiling".
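
The "branch is just a pointer" idea can be sketched in a few lines of Python. This is a toy model, not how Git stores anything on disk: the short commit IDs and the dictionary layout are made up for illustration.

```python
# Each commit records its parent; None marks the very first commit.
# The short IDs are hypothetical stand-ins for real hashes.
parents = {"a1": None, "b2": "a1", "c3": "b2"}

# A branch is nothing more than a name mapped to the commit at the tip.
branches = {"master": "c3"}


def history(tip):
    """Walk from a commit back to the root, newest first."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = parents[tip]
    return chain


print(history(branches["master"]))  # ['c3', 'b2', 'a1']

# Backtracking is a single pointer update; no commit is touched.
branches["master"] = "b2"
print(history(branches["master"]))  # ['b2', 'a1']

# The abandoned commit still exists until something cleans it up.
print("c3" in parents)  # True
```

Moving the pointer costs the same no matter how big the repository is, which is why backtracking feels instant.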

[Figure: Git branches]

Your current working commit is referenced by another pointer, called HEAD. So when you're on a branch, both the branch name and HEAD will point to the last commit on that branch. If you have a cool idea you want to try out without interfering with your main codeline you can create a different branch (such as testing in the figure above). Commits on this branch are not added to master, but they still link back to the commit you branched from. If the work on master continues, then the branches will diverge, as shown below.

[Figure: Diverged branches]

You can quickly and easily switch between the two (or more!) branches, as all you are really doing when you switch is moving HEAD to the other branch, and Git updates the files in your working directory to match.
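
Here is a small Python sketch of HEAD, switching and diverging branches. It is a toy model under stated assumptions: the MiniRepo class, its method names and the c1, c2... commit IDs are all made up, and it tracks only pointers, not file contents. In a real repository HEAD normally holds the name of the current branch rather than a commit ID, which is what the model mimics.

```python
class MiniRepo:
    """Toy model of branch pointers and HEAD (hypothetical, not Git's API)."""

    def __init__(self):
        self.parents = {}                  # commit id -> parent id (None for root)
        self.branches = {"master": None}   # branch name -> tip commit id
        self.head = "master"               # HEAD names the current branch
        self._count = 0

    def commit(self):
        """Create a commit on the current branch and advance that branch."""
        self._count += 1
        cid = f"c{self._count}"
        self.parents[cid] = self.branches[self.head]
        self.branches[self.head] = cid
        return cid

    def branch(self, name):
        """Start a new branch at the current commit."""
        self.branches[name] = self.branches[self.head]

    def switch(self, name):
        """Point HEAD at another branch; nothing else changes."""
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name

    def history(self, name):
        """List the commits reachable from a branch, newest first."""
        chain, tip = [], self.branches[name]
        while tip is not None:
            chain.append(tip)
            tip = self.parents[tip]
        return chain

    def merge_base(self, a, b):
        """Find the most recent commit the two branches share."""
        seen = set(self.history(a))
        for cid in self.history(b):
            if cid in seen:
                return cid
        return None


repo = MiniRepo()
repo.commit()            # c1 on master
repo.commit()            # c2 on master
repo.branch("testing")
repo.switch("testing")
repo.commit()            # c3 on testing only
repo.switch("master")
repo.commit()            # c4 on master only: the branches have diverged

print(repo.history("master"))                 # ['c4', 'c2', 'c1']
print(repo.history("testing"))                # ['c3', 'c2', 'c1']
print(repo.merge_base("master", "testing"))   # c2
```

The shared commit c2 is where the two lines of work split, and it is the point a later merge would work from.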

Later on you can merge branches if you like the changes you've made, or just abandon them and delete the branch if it bears no fruit (see what I did there? :D). All this makes Git super flexible while still fast, lightweight and easy to use.

Remote Repositories

Keeping a local copy of the repository is great, but what happens if we want to back up our contents on a remote server, or if we want to share it?

Git handles this using remotes. When you create a repository you can point it at a remote copy of the repository. You can then push data to, and pull data from, that remote.

If you decide to clone an existing remote repository then you create a full local copy of that repository, with the entire history and all the contents. Git compresses what it stores, so this is usually still lightweight and quick. The remote repository you cloned from will automatically be added as the default remote repository (called origin) in your local copy, but you can add other remotes if desired.
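
As a rough mental model, cloning and pushing can be sketched in Python like this. Everything here is a simplification for illustration: the dictionary layout, the clone and push helpers and the short commit IDs are made up, and the push does none of the safety checks a real one performs.

```python
def clone(remote):
    """Copy the whole history and remember where it came from."""
    return {
        "objects": dict(remote["objects"]),     # every commit, not just the tip
        "branches": dict(remote["branches"]),
        "remotes": {"origin": remote},           # added automatically on clone
        # Local bookmarks of where the remote's branches were last seen.
        "tracking": {f"origin/{name}": tip
                     for name, tip in remote["branches"].items()},
    }


def push(local, remote_name, branch):
    """Send missing commits to the remote and move its branch pointer."""
    remote = local["remotes"][remote_name]
    for cid, parent in local["objects"].items():
        remote["objects"].setdefault(cid, parent)
    remote["branches"][branch] = local["branches"][branch]
    local["tracking"][f"{remote_name}/{branch}"] = local["branches"][branch]


# "objects" maps commit id -> parent id; the IDs are hypothetical.
server = {"objects": {"a1": None, "b2": "a1"}, "branches": {"master": "b2"}}

laptop = clone(server)

# Commit locally: the server knows nothing about it yet.
laptop["objects"]["c3"] = "b2"
laptop["branches"]["master"] = "c3"
print(server["branches"]["master"])   # b2

push(laptop, "origin", "master")
print(server["branches"]["master"])   # c3
```

Local commits stay local until you push them, which is why a clone keeps working with no network connection at all.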

Binary Files

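It is worth noting that Git handles binary files less gracefully than text. It can't show you a meaningful diff for them, and its compression is far less effective on formats that are already compressed (images, archives and the like), so each changed version tends to add close to its full size to the history. It still stores them as efficiently as it can, but if you intend to make frequent changes to binary files in your repository then the repository will start to get bigger and slower.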
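
A quick Python experiment shows why this matters. Git compresses the objects it stores with zlib, so the sketch below compresses two inputs of equal size in the same way. The random bytes are only a stand-in for an already-compressed binary such as an image or archive (an assumption made for this demo), and the repeated source line is a made-up sample of text.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# A made-up, highly repetitive chunk of "source code".
text = b"print('hello from the exploit dev box')\n" * 2000

# Random bytes standing in for an already-compressed binary file.
binary = bytes(rng.getrandbits(8) for _ in range(len(text)))


def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data)) / len(data)


print(f"text:   {ratio(text):.3f}")
print(f"binary: {ratio(binary):.3f}")
```

The text should shrink to a small fraction of its size while the binary stand-in barely shrinks at all, so every new version of a file like that costs roughly its full size again.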

Summary

We've had a quick look at Git and how it works. We know you can commit versions of files and backtrack to any commit in their history. You can spin off branches if you want to experiment without affecting the master branch, then merge them in later or abandon them, and you can share or back up your repository to a remote server.

Understanding these fundamentals will greatly benefit you if you use Git. Anyone can copy commands from the internet, but as soon as something goes wrong or you want to do something a little more advanced, you'll find yourself in a much better place if you understand what's going on.

Next time we'll take a look at actually using Git, as well as some tips, tricks and useful configuration pointers to turn you into a fabled Git Guru.
