I heard about Git a number of years ago, and its continued popularity has spurred me to learn more about it. If you are not already aware, Git is one of the most widely used revision control systems (also known as version control systems) today. As you can see from the list on this Wikipedia page, there are dozens of version control systems to choose from. Some of the more popular ones you may have heard of include CVS, Subversion, Mercurial and Bazaar. But what exactly is a version control system, and why use one?
What is a Revision Control System?
A revision control system is a way to manage, control and track the history of changes made to documents, computer programs, large web sites, and other collections of information. If you do any sort of file editing, you will learn how important version control can be. Once exposed to it, you will realize how invaluable a tool it is for keeping track of how your documents or software change over time. A document tracked within a revision control system has a timeline of changes, allowing one to revert a document or set of documents back to a specific point in the past. I found that the video provides a helpful introduction to version control if you are a beginner.
In the past I have read a lot of good press about Git, but I have also heard some criticism levelled at it. I also saw its adoption rise with the emergence of GitHub as a popular place for developers to collaborate on projects using Git as a revision control system. I learned it could be quite complex to learn or master, but its particularly good collaborative and distributed features have obviously catapulted it to prominence. I was always curious about Git but reluctant to wade in and learn it properly until now.
It has been obvious to me for quite some time that Git is influential and important in the developer community. Nowadays one only has to look at the number of projects hosted on GitHub to realize how ubiquitous Git is. It seems safe to assume that Git is entrenched and here to stay for some time to come. So one of my goals this year is to become more knowledgeable and proficient in using Git. To further that goal I signed up for a GitHub account the other day with the intention of kick-starting my enthusiasm for learning it.
Who Created Git?
I always look to Wikipedia first to get an overview of a topic I am researching, and in this case the Git Wikipedia page has lots of useful information. The first thing to note about Git is who created it: Linus Torvalds, the creator and maintainer of the now famous Linux kernel. In 2005 he designed and developed Git for use in Linux kernel development and licensed it as free software distributed under the terms of the GNU General Public License version 2. It is probably no coincidence that the popularity of Linux has a lot to do with why Git is so popular today.
Distributed Vs Centralized
With a little further research on Wikipedia, the distinction between distributed and centralized version control systems became clearer. Git is one of several distributed version control systems, alongside Mercurial and Bazaar, and it is very different from centralized ones like CVS or Subversion. Understanding Git therefore requires knowing the important differences between distributed and centralized version control systems.
The first thing I noted was that a distributed revision control system like Git takes a peer-to-peer approach, whereas a centralized version control system takes a client-server approach. In a distributed system each peer’s working copy of the codebase is a complete repository, and repositories are synchronized by exchanging patches (sets of changes) from peer to peer. In the centralized model, by contrast, there is only one central repository with which clients synchronize.
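To make the "every working copy is a complete repository" idea concrete, here is a minimal shell sketch. It assumes git is installed; the directory names (original, peer), the file notes.txt and the user identity are placeholders I made up for illustration.

```shell
#!/bin/sh
# Sketch: a clone carries the full history, not just the latest files.
# All names here (original, peer, notes.txt, the identity) are made up.
set -e
work=$(mktemp -d)
cd "$work"

# Create a repository with two commits.
git init -q original
cd original
git config user.name "Example User"
git config user.email "user@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "second line" >> notes.txt
git commit -q -am "second commit"
cd ..

# Clone it as a second "peer". No server is involved, only the file system.
git clone -q original peer

# Count the commits reachable from HEAD in each repository.
orig_count=$(git -C original rev-list --count HEAD)
peer_count=$(git -C peer rev-list --count HEAD)
echo "original: $orig_count commits, peer: $peer_count commits"
```

Both counts should match, since the clone copies the history along with the files.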
Some of the other more notable differences of a distributed version control system I gleaned from the Wikipedia page on distributed version control included the following characteristics:
- No canonical, reference copy of the codebase exists by default; only working copies.
- Common operations (such as commits, viewing history, and reverting changes) are fast, because there is no need to communicate with a central server.
- Each working copy effectively functions as a remote backup of the codebase and of its change-history, protecting against data loss.
- Multiple “central” repositories.
- Code from disparate repositories is merged based on a web of trust, i.e., historical merit or quality of changes.
- Numerous different development models are possible, such as development / release branches or a Commander / Lieutenant model, allowing for efficient delegation of topical developments in very large projects. Lieutenants are project members who have the power to dynamically decide which branches to merge.
- Network is not involved for common operations.
- A separate set of “sync” operations is available for committing or receiving changes with remote repositories.
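The split between local operations and separate "sync" operations can be sketched in a few shell commands. This is only an illustration under my own assumptions: the repositories alice and bob, the file name and the identity are invented, and both repositories live on one machine so that no network is needed.

```shell
#!/bin/sh
# Sketch: commits are local; synchronizing is a separate, explicit step.
# The names alice, bob and file.txt are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Alice creates a repository with one commit.
git init -q alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first commit"
cd ..

# Bob clones Alice's repository and gets his own complete copy.
git clone -q alice bob

# Alice commits again. This touches only her repository.
cd alice
echo "v2" >> file.txt
git commit -q -am "second commit"
cd ..

# Bob does not see the new commit until he explicitly syncs.
before=$(git -C bob rev-list --count HEAD)
git -C bob pull -q --ff-only
after=$(git -C bob rev-list --count HEAD)
echo "bob had $before commit(s) before pulling and $after after"
```

The commit and the pull are distinct steps: Alice's second commit exists only in her repository until Bob asks for it.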
With those differences listed, one may wonder what advantages such a system provides. The same Wikipedia page on distributed version control lists a number of benefits one can obtain from such a system over a centralized model:
- Allows users to work productively when not connected to a network.
- Makes most operations much faster.
- Allows participation in projects without requiring permissions from project authorities, and thus arguably better fosters a culture of meritocracy.
- Allows private work, so users can use their changes even for early drafts they do not want to publish.
- Avoids relying on one physical machine as a single point of failure.
- Permits centralized control of the “release version” of the project.
- On FLOSS software projects it is much easier to create a project fork from a project that is stalled because of leadership conflicts or design disagreements.
The only significant disadvantage of note with a distributed system may arise when access speed is low and the project is very large: the initial cloning of a repository is usually slower than a centralized checkout, because all branches and revision history are copied. With this knowledge in hand, one can then learn what Git really is.
What is Git?
Git has several characteristics that make it unique. Because it was designed by Linus Torvalds, it incorporates his experience of developing a large distributed project like Linux. His design and implementation choices for Git resulted in the following characteristics:
- Strong support for non-linear development
- Distributed development
- Compatibility with existing systems/protocols
- Efficient handling of large projects
- Cryptographic authentication of history
- Toolkit-based design
- Pluggable merge strategies
- Garbage accumulates unless collected
- Periodic explicit object packing
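The last two items, garbage collection and explicit packing, are easier to picture with a small experiment. The sketch below is my own illustration, not something taken from the Wikipedia page: it uses the real git count-objects and git gc commands, while the file name, the identity and the count_field helper are made up for the example.

```shell
#!/bin/sh
# Sketch: new objects start out "loose"; `git gc` packs them.
# data.txt, the identity and count_field are hypothetical names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Two commits create several loose objects (commits, trees, blobs).
echo "one" > data.txt
git add data.txt
git commit -q -m "first commit"
echo "two" >> data.txt
git commit -q -am "second commit"

# Helper defined here: read one field from `git count-objects -v`.
count_field() {
    git count-objects -v | awk -v key="$1:" '$1 == key { print $2 }'
}

loose_before=$(count_field count)

# Explicitly collect garbage and pack the objects.
git gc --quiet

loose_after=$(count_field count)
packed_after=$(count_field in-pack)
echo "loose objects: $loose_before before gc, $loose_after after"
echo "objects in packs after gc: $packed_after"
```

In a tiny repository like this the effect is only cosmetic, but it shows that packing is a step one runs (or Git triggers) rather than something that happens on every commit.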
I will let you read the description about each characteristic on the Git Wikipedia page for yourself. If you want a more practical understanding of what Git is, the following screencasts (embedded below) do a really good job of introducing and explaining what git is and how it is used. I will let the videos speak for themselves.
My understanding of Git at this point is pretty basic and rudimentary. To start using Git to keep track of files that you create, you need to initialize a repository with the git init command. Then you add the files you want tracked to the staging area of the repository with the git add command. Finally, to record the staged files in the repository, you commit them with the git commit command. The example below should clarify things.
$ git init
$ git add *.c
$ git add README
$ git commit -m 'initial project version'
Obviously, this only scratches the surface. But in a nutshell that is how Git and its commands work from the command line.
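For anyone who wants to try the init, add and commit steps end to end, here is a self-contained sketch that also checks the state between steps with git status and git log. It runs in a throwaway directory; the two files and the user identity are placeholders I invented so that the commands have something to work on.

```shell
#!/bin/sh
# Sketch: init -> add -> commit, inspecting the state along the way.
# main.c, README and the identity are placeholder content.
set -e
project=$(mktemp -d)
cd "$project"
printf 'int main(void) { return 0; }\n' > main.c
echo "Demo project" > README

git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Before staging: both files are listed as untracked.
git status --short

# Stage the files, then look again: both are now marked as added.
git add *.c
git add README
git status --short

# Record the staged files in the repository.
git commit -q -m 'initial project version'

# The history now contains the commit, and the working tree is clean.
git log --oneline
tracked=$(git ls-files | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "$tracked tracked files in $commits commit"
```

Running git status between the steps is a good habit while learning, since it shows what each command changed.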
Which version control system is better?
The next question is which version control system is better. The answer is that it depends, because each system solves a different problem. The following answer to a question on Stack Overflow comparing distributed version control with centralized version control sums it up nicely:
"Distributed version control systems (DVCSs) solve different problems than Centralized VCSs. Comparing them is like comparing hammers and screwdrivers. Centralized VCS systems are designed with the intent that there is One True Source that is Blessed, and therefore Good. All developers work (checkout) from that source, and then add (commit) their changes, which then become similarly Blessed. The only real difference between CVS, Subversion, ClearCase, Perforce, VisualSourceSafe and all the other CVCSes is in the workflow, performance, and integration that each product offers. Distributed VCS systems are designed with the intent that one repository is as good as any other, and that merges from one repository to another are just another form of communication. Any semantic value as to which repository should be trusted is imposed from the outside by process, not by the software itself. The real choice between using one type or the other is organizational -- if your project or organization wants centralized control, then a DVCS is a non-starter. If your developers are expected to work all over the country/world, without secure broadband connections to a central repository, then DVCS is probably your salvation. If you need both, you're fsck'd."
In the end I suppose there is no right or wrong answer as to which type of version control one chooses; everyone is free to use the tool that makes the most sense under the circumstances. However, it seems Git is currently prevalent in the software world because of the distributed nature of the Internet. I have therefore made a conscious decision to invest time and energy in learning Git to fill my knowledge gap. I will probably evaluate other alternatives eventually. My journey with Git is just starting.
Resources To Get Started Learning Git
After watching those introductory videos on Git I know there is a whole lot more to learn. Obviously the Git documentation itself, in the form of the reference manual, is invaluable. However, a book is more straightforward for most people, and the excellent free book by Scott Chacon called Pro Git fits the bill. I am definitely putting it on my reading list to help take my understanding of Git to another level more quickly. Also, in researching this post I found a very nice guide on Stack Overflow entitled Git for Beginners: The Definitive Practical Guide, which is packed with useful practical information. I also found this link pointing to the Top 10 Git Tutorials for Beginners, which looks very useful for a beginner like myself. I will certainly need to attempt a few tutorials to get my feet wet and practice incorporating Git into my everyday workflow. With that in mind, the following link with webinars explaining a simple Git workflow should help. Lastly, I always find a cheatsheet useful when learning new commands and syntax, and this Git cheatsheet should come in handy for the times my memory fails me, which is quite often.
Update, January 11, 2014:
I found a few more Git resources that I want to add to this page for completeness. The first couple are free books: one is called Git from the Bottom Up by John Wiegley and the other is called Git In The Trenches by Peter Savage. The rest are links I still need to explore, which I found on the sidebar of the Git subreddit. I will just list them below: