Git is one of those tools that is absolutely essential to anyone who codes.
But if you are like me when I first started learning, you may feel a bit overwhelmed by what it is
and how to use it effectively.
In this post I go over some of the major selling points of
git and a simple workflow commonly used in the workplace.
What Is Git?
If you visit git’s official website, you’ll see a definition of what the product is (quoted below).
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Let’s break down the main features mentioned.
Branching means keeping a history of code snapshots that is distinct from other possible histories of snapshots. Each snapshot is connected to its previous (parent) snapshots, so the ‘current’ snapshot on your branch is the aggregation of all historical snapshots of that branch.
This sounds very confusing, so consider the following visual.
Each blue arrow in figure 1 represents a branch: master, develop, and topic. The most current snapshot in the master branch is C1. In contrast, the most current snapshot in develop is C5. You might notice that according to figure 1, develop also has C2, C3, and C4. These intermediate snapshots are committed changes that lie between the changes of C1 and C5. C1 is the committed change before C2, C2 is the committed change before C3, etc.
Notice that since develop is not the same “branch” as master, master does not have the C2 - C4 changes available in develop. Likewise, develop does not have the C6 and C7 changes available in topic. On the other hand, the opposite relationship holds in the other direction. Since snapshot C2 is tracking C1 on the master branch, develop should have the equivalent of its own historical commits plus the historical commits of master starting on C1. Likewise, C6 and C7 on the topic branch will have the historical commits from develop starting on C5, and from master, since C5 belongs in develop and develop branches off from master’s C1 commit.
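To make figure 1 concrete, here's a minimal sketch that rebuilds the same shape of history in a throwaway directory. The temporary directory, the placeholder identity, and the empty commits named C1 through C7 are my own assumptions for illustration; they are not part of any real project.

```shell
set -e
# Work in a throwaway directory so nothing on your machine is touched
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# C1 lives on master
git commit -q --allow-empty -m "C1"

# develop branches off C1 and adds C2 through C5
git checkout -q -b develop
for c in C2 C3 C4 C5; do
  git commit -q --allow-empty -m "$c"
done

# topic branches off C5 and adds C6 and C7
git checkout -q -b topic
git commit -q --allow-empty -m "C6"
git commit -q --allow-empty -m "C7"

# Count the snapshots reachable from each branch
master_count="$(git rev-list --count master)"
develop_count="$(git rev-list --count develop)"
topic_count="$(git rev-list --count topic)"
echo "master=$master_count develop=$develop_count topic=$topic_count"
```

Each branch carries the history it branched from, so master reports 1 snapshot, develop reports 5, and topic reports 7.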
A staging area represents an intermediate state where you can format and review your code changes before officially committing them.
Suppose that you are working on some code changes that are ready to be put into a shared code repository for your team to use. Before you stage them, your code changes live in your working directory. Now you must choose which files/lines you would like to attach to an official commit. The files/changes you choose to add will be shown in the staging area.
From there, you can do verification checks such as listing all the different files that have been changed that will be added to your newest commit, and looking through the line differences of changes between your new and old commit.
A nice thing about git is that you can pick and choose
which lines and files to bundle into a commit.
Assuming that everything looks good in your staging area, you can go ahead and officially commit your changes within history. You can kind of think of this as the “snapshot” mentioned in Local Branching.
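Here's a small sketch of that working directory, staging area, and commit flow. The file names `ready.txt` and `not_ready.txt`, the temporary directory, and the placeholder identity are hypothetical and only there to show that a commit picks up what you staged and nothing else.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# Two new files appear in the working directory
echo "finished work" > ready.txt
echo "half-done work" > not_ready.txt

# Stage only the file that is ready to be shared
git add ready.txt
staged="$(git diff --cached --name-only)"

# The commit bundles exactly what was staged
git commit -q -m "Add ready.txt"
committed="$(git ls-tree -r --name-only HEAD)"

# The other file stays behind in the working directory, untracked
left_behind="$(git ls-files --others --exclude-standard)"
echo "staged=$staged committed=$committed left_behind=$left_behind"
```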
Git is distributed. This means several things:
- Users of the code will have the entire backup of the code repository on their local machines
- There is no single point of failure with git unless there’s only one copy of the repository
- You can extend git to adapt to whatever workflow you’d like
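As a rough illustration of the first two points, the sketch below clones a small repository and then deletes the original. The directory names `original` and `backup`, the placeholder identity, and the three empty commits are assumptions made up for this example.

```shell
set -e
tmp="$(mktemp -d)"
cd "$tmp"
# Placeholder identity used for every commit in this sketch
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# A repository with three snapshots
git init -q original
git -C original symbolic-ref HEAD refs/heads/master
for c in C1 C2 C3; do
  git -C original commit -q --allow-empty -m "$c"
done

# Cloning copies the entire history, not just the latest snapshot
git clone -q original backup
original_count="$(git -C original rev-list --count HEAD)"
backup_count="$(git -C backup rev-list --count HEAD)"

# Even if the original copy disappears, the clone still has everything
rm -rf original
surviving_count="$(git -C backup rev-list --count HEAD)"
echo "original=$original_count backup=$backup_count surviving=$surviving_count"
```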
Figure 3 shows one of the most common patterns of
git workflow that you will use in the workplace. As a software developer, you
will often find yourself working in teams wherein there will be multiple people working on the same product.
There will be times when you will work on a codebase that someone else is working on at the same time.
So how would we solve conflict issues? How will we allow multiple individuals to work on the same code at the same time?
If you guessed
git, you’d be correct. Since
git is designed in a distributed manner, multiple developers can
change the same codebase at the same time without interfering with one another’s development process. There are caveats
to this during merges to shared repositories, but we will go over this at a later time. When it comes to local development and
integration with others’ changes, it is hard to beat git.
Putting It Together - A Simple Workflow
So now that we’ve gone over some of the main concepts of
git, let’s go over a simple example
when working with a team.
Imagine that you are just joining a project where you will be working with multiple individuals on the same codebase. There are two scenarios that can happen:
- The codebase already has a shared repository (e.g. repository on GitHub)
- The codebase repository has not been created yet
Initializing an Existing Project
The first case is relatively easy to get started with. If the code already exists in a shared repository, the only initialization step you need to do is to pull the entire code history onto your local computer.
# Let's use the popular pandas library as an example
$ git clone https://github.com/pandas-dev/pandas.git
Creating and Initializing a New Project
The second case requires a little bit more setup. Since the code is not available on a shared repository, you need to create one first. We’ll use GitHub as our code platform example.
GitHub hosts git repositories and allows large organizations/groups to develop code together.
# Before we create any project, we need a directory/folder in our file system to represent our new `git` project
$ mkdir my-awesome-git-project
# Set my current working directory to be the newly created git project directory
$ cd my-awesome-git-project
# Initialize a new `git` project on your local machine
$ git init
After completing the previous commands, you should now have a new git project on your local machine.
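A freshly initialized project only lives on your machine, so at some point it has to be pointed at the shared repository. Here's a minimal sketch of that step. It assumes a local bare repository named `shared.git` as a stand-in for the empty repository you would create on GitHub; with a real GitHub repository you would pass its URL to `git remote add` instead.

```shell
set -e
tmp="$(mktemp -d)"
cd "$tmp"

# Stand-in for the empty shared repository (hypothetical name)
git init -q --bare shared.git

# Same steps as above: create and initialize the local project
mkdir my-awesome-git-project
cd my-awesome-git-project
git init -q

# Register the shared repository under the conventional name "origin"
git remote add origin "$tmp/shared.git"

# Confirm where "origin" points
remote_url="$(git remote get-url origin)"
echo "$remote_url"
```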
Making a Change on a Branch
Now that you have your project set up, it’s time to make a code change. I highly recommend making your changes on a separate branch rather than the “master” branch. The master branch is usually reserved for stable production code. Also, since branches are so easy to make, reset, and delete, you should always use them for unstable or experimental code that may not work. If something goes wrong, you can easily scrap your old branch and create a new one.
So - let’s create a new branch.
# Creates a new branch called my-dev-branch and switches to it; the branch starts from the snapshot you were on
# before running the command.
$ git checkout -b my-dev-branch
checkout is a generic command which allows you to switch between versions of different entities. An entity can be branches,
commits, files, etc.
-b is a flag that tells the
checkout command to create a new branch and set the “start point” to the current snapshot.
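If you want to convince yourself of what `checkout -b` did, here's a quick sketch that checks the current branch name and the snapshot the new branch starts from. The temporary directory, placeholder identity, and empty commit are assumptions for illustration. Newer versions of git (2.23 and later) also offer `git switch -c my-dev-branch` for the same job.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial snapshot"

# Create the branch and switch to it in one step
git checkout -q -b my-dev-branch

# Which branch am I on now?
current="$(git rev-parse --abbrev-ref HEAD)"

# The new branch starts at the same snapshot master points to
master_tip="$(git rev-parse master)"
branch_tip="$(git rev-parse my-dev-branch)"
echo "current=$current"
```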
Now that you’re on a separate branch designated for development, make all the changes you want!
Adding Changes to the Staging Area
Now imagine that you’ve made your changes and are relatively happy with them. Now it’s time to stage your changes for review!
I typically split this part into two steps: spot checking the basic code/file changes, and adding the changes to the staging area.
Line Difference Spot Checking
If you’re like me, you probably want a way to spot check your changes before staging them. The easiest way to do this is to compare the changes in your working directory against what is already staged. With nothing staged yet, that is the same as comparing against the previous commit.
# Check the line differences between your unstaged changes and the staging area (the last commit, if nothing is staged yet)
$ git diff
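One thing that tripped me up early on is that a plain `git diff` stops showing a change once it has been staged. The sketch below shows the three variants side by side; the file `notes.txt`, the temporary directory, and the placeholder identity are made up for the example.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Make a change but do not stage it yet
echo "line two" >> notes.txt
unstaged_before="$(git diff --name-only)"

# After staging, plain `git diff` has nothing left to show
git add notes.txt
unstaged_after="$(git diff --name-only)"

# --staged compares the staging area against the last commit
staged_after="$(git diff --staged --name-only)"

# Naming HEAD compares the working directory against the last commit
against_head="$(git diff --name-only HEAD)"
echo "before=$unstaged_before after=$unstaged_after staged=$staged_after head=$against_head"
```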
Add to Staging
If everything in the previous step looks good, you can go ahead and add the changed files into your staging area.
# Adds all the changed files into the staging area
$ git add .
# Adds a specific file called "awesome_file.py" to the staging area
$ git add awesome_file.py
Remove From Staging
If you notice that you accidentally added a change that you don’t want bundled into this commit, you can remove that specific change from the staging area.
# Remove a file called "remove_file.py" from Staging
$ git restore --staged remove_file.py
# Remove all files from Staging
$ git restore --staged .
File Spot Checking
After you’re satisfied with adding/removing files from the staging area, you can do a final comprehensive list check of all the files that will be changed.
# List all the modified entities
$ git status
# On branch master
# Your branch is up to date with 'origin/master'.
# Changes to be committed:
# (use "git restore --staged <file>..." to unstage)
# new file: content/post/basic-git/branch-flowchart.png
# new file: content/post/basic-git/centralized-workflow.png
# new file: content/post/basic-git/featured.png
# new file: content/post/basic-git/git-add-and-commit.png
# new file: content/post/basic-git/index.md
# new file: content/post/basic-git/integ-workflow.png
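When the full listing gets long, `git status --short` gives a more compact view. Here's a small sketch of how to read it; the files `tracked.txt` and `scratch.txt`, the temporary directory, and the placeholder identity are hypothetical.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

# A staged modification to a file git already knows about
echo "v2" >> tracked.txt
git add tracked.txt

# A brand new file that has not been added yet
echo "scratch" > scratch.txt

# One line per file: the first column is the staging area,
# the second column is the working directory
short_status="$(git status --short)"
echo "$short_status"
```

An `M` in the first column marks a staged modification, and `??` marks a file that git is not tracking yet.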
Committing the Staged Changes
Assuming that by now you’re happy with all the changes in the Staging area, you can now bundle everything into a commit.
A git commit is basically a snapshot within your code timeline. It assures that at this point in time, you have the existing code (historical code) that you haven’t touched, along with a new set of changes (in your staging area) that you want to put a timeline stamp on.
# Create a commit with a short message
$ git commit -m "Added some changes..."
# Create a commit by writing the message in your configured editor
# INFO: On many systems this opens vim by default, so some vim knowledge helps
$ git commit
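If you'd like a longer message without opening an editor, one option is to pass `-m` more than once; git treats each one as its own paragraph. A minimal sketch, with a made-up file name, message text, and placeholder identity:

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt

# The first -m becomes the subject line, the second becomes the body
git commit -q -m "Add greeting file" -m "Explains why the change was needed."

# Read the two parts back from the newest commit
subject="$(git log -1 --format=%s)"
body="$(git log -1 --format=%b)"
echo "subject: $subject"
echo "body: $body"
```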
Pushing Your Changes to the Shared Repository
Now that you have your commit, you can push this snapshot into a shared repository. Before you push your changes, however, I always recommend pulling in the latest changes from the remote repository.
git by default will also ask you to do this if it detects that your commit’s parent (previous snapshot) doesn’t match
the most current snapshot in the shared repository.
Why is this the default?
Well - when you work with other developers, there will be cases where another individual finishes their changes faster than you. If they push their changes to the repository first, you will have to merge in their changes with yours and alter your timeline and fix conflicts before pushing your change to the repository.
As seen in the diagram above, your commit on
my-branch will need to incorporate the new changes made by the other developer’s commit.
The easiest way to pull in the most up-to-date changes from the shared repository is using git pull.
# Pull in the most up to date changes in the remote repository
$ git pull
# If you would like to avoid merging with the pull, use fetch
$ git fetch <remote> <branch>
git pull will perform the equivalent of
git fetch +
git merge. If you don’t want to merge the updated changes from the
remote repository, use
git fetch first, and then merge on your own terms.
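Here's a sketch of the fetch-then-merge route, played out with two local clones. The directory names `seed`, `shared.git`, `mine`, and `teammate`, plus the placeholder identity, are assumptions for the example; `shared.git` is a local stand-in for the shared repository.

```shell
set -e
tmp="$(mktemp -d)"
cd "$tmp"
# Placeholder identity used for every commit in this sketch
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Build a stand-in shared repository with one commit on master
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "C1"
git clone -q --bare seed shared.git

# Two developers clone the shared repository
git clone -q shared.git mine
git clone -q shared.git teammate

# The teammate finishes first and pushes a new commit
git -C teammate commit -q --allow-empty -m "Teammate change"
git -C teammate push -q origin master

# fetch downloads the new commit but leaves my branch untouched
git -C mine fetch -q origin
behind_after_fetch="$(git -C mine rev-list --count HEAD..origin/master)"

# merge then brings my branch up to date, on my own terms
git -C mine merge -q origin/master
behind_after_merge="$(git -C mine rev-list --count HEAD..origin/master)"
echo "behind after fetch: $behind_after_fetch, after merge: $behind_after_merge"
```

After the fetch my branch is still one commit behind `origin/master`; only the merge closes the gap.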
Once you’re sure that there will be no conflicts, you can push your commit(s) to your remote repository.
# Push your commit(s) to the shared repository
$ git push
If you hit an error saying that
git doesn’t know what branch to push your changes to, you need to set your local branch
to track a remote branch location.
You can do this by attaching it to your
git push command, or to the branch itself.
# Push to a remote branch and set it as the upstream for future pushes
$ git push -u origin master
# Set your local branch to track a remote branch (in the shared repository)
$ git branch -u origin/master
# Push your changes
$ git push
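To see what tracking looks like once it is set, here's a sketch that pushes a development branch with `-u` and then asks git which remote branch it follows. It uses `my-dev-branch` instead of master purely for illustration, and the directories `seed`, `shared.git`, and `work`, along with the placeholder identity, are assumptions.

```shell
set -e
tmp="$(mktemp -d)"
cd "$tmp"
# Placeholder identity used for every commit in this sketch
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Stand-in shared repository with one commit on master
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "C1"
git clone -q --bare seed shared.git

# Clone it and do some work on a separate branch
git clone -q shared.git work
cd work
git checkout -q -b my-dev-branch
git commit -q --allow-empty -m "Work on my-dev-branch"

# -u records origin/my-dev-branch as the upstream while pushing
git push -q -u origin my-dev-branch

# Ask git which remote branch my-dev-branch now tracks
upstream="$(git rev-parse --abbrev-ref 'my-dev-branch@{upstream}')"
echo "$upstream"
```

Once the upstream is recorded, a bare `git push` from that branch knows where to send your commits.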
Congrats on getting this far! Honestly, this has been an extremely watered-down version of what
git is and what it is capable
of doing. I thought about expanding more upon the commands and examples, but the post was already getting too long.
Perhaps I will write other posts about detailed operations such as merging, rebasing, rewriting history, etc.
These topics, however, are more for advanced
git users. The commands covered in
Putting It Together should get you familiar with some of the most commonly used ones.
Last but not least, getting better is all about practice. The more you use
git, the better at it you become.
A Funny Story 📚
When I was in college, I had at least 2 classes wherein I had to work with a partner on a coding assignment.
Back when I first started learning how to code, I didn’t know about
git and what it could do. Instead of using git
and GitHub for collaboration, my partner and I decided to use Google Drive.
As you can imagine, it was painful. It was painful to work remotely as well as in person. For instance, only one person could work on the code at a time since we didn’t want to accidentally overwrite each other’s code. Not to mention when there were bugs, we had to “trade” files over Google Drive. 😩
Years later I laugh about this incident and use it as a motivator when telling others to learn how to use git.
After all - it could potentially save you lots of headaches and troubles!