How to Use Git

Nov 17, 2022

Git is one of those tools that is absolutely essential to anyone who codes. But if you are like I was when I first started learning, you may feel a bit overwhelmed by what git is and how to use it effectively.

In this post I go over some of the major selling points of git and a simple workflow commonly used in the workplace.


What Is Git?

If you visit git’s official website, you’ll see a definition of what the product is (quoted below).

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Let’s break down the main features mentioned.

Local Branching

A branch is a line of code snapshots over time that is kept separate from other lines of snapshots. Each snapshot is connected to its previous (parent) snapshot, and the ‘current’ snapshot on your branch reflects the accumulated changes of every snapshot before it on that branch.

This sounds very confusing, so consider the following visual.

Figure 1: Git Branching

Each blue arrow in figure 1 represents a branch: master, develop, and topic. The most current snapshot in the master branch is C1. In contrast, the most current snapshot in develop is C5. You might notice that according to figure 1, develop also has C2, C3, and C4. These intermediate snapshots are committed changes that lie between C1 and C5. C1 is the committed change before C2, C2 is the committed change before C3, and so on.

Notice that since develop is not the same “branch” as master, master does not have the C2 - C4 changes available in develop. Likewise, develop does not have the C6 and C7 changes available in topic.

On the other hand, the relationship does hold in the opposite direction, from topic to develop and from develop to master. Since develop’s earliest snapshot C2 has C1 on the master branch as its parent, develop contains its own commits (C2 - C5) plus master’s history up to C1. Likewise, the topic branch contains its own commits C6 and C7, develop’s history up to C5, and master’s history up to C1, since develop branches off from master’s C1 commit.
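
To make figure 1 a bit more concrete, here is a minimal sketch that rebuilds the same shape in a throwaway repository. The temporary directory, the placeholder identity, and the use of empty commits named C1 through C7 are my own assumptions for illustration.

```shell
set -eu
# Throwaway repository in a temporary directory (hypothetical location)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master", as in figure 1
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

# C1 lives on master
git commit -q --allow-empty -m "C1"

# develop starts at C1 and adds C2..C5
git checkout -q -b develop
for n in 2 3 4 5; do git commit -q --allow-empty -m "C$n"; done

# topic starts at C5 and adds C6 and C7
git checkout -q -b topic
for n in 6 7; do git commit -q --allow-empty -m "C$n"; done

# Count the snapshots reachable from each branch
master_count="$(git rev-list --count master)"
develop_count="$(git rev-list --count develop)"
topic_count="$(git rev-list --count topic)"
echo "master=$master_count develop=$develop_count topic=$topic_count"

# Snapshots that topic has but develop does not (newest first)
topic_only="$(git log --format=%s develop..topic)"
echo "$topic_only"
```

The counts grow from master to develop to topic because each branch carries the history of the branch it started from, while the reverse is not true.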

Staging Areas

A staging area represents an intermediate state where you can format and review your code changes before officially committing them.

Figure 2: Add to Staging

Suppose that you are working on some code changes that are ready to be put into a shared code repository for your team to use. Before you stage them, your code changes live in your working directory. Now you must choose which files/lines you would like to include in a commit. The files/changes you choose to add will be shown in the staging area.

From there, you can do verification checks such as listing all the changed files that will be added to your newest commit, and looking through the line differences between your staged changes and the previous commit.

You do not have to stage all of the changed files in your working directory. The beauty of git is that you can pick and choose which lines and files to bundle into a commit.

Assuming that everything looks good in your staging area, you can go ahead and officially commit your changes to history. You can think of this as the “snapshot” mentioned in Local Branching.
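
Here is a small sketch of that pick-and-choose behavior, assuming a throwaway repository with two hypothetical files, ready.txt and draft.txt. Both files change, but only one is staged and committed.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

# First snapshot containing both files
echo "v1" > ready.txt
echo "v1" > draft.txt
git add ready.txt draft.txt
git commit -q -m "Initial snapshot"

# Change both files in the working directory
echo "v2" > ready.txt
echo "v2" > draft.txt

# Stage only the file that is ready
git add ready.txt

staged="$(git diff --cached --name-only)"   # what the next commit will contain
unstaged="$(git diff --name-only)"          # what stays behind in the working directory
echo "staged: $staged"
echo "not staged: $unstaged"

# Commit the staged change and list the files recorded in that commit
git commit -q -m "Update ready.txt"
committed="$(git diff-tree --no-commit-id --name-only -r HEAD)"
echo "committed: $committed"
```

The change to draft.txt is still sitting in the working directory afterwards, ready to go into a later commit.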

Multiple Workflows

Git is distributed. This means several things:

  1. Every user of the code has a full copy of the code repository, including its history, on their local machine
  2. There is no single point of failure with git unless there’s only one copy of the repository
  3. You can extend git to adapt to whatever workflow you’d like
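
As a rough illustration of the first two points, the sketch below clones a small repository and then deletes the original. The directory names and the three empty commits are assumptions made up for the demo.

```shell
set -eu
work="$(mktemp -d)"                       # scratch area (hypothetical location)
git init -q "$work/original"
cd "$work/original"
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
for n in 1 2 3; do git commit -q --allow-empty -m "change $n"; done

# Cloning copies the entire history, not just the latest files
git clone -q "$work/original" "$work/copy"
original_count="$(git -C "$work/original" rev-list --count HEAD)"
copy_count="$(git -C "$work/copy" rev-list --count HEAD)"
echo "original=$original_count copy=$copy_count"

# Losing the original does not lose the history held by the clone
cd "$work"
rm -rf "$work/original"
copy_after="$(git -C "$work/copy" rev-list --count HEAD)"
echo "copy after deleting the original: $copy_after"
```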

Figure 3: Centralized Workflow

Figure 3 shows one of the most common patterns of git workflow that you will use in the workplace. As a software developer, you will often find yourself working in teams wherein there will be multiple people working on the same product. There will be times when you will work on a codebase that someone else is working on at the same time.

So how would we solve conflict issues? How will we allow multiple individuals to work on the same code at the same time?

If you guessed git, you’d be correct. Since git is designed in a distributed manner, multiple developers can change the same codebase at the same time without interfering with one another’s development process. There will be caveats to this during merges to shared repositories, but we will go over this at a later time. When it comes to local development and integration with others’ changes, it is hard to beat git.


Putting It Together - A Simple Workflow

So now that we’ve gone over some of the main concepts of git, let’s go over a simple example when working with a team.

Project Initialization

Imagine that you are just joining a project where you will be working with multiple individuals on the same codebase. There are two scenarios that can happen:

  1. The codebase already has a shared repository (e.g. repository on GitHub)
  2. The codebase repository has not been created yet

Initializing an Existing Project

The first case is relatively easy to get started with. If the code already exists in a shared repository, the only initialization step you need to do is to pull the entire code history onto your local computer.

# Let's use the popular pandas library as an example
$ git clone https://github.com/pandas-dev/pandas.git

Creating and Initializing a New Project

The second case requires a little bit more setup. Since the code is not available on a shared repository, you need to create one first. We’ll also use GitHub as our code platform example.

GitHub is a developer platform that also acts as a code hosting site. It is integrated with git and allows large organizations/groups to develop code together.

# Before we create any project, we need a directory/folder in our file system to represent our new `git` project
$ mkdir my-awesome-git-project
# Set my current working directory to be the newly created git project directory
$ cd my-awesome-git-project
# Initialize a new `git` project on your local machine
$ git init

After completing the previous commands, you should now have a new git project on your local machine.
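
The commands above only create the local side. To connect it to a shared repository, you register that repository as a remote. The sketch below is one way this might look, assuming a local bare repository (shared.git, a made-up name) stands in for the empty repository you would normally create on GitHub.

```shell
set -eu
work="$(mktemp -d)"                       # scratch area (hypothetical location)

# Stand-in for an empty repository created on a hosting site
git init -q --bare "$work/shared.git"

# The local project, created the same way as above
mkdir "$work/my-awesome-git-project"
cd "$work/my-awesome-git-project"
git init -q

# Register the shared repository under the conventional remote name "origin"
git remote add origin "$work/shared.git"

remote_url="$(git remote get-url origin)"
echo "origin -> $remote_url"
```

With a real hosting site you would pass the repository's URL to git remote add instead of a local path.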

Making a Change on a Branch

Now that you have your project set up, it’s time to make a code change. I highly recommend making your custom changes on a separate branch rather than the “master” branch. The master branch is usually reserved for stable production code. Not to mention, since branches are so easy to make, reset, and delete, you should always use them for unstable or experimental code that may not work. If something goes wrong, you can always scrap your old branch and create a new one.

So - let’s create a new branch.

# Creates a new branch called my-dev-branch and switches to it; the branch starts from the snapshot you were on
# before running the command.
$ git checkout -b my-dev-branch

checkout is a general-purpose command that lets you switch between branches or commits, or restore files to an earlier version.

-b is a flag that tells the checkout command to create a new branch and set the “start point” to the current snapshot.

Now that you’re on a separate branch designated for development, make all the changes you want!
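
If you want to see the create-and-scrap cycle end to end, here is a minimal sketch in a throwaway repository. The temporary directory and placeholder identity are assumptions for the demo.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial snapshot"

# Create the development branch and switch to it
git checkout -q -b my-dev-branch
current="$(git rev-parse --abbrev-ref HEAD)"
echo "now on: $current"

# The new branch starts at the same snapshot as master
dev_start="$(git rev-parse my-dev-branch)"
master_tip="$(git rev-parse master)"

# Scrap the branch: switch away from it, then force-delete it
git checkout -q master
git branch -q -D my-dev-branch
remaining="$(git for-each-ref --format='%(refname:short)' refs/heads)"
echo "branches left: $remaining"
```

Note that -D deletes the branch even if its commits were never merged, so it is best kept for branches you really do want to throw away.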

Adding Changes to the Staging Area

Now imagine that you’ve made your changes and are relatively happy with them. It’s time to stage your changes for review!

I typically split this part into two steps: spot checking the code/file changes, and adding the changes to the staging area.

Line Difference Spot Checking

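If you’re like me, you probably want a way to spot check your changes before staging them. The easiest way to do this is to compare the changes in your working directory against what git already has. With nothing staged yet, that is the same as comparing against the previous commit.

# Check the line differences between your unstaged changes and the staging area
# (same as the last commit when nothing is staged; `git diff HEAD` always compares against the last commit)
$ git diff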
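
One detail worth knowing: plain git diff only shows changes that are not staged yet. The sketch below, using a hypothetical notes.txt in a throwaway repository, shows how the three common variants differ once some changes are staged and others are not.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "Initial snapshot"

echo "line 2" >> notes.txt
git add notes.txt                         # "line 2" is now staged
echo "line 3" >> notes.txt                # "line 3" stays unstaged

# --numstat prints "<added> <deleted> <file>"; keep only the added-line count
unstaged_added="$(git diff --numstat | cut -f1)"          # working directory vs. staging area
staged_added="$(git diff --cached --numstat | cut -f1)"   # staging area vs. last commit
total_added="$(git diff --numstat HEAD | cut -f1)"        # working directory vs. last commit
echo "unstaged=$unstaged_added staged=$staged_added total=$total_added"
```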

Add to Staging

If everything in the previous step looks good, you can go ahead and add the changed files into your staging area.

# Adds all new and changed files under the current directory to the staging area
$ git add .

# Adds a specific file called "awesome_file.py" to the staging area
$ git add awesome_file.py

Remove From Staging

If you notice that you accidentally added a change that you don’t want bundled into this commit, you can remove that specific change from the staging area.

# Remove a file called "remove_file.py" from Staging
$ git restore --staged remove_file.py

# Remove all files from Staging
$ git restore --staged .
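
Unstaging does not throw your edits away; it only takes them out of the next commit. Here is a small sketch of that, assuming a throwaway repository and git 2.23 or newer (where git restore is available). The file contents are made up for the demo.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > awesome_file.py
echo "v1" > remove_file.py
git add .
git commit -q -m "Initial snapshot"

# Edit both files and stage everything
echo "v2" > awesome_file.py
echo "v2" > remove_file.py
git add .

# Take remove_file.py back out of the staging area
git restore --staged remove_file.py

staged="$(git diff --cached --name-only)"
content="$(cat remove_file.py)"
echo "still staged: $staged"
echo "remove_file.py still contains: $content"
```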

File Spot Checking

After you’re satisfied with adding/removing files from the staging area, you can do a final comprehensive list check of all the files that will be changed.

# List all the modified entities
$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   content/post/basic-git/branch-flowchart.png
        new file:   content/post/basic-git/centralized-workflow.png
        new file:   content/post/basic-git/featured.png
        new file:   content/post/basic-git/git-add-and-commit.png
        new file:   content/post/basic-git/index.md
        new file:   content/post/basic-git/integ-workflow.png
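
For a more compact view, git status also has a --short mode that prints one line per file. The sketch below sets up three hypothetical files in different states so you can see the two-letter codes: the first column describes the staging area and the second describes the working directory.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Initial snapshot"

echo "v2" > tracked.txt                   # modified but not staged   -> " M"
echo "new" > staged.txt
git add staged.txt                        # new file, staged          -> "A "
echo "tmp" > untracked.txt                # never added to git        -> "??"

summary="$(git status --short)"
echo "$summary"
```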

Committing the Staged Changes

Assuming that by now you’re happy with all the changes in the Staging area, you can now bundle everything into a commit.

A git commit is basically a snapshot within your code timeline. It records that, at this point in time, your project consists of the existing (historical) code that you haven’t touched, along with the new set of changes (from your staging area) that you want to put a timestamp on.

# Create a commit with a short message
$ git commit -m "Added some changes..."

# Create a commit by writing the message in your configured editor
# INFO: On the CLI the editor is often vim by default, so some vim knowledge helps
$ git commit
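
To see how commits chain together into a timeline, here is a short sketch with two commits in a throwaway repository. The file contents and commit messages are made up for the demo.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

# First snapshot
echo "print('hello')" > awesome_file.py
git add awesome_file.py
git commit -q -m "Add awesome_file.py"
first="$(git rev-parse HEAD)"

# Second snapshot, built on top of the first
echo "print('hello, git')" > awesome_file.py
git add awesome_file.py
git commit -q -m "Update greeting"

# HEAD^ is the parent (previous snapshot) of the newest commit
parent="$(git rev-parse HEAD^)"
history="$(git log --format=%s)"          # commit messages, newest first
echo "$history"
```

The parent of the second commit is exactly the first commit, which is the link git checks when you later push to a shared repository.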

Pushing Your Changes to the Shared Repository

Now that you have your commit, you can push this snapshot to a shared repository. Before you push, however, I always recommend pulling in the latest changes from the remote repository.

By default, git will also reject your push and ask you to do this if it detects that your commit’s parent (previous snapshot) doesn’t match the most current snapshot in the shared repository.

Why is this the default?

Well - when you work with other developers, there will be cases where another individual finishes their changes faster than you. If they push their changes to the repository first, you will have to merge their changes into yours, fixing any conflicts along the way, before pushing your change to the repository.

gitGraph
    commit
    commit
    branch my-branch
    branch other-branch
    checkout my-branch
    commit
    checkout other-branch
    commit
    checkout main
    merge other-branch

As seen in the diagram above, your commit on my-branch will need to incorporate the new changes done by the commit from other-branch.

The easiest way to pull in the most up-to-date changes from the shared repository is using git pull.

# Pull in the most up to date changes in the remote repository
$ git pull

# If you would like to avoid merging with the pull, use fetch
$ git fetch <remote> <branch>

By default, git pull performs the equivalent of git fetch + git merge. If you don’t want to merge the updated changes from the remote repository right away, use git fetch first, and then merge on your own terms.
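
Here is a rough end-to-end sketch of that situation. Everything in it is an assumption made up for the demo: a local bare repository (shared.git) plays the shared repository, and two clones named mine and teammate play two developers.

```shell
set -eu
work="$(mktemp -d)"                       # scratch area (hypothetical location)
# Placeholder identity used by every repository in this demo
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Build a shared repository with one commit on master, then clone it twice
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial snapshot"
git clone -q --bare "$work/seed" "$work/shared.git"
git clone -q "$work/shared.git" "$work/mine"
git clone -q "$work/shared.git" "$work/teammate"

# The teammate finishes first and pushes
cd "$work/teammate"
echo "theirs" > their_file.txt
git add their_file.txt
git commit -q -m "Teammate change"
git push -q origin master

# My commit is built on the older snapshot, so my push is rejected
cd "$work/mine"
echo "mine" > my_file.txt
git add my_file.txt
git commit -q -m "My change"
if git push -q origin master 2>/dev/null; then push_result="accepted"; else push_result="rejected"; fi
echo "first push: $push_result"

# Fetch without merging, look at what is incoming, then merge on my own terms
git fetch -q origin
incoming="$(git log --format=%s HEAD..origin/master)"
echo "incoming: $incoming"
git merge -q --no-edit origin/master

# Now my history contains the teammate's commit, so the push goes through
git push -q origin master
```

Running git pull instead of the fetch and merge lines would get you to the same place in one step; splitting them just gives you a chance to look before merging.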

Once your branch includes the latest remote changes and any conflicts are resolved, you can push your commit(s) to your remote repository.

# Push your commit(s) to the shared repository
$ git push

If you hit an error saying that git doesn’t know what branch to push your changes to, you need to set your local branch to track a remote branch.

You can do this by attaching it to your git push command, or to the branch itself.

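# Push the local master branch to the remote named origin and track it (-u);
# substitute your own branch name (e.g. my-dev-branch) if you are working on a separate branch
$ git push -u origin master

# Set your current local branch to track an existing remote branch (in the shared repository)
$ git branch -u origin/master
# Push your changes
$ git push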
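
The sketch below shows the tracking setup for the my-dev-branch branch from earlier. As before, the local bare repository (shared.git) standing in for the shared repository, the directory names, and the placeholder identity are assumptions for the demo.

```shell
set -eu
work="$(mktemp -d)"                       # scratch area (hypothetical location)
# Placeholder identity for the demo
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q --bare "$work/shared.git"     # stand-in for the shared repository
git init -q "$work/project"
cd "$work/project"
git remote add origin "$work/shared.git"
git commit -q --allow-empty -m "Initial snapshot"
git checkout -q -b my-dev-branch
git commit -q --allow-empty -m "Work in progress"

# A brand new local branch has no upstream yet
if git rev-parse --abbrev-ref '@{u}' >/dev/null 2>&1; then before="set"; else before="unset"; fi
echo "upstream before push: $before"

# -u pushes the branch and remembers origin/my-dev-branch as its upstream
git push -q -u origin my-dev-branch
upstream="$(git rev-parse --abbrev-ref '@{u}')"
echo "upstream after push: $upstream"

# From now on a plain "git push" knows where to go
git commit -q --allow-empty -m "More work"
git push -q
```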

Conclusion

Congrats on getting this far! Honestly, this has been an extremely watered-down version of what git is and what it is capable of doing. I thought about expanding more upon the commands and examples, but the post was already getting too long. Perhaps I will write other posts about detailed operations such as merging, rebasing, rewriting history, etc.

These topics, however, are more for advanced git users. The commands covered in Putting It Together should get you familiar with some of the most basic operations. Last but not least, getting better is all about practice. The more you use git, the better at it you become.

A Funny Story 📚

When I was in college, I had at least 2 classes wherein I had to work with a partner on a coding assignment. Back when I first started learning how to code, I didn’t know about git and what it could do. Instead of using git and GitHub for collaboration, my partner and I decided to use Google Drive.

As you can imagine, it was painful. It was painful to work remotely as well as in-person. For instance, only one person could work on the code at a time since we didn’t want to accidentally overwrite each other’s code. Not to mention when there were bugs, we had to “trade” files over Google Drive. 😩

Years later I laugh about this incident and use it as a motivator when telling others to learn how to use git. After all - it could potentially save you lots of headaches and troubles!

Yiping Su
Engineering | Analytics

I am interested in data, software engineering, and the application of computer science concepts in real-world scenarios.