Git

Git is a decentralized version control system and content management tool. It allows developers and teams to manage projects by maintaining all versions of files, past and present, allowing for reversion and comparison; facilitating exploration and experimentation with branching; and enabling simultaneous work by multiple authors without the need for a central file server. It can be used offline for version control and revision history or in conjunction with a remote repository to make working in teams easier and safer.

It is important to note that Git itself is not a tool for backing up files. The loss of a local Git repository in connection with a file system failure is permanent unless a remote copy of the repository exists.

There is a short training video that parallels some of the topics discussed on this page.

Resources

Git binary

While it's possible to use Git on most systems without configuration, we strongly recommend using the most recent version of Git, which can be loaded as a module:

module load git

This should prevent any problems that may arise as a result of version incompatibilities.
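
One way to confirm which Git is active after loading the module is to ask Git itself. This is a minimal sketch: the guard around module is an addition (not part of the instructions above) so that the snippet also runs on machines without the module system, and the variable names are only illustrative.

```shell
# Load the Git module only where the `module` command is available
# (the guard is an assumption for machines without environment modules).
if command -v module >/dev/null 2>&1; then
    module load git
fi

# Report which executable is found first on PATH and its version.
git_path="$(command -v git)"
git_version="$(git --version)"
echo "Using $git_path ($git_version)"
```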

Remote Git repositories

The Center for High Performance Computing maintains a GitLab Community Edition server for users who are interested in collaborating and sharing internally. You can log in with your University of Utah credentials (your ID and password).

Alternatively, third-party hosting services can be used; some of the most popular are GitHub, GitLab, and Bitbucket. Each has its strengths and weaknesses, so seek out reviews, policies, and recommendations before you start.

Quick reference

This is intended for users who have some experience with Git. If you haven't seen these commands before, consider reading through the brief tutorial first.

Command                                              Description
git help operation                                   Read more about operation (e.g. git help push)
git init                                             Create a Git repository in the current directory (if one doesn't already exist)
git clone URL destination                            Copy the project at URL into the (new) directory destination
git remote add remote_name URL                       Add a remote named remote_name with location URL; the primary remote is typically named origin
git config user.name "Firstname Lastname"            Set your name to Firstname Lastname (use --global to change this globally)
git config user.email "firstname.lastname@utah.edu"  Set your email to firstname.lastname@utah.edu (use --global to change this globally)
git status                                           Display the status of the current branch (staged, modified, and untracked files)
git diff --cached                                    Display staged changes (what will be committed); plain git diff shows unstaged changes, including conflict markers during a merge
git log --stat --summary                             Display an overview of the project history, including the summary (commit message) and changes
git add filename other_file                          Add filename and other_file to the staging area
git rm --cached filename                             Remove filename from the staging area without deleting the file on disk
git commit -m "message"                              Create a new commit with description message; git commit -a first stages every modified or deleted tracked file (but not new files)
git pull remote_name branch_name                     Fetch commits on branch branch_name of the remote remote_name and merge them into the current branch; once an upstream branch is set, you can use git pull
git push remote_name branch_name                     Push commits on branch branch_name to the remote remote_name; once an upstream branch is set (e.g. with git push -u), you can use git push
git checkout -b branch_name                          Create (and switch to) new branch branch_name
git checkout branch_name                             Switch to (existing) branch branch_name
git branch                                           Display the branches available; marks the current branch
git branch -d branch_name                            Delete the branch branch_name (refused if it contains unmerged commits)
git merge branch_name                                Merge commits in branch branch_name into the current branch; any conflicts must be resolved before the merge is committed

Sample usage

This is a small sample of how you might set up a Git repository on GitLab to share your work with others. For an in-depth explanation of the steps, refer to the tutorial section.

  1. Create or locate a remote repository on GitLab (or another service). The URL of this project will be of the form https://gitlab.chpc.utah.edu/gitlab-user/project-name.
  2. Create a local repository in a directory on your computer.
    • Without an existing (remote) repository:
      $ module load git
      $ cd your_directory
      $ git init
      $ git remote add origin https://gitlab.chpc.utah.edu/gitlab-user/project-name

      Crucially, gitlab-user is not necessarily your university ID. To determine what should be used here, sign in to GitLab and locate your user ID. This can be changed in your settings and it may be a good idea to use your university ID. You can also refer to the "Create a project on GitLab" section to determine the URL you should use.

    • From an existing repository:
      $ module load git
      $ git clone https://gitlab.chpc.utah.edu/gitlab-user/project-name your_directory
      $ cd your_directory

      Again, gitlab-user may be something other than your university ID. Refer to the URL of the project on GitLab to determine what to use.

  3. Stage and commit your files. Refer to the "Edit and stage your files" section for more information about adding files to the index and the "Commit your changes" section for information about commits. You can exclude certain files with the .gitignore file.
    $ git add .
    $ git commit -m "This is a description of the commit!"
  4. Push your changes to the remote.
    $ git push origin master
    If you are collaborating with others or working from multiple computers, it may be a good idea to use the git pull command first. See the "Conflicts" section for an explanation.
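
The steps above can be tried end to end without touching GitLab. The sketch below is only an illustration under stated assumptions: a bare repository in a temporary directory stands in for the remote project, the module load step is left out, and the sandbox path and README.md file are hypothetical.

```shell
# Throwaway location; a bare repository stands in for the GitLab project (assumption).
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/project-name.git"

# Step 2: create a local repository and register the remote.
mkdir "$sandbox/your_directory"
cd "$sandbox/your_directory"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"
git remote add origin "$sandbox/project-name.git"

# Step 3: stage and commit a file.
echo "Hello, Git" > README.md
git add .
git commit --quiet -m "This is a description of the commit!"

# Step 4: push to the remote, then compare both ends.
git push --quiet origin master
local_head="$(git rev-parse master)"
remote_head="$(git ls-remote origin refs/heads/master | cut -f1)"
echo "local:  $local_head"
echo "remote: $remote_head"
```

If the push worked, both lines show the same commit ID. With a real project, only the URL passed to git remote add changes.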

Brief tutorial

This is not meant to be a comprehensive guide to Git; in fact, it makes many generalizations and has no mention of many important features. It is meant only to introduce some of the concepts of version control and cover the commands necessary to get started. If you are looking for a more comprehensive tutorial or specific information, please try the official tutorial.

This tutorial assumes you're using, or plan to use, a remote repository on the Center for High Performance Computing instance of GitLab. The process should be very similar for other hosting providers.

Create a project on GitLab

If you plan to share your work with others, you'll likely need a remote repository to ensure availability. If you're using GitLab, this can be done by creating a new "project." The project contains the remote repository and adds additional features, like a description, wiki, and editing tools that can be used in an Internet browser. Each project has a "visibility level" for security. "Private" (default) requires you explicitly grant access to each user who will be working on (or simply viewing or cloning) the project, "Internal" allows all authenticated users to view or clone the project (but editing privileges must still be granted explicitly), and "Public" allows anyone to view or clone the project. It's also possible to create projects for groups of users, which is recommended if you have many projects with similar permissions.

You can use HTTPS or SSH when transferring files to and from your computer. When using HTTPS, you must sign in with your university ID and password (as you would on the GitLab website), while with SSH, you generate a pair of keys and create a single passphrase for them. This decision is largely based on personal preference. The remainder of this tutorial will use HTTPS for consistency. In most cases, you won't want to use the URL given by the project page when using HTTPS. Instead, use the URL of the project page itself (you can copy it directly from your browser). For instance, instead of https://gitlab-user@gitlab.chpc.utah.edu/gitlab-user/project-name.git, use https://gitlab.chpc.utah.edu/gitlab-user/project-name. With this form, Git prompts you for both your username and password when pushing changes to the remote. The first form assumes your username is "gitlab-user" (in this case), which is often different from the university ID that must be used for authentication.

Create a local repository

Without an existing repository (new project)

To start using Git, you'll need to initialize it in the directory of your project.

$ module load git
$ cd your_project_directory
$ git init
$ git remote add origin https://gitlab.chpc.utah.edu/gitlab-user/project-name

From an existing project

You can copy an existing repository to your own computer with the git clone command.

$ module load git
$ git clone https://gitlab.chpc.utah.edu/gitlab-user/project-name your_project_directory
$ cd your_project_directory

Getting started

Verify that the local repository exists

To verify everything's worked up to this point, run git status in your project directory.

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	your_files/

nothing added to commit but untracked files present (use "git add" to track)

If this didn't work, you'll receive an error like the one below. Check your version of Git and the directory you're in and try again. The remainder of this tutorial assumes everything is working as intended, so it's best to resolve any issues now.

$ git status
fatal: Not a git repository (or any parent up to mount point /your/home/directory)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
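
If you'd rather test for a repository in a script than read the error message, git rev-parse can answer with an exit status. The sketch below assumes a fresh temporary directory (which should not sit inside any repository); the in_repo helper is a hypothetical name, not a Git command.

```shell
# Hypothetical helper: prints "yes" inside a Git working tree, "no" otherwise.
in_repo() {
    if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "yes"
    else
        echo "no"
    fi
}

# A fresh temporary directory is not a repository yet.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
before="$(in_repo)"

# After git init, the same check succeeds.
git init --quiet
after="$(in_repo)"
echo "inside a repository before init: $before, after init: $after"
```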

Configure your name and email

This step is important, but often neglected. Your commits will be associated with the email address you provide here (including on most third-party hosting services) and your name will help colleagues identify you.

$ git config user.name "Firstname Lastname"
$ git config user.email "firstname.lastname@utah.edu"

If you plan to use the same name and email address for all of your projects, you can configure them globally.

$ git config --global user.name "Firstname Lastname"
$ git config --global user.email "firstname.lastname@utah.edu"

This saves your information and uses it for all projects (unless explicitly changed on a given project).
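
To check what Git will record on your commits, you can read the settings back. The following sketch works in a throwaway repository (an assumption, so that no real project or global setting is changed) and uses the placeholder name and address from above.

```shell
# Work in a throwaway repository so no real project is touched.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet

# Set the identity for this repository only (no --global).
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"

# Read the values back; a repository-level value takes precedence over a global one.
configured_name="$(git config --get user.name)"
configured_email="$(git config --get user.email)"
echo "$configured_name <$configured_email>"
```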

Edit and stage your files

You can use any editor to modify your files and run git status periodically to view their status. You'll notice that any "untracked" files are listed under a heading that says "use 'git add <file>...' to include in what will be committed"; a command like git add your_file_name.ext will add the file to the staging area, officially called the index, which contains all of the changes that will be made with your commit. If you're satisfied with all of your changes, it's possible to add them to the staging area at once with git add . (run from the project directory). You can exclude files selectively with the .gitignore file. Once you've added files to the staging area, they will be visible when running git status. You can remove files from the staging area with git rm --cached your_file_name.ext; the file itself stays on disk.
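
The effect of staging and unstaging can be seen with the compact form of git status. This is a small sketch in a throwaway repository; the file names notes.txt and scratch.tmp are hypothetical. In the short format, "A" marks a newly staged file and "??" marks an untracked one.

```shell
# Throwaway repository with two new files (names are placeholders).
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet
echo "draft" > notes.txt
echo "temporary" > scratch.tmp

# Stage one file; the other stays untracked.
git add notes.txt
staged_status="$(git status --short)"
echo "$staged_status"

# Take the file back out of the staging area; it is kept on disk.
git rm --quiet --cached notes.txt
unstaged_status="$(git status --short)"
echo "$unstaged_status"
```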

Commit your changes

Once you're satisfied with the state of your project (specifically, the state of the staging area), you should "commit" your changes. This is analogous to saving a snapshot of your work, but remember that Git alone should not be used to back up files: you could still lose them. You can compare commits and, if necessary, revert a file to the state it had in an earlier commit. Each commit contains a brief message about the changes that were made; the easiest way to supply it is with the git commit -m "This is your message" command. You can create as many commits as you'd like before pushing your work to a remote repository.
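
As an illustration of comparing commits and reverting a file, the sketch below makes two commits in a throwaway repository and then restores the first version. The file name paper.txt and the commit messages are hypothetical, and git checkout with a commit and a path is just one of several ways to restore a file.

```shell
# Throwaway repository with a placeholder identity.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"

# First commit.
echo "first draft" > paper.txt
git add paper.txt
git commit --quiet -m "Add first draft"
first_commit="$(git rev-parse HEAD)"

# Second commit changes the same file.
echo "second draft" > paper.txt
git add paper.txt
git commit --quiet -m "Revise draft"

# Compare the two commits.
git --no-pager diff "$first_commit" HEAD -- paper.txt

# Restore the file as it was in the first commit.
# The restored version is staged but not yet committed.
git checkout "$first_commit" -- paper.txt
restored_text="$(cat paper.txt)"
echo "$restored_text"
```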

Push your changes to the remote

Conflicts

Git has no system to prevent collaborators (or even individuals working with multiple branches) from having two entirely different versions of the same file. Comparing and merging documents has always been tedious, but many methods of facilitating collaboration have been developed in recent years to make it easy or even unnecessary. Some software, like the content management system used to edit the website you're reading now, requires that users "check out" a document before editing (much like a library book: once it's been checked out, nobody else can use it). Other software, like Google Docs, allows people to work simultaneously and displays changes in real time but requires a consistent Internet connection and allows only one version of a file (there's little room for independent testing).

Git's solution is somewhere in the middle: it can be used offline and independently, but it allows users to discuss conflicts and makes finding them much easier. In fact, Git will prevent you from finalizing your changes until you have (potentially) resolved all conflicts with other versions. In other words, if the remote contains commits you don't have, you must git pull the most recent version before you can git push your own. Any potential conflicts are identified at this point (use git diff to see them). You should try your best to fix every conflict manually: once you've pulled the more recent version and committed the result, you are able to push your own, regardless of whether you've actually corrected the problems.

This system allows all developers to work simultaneously without worrying about what others are doing, but it only works if everyone knows how to use it. It's still possible to overwrite someone else's work, but this allows for a much more dynamic development process than other methods and stays out of the way when not needed. For instance, two people can edit the same paper simultaneously. Each time there is a difference in the text, the better option can be chosen, or a new one written, to create an entirely new document with work from both contributors. No time or effort is wasted in comparing text that is the same in both versions.
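
The pull-before-push rule can be observed locally. The sketch below is an illustration only: two working copies (first and second) share a bare repository that stands in for the remote, and the set_identity helper, directory names, and file names are all hypothetical. The two authors edit different files here, so the pull merges cleanly.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Hypothetical helper: give a repository a placeholder identity.
set_identity() {
    git -C "$1" config user.name "$2"
    git -C "$1" config user.email "$2@example.com"
}

# The first author starts the project; a bare clone stands in for the remote.
git init --quiet first
git -C first symbolic-ref HEAD refs/heads/master
set_identity first first_author
echo "shared text" > first/paper.txt
git -C first add paper.txt
git -C first commit --quiet -m "Start paper"
git clone --quiet --bare first remote.git
git -C first remote add origin "$sandbox/remote.git"

# The second author clones, commits, and pushes.
git clone --quiet remote.git second
set_identity second second_author
echo "notes" > second/notes.txt
git -C second add notes.txt
git -C second commit --quiet -m "Add notes"
git -C second push --quiet origin master

# The first author commits without pulling; the push is refused.
echo "data" > first/data.txt
git -C first add data.txt
git -C first commit --quiet -m "Add data"
if git -C first push --quiet origin master 2>/dev/null; then
    push_before_pull="accepted"
else
    push_before_pull="rejected"
fi

# Pulling merges the other author's commit; the push then succeeds.
git -C first pull --quiet --no-rebase --no-edit origin master
if git -C first push --quiet origin master; then
    push_after_pull="accepted"
else
    push_after_pull="rejected"
fi
echo "push before pull: $push_before_pull, push after pull: $push_after_pull"
```

The --no-rebase option asks git pull to merge explicitly, since recent versions of Git may otherwise stop and ask how diverging histories should be reconciled.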

General process

When you're ready to push your changes, it's generally a good idea to git pull. Often, this won't cause any problems and you can proceed with your git push. However, if there are conflicts, you will receive a warning:

$ git pull origin master
Username for 'https://gitlab.chpc.utah.edu': your_id
Password for 'https://your_id@gitlab.chpc.utah.edu':
From https://gitlab.chpc.utah.edu/gitlab-user/project-name
 * branch            master     -> FETCH_HEAD
Auto-merging your_file_name.ext
CONFLICT (content): Merge conflict in your_file_name.ext
Automatic merge failed; fix conflicts and then commit the result.

The file with the conflict will be modified to contain both versions:

<<<<<<< HEAD
This is an example of what it might look like. This is from the first version.
=======
This is from the second version!
>>>>>>> 57a4c537d0cc429794dfed77d02e5a1bfca9d91b

The differences can be identified with the git diff command and should be resolved manually. When you're satisfied with the files, add them to the staging area and create a new commit. Now, you can proceed with git push:

$ git push origin master

If the push succeeded, your changes are now available on the remote. Check the project on GitLab to confirm that everything looks as expected.

Branching

Create and use branches

Branches allow developers to work on multiple versions of a project simultaneously. They can be used, for example, to test features that may or may not be included in a project. If it's decided they are to be included in the main version of the project, the branches can be merged simply and issues should be identified (as with potential issues between local and remote files). If the new version of the project isn't needed, the branch can be abandoned or deleted entirely without repercussions.

A new branch can be created with git checkout -b new_branch_name. When it is created, the branch contains the same files and commits as the branch it was created from. You can view available branches and identify the branch you're currently on with the git branch command.

$ git branch
  master
* new_branch_name

Now, if you modify files, they'll be modified on the new branch. If you want to switch to a different branch, you can use the git checkout command again, like git checkout master. Be sure to commit your changes on one branch before switching to another.
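
To see that each branch keeps its own version of a file, the following sketch commits an experiment on a new branch and then switches back. It assumes a throwaway repository whose first branch is named master; the file name feature.txt is hypothetical.

```shell
# Throwaway repository with one commit on master.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"
echo "stable" > feature.txt
git add feature.txt
git commit --quiet -m "Initial commit"

# Create and switch to a new branch, then commit an experiment there.
git checkout --quiet -b new_branch_name
echo "experimental" > feature.txt
git commit --quiet -a -m "Try an experiment"
text_on_branch="$(cat feature.txt)"

# Switching back to master restores master's version of the file.
git checkout --quiet master
text_on_master="$(cat feature.txt)"
current_branch="$(git rev-parse --abbrev-ref HEAD)"
echo "$current_branch: $text_on_master (new_branch_name has: $text_on_branch)"
```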

Merging branches

To merge one branch into another, use the git merge command. Start on the branch you'd like to merge changes into and run git merge other_branch. Everything said about conflicts between local and remote versions of a file holds for branching, too. If there have been commits in both branches, conflicts will need to be resolved manually.
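
A merge with a conflict can be rehearsed safely in a throwaway repository. In this sketch, which uses hypothetical names (other_branch, paper.txt), the same line is changed on two branches, the merge stops, and the conflict is resolved by writing the desired text and committing.

```shell
# Throwaway repository with one commit on master.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Firstname Lastname"
git config user.email "firstname.lastname@utah.edu"
echo "original sentence" > paper.txt
git add paper.txt
git commit --quiet -m "Initial commit"

# Change the same line on two branches.
git checkout --quiet -b other_branch
echo "sentence from other_branch" > paper.txt
git commit --quiet -a -m "Edit on other_branch"
git checkout --quiet master
echo "sentence from master" > paper.txt
git commit --quiet -a -m "Edit on master"

# The merge stops and leaves conflict markers in the file.
if git merge other_branch >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"
fi
cat paper.txt

# Resolve by writing the text you want, then stage and commit.
echo "sentence combining both edits" > paper.txt
git add paper.txt
git commit --quiet -m "Merge other_branch and resolve conflict"
resolved_text="$(cat paper.txt)"
```

In practice you would edit the file in your editor rather than overwrite it with echo; the overwrite only keeps the example short.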

Other considerations

Special files

.gitignore

The .gitignore file (a child of the project directory) is used to exclude certain files from most Git operations. The files listed in this document will not be tracked by Git (without explicit instruction). It might be used by a developer who wants to share source code but not binaries, or by a scientist working with sensitive information who wants to publish tools while ensuring the data itself is not available to the public.

Your .gitignore file uses patterns to exclude files. As a result, if the files you are adding are similar, you can simplify the process. For instance,

experiment.out
testing.out
case1.out
case2.out

might become (assuming all files ending in ".out" are to be excluded)

*.out

You can read more about patterns in the Git documentation.
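
To check whether a pattern matches the files you expect, git check-ignore reports whether a given path is ignored. The sketch below uses a throwaway repository and the file names from the example above, plus a hypothetical analysis.py that should not be ignored.

```shell
# Throwaway repository with a one-line .gitignore.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet
echo "*.out" > .gitignore
touch experiment.out testing.out case1.out case2.out analysis.py

# git check-ignore exits with 0 for an ignored path and 1 for one that is not ignored.
for path in experiment.out case2.out analysis.py; do
    if git check-ignore --quiet "$path"; then
        echo "$path: ignored"
    else
        echo "$path: not ignored"
    fi
done

# Only the files that are not ignored are listed as untracked.
untracked="$(git status --short)"
echo "$untracked"
```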

README.md

The README file (a child of the project directory) describes a project and provides important information to potential users and contributors. It's typically displayed on the main page of a project on services like GitHub and GitLab. Most are written with Markdown syntax and named README.md. This is where people tend to look when searching for information about your project.

Recommendations

While Git can manage binary files, it works best with plain text. For instance, if you were writing a paper, it would be a good idea to use plain text (such as LaTeX) in place of a document created with an editor like Microsoft Word. Documents saved in plain text can be compared far more easily (often side-by-side) and can usually be viewed in a browser without downloading the file.

Try to git pull the most recent version of a project before you start editing it. This way, you won't have to resolve as many conflicts when it comes time to push your changes to the remote.

Last Updated: 9/15/17