Introduction to Git

Learn the basics of Git so you can make use of Kyso's GitHub integration and automate your data science workflow! This is a simple beginner's guide to getting started with Git.

One of our most popular features to date has been our integration with GitHub. Syncing GitHub repositories with Kyso means that every commit you push to GitHub will be automatically reflected on your Kyso dashboard.

However, some of you may not have used GitHub before, or may not even know what Git is. This tutorial is for absolute beginners and is designed to get you up and running quickly with a few simple commands.

Why use Git?

Git is a Version Control System (VCS). Real-life projects typically have multiple engineers, data scientists and analysts working on the same code at the same time. A VCS tracks every change to the files in a project and helps combine work from different people: when two people change the same lines, Git flags the conflict so it can be resolved, instead of one change silently overwriting the other. It also keeps a full history of those changes, which lets team members go back to an older version of the code.

How does it work?

A Git project usually has a remote repository, stored on a server, and a local repository on each engineer's computer. A repository is a folder whose contents are tracked by Git. This means the code is not only stored on a central server: every engineer's computer holds a full copy of the code and its history. This is what makes Git a distributed VCS.

A repository may contain multiple files and subfolders. Usually, these files contain source code. At the top level of each repository there is a hidden .git folder, which holds everything Git needs to keep track of the changes made to the files in that repository.

Getting Started with Git

Create a GitHub Account

Create a GitHub account at https://github.com - it's free for public repositories.

Download and install Git

Downloading and installing Git is a fairly straightforward process. Download the installer for your operating system from the official Git website, git-scm.com, and follow its instructions.

Verify that Git is installed by running the following command in a terminal (Command Prompt on Windows):

git --version
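If you want a setup script to check for Git before doing anything else, a minimal shell sketch could look like this (the error message wording is our own):

```shell
# Stop with a clear message if git is not on the PATH.
if ! command -v git >/dev/null 2>&1; then
    echo "Git was not found; check your installation and PATH." >&2
    exit 1
fi

# Prints the installed version, e.g. "git version 2.x.y".
git --version
```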

Create a new local repository

Create a new directory, git-demo, and move into it:

mkdir git-demo
cd git-demo

Initialise a local Git repository in the project with:

git init
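Put together, the steps above can be sketched as a small shell script. It works inside a throwaway temporary directory (our assumption, so it doesn't touch your own files) and then lists the folder so you can see the hidden .git directory that git init creates:

```shell
set -e

# Work in a temporary directory so the example doesn't touch your files.
cd "$(mktemp -d)"

# Create the project directory and move into it.
mkdir git-demo
cd git-demo

# Turn the directory into a Git repository (creates the hidden .git folder).
git init

# Show all entries, including the new .git folder.
ls -a
```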

Clone a repository

You can create a working copy of a local repository by running the command:

git clone /path/to/repository

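git clone copies an existing repository, including its full history, onto your computer. When the repository lives on a remote server, the command takes the form:

git clone username@host:/path/to/repository

For GitHub over SSH, the user is always git and the host is github.com, so the command looks like git clone git@github.com:<YourUsername>/<repository>.git. GitHub also accepts HTTPS URLs, as used later in this guide.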
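You can try cloning without a server by cloning one local folder into another. In this sketch the folder names original and copy, and the placeholder identity "Demo User"/demo@example.com, are our own; Git refuses to commit without some name and email configured:

```shell
set -e
cd "$(mktemp -d)"

# Build a small repository with one commit to clone from.
git init original
cd original
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity
echo "hello" > README.txt
git add README.txt
git commit -m "Add README"
cd ..

# Clone it by path: this copies the files *and* the full history.
git clone ./original copy
```

The copy is a complete repository in its own right: running git log inside copy shows the "Add README" commit.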

Staging & committing code

Committing is the process by which code is added to our local repository. Before code can be committed, it has to be in the staging area, which is how Git keeps track of the changes that will go into the next commit.

Any file which is not added to the staging area will not be committed. This gives us control over which files need to be committed.

You can propose changes (stage a file) using:

git add <filename>

Or, to stage all changes made inside your project folder since the last commit (run this from the project's root folder, since . means the current folder):

git add .

Now we are ready to commit our edits. To actually commit these changes we use:

git commit -m "My Commit Message"
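The sketch below shows how the staging area controls what ends up in a commit. The file names analysis.py and notes.txt and the placeholder identity are hypothetical; the first commit stages a single file by name, the second uses git add . to pick up everything else:

```shell
set -e
cd "$(mktemp -d)"
git init git-demo
cd git-demo
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "print('hello')" > analysis.py
echo "scratch notes" > notes.txt

# Stage only analysis.py; notes.txt is left out of this commit.
git add analysis.py
git commit -m "Add analysis script"

# notes.txt is still untracked, so it shows up as "??".
git status --short

# Now stage every remaining change in the folder and commit again.
git add .
git commit -m "Add notes"
```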

Status

Run git status to see which files have been modified and which changes have been staged.
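To see git status report both kinds of change at once, this sketch (hypothetical file names, placeholder identity) leaves one file modified but unstaged and one new file staged. The --short form prints two status columns per file: the first for the staging area, the second for the working folder, with ?? marking untracked files:

```shell
set -e
cd "$(mktemp -d)"
git init git-demo
cd git-demo
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "v1" > model.py
git add model.py
git commit -m "Add model"

echo "v2 with changes" > model.py   # modified, not staged
echo "data" > data.csv              # new file...
git add data.csv                    # ...which is staged

# Long, human-readable report.
git status
# Compact two-column report.
git status --short
```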

Pushing our changes

Once we have committed our changes in the local repository, we need to send those changes to our remote repository, by running:

git push -u origin master

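This pushes the code from the master branch in the local repository to the master branch in the remote repository named origin. The -u flag also records origin/master as the upstream of your local master branch, so afterwards a plain git push or git pull knows where to go. Note that you can push your code to a branch other than master - we'll discuss branching in just a sec.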

If you have not cloned an existing repository and want to connect your repository to a remote server, you need to add it with:

git remote add origin <server>

Now you are able to push your changes to the selected remote server.
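You can practise pushing without a GitHub account by using a local "bare" repository (one with history but no working files) as a stand-in for the server. That substitution, the folder names and the placeholder identity are assumptions of this sketch. git branch -M master makes sure the branch is called master whatever your Git's default branch name is:

```shell
set -e
cd "$(mktemp -d)"

# Stand-in for the remote server.
git init --bare remote.git
remote="$PWD/remote.git"

git init git-demo
cd git-demo
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity
echo "hello" > README.txt
git add README.txt
git commit -m "First commit"
git branch -M master

# Connect the local repository to the "server", then push and set upstream.
git remote add origin "$remote"
git push -u origin master
```

With GitHub, the only difference is that <server> is a GitHub URL instead of a local folder.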

Branching

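A new repository starts with a single default branch, traditionally called master (newer Git setups and new GitHub repositories often name it main instead), and commits go there until you switch to another branch. Branches are used to support multiple parallel lines of development, testing, and alternative analyses. We can merge branches back into the master branch once they are complete and reviewed.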

Create a new branch named testing and switch to it by running:

git checkout -b testing

Note that for a branch to be available to others on your team, you'll also have to push it to the remote repository:

git push origin testing

You can list all local branches using the following command:

git branch
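Here is the branching workflow end to end, again with a local bare repository standing in for the server (an assumption so the sketch runs offline) and a placeholder identity:

```shell
set -e
cd "$(mktemp -d)"
git init --bare remote.git
remote="$PWD/remote.git"

git init git-demo
cd git-demo
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity
echo "baseline" > model.py
git add model.py
git commit -m "Baseline model"
git branch -M master
git remote add origin "$remote"
git push -u origin master

# Create the testing branch and switch to it in one step.
git checkout -b testing

# Publish the branch so teammates can fetch it.
git push origin testing

# List local branches; the current one is marked with "*".
git branch
```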

Update and merge

To update your local repository to the newest commit on the remote repository, execute git pull in your working directory to fetch and merge remote changes.
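To watch git pull at work, this sketch plays both roles: a folder called upstream acts as the remote that a teammate commits to, and mine is your clone of it. The folder names, file contents and identity are all hypothetical:

```shell
set -e
cd "$(mktemp -d)"

# The "remote" repository, with one commit.
git init upstream
cd upstream
git config user.name "Teammate"              # placeholder identity
git config user.email "teammate@example.com" # placeholder identity
echo "v1" > results.md
git add results.md
git commit -m "First results"
cd ..

# Your local copy.
git clone upstream mine

# Meanwhile, a new commit lands on the remote.
cd upstream
echo "v2: updated results" > results.md
git commit -am "Update results"
cd ..

# Fetch the new commit and merge it into your current branch.
cd mine
git pull
```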

Continuing with our example from above, let's say that our testing branch is ahead of master by one commit in our local repository. Let's imagine that the new code has been reviewed and it is time to merge the new model we've been testing with the rest of our project in master.

In order to merge the code from testing into master, follow these steps:

First go back to master:

git checkout master

Then run the merge command:

git merge testing

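In this example there are no conflicts, so the merge succeeds. In real projects conflicts are common: when both branches changed the same lines, Git stops the merge, marks the conflicting sections in the affected files, and waits for you to edit those files, stage them with git add, and commit the result.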
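To see a conflict and its resolution end to end, this sketch changes the same line differently on master and testing, then resolves the clash by keeping the testing version. The file params.py, its values and the identity are hypothetical. Because git merge exits with a non-zero status on a conflict, it is wrapped in an if:

```shell
set -e
cd "$(mktemp -d)"
git init git-demo
cd git-demo
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "learning_rate = 0.1" > params.py
git add params.py
git commit -m "Initial params"
git branch -M master

# Change the same line differently on two branches.
git checkout -b testing
echo "learning_rate = 0.01" > params.py
git commit -am "Try smaller learning rate"

git checkout master
echo "learning_rate = 0.5" > params.py
git commit -am "Try larger learning rate"

if ! git merge testing; then
    # Git has written <<<<<<< / ======= / >>>>>>> markers into params.py.
    cat params.py
    # Resolve by writing the version we want to keep...
    echo "learning_rate = 0.01" > params.py
    # ...then stage the resolved file and conclude the merge.
    git add params.py
    git commit -m "Merge testing: keep smaller learning rate"
fi
```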

Note that we can also preview the changes before merging with:

git diff master testing

This shows the changes that testing would bring into master.
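The whole example from this section (testing one commit ahead of master, preview, then merge) can be sketched as follows. The file contents and identity are hypothetical. git diff is given master first and testing second, so lines added on testing appear with a +. Since master has no new commits of its own, the merge is a simple fast-forward:

```shell
set -e
cd "$(mktemp -d)"
git init git-demo
cd git-demo
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity
echo "model = 'baseline'" > model.py
git add model.py
git commit -m "Baseline model"
git branch -M master

# testing gets one extra commit, so it is one commit ahead of master.
git checkout -b testing
echo "model = 'gradient_boosting'" > model.py
git commit -am "Try gradient boosting"

# Back on master: preview what testing would bring in, then merge it.
git checkout master
git diff master testing
git merge testing
```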

Logs

We can look at the history of our repository by running git log. The command accepts many options that go beyond the scope of this guide.

For a list of these, run:

git log --help
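A few of the most common git log options are shown in this sketch, run against a throwaway repository with three hypothetical commits and a placeholder identity:

```shell
set -e
cd "$(mktemp -d)"
git init git-demo
cd git-demo
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

for step in "Load data" "Clean data" "Fit model"; do
    echo "$step" >> pipeline.txt
    git add pipeline.txt
    git commit -m "$step"
done

# Full history: hash, author, date and message of every commit.
git log
# One line per commit: abbreviated hash and message.
git log --oneline
# Only the two most recent commits, in a custom format.
git log -n 2 --format="%h %an: %s"
```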

GitHub - the remote repository

OK, let's set up our first remote repository using GitHub. Once you've created an account, click Start a Project on the homepage to create a new Git repository. We'll call it kyso-git; then click Create Repository.

We have just created a remote repository on GitHub, whose URL should be https://github.com/<YourUsername>/kyso-git.git

From the command line, we can point our local repository to the remote repository using:

git remote add origin https://github.com/<YourUsername>/kyso-git.git

Or we could simply clone the new repo if we have no pre-existing work in a local repository.

git clone https://github.com/<YourUsername>/kyso-git.git
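Before your first push it is worth checking where origin points. This sketch runs offline: it only records the URL, with YourUsername as a placeholder for your real GitHub username, and leaves out the push itself because that needs network access and GitHub credentials:

```shell
set -e
cd "$(mktemp -d)"
git init kyso-git
cd kyso-git

# YourUsername is a placeholder: substitute your own GitHub username.
git remote add origin https://github.com/YourUsername/kyso-git.git

# Confirm the fetch and push URLs for origin.
git remote -v
git remote get-url origin
```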

Conclusion

This guide was designed for Git newcomers and anyone in need of a refresher, particularly technical, Python- or R-savvy data teams who are not used to working with a VCS like Git. Making use of Kyso's GitHub integration is currently our recommended workflow.