GitHub
Learn how to set up GitHub Actions so that publishing your data science work becomes automatic!
Prerequisites
Have a Kyso account - either on kyso.io or on your company's private Kyso installation.
Create a Kyso access token - follow these instructions. Save this for later!
Ensure your directory (or folders) contains a valid kyso.yaml file. Check the following instructions for more info.
You can instead include your Kyso metadata in the frontmatter of a notebook or Markdown file for each project report. Kyso will search for this information on a push action.
How to publish to Kyso with GitHub Actions
About the kyso-push step variables:
The username and token fields take their values from secret variables. To make this work, create the Kyso auth token and define the KYSO_USERNAME and KYSO_TOKEN secrets as explained in the GitHub documentation.
The url has to point to your Kyso deployment. For example, if your company, Acme Inc., has its own Kyso instance available at https://acme.kyso.io, that is the value to assign to url.
Create a .github/workflows directory in your repository on GitHub if this directory does not already exist.
In the .github/workflows directory, create a file named kyso-action.yml.
Copy the following YAML contents into the kyso-action.yml file:
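As a rough sketch, a workflow along these lines might look as follows. It assumes the kyso-push step is provided by a published action that takes the username, token and url inputs described above. The action reference (OWNER/KYSO-PUSH-ACTION@VERSION), the workflow and job names, and the main branch are placeholders or assumptions, so substitute the values given in the Kyso documentation for your installation.

```yaml
# .github/workflows/kyso-action.yml (illustrative sketch, not the official listing)
name: Publish to Kyso            # hypothetical workflow name

on:
  push:
    branches:
      - main                     # assumption: reports are published from main

jobs:
  publish:                       # hypothetical job id
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so kyso.yaml and the reports are available
      - uses: actions/checkout@v4
      - name: kyso-push
        # Placeholder: replace with the Kyso action reference for your installation
        uses: OWNER/KYSO-PUSH-ACTION@VERSION
        with:
          # Credentials are read from repository secrets, never written inline
          username: ${{ secrets.KYSO_USERNAME }}
          token: ${{ secrets.KYSO_TOKEN }}
          url: https://kyso.io   # or your own deployment, e.g. https://acme.kyso.io
```

Reading the credentials through the secrets context keeps the token out of the repository history and out of the workflow logs.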
Remember to create the repository secrets described above to log in to Kyso from the CLI, rather than writing your access token directly into the action script, which would breach security best practices.
This means that each new push to the repository triggers a GitHub Actions run that executes the commands in our action file above.
Remember to also ensure that the metadata specs in your sub-directory YAML files (or notebook frontmatter) are correct:
organization: [Your Organisation Name]
team/channel: [Destination channel for each report, which might be different for each sub-directory]
author: [User email, or a list of emails for multiple authors]
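For illustration, the fields above might be laid out in a sub-directory's kyso.yaml roughly as follows. The key names mirror the labels in the list (with channel chosen for "team/channel") and all values are made-up placeholders, so check the Kyso documentation for the exact keys your version expects.

```yaml
# kyso.yaml in one report sub-directory (illustrative; key names assumed from the list above)
organization: acme-inc           # placeholder organisation name
channel: data-science            # placeholder; may be "team" depending on your Kyso version
author: jane.doe@acme.example    # placeholder; or a YAML list of emails for several authors
```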
Note that the example repo contains multiple sub-folders, so its top-level kyso.yaml takes the type: meta, and each sub-folder has its own kyso.yaml file specifying that individual project's metadata (e.g. title, description, type, tags, etc.). Read more at the link below:
Executing the CI/CD Pipeline
When we commit our work to GitHub, an Action is queued to execute. On the Actions page of the repo (https://github.com/user/repo/actions), we'll see our runs executing: first installing the Kyso CLI, then logging the user in, and finally publishing the report.
If we check the logs, this is what we will see:
Now we can navigate to Kyso at https://kyso.io/org-name/channel-name to see the published list of our example reports!
Maintaining a QA process with Pull Requests
Because Kyso is integrated through GitHub Actions, the workflow is very flexible: you can refine it further so that it runs only for a specific branch, tag, or whatever else fits your workflow needs!
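As one possible refinement, GitHub Actions lets the push trigger be restricted with branch and tag filters. The sketch below uses a hypothetical release branch pattern and a hypothetical v* tag pattern. Only the on: block changes, and the jobs stay as in your existing action file.

```yaml
# Run on pushes to main or to release branches, and on pushes of version tags
on:
  push:
    branches:
      - main
      - 'releases/**'   # hypothetical branch pattern
    tags:
      - 'v*'            # hypothetical tag pattern, e.g. v1.2.0
```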
For all the different event types that can trigger these workflows, see the "Events that trigger workflows" page in the GitHub documentation.
Publishing only when a PR is merged into the main branch
So, for example, if you work with a team of scientists, data scientists, etc., all pushing and pulling to and from the same repository, you're going to want to control how, when and what changes get published to Kyso.
Now there is ongoing debate on how to do this. For example, see this discussion:
There are a couple of different ways to ensure that our workflow is triggered only when a PR is merged into our main branch. However, we can simply use our existing action file:
Because a merged pull request always results in a push, we can just use the push event to accomplish our goal. So the above workflow will run when a PR is merged or when a commit is pushed directly to the main branch.
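Concretely, the trigger being relied on is a push filter on the main branch. The fragment below is a sketch of that on: block, with one commonly used alternative shown in comments: reacting to the pull_request closed event and checking that the PR was actually merged. The job id publish is hypothetical.

```yaml
# Option used here: any push to main, which includes merged pull requests
on:
  push:
    branches:
      - main

# Alternative (commented out): run only when a PR into main is closed as merged
# on:
#   pull_request:
#     types: [closed]
#     branches: [main]
# jobs:
#   publish:                      # hypothetical job id
#     if: github.event.pull_request.merged == true
```

The push-based option is simpler, but it also fires on direct commits to main, which is why it pairs well with restrictions on who can push to that branch.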
To make this workflow even more secure, it is recommended that you add branch protection rules to your main branch, for example to require pull request reviews before merging.