GitHub

Learn how to set up GitHub Actions so that publishing your data science work becomes automatic.

Prerequisites

  • Have a Kyso account - either on kyso.io or on your company's private Kyso installation.

  • Create a Kyso access token - follow these instructions. Save this for later!

  • Ensure your directory (or folders) contains a valid kyso.yaml file. Check the following instructions for more info.

    • You can instead include your Kyso metadata in the frontmatter of a notebook or markdown file for each project report. Kyso will search for this information on a push action.

How to publish to Kyso with GitHub Actions

About the kyso-push step variables:

  • The username and token fields take their values from secrets. For this to work, create a Kyso access token and define the KYSO_USERNAME and KYSO_TOKEN secrets in your repository, as explained in the GitHub documentation.

  • The url (passed as --kysoInstallUrl) must point to your Kyso deployment. For example, if your company, Acme Inc., has its own Kyso instance at https://acme.kyso.io, assign that value to it.

  1. Create a .github/workflows directory in your repository on GitHub if this directory does not already exist.

  2. In the .github/workflows directory, create a file named kyso-action.yml.

  3. Copy the following YAML contents into the kyso-action.yml file:

name: Kyso Push
on:
  push:
    branches:
      - main
jobs:
  Kyso-Push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm install -g kyso
      - run: kyso login --kysoInstallUrl https://kyso.io --provider kyso --username ${{ secrets.KYSO_USERNAME }} --token ${{ secrets.KYSO_TOKEN }}
      - run: kyso push

Remember to store your credentials as repository secrets, as shown above, so that the CLI can log in to Kyso without your access token being written directly in the workflow file, which would violate security best practices.
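One common hardening pattern, sketched here, is to hand the secrets to the step through environment variables instead of interpolating them straight into the command line. This assumes that the KYSO_USERNAME and KYSO_TOKEN repository secrets exist; the step name is illustrative, and the kyso login flags are the ones already used in the workflow above.

```yaml
# Sketch: a drop-in replacement for the `kyso login` step under `steps:`.
- name: Log in to Kyso   # illustrative step name
  env:
    # Assumes both secrets are defined in the repository's Actions secrets.
    KYSO_USERNAME: ${{ secrets.KYSO_USERNAME }}
    KYSO_TOKEN: ${{ secrets.KYSO_TOKEN }}
  # The shell expands the variables at run time, so the raw values never
  # appear in the workflow file itself.
  run: kyso login --kysoInstallUrl https://kyso.io --provider kyso --username "$KYSO_USERNAME" --token "$KYSO_TOKEN"
```

If you run a private Kyso installation, replace https://kyso.io with your own deployment URL.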

This means that each new push to the main branch triggers a GitHub Actions workflow run that executes the commands in the workflow file above.

Remember to also ensure that the metadata in your sub-directory YAML files (or notebook frontmatter) is correct:

  • organization: [Your Organisation Name]

  • team/channel: [Destination channel for each report; this may differ for each sub-directory]

  • author: [User email, or a list of emails for multiple authors]

Note that the example repo contains multiple sub-folders, so its root report takes on the type: meta, and each sub-folder has its own kyso.yaml file specifying that individual project's metadata (e.g. title, description, type, tags, etc.). Read more at the link below:

Meta Reports
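As a rough illustration of the metadata described above, a sub-folder's kyso.yaml might look something like the sketch below. Every value is a placeholder, and the exact key names and accepted type values are assumptions based on the fields listed in this section (for example channel rather than team, and authors as a list); check them against the Kyso file specification for your installation.

```yaml
# Hypothetical kyso.yaml for a single sub-folder report.
# All values are placeholders; key names are assumed from the fields above.
organization: my-organization        # your organisation name
channel: my-channel                  # destination channel for this report
authors:                             # one email, or several for multiple authors
  - first.author@example.com
  - second.author@example.com
title: NBA Player Clustering
description: Short summary shown alongside the report
type: jupyter                        # assumed value; the root of a multi-report repo uses meta
tags:
  - clustering
```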

Executing the CI/CD Pipeline

When we commit our work to GitHub, an Action run is queued for execution. On the repository's Actions page (https://github.com/user/repo/actions), we can watch the run execute: it installs the Kyso CLI, logs the user in, and then publishes the reports.

If we check the logs, this is what we will see:

$ npm install -g kyso
added 234 packages, and audited 235 packages in 19s
33 packages are looking for funding
  run `npm fund` for details
found 0 vulnerabilities
$ kyso login --kysoInstallUrl https://kyso.io --provider kyso --username [YOUR USERNAME] --token [YOUR TOKEN]
Logged successfully
$ kyso push
9 reports found

No new or modified files to upload in report 'Salesforce Pipeline Looker Dashboard'
No new or modified files to upload in report '10xgenomics HTML Report: Chromium Nuclei Analysis'
Uploading report 'jupyter-notebooks'
🎉🎉🎉 Report Jupyter Notebook: Graphing Mutation Ratios was uploaded to: https://kyso.io/kyso-demo/data-analyses/jupyter-notebook-graphing-mutation-ratios 🎉🎉🎉

No new or modified files to upload in report 'HTML Outputs: MultiQC Sequencing Data'
No new or modified files to upload in report 'PDF Report: Hurdle & Zero Inflated Models, Overdraft Analytics'
No new or modified files to upload in report 'PowerPoint Presentation: Knowledge Aggregation with Kyso'
Uploading report 'markdown-writeups'
🎉🎉🎉 Report Markdown Report: Engineering Process Diagrams was uploaded to: https://kyso.io/kyso-demo/data-analyses/markdown-report-engineering-process-diagrams 🎉🎉🎉

Uploading report 'nba-player-clustering'
🎉🎉🎉 Report NBA Player Clustering was uploaded to: https://kyso.io/kyso-demo/data-analyses/nba-player-clustering 🎉🎉🎉

Uploading report 'baseball-data-analysis'
🎉🎉🎉 Report Baseball Data Analysis was uploaded to: https://kyso.io/kyso-demo/data-analyses/baseball-data-analysis 🎉🎉🎉

Now we can navigate to Kyso at https://kyso.io/org-name/channel-name to see the published list of our example reports!

Maintaining a QA process with Pull Requests

Integrating Kyso into GitHub Actions keeps the workflow flexible: you can refine it further and choose to publish from a specific branch, tag, or whatever else fits your workflow needs.

GitHub's documentation lists all the different event types that can trigger these workflows.
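As a sketch of what such a refinement can look like, the trigger block below would run the workflow on pushes to main, on pushes to release branches, on version tags, and when started manually. The branch and tag patterns are illustrative assumptions, not part of the original workflow.

```yaml
# Sketch: an alternative `on:` block for kyso-action.yml.
on:
  push:
    branches:
      - main
      - 'release/**'   # illustrative pattern for release branches
    tags:
      - 'v*'           # illustrative pattern for version tags such as v1.2.0
  workflow_dispatch:   # allows starting the workflow manually from the Actions tab
```

When both branches and tags are listed under push, the workflow runs if the pushed ref matches either filter.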

Publishing only when a PR is merged into the main branch

For example, if you work with a team of scientists, data scientists, and others who all push to and pull from the same repository, you will want to control how, when, and which changes get published to Kyso.

Now there is ongoing debate on how to do this. For example, see this discussion:

Github > Kyso on Merge Workflow

There are a few ways to ensure that our workflow is triggered only when a PR is merged into our main branch. However, we can simply reuse our existing action file:

name: Kyso Push
on: 
  push:
    branches:
      - main
jobs:
  Kyso-Push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm install -g kyso
      - run: kyso login --kysoInstallUrl https://kyso.io --provider kyso --username ${{ secrets.KYSO_USERNAME }} --token ${{ secrets.KYSO_TOKEN }}
      - run: kyso push

Because a merged pull request always results in a push, we can use the push event to accomplish our goal. The above workflow will therefore run when a PR is merged into the main branch or when a commit is pushed directly to it.
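If you would rather have the workflow react only to merged pull requests and ignore direct pushes, one alternative is to listen for the pull_request closed event and check the merged flag. The following is a minimal sketch under that assumption, not the approach this guide relies on; the workflow name is illustrative, and it assumes the KYSO_USERNAME and KYSO_TOKEN repository secrets are defined.

```yaml
name: Kyso Push on Merge   # illustrative name
on:
  pull_request:
    types:
      - closed             # fires for both merged and abandoned PRs
    branches:
      - main               # only PRs that target main
jobs:
  Kyso-Push:
    # A closed PR is not necessarily merged, so gate the job on the merged flag.
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # Check out the base branch, which now contains the merged changes.
          ref: ${{ github.event.pull_request.base.ref }}
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm install -g kyso
      - run: kyso login --kysoInstallUrl https://kyso.io --provider kyso --username ${{ secrets.KYSO_USERNAME }} --token ${{ secrets.KYSO_TOKEN }}
      - run: kyso push
```

Unlike the push-based workflow, this variant does not run for commits pushed directly to main, which may or may not be what your team wants.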

To make this workflow even more secure, we recommend adding branch protection rules to your main branch:

Branch Protection Rules
