At Kyso we're building a platform for communicating the knowledge gained from data to everyone on a team or across a company, in a way that both technical and non-technical readers can understand. This guide explains how to write a data science article that appeals to your readers.
Explain clearly at the beginning what your article is about and which data you are using. Provide some background on the topic if necessary, and explain why you are writing the post.
Determine the logical structure of your argument: have a beginning, a middle, and an end. Provide a table of contents and use headings and subheadings. These give readers an overview and help orient them as they read through the post, which is particularly important if your content is complex.
Aim for a logical flow throughout, with appropriate sections, paragraphs, and sentences. Keep sentences short and straightforward. It's best to address only one concept per paragraph: lead with the main insight and follow it with the supporting information, so that the reader's attention goes straight to what matters most.
There are many visualisation tools available to you. For static plots, or for highly customised plots where you may need to build your own solution, matplotlib and seaborn are the tools to use.
However, if you want to tell a story and let readers immerse themselves in your analysis, interactivity is the way to go. Your two main options are Bokeh and plotly. Both are powerful declarative libraries and worth considering. Our team at Kyso tends to use plotly because we find its syntax more intuitive than Bokeh's: plots are quicker to make, and you spend less time searching for the functionality you need. The plotly documentation is also much more comprehensive than Bokeh's, with plenty of examples of every graph you can think of.
Whichever you choose, we recommend sticking with one library until there's a compelling reason to switch.
Altair is another, newer, lightweight declarative plotting library that merits consideration; its example gallery is a good place to start.
As discussed above, plotly has become a leading choice for interactive visualisations, which render beautifully on Kyso, and its functionality goes well beyond that of most other libraries. However, because plotly embeds the data for every figure in the post and renders every data point interactively in the reader's browser, it can considerably slow down the post's load time when you are working with many plots or a very large dataset, which hurts the reader's experience.
If you are generating a lot of graphs or are working with very large datasets but wish to retain the interactivity, use Bokeh or Altair instead.
When plotting, make sure to have explanatory text above or below the chart: explain to the reader what they are looking at, and walk them through the insights and conclusions drawn from each visualisation.
Label everything in your graphs, including axes and colour scales, and add a legend where needed. Your plots should also be high-resolution. Newer libraries like plotly and Bokeh handle this automatically, but if you're using matplotlib or seaborn, run the following at the beginning of your notebook:
%config InlineBackend.figure_format = 'retina'
This configures Jupyter's inline backend to render figures in retina (high-DPI) mode.
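A minimal matplotlib sketch of a fully labelled chart (title, axis labels, a labelled colour bar, and a legend), assuming matplotlib and NumPy are installed and using hypothetical data. When exporting an image rather than displaying it inline, for example for a preview image, the `dpi` argument to `savefig` controls the resolution; the output filename here is an arbitrary example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: marketing spend, revenue, and profit margin for 60 stores
rng = np.random.default_rng(seed=0)
spend = rng.uniform(1_000, 10_000, size=60)
revenue = spend * 3 + rng.normal(0, 2_000, size=60)
margin = rng.uniform(0.05, 0.35, size=60)

fig, ax = plt.subplots(figsize=(7, 4.5))

# Colour each point by margin so the chart carries a third variable
points = ax.scatter(spend, revenue, c=margin, cmap="viridis", label="Store")

# Least-squares straight line through the points, shown as a second legend entry
slope, intercept = np.polyfit(spend, revenue, deg=1)
xs = np.linspace(spend.min(), spend.max(), 100)
ax.plot(xs, slope * xs + intercept, color="black", label="Linear trend")

# Label the chart, the axes, the colour scale, and the legend
ax.set_title("Revenue vs. marketing spend")
ax.set_xlabel("Marketing spend (USD)")
ax.set_ylabel("Revenue (USD)")
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("Profit margin")
ax.legend(loc="upper left")

# For an exported image, dpi sets the resolution (200 dots per inch here)
fig.savefig("revenue_vs_spend.png", dpi=200, bbox_inches="tight")
```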
If you are writing a guide or a particularly technical post in which you constantly refer to the code, show the code cells by default. On Kyso, you can toggle code visibility in the top right-hand corner of the post, choosing to show the code input alone or both input and output.
Like GitHub repositories, Kyso posts can be forked along with all their attached files, so that others can reuse, edit, and extend the analysis.
It's always a good idea to follow coding best practices when developing a data science project or publishing research: use a clear directory structure, clean syntax, explanatory text (or comments in the code cells), and version control, and, most importantly, make sure all relevant files and datasets are attached to the post.
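One way to act on that last point, sketched under the assumption that the notebook reads its data through paths relative to the project folder: before publishing, check that every file the analysis depends on actually exists there, so that anyone who forks the post gets a working copy. `REQUIRED_FILES`, the file names in it, and `missing_files` are hypothetical, used only for this illustration.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical list of files this notebook reads, as paths relative to the
# project root; replace with the files your own analysis depends on.
REQUIRED_FILES = ["data/sales.csv", "data/regions.json"]


def missing_files(project_dir: str | Path, required: list[str]) -> list[str]:
    """Return the entries of `required` that are not files under `project_dir`."""
    root = Path(project_dir)
    return [rel for rel in required if not (root / rel).is_file()]


missing = missing_files(".", REQUIRED_FILES)
if missing:
    print("Attach these files before publishing:", ", ".join(missing))
else:
    print("All required files are present.")
```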
Don’t just taper off at the end of the article or finish with the final point in the main body. Give the reader a quick summary of what they have learned, explain how the insights gained impact the business and how your team members should apply this knowledge in their own work.
Include a call to action for fellow creators or other members of the data team, such as a recommendation for extending the analysis.
Before you publish your work, return to the basic questions you wanted to answer at the beginning. Have you answered all of them? Have you done your best to make it as easy as possible for your readers to understand your work?
Be sure to add an appropriate title, description, tags, and preview image in the settings tab of your post or in the kyso.yaml file in the GitHub repository. This is important for organising the team's work on Kyso. For open-source writers, it also helps when sharing and promoting your work around the web: presentation is key.