Build & Deploy Data Science Projects at Lightspeed

Holly Emblem
Towards Data Science
3 min read · Sep 3, 2018


“starry night” by Quentin Kemmel on Unsplash

(or thereabouts).

Using Repl.it’s Cloud Environment to Analyse and Share Meteorite Landings Data in Minutes

One of the biggest challenges in data science development today is quickly sharing, developing and deploying code.

Unless you’re working within a business where data science & analytics is deeply embedded within the company culture and ecosystem (and GitHub), chances are you’ve experienced the fun of email chains containing lines of SQL, endless .txt snippets on Slack and enough similarly named R environments to cause a real headache.

It can be challenging to share code in these environments with fellow data scientists, analysts and engineers. In some organisations, data teams still sit too far apart from traditional engineering departments, so it becomes a case of begging, borrowing and stealing code patterns and best practice.

After reading William Koehrsen’s shout-out for repl.it, I was keen to give the new platform a try. repl.it allows you to quickly build, test and share code, all from a cloud environment. The good news for data folks is that repl.it supports a multitude of languages, including R and Python.

“silhouette photo of mountain during night time” by Vincentiu Solomon on Unsplash

I’ve been working with NASA’s meteorite landings data as sample data for a project on the analysis of groups and on when to use parametric vs nonparametric methods. This felt like a perfect opportunity to see how repl.it’s ability to import external data sets and provide quick analysis holds up.

While in this instance I have access to the population (known meteorites that have landed on Earth), for the purposes of the project I wanted to take a sample of this data to understand whether the mass of Fell vs Found meteorites differs, and to analyse whether the sample comes from a normally distributed population.

Below, you can see an embed from repl.it, which imports NASA’s dataset, takes a sample, runs a Shapiro-Wilk test for normality and draws a boxplot to explore the distribution of the mass data, split by Fell vs Found meteorites:
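For anyone reading without the embed to hand, the sampling and boxplot steps can be roughly sketched in plain Python (repl.it supports both R and Python). This is only an illustration, not the code from the embed: the rows in DEMO_CSV are made-up placeholders rather than real NASA records, the column names "fall" and "mass (g)" are assumed to match the dataset's CSV export, and the helper names are hypothetical.

```python
import csv
import io
import random
import statistics

# Made-up placeholder rows, NOT real NASA records. The column names
# "fall" and "mass (g)" are an assumption about the dataset's CSV export.
DEMO_CSV = """name,fall,mass (g)
demo-a,Fell,21
demo-b,Fell,720
demo-c,Fell,4239
demo-d,Fell,1914
demo-e,Found,167
demo-f,Found,50
demo-g,Found,3.3
demo-h,Found,
demo-i,Found,1100
demo-j,Fell,780
"""


def load_masses(csv_text):
    """Return (fall, mass) pairs, skipping rows with no recorded mass."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(r["fall"], float(r["mass (g)"])) for r in rows if r["mass (g)"]]


def take_sample(records, n, seed=42):
    """Draw a simple random sample without replacement (seeded so it repeats)."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


def boxplot_summary(values):
    """Five-number summary: the figures a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def summarise_by_fall(records):
    """Group masses by Fell vs Found and summarise each group."""
    groups = {}
    for fall, mass in records:
        groups.setdefault(fall, []).append(mass)
    # quantiles() needs at least two values, so tiny groups are skipped.
    return {fall: boxplot_summary(masses)
            for fall, masses in groups.items() if len(masses) >= 2}


if __name__ == "__main__":
    sample = take_sample(load_masses(DEMO_CSV), n=8)
    for fall, summary in summarise_by_fall(sample).items():
        print(fall, summary)
```

The sketch sticks to the standard library, so it leaves out the normality test itself. If SciPy is available, scipy.stats.shapiro could be run on each group's masses to supply the Shapiro-Wilk step.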

As you can see, the repl.it environment handles importing external data and running visualisations with ease, even in Medium. I did find that repl.it doesn’t yet seem to handle install.packages() and library(), so analysis using the likes of ggplot is currently off limits. However, for base functionality, repl.it offers an easy and fast way to share R code with your team and the wider data community. And it’s free!

Useful Resources

repl.it

https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh


Head of Insights at Rare, an Xbox Game Studio. Previous experience as a data scientist and lead. Interested in deep learning, quantum computing and statistics.