Simone Says: What is a data scientist and how could one help my business?
As of mid-2017, Airbnb is valued at $31 billion. It’s an impressive sum and a testament to Airbnb’s acumen – but even the property giant has trouble hiring quality data scientists. The tech industry at large is in the middle of a well-documented skills gap, with data science and cybersecurity among the most significant shortfalls.
Airbnb’s approach is novel: it has created its own Data University to address its data skill requirements, with 630 employees on record as participating in the programme so far.
“Data is essential to us at Airbnb,” said Tim Rathschmidt of Airbnb. “Through Data University, we’re creating a network of recognised data experts who will become the front line for answering data-related questions on their teams.”
A day in the life of a data scientist
According to Simone Pampuri, Statwolf’s CEO, “data scientist” isn’t a one-size-fits-all descriptor; it describes someone with three main technical skills:
- Machine learning: They must have theoretical and practical knowledge to use, interpret, and improve on given data.
- Database and data frameworks: They must know how to manage structured and unstructured data, and how data processing frameworks work.
- Coding: They should be proficient in a coding language.
While there are many professionals with some combination of the above, Simone thinks that there is a lack of professionals who are truly proficient in all three. Of course, certain additional personality traits are integral too.
“Above all else,” Simone says, “they should be curious. Why does data look a certain way? Why is some data missing? Asking these questions (and finding the answers) is the difference between a person who knows how to use certain software and someone who is a data scientist.”
For data scientists, there is no real “typical” day – rather, it depends on how far a project has progressed. In the early stages of a project, a data scientist will spend their time structuring and cleaning datasets and carrying out any coding work that’s necessary. The cleaner the data, the better the potential for success in the project.
While the early stages have a sense of rigour about them, the later stages of a project can vary widely depending on the brief. Generally, data scientists will spend much of their time working on the machine learning aspect of the project – which is why theoretical and practical knowledge is so important.
In brief, a typical data science project can be broken into five ordered steps with regimented, top-line tasks:
1. Data collection
Data collection typically involves:
- Collecting and collating data from various sources.
- Readability assessment: conversion and parsing.
- Usability assessment: aggregation and alignment.
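As a rough illustration of the collection tasks above, here is a minimal Python sketch. The two sources, their field names and the temperature readings are all invented for the example; it simply shows what collating, parsing and aligning can look like, and is not a description of any particular toolchain.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Two hypothetical sources describing the same sensor in different formats.
CSV_SOURCE = """timestamp,temperature
2017-06-01 10:05,20.0
2017-06-01 10:35,22.0
2017-06-01 11:10,25.0
"""
JSON_SOURCE = (
    '[{"ts": "2017-06-01T10:20:00", "temp_c": 21.0},'
    ' {"ts": "2017-06-01T11:40:00", "temp_c": 27.0}]'
)


def parse_csv(text):
    """Readability: convert CSV rows into (datetime, float) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M"),
         float(row["temperature"]))
        for row in reader
    ]


def parse_json(text):
    """Readability: convert JSON records into the same (datetime, float) shape."""
    return [
        (datetime.fromisoformat(record["ts"]), float(record["temp_c"]))
        for record in json.loads(text)
    ]


def hourly_means(readings):
    """Usability: align readings to the hour and aggregate each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Collate both sources into one list, then align and aggregate.
collated = parse_csv(CSV_SOURCE) + parse_json(JSON_SOURCE)
hourly = hourly_means(collated)
```

Once every source has been converted to the same shape, the rest of the project can treat the data as a single dataset.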
2. Data cleaning
Again, the cleaner the data, the better the chances of a successful project, so this step is vital. It involves:
- Preparing data for analysis.
- Quality improvement: reconciliation and missing data handling.
- Correctness check: denoising and outlier detection.
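To make the cleaning tasks a little more concrete, the sketch below fills missing readings and flags likely outliers. The readings are invented, and the two techniques chosen here (median imputation and a median-absolute-deviation test with a cut-off of 3.5) are just one common option picked for illustration; the right approach always depends on the data.

```python
from statistics import median

# Hypothetical sensor readings; None marks a missing value.
raw = [20.1, 19.8, None, 20.4, 95.0, 20.0, None, 19.9]


def fill_missing(values):
    """Missing data handling: replace gaps with the median of known values."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def flag_outliers(values, threshold=3.5):
    """Outlier detection using a modified z-score based on the
    median absolute deviation (MAD), which a few extreme values
    cannot easily distort."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # No spread at all, so nothing can be called an outlier.
        return [False] * len(values)
    return [abs(0.6745 * (v - centre) / mad) > threshold for v in values]


cleaned = fill_missing(raw)
flags = flag_outliers(cleaned)
suspect = [v for v, is_outlier in zip(cleaned, flags) if is_outlier]
```

Flagged values are kept in a separate list rather than deleted, because whether an extreme reading is an error or a genuine event is a question for someone who knows the data.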
3. Data modelling
Data modelling takes the fundamental goals of the project and translates them within the context of data science. This stage involves:
- Transforming business goals into mathematical language.
- Ensuring that your machines’ computational power is adequate for your needs, then extracting features from your dataset before processing it with the algorithm. Datasets with a lot of noise (i.e. imprecise data) can lead to less precise results, while complex calculations require high computational power – so both play a part in the data modelling process.
- Multiple approaches: building multiple models and using various algorithms in order to make datasets leaner and “easier to learn” for the algorithm.
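As a small, hedged example of the modelling stage, suppose the business goal is "know how long a machine has before it fails". In mathematical language that could become "minimise the mean absolute error of a prediction of hours until failure". The data, the single feature (a vibration level) and both candidate models below are assumptions made up for this sketch.

```python
from statistics import mean

# Hypothetical training data: (vibration level, hours until failure).
train = [(1.0, 91.0), (2.0, 79.0), (3.0, 71.0), (4.0, 59.0)]


def mean_absolute_error(model, data):
    """The business goal expressed as a number to minimise."""
    return mean(abs(model(x) - y) for x, y in data)


def fit_baseline(data):
    """Candidate 1: always predict the average outcome."""
    average = mean(y for _, y in data)
    return lambda x: average


def fit_line(data):
    """Candidate 2: a straight line fitted by ordinary least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in data)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Build several models so they can be compared on the same yardstick.
models = {"baseline": fit_baseline(train), "line": fit_line(train)}
errors = {name: mean_absolute_error(m, train) for name, m in models.items()}
```

The errors here are measured on the training data only, so they show how well each model fits, not how well it will predict on data it has not seen.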
4. Testing
The testing phase is integral to the success of the project and includes:
- Evaluation and iteration.
- Comparison step: anticipating the best models and algorithms.
- Blind evaluation: analysing performance on new data.
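The comparison and blind-evaluation steps can be sketched as follows. Everything here is illustrative: the data is synthetic, the candidates are simple nearest-neighbour predictors with different neighbourhood sizes, and the split sizes are arbitrary. The point is only that candidates are compared on one slice of data and the chosen one is then scored once on data it has not seen.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic, hypothetical data: y falls roughly linearly with x, plus noise.
data = [(i / 10, 100 - i + random.gauss(0, 2)) for i in range(10, 50)]
random.shuffle(data)

# Three slices: one to learn from, one to compare on, one kept blind.
train, validation, blind = data[:24], data[24:32], data[32:]


def knn_predict(k, x):
    """Predict y as the average of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def mean_absolute_error(k, rows):
    return mean(abs(knn_predict(k, x) - y) for x, y in rows)


# Comparison step: score every candidate on the validation slice.
validation_scores = {k: mean_absolute_error(k, validation) for k in (1, 3, 7)}
best_k = min(validation_scores, key=validation_scores.get)

# Blind evaluation: score only the chosen candidate on unseen data.
blind_mae = mean_absolute_error(best_k, blind)
```

Keeping the blind slice out of the comparison matters, because a model picked for doing well on the validation data will tend to look better there than it really is.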
5. Deployment
The final stage essentially makes the project ‘real’ by applying it in actuality. Steps include:
- Real world testing.
- Implementation: building modules for online testing.
- Next phase design: understanding architecture requirements and use-cases.
The anatomy of a successful data science project
While Airbnb may have the budget to build a data programme/university to upskill its team, partnering with a reputable third party is more efficient and more likely to lead to a successful data science project.
Data science relies on complex skills, and a team needs time and expertise to develop them. Building a data science team from scratch can be expensive and is often out of reach for small or medium-sized organisations.
Partnering with a third party gives the organisation the opportunity to learn and grow as it deals with and comes to understand the challenges, constraints, and successes of a project. Once the processes are in place, the organisation can then take its hard-won know-how and apply it to slowly building out a data team of its own.
If you do go down the third-party route, Simone advises acknowledging that a data science project isn’t an IT project.
“In IT, the process is pretty straightforward,” he says. “You have a list of technical requirements; you work to meet them all; and then you test your solution until it works. If you work on it, you’ll get the desired result.
“In data science, however, it all depends on data: that is to say that you can’t guarantee success until you start exploring your data. You might have the data you need; you might not. However, exploring your data is only ever beneficial to companies: data always reveal precious information. Even if the solution you had in mind isn’t feasible (yet), you will likely find new insights that you can use to better your company and grow your bottom line.”
Lastly, Simone stresses the importance of goal setting and expectation management. The goals set by management need to align with the capabilities of the data. Any data science project should be a collaboration – a growing partnership centred on setting ambitious but realistic goals.
Ultimately, the project’s success lies in working together to meet the goals you set.
Want to use data science to get ahead in 2018?
Whatever your data science goals are, Statwolf's expertise can help you find the best solution for you and your business.
Our team of data scientists has worked on predictive maintenance programmes around the world, so we can help you with any queries you might have.
Want to make sense of your data? Download our comprehensive guide: The Predictive Maintenance Cookbook.