Is data science even real?

Photo by Samuel Zeller on Unsplash

I got one of those comments on another post the other day. It came from a skeptic, questioning my use of the term 'data science' and citing failures of the big data era (like Nate Silver and FiveThirtyEight's "failure" to predict Trump's win in 2016). Like many, this commenter asked whether it's all just hype, and 'aren't we all just programmers anyway?'

It got me thinking.  Those of us 'in the biz' have been through many years of conferences, articles and discussion to define and refine our concepts of data science and data scientists. We talk about Type A and Type B data scientists, about data science versus data engineering versus data analysis.  But we can forget that the broader world is often not so immersed in the conversations. 

And often the broader world are the people who need to employ data scientists (data scientists who then get left sitting in dark corners chasing vague corporate objectives).  Or they are the people trying to figure out how to change their enterprise reporting system into one of those fancy-pants data science/AI powered insight machines that the glossy brochures would have you think are everywhere.

So here are my thoughts on the matter.


I don’t know that I concur with the idea that the big data era has been the cause of many embarrassing failures, but case studies like Nate Silver’s election prediction ‘failure’ actually highlight why I use the term ‘science’. The scientific method is predicated on a process of

observe -> wonder / tinker -> hypothesise -> test -> refine -> repeat

It may be easy to jump on a ‘failure’, but that exact moment when we see a weakness in the current model is an exciting opportunity for scientific growth, refinement and better descriptive and predictive models. When Einstein and his peers uncovered limitations in Newton’s theory of gravity, did they sit around and talk about how stupid Newton was and isn’t it great he finally got his comeuppance?

No.

They expanded, tinkered, hypothesised, tested, and moved our knowledge forward. That’s what scientists do, and what the economists (not the media pundits) did off the back of those unexpected election results.

In my experience, the term data science (as it’s actually used day-to-day, not the glossy brochure version) is justified because it captures a new multidisciplinary discipline covering maths, programming, database and data pipeline work, and business skills.

A programmer with a straight-up computer science degree (or back-end engineering experience) will struggle with the mathematical and statistical depth needed. I (a computational mathematician by training) needed to expand my skills in software engineering, data management and communications to play in the data science space.

Piecing together a unicorn

Photo by rawpixel on Unsplash

An ideal data science team might consist of statisticians (with better-than-average programming), programmers and back-end developers (with better-than-average maths), business analysts (with better-than-average technical skills) and DevOps support people making it all hang together.

And I think ‘science’ is the right term.

A traditional, waterfall, enterprise business intelligence architecture was reasonably rigid, often restricting users to predefined reports and data views. Could you call it sys admin? Yes. Enterprise reporting? Yes. Programming? Sometimes. Maths/stats? Maybe later and offline. A science? Not really.

In data science, we ask our teams to answer ad hoc queries daily (observe), to provide descriptions (tinker), to predict (hypothesise) and to refine after we have tested the ideas.

And I'd argue that data science is best delivered by a team. I know quite a few folk with the right to the label 'data scientist', but all of us have a particular preference for some subset of the discipline. That's why when you hire Symbolix, you hire the team. You will get a main contact for keeping the project on track and making sure you are part of the team too, but if we need to pull in more fire-power in a particular area, we will.

Want to learn more about how we do data science? Drop us a line and let us know what problem you are currently grappling with. We'd be happy to chat.