What do Data Scientists do?
Updated: Jun 20, 2018
SearchTeam asked Data Science Expert Dan Hnyk what Data Scientists actually do and how to enter the data science field.
What do you do as a Data Scientist?
clean the data and prepare it for analysis
analyze data sets of various sizes from a data quality assurance perspective (missing values, errors, statistics)
visualize the data (charts of distributions, tables, relations...)
model the data
transform the data
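A minimal sketch of the data quality step described above (missing values, errors, statistics). It uses only the Python standard library so that it is self-contained; the sample data, the column names and the `quality_report` helper are made up for illustration and are not from the interview.

```python
import csv
import io
import statistics

# Hypothetical sample data; column names and values are invented for this sketch.
RAW_CSV = """user_id,age,country
1,34,CZ
2,,DE
3,29,
4,abc,CZ
5,41,US
"""


def quality_report(rows, numeric_column):
    """Count missing and unparseable cells in one column and summarize the valid ones."""
    missing = 0
    errors = 0
    values = []
    for row in rows:
        cell = (row.get(numeric_column) or "").strip()
        if cell == "":
            missing += 1  # empty cell: a missing value
            continue
        try:
            values.append(float(cell))
        except ValueError:
            errors += 1  # present but not a number: a data error
    return {
        "rows": len(rows),
        "missing": missing,
        "errors": errors,
        "mean": statistics.mean(values) if values else None,
        "median": statistics.median(values) if values else None,
    }


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
report = quality_report(rows, "age")
print(report)
```

The report keeps missing values and errors as separate counts, since they usually call for different fixes at the data source.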
How can somebody become a Data Scientist?
have strong domain knowledge of the problem (or work with someone who has it)
knowing maths and statistics helps
What are the traits of an excellent Data Scientist?
like every engineer: focuses on the product and knows what the goal is
his work is 100% reproducible and transparent
considers data sacred - never applies some weird transformation without thinking through possible side effects
he is never dishonest about the analysis, never hides anything
never trusts only his intuition, always validates it against reality
is able to handle whatever input source of data he encounters
Do I have to know some programming language to become a Data Scientist?
In what kinds of companies are there Data Scientist positions?
tech companies, not surprisingly... Any company that collects data and is big enough to hire a dedicated data scientist (more than 40 people, in my humble guess). The data come either from non-data products (user behavior, acquisition, various performance data, pricing models...) or from data products such as data research (e.g. surveys), evaluating the performance of some processes, web analytics, ...
What is the best way to get an entry-level data science job?
knowing some open source stack for the job (so the company doesn't have to invest in you)
having hands-on experience with real data, e.g. through a Kaggle competition
if you can program (knowing Python is an advantage), it's usually much easier, since you can also act on the analysis
What are the main challenges of a Data Science job?
having the data - real-world data sets are small, messy, full of empty values and errors, and lack proper description or documentation. It's often more important to propose a better data acquisition mechanism and provide a simple analysis of better data than to run a robust analysis on garbage data (there is a saying, "Garbage in, garbage out", and it's true). It's often not necessary to collect everything; a good subset is enough, so concentrate on that
predicting how long the work will take - it's just a dark art in DS. But all of us must meet deadlines.
What do you like most about the Data Science job?
its variety (it's not routine) - no data set is the same, there are always surprises, you never know what's going to come
infinite number of approaches to solve a given problem
rapidly developing tech stack
What tools do Data Scientists use?
a PC/laptop with sufficiently high memory is all you need from a hardware perspective (depending on the size of the data set)
programming stack of his choice
various input sources of the data/storage (e.g. SQL, CSV, HDF, Hadoop...)
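As a rough illustration of handling several input sources, the sketch below reads the same kind of records from CSV text and from an SQL database (an in-memory SQLite database here) and brings both into one common shape. The table, the columns and the helper names are hypothetical and chosen only for this example.

```python
import csv
import io
import sqlite3


def rows_from_csv(text):
    """Read CSV text into a list of dicts (all values arrive as strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_sql(connection, query):
    """Run a query and return the result as a list of dicts keyed by column name."""
    connection.row_factory = sqlite3.Row
    cursor = connection.execute(query)
    return [dict(row) for row in cursor.fetchall()]


def normalize(rows):
    """Bring rows from any source into one shape with consistent types."""
    return [{"name": r["name"], "price": float(r["price"])} for r in rows]


# Hypothetical data: the same two records stored in two different sources.
CSV_TEXT = "name,price\nbasic,9.5\npro,19.0\n"
csv_rows = rows_from_csv(CSV_TEXT)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (name TEXT, price REAL)")
conn.executemany("INSERT INTO plans VALUES (?, ?)", [("basic", 9.5), ("pro", 19.0)])
sql_rows = rows_from_sql(conn, "SELECT name, price FROM plans ORDER BY name")
conn.close()

print(normalize(csv_rows) == normalize(sql_rows))
```

Normalizing right after loading means the rest of the analysis does not need to know where the data came from.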
What, in your opinion, is the future of data science?
automation of data insights - data science as an automated service
Are there any good data science courses?
Are there any data science certifications?