You have recently become interested in data science, and you always enjoy spending time thinking about the best way to analyze your data. Your background isn’t computer science, though you’ve studied lots of statistics and may have even run a few experiments. You do a bit of programming and are decent at math. So you find yourself pondering what you can do after you finish your current project, and you would like to change fields for a number of different reasons.
You’re considering getting a data science job and want to harness the strengths developed during your graduate study. You want to take advantage of your intellectual curiosity. You want to maintain the rigor of experimental design you took part in. And you enjoy working with others and debating the nuanced interpretation of results. Lastly, you’ve found that you enjoy math and programming and feel that you could make the transition if you just knew how to get started.
What do data scientists get hired to do?
One important step in getting started in a new field is to understand what the role actually involves. Unfortunately (or fortunately), data science is so big right now that what matters differs drastically from job to job. Some data scientists mostly build data plumbing. Some data scientists mostly build data cleaning services. Some data scientists do academic-style research. Some data scientists do a mix of all of the above to varying degrees. Some roles want natural language processing (NLP) skills, others want MapReduce experience, others want Hadoop, while others want Pig, and some others still want Spark. You’ll hear about people using SPSS, others using Incanter, others using Python, others using R, others using Weka, and yet another group using Scala. It’s enough to demotivate you, as you don’t want to learn something totally irrelevant to your area of specialization or to the data science job you’d find interesting.
When in doubt, look at the data
This should come as no surprise to you, but when you’re not sure, the important thing is to look at the data. To better understand what data scientists get hired to do, we’re going to look at Indeed (a career website) and go through the first 3 pages of search results for the keyword "data scientist". This will cover 30 job postings. We will ignore the advertisements and focus on the regular listings. For each listing, we’ll open it and figure out what the data scientist would be doing once hired. Then we’ll put together a list of the tasks that appeared. Note that the results may differ by the time you read this, as this search is being done today (December 16, 2014).
Note: though we are only going to use Indeed, here is a list of recommended job websites you should take a look at...
The URL that we’ll use is http://www.indeed.com/jobs?q=%22data+scientist%22&l= and the search will be for "data scientist". Note that this search uses the United States as the country because my IP and cookies are all based in the United States.
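For readers who want to see how that URL is put together, here is a minimal sketch using Python's standard library. The function name build_search_url and the BASE_URL constant are made up for illustration; only the base address and the q and l parameters come from the URL above. It builds the address and does nothing else (it does not fetch or scrape any page).

```python
from urllib.parse import urlencode

# Base address taken from the URL quoted in the text.
BASE_URL = "http://www.indeed.com/jobs"


def build_search_url(keyword: str, location: str = "") -> str:
    """Build the search URL for a quoted keyword (hypothetical helper)."""
    # The double quotes are part of the query itself, as in the URL above;
    # urlencode turns them into %22 and the space into +.
    params = {"q": f'"{keyword}"', "l": location}
    return f"{BASE_URL}?{urlencode(params)}"


search_url = build_search_url("data scientist")
print(search_url)
# prints: http://www.indeed.com/jobs?q=%22data+scientist%22&l=
```

Leaving the location empty mirrors the empty l= parameter in the original search.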
Here are the responsibilities and accountabilities:
Able to advise senior management in clear language about the implications of their work for the organization
Acquire, clean, and structure data from multiple sources
Actively be involved in growing the group and streamlining the people and processes to increase efficiency and effectiveness.
Actively engage in hands on development of offerings, sales kits and other marketing materials.
Analyze, report, and present on findings and behavioral trends
Assist in the implementation of the results of modeling and analysis through partnership with Clinical Operations, IT and other organizational entities
Assists business with causal inferences & observations by finding patterns and relationships in data
Author white papers and technical blog posts
Become a local marketing subject matter expert and adapt to our fast-paced environment
Build a super-fast person search engine with billions of documents
Build and deploy data analysis systems for large data sets
Collaborate with business and technical teams to formulate the problem, recommend a solution approach and design a data architecture
Collect and manage large data sets to perform complex data analysis, communicate the results and their implications to the business stakeholders
Conceive of and develop tools to minimize risk of experimentation
Constantly develop professional knowledge and skills
Constantly explore new opportunities, niches and trends to identify and develop new value-add offerings
Consult internal groups in the use and integration of machine learning
Coordinate with the Information Systems team in gathering data and implementing solutions
Customer churn scoring
Customer conversion scoring and optimization
Define and manage collaboration initiatives with outside research partners from business and academia
Define and set up internal and external cloud computing environment including Hadoop clusters, parallel computing software, and applications and algorithms to process large amounts of data.
Deploy algorithms to enable specific business applications
Design and develop automated data analysis frameworks for systematic analysis for advanced data applications
Design and implementation of pre-processing and warehousing pipelines for biomedical data
Design experiments to answer targeted questions and communicate informed conclusions and recommendations
Design experiments to identify causal factors
Design, build and deploy a large scale Record Linkage system to find relationships among 7+ billion person records
Determine the best analysis methods leveraging statistical and analytical best practices
Develop algorithms for optimal device control
Develop analysis plans and implement appropriate modeling techniques to answer complex business questions
Develop and execute Actionable segmentation
Develop and execute Advanced Predictive Modeling
Develop and execute Demand Forecasting
Develop and execute Digital Intelligence
Develop and execute end to end analytics solutions to drive profitable customer growth
Develop and execute Marketing Performance
Develop and execute Pricing Optimization
Develop and execute ROI Enhancement
Develop and execute Single View of Customer
Develop and execute Web Analytics
Develop and monitor health-related outcomes and other metrics to support the monitoring and evaluation of current and newly emerging products
Develop and optimize real-time data-driven algorithms that optimize viewer quality of experience across devices, networks, and content
Develop metrics and prototypes that can be used to drive business decisions
Develop new algorithms and methods for optimizing revenue and key performance metrics.
Develop one off experiments for large company initiatives and design the statistical analysis of the results
Develop predictive and descriptive models using advanced procedures
Develop predictive and prescriptive statistical or behavioral models
Develop predictive models for important business- and people-centered outcomes
Develop product offerings through careful consideration of business value and data analyses
Develop roadmap for algorithmic bidding platform to optimize digital marketing investments
Develop software, algorithms and applications to apply mathematics to data, perform large scale experimentation and build data driven apps to translate data into intelligence, solve a variety of business problems and enable business strategy
Develop, test and validate algorithms
Development of data visualization and analysis tools
Drive change by closely collaborating with internal stakeholders in Data Science, Website Engineering and Category Management
Drive the collection of new data and the refinement of existing data sources
Enhance the data analytics team's understanding of machine learning techniques and algorithms through consulting, training, and seminars
Evaluate and optimize the people search engine
Experience in dealing with real world data in one or more of the following areas: machine learning, data science, probabilistic inference and/or computational statistics
Explore billions of records, research and develop predictive models and optimization algorithms for ad targeting and bidding on Ad Exchanges
Explore high-level, undefined ideas and business problems using unstructured, raw data
Extract insights and actionable recommendations from large volumes of data
Formulate business problems, translate them into data science projects and provide solution approaches
Grow our real-time internal data intelligence API
Grow our service provider base by identifying & improving recruiting & retention drivers
Help build and manage US-based and EU consulting practices
Help the business understand and evaluate data science use-cases appropriate for their businesses
Ideation, prototyping and creation of intellectual property
Identify high impact areas for novel proprietary algorithms
Identify methods that allow continuous and automated statistical testing to enhance the predictability of deployed models
Identify resources and courses to add to internal education program
Identify state of the art algorithms to perform core data science functions, including machine learning, optimization, and statistical analysis
Implement these models and algorithms, leveraging grid computing on Hadoop and Hive
Implementation of data-driven algorithms that enhance the performance of our system.
Improve internal processes and tools to increase efficiency and spur future product innovation
Improve service reliability & quality by identifying the underlying drivers of issues
Inspire the adoption of advanced analytics and data science across different teams and functions
Institute rigorous test-and-learn methodology to achieve desired results
Integrate algorithms within current enterprise analytics platforms to support business intelligence applications
Integrate research outcomes within internal capabilities
Interpret data and communicate complex findings to leaders in HR and across the business
Isolate the incremental financial impact of the business question under investigation
Leads scoring
Leverage data mining and machine learning approaches to model and predict end user behavior
Maintain an engaged network of scholars and practitioners to maximize learning and idea exchange
Maintain familiarity with current trends in health behavior research
Maintain transparency by partnering with others to document and communicate results of analyses as well as the processes used to develop and implement analyses and predictive analytics.
Management of interdisciplinary teams on individual projects
Manipulate and analyze complex, high-volume, high-dimensionality data from varying sources
Manipulate and integrate a variety of data sources in the data preparation phase
Marketing mix modeling and planning
Media attribution
Mine experiment data for issues and unidentified wins, then automate and develop tooling around that
Mine our vast customer data to form hypotheses, deploy test & drive metrics every day.
Need to be able to link and mash up distinctive data sets to discover new insights
Organize and participate in internal and external seminars
Organize educational seminars
Participate (and lead) pre-sales activities related to consulting opportunities
Participate in building current customer base awareness
Participate in building target market awareness ... by contributing to marketing initiatives including social and media presence – presenting on events, publishing materials in related magazines and web resources, blogging, etc
Participate in cutting edge statistical analysis and predictive analytics
Participation in manuscript preparation
Partner with premier digital marketing companies to understand and suggest new opportunities, and work to test those new opportunities in a quest for additional revenue and margin
Provide exemplary Analytics consulting services fulfillment
Provide internal consulting to answer key product questions and drive product decisions
Publish in top-tier journals, file patent applications, and develop relevant applications that support the business
Rapidly create, test and improve innovative bidding algorithm to drive revenue and ROI goals
Real-time online media optimization
Research, develop, and implement predictive algorithms for our real time experimentation system
Responsible for the categorization and optimization technologies that are the foundation components of our platform
Sales operation analytics
Scouting of novel technologies related to distributed architectures
Spur future product innovation
Summarize and present conclusions and solutions
Supervision of graduate students
Support relationships between development and client services so that all are appropriately aligned to meet team objectives
Support Senior Strategist in planning, conducting and synthesizing research
Technology and business model evaluation for automotive applicability
Train and develop the less experienced data scientists and analysts on the team
Train internal staff on use and maintenance of resources
Train, tune, and cross-validate a range of machine learning algorithms
Transform data into insights to identify & quantify business opportunities
Understanding of how a business and strategy works
Use and/or create software tools to gain insights into underlying data
Use SAS to build, implement, and regularly monitor the effectiveness of predictive models
Uses predictive modeling, statistics, Machine Learning, Data Mining, and other data analysis techniques to collect, explore, and extract insights from structured and unstructured data
Work cross-functionally to establish reporting, instrumentation, and metric standards
Work with a broad spectrum of decision makers to determine the goals and expected results to their business questions
Work with a team of researchers on both theoretical and practical projects that will use his/her scientific, mathematical and computational skills
Work with building energy scientists to analyze and extend the capabilities of company's physics-based energy simulation model
Work with complex data from various sources
Work with cross functional teams to deliver yearly financial goals by implementing, managing, and communicating monetization programs
Work with our Analysts and Marketing teams to understand client goals, and work with Engineering team to turn research into products
Work with team members to collect data for ad-hoc and statistical analysis
Work with team members to develop the appropriate analytical methods to apply to outcomes research and discovery
Write research papers for internal audiences
Data scientists do everything and must do it well!
As you can see, a data science job can cover things from "Authoring white papers and technical blog posts" to doing "Real-time online media optimization". In just 30 job postings (3 pages of Indeed results), we were able to see 136 different responsibilities listed. Some are very similar and others are very different. This is one of the fortunate or unfortunate things about the data science field at the moment: it is so big right now that what matters, and what you’d actually do, differs drastically from job to job.
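If you want a rough feel for how much the responsibilities overlap, one crude option is to tally them by their leading word. The sketch below is only an illustration: the sample is a hand-picked handful of items copied from the list above, and grouping by first word is an assumption made here, not how the list was compiled.

```python
import re
from collections import Counter

# A small hand-picked sample copied from the list above (not the full 136).
sample = [
    "Acquire, clean, and structure data from multiple sources",
    "Author white papers and technical blog posts",
    "Build and deploy data analysis systems for large data sets",
    "Design experiments to answer targeted questions and communicate "
    "informed conclusions and recommendations",
    "Develop, test and validate algorithms",
    "Develop metrics and prototypes that can be used to drive business decisions",
    "Develop and execute Demand Forecasting",
    "Organize educational seminars",
    "Train, tune, and cross-validate a range of machine learning algorithms",
    "Real-time online media optimization",
]


def leading_word(responsibility: str) -> str:
    """Return the first word (letters and hyphens only), lowercased."""
    match = re.match(r"[A-Za-z-]+", responsibility)
    return match.group(0).lower() if match else ""


# Count how many sampled responsibilities start with each word.
counts = Counter(leading_word(item) for item in sample)

for word, count in counts.most_common():
    print(f"{word}: {count}")
```

Even in this tiny sample, "develop" leads three items while everything else appears once, which matches the impression that some responsibilities cluster together and others stand alone.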
So, when looking for a data science job, realize that some data scientists mostly build data plumbing, some data scientists mostly build data cleaning services, some data scientists do academic-style research, and some data scientists do a mix of all of the above to varying degrees. This means that it’s well worth taking a few hours or days to go through the websites listed above and read through the job descriptions and responsibilities to see what sticks out and what you’d like to do. After all, the companies above are all hiring and are actively looking for people to take on those roles right now.
Get to it and good luck!