An Android programmer is an Android programmer. The role is the skill set. But what skill sets are useful for a Data Scientist or a Data Analyst? Does this make staffing and executing projects in the analytics space more difficult? Is data analysis a science, or is the science in exploring data to prepare it for further analysis?

Unfortunately, there is no industry-standard usage of the terms “Data Analyst” and “Data Scientist” that clearly distinguishes between the two roles. The devil is in the details: the roles tend to complement one another, but each can span a wide variety of skill sets and functional responsibilities. For instance, within the conceptual data mining lifecycle illustrated in Figure 1 below, a “Data Analyst” focuses on the movement and interpretation of data, typically with an emphasis on the past and present. A “Data Scientist,” by contrast, may be primarily responsible for summarizing data in a way that supports forecasting, that is, insight into the future based on patterns identified in past and current data.

Figure 1: Roles within the Data Lifecycle

Summarizing the Cross-Industry Standard Process for Data Mining (CRISP-DM) process model helps to further distinguish between the Data Analyst and Data Scientist roles.

Figure 2: CRISP-DM Process Model (2003) [1]

The CRISP-DM process model, as depicted above, is composed of the following often iterative steps [2]:

  1. Business understanding – Determine Business Objectives, Assess Situation, Determine Data Mining Goals, Produce Project Plan
  2. Data understanding – Collect Initial Data, Describe Data, Explore Data, Verify Data Quality
  3. Data preparation – Select Data, Clean Data, Construct Data, Integrate Data
  4. Modeling – Select Modeling Technique, Generate Test Design, Build Model, Assess Model
  5. Evaluation – Evaluate Results, Review Process, Determine Next Steps
  6. Deployment – Plan Deployment, Plan Monitoring and Maintenance, Produce Final Report, Review Project

While both Data Analysts and Data Scientists may participate in many of the same steps of the CRISP-DM process model, a few steps are specific to one role or the other. For instance, a Data Scientist is often heavily involved in cleaning and manipulating data to support modeling, as well as in building and evaluating the models that are intended to guide business decisions. A Data Analyst, on the other hand, may spend their time exploring data to support troubleshooting efforts or to generate ideas for useful reports to pitch to the customer. In general, Data Analysts tend to be more business focused, while Data Scientists are often more mathematically focused. A sample of the difference in perspective across the two roles is shown below.

Figure 3: Data Analyst vs. Data Scientist Perspectives

Data Analysts and Data Scientists are further differentiated by the types of roles they perform. Data Analysts typically perform data migration and visualization roles that focus on describing the past, while Data Scientists typically perform roles that involve manipulating data and creating models to improve the future. Figure 4 lists common roles that Data Analysts and Data Scientists typically perform.

Figure 4: Data Analyst vs. Data Scientist Roles

Let's take a look at a simple business case that further differentiates Data Analysts from Data Scientists.

Situation:

A large provider of streaming entertainment and data services wants to improve call center performance and extract tactical business value from call center data.

Project Domain:

Logged performance data from the firm’s proprietary hardware platform, and call center data tied to specific customers and device IDs.

In the above case, the Data Analyst and the Data Scientist would both use the data, but in different ways. The Data Analyst would be concerned with reporting metrics, such as average call time, while the Data Scientist would be concerned with using the historical data to predict the future, such as call volumes for upcoming months. Both roles are equally important to the operation of the call center, and both help find solutions that keep the center running smoothly. The figure below details some solutions each role creates.

Figure 5: Solutions by Role - Data Analyst vs. Data Scientist
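
To make the call center case a little more concrete, here is a minimal Python sketch of the two perspectives. Everything in it is an assumption made for illustration: the `calls` records are invented sample data, the function names are hypothetical, and the straight-line trend is only one simple way a Data Scientist might approach a forecast, not a recommended model. The first function reports a descriptive metric (average call time); the last one extrapolates monthly call volume one month ahead.

```python
from statistics import mean

# Hypothetical call log: (month, call duration in minutes). Invented sample data.
calls = [
    ("2023-01", 6.0), ("2023-01", 4.0),
    ("2023-02", 5.0), ("2023-02", 7.0), ("2023-02", 3.0),
    ("2023-03", 4.0), ("2023-03", 6.0), ("2023-03", 5.0), ("2023-03", 5.0),
]


def average_call_time(records):
    """Data Analyst view: describe the past with a summary metric."""
    return mean(duration for _, duration in records)


def monthly_volumes(records):
    """Count calls per month and return the counts in chronological order."""
    counts = {}
    for month, _ in records:
        counts[month] = counts.get(month, 0) + 1
    return [counts[month] for month in sorted(counts)]


def forecast_next_volume(volumes):
    """Data Scientist view: fit a least-squares line to past monthly
    volumes and extrapolate it one month ahead."""
    n = len(volumes)
    if n < 2:
        raise ValueError("need at least two months of history to fit a trend")
    xs = range(n)  # month index: 0, 1, 2, ...
    x_bar, y_bar = mean(xs), mean(volumes)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, volumes))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # predicted volume for the next month


if __name__ == "__main__":
    print("Average call time (min):", average_call_time(calls))
    print("Calls per month:", monthly_volumes(calls))
    print("Forecast for next month:", forecast_next_volume(monthly_volumes(calls)))
```

The split mirrors the roles described above: the analyst-style function only summarizes what already happened, while the scientist-style function uses the same history to estimate something that has not happened yet. In practice a forecast would need far more history and some account of seasonality, so treat the trend line as a placeholder.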

When deciding whether to use a Data Analyst or a Data Scientist for a project in the analytics space, it is important to fully understand the goals of the project and the needs of the customer. Does the data need to be organized and analyzed to identify patterns of the past? Perhaps a Data Analyst is your best bet. Or does the business need to use patterns of the past to make its future decisions more robust? If so, a Data Scientist is the way to go. Maybe the goal of the project is to generate visualizations of data, in which case either role would suffice.

All in all, the goal of the project typically defines the scope of the role of a Data Analyst or Data Scientist.

While the two roles can be interchangeable, the devil is in the details, and it is important to understand the knowledge base of both to ensure project success and proper staffing.

Sources:

[1] Chapman, Pete, et al. "CRISP-DM 1.0 Step-by-step data mining guide." (2000).

[2] http://www.sv-europe.com/crisp-dm-methodology/