A reminder of what the Ask stage means, as defined on the Data Analysis Roadmap page:
Ask
Guiding questions
- What topic are you exploring?
- What is the problem you are trying to solve?
- What metrics will you use to measure your data to achieve your objective?
- Who are the stakeholders?
- Who is your audience for this analysis and how does this affect your analysis process and presentation?
- How will this data help your stakeholders make decisions?
Key tasks
It’s important to understand the problem and any questions about your case study early on so that you’re focused on your stakeholders’ needs.
- Choose a case study
- Identify the problem
- Determine key stakeholders
- Explore the data and establish metrics
Define the problem
So let’s dive deeper into the Ask stage. It’s impossible to solve a problem if you don’t know what it is. Here are some things to consider:
- Define the problem you’re trying to solve
- Make sure you fully understand the stakeholder’s expectations
- Focus on the actual problem and avoid any distractions
- Collaborate with stakeholders and keep an open line of communication
- Take a step back and see the whole situation in context
Questions to ask yourself in this step:
- What are my stakeholders saying their problems are?
- Now that I’ve identified the issues, how can I help the stakeholders resolve their questions?
First up, the analysts needed to define what the project would look like and what would qualify as a successful result. So, to determine these things, they asked effective questions and collaborated with leaders and managers who were interested in the outcome of their people analysis. These were the kinds of questions they asked:
- What do you think new employees need to learn to be successful in their first year on the job?
- Have you gathered data from new employees before? If so, may we have access to the historical data?
- Do you believe managers with higher retention rates offer new employees something extra or unique?
- What do you suspect is a leading cause of dissatisfaction among new employees?
- By what percentage would you like employee retention to increase in the next fiscal year?
At the start of any successful data analysis, the data analyst:
- Takes the time to fully understand stakeholder expectations
- Defines the problem to be solved
- Decides which questions to answer in order to solve the problem
Qualifying stakeholder expectations means determining who the stakeholders are, what they want, when they want it, why they want it, and how best to communicate with them. Defining the problem means looking at the current state and identifying the ways in which it’s different from the ideal state. With expectations qualified and the problem defined, you can derive questions that will help achieve these goals.
Since data analysts (DAs) always seem to be solving problems, what kinds of problems do they solve?
Problem types
Data analytics is so much more than just plugging information into a platform to find insights. It is about solving problems. To get to the root of these problems and find practical solutions, there are lots of opportunities for creative thinking. No matter the problem, the first and most important step is understanding it. From there, it is good to take a problem-solver approach to your analysis to help you decide what information needs to be included, how you can transform the data, and how the data will be used.
Data analysts typically work with six problem types:
Making predictions
This problem type involves using data to make an informed decision about how things may be in the future.
A company that wants to know the best advertising method to bring in new customers is an example of a problem requiring analysts to make predictions. Analysts with data on location, type of media, and number of new customers acquired as a result of past ads can’t guarantee future results, but they can help predict the best placement of advertising to reach the target audience.
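As a minimal sketch of the prediction idea, assuming hypothetical historical records of (location, media type, new customers), the Python below averages past results per media type and ranks the types. This is a naive historical-average estimate rather than a forecasting model, and the record layout and numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical historical ad records: (location, media_type, new_customers).
# The layout and values are invented for illustration only.
past_ads = [
    ("Downtown", "radio", 40),
    ("Downtown", "social", 95),
    ("Suburbs", "radio", 55),
    ("Suburbs", "social", 70),
    ("Suburbs", "billboard", 30),
]


def average_customers_by_media(ads):
    """Return the mean number of new customers per ad for each media type."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _location, media, customers in ads:
        totals[media] += customers
        counts[media] += 1
    return {media: totals[media] / counts[media] for media in totals}


def rank_media(ads):
    """Rank media types from highest to lowest historical average.

    Past averages can suggest, but never guarantee, future results.
    """
    averages = average_customers_by_media(ads)
    return sorted(averages, key=averages.get, reverse=True)


print(rank_media(past_ads))
```

A real analysis would also account for location, cost, and timing, but the ranking shows how past data can point toward a likely best placement.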
Categorizing things
This means assigning information to different groups or clusters based on common features.
An example of a problem requiring analysts to categorize things is a company’s goal to improve customer satisfaction. Analysts might classify customer service calls based on certain keywords or scores. This could help identify top-performing customer service representatives or help correlate certain actions taken with higher customer satisfaction scores.
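A keyword-based categorizer for customer service calls could look something like the sketch below. The keyword lists, representatives, transcripts, and scores are all hypothetical, and plain substring matching is a deliberate simplification (for example, "charge" would also match "discharge").

```python
from collections import defaultdict

# Hypothetical keyword rules; a real project would agree on these with stakeholders.
CATEGORY_KEYWORDS = {
    "billing": ["invoice", "charge", "refund"],
    "technical": ["error", "crash", "login"],
    "shipping": ["delivery", "tracking", "late"],
}


def categorize_call(transcript):
    """Return the first category whose keywords appear in the transcript, else 'other'."""
    text = transcript.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):  # naive substring match
            return category
    return "other"


def average_score_by_category(calls):
    """Average the satisfaction score of the calls within each category."""
    scores = defaultdict(list)
    for _rep, transcript, score in calls:
        scores[categorize_call(transcript)].append(score)
    return {category: sum(s) / len(s) for category, s in scores.items()}


# Hypothetical calls: (representative, transcript, satisfaction score)
calls = [
    ("Ana", "I was charged twice on my invoice", 6),
    ("Ben", "The app shows an error at login", 9),
    ("Ana", "My delivery is late", 8),
    ("Ben", "Just wanted to say thanks", 10),
]
print(average_score_by_category(calls))
```

Grouping the same calls by representative instead of by category would be the natural next step toward spotting top performers.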
Spotting the unusual
Data analysts identify data that is different from the norm.
A company that sells smart watches that help people monitor their health would be interested in designing their software to spot something unusual. Analysts who have analyzed aggregated health data can help product developers determine the right algorithms to spot and set off alarms when certain data doesn’t trend normally.
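One simple way to spot the unusual is a z-score rule: flag any reading that sits more than a chosen number of standard deviations from the mean. The sketch below assumes hypothetical heart-rate readings and a threshold of 2; real health devices likely use more sophisticated algorithms, so treat this only as an illustration of the idea.

```python
from statistics import mean, stdev


def flag_unusual(readings, threshold=2.0):
    """Return the indexes of readings more than `threshold` sample standard
    deviations away from the mean (a simple z-score rule)."""
    mu = mean(readings)
    sigma = stdev(readings)  # needs at least two readings
    if sigma == 0:
        return []  # every reading is identical, so nothing stands out
    return [i for i, value in enumerate(readings) if abs(value - mu) / sigma > threshold]


# Hypothetical resting heart-rate readings (beats per minute)
heart_rates = [72, 75, 71, 74, 73, 76, 72, 140, 74, 73]
print(flag_unusual(heart_rates))
```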
Identifying themes
Identifying themes takes categorization one step further by grouping categorized information into broader concepts and recognizing the trends across them.
User experience (UX) designers might rely on analysts to analyze user interaction data. Similar to problems that require analysts to categorize things, usability improvement projects might require analysts to identify themes to help prioritize the right product features for improvement. Themes are most often used to help researchers explore certain aspects of data. In a user study, user beliefs, practices, and needs are examples of themes.
By now you might be wondering if there is a difference between categorizing things and identifying themes. The best way to think about it is: categorizing things involves assigning items to categories; identifying themes takes those categories a step further by grouping them into broader themes.
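The two steps can be sketched in Python, assuming hypothetical user-study notes that have already been categorized and an illustrative mapping from categories to broader themes (practices and needs, borrowing the theme examples mentioned above).

```python
from collections import defaultdict

# Step 1 (categorizing things): hypothetical user-study notes, each already
# assigned to a category.
categorized_notes = [
    ("Checks the app every morning", "daily routine"),
    ("Wants reminders before meetings", "notifications"),
    ("Uses the app only on a laptop", "device choice"),
    ("Turns off all alerts at night", "notifications"),
]

# Step 2 (identifying themes): an illustrative mapping from categories to
# broader themes. In a real study, researchers would derive these together.
CATEGORY_TO_THEME = {
    "daily routine": "practices",
    "device choice": "practices",
    "notifications": "needs",
}


def group_into_themes(notes):
    """Collect the distinct categories that fall under each theme."""
    themes = defaultdict(set)
    for _note, category in notes:
        themes[CATEGORY_TO_THEME[category]].add(category)
    return {theme: sorted(categories) for theme, categories in themes.items()}


print(group_into_themes(categorized_notes))
```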
Discovering connections
This problem type enables data analysts to identify similar challenges faced by different entities, and then combine data and insights to find common solutions.
A third-party logistics company working with another company to get shipments delivered to customers on time is a problem requiring analysts to discover connections. By analyzing the wait times at shipping hubs, analysts can determine the appropriate schedule changes to increase the number of on-time deliveries.
Finding patterns
Data analysts find patterns by using historical data to understand what happened in the past and is therefore likely to happen again.
Minimizing downtime caused by machine failure is an example of a problem requiring analysts to find patterns in data. For example, by analyzing maintenance data, they might discover that most failures happen when regular maintenance is delayed by more than 15 days.
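A minimal sketch of that pattern check, assuming a hypothetical maintenance log of (machine, days delayed, failed) records: split the machines at the 15-day mark and compare failure rates on each side.

```python
# Hypothetical maintenance log: (machine_id, days maintenance was delayed, failed?)
maintenance_log = [
    ("M1", 5, False),
    ("M2", 20, True),
    ("M3", 12, False),
    ("M4", 18, True),
    ("M5", 30, False),
    ("M6", 3, True),
]

DELAY_WINDOW_DAYS = 15  # the 15-day window from the example above


def failure_rate(records):
    """Share of records that ended in a failure (0.0 for an empty group)."""
    if not records:
        return 0.0
    return sum(1 for _machine, _delay, failed in records if failed) / len(records)


def compare_by_delay(log, window=DELAY_WINDOW_DAYS):
    """Return (failure rate within the window, failure rate beyond it)."""
    on_time = [r for r in log if r[1] <= window]
    delayed = [r for r in log if r[1] > window]
    return failure_rate(on_time), failure_rate(delayed)


print(compare_by_delay(maintenance_log))
```

With a log this small, any difference is only a hint; a real pattern would need far more history before anyone changes the maintenance schedule.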
Smart questions
Companies in many industries today are dealing with rapid change and rising uncertainty. Even well-established businesses are under pressure to keep up with what’s new and figure out what’s next. To do that, they need to ask questions. Asking the right questions can help spark the innovative ideas that so many businesses are hungry for these days.
The same goes for data analytics. No matter how much information you have or how advanced your tools are, your data won’t tell you much if you don’t start with the right questions. Think of it like a detective with tons of evidence who doesn’t ask a key suspect about it. Coming up, you will learn more about how to ask highly effective questions, along with certain practices you want to avoid.
Highly effective questions are SMART questions:
- Specific: Is the question specific? Does it address the problem? Does it have context? Will it uncover a lot of the information you need?
- Measurable: Will the question give you answers that you can measure?
- Action-oriented: Will the answers provide information that helps you devise some type of plan?
- Relevant: Is the question about the particular problem you are trying to solve?
- Time-bound: Are the answers relevant to the specific time being studied?
Examples of SMART questions
Here’s an example that breaks down the thought process of turning a problem question into one or more SMART questions using the SMART method: What features do people look for when buying a new car?
Specific
Does the question focus on a particular car feature? Questions are simple, significant, and focused on a single topic or a few closely related ideas.
Measurable
Does the question include a feature rating system? Questions can be quantified and assessed.
Action-oriented
Does the question influence change, such as the creation of different or new feature packages? Questions encourage change.
Relevant
Does the question identify which features make or break a potential car purchase? Questions matter, are important, and have significance to the problem you’re trying to solve.
Time-bound
Does the question validate data on the most popular features from the last three years? Questions specify the time to be studied.
Questions should be open-ended. This is the best way to get responses that will help you accurately qualify or disqualify potential solutions to your specific problem. So, based on the thought process, possible SMART questions might be:
On a scale of 1-10 (with 10 being the most important) how important is your car having four-wheel drive? Explain.
What are the top five features you would like to see in a car package?
What features, if included with four-wheel drive, would make you more inclined to buy the car?
How does a car having four-wheel drive contribute to its value, in your opinion?
Things to avoid
Leading questions
Questions that steer the respondent toward a particular response
- Example: This product is too expensive, isn’t it?
This is a leading question because it suggests an answer as part of the question. A better question might be, “What is your opinion of this product?” There are tons of answers to that question, and they could include information about usability, features, accessories, color, reliability, and popularity, on top of price. Now, if your problem is actually focused on pricing, you could ask a question like “What price (or price range) would make you consider purchasing this product?” This question would provide a lot of different measurable responses.
Closed-ended questions
Questions that ask for a one-word or brief response only
- Example: Were you satisfied with the customer trial?
This is a closed-ended question because it doesn’t encourage people to expand on their answer. It is really easy for them to give one-word responses that aren’t very informative. A better question might be, “What did you learn about customer experience from the trial?” This encourages people to provide more detail besides “It went well.”
Vague questions
Questions that aren’t specific or don’t provide context
- Example: Does the tool work for you?
This question is too vague because there is no context. Is it about comparing the new tool to the one it replaces? You just don’t know. A better inquiry might be, “When it comes to data entry, is the new tool faster, slower, or about the same as the old tool? If faster, how much time is saved? If slower, how much time is lost?” These questions give context (data entry) and help frame responses that are measurable (time).
Decisions
A data analyst’s job is to provide the data necessary to inform key decisions. They also need to frame their analysis in a way that helps business leaders make the best possible decisions.
We’re going to explore the role of data in decision-making and the reasons why data analytics professionals are so important to this process. You’ll compare data-driven and data-inspired decisions to understand the difference between them.
Both data-driven and data-inspired approaches are rooted in the idea that data is inherently valuable for making a decision. Well-curated data can provide information to decision-makers that improves the quality of their decisions. Remember: Data does not make decisions, but it does improve them.
Data-driven
Data-driven decision-making means using facts to guide business strategy. The phrase “data-driven decisions” means exactly that: Data is used to arrive at a decision.
This approach is limited by the quantity and quality of readily available data. If the quality and quantity of the data are sufficient, this approach can greatly improve decision-making. But if the data is insufficient or biased, it can create problems for decision-makers. Potential dangers of relying entirely on data-driven decision-making include overreliance on historical data, a tendency to ignore qualitative insights, and biases in data collection and analysis.
A/B testing
A/B testing is a simple example of collecting data for data-driven decision-making. For example, a website that sells widgets has an idea for a new website layout they think will result in more people buying widgets. For two weeks, half of their website visitors are directed to the old site; the other half are directed to the new site. After those two weeks, the analyst gathers the data about their website visitors and the number of widgets sold for analysis. This helps the analyst understand which website layout resulted in more widget sales. If the new website performed better in producing widget sales, then the company can confidently make the decision to use the new layout!
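Before deciding that the new layout "performed better," the analyst needs to check that the difference is unlikely to be chance. A common choice is a two-proportion z-test. The sketch below assumes the outcome is whether each visitor bought at least one widget, and the visitor and buyer counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(buyers_a, visitors_a, buyers_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates.

    Uses the pooled normal approximation, which assumes reasonably large
    samples. Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = buyers_a / visitors_a
    rate_b = buyers_b / visitors_b
    pooled = (buyers_a + buyers_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under a standard normal
    return rate_a, rate_b, z, p_value


# Hypothetical two-week results: old layout (A) vs. new layout (B)
rate_a, rate_b, z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"old {rate_a:.1%}, new {rate_b:.1%}, z = {z:.2f}, p = {p:.4f}")
```

A small p-value (often below 0.05) suggests the difference between layouts is real rather than noise; statistics libraries offer equivalent tests if you prefer not to hand-roll one.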
Data-inspired
Data-inspired decisions include the same considerations as data-driven decisions while adding another layer of complexity. They create space for people using data to consider a broader range of ideas: drawing on comparisons to related concepts, giving weight to feelings and experiences, and considering other qualities that may be more difficult to measure. Data-inspired decision-making can avoid some of the pitfalls that data-driven decisions might be prone to.
Example
A customer support center gathers customer satisfaction data (often known as a “CSAT” score). They use a simple 1–10 score along with a qualitative description in which the customer describes their experience. The customer support center manager wants to improve customer experience, so they set a goal to improve the CSAT score. They start by analyzing the CSAT scores and reading each of the descriptions from the customers. Additionally, they interview the people working in the customer support center. From there, the manager formulates a strategy and decides what needs to improve the most in order to raise customer satisfaction scores. While the manager certainly relies on the CSAT data in the decision-making process, input of support center representatives and other qualitative information informs the approach as well.
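A small sketch of how the manager might pair the quantitative score with the qualitative descriptions, assuming hypothetical responses and a hypothetical cutoff of 5 for "low" scores: compute the average, then pull out the words behind the lowest scores for closer reading.

```python
from statistics import mean

# Hypothetical CSAT responses: (score on the 1-10 scale, customer's description)
responses = [
    (9, "Agent solved my issue quickly"),
    (3, "Waited 40 minutes on hold"),
    (7, "Helpful, but I had to repeat my account number"),
    (2, "Was transferred three times"),
]


def csat_summary(responses, low_cutoff=5):
    """Return the average score and the descriptions behind scores below
    `low_cutoff`, so the numbers can be read alongside the customers' words."""
    average = mean(score for score, _text in responses)
    low_comments = [text for score, text in responses if score < low_cutoff]
    return average, low_comments


average, low_comments = csat_summary(responses)
print(average)
for comment in low_comments:
    print("-", comment)
```

The code only surfaces the comments; interpreting them, and weighing the support representatives' input, is the data-inspired part of the decision.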
Data types
There are two types of data: qualitative and quantitative. Qualitative data can help analysts better understand their quantitative data by providing a reason or more thorough explanation. In other words, quantitative data generally gives you the what, and qualitative data generally gives you the why. By using both quantitative and qualitative data, you can learn when people like to go to the movies and why they chose the theater. Maybe they really like the reclining chairs, so your manager can purchase more recliners. Maybe the theater is the only one that serves root beer. Maybe a later show time gives them more time to drive to the theater from where popular restaurants are located. Maybe they go to matinees because they have kids and want to save money. You wouldn’t have discovered this information by analyzing only the quantitative data for attendance, profit, and showtimes.
Quantitative
Quantitative data is all about specific and objective measures of numerical facts. It often covers the what, how many, and how often of a problem. In other words, things you can measure.
Now, take a closer look at the data types and data collection tools. In this scenario, you are a data analyst for a chain of movie theaters. Your manager wants you to track trends in:
Movie attendance over time
Profitability of the concession stand
Evening audience preferences
Assume quantitative data already exists to monitor all three trends.
Movie attendance over time
Starting with the historical data the theater has through its loyalty and rewards program, your first step is to investigate what insights you can gain from that data. You look at attendance over the last 3 months. But, because the last 3 months didn’t include a major holiday, you decide it is better to look at a full year’s worth of data. As you suspected, the quantitative data confirmed that average attendance was 550 per month but then rose to an average of 1,600 per month for the months with holidays.
The historical data serves your needs for the project, but you also decide to resume the analysis in a few months, after the theater increases ticket prices for evening showtimes.
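The holiday comparison can be sketched as splitting months into holiday and non-holiday groups and averaging each. The month names and counts below are hypothetical, and which months count as holiday months is an assumption you would confirm with the theater.

```python
from statistics import mean


def average_attendance(monthly, holiday_months):
    """Return (average attendance in non-holiday months, average in holiday months).

    `monthly` maps a month name to its attendance count. Each group must be
    non-empty, which is why a window that skips the holidays is not enough.
    """
    regular = [count for month, count in monthly.items() if month not in holiday_months]
    holiday = [count for month, count in monthly.items() if month in holiday_months]
    return mean(regular), mean(holiday)


# Hypothetical monthly attendance pulled from the loyalty program
attendance = {"Sep": 480, "Oct": 520, "Nov": 1400, "Dec": 1800}
print(average_attendance(attendance, holiday_months={"Nov", "Dec"}))
```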
Profitability of the concession stand
Profit is calculated by subtracting cost from sales revenue. The historical data shows that while the concession stand was profitable, profit margins were razor thin at less than 5%. You saw that average purchases totaled $20 or less. You decide that you will keep monitoring this on an ongoing basis.
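The profit calculation, with margin taken as profit divided by revenue (a standard definition, assumed here), can be sketched as follows. The revenue and cost figures are hypothetical.

```python
def profit_and_margin(revenue, cost):
    """Profit is revenue minus cost; margin is profit as a share of revenue."""
    profit = revenue - cost
    margin = profit / revenue if revenue else 0.0
    return profit, margin


# Hypothetical monthly concession figures, in dollars
profit, margin = profit_and_margin(revenue=10_000, cost=9_600)
print(f"profit ${profit:,}, margin {margin:.1%}")
if margin < 0.05:
    print("Profitable, but the margin is under 5%: keep monitoring.")
```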
Based on your understanding of data collection tools, you will suggest an online survey of customers so they can comment on the food at the concession stand. This will enable you to gather even more quantitative data to revamp the menu and potentially increase profits.
Evening audience preferences
Your analysis of the historical data shows that the 7:30 PM showtime was the most popular and had the greatest attendance, followed by the 7:15 PM and 9:00 PM showtimes. You may suggest replacing the current 8:00 PM showtime that has lower attendance with an 8:30 PM showtime. But you need more data to back up your hunch that people would be more likely to attend the later show.
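Ranking showtimes by average attendance is a short aggregation. The records below are hypothetical, chosen only to mirror the ordering described above.

```python
from collections import defaultdict
from statistics import mean


def average_by_showtime(records):
    """Average attendance per showtime from (showtime, attendees) records."""
    by_time = defaultdict(list)
    for showtime, attendees in records:
        by_time[showtime].append(attendees)
    return {showtime: mean(counts) for showtime, counts in by_time.items()}


def rank_showtimes(records):
    """Showtimes ordered from highest to lowest average attendance."""
    averages = average_by_showtime(records)
    return sorted(averages, key=averages.get, reverse=True)


# Hypothetical attendance records for evening shows
records = [
    ("7:30 PM", 220), ("7:30 PM", 200),
    ("7:15 PM", 180),
    ("8:00 PM", 90), ("8:00 PM", 110),
    ("9:00 PM", 150), ("9:00 PM", 140),
]
print(rank_showtimes(records))
```

The ranking shows where the 8:00 PM show stands, but it cannot say whether an 8:30 PM show would do better. That is why the survey question is still needed.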
Evening movie-goers are the largest source of revenue for the theater. Therefore, you also decide to include a question in your online survey to gain more insight.
Qualitative
Qualitative data is great for helping us answer why questions. For example, why people might like a certain celebrity or snack food more than others.
Qualitative data for all three trends plus ticket pricing
Since you know that the theater is planning to raise ticket prices for evening showtimes in a few months, you will also include a question in the survey to get an idea of customers’ price sensitivity.
Your final online survey might include these questions for qualitative data:
What went into your decision to see a movie in our theater today? (movie attendance)
What do you think about the quality and value of your purchases at the concession stand? (concession stand profitability)
Which showtime do you prefer, 8:00 PM or 8:30 PM, and why do you prefer that time? (evening movie-goer preferences)
Under what circumstances would you choose a matinee over a nighttime showing? (ticket price increase)
Now that you’ve asked the right questions, you need to document your steps.
SOW
A scope of work or SOW is an agreed-upon outline of the work you’re going to perform on a project. For many businesses, this includes things like work details, schedules, and reports that the client can expect.
Scope of Work: Creating an SOW helps make sure that everyone involved, from analysts and engineers to managers and stakeholders, shares an understanding of what the business goals are and the plan for accomplishing them. As you ask more and more questions to clarify requirements, goals, data sources, stakeholders, and any other relevant info, an SOW helps you formalize it all by recording the answers and details.
Preparing to write an SOW is about asking questions to learn the necessary information about the project, but it’s also about clarifying and defining what you’re being asked to accomplish, and what the limits or boundaries of the “ask” are.
There’s no standard format for an SOW. They may differ significantly from one organization to another, or from project to project. However, they all have a few foundational pieces of content in common.
Deliverables
Deliverables: What work is being done, and what things are being created as a result of this project? When the project is complete, what are you expected to deliver to the stakeholders? Be specific here. Will you collect data for this project? How much, or for how long?
Avoid vague statements. For example, “fixing traffic problems” doesn’t specify the scope. This could mean anything from filling in a few potholes to building a new overpass. Be specific! Use numbers and aim for hard, measurable goals and objectives. For example: “Identify top 10 issues with traffic patterns within the city limits, and identify the top 3 solutions that are most cost-effective for reducing traffic congestion.”
Milestones
Milestones: This is closely related to your timeline. What are the major milestones for progress in your project? How do you know when a given part of the project is considered complete?
Milestones can be identified by you, by stakeholders, or by other team members such as the Project Manager. Smaller milestones might be incremental steps in a larger project, like “Collect and process 50% of required data (100 survey responses)”, while larger ones might be “Complete initial data analysis report” or “Deliver completed dashboard visualizations and analysis reports to stakeholders”.
Timeline
Timeline: Your timeline will be closely tied to the milestones you create for your project. The timeline is a way of mapping expectations for how long each step of the process should take. The timeline should be specific enough to help all involved decide if a project is on schedule. When will the deliverables be completed? How long do you expect the project will take to complete? If all goes as planned, how long do you expect each component of the project will take? When can we expect to reach each milestone?
Reports
Reports: Good SOWs also set boundaries for how and when you’ll give status updates to stakeholders. How will you communicate progress with stakeholders and sponsors, and how often? Will progress be reported weekly? Monthly? When milestones are completed? What information will status reports contain?
Out of scope
SOWs should also contain information specific to what is and isn’t considered part of the project. The scope of your project is everything that you are expected to complete or accomplish, defined to a level of detail that doesn’t leave any ambiguity or confusion about whether a given task or item is part of the project or not.
Structured Thinking
Structured thinking is the process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options. It’s having a clear list of what you are expected to deliver, a timeline for major tasks and activities, and checkpoints so the team knows you’re making progress. The starting place for structured thinking is the problem domain, which you might remember from earlier. Once you know the specific area of analysis, you can set your base and lay out all your requirements and hypotheses before you start investigating. With a solid base in place, you’ll be ready to deal with any obstacles that come up.
Context
Context can turn raw data into meaningful information. It is very important for data analysts to contextualize their data. This means giving the data perspective by defining it. To do this, you need to identify:
Who
The person or organization that created, collected, and/or funded the data collection
What
The things in the world that data could have an impact on
Where
The origin of the data
When
The time when the data was created or collected
Why
The motivation behind the creation or collection
How
The method used to create or collect it
Understanding and including the context is important during each step of your analysis process, so it is a good idea to get comfortable with it.
Communication
Being able to communicate in multiple formats is a key skill for data analysts. Listening, speaking, presenting, and writing skills will help you succeed in your projects and in your career.
Project example
As a data analyst, you’ll get plenty of requests and questions through email. Let’s walk through an example of how you might approach answering one of these emails. Assume you’re a data analyst working at a company that develops mobile apps. Let’s start by reviewing answers to the four audience questions we just covered:
Who is your audience
Kiri, Product Development Project Manager
What do they already know
Kiri received updates about our project from its planning stages, including the most recent project report, sent two weeks ago.
What do they need to know
Kiri needs an update on the analysis project’s progress and needs to know that the executive team approved changes to the data and timeline. You know that adding a new variable to the analysis will impact the current project timeline. Kiri will need to change the project’s milestones and completion date.
How can you give them what they need to know
You can start by sending an email update to Kiri with the latest timeline for the project, but a meeting might be necessary if she wants to talk through her concerns about missing a deadline.
Data limitations
Data has its limitations. Has someone’s personal opinion found its way into the numbers? Is your data telling the whole story? Part of being a great data analyst is knowing the limits of data and planning for them. This reading explores how you can do that.
Incomplete
If you have incomplete or nonexistent data, you might realize during an analysis that you don’t have enough data to reach a conclusion. Or, you might even be solving a different problem altogether! For example, suppose you are looking for employees who earned a particular certificate but discover that certification records go back only two years at your company. You can still use the data, but you will need to make the limits of your analysis clear. You might be able to find an alternate source of the data by contacting the company that led the training. But to be safe, you should be up front about the incomplete dataset until that data becomes available.
Misaligned
If you’re collecting data from other teams and using existing spreadsheets, it is good to keep in mind that people use different business rules. So one team might define and measure things in a completely different way than another. For example, if a metric is the total number of trainees in a certificate program, you could have one team that counts every person who registered for the training, and another team that counts only the people who completed the program. In cases like these, establishing how to measure things early on standardizes the data across the board for greater reliability and accuracy. This will make sure comparisons between teams are meaningful and insightful.
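The counting mismatch is easy to reproduce. In this sketch, with hypothetical trainee records, the same data gives two different "totals" depending on which business rule a team uses, which is why the rule should be agreed on and named explicitly.

```python
# Hypothetical trainee records: (name, registered, completed)
trainees = [
    ("Ana", True, True),
    ("Ben", True, False),
    ("Chen", True, True),
    ("Dia", True, False),
]


def count_trainees(records, rule):
    """Count trainees under an explicit business rule."""
    if rule == "registered":
        return sum(1 for _name, registered, _completed in records if registered)
    if rule == "completed":
        return sum(1 for _name, _registered, completed in records if completed)
    raise ValueError(f"unknown counting rule: {rule!r}")


# Same data, two teams' rules, two different totals
print(count_trainees(trainees, "registered"))
print(count_trainees(trainees, "completed"))
```

Requiring the rule as a parameter, and rejecting unknown rules, keeps anyone from comparing numbers that were counted differently without noticing.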
Dirty
Dirty data refers to data that contains errors. Dirty data can lead to productivity loss, unnecessary spending, and unwise decision-making. A good data cleaning effort can help you avoid this. As a quick reminder, data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When you find and fix the errors, while tracking the changes you made, you can avoid a data disaster. You will learn how to clean data later in the training.
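A minimal cleaning sketch that fixes formatting, removes incomplete and duplicate rows, and logs each change might look like this. The field names, the use of email as the duplicate key, and the `.title()` name normalization are all simplifying assumptions (`.title()` would, for example, turn "McDonald" into "Mcdonald").

```python
def clean_records(rows):
    """Trim and normalize fields, drop incomplete rows and duplicates,
    and log every change so the cleaning can be reviewed later."""
    cleaned, seen_emails, changes = [], set(), []
    for i, row in enumerate(rows):
        name = (row.get("name") or "").strip().title()
        email = (row.get("email") or "").strip().lower()
        if not name or not email:
            changes.append(f"row {i}: removed (incomplete)")
            continue
        if email in seen_emails:
            changes.append(f"row {i}: removed (duplicate of {email})")
            continue
        if name != row["name"] or email != row["email"]:
            changes.append(f"row {i}: reformatted")
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned, changes


# Hypothetical contact rows with formatting errors, a duplicate, and a gap
rows = [
    {"name": " ana lima ", "email": "ANA@EXAMPLE.COM"},
    {"name": "Ana Lima", "email": "ana@example.com"},
    {"name": "Ben", "email": ""},
    {"name": "Chen Wu", "email": "chen@example.com"},
]
cleaned, changes = clean_records(rows)
print(cleaned)
print(changes)
```

Returning the change log alongside the cleaned rows is the "tracking the changes you made" step: anyone reviewing the analysis can see exactly what was altered or removed.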
Tell a Clear Story
Avinash Kaushik, a Digital Marketing Evangelist for Google, has lots of great tips for data analysts in his blog: Occam’s Razor. Below are some of the best practices he recommends for good data storytelling:
- Compare the same types of data: Data can get mixed up when you chart it for visualization. Be sure to compare the same types of data and double-check that any segments in your chart actually display different metrics.
- Visualize with care: A 0.01% drop in a score can look huge if you zoom in close enough. To make sure your audience sees the full story clearly, it is a good idea to start your y-axis at 0.
- Leave out needless graphs: If a table can show your story at a glance, stick with the table instead of a pie chart or a graph. Your busy audience will appreciate the clarity.
- Test for statistical significance: Sometimes two datasets will look different, but you will need a way to test whether the difference is real and important. Run statistical tests to see how much confidence you can place in that difference.
- Pay attention to sample size: Gather lots of data. If a sample size is small, a few unusual responses can skew the results. If you find that you have too little data, be careful about using it to form judgments. Look for opportunities to collect more data, then chart those trends over longer periods.
Be the Judge
In any organization, a big part of a data analyst’s role is making sound judgments. When you know the limitations of your data, you can make judgment calls that help people make better decisions supported by the data. Data is an extremely powerful tool for decision-making, but if it is incomplete, misaligned, or hasn’t been cleaned, then it can be misleading. Take the necessary steps to make sure that your data is complete and consistent. Clean the data before you begin your analysis to save yourself and possibly others a great amount of time and effort.
Meetings
Great things can happen when participants anticipate a well-executed meeting. Attendees show up on time. They aren’t distracted by their laptops and phones. They feel like their time will be well spent. It all comes down to good planning and communication of expectations. The following are our best practical tips for leading meetings.
Before the meeting
If you are organizing the meeting, you will probably talk about the data. Before the meeting:
- Identify your objective. Establish the purpose, goals, and desired outcomes of the meeting, including any questions or requests that need to be addressed.
- Acknowledge participants and keep them involved with different points of view and experiences with the data, the project, or the business.
- Organize the data to be presented. You might need to turn raw data into accessible formats or create data visualizations.
- Prepare and distribute an agenda. We will go over this next.
Craft a compelling agenda
A solid meeting agenda sets your meeting up for success. Here are the basic parts your agenda should include:
- Meeting start and end time
- Meeting location (including information to participate remotely, if that option is available)
- Objectives
- Background material or data the participants should review beforehand
Here’s an example of an agenda for an analysis project that is just getting started:
During the meeting
As the leader of the meeting, it’s your job to guide the data discussion. With everyone well informed of the meeting plan and goals, you can follow these steps to avoid any distractions:
Make introductions (if necessary) and review key messages
Present the data
Discuss observations, interpretations, and implications of the data
Take notes during the meeting
Determine and summarize next steps for the group
After the meeting
To keep the project and everyone aligned, prepare and distribute a brief recap of the meeting with next steps that were agreed upon in the meeting.
You can even take it a step further by asking for feedback from the team.
Distribute any notes or data.
Confirm next steps and timeline for additional actions.
Ask for feedback (this is an effective way to figure out if you missed anything in your recap).
A final word about meetings:
Even with the most careful planning and detailed agendas, meetings can sometimes go off track.
An emergency situation might steal people’s attention.
A recent decision might unexpectedly change requirements that were previously discussed and agreed on.
Action items might not apply to the current situation.
If this happens, you might be forced to shorten or cancel your meeting. That’s all right; just be sure to discuss anything that impacts your project with your manager or stakeholders and reschedule your meeting after you have more information.