Whether you’re working on a portfolio project or you’re a full-fledged data professional (or, affectionately, a data nerd), you’ll likely take a structured approach to your data analytics project using some version of the Data Analysis Lifecycle. The framework comes in many variations, but having a consistent method for tackling your overarching goal is invaluable.
Why use the Data Analysis Lifecycle?
Why is this important? Why not just jump straight into RStudio or pandas and get down to business? That's the fun part, after all, isn't it? The reality is that without a structured approach to your project, your analysis won't be able to predict 1+1, never mind produce a useful model. Collecting all the necessary context to define your problem, understanding your business case, coordinating your data, and clarifying your limitations are all pivotal to delivering solid work. After all, your goal is to turn data into real-world action, and if what you put in is rubbish, what comes out will be even worse.
This system provides the methods and tools to map out your steps, ensuring you extract the most out of your data and create representative, applicable results.
Phase 1: Ask / Discovery
If you don't understand your problem, how are you going to solve it?
Your first step is always to ask questions. Clarify what your stakeholders need from you, and what you need from them. What is the scope of the work, and what business case are you trying to solve? What would even qualify as a success? Clearly defining your problem, its requirements, and what a successful solution looks like is the only way to ensure your work aligns with your desired outcome.
There are a few tools to use during this phase: the 5 Whys to determine root causes, SMART questions to eliminate ambiguity, or a gap analysis to understand what you’re lacking and the potential contributing factors. At the end of the day, good questions immediately provide direction. They show you both where to go for your data and what you need to do with it.
Phase 2: Prepare
Once you have a clear direction, you need to dig into your functional requirements. What data do you need, and how are you going to collect it? Are you scraping Twitter for sentiment analysis? Or do you have a killer primary-source database? Think about quality concerns and possible biases in your dataset. Your Ask phase should have given you a solid understanding of your potential limitations; now you can be strategic about addressing them.
Depending on the project, you may also need to address data governance requirements and the data’s own lifecycle from this point onward. The second phase is all about data generation, collection, storage, and management. You’re laying the foundation for your data, and you can build out from here.
Phase 3: Process
So, you have your data, you’ve identified the KPIs you want to track, and you’ve determined your variables. It's time to put on your gloves and get hands-on with your dataset. To drive change you need accurate results, and to get there you need clean data. Dirty data doesn’t just lead to inaccurate analytics. At best, you’ll deliver an embarrassing presentation when subject matter experts tear your data apart. At worst, someone makes a poor business decision based on faulty data, wasting time and money. Here are a few common cleaning tasks to prevent these uncomfortable outcomes:
- Remove duplicate data and handle outliers.
- Correct data type inconsistencies and clean up erratic formats.
- Manage missing data - can you safely fill missing values based on other observations, or should you drop those rows? How might you adjust your system to handle the nulls?
- Hunt down root causes of your data inconsistencies.
- Validate your data - does the data you have actually make sense? Does it follow the rules you know the subject matter must obey? How does the data, in its context, compare to your working hypothesis? Confirm it makes sense according to your business logic.
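A minimal pandas sketch of several of the cleaning steps above (deduplication, type and format fixes, a business-rule check, and missing-value handling). The orders table, its column names, and the choice to fill missing quantities with the median are all hypothetical assumptions for illustration, not a recommended policy for your data.

```python
import pandas as pd

# Hypothetical raw export with typical problems: an exact duplicate row,
# quantities stored as text, inconsistent region labels, a missing value,
# and a negative quantity that breaks a business rule.
raw = pd.DataFrame({
    "order_id": [101, 102, 102, 103, 104, 105],
    "region": ["North", "south ", "south ", "NORTH", "East", "East"],
    "quantity": ["3", "5", "5", None, "2", "-4"],
    "unit_price": [9.99, 4.50, 4.50, 12.00, 3.25, 7.80],
})


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    # Remove exact duplicate rows.
    out = df.drop_duplicates().copy()

    # Correct data types and erratic formats.
    out["quantity"] = pd.to_numeric(out["quantity"], errors="coerce")
    out["region"] = out["region"].str.strip().str.title()

    # Validate against a (hypothetical) business rule: quantities can't be negative.
    # NaN < 0 evaluates to False, so missing quantities survive to the next step.
    out = out[~(out["quantity"] < 0)].copy()

    # Manage missing data: fill with the median of the remaining valid rows.
    # This is an assumption; dropping those rows may be safer for your data.
    out["quantity"] = out["quantity"].fillna(out["quantity"].median())

    return out.reset_index(drop=True)


clean = clean_orders(raw)
print(clean)
```

Validating before filling is a deliberate ordering choice: it keeps the invalid negative value from skewing the median used to fill the gaps.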
Once you’re confident your data is accurate and reliable, you might also begin data transformation: converting data from different formats or sources into a standardized structure that sets you up for Phase 4.
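As a sketch of that kind of transformation, assume two hypothetical sources (a CRM export and a web-signup table) that record the same facts under different column names and date formats. A small pandas helper can map both onto one standard schema before stacking them. All table names, columns, and formats here are illustrative assumptions.

```python
import pandas as pd

# Two hypothetical sources describing the same thing with different conventions.
crm = pd.DataFrame({
    "CustomerID": [1, 2],
    "SignupDate": ["2023-01-15", "2023-02-03"],  # ISO year-month-day
})
web = pd.DataFrame({
    "cust_id": [3, 4],
    "signup": ["03/10/2023", "04/22/2023"],      # US month/day/year
})


def standardize(df: pd.DataFrame, column_map: dict, date_format: str, source: str) -> pd.DataFrame:
    # Rename source-specific columns to the shared schema.
    out = df.rename(columns=column_map)
    # Parse dates explicitly so ambiguous formats aren't guessed.
    out["signup_date"] = pd.to_datetime(out["signup_date"], format=date_format)
    # Keep track of where each row came from.
    out["source"] = source
    return out[["customer_id", "signup_date", "source"]]


combined = pd.concat(
    [
        standardize(crm, {"CustomerID": "customer_id", "SignupDate": "signup_date"}, "%Y-%m-%d", "crm"),
        standardize(web, {"cust_id": "customer_id", "signup": "signup_date"}, "%m/%d/%Y", "web"),
    ],
    ignore_index=True,
)
print(combined)
```

Passing an explicit `format` to `pd.to_datetime` is the safer choice for mixed sources, because a date like 03/10/2023 could otherwise be read as either March 10 or October 3.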
Phase 4: Analysis / Modeling
The main event! Use the tools and methods at your disposal to identify patterns and draw insights from your dataset. You might use anything from basic calculations and plotting to statistical models like linear regressions or decision trees. Your goal here is to answer two questions: what story is your data telling you, and how will these connections help solve your problem? This is an exploratory process: test your data and hunt for answers to your business or project objective.
You will notice patterns in the problems you’re solving, and these will guide you as you refine your analytical approach. Maybe you’re analyzing usage trend data to improve your customer touch-points, or identifying unseen connections within your logistical challenges. You will develop an intuition for recurring themes across your business cases, and you can use that experience to sharpen your judgment when drawing solid conclusions.
Phase 5: Share / Communicate results
You have your deliverables in hand, you’ve extracted some game-changing observations, and now it's time to share your findings with your stakeholders. You will likely develop visualizations or a dashboard to clearly communicate insights to the people who will act on them. Whether you use Tableau, Power BI, or even just Excel, remember that attention spans are short. You need to provide a shot glass of direct information that won't require a seminar to connect to the business value. Make your data tell an effective story, clearly tied to your objective, and let that story spark the action you’re hoping to create.
At the end of the day, your goal is to see your data turn into action. If you’re part of a data-specific team, this final part of the project - action - might be out of your hands, but that doesn't make it any less important. Data-driven recommendations are the backbone of innovative business decisions and new understandings of the world around you. Be curious, take a structured approach, and make sure you step back to see the context.