Fixing Data Quality First? You’re Doing It Wrong
In today’s hypercompetitive business environment, the companies that get data to their teams faster and solve specific problems will outperform those that are preoccupied with data perfection.
Over the past two decades, we’ve seen a broad shift away from traditional waterfall development methodologies in favor of more agile and iterative approaches to developing and using software. Everywhere you look nowadays, agile development principles are helping companies build products and processes better and faster.
So why is it that so many companies still struggle to get business intelligence initiatives off the ground? Because although they may be adopting an agile approach to report development, their overall data strategy is stuck in the past.
In my experience, the breakdown starts when organizations try to modernize their analytics programs one layer at a time. The obvious way to do this is to start with data quality. After all, if your data isn’t in good shape, why do anything else? Trying to ensure all data (and the systems around it) are perfect before opening it up to users is a classic waterfall approach—and it’s a big mistake.
Drinking from the waterfall means you’ll drown
When you prioritize data quality over access, several problems emerge. In the near term, it leaves data teams overwhelmed by ad hoc requests for business intelligence. Business users aren’t just going to wait around for some grand system to appear; they need answers now. That flood of requests prevents the team from focusing on the highest-value data analytics projects and makes it difficult to see the forest for the trees.
In the longer term, shadow IT starts to become an issue as business users look to solve data problems on their own. It’s an inevitable consequence of the waterfall approach to development, especially in data projects. Simply put, it’s impossible to meet the needs of users if you aren’t taking in their feedback throughout the development process—and when it comes to business intelligence, the needs of users are always changing. By the time you build the perfect system with the perfect data, there’s a good chance it will already be out of date.
Let’s be clear: there’s no question that data quality is important. It’s a necessary foundation for a modern business intelligence program. But it’s a mistake to think you need to cleanse all of your data and build the perfect system before you can put it to good use.
The new data pipeline is built around the user
Think about data initiatives just like you would building and delivering software. In other words, take a more agile and iterative approach that puts users first. Focus on getting people access to the data they need, when they need it, cleansing the data and building the larger system as you go. That’s the key to faster and more reliable success in data and business intelligence.
Previously, when users were trying to answer a specific question, they had to rely on a business analyst to build a clean data model, answer the question, and generate a report. But with the newest wave of self-service and discovery-based business intelligence tools, data teams no longer build reports or dashboards to answer questions for people. Instead, they provide access to the data in a way that lets users find answers on their own.
In this new world, it no longer makes sense to wait until all of your data is ready before giving people access to it directly. You can prepare the data they need and hand it off. Meanwhile, as you scrub new data sets, you can organically backfill the data warehouse with clean data over time. Since you are planning to cleanse it all anyway, you may as well start with the data users are already asking for.
Here’s an example: A businessperson comes to the data team and says, “I need to see my revenue for the last six months for these six products.” In the traditional (i.e., waterfall) approach, the team finds all the relevant data, cleans it up (because some of it will be duplicated, incorrect, or simply out of place), and generates a report. The businessperson says, “This is great. It took three days, but it’s what I needed.” A month later, he or she comes back asking for the same report, except this time for a different region, different product lines, and a different period of time. Once again, the data team sources the data, cleans it up, brings it into a report, and delivers it three days later. On and on it goes, the same inefficient process every month, until the data warehouse is built and ready to deliver.
Now consider an iterative approach that puts the business user first: The request comes in, the data team sources and cleans the data, places it into a modern self-service data platform three days later, and says, “Here you go. The data you need is available in this platform. You can query it yourself and generate all the reports you want.” When the user comes back next month asking for the same report with new demographic data, the team sources and cleans the new data and adds it to the platform. Data from the previous month’s request is already available, so fulfilling the latest ask only takes one day. All the while, your data team is updating the data warehouse with clean, high-priority data as it goes.
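To make the iterative workflow concrete, here is a minimal Python sketch of the idea: clean only the data a request needs, keep it available, and let later requests reuse what is already there. Everything in it is an assumption made for illustration, including the SelfServicePlatform class, the fulfill method, the cleaning rules, and the sample rows. It does not represent any particular product.

```python
def clean(rows):
    """Drop duplicate and incomplete rows (illustrative rules only)."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:
            continue  # duplicated record
        if any(value is None for value in row.values()):
            continue  # incomplete record
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned


class SelfServicePlatform:
    """Hypothetical stand-in for a self-service data platform."""

    def __init__(self):
        self.datasets = {}  # dataset name -> cleaned rows

    def fulfill(self, requested, raw_sources):
        """Clean and load only the datasets that are not already available.

        Returns the names that needed new work for this request.
        """
        newly_loaded = []
        for name in requested:
            if name not in self.datasets:
                self.datasets[name] = clean(raw_sources[name])
                newly_loaded.append(name)
        return newly_loaded


# Made-up raw data, for illustration only.
RAW_SOURCES = {
    "revenue": [
        {"id": 1, "product": "A", "amount": 120.0},
        {"id": 1, "product": "A", "amount": 120.0},  # duplicate
        {"id": 2, "product": "B", "amount": None},   # incomplete
        {"id": 3, "product": "B", "amount": 80.0},
    ],
    "demographics": [
        {"id": 10, "region": "West", "segment": "SMB"},
    ],
}

platform = SelfServicePlatform()

# Month one: the revenue data is sourced and cleaned for the first time.
first = platform.fulfill(["revenue"], RAW_SOURCES)

# Month two: the same report plus demographics; only the new data needs work.
second = platform.fulfill(["revenue", "demographics"], RAW_SOURCES)
```

In this sketch the second request triggers cleaning only for the demographics data, because the revenue data from the first request is already in the platform. That reuse is the reason the follow-up ask can be fulfilled faster than the first one.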
In today’s hypercompetitive business environment, decisions are increasingly being made by frontline workers in a bid to keep up with the pace of business. The companies that get data to their teams faster and solve specific problems will outperform—and eventually annihilate—those who are preoccupied with data perfection.