The Quality Problem: A Call-to-Arms
LET’S MAKE 2024 A YEAR WHERE THINGS WORK
…a call-to-arms for all Data Goblins out there to put quality first in data and analytics… not only data quality, but the end-to-end quality of our solutions and processes, from design to decommission.
We have a quality problem, and it’s getting worse. It creates higher costs, hurts our productivity, and threatens our capability to achieve success. The problem: too often, we prioritize quicker results and newer features over lasting quality and consistency in the data solutions that we deliver. Too often, we don’t collect the right requirements, we don’t test, we don’t automate, and we rely on hope and heroism to save the day. The result: we’re besieged by issues, fighting constant battles against an avoidable enemy that we create ourselves.
When we don’t prioritize quality in our data and analytics solutions, we kill user adoption and effectiveness. Yet—time and time again—we hear of quality being sacrificed in the name of faster delivery and lower short-term costs for data, models, and reports. The consequence for users is that they run into issue after issue after issue… each one a slice on the cutting edge until our solution dies a death by a thousand cuts, hemorrhaging exported files along the way.
Things don’t need to be perfect to produce value. Prioritizing quality is not the same as pursuing perfection. We can still deliver value incrementally to move fast and be flexible, but each iteration should be reliable. Each iteration should work. Doing wrong things faster won’t help anyone; quality should be wound into the DNA of our processes. Quality applies to every part of the analytics pipeline, from design to decommission, and has a fit for every size, from prototypes and proofs of concept to enterprise-scale solutions.
Perfect is the enemy of good, and we fight for good. To fight for quality is to fight continuously. It’s a million little battles; a perpetual marathon war where eager solutions leap too early and too fast from trenches only to die too soon, face-down in the mud.
We all fight these quality battles, every day. What battles are you fighting for quality, right now?
To achieve quality, we need to form and follow good plans.
We battle with a lack of understanding of what resources are required to design, build, and maintain a data solution, and a lack of resources to realize our plans.
We battle a lack of planning at all as we react to an endless treadmill of requests and escalations.
We battle unrealistic deadlines and estimations forcing us to prioritize speed and sacrifice quality.
We battle to show an ROI when we focus on outputs instead of outcomes measured the right way, steering us to build things that don’t help us achieve our objectives or advance our goals.
To create quality solutions, we must collect the right requirements to create good designs.
We battle with tight timelines that tempt us to make assumptions and check boxes, instead of sufficiently defining and understanding the problem we should solve.
We battle with waterfall projects that insist on pushing top-down, lift-and-shift solutions, instead of engaging with users and learning what they really need.
We battle a desire to design and build technically interesting things we like to make, instead of useful things that our users need.
We battle a resistance to change, both from our users and from within ourselves, instead of facing it head-on as a core challenge to adoption and effectiveness.
To deliver quality, we must handle data effectively and deliver accurate results.
We battle with data that isn’t in the right shape or format, or that isn’t complete.
We battle with missing values and mistakes in the data itself.
We battle with flawed business logic and processes propped up by a mountain of exceptions and debt.
We battle with code that we’re tempted to deploy after a few simple tests (or none at all), hoping it works.
For users to get quality results with the things we make, they must have a quality experience with our solutions.
We battle to make things correct and complete, without being over-complicated.
We battle requests to “show everything” when we know we should focus on a specific question or problem.
We battle bias and preference when we should focus on facts.
We battle a desire to create something we find interesting, instead of something users find useful.
To continually produce quality, our solutions must continually work.
We battle to ensure that new data doesn’t disrupt an existing solution.
We battle to make changes and improvements without regressing what already works.
We battle needs and demands for new tools, technologies, and features when they’re often unreliable or full of issues.
We battle the cost of quick, manual changes and maintenance over investing in automation.
Quality is more important now than ever before, as we stand upon the precipice of a new revolution. Emergent platforms and technologies rise upon the horizon, with fervent demand from organizations and individuals to use them to get value and improve efficiency. However, these technologies—this revolution—it all takes place upon our data foundation of today. Without prioritizing the quality of our data solutions and initiatives, we’re doomed to fail. Without quality, we won’t realize the potential of these innovations; we’re trying to run a race while limping and wounded.
How can we produce better solutions while still keeping up with increasing needs and requests?
There are many equally valid ways to improve the quality of our data solutions. In this article, I want to call attention specifically to one way we can improve quality without sacrificing agility: by adopting DataOps principles. DataOps (short for “Data Operations”) is a methodology that emphasizes both quality and agility by using automated testing, version control, and a set of disciplined practices to build, release, and manage data solutions. It’s important to emphasize that the underlying principles of DataOps are most important; you don’t attain quality by simply having version control or automated tests.
While not a silver bullet to eliminate poor quality, knowing and adopting these DataOps principles can improve the solutions you make.
Traditionally, DataOps (and related approaches like DevOps) are used and known by technical teams. These technical teams focus on code-first approaches to improve and automate development and data quality. However, quality is not a technical problem limited to code and data. DataOps principles apply to the entire analytics pipeline, and not just the technical parts.
DataOps is not only for Data Engineers - these principles can yield benefits for all data professionals and practitioners. If you deliver end-to-end analytical solutions with tools like Power BI, or even if you focus on a single part of that pipeline like visualization, DataOps can benefit you. By using DataOps, teams can improve quality and delivery times, simultaneously.
In this article, I’ll discuss what quality means for a data and analytics solution, and elaborate on the cost and consequences when quality isn’t prioritized. Last, I’ll introduce DataOps, and how it improves quality for data and analytics.
Above all, I issue a call-to-arms to all other Data Goblins out there: Let’s value innovation and agility, but let’s put quality first. We simply can’t afford not to.
The purpose of this article is threefold:
1. To describe the elements of a quality data solution with the acronym REAPER.
2. To emphasize that quality is about more than just data; it's both continuous and end-to-end in analytics.
3. To introduce DataOps and describe how DataOps helps to improve quality.
This article serves as a basic introduction to these topics. Future articles to be published both here and elsewhere will elaborate on more specific and actionable areas as they pertain to semantic models and the tools or workloads that use them.
A clarification Kobold appears. Clearing its throat, it pulls out a scroll and reads to you in a squeaky voice:
Quality can be a sensitive subject, because it's nuanced and subjective. As such, I want to clarify:
- Good quality isn't perfect.
- Quality isn't limited to data or code, but also design, processes, communication, and more.
- Quality doesn't imply rigidity or complexity.
- Quality has a size that fits everyone; you don't have to be a big enterprise team to put quality first.
- Ideation and experimentation are compatible with quality. You can still have a quality prototype, and a quality process to experiment, particularly when you don't use them directly in production.
- If you don't use or know DataOps, it doesn't mean you aren't making quality stuff. Relax.
A rambling goblin appears. It pulls out a soapbox and stands upon it, addressing you:
The following section is subjective, and colored by my own personal experiences, opinions, and bias. Please read it with critical eyes, and feel free to disagree.
WHY IS QUALITY PARTICULARLY CHALLENGING RIGHT NOW?
On a grander scale, quality seems to be a bigger challenge now than ever before. By the end of 2023, I’d noticed that it had become a surprise when I used a product, tool, or process without running into issues or blockers. Constantly running into the perpetual scaffolding of works-in-progress when trying to accomplish a task is exhausting. I’m so, so tired of things not working when they should and need to.
Even supposedly finished products seem to show cracks out-of-the-box… and this isn’t limited to software or technology, either. From administrative processes to physical products, machines… even real estate—it sure seems like there are more cracks in our foundation than ever before.
Why does it feel this way? Why does it feel like so many things just aren’t working the way they should?
ARE WE DEVELOPING QUALITY PRODUCTS?
There are trends and circumstances that create challenges for developers to make quality products or processes. Generally, these factors emphasize releasing more new things, faster, while under-prioritizing testing. Examples may include:
- Early access and public previews
Early access is a term used in the video game industry where publishers distribute games when they’re still undergoing development. Players can purchase and play the game early, accepting that the game isn’t complete and may contain unresolved issues. However, they can provide valuable feedback to steer further development, funded by the revenue from early access sales. This model has become a common and popular strategy that drove the success of big, multi-billion-dollar titles like Minecraft, Fortnite, and Baldur’s Gate 3.
Outside of the video game industry, Beta-testing—exposing an early release to a limited group for feedback—has long been commonplace (dating back to IBM in the 1950s). However, the explosive commercial success of the Early Access model in the video gaming industry—combined with the shift towards products-as-a-service over ownership—strongly incentivizes vendors to release and maintain these public preview programs.
The prevalence and popularity of this strategy has arguably increased the broad public’s tolerance and palate for unfinished products or defective features. Someone using a “preview” version can conflate this with the “latest” release. As public preview programs become increasingly normalized and used, they can give the impression of lower quality compared to other, official releases.
- Shorter release cycles under deadline pressure
A common goal is to release more content, faster; to reduce the time between release cycles. Some creators or developers may choose to sacrifice quality in order to prioritize quantity, instead. In software development, this pace is ideally enabled by automated testing to ensure consistent quality. Unfortunately, it’s common knowledge that pressure to meet deadlines can cause priorities and resources to shift toward developing new things over testing them and maintaining what’s already made.
Designing good, standardized tests can be a challenging and time-consuming process. This is particularly difficult when validating a user experience and involving users, who usually have limited time and incentive to perform this testing. Furthermore, users may not have the appropriate knowledge or vocabulary to sufficiently articulate their feedback.
As pressure rises to shorten release cycles without prioritizing testing, more issues occur. Ultimately, these issues erode user trust, and the quality of the product or solution.
- Iterative releases without disciplined processes
A successful approach used by many teams is to have iterative releases with agile development. The idea being that each release incrementally delivers value, with regular feedback cycles improving the end result. While true in theory, success with this process requires robust and disciplined processes. Otherwise, iterative releases may lack sufficient testing, or be abused to fit a “quick and dirty” / “fix it later” approach.
Ideally, each release in an iterative development cycle has undergone its own testing and validation to ensure quality. If teams instead use iteration as an excuse to defer quality, shipping something rough now and promising to improve it later, the result is suboptimal.
ARE WE USING THESE PRODUCTS TO ACHIEVE QUALITY RESULTS?
There are trends and circumstances that make it harder for users to achieve quality results: to use things to accomplish their objectives while having a satisfying user experience. Examples may include:
- Generative AI and generated content
One risk of generative AI is that people use it to generate content or code that they don’t fully understand, themselves. This can result in unintended effects, like regressions in other, dependent parts of a solution or product. If the creator or developer doesn’t understand what they’ve generated, it’s likely that flaws in this generated content end up in the final product, where a user discovers them.
Worse, generative AI can produce false or inaccurate information. This alone is a huge detriment to quality. However, what’s most concerning is that many choose to use LLMs like ChatGPT in spite of their “hallucinations”. These errors are further normalizing poor quality.
Furthermore, questionable or bad actors can exploit generative AI to mass-produce high quantities of content to drive engagement or fill release quotas. One specific example of this is using generative AI to mass-produce articles for search engine optimization (SEO).
- Information inflation
As the volume of information increases, the value of information in general decreases. The result is a kind of information inflation: due to the sheer volume of information, it’s becoming harder for people to find quality, trustworthy information online. This challenge results in many people feeling overwhelmed, particularly when confronted with conflicting facts or approaches and no clear path toward the “best answer” (if one even exists at all).
What’s worse, misinformation is becoming more prevalent. This misinformation is typically created by bad actors who intend to promote a specific worldview, approach, or product. The result is an erosion of public trust in new information, particularly when it conflicts with existing beliefs and opinions.
Taken together, it’s becoming increasingly difficult for people to find the right information so they can learn the right things to achieve their objectives. This is making it more challenging for non-experts to produce quality results.
- An accelerating rate of change
People are not only struggling with the volume of information, but also the amount of change they’re confronted with on a regular basis. It’s commonly understood that the rate of change is increasing over time. This has been painfully evident in the last 18 months with milestone advances in AI and many other areas (like gene editing).
New things are more likely to have issues, and less likely to have established patterns, experts, and guidance to help people use them. Furthermore, questionable actors with commercial interests often compete in a gold-rush environment to churn out content or establish themselves as leading players. Taken together, these things can make it difficult for people to use processes, tools, and products to achieve quality results.
Quality is a growing challenge everywhere, including for data and analytics. Poor quality creates higher cost, inefficiency, and frustration. Furthermore, improving quality will only become more important in the future, as we adopt new platforms (like Microsoft Fabric) and technologies (like generative AI).
But what is quality in data and analytics, and how do we address it?
QUALITY IN DATA & ANALYTICS
We have an implicit understanding about when something is of high quality, or not. Generally, something is high quality when it reliably helps us achieve our objectives, and we have a pleasant experience using it. Quality is subjective, but it’s fundamentally critical to every aspect of the solutions we make.
QUALITY IN A DATA SOLUTION: THE REAPER OF ADOPTION AND EFFECTIVENESS
Achieving quality in your data and analytics solutions is essential if they’re to help people realize their goals. So what does a quality data solution look like; what are the elements of a quality solution?
A quality data solution, process, or initiative has the following attributes; it’s Reliable, Efficient, Automated, Polished, Easy-to-use, and Robust. Each of these attributes is described in more detail later in this article.
If your data solutions, processes, or initiatives lack these elements, it lowers the quality of the end result. When this happens, quality quickly transforms into a lethal REAPER; deteriorating quality is the quickest way to kill user adoption and effectiveness. The result is a solution with limited value, or worse, one that can even create harm to your organization.
Examples of quality issues with data and analytics include:
- A query that returns an inaccurate or unexpected result.
- An inefficient report that takes a long time to answer questions, or doesn’t have the right views/visuals.
- A model that doesn’t have the latest data when users need it to make decisions or take actions.
- A pipeline or scheduled process that regularly fails due to anomalous data or resource availability.
The consequences of poor quality in your data and analytics include higher costs and less profit due to:
- Deciding or acting on flawed or missing information, creating risk and missed opportunities.
- Fostering a culture of mistrust in information, resistance to change, and even conflict between teams.
- Creating inefficiency as focus and resources go to tasks that should be centralized or automated.
- Wasting resources to deal with anomalies, issues, and outages that could have been avoided.
QUALITY IS ALWAYS IMPORTANT
Putting quality first helps build trust, improve adoption, and promote efficiency with users. As such, it’s always important to value quality, but it becomes more important the more users you have. As a solution increases in maturity and complexity, the need for quality rises, and the threshold to tolerate poor quality falls.
This is true throughout the solution’s lifecycle, and it applies to more than just the data and code.
QUALITY IS CONTINUOUS AND QUALITY IS END-TO-END
When quality is raised in the context of data and analytics, it’s usually regarding data quality. Data quality typically refers to the completeness, shape, and readiness of the data for analytical purposes. While this is important, quality extends far beyond the data. Quality data solutions address quality in the full solution lifecycle.
Quality also doesn’t start or stop at a finite point. It continues along the entire analytics pipeline, which means that efforts to improve quality must also be continuous. If quality is deprioritized at any point in this pipeline, it negatively impacts the results for the entire solution.
So far, these may all seem like obvious topics. We all know this, right? And yet—quality is a problem in data and analytics that many data teams struggle to address:
- An absence of standards in processes results in redundant effort, data, and code throughout the solution lifecycle.
- A lack of automation produces inefficiency and dependencies on “data heroes” who go the extra mile to meet deadlines and deliver results.
- Error handling is rare. For example, semantic models rarely have error handling in Power Query (for import models) or DAX measures.
- Documentation is often not useful (like a dump of all the DAX measures in a semantic model) or neglected entirely.
- Testing is often insufficient (like simple checksums) or neglected entirely.
…and so on…
A rambling goblin appears:
Quality is relative; it's not about enforcing dogmatic rules or best practices. Instead, it's about following a set of core principles to ensure the best functional, sustainable outcome for a given circumstance. This means something different depending upon the context in which a particular solution resides.
For instance, a semantic model is not poor quality simply because it has calculated columns, or because it's being used as a data source. Don't conflate quality with dogma.
To improve the quality of our data solutions, we should consider the practices and principles of DataOps, which are discussed later in this article. First, let's break down each of the REAPER attributes and what they look like for a semantic model.
A reliable solution works consistently, performs well, and is tested to ensure trustworthy results.
An example for Semantic Models:
Here are some examples of how you might make a more reliable semantic model (see the sketch after this list):
- Ensure data is always up-to-date.
- Have performant DAX code and connected report visuals.
- Validate results so they're accurate, matching baselines and meeting expectations.
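To make this concrete, here's a minimal sketch of what automated reliability checks might look like in Python, for example in a scheduled notebook or script. The `evaluate_dax` helper is a hypothetical stand-in for however you run DAX queries against your model (semantic link in a Fabric notebook, ADOMD.NET, and so on), and the table, column, measure names, and baseline figure are invented for illustration.

```python
from datetime import date, timedelta

def evaluate_dax(dax_query: str) -> list[dict]:
    """Hypothetical helper: run a DAX query against the semantic model and
    return its rows. Swap in your own query mechanism here."""
    raise NotImplementedError

def data_is_fresh(max_age_days: int = 1) -> bool:
    # Reliability check 1: is the model's data up to date?
    rows = evaluate_dax('EVALUATE ROW("LastDate", MAX(Sales[OrderDate]))')
    last_date = rows[0]["[LastDate]"].date()  # key and type depend on your query tool
    return (date.today() - last_date) <= timedelta(days=max_age_days)

def totals_match_baseline(baseline: float, tolerance: float = 0.01) -> bool:
    # Reliability check 2: does a key measure still match a trusted baseline?
    rows = evaluate_dax('EVALUATE ROW("Total", [Total Sales])')
    return abs(rows[0]["[Total]"] - baseline) <= tolerance

if __name__ == "__main__":
    assert data_is_fresh(), "Model data is stale"
    assert totals_match_baseline(1_234_567.89), "Total Sales no longer matches the baseline"
    print("Reliability checks passed")
```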
An efficient solution achieves the desired effect with an optimal approach, or in an optimal number of steps or timeframe.
An example for Semantic Models:
Here are some examples of how you might make a more efficient semantic model (a short sketch follows the list):
- Optimize model refresh and DAX evaluation times.
- Avoid duplicating data or loading it in-memory if it’s not necessary, i.e. with calculated tables/columns; Roche's Maxim.
- Re-use existing patterns or data if they're effective.
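As a rough illustration of keeping an eye on efficiency, the sketch below times a few critical DAX queries against a performance budget. Again, `evaluate_dax` is a hypothetical placeholder for your own query mechanism, and the queries and budget are made up for the example.

```python
import time

def evaluate_dax(dax_query: str):
    """Hypothetical helper: run a DAX query against the semantic model
    (e.g. via semantic link or ADOMD.NET) and return the result."""
    raise NotImplementedError

# Queries behind the most-used report visuals, captured e.g. with Performance Analyzer.
CRITICAL_QUERIES = {
    "Sales by month": 'EVALUATE SUMMARIZECOLUMNS(Dates[Month], "Sales", [Total Sales])',
    "Top customers": "EVALUATE TOPN(10, VALUES(Customer[Customer]), [Total Sales])",
}
BUDGET_SECONDS = 2.0  # illustrative per-query performance budget

for name, query in CRITICAL_QUERIES.items():
    start = time.perf_counter()
    evaluate_dax(query)
    elapsed = time.perf_counter() - start
    status = "OK" if elapsed <= BUDGET_SECONDS else "TOO SLOW"
    print(f"{name}: {elapsed:.2f}s ({status})")
```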
An automated solution reduces the need for manual intervention as much as is feasible in use or maintenance.
An example for Semantic Models:
Here are some examples of how you might make an automated semantic model (see the sketch below):
- Automatically deploy to environments when changes are approved, like from a Test to Production environment after successful user acceptance testing (UAT).
- Set up automated tests to detect and alert you to issues and anomalies when new data arrives or changes occur.
- Ensure a reliable scheduled refresh; i.e. Don't rely on local file sources, like Excel or CSV files on a user machine.
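For instance, a small scheduled script could verify that the last scheduled refresh actually succeeded and raise an alert if it didn't. This sketch uses the Power BI REST API's refresh history endpoint; the IDs and token are placeholders, and you should verify the endpoint and response fields against the current API documentation before relying on it.

```python
import requests

WORKSPACE_ID = "<workspace-guid>"    # placeholder
DATASET_ID = "<dataset-guid>"        # placeholder
ACCESS_TOKEN = "<aad-access-token>"  # placeholder, e.g. from a service principal via MSAL

def latest_refresh_status() -> str:
    """Return the status of the most recent refresh of the semantic model,
    using the Power BI REST API refresh history endpoint."""
    url = (
        f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
        f"/datasets/{DATASET_ID}/refreshes?$top=1"
    )
    response = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
    response.raise_for_status()
    return response.json()["value"][0]["status"]

status = latest_refresh_status()
if status != "Completed":
    # Hook this into whatever alerting you use (Teams webhook, email, ticket, ...).
    print(f"ALERT: last scheduled refresh ended with status '{status}'")
```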
A polished solution is elegant, organized, and professional in its aesthetic / user experience.
An example for Semantic Models:
Here are some examples of how you might make a more polished semantic model (a small sketch follows the list):
- Organize measures and other model objects (like columns) into display folders.
- Use descriptions for measures and columns that describe what they are and how they should be used.
- Use format strings to effectively display figures.
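Some of this polish can even be checked automatically. The sketch below scans a model definition for measures that are missing a description or format string; it assumes the model is saved as a model.bim (TMSL) JSON file with the usual tables/measures structure, so the path and property names may need adjusting for your setup (for example, if you use the TMDL format instead).

```python
import json
from pathlib import Path

MODEL_PATH = Path("MySolution.SemanticModel/model.bim")  # illustrative path

model = json.loads(MODEL_PATH.read_text(encoding="utf-8"))

for table in model["model"].get("tables", []):
    for measure in table.get("measures", []):
        # Flag polish gaps rather than silently shipping them.
        if not measure.get("description"):
            print(f"Measure '{measure['name']}' has no description")
        if not measure.get("formatString"):
            print(f"Measure '{measure['name']}' has no format string")
```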
A solution that’s easy to use allows the intended audience to get value with the least amount of learning time.
An example for Semantic Models:
Here are some examples of how you might make a semantic model easier to use:
- Use clear, consistent, business-friendly naming conventions.
- Hide and organize model objects like key columns or measures that only work in a specific context.
- Have clear, concise descriptions for measures and other objects, and have functionally relevant documentation (i.e. don't just do a dump of all the DAX measures to a CSV... please).
A solution that’s robust can withstand new data or changes without regression.
An example for Semantic Models:
Here are some examples of how you might make a semantic model more robust (see the sketch after this list):
- Set up error handling where needed in DAX or Power Query code.
- Have limited dependencies that aren't adversely affected by changes to other objects, like measures, tables, or relationships.
- Avoid unnecessary complexity and customization; keep it simple... sphinx.
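Error handling in the model itself lives in DAX or Power Query (think of Power Query's try ... otherwise pattern), but the same defensive idea can be sketched in Python for an upstream preparation step: quarantine bad rows instead of letting them break the load. The file paths and column names below are invented for illustration.

```python
import pandas as pd

def load_sales(path: str) -> pd.DataFrame:
    """Load a sales extract defensively: bad rows are quarantined for review
    rather than failing the whole load."""
    df = pd.read_csv(path)

    # Coerce instead of crash: unparseable values become NaN/NaT.
    df["OrderDate"] = pd.to_datetime(df["OrderDate"], errors="coerce")
    df["Amount"] = pd.to_numeric(df["Amount"], errors="coerce")

    bad_rows = df[df["OrderDate"].isna() | df["Amount"].isna()]
    if not bad_rows.empty:
        bad_rows.to_csv("sales_quarantine.csv", index=False)
        print(f"Quarantined {len(bad_rows)} bad rows for investigation")

    return df.drop(bad_rows.index)
```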
DATAOPS CAN IMPROVE QUALITY IN DATA AND ANALYTICS
DataOps is a methodology that aims to improve quality and reduce the time to deliver analytics, without introducing unwanted complexity. It does this by relying on a set of core principles and practices to create a structured framework around a data solution’s development lifecycle. As such, DataOps is not a firm concept, and can feel vague or difficult to understand if you’re unfamiliar with similar methodologies like DevOps or Agile.
This article aims to introduce DataOps so you understand how it improves quality in data and analytics. Future articles will elaborate further on this and related points. In the meantime, if you want a deep introduction to DataOps, you can read the DataOps Cookbook from DataKitchen. I also recommend John Kerski’s blog, particularly his series Bringing DataOps to Power BI.
A rambling goblin appears. It pulls out a soapbox and stands upon it, addressing you:
When reading about DataOps or DevOps, you're likely confronted with many other terms like VizOps (for visualizations) or similar terms for other discrete parts of the analytics pipeline. I personally find this pedantic; I suggest you just focus on the underlying principles of DataOps and how they can benefit you.
The rambling goblin refuses to leave. It continues:
To reiterate, there are many, equally valid ways to improve the quality of your data solutions, like data contracts, iterative feedback, or peer-review cycles.
Examples of how you improve quality with DataOps are listed below:
To create structure in your analytics pipeline, you should use distinct environments (like workspaces) for different stages of solution development. Publishing changes directly to a solution that’s used by the business (in production) creates significant risk of disruption. Instead, it’s better to have separate environments to reduce this risk and enable more rigorous processes.
An example for Semantic Models:
You typically should publish a semantic model to separate workspaces for development (Dev), validation (Test) and production (Prod). You can promote (or deploy) the model between these environments by using Power BI Deployment Pipelines and/or Azure DevOps Pipelines.
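If you want to automate that promotion, the deployment pipelines REST API exposes a deploy operation you can call from a script or a DevOps pipeline. The sketch below is an assumption-heavy illustration: the pipeline ID, token, stage numbering, and request body are placeholders, so check the current Power BI REST API reference before using anything like it.

```python
import requests

PIPELINE_ID = "<deployment-pipeline-guid>"  # placeholder
ACCESS_TOKEN = "<aad-access-token>"         # placeholder, e.g. a service principal token

def deploy_test_to_prod() -> None:
    """Promote content from the Test stage to the Production stage of a
    Power BI deployment pipeline, typically after UAT sign-off."""
    url = f"https://api.powerbi.com/v1.0/myorg/pipelines/{PIPELINE_ID}/deployAll"
    body = {
        "sourceStageOrder": 1,  # illustrative: 0 = Dev, 1 = Test, 2 = Prod
        "options": {"allowCreateArtifact": True, "allowOverwriteArtifact": True},
    }
    response = requests.post(url, json=body,
                             headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
    response.raise_for_status()
    print("Deployment from Test to Prod requested")

deploy_test_to_prod()
```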
Testing is integral to DataOps, and it’s critical to produce high-quality data solutions. Typically, these tests consist of code or scripts created in advance which can be run on a dedicated environment, automatically. The benefit of testing is that you identify issues before they reach production (and business users), and can address them proactively.
With DataOps, your tests must be well-designed and they must be automated.
Well-designed tests are sufficient to accurately validate a wide range of parameters. These tests should address both the code (like DAX) and the data (like table rows) in your solution. Ideally, these tests can be re-used for other solutions, too. There are different types of tests you can run, such as:
- Logic tests ensure that the business logic in semantic models is sound and fits expectations.
- Data input tests ensure that “bad data” doesn’t cause part of the solution to fail.
- Data accuracy tests validate results against benchmarks to ensure consistent outputs.
- Performance tests ensure that the solution is efficient with the available resources/concurrency.
- Regression tests ensure that changes don’t adversely affect the existing state of the solution.
- Security tests validate enforced security rules and identify flaws or vulnerabilities.
Automated tests are pre-prepared and machine-run on a separate environment. These tests run periodically, or when changes are made. The results are saved as documentation, or to reference later. There are different ways that you can automate tests, such as:
- Running scheduled code, queries, or scripts from notebooks or other tools.
- Simulating user interaction with an interface.
- Testing against best practice rules (such as with the best practice analyzer).
Good testing fosters trust and creates a sustainable, scalable environment for your data and analytics.
An example for Semantic Models:
Automated regression tests could validate a known figure that should never change, like an annual budget. These tests could run a DAX query to retrieve the budget for certain customers, comparing to the baseline. If a change in code or data produces an incorrect result, the tests could alert the model developer.
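Here's a minimal sketch of what that regression test might look like in Python. The `evaluate_dax` and `send_alert` helpers are hypothetical stand-ins for your query mechanism and alerting channel, and the measure, customers, and baseline figures are invented for illustration.

```python
def evaluate_dax(dax_query: str) -> list[dict]:
    """Hypothetical helper: run a DAX query against the semantic model and
    return its rows (e.g. via semantic link or ADOMD.NET)."""
    raise NotImplementedError

def send_alert(message: str) -> None:
    """Hypothetical helper: notify the model developer (Teams, email, ...)."""
    print(f"ALERT: {message}")

# Known-good baselines that should never change, e.g. approved annual budgets.
BASELINES = {"Contoso": 1_500_000.0, "Fabrikam": 980_000.0}

def test_budget_regression() -> None:
    for customer, expected in BASELINES.items():
        query = (
            'EVALUATE ROW("Budget", CALCULATE([Total Budget], '
            f'Customer[Customer] = "{customer}"))'
        )
        actual = evaluate_dax(query)[0]["[Budget]"]  # key depends on your query tool
        if abs(actual - expected) > 0.01:
            send_alert(f"Budget for {customer} changed: expected {expected}, got {actual}")

test_budget_regression()
```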
Remember — quality is end-to-end and it’s continuous. It’s insufficient to test your solution only during development. Once it’s released, you must monitor the solution to ensure it remains reliable and robust. This monitoring similarly involves conducting regular, automated tests to identify issues or anomalies and trigger proactive action.
Typically, monitoring involves continuing to collect quality metrics from various tests after a solution is released. This data is then aggregated and presented in a monitoring report, with alerts to call attention to any metrics that fall outside acceptable limits.
An example for Semantic Models:
A monitoring solution could regularly query source data to find anomalies, like missing or invalid data. If the query identifies these anomalies in new data, it could trigger an alert for data teams to investigate. Then, it could trigger an automation flow to prevent a scheduled refresh, preventing users from seeing wrong data in connected reports and notebooks.
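A rough sketch of that kind of monitoring check follows, with hypothetical helpers for fetching new source rows, alerting, and blocking the refresh; the column names and rules are illustrative only.

```python
import pandas as pd

def get_new_source_rows() -> pd.DataFrame:
    """Hypothetical helper: fetch rows that arrived since the last load
    (from a warehouse view, lakehouse table, API, ...)."""
    raise NotImplementedError

def send_alert(message: str) -> None:
    """Hypothetical helper: notify the data team."""
    print(f"ALERT: {message}")

def block_scheduled_refresh() -> None:
    """Hypothetical hook: pause or skip the model's scheduled refresh so users
    don't see wrong data in connected reports and notebooks."""
    print("Scheduled refresh blocked pending investigation")

def monitor_new_data() -> None:
    df = get_new_source_rows()
    problems = []
    if df.empty:
        problems.append("no new rows arrived since the last load")
    if df["Amount"].isna().any():
        problems.append("missing Amount values in new rows")
    if (df["Amount"] < 0).any():
        problems.append("negative Amount values in new rows")

    if problems:
        send_alert("; ".join(problems))
        block_scheduled_refresh()

monitor_new_data()
```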
DataOps is a big topic, and there are many ways that it can benefit your data and analytics. However, there are many challenges to implementing DataOps in Power BI and Fabric; those details go beyond the scope of this article. For now, it’s important to be aware of what DataOps is, and to start considering how it can benefit you.
TO CONCLUDE
Quality is critical to achieving success in data and analytics. But despite how obvious this is, we still routinely prioritize novelty and speed over quality. This makes it difficult for us to realize value from data solutions, and will make it impossible for us to achieve success with new platforms and technologies.
It’s essential to know what quality looks like in a data solution. Remember the REAPER: Reliable, Efficient, Automated, Polished, Easy-to-use, and Robust. If your solution lacks these elements, it will kill adoption and effectiveness.
One way to keep the reaper at bay is to understand DataOps principles and implement DataOps practices. DataOps is a methodology to improve both quality and agility, but it requires a collective discipline to ensure success. Moving forward, quality will only become more important.
So I’ll reiterate: let’s take up arms, and put quality at the forefront of our data solutions and initiatives.