By Maarten Masschelein, CEO & Co-Founder
We were thrilled to sponsor a global hackathon at HelloFresh, which challenged cross-market teams of HelloFresh employees to build a ‘Standard Data Quality Dashboard’.
Subtitled “In Data We Trust?!”, the hackathon was part of HelloFresh’s efforts to improve data literacy and data quality across the organization in an engaging and fun way. Whilst the organization has been building its data capabilities, maintaining trust in data quality and integrity remains top of mind. The data governance department, led by Dr. Kinda El Maarry, wanted a standard way of reporting on data quality that would also achieve high adoption rates. Hackathon teams were therefore tasked with creating a dashboard that reduced the mental load needed to understand various formats of data quality reporting, enabled easy comparisons of data assets, and freed up the time and resources spent reacting to data quality issues, so that they can be devoted to adding value through deeper insight and analysis.
Soda was delighted to support and sponsor this hackathon. We believe that data quality is a team sport and that everyone who has a stake in the data should be able to trust it and understand it. It was fascinating to see the results! So, with six teams battling it out for a fantastic prize for each team member, plus some very cool HelloFresh/Soda swag up for grabs, what did our contestants come up with?
The teams all identified the six dimensions that are used to assess the quality of data within a dataset: accuracy, completeness, consistency, timeliness, uniqueness and validity. They understood that the dashboard should give a high-level executive summary of how selected datasets performed against each of these dimensions, highlighting at a glance whether and where data was falling short of the approved quality threshold.
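To make the idea of an at-a-glance summary concrete, here is a minimal Python sketch of how a dashboard might flag which dimensions fall below a threshold. The dimension names come from the teams' work described above; the function name, dataset names, scores and the 0.95 threshold are all hypothetical, and this is not how any team or Soda actually implemented it.

```python
# Dimension names are taken from the article; every dataset name, score and
# the threshold below are invented purely for illustration.
DIMENSIONS = (
    "accuracy", "completeness", "consistency",
    "timeliness", "uniqueness", "validity",
)


def executive_summary(dataset_scores, threshold):
    """Map each dataset to the list of dimensions scoring below the threshold."""
    summary = {}
    for dataset, scores in dataset_scores.items():
        summary[dataset] = [dim for dim in DIMENSIONS if scores[dim] < threshold]
    return summary


# Hypothetical scores between 0 and 1 for two made-up datasets.
example_scores = {
    "orders": {
        "accuracy": 0.99, "completeness": 0.97, "consistency": 0.98,
        "timeliness": 0.80, "uniqueness": 1.00, "validity": 0.96,
    },
    "recipes": {
        "accuracy": 0.98, "completeness": 0.99, "consistency": 0.97,
        "timeliness": 0.96, "uniqueness": 1.00, "validity": 0.99,
    },
}

for dataset, failing in executive_summary(example_scores, threshold=0.95).items():
    status = "OK" if not failing else "below threshold: " + ", ".join(failing)
    print(f"{dataset}: {status}")
```

A real dashboard would pull these scores from monitoring results rather than a hard-coded dictionary, but the pass/fail roll-up per dimension is the part an executive would see.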
Being able to understand in simple terms where data quality is falling short is a powerful way to boost data awareness amongst those not working within a specialised data team. Team “Fruit Salad” identified that its dashboard should be comprehensible to a Head of Production or Operations, offering high-level information that was neither so granular that only data engineers and analysts could understand it, nor so topline that it suited only a senior executive audience.
The ability to delve deeper into data quality and see exactly where datasets are failing was another key capability that all of the teams built into their dashboards. Team “Data Chefs” suggested building the ability to pinpoint the exact dates and times at which data that did not meet the six dimensions was added to a dataset, enabling streamlined tracing and quarantine of bad data. “Two Nordics Guys” suggested a dashboard that allowed data engineers to view the rules behind each data quality dimension, so they could test and validate which specific rules and metrics caused the majority of data quality issues.
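As a rough illustration of the pinpointing idea, the Python sketch below counts failing records per load timestamp so that bad batches can be traced. The record layout (a `loaded_at` timestamp and an `email` field), the validity rule and the function name are assumptions made for this example, not details from the teams' entries.

```python
from collections import Counter
from datetime import datetime


def failing_load_times(records, is_valid):
    """Return (load timestamp, failing record count) pairs, oldest first."""
    counts = Counter(r["loaded_at"] for r in records if not is_valid(r))
    return sorted(counts.items())


def has_plausible_email(record):
    """Hypothetical validity rule: the email is present and contains '@'."""
    email = record["email"]
    return email is not None and "@" in email


# Made-up records: the second batch contains two invalid rows.
records = [
    {"loaded_at": datetime(2021, 5, 1, 9, 0), "email": "a@example.com"},
    {"loaded_at": datetime(2021, 5, 2, 9, 0), "email": None},
    {"loaded_at": datetime(2021, 5, 2, 9, 0), "email": "not-an-email"},
    {"loaded_at": datetime(2021, 5, 2, 9, 0), "email": "b@example.com"},
]

for loaded_at, bad_rows in failing_load_times(records, has_plausible_email):
    print(f"{loaded_at:%Y-%m-%d %H:%M}: {bad_rows} failing record(s)")
```

Knowing which load introduced the failures is what makes quarantine practical: only the affected batch needs to be held back.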
Team “Tricolour Kiwis” took this a step further and suggested the ability to weight the importance of each data quality dimension for each data asset. The ability to drill down into this level of granular detail, and to zoom all the way out to a top-level overview, is instrumental in encouraging meaningful interactions with data across all levels and functions of an organization.
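One simple way to picture per-asset weighting is a weighted average of the dimension scores. The Python sketch below assumes scores between 0 and 1 and non-negative weights; the function name, the scores and the weights are hypothetical and only illustrate the concept.

```python
def weighted_quality_score(scores, weights):
    """Weighted average of dimension scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one dimension needs a positive weight")
    return sum(scores[dim] * weight for dim, weight in weights.items()) / total_weight


# Invented scores for a single data asset.
scores = {
    "accuracy": 0.9, "completeness": 0.8, "consistency": 1.0,
    "timeliness": 0.5, "uniqueness": 1.0, "validity": 0.9,
}

# Two hypothetical weightings: all dimensions equal, and one where
# timeliness matters three times as much as the others.
equal_weights = {dim: 1 for dim in scores}
timeliness_heavy = {**equal_weights, "timeliness": 3}

print(weighted_quality_score(scores, equal_weights))
print(weighted_quality_score(scores, timeliness_heavy))
```

With these made-up numbers, the asset looks worse under the timeliness-heavy weighting, because its weakest dimension counts for more. That is the effect per-asset weighting is meant to surface.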
What was absolutely clear in all of these entries was the importance of making data monitoring accessible to every part of an organization, creating complete transparency and providing end-to-end observability — that’s a sentiment we can get behind!
Underpinning the data quality metrics displayed by these dashboards is Soda’s Data Observability Platform. I was pleased to present my thoughts on best practices and tips for linking metrics to data quality dimensions, based on my experience in providing data monitoring solutions to organizations just like HelloFresh. I also shared our product design thinking and early ideas for data quality reporting and visualizations, a feature that we are releasing later this year.
The ability to monitor data in order to ensure good data quality is critical to making good business decisions. My presentation at the hackathon touched on our first open-source project, Soda SQL, which is helping us to champion the engineering principles of test-driven development and apply them to the principles and framework of our data monitoring platform.
Monitoring these test results couldn’t be easier: a free Soda Cloud account gives you a web application where you can view results and collaborate on them.
At Soda, we believe the foundation of data-driven decision making is good quality data. If you want to get hands-on with your data and use the Soda Hackathon-in-a-Box like the HelloFresh team, please get in touch with our events team, who will be happy to set you up (swag included). To see what a difference it could make for your organization, go ahead and visit our Soda SQL GitHub page, and sign up for your free Soda Cloud account today!