Scaling Data Trust: How Auto Trader Migrated to a Decentralized Data Platform with Monte Carlo

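Leading companies are moving toward greater data democratization through decentralized data platforms, but without the right governance and adequate visibility, data quality can suffer and trust in data can erode. That’s where data observability comes in.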

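Here’s how Auto Trader’s data engineering team automated monitoring and alerting with Monte Carlo while decentralizing responsibility and improving data reliability.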

Manchester-based Auto Trader is the largest digital automotive marketplace in the United Kingdom and Ireland. For Auto Trader, connecting millions of buyers with thousands of sellers involves an awful lot of data. 

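The company sees 235 million advert views and 50 million cross-platform visits each month, with thousands of interactions every minute, all of them data points the Auto Trader team can analyze and use to improve efficiency, customer experience, and, ultimately, revenue. Data also drives commercial outcomes, from advert optimization to reporting to ML-based vehicle valuations.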

For Principal Developer Edward Kent and his Data Engineering team, collecting and processing this enormous amount of data is no small feat. Recently, the data team has been focused on two key missions. 

“We want to empower Auto Trader and its customers to make data-informed decisions,” said Edward, “and democratize access to data through a self-serve platform.” 

These ambitious goals coincided with a migration to a modern, cloud-based data architecture, which meant Edward and his team had to make data easier for more teams to access while also building trust in data quality.

The challenge: building trust and empowering self-serve data 

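“As we’re migrating trusted on-premises systems to the cloud, the users of those older systems need to trust that the new cloud-based technologies are as reliable as the old systems they used in the past,” said Edward.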

Today, Edward and his team have a robust data stack. They ingest data through Kafka and Fivetran, handle orchestration and scheduling in Apache Airflow, store data in BigQuery and Amazon S3, use dbt and Apache Spark for modeling, Databricks for data science notebooking, and surface data to internal consumers through Looker. 

And there are a lot of eyeballs on that data. Over 500 active users (more than 50% of all Auto Trader employees!) log in and engage with data in Looker every month, including complex, higher-profile data products such as financial reporting. Of course, with massive volumes of data and a multi-layered tech stack, there are plenty of opportunities for data pipelines to break, and these incidents were almost always first noticed by one of those 500 data consumers.

Edward and his team needed to address data downtime (periods of time when their data was partial, erroneous, or otherwise inaccurate) to improve trust, but at the same time, they were functioning as a bottleneck for the rest of the company. Their centralized team of data engineers handled everything related to data operations, from requests to build new pipelines and reports to urgent calls to investigate data quality issues. The approach didn’t scale and resulted in a backlog of requests, which led Edward and his team to develop a plan to build an abstracted, self-serve platform that business consumers could use on their own.

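“Rather than having a single data team that does everything, we want to give teams the platform-level capabilities and autonomy to build their own data products,” said Edward. “Ideally, we want teams to be able to manage everything in their data pipeline life cycle. So everything from ingestion to modeling to alerting, et cetera. It was in that context that we wanted to offer data observability as a capability of the platform.”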

Enter Monte Carlo. 

The solution: Incorporate Monte Carlo for data observability

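We partnered with Auto Trader to integrate the Monte Carlo data observability platform into their existing data platform. Specifically, they wanted to add a layer of monitoring, alerting, and lineage across BigQuery and Looker, the most visible layers of their data stack.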

“It’s more important for us than ever before that the data we’re serving out is correct, accurate, and up-to-date,” said Edward. “So that’s where the concept of data observability comes in.” 

Auto Trader uses Monte Carlo to run volume and freshness checks, as well as schema change alerts, on all tables in BigQuery. Edward’s team also opted into a suite of ML-powered statistical checks on a number of key tables, which makes it easy to gain column-level confidence without the tedious process of defining thresholds. Monte Carlo has also helped them understand all the dependencies in their data. “We found it hard to understand which tables in BigQuery appeared in which reports in Looker, and vice versa,” Edward said. “The lineage tracking across those two systems was really powerful for us.”
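Monte Carlo’s monitors are proprietary and learned from each table’s history, so what follows is not its implementation. It is a minimal Python sketch, assuming update timestamps and row counts have already been collected from warehouse metadata, of how freshness and volume checks can derive a baseline from history instead of a hand-set threshold. The function names and the `tolerance` and `z_threshold` parameters are illustrative assumptions.

```python
from datetime import datetime
from statistics import mean, median, stdev


def freshness_alert(update_times, now, tolerance=2.0):
    """Return True if the time since the last update is more than `tolerance`
    times the table's typical (median) historical update interval."""
    times = sorted(update_times)
    if len(times) < 2:
        return False  # not enough history to learn an update pattern
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    typical_interval = median(intervals)
    return (now - times[-1]) > typical_interval * tolerance


def volume_alert(row_count_history, latest, z_threshold=3.0):
    """Return True if `latest` lies more than `z_threshold` standard deviations
    from the mean of the table's recent row counts."""
    if len(row_count_history) < 2:
        return False  # stdev needs at least two data points
    mu = mean(row_count_history)
    sigma = stdev(row_count_history)
    if sigma == 0:
        return latest != mu  # any change from a perfectly constant history
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical table that has loaded once an hour from 00:00 to 09:00
hourly = [datetime(2024, 1, 1, hour) for hour in range(10)]
print(freshness_alert(hourly, now=datetime(2024, 1, 1, 10)))  # False: 1h gap, as usual
print(freshness_alert(hourly, now=datetime(2024, 1, 1, 14)))  # True: 5h gap
print(volume_alert([1000, 1020, 990, 1010], latest=1005))     # False
print(volume_alert([1000, 1020, 990, 1010], latest=200))      # True
```

A z-score on raw counts is about the simplest volume model there is; a production detector would also need to account for trend and seasonality, such as weekday versus weekend loads.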

Outcome: Incident tracking that builds trust

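Before Monte Carlo, the data engineering team would find out about data quality issues when internal users sent them Slack messages about Looker reports that didn’t look quite right.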

The data team would then have to a) investigate whether there was a genuine issue; b) try to determine the root cause; c) figure out which tables or dashboards were downstream and which consumers would be affected; and d) finally, track down the relevant stakeholders, notify them that an issue had been found, estimate when it would be resolved, and follow up again once the problem was fixed. Edward described the process as “reactive, slow, and unscalable.”

Now, with Monte Carlo, Edward’s team receives a Slack notification when a possible incident is detected. 

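They can investigate issues much more quickly thanks to automated, end-to-end lineage, which helps data engineers understand possible upstream and downstream impacts down to the field level. Monte Carlo also provides visibility into freshness, volume, and distribution changes, which helps them review update patterns and notice suspicious changes.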
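Monte Carlo builds field-level lineage automatically from warehouse and BI metadata. The Python sketch below instead assumes a much simpler, hand-written table-level lineage graph (every asset name in it is hypothetical) and shows how both questions from the incident workflow, “what is downstream of this?” and “where could this have come from?”, reduce to a breadth-first reachability search over that graph and its reverse.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search: every node reachable from `start` in a graph given
    as {node: [direct neighbours]}. `start` itself is excluded unless a cycle
    leads back to it."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def invert(graph):
    """Reverse every edge, turning a downstream graph into an upstream one."""
    reverse = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return reverse


# Hypothetical table-level lineage: upstream asset -> direct downstream assets
lineage = {
    "raw.adverts": ["staging.adverts"],
    "staging.adverts": ["marts.advert_performance", "marts.valuations"],
    "marts.advert_performance": ["looker.dashboard.advert_kpis"],
    "marts.valuations": ["looker.dashboard.pricing"],
}

# Downstream impact of an incident in staging.adverts
print(sorted(reachable(lineage, "staging.adverts")))
# ['looker.dashboard.advert_kpis', 'looker.dashboard.pricing', 'marts.advert_performance', 'marts.valuations']

# Possible upstream causes of a broken pricing dashboard
print(sorted(reachable(invert(lineage), "looker.dashboard.pricing")))
# ['marts.valuations', 'raw.adverts', 'staging.adverts']
```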

With Monte Carlo, communication is also streamlined. Edward’s team no longer has to track stakeholders down one by one; notifications about issues and their resolution can be sent through Monte Carlo to the relevant teams’ Slack channels.

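With automated monitoring and alerting in place, and lineage speeding up incident time-to-resolution, the data engineering team has built greater trust with stakeholders by proactively resolving data issues before downstream consumers discover them.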

Outcome: Scalable monitoring and visibility into “unknown unknowns”

Edward also credits Monte Carlo for its machine-learning powered monitoring. 

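“To get value, we don’t need to know what we need to monitor,” said Edward. “Monte Carlo can start looking for patterns and will alert us to any anomalies and deviations from those patterns.”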

For example, Monte Carlo spotted an unexpected deletion of 150,000 rows in a table where deletions are rare. A data engineer was able to dig into Monte Carlo and see that rows had been deleted from a table that usually only sees additions. Using lineage tracking, they could look at what was upstream of that data, see that it came in through ETL from an external source, and check with the data owner whether the deletion was legitimate and intended, or whether something had gone wrong.
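The pattern in this example, rows disappearing from a table that normally only grows, can be approximated with a simple rule. The Python sketch below is an illustration under stated assumptions, not how Monte Carlo’s ML models work: the input is a hypothetical series of row-count snapshots for one table, and a table counts as append-mostly if at most `max_historical_drop_rate` (an assumed parameter) of its past changes were decreases.

```python
def rows_unexpectedly_deleted(row_counts, max_historical_drop_rate=0.01):
    """Given successive row-count snapshots for one table, return how many rows
    disappeared in the latest change if the table historically almost never
    shrinks; otherwise return None."""
    if len(row_counts) < 3:
        return None  # need some past changes plus the latest one to compare
    deltas = [after - before for before, after in zip(row_counts, row_counts[1:])]
    history, latest = deltas[:-1], deltas[-1]
    drop_rate = sum(1 for d in history if d < 0) / len(history)
    if latest < 0 and drop_rate <= max_historical_drop_rate:
        return -latest  # size of the unexpected drop
    return None


# Made-up snapshots for an append-mostly table, ending in a large drop
snapshots = [1_000_000, 1_010_000, 1_025_000, 1_040_000, 890_000]
print(rows_unexpectedly_deleted(snapshots))  # 150000

# A table that routinely shrinks is not flagged
print(rows_unexpectedly_deleted([100, 90, 110, 95, 80]))  # None
```

Learning whether a table is append-mostly from its own history, rather than declaring it by hand, is what lets a rule like this run across hundreds of tables without per-table configuration.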

“In this case, no action was needed,” said Edward. “But it’s really valuable to know that this sort of thing is happening with our data, because it gives us confidence that if there were a genuine issue, we’d find out about it in the same way.”

The data engineering team uses custom SQL rules and dbt for manual testing, but due to the scale of their data, relies on Monte Carlo as the bedrock of their monitoring so they can catch “unknown unknowns.”

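“Whether it’s custom SQL rules or dbt tests, you have to configure them up front,” said Edward. “You have to know in advance what it is you’re going to monitor, and go through the process of setting it up. For us, we have hundreds of data models defined and hundreds of tables built daily. What we wanted was something that would get up and running effectively without requiring too much effort from us. The schema checks, volume checks, and freshness checks that Monte Carlo provides do exactly that.”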

Outcome: Empowering Auto Trader’s self-service data platform

Monte Carlo also supports Auto Trader’s transition to a decentralized, self-serve data program—without compromising on data quality. 

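Under this new model, decentralized alerts are routed to the appropriate team’s alert channel. Edward and his team require data ownership and alerting to be defined as metadata, alongside other properties, in their dbt .yaml files, so product teams that own a particular dataset automatically receive Monte Carlo alerts in their own channels.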
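The ownership-based routing can be sketched in Python as follows. The sketch assumes teams declare hypothetical `owner` and `alert_channel` keys under `meta` in their dbt model properties, and that this metadata is read from a dict shaped like the `nodes` section of dbt’s `manifest.json`. The key names, the fallback channel, and both functions are illustrative; they are not Monte Carlo’s configuration format.

```python
# Assumed (hypothetical) convention in a dbt model properties .yml file:
#   models:
#     - name: advert_performance
#       config:
#         meta:
#           owner: advertising-team
#           alert_channel: "#advertising-data-alerts"

DEFAULT_CHANNEL = "#data-platform-alerts"  # hypothetical fallback channel


def build_routing_table(manifest):
    """Map each dbt model name to (owner, alert channel) from its meta block."""
    routes = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # seeds, tests, snapshots, etc. are ignored in this sketch
        # Merge top-level meta with config.meta; config.meta wins on conflicts
        meta = {**node.get("meta", {}), **node.get("config", {}).get("meta", {})}
        routes[node["name"]] = (
            meta.get("owner", "unowned"),
            meta.get("alert_channel", DEFAULT_CHANNEL),
        )
    return routes


def route_alert(routes, model_name, message):
    """Return (channel, text) for an alert about `model_name`."""
    owner, channel = routes.get(model_name, ("unowned", DEFAULT_CHANNEL))
    return channel, f"[{owner}] {message}"


# Trimmed, manifest-like input (only the fields this sketch reads)
manifest = {
    "nodes": {
        "model.my_project.advert_performance": {
            "resource_type": "model",
            "name": "advert_performance",
            "config": {"meta": {"owner": "advertising-team",
                                "alert_channel": "#advertising-data-alerts"}},
        },
        "model.my_project.valuations": {
            "resource_type": "model",
            "name": "valuations",
            "config": {"meta": {}},
        },
        "seed.my_project.lookups": {"resource_type": "seed", "name": "lookups"},
    }
}

routes = build_routing_table(manifest)
print(route_alert(routes, "advert_performance", "Volume anomaly detected"))
# ('#advertising-data-alerts', '[advertising-team] Volume anomaly detected')
print(route_alert(routes, "valuations", "Freshness anomaly detected"))
# ('#data-platform-alerts', '[unowned] Freshness anomaly detected')
```

Keeping ownership next to the model definition means a routing change ships through the same review process as the model itself, and the fallback channel catches any model whose team has not yet declared an owner.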

“Decentralized data ownership means decentralized responsibility for data observability,” said Edward. “Monte Carlo helps us provide this platform capability.”

The impact of data observability at Auto Trader

As Auto Trader seeks to build trust in data while opening up access, data observability is key to ensuring data remains accurate and reliable. 

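“We have far more visibility into what’s going on with our data than we’ve ever had before,” said Edward. “Previously, a lot of these issues would have been discovered and reported by data consumers, whereas now Monte Carlo has already flagged them. From a tracking perspective, that visibility is really important for us as we move toward a decentralized data platform.”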

Interested in learning how data observability could improve your team’s visibility, response time, and efficiency? Reach out to Will Robins and the rest of the Monte Carlo team.