Data Observability: How Clearcover Increased Quality Coverage for ELT by 70 Percent

As organizations continue to ingest more data from more sources, maintaining high-quality, reliable data assets becomes a critical challenge. That’s why Clearcover partners with Monte Carlo to ensure end-to-end data observability across ELT and beyond.

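The Chicago-based team at Clearcover, a leading tech-driven insurance provider, prides itself on providing fast, transparent car insurance rates across the U.S. They also take pride in their data, which they see as a competitive advantage that allows them to deliver on their promise to consumers: to save money while enjoying fast claims, easy payments, and exceptional service.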

We recently interviewed Braun Reyes, senior manager of data engineering, to discuss Clearcover’s journey of scaling a self-service data platform while maintaining the trust and quality needed to achieve their data-powered mission, and how they responded when data reliability issues began to arise.

The data landscape at Clearcover

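When Braun joined Clearcover, he was a one-man data engineering team, and he describes the data stack at the time as a “data for analytics” approach. He used Amazon RDS backends, Alooma for data replication into Redshift, and an Apache Airflow stack running on Kubernetes. Braun was also responsible for creating prepared data sets.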

Clearcover’s first data stack relied on analysts to query sources directly, with one data engineer (Braun) responsible for cleaning, processing, and storing prepared data. This led to recurrent bottlenecks that slowed down analysis and time to insights. Image courtesy of Braun Reyes.

But with a one-man data engineering team responsible for so many layers, bottlenecks began to emerge. Braun was spending more time maintaining his tools than actually using them to deliver data. With these bottlenecks and a lack of accessibility to (and therefore trust in) the data, many data consumers found workarounds by simply querying the source data directly.

“All this investment that I was making in this data stack was for naught,” said Braun.

So, as the data engineering team grew, they migrated to a modern, distributed ELT data stack. They switched from Airflow to Prefect and began using AWS Fargate to reduce the total cost of ownership of compute. The team also adopted Fivetran and Snowflake, which provides the accessibility that Redshift lacked for their needs.

Clearcover’s “modern data stack” evolved to include Snowflake, Prefect, dbt, and Fivetran, as well as separate analytics engineering and data engineering layers. Image courtesy of Braun Reyes.

Now, the data engineering layer is much smaller and focused on handling raw data, while a separate analytics engineering team takes the tools and the raw data they deliver to produce prepared data for the business, helping eliminate those earlier bottlenecks.

Problem: Lack of trust in source data

While the distributed data stack made it much easier to integrate more data sources into Snowflake, Braun’s team encountered a new problem: data reliability.

As data sources proliferated, it became harder and harder for data engineers to scale data quality testing across pipelines manually. So while they had gained operational trust in the data, they now had to deal with data quality and trust issues.

When the team started replicating data sources to Snowflake to expedite the analytics engineering workflow, they encountered two paths: manually write data quality checks (a time-intensive and unscalable process) or invest in automated coverage. Image courtesy of Braun Reyes.

“ELT is great, but there’s always a tradeoff,” said Braun. “For example, when you’re replicating data from your CRM into Snowflake, your data engineering team is not necessarily going to be the domain experts on CRM or marketing systems. So it’s really hard for us to tailor data quality testing across all of those sources.”

When data quality issues came up, it was often a surprise to the data engineering team. They would receive Slack messages from data consumers asking why a certain table hadn’t been updated in days.

“Your pipelines are running, everything looks good, but then you realize that the data you’re delivering is either incorrect or it just hasn’t arrived at all,” said Braun. “You’ve been so focused on the operational aspects of building the pipelines, delivering them, and testing the code that perhaps there wasn’t enough bandwidth to then think about the health of the data itself.”

Braun and his team knew they needed a base set of common-sense checks that would provide the coverage required to build data trust. But adding coverage kept getting pushed down their backlog, as they were busy handling new data sources requested by the business.

So they began exploring a faster, more holistic solution: Monte Carlo. Monte Carlo addresses end-to-end data observability, allowing users to get visibility into and an understanding of the overall health of their data through automated monitoring, alerting, and lineage. For Braun and his team, the concept of measuring data health across the five pillars of observability (freshness, volume, distribution, schema, and lineage) instantly resonated.
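To make the freshness pillar concrete, here is a minimal sketch that flags a table whose latest update gap is well outside its historical norm. The function name, the threshold rule (mean of past gaps plus three standard deviations), and the sample timestamps are assumptions made for illustration; the article does not describe how Monte Carlo's monitors are implemented.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(update_times, now, sigmas=3.0):
    """Return True if the time since the last update is unusually long.

    Hypothetical rule: compare the current gap against the mean of the
    historical gaps between updates plus `sigmas` standard deviations.
    """
    times = sorted(update_times)
    if len(times) < 3:
        raise ValueError("need at least three update timestamps")
    # Gaps between consecutive historical updates, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    threshold = mean(gaps) + sigmas * stdev(gaps)
    current_gap = (now - times[-1]).total_seconds()
    return current_gap > threshold


# Illustrative history: a table that updates roughly once a day.
start = datetime(2021, 6, 1)
history = [start + timedelta(hours=h) for h in (0, 24, 47, 72, 97)]
```

With this history, a check made 26 hours after the last update stays quiet, while a check made three days later is flagged. A rule learned from the table's own history is what lets this kind of check apply to every table without hand-tuned thresholds.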

“We’re all about simplicity on the data engineering team,” said Braun. “Framing the problem in that way made a lot of sense to us.”

Solution: Automated coverage for critical ELT pipelines

With Monte Carlo providing automated monitoring and alerting of data health issues, Braun and his team had instant base coverage for all of their tables.

“We no longer had to tailor specific tests to every particular data asset. All we had to do was sign up, add the security implementation to give Monte Carlo the access that it needed, and we could start getting alerted on issues. Monte Carlo gave us that out of the box.”

Critical step: Reducing alert white noise with dbt artifacts

With more than 50 sources providing a large volume of data, Braun also wanted to make sure that his team wasn’t being alerted to, and distracted by, data incidents that didn’t actually matter to the business.

To reduce the noise from automated monitoring, they built automation around parsing dbt artifacts to determine which raw tables were being used by the prepared data package. Then, they used the Monte Carlo GraphQL API to build automation around tagging those tables and forwarding incidents related to those key assets to a dedicated channel.
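The first half of that workflow might look something like the sketch below, which reads a dbt `manifest.json` artifact and lists the source tables that models in one package depend on. The package name `prepared_data`, the helper names, and the sample manifest are hypothetical, and only the manifest fields actually read here are relied upon. The tagging and incident-forwarding step is left out because the article does not say which Monte Carlo GraphQL operations the team used.

```python
import json


def load_manifest(path):
    """Load a dbt manifest artifact (typically target/manifest.json)."""
    with open(path) as f:
        return json.load(f)


def raw_tables_used(manifest, package_name):
    """Return the source tables that models in `package_name` read from."""
    sources = manifest.get("sources", {})
    used = set()
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        if node.get("package_name") != package_name:
            continue
        for dep in node.get("depends_on", {}).get("nodes", []):
            src = sources.get(dep)
            if src is None:
                continue  # dependency is another model, a seed, etc.
            table = src.get("identifier") or src["name"]
            used.add(f'{src["database"]}.{src["schema"]}.{table}'.lower())
    return sorted(used)


# Hypothetical, heavily trimmed manifest for illustration only.
sample_manifest = {
    "sources": {
        "source.prepared_data.zendesk.tickets": {
            "database": "RAW", "schema": "ZENDESK", "name": "tickets",
        },
        "source.prepared_data.zendesk.unused": {
            "database": "RAW", "schema": "ZENDESK", "name": "unused",
        },
    },
    "nodes": {
        "model.prepared_data.stg_tickets": {
            "resource_type": "model",
            "package_name": "prepared_data",
            "depends_on": {"nodes": ["source.prepared_data.zendesk.tickets"]},
        },
    },
}
```

Calling `raw_tables_used(sample_manifest, "prepared_data")` returns only `raw.zendesk.tickets`; the replicated but unreferenced table is excluded, which is the property that lets alerts on unused tables be routed away from the team's main channel.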

By using Monte Carlo and dbt artifacts to isolate key assets, Clearcover was able to reduce white noise and focus on the data that mattered most to their business. Image courtesy of Braun Reyes.

“We want to focus our attention on those things that are being used by the business,” said Braun. “By isolating those key assets in a specific Slack channel, my team can focus on just those particular incidents.”

This monitoring strategy had an immediate impact. With Monte Carlo, Braun’s team was able to proactively identify those silent failures, start conversations with stakeholders, and get ahead of problems faster, instead of receiving panicked Slack messages about missing or incorrect data from analytics or business teams.

Monte Carlo’s Slack integration makes it easy for Braun’s team to communicate incidents and their impact to the broader data org. Image courtesy of Braun Reyes.

Braun and his team were also able to begin preventing data issues from impacting the business.

“For example, we are not the domain experts on Zendesk,” said Braun. “But if Monte Carlo alerts us that some schema from that data source changed from number to string, we can reach out to the BI team and notify them so that when their prepared data package runs in the morning, they don’t experience any downtime.”
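The kind of change Braun describes can be pictured as a diff between two snapshots of a table's column types. The sketch below is an illustration only: the function name, column names, and type labels are assumptions, and it is not a description of how Monte Carlo detects schema changes.

```python
def diff_schema(previous, current):
    """Compare two {column: type} snapshots and describe what changed."""
    changes = []
    for col in sorted(previous.keys() | current.keys()):
        old, new = previous.get(col), current.get(col)
        if old == new:
            continue
        if old is None:
            changes.append(f"added {col} ({new})")
        elif new is None:
            changes.append(f"removed {col} ({old})")
        else:
            changes.append(f"changed {col}: {old} -> {new}")
    return changes


# Hypothetical snapshots of a replicated Zendesk table.
yesterday = {"ticket_id": "NUMBER", "status": "VARCHAR"}
today = {"ticket_id": "VARCHAR", "status": "VARCHAR", "priority": "VARCHAR"}
changes = diff_schema(yesterday, today)
```

Here `changes` reports the new `priority` column and the `ticket_id` type change from `NUMBER` to `VARCHAR`, which is the sort of message worth passing to a downstream team before their morning run.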

Solution: Automated, end-to-end lineage speeds up time to resolution for data incidents by 50 percent

Monte Carlo also provides automated lineage, giving the data team full visibility into upstream and downstream dependencies from ingestion to their BI dashboards. This helps the data engineering team understand the impact of schema changes or new integrations, and makes it much simpler to conduct root cause analysis and notify relevant stakeholders when something goes wrong.

Monte Carlo’s Incident IQ gives Braun and his team a “ground zero” to root cause data issues before they affect downstream consumers, cutting hours off the time to detection and resolution for data downtime. Image courtesy of Braun Reyes.

When the data engineering team gets a Slack alert about a key asset, they can go directly into the Incident IQ dashboard in Monte Carlo.

“It’s one of my favorite features because it’s really ground zero when we look into particular incidents,” said Braun. “Incident IQ sets the stage for how you go about investigating an issue. I can look at an attribute like freshness and see if any gaps are outside the norm, add comments for my team, and update the status of the incident so anyone coming in to look at it will know if it’s being worked on.”

Lineage also helps data engineers see potential downstream impacts and uncover hidden dependencies, all within Incident IQ. Then, they’re able to reach out to anyone who may be running queries on that data or pulling it into Looker reports.
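Finding downstream impact is, at its core, a graph traversal. As a minimal sketch, assuming lineage is available as a plain mapping from each asset to the assets that read from it (the asset names below are hypothetical, and this is not Monte Carlo's API), a breadth-first search lists everything an incident on one table could touch:

```python
from collections import deque


def downstream_assets(lineage, start):
    """Return every asset reachable downstream of `start`.

    `lineage` maps an asset name to the list of assets that read from it.
    """
    seen = set()
    queue = deque([start])
    while queue:
        asset = queue.popleft()
        for child in lineage.get(asset, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # guard against cycles leading back to the start
    return sorted(seen)


# Hypothetical lineage from a raw table to a Looker dashboard.
lineage = {
    "raw.zendesk.tickets": ["analytics.stg_tickets"],
    "analytics.stg_tickets": ["analytics.support_kpis"],
    "analytics.support_kpis": ["looker.support_dashboard"],
}
impacted = downstream_assets(lineage, "raw.zendesk.tickets")
```

In this example `impacted` contains the staging model, the KPI model, and the dashboard, which gives a concrete list of owners to notify.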

Solution: Using code to extend, automate, and customize monitoring as the data ecosystem evolves

The Clearcover data team uses data observability (and specifically, Monte Carlo’s monitors as code feature) to extend lineage and monitoring through code. Braun and his team can write custom monitoring scripts and build automations easily within their CI workflow to add more relational information and context into Monte Carlo.

“We do have pipelines and processes that are custom-built or have something like JSON schema that needed extra coverage beyond what the out-of-the-box machine learning monitors provide,” said Braun. “So we can add custom field health monitors that provide more context and even delivery SLAs to our most important and complex assets and pipelines.”

These custom monitors, layered on top of the automated ones, make it easy for the data team to set new SLIs and keep closer tabs on SLAs. “With these JSON schema monitors, again, it gives us the ability to apply common sense data quality to JSON variants, start conversations, and keeps us in the loop on any potential incidents that could cause downtime.”
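As a rough picture of what a field health check on JSON variants might measure, the sketch below counts missing keys and type mismatches across a batch of JSON strings. The function name, field names, and expected types are hypothetical, and this does not reproduce Monte Carlo's monitors as code configuration, which the article does not show.

```python
import json


def json_field_health(rows, expected):
    """Count missing keys and type mismatches across JSON strings.

    `expected` maps a top-level key to the Python type it should have.
    A key that is absent or null counts as missing.
    """
    report = {key: {"missing": 0, "wrong_type": 0} for key in expected}
    for raw in rows:
        record = json.loads(raw)
        for key, expected_type in expected.items():
            if key not in record or record[key] is None:
                report[key]["missing"] += 1
            elif not isinstance(record[key], expected_type):
                report[key]["wrong_type"] += 1
    return report


# Hypothetical JSON payloads as they might sit in a variant column.
rows = [
    '{"policy_id": 1, "state": "IL"}',
    '{"policy_id": "2"}',
    '{"state": null, "policy_id": 3}',
]
health = json_field_health(rows, {"policy_id": int, "state": str})
```

For these rows, `policy_id` has one type mismatch and `state` is missing twice. Tracking such counts over time is one way to turn "common sense data quality" into a measurable SLI.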

Outcome: 70 percent increase in quality coverage for raw data assets

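After partnering with Monte Carlo, Clearcover saw a 70 percent increase in quality coverage across all raw data assets. This has led to more proactive conversations, faster root cause analysis, and fewer data incidents. And integrating data from over 50 sources is no longer an issue for the team: they know they’re covered when duplications or other anomalies arise.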

“Now, we can start having those proactive conversations to prevent downtime before stakeholders are affected, versus finding out after the fact that something was broken and then rushing to get it fixed,” said Braun. “So, at the very least, that data can still be published by the end of the day.”

While the team had been considering building their own root cause analysis and anomaly detection tooling, they were never able to prioritize it due to the data engineering resources required. With Monte Carlo, they have both, without adding technical debt and with less need for custom code.

What’s next for data at Clearcover?

Braun hopes to drive wider adoption of data observability beyond the data engineering team, extending its usage to the analytics engineering team and even savvy business users. He believes that by growing together, Clearcover and Monte Carlo will continue to increase the level of trust in data across the organization.

“Monte Carlo is super responsive to our suggestions and requests,” said Braun. “And they are very interested in how we think about data downtime, data ops, and monitoring. They’ve involved us in potential new features. We really feel like it’s more of a partnership than a vendor relationship. Monte Carlo is a product we can grow with as we mature as a data organization.”

Curious how Monte Carlo can help your organization achieve data quality coverage and build data trust across teams? Contact us for a demo to learn more.