数据可观察性:五种快速提高数据可靠性的方法

If your data breaks, does it make a sound? 

Odds are, the answer is yes. 但 will you hear it? Probably not. 

现在adays, 组织在日益复杂的生态系统中吸收大量的数据, and very often their data breaks silently, 结果,数据团队被置于黑暗中——直到为时已晚.  

但, if said data is a report used by your Chief Revenue Officer to determine next quarter’s forecast, chances are this data will make a very, very large sound. 它甚至可能被几个清晨的电话和紧急的Slack信号放大. 

Sound familiar? Fortunately, data doesn’t have to be silent. 

输入 Data observability, an emerging – and increasingly important – layer of the modern data stack, 对于帮助数据团队理解和改善数据健康状况至关重要吗, 在许多情况下,首先要防止这些数据灾难的发生. 

In DevOps, engineers rely on both testing and observability to tackle application downtime and ensure consistent uptime. 同样,软件已经成为组织创新能力的关键, 数据也成为决策和产品开发的基础. Similarly, 确保数据正常运行时间和减少数据停机时间(换句话说)的强大方法, periods of time where data is missing, inaccurate, 或其他错误)涉及数据测试和可观察性. 

With the right approach, data observability can even help you understand which data sets matter most to your organization (i.e., which data sets and pipelines make the loudest sound when they break) and which ones can be deprecated. 

在推荐一个正规滚球网站深入研究5个常见的数据可观察性用例之前, 让推荐一个正规滚球网站首先澄清一下“数据可观察性”是什么意思.

What is data observability?

Data observability, 组织充分了解其生态系统中数据的健康状况的能力, eliminates data downtime by applying best practices of DevOps and application observability to data pipelines. Like its DevOps counterpart, data observability uses automated monitoring, alerting, 并进行分类以识别和评估数据质量和可发现性问题, leading to healthier pipelines, more productive teams, and happier customers. To make it easy, 推荐一个正规滚球网站将数据的可观察性分为五个支柱:新鲜度, distribution, volume, schema, and lineage. Together, these components provide valuable insight into the quality and reliability of your data.

  • Freshness: Freshness旨在了解数据表的最新程度, as well as the cadence at which your tables are updated. Freshness is particularly important when it comes to decision-making; after all, 过时的数据基本上就是浪费时间和金钱的同义词.
  • Distribution: Distribution, in other words, a function of your data’s possible values, tells you if your data is within an accepted range. Data distribution gives you insight into whether or not your tables can be trusted based on what can be expected from your data.
  • Volume: Volume refers to the completeness of your data tables and offers insights into the health of your data sources. 如果2亿行突然变成了500万行,你应该知道.
  • Schema: Changes in the organization of your data, in other words, schema, often indicates broken data. Monitoring who makes changes to these tables and when is foundational to understanding the health of your data ecosystem.
  • Lineage: When data breaks, the first question is always “where?” Data lineage provides the answer by telling you which upstream sources and downstream ingestors were impacted, 以及哪些团队在生成数据以及谁在访问数据. Good lineage also collects information about the data (also referred to as metadata) that speaks to governance, business, 以及与特定数据表相关的技术指南, serving as a single source of truth for all consumers.

组织应该投资于数据可观察性的五个原因

These five reasons just scratch the surface of how investing in observability can help your team improve data quality at scale and trust your data faster than testing alone. 

1. Full-stack coverage from ingestion to the BI layer

要完全了解数据运行状况,需要全堆栈覆盖. Image courtesy of Monte Carlo.

Modern data environments are incredibly complex, with data continuously flowing in from a variety of sources, 通常来自“外部”资源,这些资源可以在没有通知的情况下发生变化. That data is then shipped off into some sort of data storage component (whether that’s a data warehouse, data lake, or even a data lakehouse), 然后传播到BI层,供涉众使用. 在这段时间内,数据常常要进行多次转换. 

和 despite how great your data pipelines are, 事实是,数据在其生命周期的任何阶段都可能被破坏. Whether as a result of a change or issue at the source, or an adjustment to one of the steps in your pipeline, or complex interaction between multiple pipelines, data can break for reasons you cannot control. Data observability allows you to have end-to-end visibility into breakages across your pipelines.

Outcome: ML-powered coverage of high priority tables

Alex Soria, VP of Data & Analytics at Mindbody, leads a team of over 25 data scientists, business intelligence analysts, and data engineers responsible for ensuring that the insights powering their product is fresh and reliable. 

Before implementing data observability, Mindbody无法识别数据的异常情况,直到为时已晚. By implementing a data observability solution across their Redshift warehouse and Tableau dashboards, they are now the first ones to find out about abnormalities in the lifecycle and duplicate data. 

With data observability,  他们能够有效地监控3个高优先级表中的15个,000+,自动检测和警报模式异常, freshness, and volume.



2. End-to-end field-level lineage across your data ecosystem


End-to-end lineage powered by metadata gives you the necessary information to not just troubleshoot broken pipelines, but also understand the business applications of your data at every stage in its life cycle. Image courtesy of Monte Carlo.


复杂的数据管道和不断变化的数据生态系统,  跟踪上游和下游的依赖关系是必要的. End-to-end lineage empowers data teams to track the flow of their data from point A (ingestion) all the way to point Z (analytics), incorporating transformations, modeling, and other steps in the process. Essentially, lineage gives your team a birds-eye view of your data and allows you to understand where it came from, who interacted with it, any changes that were made, and where it is ultimately served to end consumers.

但 lineage for the sake of lineage is useless. Teams need to ensure that the data being mapped is 1) accurate and 2) relevant to the business.

结果:数据问题的快速分流和事件解决

Manchester-based Auto Trader 是英国和爱尔兰最大的数字汽车市场吗. For AutoTrader, 将数百万买家和数千卖家联系起来需要大量的数据.

On top of wanting to add automated monitoring and alerts, Auto Trader needed a way to track which tables in BigQuery are surfaced in specific reports in Looker, the most visible layers of their data stack. With data observability and automated end-to-end lineage, the team at AutoTrader is able to investigate issues quickly and efficiently as they have a better understanding of what broke, what else is impacted, and who else should be notified of issues and resolutions.

3. Impact analysis for broken reports and pipelines 

With Incident IQ, data teams can understand the root cause, upstream and downstream dependencies, 和其他关键的背景信息的Segment数据事件. Image courtesy of Monte Carlo.


By gaining end-to-end visibility into the health, usage patterns, 数据资产的相关性,您的数据团队能够更快地解决数据问题. 在响应数据事件时,时间是至关重要的, data observability allows your team to troubleshoot issues and understand the impact far quicker than traditional, manual approaches.

Outcome: conduct root cause analysis for missing, stale, or inaccurate data in minutes, instead of hours or days

Hotjar是一家全球产品体验洞察公司,数据为各种各样的用例提供动力, 从制作理想的营销活动,创造令人愉快的产品特征. 他们的数据工程团队支持超过180个涉众和他们的数据需求, 从部署模型和构建管道到密切关注数据健康状况. When data downtime happens, they needed a way to keep tabs on what was happening and what else up and downstream was affected by the issue. To understand what was causing the downtime, 他们利用了数据可观察性的一个关键组件, 端到端沿袭,以理解与问题相关的上游和下游依赖关系. 现在, their team could do an impact assessment and identify the root cause of the issue faster and more efficiently. From there, the team could correct the course and identify those who need to know about the incident.

4. Monitoring and alerting that scales with your data ecosystem

A good alert will highlight the proper channels, recipients, 并与手头问题类型对应的推荐一个正规滚球网站. Image courtesy of Monte Carlo.

When data breaks, your team should be the first to know. Nothing is more embarrassing for a data worker when they are constantly getting emails and messages about data issues that are uncovered by a stakeholder when looking at a report. 数据可观察性确保您的团队是第一个了解和解决数据问题的人, so you can address the effects of data downtime right away. Ideally, these alerts should be automatic and require minimal effort on your part to get up and running (which is great for scaling alongside your data stack).


结果:增加创新,减少安装管道的时间

Blinklist, a book-summarizing subscription service benefits from automated monitoring and alerting of critical data assets, saving on average 120 hours per week. Through the use of machine learning algorithms to generate thresholds and rules for data downtime alerting, each engineer on the team saves up to 20 hours per week现在,这些资金都用于为最终用户打造产品功能和仪表盘.

5. 数据工程师、数据分析师和数据科学家之间更容易协作

Monte Carlo’s alerting workflow notifies data engineers and analysts of anomalies in specific warehouses, in this case, triggered by a distribution issue. Image courtesy of Optoro.

Arguably one of the most commonly utilized benefits data teams experience as a result of data observability is increased collaboration amongst team members. A best-in-class data observability platform facilitates transparency in data quality across every data stakeholder. 而不是每个功能都有自己的竖井式的方法,以保持在数据质量问题的顶部, a data observability platform enables all data engineers, data scientists, and data analysts to better understand data health and collaborate on improving data quality.


Outcome: decentralized, self-serve governance across teams and more reliable data

Optoro is a technology company that leverages data and real-time decision-making to help retailers and manufacturers manage and resell their returned and excess merchandise. By adopting a data observability platform, the team has saved an estimated 44 hours per week on support tickets investigating bad data. 和, 因为所有数据团队成员都可以访问自助监控和警报, data catalog views, and lineage, data analysts across other domains are now able to step up and take more ownership of data and take accountability for the products they ship.

有兴趣了解更多关于数据可观察性如何帮助您的团队的信息? Reach out to the team at Monte Carlo.