Today, Monte Carlo is excited to announce the release of Incident IQ, our all-in-one incident management solution that allows data teams to collaboratively identify, alert on, and remediate the root cause of critical data issues before they impact downstream systems and end users.
By applying the workflows and end-to-end incident management capabilities of best-of-breed application performance monitoring solutions to your data pipelines, Monte Carlo can now help data teams achieve full visibility into data health.
Incident IQ is the first fully automated, end-to-end solution that conducts root cause analysis for data issues and changes at each stage of the pipeline, from ingestion in the data warehouse or lake to analytics in your business intelligence dashboards. To help companies eliminate data downtime caused by missing, erroneous, or otherwise inaccurate data, Incident IQ automatically generates historical insights about your data to identify patterns in query logs, trigger investigative follow-on queries, and monitor upstream dependency changes to pinpoint exactly what caused the issue.
Alerting & Routing
When a data issue arises, alerts are sent via Slack, PagerDuty, Opsgenie, email, or webhooks to those who need to know so they can update the incident status for observers and take action.
Alerted parties can go into the Monte Carlo application and access the Incident Report via a central UI that provides:
- An incident timeline that makes it easy to see the affected tables and every action that was taken to manage and resolve the incident.
- Comprehensive query logs that show regular ETL queries, ad hoc/backfill queries, changes in query patterns, and other hints that help teams identify the root cause of data incidents.
- Access to sample data, to help users immediately understand what the data involved in the incident looks like and what typical data looks like.
- ML-generated insights to help pinpoint specific groups and subsets in the data that are contributing to the incident.
- Automated, end-to-end lineage that maps impacted downstream BI dashboards to the furthest upstream tables, helping teams narrow the focus of root cause investigations.
- Quick links to Monte Carlo’s Lineage, historical incidents, Pipelines, and Catalog features, making it easy to identify, find the root cause of, and fix data issues all from the same interface.
Communication & Collaboration
Once a root cause (or causes!) has been identified, incident managers can use Incident IQ to provide updates on the state of the issue, as well as triage and collaborate to resolve incidents simultaneously. Features include:
- An incident status bar that allows data engineers and analysts to mark the status of the incident as investigating, fixed, expected, no action needed, or resolved depending on the severity of the issue, as well as delegate incident owners. When a user changes the status, owner, or severity, an additional entry will automatically be captured on the incident’s timeline for post mortems and future learnings.
- Automated runbooks and workflows to make the incident resolution and triaging process easy, fast, and collaborative between data engineers and analysts.
- Real-time notifications of incident status across relevant team channels, including Slack, PagerDuty, Opsgenie, email, and webhooks.
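To make the timeline behavior described above concrete, here is a minimal Python sketch of an incident whose status, owner, and severity changes are each recorded as a timeline entry. It is an illustration only, not Monte Carlo's implementation or API: the `Incident` and `TimelineEntry` classes and the `update` method are hypothetical names, and only the five status values come from the feature list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Status values taken from the feature description; everything else
# in this sketch (class names, fields, method) is hypothetical.
STATUSES = {"investigating", "fixed", "expected", "no action needed", "resolved"}
TRACKED_FIELDS = ("status", "owner", "severity")


@dataclass
class TimelineEntry:
    at: datetime              # when the change happened (UTC)
    field_name: str           # "status", "owner", or "severity"
    old: Optional[str]
    new: Optional[str]


@dataclass
class Incident:
    status: str = "investigating"
    owner: Optional[str] = None
    severity: Optional[str] = None
    timeline: List[TimelineEntry] = field(default_factory=list)

    def update(self, field_name: str, value: Optional[str]) -> None:
        """Change a tracked field and record the change on the timeline."""
        if field_name not in TRACKED_FIELDS:
            raise ValueError(f"untracked field: {field_name}")
        if field_name == "status" and value not in STATUSES:
            raise ValueError(f"unknown status: {value}")
        old = getattr(self, field_name)
        if old == value:
            return  # nothing changed, so nothing is logged
        setattr(self, field_name, value)
        self.timeline.append(
            TimelineEntry(datetime.now(timezone.utc), field_name, old, value)
        )


incident = Incident()
incident.update("owner", "alice")
incident.update("status", "fixed")
for entry in incident.timeline:
    print(entry.field_name, entry.old, "->", entry.new)
```

Recording the old and new value with every change is what makes the timeline usable for a post mortem: the sequence of entries reconstructs who owned the incident and what state it was in at each point.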
Incident Resolution & Prevention
After an incident is resolved, Incident IQ will alert relevant stakeholders and record vital information about the issue to help data engineering teams prevent future incidents.
- Incident trends: Metrics related to each incident are easily available within the UI to help teams track total incidents by severity, owner, pipeline, team, and more.
Customers have already benefited from the rich insights, incident alerting, and root cause analysis capabilities of Incident IQ. Here's what they have to say:
- “Incident IQ is really great!” – data engineer at leading insurtech startup
- “I saw the new incident page and love it!” – Head of Data Engineering at Fortune 50 food & beverage company
- “To troubleshoot an issue, I want to see all of the affected tables, their query logs, and any of their past issues we’ve looked into. Now, we have all of that in one place!” – data engineer at 2,000-employee e-commerce company
Monte Carlo’s Incident IQ is currently available for qualified organizations. Be sure to check out our live product demo on July 15, 2021 at 12:00 p.m. EST / 9:00 a.m. PST to learn more.
Interested in learning more about Incident IQ and Monte Carlo’s end-to-end Data Observability Platform? Request a demo!