How Optoro Built Data Trust and Ownership with Monte Carlo

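When your customers are the first to know about data gone wrong, their trust in your data, and in your company, is damaged. Learn how the data engineering team at logistics company Optoro faced this challenge head-on, using data observability at scale to reclaim data trust and valuable time.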

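Washington, DC-based Optoro has an admirable mission: to make retail more sustainable by eliminating all waste from returns. They provide returns technology and logistics services for leading retailers like IKEA, Target, and Best Buy, helping increase profitability while reducing environmental waste through recommerce, or finding the “next best home” for returned items.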

I recently had the opportunity to chat with Patrick Campbell, Lead Data Engineer at Optoro, during Fivetran’s Data Engineer Appreciation Day. We took a deep dive into Optoro’s data team and tech stack, the challenges they faced, and how data observability helps them pursue their mission of building a more sustainable retail industry.

The data landscape at Optoro

On the surface, Optoro rehomes returned or unsold goods for retailers. But beyond moving a lot of merchandise, Optoro is really moving data. A lot of data.

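“Our technology platform connects every item to its next best home,” says Patrick. “As you can imagine, as we move returned inventory through our system, that system creates a lot of mission-critical data points.”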

Optoro’s data engineering team recently joined the engineering organization to work more effectively with technology and product teams, and they also collaborate closely with the data quality, data science, and data analytics groups. Optoro has many data consumers, both internal and external, who access data through Looker dashboards. But that data wasn’t always reliable.

The challenge: data integrity

“We needed insight into our data quality, plain and simple,” says Patrick. “We didn’t have a good way to know when data might be missing, when it might go stale, or if the data isn’t what we expected.”

That meant that when data issues did occur, customers (not Optoro’s data team) were often the first to know. This led to customer dissatisfaction and kept Optoro from delivering reliable information about the inventory its software manages.

The solution: data observability 

Monte Carlo’s alerting workflow notifies Patrick’s team of an anomaly at a specific warehouse, triggered by a distribution issue. Image courtesy of Optoro.

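Patrick’s team needed to get between their customers and bad data, and they had two options: build or buy. Patrick considered building custom SQL integrity checks with dbt, but knew that with limited resources, his team could only partially cover Optoro’s many pipelines, and that over the long run this would add to the data engineering team’s already heavy workload.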

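Instead, they chose to run a proof of concept with Monte Carlo to see how far they could get in addressing data quality issues with a data observability platform, which uses machine learning to infer and learn what the data looks like, identify data quality issues, and notify the right team members when something goes wrong.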
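To make "learning what the data looks like" concrete, here is a deliberately simple sketch: derive an expected range from past daily row counts and flag values outside it. This is a plain mean-and-standard-deviation baseline chosen for illustration, not Monte Carlo's actual models; the function names and the sample numbers are made up.

```python
from statistics import mean, stdev


def learn_bounds(history, k=3.0):
    """Derive an expected range from past values: mean +/- k sample standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """Flag a new observation that falls outside the learned range."""
    low, high = learn_bounds(history, k)
    return value < low or value > high


# Made-up daily row counts for a hypothetical table.
daily_row_counts = [1000, 1040, 980, 1010, 995, 1025, 1005]

typical_day = is_anomalous(1015, daily_row_counts)  # close to the historical values
broken_load = is_anomalous(120, daily_row_counts)   # far below anything seen so far
print(typical_day, broken_load)
```

The appeal of this approach is that nobody has to hand-pick a threshold per table; the trade-off is that a naive baseline like this one ignores seasonality and trends, which a production system would need to handle.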

For the POC to succeed, Optoro needed to achieve the following:

  • Alerting on stale data products
  • Alerting on large pipeline changes, such as schema changes
  • Always-on monitoring and alerting, including when operational databases fail over
  • Automatically generated rules based on expected data behavior
  • The ability to write custom SQL checks that alert on specific use cases
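Three of the requirements above (staleness, schema changes, and custom SQL checks) can be sketched in a few lines. The example below is an illustration only: it uses an in-memory SQLite database as a stand-in for a warehouse, and the table, columns, thresholds, and helper names are all hypothetical rather than anything Optoro or Monte Carlo uses.

```python
import sqlite3
from datetime import datetime, timedelta


def latest_timestamp(conn, table, ts_column):
    """Return the newest timestamp in ts_column, or None if the table is empty."""
    # Table and column names are trusted constants in this sketch, not user input.
    row = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()
    return datetime.fromisoformat(row[0]) if row[0] is not None else None


def is_stale(conn, table, ts_column, max_age, now):
    """A table is stale if it is empty or its newest row is older than max_age."""
    latest = latest_timestamp(conn, table, ts_column)
    return latest is None or now - latest > max_age


def schema_snapshot(conn, table):
    """Map column name -> declared type, using SQLite's PRAGMA table_info."""
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def schema_diff(before, after):
    """Report columns that were added, removed, or changed type."""
    shared = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "retyped": sorted(c for c in shared if before[c] != after[c]),
    }


def count_violations(conn, check_sql):
    """Run a custom check that selects offending rows; return how many it finds."""
    return len(conn.execute(check_sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE returned_items (id INTEGER, warehouse TEXT, price REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO returned_items VALUES (?, ?, ?, ?)",
    [(1, "DC-1", 19.99, "2024-01-01T08:00:00"), (2, "DC-1", -5.00, "2024-01-01T09:00:00")],
)

# Freshness: the newest row is 27 hours old against a 24-hour allowance.
now = datetime(2024, 1, 2, 12, 0, 0)
stale = is_stale(conn, "returned_items", "updated_at", timedelta(hours=24), now)

# Schema change: snapshot, alter the table, then compare.
before = schema_snapshot(conn, "returned_items")
conn.execute("ALTER TABLE returned_items ADD COLUMN item_condition TEXT")
changes = schema_diff(before, schema_snapshot(conn, "returned_items"))

# Custom SQL check: any row returned is a violation.
negative_prices = count_violations(conn, "SELECT id FROM returned_items WHERE price < 0")
print(stale, changes, negative_prices)
```

Writing each custom check as a query that returns only the offending rows keeps the alerting rule uniform: zero rows means healthy, anything else is worth a notification.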

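These outcomes would help Patrick’s team achieve their ultimate goal: preventing data quality issues from negatively impacting the customer experience.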

Optoro’s Data Platform leverages a Snowflake warehouse, Fivetran for integration, dbt for transformation, Monte Carlo for Data Observability, and Looker for analytics. Image courtesy of Optoro.

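We quickly delivered what Optoro was looking for. Patrick and his team moved forward with a full Monte Carlo integration, layering it alongside Snowflake, Fivetran, dbt, and more in Optoro’s data stack. From ingestion to transformation to BI reporting, these tools work together, while Monte Carlo keeps a close eye on every stage of the data lifecycle.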

The outcome: achieving trusted data through lineage

Monte Carlo lets Patrick’s team map end-to-end lineage across their data assets, down to the field level in Looker. Image courtesy of Optoro.

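With Monte Carlo’s monitoring and alerting in place, Patrick and his team are now the first to know when data goes missing or pipelines break. And when alerts do come in, data engineers can resolve issues faster thanks to the automated lineage Monte Carlo provides.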

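“We can get a visual on affected data sources, from internal data marts all the way to our Looker reports, which may be customer-facing,” Patrick says. “Being able to quickly identify customer-facing issues and be proactive is key to building trust in our data. This feature makes a data engineer’s job much easier, and I can tell you that from experience.”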
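The impact analysis Patrick describes amounts to walking a dependency graph downstream from the asset that broke. The sketch below shows that idea with a breadth-first traversal over a hand-written graph; the asset names and the `downstream_assets` helper are hypothetical, and a real observability platform would derive the edges automatically rather than from a hard-coded dictionary.

```python
from collections import deque


def downstream_assets(lineage, start):
    """Breadth-first walk returning every asset reachable from start, nearest first."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical asset names; edges point from a source to what is built from it.
lineage = {
    "raw.returns": ["staging.returns"],
    "staging.returns": ["mart.inventory", "mart.resale"],
    "mart.inventory": ["looker.client_inventory_dashboard"],
    "mart.resale": ["looker.internal_resale_report"],
}

affected = downstream_assets(lineage, "staging.returns")
customer_facing = [a for a in affected if a.startswith("looker.client_")]
print(affected)
print(customer_facing)
```

Filtering the result for customer-facing reports, as in the last step, is what turns a raw alert into a decision about who needs to be told first.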

For Optoro’s data engineers, end-to-end, fully automated data lineage that requires no manual mapping or updating is a relief, and it is one of Patrick’s favorite parts of the platform.

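“The fact that Monte Carlo is able to build this lineage is remarkable in itself,” says Patrick. “Our data team had to provide almost no input to build out the upstream and downstream dependencies.”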

The outcome: data teams saving time and stepping up 

Optoro’s data engineering team also estimates that Monte Carlo saves each engineer at least four hours per week on support tickets to investigate bad data. With a data engineering team of 11 or more members, that adds up to at least 44 hours each week.

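Because all of the data teams, not just the engineers, can access self-service monitoring and alerting, data catalog views, and lineage through Monte Carlo, Patrick reports that other data teams are also stepping up their ownership of the data and taking more responsibility for the products they deliver.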

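“Not only is this a big win for data engineering when it comes to finding the needle in the haystack, but it has helped us enable other data teams to help us keep trust in our data,” says Patrick. “Using these frameworks takes data engineering out of the middleman role in these situations. Data integrity really should be self-service. And your data engineers will thank you.”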

Wondering how data observability can help your team build trust and save time? Reach out to the Monte Carlo team to learn more!