Harnessing Unstructured Data with AI

Virtual Town Hall Insights
New York CDAO Community

Henry Ehrenberg
Snorkel AI

John Manchisi
Business Transformation Manager


MAY 2022

The amount of data created and consumed across the globe is growing at a staggering rate. This year alone, a projected 97 zettabytes of data will be produced, and that figure is expected to reach 181 zettabytes by 2025. As a data and analytics leader, you may feel like you’re sinking into the abyss, but enterprises can use AI to rein in this new, largely unstructured data and channel it toward innovation and sharper data-driven insights.


“80%-90% of an organization's data is largely semi-structured or unstructured and never used.” - Gartner


CDAOs in the New York community gathered recently to discuss best practices for harnessing unstructured data, along with data strategies for overcoming barriers to AI implementation and scale. Henry Ehrenberg, Co-Founder of Snorkel AI, kicked off the discussion by sharing how data is both “a key enabler and bottleneck for AI development.” Because AI models require immense amounts of training data and there is no commodity solution for labeling this data at scale, data leaders must be strategic in their approach if they want to see greater business value from AI.

Members of the CDAO community broke into small groups to discuss how they are addressing this within their own organizations.

Accelerating AI with Automated Data Labeling

CDAOs agreed that the faster data labeling can occur, the faster machine learning models can be trained and insights generated. Each leader shared how they are approaching data labeling today, with a mix of manual processes and open-source tools. Many are looking for alternatives to workflows that are slow, expensive (whether through crowdworking services or high opportunity costs), and hard to adapt across their lines of business.

Henry Ehrenberg shared a case study from a top-tier bank on how a programmatic approach to data labeling can unlock the value of AI at scale. In his example, the bank had previously relied on a manual data labeling strategy, which took a multi-person team months to carry out, and a shift in business goals then delayed completion by an additional three months. Through automation, the bank was able to accelerate the process and develop an accurate model on 250,000 documents in a mere 24 hours.
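For readers less familiar with the idea, programmatic labeling generally means writing small rules that vote on a label instead of tagging each document by hand. The sketch below is purely illustrative and is not the bank's system or any vendor's API: the labeling functions, keywords, and label names are hypothetical, and a simple majority vote stands in for whatever aggregation method a production tool would use.

```python
from collections import Counter

# A labeling function returns a label, or ABSTAIN when its rule does not apply.
ABSTAIN = None

# Hypothetical keyword rules, standing in for heuristics a subject matter
# expert might write for their own documents.
def lf_mentions_loan(text):
    return "loan" if "loan" in text.lower() else ABSTAIN

def lf_mentions_mortgage(text):
    return "loan" if "mortgage" in text.lower() else ABSTAIN

def lf_mentions_complaint(text):
    return "complaint" if "complaint" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_loan, lf_mentions_mortgage, lf_mentions_complaint]

def label_document(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine labeling-function votes by simple majority.

    Returns ABSTAIN when no rule fires or when the top labels are tied.
    """
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels, so leave the document unlabeled
    return ranked[0][0]

def label_corpus(documents):
    """Label every document; unlabeled ones can be routed to human review."""
    return [(doc, label_document(doc)) for doc in documents]
```

Because the rules are code, a change in business goals means editing a few functions and re-running them over the corpus rather than relabeling by hand, which is the kind of adaptability the case study points to.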

Leveraging Subject Matter Experts in AI Initiatives

Building data literacy and a data-driven culture are top CDAO challenges, and members discussed the need to engage subject matter experts to improve business outcomes.

One CDAO noted that bringing business leaders into the process earlier can empower them to be part of the solution. Another shared that engaging subject matter experts in AI initiatives can give AI implementations the “human assist” needed for better goal setting. John Manchisi, NYC CDAO Governing Body Member from Verizon, shared that prioritizing a look-alike model can help subject matter experts see the value of AI and improve collaboration.

Continuing the Conversation

Evanta’s CDAO community connects peers from the world's leading organizations to discuss the most critical issues impacting CDAOs today. Connect with your local CDAO community to join the conversation: https://www.evanta.com/cdao

