Advances address untapped opportunity in data preparation, data discovery, and harmonization of sources, including automated pattern matching across diverse sources, for faster insights
MENLO PARK, CA – February 16, 2016 – ClearStory Data, the company bringing business-oriented Data Intelligence to everyone through fast-cycle, disparate data analysis, today announced a breakthrough advancement to its industry-first data inference and Intelligent Data Harmonization™ capabilities called Infinite Data Overlap Detection (IDOD). With this innovation and the research behind it, ClearStory’s Spark-based, business-ready analytics solution now detects and infers data patterns and customer-specific data types for all values for all data types in every source that a user connects to as part of an analysis. The benefit to organizations is more automation in data discovery and even faster blending of sources to further eliminate traditional data modeling complexities and speed business insights.
Earlier this month, Gartner named ClearStory Data a Visionary in its debut in the Magic Quadrant for 2016 Business Intelligence and Analytics Platforms.* As the MQ report notes: “Market awareness and adoption of smart data discovery will extend data discovery to a wider range of users, increasing the reach and impact of analytics. These emerging capabilities facilitate discovery of hidden patterns in large, complex and increasingly multi-structured datasets, without building models or writing algorithms or queries.”
ClearStory’s IDOD advancement addresses exactly the types of complexities that are prevalent across all Global 2000 organizations. The new ability to blend and harmonize highly dimensional “categorical value” data sources addresses the root cause of the biggest delays and IT challenges in speeding business insights. Organizations benefit from more precise insights on large, complex data sources, including ones with a high degree of customer-specific information, which is common in companies across industries and contributes to a rise in data analysis complexity.
ClearStory’s new, large-scale IDOD capability is used to determine how complex data from multiple sources should be blended, viewed, and visualized on the fly. IDOD plays the role of data modeling advisor to the business user, enabling them to blend data together and discover insights quickly, without data modeling expertise or days or weeks of manual effort. ClearStory’s approach replaces traditional methods of manually matching data and data relationships across diverse sources. Traditional approaches are not sustainable as businesses have reached an urgent need to see business insights faster.
In primary research conducted in October 2015, nearly 70 percent of companies polled report they need access to refreshed data insights either hourly or daily. Eighty-six percent of them struggle with this challenge on a regular basis where four or more data sources and file formats are involved for analysis. A majority of respondents (68 percent) report they experienced “data blindness” at least once per week because they could not spot “what’s happening now, and why” soon enough, impacting their ability to make smart decisions and perform their jobs well.
The most difficult part of the problem being addressed has always been the customer-specific attributes, distinct customer values, and nuances of data such as product names, category names, distinct phone numbers, product codes, and unique brand attributes, all of which are prevalent across any G2000 company as well as small to medium-sized businesses. Such data and attributes have traditionally required heavy manual data wrangling to reconcile and inspect many thousands to millions of unique values with integrity and consistency.
Take one of these data sources and add more such sources that need to be blended together, and the result is a long, painful, and error-prone process. As the data sources update, the repeated headache of preparing and modeling all the data relationships becomes unsustainable for even sophisticated data stewards. The business impact is major delays in reaching insights, which ClearStory’s new smart machine-based approach directly addresses.
Highlights of the new Infinite Data Overlap Detection capability include:
- Smarter Data Inference: Detects and infers the overlap of categorical values for all data types across hundreds, millions or even billions of unique values for attributes across all the source data being analyzed;
- Infinite Types: No limits on how many unique custom data types, custom dimensions, or values can be recognized in each source for data inference and data harmonization;
- Extensibility: New data types can be easily patterned and plugged into the capability for increased automation of vertical industries’ custom data types. This brings a powerful way to address vertical-specific and customer-specific data nuances and complexities. Further, it allows the ClearStory system to learn customer-specific attributes that further accelerate reaching insights and significantly reduce manual and complex data modeling;
- Granular Data Scoring and Data Relationships: Detailed granular scores are calculated for each custom data type and the values within are used to determine the right way to automatically blend data sources together into a holistic, harmonized view. Even data sources with hundreds of millions of unique values per attribute can be intelligently inferred and automatically scored and matched to enable users to reach fast, meaningful insights;
- Simple User Experience: As in all areas of the ClearStory Data solution, ease of use and an intuitive user experience are of the utmost importance. With IDOD, ClearStory extends its data inference and Intelligent Data Harmonization™ user interface and experience to surface the power of the new, advanced processing engine in a simple, user-friendly way so users can be self-sufficient on even complex data sources.
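The idea of scoring categorical value overlap between sources, as described in the highlights above, can be illustrated with a small sketch. This is not ClearStory's implementation, which the announcement does not disclose. It assumes a simple containment measure (shared unique values divided by the size of the smaller value set), and the function names, the 0.5 threshold, and the sample data are all hypothetical.

```python
def overlap_score(a, b):
    """Containment of the smaller unique-value set in the larger one:
    |A ∩ B| / min(|A|, |B|). Assumed metric, chosen for illustration."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def best_join_columns(left, right, threshold=0.5):
    """Score every (left column, right column) pair and return the pairs
    at or above the threshold, highest score first."""
    scored = [
        (overlap_score(left_values, right_values), left_name, right_name)
        for left_name, left_values in left.items()
        for right_name, right_values in right.items()
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Hypothetical sources: each maps a column name to its values.
sales = {
    "sku": ["A1", "B2", "C3", "D4"],
    "region": ["West", "East", "West", "South"],
}
catalog = {
    "product_code": ["A1", "B2", "C3", "E5", "F6"],
    "brand": ["Acme", "Bolt", "Acme", "Core", "Dyn"],
}

candidates = best_join_columns(sales, catalog)
print(candidates)
```

In this toy data, only the sku and product_code columns share values, so they surface as the single blend candidate. At the scale the announcement describes, exact set intersection would likely give way to a distributed or approximate computation; the sketch only shows the scoring idea.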
“ClearStory’s introduction of automated, machine-based advancements in data preparation, discovery and data harmonization continues to build on its Spark-based core IP to process large-scale data at high speeds,” said Dr. Tim Howes, CTO of ClearStory Data. “By adding the advanced IDOD capability to automatically recognize infinite categories, values and granularities in data sources, we speed the cycle of data to insights by addressing a significant pain point that enterprises across all industries face today: the intricate, tedious task and massive time sink caused by manual data wrangling on large, complex data.”
ClearStory Data’s new capabilities are offered as a core part of the ClearStory solution, and customers can experience them as part of their standard offering. For more advanced users, the data extensibility feature can also be made available as a premium API-based service.
*Gartner, Inc., “Magic Quadrant for Business Intelligence and Analytics Platforms,” by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich, February 4, 2016.
About ClearStory Data
ClearStory Data is bringing Data Intelligence to everyone to accelerate the way business leaders get answers from more data, on a faster cycle, across any number of disparate data sources. ClearStory Data’s solution simplifies data access to internal and external sources, automates data harmonization via Intelligent Data Harmonization™ across disparate data, enables fast, collaborative exploration, and reduces business wait time for insights via Interactive, Collaborative StoryBoards™. ClearStory Data lets business users be more self-reliant in reaching richer, faster insights. Its end-to-end solution includes an integrated Apache Spark-based data processing platform and an incredibly simple user application model for business consumption of insights. The company is headquartered in Menlo Park, CA and backed by Andreessen Horowitz, DAG Ventures, Google Ventures, Khosla Ventures and Kleiner Perkins Caufield & Byers (KPCB).