ClearStory’s In-Memory, Spark-based Intelligent Data Harmonization™ works with all data source types and enables fast data prep, data blending and context-aware, iterative analysis
MENLO PARK, CA – June 22, 2016 – ClearStory Data, the company bringing business-oriented Data Intelligence to everyone through fast-cycle, disparate data analysis, today announced that the United States Patent and Trademark Office has issued Patent 9,372,913 for the company’s innovation in automated, inference-based Data Harmonization across internal and external disparate data sources and diverse data types.
The patent covers ClearStory’s deep data inference and semantic recognition of data, which enable multiple data sets and dimensions to be automatically converged and harmonized on the fly. Its scale-out, in-memory framework eliminates the need for costly, time-consuming data pre-modeling. The result is an extremely fast path from data access to data prep to blended insights, across data sources that are diverse in structure, size and velocity.
ClearStory embeds Apache Spark™ as the native in-memory data processing engine subsumed in this patent, but it’s not restricted to Spark-only, in-memory processing. The patent is associated with one of the core elements of ClearStory’s solution and interconnects data inference, data harmonization and the associated granular logical and physical metadata. It includes the architectural approach for how complex data is distributed in-memory, and data linkages across the data pipeline from inferring results to seeing harmonized results.
The patented innovation was driven by the foresight that organizations urgently need to ease and speed up how data is accessed, inferred, and combined to reach holistic, in-context insights. This core innovation is surfaced in ClearStory’s application via ClearStory’s Data Harmonization user model, Data Stories and Interactive, Collaborative StoryBoards™.
“Traditional BI and data science approaches to accessing and combining a variety of high-volume data sources that are constantly refreshing are no longer viable,” says Tim Howes, CTO of ClearStory Data. “Without a machine-based data harmonization approach like ClearStory’s, costly and time-consuming wrangling and pre-modeling of data often create delays to critical insights or inconsistencies that can materially impact a business. Our Apache Spark-based Data Intelligence platform automates data prep and data harmonization to speed time to insights. It’s context-aware, and enables collaborative iterations as data refreshes.”
Earlier this year, Gartner named ClearStory Data as a visionary in its debut in the 2016 Magic Quadrant for Business Intelligence and Analytics Platforms. For the Completeness of Vision category, the Gartner MQ report notes that: “Smart data preparation on multi-structured data is a core visionary feature in this category, because the need to automatically profile, enrich and infer relationships (to automatically generate a model for analysis) will be an area of innovation that will differentiate vendors in the future.”
The key components of the ClearStory patent claims include:
- Deep data inference on hierarchical dimensions – ClearStory automatically infers data types based on machine-based pattern recognition and understanding of the data accessed. Users don’t need to pre-model the data or specify the definition of time-based, categorical, geographical or other dimensional attributes.
- Harmonization and dimension scoring – ClearStory’s Intelligent Data Harmonization identifies and scores data relationships based on inferred data types, heuristics and semantics in the source data. It creates associated physical and logical metadata that covers context, data details, governance, and each analytical task performed on the data. The system automatically harmonizes and blends data even when distinct data sets consist of different levels of granularity, shape and scale.
- Granular physical and logical metadata on data and actions – The ClearStory system recognizes data shape and size as characterized by the uniformity, burstiness, and sparseness of the data, as well as the volume of the data. Statistical distribution, granularity of data, inferred data types, values, and additional descriptive metadata persist in the system to drive relevancy across diverse data sets and default visualizations.
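For readers who want a concrete picture of the first two components, the following is a minimal, illustrative Python sketch of pattern-based type inference followed by relationship scoring between two data sets. It is not ClearStory's patented method: the function names (`infer_type`, `score_relationship`), the three type labels, the date format, and the Jaccard-overlap score are all assumptions chosen for brevity, and the sample data is invented.

```python
import re
from datetime import datetime


def infer_type(values):
    """Guess a column's dimension type from its values (hypothetical heuristic)."""
    def is_date(v):
        try:
            datetime.strptime(v, "%Y-%m-%d")  # assumed ISO date format
            return True
        except ValueError:
            return False

    def is_number(v):
        return re.fullmatch(r"-?\d+(\.\d+)?", v) is not None

    if values and all(is_date(v) for v in values):
        return "time"
    if values and all(is_number(v) for v in values):
        return "numeric"
    return "categorical"


def score_relationship(col_a, col_b):
    """Return 0.0 if inferred types differ, else the Jaccard overlap of values."""
    if infer_type(col_a) != infer_type(col_b):
        return 0.0
    a, b = set(col_a), set(col_b)
    if not a | b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented sample data: two sources with differently named date columns.
sales = {"day": ["2016-06-01", "2016-06-02"], "region": ["West", "East"]}
weather = {"date": ["2016-06-01", "2016-06-02"], "temp_f": ["71", "68"]}

# Score every cross-source column pair and keep the strongest candidate join.
candidates = sorted(
    ((score_relationship(sales[a], weather[b]), a, b)
     for a in sales for b in weather),
    reverse=True,
)
best = candidates[0]
print(best)
```

In this toy example the `day` and `date` columns are both inferred as time dimensions and share all of their values, so they surface as the top-scoring pair to blend on, without the user declaring either column's type in advance.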
ClearStory’s newly awarded patent for Inference-based Data Harmonization stems from one of five patent applications the company has filed for bringing smart, scalable machine-based capabilities to self-service data analysis.
About ClearStory Data
ClearStory Data is bringing Data Intelligence to everyone to accelerate the way business leaders get answers from more data, on a faster cycle, across any number of disparate data sources. ClearStory Data’s solution simplifies data access to internal and external sources, automates data harmonization via Intelligent Data Harmonization™ across disparate data, enables fast, collaborative exploration, and reduces business wait time for insights via Interactive, Collaborative StoryBoards™. ClearStory Data lets business users be more self-reliant in reaching richer, faster insights. Its end-to-end solution includes an integrated Spark-based data processing platform and an incredibly simple user application model for business consumption of insights. The company is headquartered in Menlo Park, CA and backed by Andreessen Horowitz, DAG Ventures, Google Ventures, Khosla Ventures and Kleiner Perkins Caufield & Byers (KPCB).