Data + AI  |  Tim Howes  |  August 12, 2015

Get in the Flow: Data Intelligence for Everyone on Google Cloud Platform

Our customers want to know what’s happening now in their businesses. Increasingly, many prefer cloud services to make data consumption easier, faster, broader and more affordable.

Today, along with Google, we’re announcing the first integration between ClearStory Data and Google Cloud Dataflow, which is now generally available. At ClearStory Data, we enable business users to blend multiple large data sources together and analyze them to answer business questions interactively. Google Cloud Dataflow provides a scalable and flexible platform for performing data transformation tasks of any kind. The integrated solution shortens the time it takes business users to get their questions answered.

One of the toughest challenges our customers face is transforming data before or after it is analyzed in ClearStory. Google Cloud Dataflow is well suited to this role, and the integration works in both directions. With ClearStory Data as a data sink for Google Cloud Dataflow, customers can use Dataflow’s advanced data transformation capabilities to prepare data before analysis; the sink connection feeds the resulting data directly into ClearStory, where users can blend it with other sources and analyze it interactively. With ClearStory Data as a data source for Google Cloud Dataflow, customers can blend and analyze data in ClearStory, then export it to a Dataflow pipeline for further processing and transformation.

The combination is powerful. ClearStory’s data inference and harmonization capabilities allow data from multiple sources to be blended together; its easy-to-use interface lets business users and data stewards alike create insights relevant to the business; and its Spark-based processing engine stays interactive even on big data. Google Cloud Dataflow’s programming model and SDKs simplify the creation of cloud data processing jobs of any complexity, and because Dataflow builds on scalable Google Cloud Platform technologies such as Compute Engine, Cloud Storage, and BigQuery, Dataflow programs can run on data of virtually any size.

Despite this power, integrating the two platforms was straightforward. Dataflow provides a set of easy-to-use, well-documented SDKs for creating data sources and sinks. ClearStory comes equipped with a set of REST APIs for importing and exporting data and for initiating processing on the ClearStory platform. Add authentication and security to ensure data is protected as it flows back and forth between the systems, and the integration is complete. The Cloud Dataflow sink is connected to ClearStory as in the following diagram.

Figure 1: ClearStory as a Google Cloud Dataflow Sink
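
To make the sink direction concrete, here is a minimal sketch of how pipeline output might be batched and packaged as authenticated requests to a REST import API. It is an illustration under stated assumptions, not the actual integration code: the endpoint URL, the JSON payload shape, the bearer-token scheme, and the helper names (`batches`, `build_import_request`, `sink_requests`) are all hypothetical, since the post does not document ClearStory's REST API. The requests are built but never sent.

```python
import json
from typing import Dict, Iterable, Iterator, List
from urllib.request import Request

# Hypothetical endpoint: the post does not give ClearStory's REST API paths.
IMPORT_URL = "https://clearstory.example.com/api/import"


def batches(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group pipeline output rows into batches of at most `size` rows."""
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def build_import_request(batch: List[Dict], token: str) -> Request:
    """Build (but do not send) an authenticated HTTPS POST for one batch."""
    # Assumed payload shape: a JSON object with a "rows" array.
    body = json.dumps({"rows": batch}).encode("utf-8")
    return Request(
        IMPORT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: a bearer token sent over HTTPS.
            "Authorization": f"Bearer {token}",
        },
    )


def sink_requests(rows: Iterable[Dict], token: str, batch_size: int = 500) -> List[Request]:
    """Turn transformed rows into a list of import requests, one per batch."""
    return [build_import_request(b, token) for b in batches(rows, batch_size)]
```

In a real pipeline this logic would live inside a custom sink written with the Dataflow SDK, and each request would be sent with retry and error handling; batching simply keeps the number of HTTP round trips small.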

The ClearStory data source is connected like so:

Figure 2: ClearStory as a Google Cloud Dataflow Source
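
The source direction can be sketched in the same spirit: data exported from ClearStory is parsed into records, which a pipeline then transforms further. The export format here (newline-delimited JSON) and the function names `records_from_export` and `normalize` are assumptions made for illustration; the post does not specify how exported data is encoded.

```python
import json
from typing import Dict, Iterable, Iterator


def records_from_export(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one record per non-empty line of an (assumed) newline-delimited JSON export."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def normalize(record: Dict) -> Dict:
    """Example downstream transformation: trim and lowercase every string value."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


# Stand-in for exported data; a real source would read from the export API.
export = [
    '{"region": " West ", "sales": 10}',
    "",
    '{"region": "EAST", "sales": 7}',
]
result = [normalize(r) for r in records_from_export(export)]
```

A production source would be implemented against the Dataflow SDK's source interfaces so that reading can be split and parallelized; the generator above only shows the record-by-record shape of the data handed to the pipeline.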

Google Cloud Dataflow has everything one could want for writing complex data transformation pipelines and scaling them to virtually unlimited data sizes and compute resources. It’s a data scientist’s dream: a powerful engine for developing and running data transformations of any kind. Combining it with ClearStory makes it easy to blend that data with internal and external sources and lets business users who are not programmers generate interactive insights. It’s a killer combination, and we look forward to seeing what the world will do with it.

Check out a video of a sample integration scenario here. For more information on the integration, visit https://www.clearstorydata.com/customers-collaborators/google-cloud-platform or https://cloud.google.com/dataflow/.
