Our customers want to know what’s happening in their businesses right now. Increasingly, many of them prefer cloud services because they make data consumption easier, faster, broader, and more affordable.
Today, together with Google, we’re announcing the general availability of the first integration between ClearStory Data and Google Cloud Dataflow. At ClearStory Data, we enable business users to blend multiple large data sources and analyze them interactively to answer business questions. Google Cloud Dataflow provides a scalable and flexible platform for data transformation tasks of any kind. The integrated solution shortens the time it takes business users to get their questions answered.
One of the toughest challenges customers face is the need to transform data before or after analyzing it in ClearStory. Google Cloud Dataflow is well suited to this role. With ClearStory Data integrated as a data sink for Google Cloud Dataflow, customers can use Dataflow’s advanced transformation capabilities to prepare data before analyzing it in ClearStory. The sink connection feeds the resulting data directly into ClearStory, where users can blend it with other sources and analyze it interactively. With ClearStory Data integrated as a data source for Google Cloud Dataflow, customers can blend and analyze data in ClearStory and then export it to a Dataflow pipeline for further processing and transformation.
The combination is powerful. ClearStory’s data inference and harmonization capabilities allow data from multiple sources to be blended together; its easy-to-use interface lets business users and data stewards alike create insights relevant to the business; and its Spark-based processing engine stays interactive even on big data. Google Cloud Dataflow’s programming model and SDKs simplify the creation of cloud data processing jobs of any complexity, and because Dataflow programs run on scalable Google Cloud Platform technologies such as Compute Engine, Cloud Storage, and BigQuery, they can process data of virtually any size.
Despite this power, integrating the two platforms was straightforward. Dataflow provides a set of easy-to-use, well-documented SDKs for creating data sources and sinks. ClearStory comes equipped with a set of REST APIs for importing and exporting data and for initiating processing on the ClearStory platform. Add authentication and security to protect the data as it flows back and forth between the systems, and the integration is complete. The following diagram shows how ClearStory is connected as a Cloud Dataflow sink.
Figure 1: ClearStory as a Google Cloud Dataflow Sink
The ClearStory data source is connected like so:
Figure 2: ClearStory as a Google Cloud Dataflow Source
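To make the sink side of this pattern more concrete, here is a minimal sketch of a batching sink that buffers rows, serializes them as CSV, and uploads each batch to a REST endpoint with a bearer token. It is an illustration of the general technique under stated assumptions, not the actual ClearStory connector: the endpoint URL, the CSV payload format, the bearer-token scheme, and the `BatchingRestSink` class are all hypothetical, since the real ClearStory REST API is not described here. In a real Dataflow pipeline, logic like this would live inside the SDK’s sink or DoFn abstractions.

```python
import csv
import io
import urllib.request
from typing import Callable, Dict, List

# HYPOTHETICAL endpoint: the real ClearStory REST API paths are not documented here.
DEFAULT_UPLOAD_URL = "https://clearstory.example.com/api/datasets/upload"


def default_transport(req: urllib.request.Request) -> int:
    """Send the request over the network and return the HTTP status code."""
    with urllib.request.urlopen(req) as resp:
        return resp.status


class BatchingRestSink:
    """Hypothetical sink: buffers rows and uploads them as CSV batches over HTTPS."""

    def __init__(
        self,
        token: str,
        columns: List[str],
        batch_size: int = 1000,
        url: str = DEFAULT_UPLOAD_URL,
        transport: Callable[[urllib.request.Request], int] = default_transport,
    ) -> None:
        self.token = token          # assumed bearer token for authentication
        self.columns = columns      # CSV header / field order
        self.batch_size = batch_size
        self.url = url
        self.transport = transport  # injectable so the sink can be tested offline
        self.buffer: List[Dict[str, object]] = []
        self.batches_sent = 0

    def write(self, row: Dict[str, object]) -> None:
        """Buffer one row and upload automatically once a full batch accumulates."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Upload any buffered rows as one CSV payload; keep them if the upload fails."""
        if not self.buffer:
            return
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=self.columns)
        writer.writeheader()
        writer.writerows(self.buffer)
        req = urllib.request.Request(
            self.url,
            data=out.getvalue().encode("utf-8"),
            headers={"Authorization": f"Bearer {self.token}", "Content-Type": "text/csv"},
            method="POST",
        )
        status = self.transport(req)
        if not 200 <= status < 300:
            raise RuntimeError(f"upload failed with HTTP {status}")
        self.batches_sent += 1
        self.buffer.clear()
```

Batching keeps the number of HTTP round trips proportional to the data volume divided by `batch_size`, and leaving the buffer intact on a failed upload lets the caller retry without losing rows. Calling `flush()` once at the end of the input sends the final partial batch.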
Google Cloud Dataflow has everything one could want for writing complex data transformation pipelines and scaling them to virtually unlimited data sizes and compute resources. It’s a data scientist’s dream: a powerful engine for developing and running data transformations of any kind. Combining it with ClearStory lets business users who are not programmers easily blend that data with internal and external sources and generate interactive insights. It’s a killer combination, and we look forward to seeing what the world will do with it.
Check out a video of a sample integration scenario here. For more information on the integration, visit https://www.clearstorydata.com/customers-collaborators/google-cloud-platform or https://cloud.google.com/dataflow/.