A Little Spark to Wildfire

Open Source Project Birthed at U.C. Berkeley Takes Off in the Enterprise

This week the fast-growing Apache Spark community is gathering in New York City to celebrate and collaborate on one of the most popular open source projects today. Launched in U.C. Berkeley’s AMPLab in 2009, Apache Spark has begun to catch on like wildfire during the last year and a… Read more »

ClearStory + Spark = Data Exploration Freedom

The release of Spark 1.0 marks a significant step in the move away from MapReduce-based big data processing. In-memory. Distributed. Scale out. Machine Learning. 100X faster on initial benchmarks, and our Spark-inside solution is evidence of the blazing speed. Data scientists and data engineers are rejoicing – and drooling. Although for some users… Read more »

ClearStory and Databricks at the Spark Summit

To follow up on the success of the Spark Summit in San Francisco this past week, I wanted to share a great conversation I had with Reynold Xin, who is one of the co-founders of Databricks and the main author behind Shark. Here’s a discussion capturing highlights of our chat at the Summit, in which we discuss how… Read more »

Spark Summit 2013

On the eve of the very first Spark Summit, there is a lot to be excited about. The past few months have been quite eventful for the Spark community: the Apache Software Foundation accepted Spark as an official incubator project and graduated Apache Mesos to a Top-Level Project, Databricks was founded to commercialize Spark… Read more »

A New Analytic Technology Stack for Scalable, Interactive Analysis

I was thrilled to see the public announcement of Databricks last week and Spark taking off with strong support from Andreessen Horowitz. Spark, for those who haven’t heard of it yet, is an open source cluster computing framework that is designed to make data analytics fast and boasts performance 100x faster than traditional MapReduce… Read more »

Big Data: From Batch Processing to Interactive Analysis

‘Big Data’ is either very popular these days, or infamous, depending on who you ask, but it certainly has everyone’s attention. For the most part, it has also become synonymous with Hadoop, and for good reason. Hadoop and its primary programming model, MapReduce, are great for batch-oriented processing of huge amounts of data. With growing… Read more »