One of the most active and fastest growing open source big data cluster computing projects is Apache Spark, which was originally developed at U.C. Berkeley's AMPLab and is now used by internet giants and other companies around the world. Including, as announced most recently, IBM.
In this Q&A with Spark inventor Matei Zaharia -- also the CTO and co-founder of Databricks (and a professor at MIT) -- on the heels of the recent Spark Summit, we cover the difference between Hadoop MapReduce and Spark; what are the ingredients of a successful open source project; and the story of how Spark almost helped a friend win a million dollars.