If open source search technology has a father, his name is Doug Cutting.
Cutting created the Lucene search library and co-created Hadoop, and much of his work now lives, breathes, and changes the world at the Apache Software Foundation. The foundation is home to such powerful and innovative big data technologies as Lucene, Kudu, Hive, Pig, Beam, Cassandra, Flink, Impala, Kafka, and many others, many of which matured in the Apache Incubator.
Aside from being a laundry list of the most popular big data technologies and open source software out there, these projects have something else in common: some strange names. What is behind these odd, yet somehow ideally suited, Apache project names?
Cutting has been very forthcoming about how Hadoop got its name. Hadoop was what his child called a yellow toy elephant, and that yellow elephant is now the project's famous logo, gracing the covers of many magazines, blog posts, and trade journals. The name is short, easy to remember, spell, and say, and meaningless in any other context. When someone says “Hadoop,” you immediately know the conversation is about big data.
Kafka, the distributed messaging system, was actually named by project co-creator Jay Kreps. He studied the German-language writer Franz Kafka in college, liked his work, and thought it was a cool name for a software project. There has been much speculation about a deeper meaning. Franz Kafka is best known for stories whose situations are now called “Kafkaesque”: an isolated protagonist confronting a bizarre, surreal, or nightmarishly bureaucratic predicament. Kreps maintains there is no such connection. He just liked the name, and it stuck when the project moved from its birthplace at LinkedIn to the loving and nurturing Apache Incubator.
Originally conceived at Cloudera, the open-source storage engine Kudu has attracted a number of high-profile contributors, including Intel, Xiaomi, Dropbox, and Dremio. The project is now nurtured by the Apache Software Foundation, and its name comes from a species of African antelope. The kudu is noted for its long, spiraling horns, which are sometimes made into shofars for Jewish religious ceremonies.
“Flink” is a German word that means “swift” or “agile.” Like Spark, Flink is an open-source stream and batch processing platform, and it is making noise in the big data community with speeds that can make some of the more popular products seem comparatively slothful (even zippy ones like Spark). Flink is a native streaming engine that processes events as they arrive, whereas Spark's streaming support has traditionally been built on micro-batching: it collects incoming data into small batches and processes each batch very quickly. The project was originally named Stratosphere, but that name raised potential trademark issues, and the Apache Software Foundation has a rather strict naming policy that rules out any name with possible trademark conflicts. Stratosphere began as a research project led by Volker Markl at the Technical University of Berlin, which is how the project ended up with a German name.
But there’s more to these beauties than just a name. These Apache projects are making tremendous impacts across the landscape of big data.