Splunk vs. Open Source

A long time ago in a galaxy far, far away....

"I'm Luke Skywalker, I'm here to rescue you."

As a Splunk (Consultant|Ninja|Jedi), I often get asked the following question, "Why should I use Splunk when I can use Open Source?"

It is actually a very good question, and there are a number of things to consider when comparing the two. Firstly, Splunk is a mature Big Data platform with Enterprise Support, a large install base, and an active community of users who assist newbies and veterans alike via forums, IRC, and free apps. Various open source technologies such as Hadoop, ELK, Kafka, Spark and Zoomdata have also been around for a while, and they have active communities too. Secondly, it is important to consider the end goal of any Big Data solution, i.e. what insights do the users want to gain from their data?

Big Data is a catch phrase that all the cool kids are using, and CIOs (on the golf course?) are talking about it and the "data lake". You know what? It's not the size of the data, it's what you do with it. I believe that Big Data refers to the Volume, Velocity, Variety, and Variability(1) of machine data. Splunk and the various Open Source tools can help you search the mountains of data that are important to an organisation, but they differ in how they manage that data. Any Big Data solution should cover the following functions of "The Data":

  • Collection
  • Storage
  • Lifecycle Management
  • Securing Access
  • Exploring
  • Correlating
  • Alerting
  • Visualizing
  • Automating
  • Enriching
  • Exporting
  • Defending against the dark side


I first worked with Splunk as a customer back in 2007, and I have been a Splunk consultant for the past few years. On the level, there is no other product (Enterprise or Open Source) that covers all of the functions above, except for Splunk.

Getting access to the data is the most important step in any Big Data project, but there is the often unrealised opportunity cost of rolling your own Open Source solution vs standing up Splunk quickly and easily to achieve faster time to value. This is key. Some people see that Open Source is free, and the access to the code certainly is. However, making multiple Open Source tools do ALL of the functions listed previously would take a team of developers many months to deliver. Splunk can be stood up in just minutes with all of the features and functions ready, and no additional licenses are required.

Solutions like ELK still use schema at write, just like a traditional RDBMS, whereas Splunk's most liberating feature is that it uses schema at read. Splunk is schema-less. It can harvest data from anywhere, in any format, store it and then make it searchable. Because searches let you structure the results, Splunk allows you to impose structure on any and all of your unstructured data!(2)
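
To make the schema at read idea concrete, here is a minimal Python sketch. It is not how Splunk is implemented; the sample events, the key=value pattern and the function names are all made up for illustration. The point is only that the raw text is stored untouched, and fields are pulled out when you search.

```python
import re

# Raw events are kept exactly as they arrived; nothing is parsed at write time.
# (Sample data invented for this sketch.)
RAW_EVENTS = [
    "2015-03-01T10:00:01 host=web01 status=200 bytes=512",
    "2015-03-01T10:00:02 host=web02 status=500 bytes=128",
    "2015-03-01T10:00:03 login failed for user=luke from 10.0.0.5",
]

# Hypothetical key=value pattern, applied only when the data is read.
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_event):
    """Impose structure on one raw event at read (search) time."""
    return dict(FIELD_PATTERN.findall(raw_event))


def search(events, **criteria):
    """Return the extracted fields of every event matching all criteria."""
    results = []
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            results.append(fields)
    return results


errors = search(RAW_EVENTS, status="500")
print(errors)
```

Note that the third event has a completely different shape from the first two, yet it sits in the same store and can still be searched (for example with user="luke") without anyone having defined a schema for it up front.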

Splunk can also read data from Hadoop and NoSQL stores using Hunk. You can archive historical data from Splunk Enterprise to Hadoop to spend less on expensive storage area networks, and run federated queries across data in Splunk Enterprise or Splunk Cloud, Hadoop and NoSQL data stores.

You can also use Splunk DB Connect to look up customer data in your enterprise data warehouse or relational databases from IBM DB2, Oracle Database, Microsoft SQL Server, SAP Sybase, Teradata and more. It is also possible to save searches to power dashboards in Tableau, MicroStrategy and other business intelligence tools.(3)

Splunk can be downloaded for free from http://splunk.com and you can trial the full feature set for 60 days, then change to a Free license if your data ingest per day is less than 500 MB. Splunk is licensed on the volume of data ingested per day, and it runs on any flavour of Linux, Unix, Mac or Windows. You can run it on virtual or physical servers in your data center, or you can run it in AWS or your IaaS cloud provider of choice, or you can sign up to Splunk Cloud and they will run it for you. Check out http://www.splunk.com/getsplunk/cloudtrial for a free sandbox.
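
If you are wondering whether your ingest would fit under the 500 MB per day figure quoted above, a rough check might look like the following Python sketch. The daily volumes are invented sample numbers and the helper name is hypothetical; it says nothing about how Splunk itself measures or enforces license usage.

```python
# Daily ingest limit for the Free license, as quoted in the text above.
FREE_LICENSE_LIMIT_MB = 500


def days_at_or_over_limit(daily_ingest_mb, limit_mb=FREE_LICENSE_LIMIT_MB):
    """Return the days whose ingest is not strictly below the limit."""
    return [day for day, mb in daily_ingest_mb.items() if mb >= limit_mb]


# Invented sample measurements, in MB per day.
sample_week = {"Mon": 120.0, "Tue": 480.5, "Wed": 730.2, "Thu": 499.9}

over = days_at_or_over_limit(sample_week)
print(over)
```

With these sample numbers only Wednesday is flagged, which would suggest either trimming what you ingest or talking to someone about a paid license.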

When you buy a Splunk license you get full vendor support, product upgrades are included, and all of their documentation is freely available at http://docs.splunk.com.

Personally, I have written a number of free Splunk Apps:

  • Splunk for Nagios
  • Splunk for Isilon
  • Splunk for Symmetrix
  • Splunk for Postfix
  • Splunk for SAP - collab with Shaun Butler & Jim Cooke

(Combined total of over 10,000 downloads)

There are hundreds of other free Splunk Apps to help you get your data into Splunk, and help you get value out of Splunk. Go to http://apps.splunk.com and be amazed.

Splunk exposes a fully documented REST API and they have published SDKs for the major programming languages. Check out http://dev.splunk.com for more information.
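
As a taste of the REST API, here is a small Python sketch that builds (but deliberately does not send) a search request using only the standard library. It assumes a local install listening on the default management port 8089, the search/jobs/export endpoint, and a session key you have already obtained; the helper name and the placeholder key are made up, so check the details against the documentation at http://dev.splunk.com before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(base_url, query, session_key):
    """Build (without sending) a POST request that runs a search and streams results."""
    body = urlencode({"search": query, "output_mode": "json"}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/services/search/jobs/export",
        data=body,
        headers={"Authorization": "Splunk " + session_key},
        method="POST",
    )


req = build_export_request(
    "https://localhost:8089",           # assumed: management port of a local install
    "search index=_internal | head 5",  # example search string
    "PLACEHOLDER_SESSION_KEY",          # hypothetical; substitute a real session key
)
print(req.get_method(), req.full_url)
```

To actually run the search you would pass req to urllib.request.urlopen (with certificate handling appropriate to your install), or skip the plumbing entirely and use one of the published SDKs.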

K1 has some of the best and brightest talent in the Big Data landscape, and we are happy to do a Splunk 'Proof of Concept' to show the value of the solution to you.

Please use the force/contact me if you have any questions or queries,

Luke Harris. Splunk BMF luke@katana1.com twitter.com/skywalka