Splunk Metrics FTW

Splunk 7 was announced at .conf2017 and one of its shiny new features is improved "Metrics". Metrics are sets of numbers describing a particular process or activity, measured over time, i.e. time series data. Some common examples that you may be familiar with are: system metrics such as CPU, memory or disk; infrastructure metrics such as AWS CloudWatch; and readings from IoT devices (e.g. temperature).

Splunk has always been great at reporting on logs and metrics, and now queries against metrics are up to 200x faster!

I was reading through the documentation on ingesting metrics into Splunk and I noticed that the example was for Collectd.

If you already know what Collectd is and you are awesome, download our new Splunk app called "Analytics for Linux".





For those who are relatively new to the game, Collectd is a very lightweight daemon (written in C for performance and portability) that "gathers metrics from various sources, e.g. the operating system, applications, logfiles and external devices, and stores this information or makes it available over the network."

Collectd includes optimizations and features to handle hundreds of thousands of metrics. The daemon comes with over 100 plugins, and it is open source software developed and actively maintained by a current Google SRE: https://collectd.org/

Splunk Apps and Add-ons

First, a brief history of the old way of collecting Linux performance metrics using the two officially supported Splunk add-ons:

1/ Splunk Add-on for Unix and Linux includes a collection of monitors and scripts. The Add-on populates the Splunk App for Unix and Linux.

2/ Splunk Add-on for Linux includes support for collectd, and it includes prebuilt panels for adding reports to dashboards.

There is also a Splunk app called Collectd App for Splunk Enterprise which relies on inputs from the Splunk Add-on for Linux.

Unfortunately, the add-ons & apps above do not support the native metrics store introduced in Splunk 7, so I thought to myself, "here is an opportunity!"

Introducing Analytics for Linux

Hence, I have developed and released a new Splunk app called "Analytics for Linux", which is available for free on Splunkbase. All of the dashboards use the "mstats" command to query the Metrics Store, which enables massive performance improvements, i.e. up to 200x faster queries.

The dashboards in Analytics for Linux include graphs of CPU, Memory, Swap, Load, Disk Usage, Network Interface Utilization, and Processes, plus nginx & Apache web servers.




Example Dashboards for Server Metrics







Example Dashboards for Web Servers



Metrics Exploration Dashboards

I have also included three dashboards for discovering metrics in your Splunk environment. You can use these dashboards to discover and chart any type of metrics ingested into Splunk, i.e. not limited to Collectd metrics.

1/ Metrics Comparison with Horizon Chart custom visualization -> compare two or more hosts by metric

2/ Metrics Explorer (inspired by the Metrics Explorer for Splunk app) -> includes split by host

3/ Metrics Navigator - dynamically display multiple charts of metrics -> includes the Horseshoe Meter custom visualization, ideal for percentage metrics

How to Query the Metrics Store

The most important change when querying metrics in Splunk is that you must use the "mstats" command: you cannot search metrics data for individual metric events, and you cannot perform search-time extractions.

Although mstats is a new command, it is very similar in syntax to the existing tstats command.

Here are a couple of example mstats queries :-

CPU Usage by Host over time (Top 15):

| mstats avg(_value) AS Idle WHERE metric_name="cpu.percent.idle.value" span=10s by host
| eval Used=100-Idle
| timechart span=10s avg(Used) as "Avg" by host useother=f limit=15

CPU Used (Average & Peak) by Host (Top 15):

| mstats avg(_value) AS AvgIdle min(_value) AS MinIdle WHERE metric_name="cpu.percent.idle.value" by host
| eval Average=100-AvgIdle, Maximum=100-MinIdle
| sort 15 -Maximum
| eval Average=round(Average,2) . " %", Maximum=round(Maximum,2) . " %"
| table host Average Maximum

To enumerate metric names, dimensions, and values, use the mcatalog command. Each line below is a separate search, e.g.

| mcatalog values(metric_name)
| mcatalog values(metric_name) by host
| mcatalog values(metric_name) by plugin_instance
| mcatalog values(plugin_instance) by metric_name
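
If you would rather run a catalogue search from a shell than from the search bar, here is a minimal sketch. It assumes the metric index is called collectd, that the Splunk management port is the default 8089 on localhost with a self-signed certificate (hence -k), and that SPLUNK_USER and SPLUNK_PASS are environment variables you set yourself (hypothetical names, not something Splunk defines).

```shell
#!/bin/sh
# Build an mcatalog search restricted to one metric index.
# $1 = index name, $2 = optional split-by field (e.g. host).
mcatalog_query() {
    if [ -n "${2:-}" ]; then
        printf '| mcatalog values(metric_name) WHERE index=%s BY %s' "$1" "$2"
    else
        printf '| mcatalog values(metric_name) WHERE index=%s' "$1"
    fi
}

# Run a search through the REST export endpoint on the management port.
# SPLUNK_USER and SPLUNK_PASS are hypothetical environment variables.
run_search() {
    curl -k -u "$SPLUNK_USER:$SPLUNK_PASS" \
        https://localhost:8089/services/search/jobs/export \
        --data-urlencode "search=$1" \
        -d output_mode=json
}
```

For example, run_search "$(mcatalog_query collectd host)" should list the metric names seen per host, assuming the account is allowed to search that index.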

Ingest Collectd Metrics into Splunk

The beauty of "Analytics for Linux" is that it doesn't require a Splunk Technology Add-on (TA) for ingestion of metrics.

Just use the Splunk built-in sourcetype for Collectd called "collectd_http" and configure your collectd agents to send to a Splunk HTTP Event Collector.

Configure the Splunk HTTP Event Collector

Use the following instructions to configure Splunk to ingest metrics from Collectd into a new metric index via the HTTP Event Collector (HEC):

Create a new 'metric' index :- e.g. collectd:

indexes.conf :-

 [collectd]
 datatype = metric
 homePath = $SPLUNK_DB/collectd/db
 coldPath = $SPLUNK_DB/collectd/colddb
 thawedPath = $SPLUNK_DB/collectd/thaweddb

Add a new HTTP Event Collector token :- e.g. replace hec_token below:

inputs.conf :-

 [http://Collectd]
 disabled = 0
 index = collectd
 sourcetype = collectd_http
 source = collectd token
 token = hec_token

Note: you can generate a HEC token from the CLI :-

 # /opt/splunk/bin/splunk http-event-collector create new-token -uri https://localhost:8089 -name "Collectd" -disabled 0 -index collectd -sourcetype collectd_http -source "collectd token"
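
Before pointing collectd at the collector, it can help to post one hand-made data point and confirm that the token works. The sketch below is only that, a sketch: it assumes the placeholder names splunk_server and hec_token, HEC on its default port 8088 with a self-signed certificate (hence -k), and a sample payload in the JSON format that collectd's write_http plugin emits. The host name test-host and the value 97.5 are made up for illustration.

```shell
#!/bin/sh
# Placeholders: replace with your own server and token.
SPLUNK_SERVER="splunk_server"
HEC_TOKEN="hec_token"

# Build the raw HEC endpoint URL; the sourcetype is passed explicitly
# so the payload is parsed as collectd JSON.
hec_url() {
    printf 'https://%s:8088/services/collector/raw?sourcetype=collectd_http' "$1"
}

# Print one sample data point in collectd's write_http JSON format.
# Host, timestamp and value are illustrative only.
sample_payload() {
    printf '[{"values":[97.5],"dstypes":["gauge"],"dsnames":["value"],'
    printf '"time":%s,"interval":10,"host":"test-host",' "$(date +%s)"
    printf '"plugin":"cpu","plugin_instance":"0","type":"percent","type_instance":"idle"}]'
}

# Post the sample to HEC. -k skips certificate verification (self-signed certs).
send_test_metric() {
    curl -k "$(hec_url "$SPLUNK_SERVER")" \
        -H "Authorization: Splunk $HEC_TOKEN" \
        -d "$(sample_payload)"
}
```

Calling send_test_metric should return a short JSON acknowledgement from HEC if the token and index are set up correctly; an authorization error usually means the token value or its enabled state needs another look.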

Install and Configure Collectd

There is an example collectd configuration file in the Splunk app :- $SPLUNK_HOME/etc/apps/sh_collectd/examples/collectd.conf

Note: You must replace splunk_server & hec_token in the Node definition, e.g.

URL "https://splunk_server:8088/services/collector/raw"
Header "Authorization: Splunk hec_token"
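
For orientation, those two lines sit inside a write_http plugin block. The sketch below prints such a block with the placeholders substituted. The node name "splunk", the Format, Metrics and StoreRates settings, and the disabled certificate checks are assumptions rather than a copy of the app's example file, so compare the output against examples/collectd.conf before using it.

```shell
#!/bin/sh
# Print a write_http block for collectd.conf.
# $1 = Splunk server host name, $2 = HEC token (both placeholders).
render_write_http() {
    cat <<EOF
LoadPlugin write_http
<Plugin write_http>
  <Node "splunk">
    URL "https://$1:8088/services/collector/raw"
    Header "Authorization: Splunk $2"
    Format "JSON"
    Metrics true
    StoreRates true
    # Assumption: self-signed certificate on HEC; remove the next two lines otherwise.
    VerifyPeer false
    VerifyHost false
  </Node>
</Plugin>
EOF
}

render_write_http "splunk_server" "hec_token"
```

Printing to stdout rather than writing the file directly makes it easy to review the block before pasting it into your collectd configuration.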

Here are some quick primers on getting collectd up and running on your Linux instances :-

Ubuntu 16.04:

 # echo "deb http://pkg.ci.collectd.org/deb xenial collectd-5.8" > /etc/apt/sources.list.d/collectd.list
 # curl http://pkg.ci.collectd.org/pubkey.asc | apt-key add -
 # apt-get update
 # apt-get install collectd collectd-core collectd-utils libcollectdclient1
 Update /etc/collectd/collectd.conf
 # /etc/init.d/collectd start

Amazon Linux:

 # yum install collectd collectd-disk collectd-netlink collectd-write_http collectd-apache collectd-nginx
 Update /etc/collectd.conf
 # /etc/init.d/collectd start

CentOS 7:

 # yum install epel-release
 # yum install collectd collectd-netlink collectd-apache collectd-nginx
 Update /etc/collectd.conf
 # /etc/init.d/collectd start

Note: collectd version 5.6 or higher is required.
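
A quick way to check that requirement on a host is sketched below. It assumes collectd was installed from a package (dpkg or rpm) and that GNU sort with the -V option is available; the function names are made up for this example. Package versions that carry an epoch prefix would need extra handling.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed collectd package version (nothing if not installed).
installed_collectd_version() {
    if command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Version}' collectd 2>/dev/null || true
    elif command -v rpm >/dev/null 2>&1; then
        # rpm prints its "not installed" message on stdout, so only
        # pass the output through when the query succeeds.
        v=$(rpm -q --qf '%{VERSION}' collectd 2>/dev/null) && printf '%s' "$v"
    fi
    return 0
}

REQUIRED="5.6"
INSTALLED="$(installed_collectd_version)"
if [ -n "$INSTALLED" ] && version_ge "$INSTALLED" "$REQUIRED"; then
    echo "collectd $INSTALLED is new enough"
else
    echo "collectd ${INSTALLED:-not found}; version $REQUIRED or higher is required"
fi
```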

Install Analytics for Linux

Download and install Analytics for Linux on your Search Head (the app is not required on Indexers or Heavy Forwarders) then check out the various dashboards in the app.

Now that you know that Metrics in Splunk are awesome, please feel free to contact us with your feedback and suggestions for improvement :)

Update 23/11/2017: Analytics for Linux is now a Splunk Certified app!

Use the force…

Luke @skywalka Splunk BMF


By Luke Harris