
Splunk Best Practices

I've been working with Splunk a LOT lately. Over the past few months I've changed our Splunk configurations over and over again as I've found new and better ways to do things. I decided to put together this "Web Admin's Guide to Splunk Best Practices" for those of you who are either considering implementing Splunk or who have already implemented Splunk and are having issues getting it to do what you need it to. Hope it helps!

Configuration

1. Indexes

  • Write your log data to the main index. This is the default index that all inputs write to, so no extra configuration should be necessary.
  • Create a new index for your configuration files. Originally I was writing these out to the main index as well, but they started getting deleted as the index grew to its maximum size and/or age. By writing configuration files out to a separate index (see the sketch after this list), I am able to keep these files around for as long as I need without worrying about them eventually falling off because of new logs coming in.
  • If you want to create dashboards that display lists of things like source names, source types, etc., they load a lot faster if you do some pre-processing and load that information into a new index. For example, our developers write all logs to /opt/apps/logs/appname directories, and I wanted a dashboard that displayed a list of all of the appnames. I wrote a script, which Splunk calls as a scripted input, that does a find on the /opt/apps/logs filesystem and indexes the path to each file. Then I'm able to use regular expressions to pull out the application names and display that list as a dashboard much faster than by querying the main index for that information.
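
A minimal sketch of that separate configuration index (the index name "config_files", the monitor path, and the retention value below are placeholders of mine, not the exact names from our setup):

    # indexes.conf: a dedicated index so configuration files never age out with the logs
    [config_files]
    homePath = $SPLUNK_DB/config_files/db
    coldPath = $SPLUNK_DB/config_files/colddb
    thawedPath = $SPLUNK_DB/config_files/thaweddb
    # keep roughly a year of configuration history (example value)
    frozenTimePeriodInSecs = 31536000

    # inputs.conf: send the monitored configuration files to the new index
    [monitor:///etc/httpd/conf]
    index = config_files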

2. Bundles

  • Most of the Splunk documentation tells you to add your configuration files to the local bundle since that bundle won't get overwritten when Splunk is upgraded. My recommendation is to use the local bundle for server-specific configurations like outputs.conf and to create your own custom bundles for all other configurations (inputs.conf, props.conf, transforms.conf, etc.) that similar servers might share. This allows me to use the same bundle for, say, all of my Apache web servers regardless of environment (dev, test, or prod), but use the outputs.conf in the local bundle to send logs to the right indexing server (one for each environment). See the layout sketch after this list.
  • Create new bundles for each different type of server. You should have different bundles for your web servers, application servers, database servers, etc. The general rule of thumb should be to use the same bundle if they can use the same (or very similar) inputs.conf files.
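
A rough illustration of the split (assuming the 3.x-era $SPLUNK_HOME/etc/bundles location; the bundle and host names are hypothetical):

    $SPLUNK_HOME/etc/bundles/local/outputs.conf              # server-specific
    $SPLUNK_HOME/etc/bundles/apache_webserver/inputs.conf    # shared by all Apache web servers
    $SPLUNK_HOME/etc/bundles/apache_webserver/props.conf
    $SPLUNK_HOME/etc/bundles/apache_webserver/transforms.conf

    # local/outputs.conf on a prod server, pointing at that environment's indexer
    [tcpout]
    defaultGroup = prod_indexer

    [tcpout:prod_indexer]
    server = prod-indexer.example.com:9997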

3. Deployment Server

  • If you did what I suggested earlier and created bundles for each type of server, then the transition to using a deployment server to update your bundles remotely should be a piece of cake. Just set up your deployment.conf file on the forwarding servers and on the indexing server, and then drop your bundles in the $SPLUNK_HOME/etc/modules/distributedDeployment/classes directory. Now you modify your bundles in a single location for each environment and all of the servers are updated (see the sketch below for how newer versions express the same idea). It takes a little bit of extra time to set this up, but it will save you tons of time if you are constantly tweaking your configuration bundles like I do.
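
For reference, 4.x and later versions replace deployment.conf with deploymentclient.conf on the forwarders and serverclass.conf on the deployment server. A minimal sketch with placeholder host and class names:

    # deploymentclient.conf on each forwarder
    [deployment-client]

    [target-broker:deploymentServer]
    targetUri = deploy.example.com:8089

    # serverclass.conf on the deployment server
    [serverClass:apache_webservers]
    whitelist.0 = web*.example.com

    # map the bundle (deployed as an app) to the server class
    [serverClass:apache_webservers:app:apache_bundle]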

4. Inputs

  • We were initially seeing about 40 GB/day worth of logs on servers where I couldn't find 40 GB total worth of log files on the entire system. Eventually, we tracked the issue down to how Splunk classifies XML files by default. These files automatically get categorized as the "xml_file" sourcetype. Splunk treats this sourcetype as a configuration file whose entire contents should be re-indexed if any part of it changes. The problem comes whenever you do logging in an XML format: with each new log entry (XML event) written out to the file, Splunk re-indexes the entire file. So as you log more and more, the file gets bigger and bigger, and Splunk's per-day usage skyrockets. To avoid this, just make sure that any XML logs get classified as something other than "xml_file" (see the props.conf sketch after this list).
  • Splunk tells whether a log file has changed by keeping a checksum of the head and tail of each log file it's monitoring. This allows it to tell when a file has been renamed by something like a logrotate script, so it won't re-index it as a new file. Unfortunately, Splunk uses zcat to evaluate the contents of compressed files, so it is not able to compare checksums when a file is compressed as part of the logrotate process. It indexes these compressed files as though they were brand new (even though the contents were already indexed before compression). While not nearly as impactful as the "xml_file" issue above, this will still double your per-day usage for these files. My best suggestion here is to blacklist all compressed files (.gz, .zip, etc.), as in the inputs.conf sketch after this list, and do a batch import on them when you begin indexing with Splunk for the first time. This will give you the contents of the old log files without indexing the new ones twice.
  • If you are using any scripted inputs, do not place the scripts or any files created by the scripts inside of a bundle being deployed with the Deployment Server. This is because bundles deployed with the Deployment Server are stored in a tar archive format. Splunk is able to read this format for the configurations but, as far as I can tell, doesn't actually extract the bundle from the tar archive. So while your scripts would get deployed, you won't really be able to reference them in the inputs.conf. Instead, place the scripts and the files they output in either the local bundle on each server or a new bundle that is not deployed with the Deployment Server. You can still use an inputs.conf inside of a deployed bundle to call the scripted input since you now know the exact path to the script.
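
A minimal sketch of both fixes (the monitor path and sourcetype name are placeholders):

    # inputs.conf: never index rotated/compressed copies of the logs
    [monitor:///opt/apps/logs]
    blacklist = \.(gz|zip|bz2)$

    # props.conf: force XML logs to a sourcetype other than "xml_file" so
    # Splunk treats them as logs rather than config files to re-index
    [source::/opt/apps/logs/.../*.xml]
    sourcetype = app_xml_log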

5. Licensing

  • Splunk provides a "Licensing" tab in the admin section where you can view your license and see your daily use. This helps you figure out where your usage stands in relation to your license, but it doesn't do a thing to help you evaluate where that usage is coming from. I created a custom dashboard titled "Splunk License Usage" that displays the total usage, usage by host, usage by source, and usage by source type over the past 24 hours (the search sketch after this list shows the kind of query involved). This lets me track down my biggest loggers to figure out what is really necessary, and it makes it easier to tell when there are problems. You can download it off of SplunkBase here.
  • If you're logging with log4j, you probably know that you can set the logging level to debug, info, warn, or error. Make sure you set expectations with your developers to only log error messages in production. The rest of those log levels are probably fine in dev or test, but they constitute excessive logging for prod and eat up valuable disk space, CPU cycles, etc.
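
If you want to build something similar yourself, the raw numbers behind that kind of dashboard live in Splunk's own metrics log. A search along these lines (my sketch, not the exact queries from the app) charts indexing volume by host over time:

    index=_internal source=*metrics.log* group=per_host_thruput
    | timechart span=1h sum(kb) AS kb_indexed BY series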

Troubleshooting

  • Make sure to check the dates on your various types of files to verify that the date in each file corresponds with the date Splunk assigns it. In particular, do a search for dates in the future to see if you turn up results for dates that haven't happened yet. If you do, you will need to specify time formats for those files in props.conf (see the sketch below).
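
A hedged example of pinning the timestamp down in props.conf (the sourcetype name and format string are placeholders for whatever your problem files actually use):

    # events start with a timestamp like [2010-06-01 12:34:56]
    [my_problem_sourcetype]
    TIME_PREFIX = ^\[
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    MAX_TIMESTAMP_LOOKAHEAD = 19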

That's it for now, but this will be a living document that I plan on updating as new "best practices" are realized. Please feel free to leave comments or add suggestions. Thanks! - Josh

Comments (20)
  1. Hi Josh

    Great article and many thanks for taking the time to write it, and of course for your Licensing app too. But do you know how to import it into v4.09? Or do you have an updated version already?

    That would really help me as I’m trying to locate where all my usage is coming from.

    Many thanks
    Lea

  2. Lea,

    Thanks for checking that out. That’s definitely old (3.x version) and some things like Deployment server are much nicer in the new version of Splunk. Perhaps someday I’ll get around to revising my best practices. I have actually added two new bundles to Splunkbase including the Splunk License Usage bundle modified for 4.x versions of Splunk.

    Splunk License Usage
    This bundle provides a new dashboard with several widgets whose queries help you determine your total Splunk license usage over the past 24 hours, as well as usage by host, source, and sourcetype. It contains timecharts to help you understand usage over time and spot usage spikes, as well as pie charts to help you figure out which log files, sourcetypes, and hosts Splunk is indexing the most data from.
    http://www.splunkbase.com/apps/Splunk+License+Usage

    Splunk Monitoring
    The Splunk Monitoring application can be used to monitor your Splunk forwarding nodes from your indexing node using an nmap query script. It creates a new “splunk_monitoring” index and has a single dashboard that displays the overall number of servers that are UP or DOWN as well as the status of each individual server. To use the Splunk Monitoring application, extract the files into your $SPLUNK_HOME/etc/apps directory. The actual monitoring script uses nmap so make sure you have it installed on your indexing node. Edit the $SPLUNK_HOME/etc/apps/splunk_monitoring/local/tags.conf file to include a list of your servers (the actual tag doesn’t matter) or edit the $SPLUNK_HOME/etc/apps/splunk_monitoring/bin/splunk_port_monitor.sh script to point to a different location for the tag_file variable. You will also want to edit that file if you run Splunk on a port other than 8089 or if your nmap executable is located in a location other than /usr/bin/nmap.
    http://www.splunkbase.com/apps/Splunk+Monitoring

    Enjoy!

  3. Hi Josh!

    A very helpful article. I’m in the process of setting up Splunk and was looking at your license usage script for version 3.x. On Splunkbase, however, the script is still only for version 3.x. At least I can’t find a newer one, and the link you provide above takes me to the 3.x version of it.
    Have to take a look at the monitoring after the weekend.
    cheers,
    madsen

  4. Madsen/Lea,

    I see exactly what you’re talking about with the 4.x version of my Splunk License Usage app not showing up. It worked fine while I was logged in to Splunkbase, but now that I’m not logged in anymore, it just shows the 3.x version of the app. I’ve contacted Emma Dannin and Caleb Poterbin at Splunk support, as they helped me get my app on the new Splunkbase. I will update you once it has been made available. Thanks!

  5. Alright, I think we’ve got the issues figured out with SplunkBase and you guys can download the new 4.x version of my Splunk License Usage application here:

    http://www.splunkbase.com/apps/All/4.x/App/app:Splunk+License+Usage

  6. Josh,

    Thank you for taking the time to write this article and share your experiences. I have a similar setup to yours, with all app logs being written to /logs/webapps/* . Can you share how you configured your scripted input with your setup? Would it be possible to post the script that you use for this?

    thanks so much

  7. Andy,

    We use the Splunk deployment server so I created a deployment app under “$SPLUNK_HOME/etc/deployment-apps/mybundle/default”. It contains a fields.conf file with the values:

    [niappname]
    INDEXED = true

    [nilogname]
    INDEXED = true

    An indexes.conf file with the values:

    [ni_dashboard]
    homePath = $SPLUNK_DB/ni_dashboard/db
    coldPath = $SPLUNK_DB/ni_dashboard/colddb
    thawedPath = $SPLUNK_DB/ni_dashboard/thaweddb

    An inputs.conf file with the values:

    # Scripted input to do a find on /opt/apps/logs
    [script://$SPLUNK_HOME/etc/apps/mybundle/bin/apploglist.sh]
    interval = 3600
    sourcetype = dashboard_app_log_list
    source = ni_dashboard
    index = ni_dashboard
    disabled = false

    A props.conf file with the values:

    # Handler for the apploglist.sh "dashboard_app_log_list" sourcetype
    [dashboard_app_log_list]
    BREAK_ONLY_BEFORE = ^
    TRANSFORMS-applist = apps,logs
    SHOULD_LINEMERGE = False

    The most important part is to set up your transforms.conf with the proper regex like this:

    # Add "niappname" field for the apploglist.sh "dashboard_app_log_list" sourcetype
    [apps]
    REGEX = /opt/apps/logs/(\w+)/
    FORMAT = niappname::$1
    WRITE_META = True

    # Add "nilogname" field for the apploglist.sh "dashboard_app_log_list" sourcetype
    [logs]
    REGEX = /opt/apps/logs/(\w+)/(\w+)\.|/opt/apps/logs/(\w+)/_apps_utf8/(\w+)\.
    FORMAT = nilogname::$2
    WRITE_META = True

    Then you still need to create your script. I put mine under “$SPLUNK_HOME/etc/deployment-apps/mybundle/bin” so it will get deployed with the deployment server. Here is my script:

    #!/bin/bash

    # VARIABLES
    LOG_DIR="/opt/apps/logs";
    DST_DIR="$SPLUNK_HOME/etc/apps/mybundle/log";

    # Hostname
    HOSTNAME=`uname -n`;

    # Check if DST_DIR exists and if not create it
    if [ ! -d ${DST_DIR} ] ; then
        mkdir ${DST_DIR};
    fi

    # Check if the state log file exists and if not create it
    if [ ! -f ${DST_DIR}/${HOSTNAME}_apploglist.log ] ; then
        touch ${DST_DIR}/${HOSTNAME}_apploglist.log;
    fi

    # For each log file under LOG_DIR
    for i in `find $LOG_DIR -type f`
    do
        # If that file has not already been indexed
        if [ "`grep $i ${DST_DIR}/${HOSTNAME}_apploglist.log`" == "" ] ; then
            # Remember it in the state log file
            echo $i >> ${DST_DIR}/${HOSTNAME}_apploglist.log;
            # Output it to stdout for Splunk to index
            echo $i;
        fi
    done

    That should do it!

  8. I have installed the License Usage app. After the install, I get prompted "The License Usage app has not been fully configured". Clicking "Continue to app setup page" results in the error "KeyError: 'elements'".

    We are using Splunk v4.2.1-98164.

  9. Erik, I have to say that I’ve never run into that. That said, I’ve never actually installed the app from SplunkBase either. You may want to try downloading the files and manually placing them in your apps directory. The .spl files are really just .tar.gz formatted so running “tar xvzf” should properly decompress them. Otherwise, I’d say that Splunk support may be able to help you further.

  10. Splunk is consuming 50% of the CPU when running on Windows 2008 R2 with 1 vCPU and 6 GB of RAM. This is on a Windows server running in light forwarder mode and forwarding data to the main indexer.

    We’ve even pared the number of logs down to just a couple and the CPU is still at 50%; it seems way too high and I don’t know how to fix this. Have you encountered this before?

  11. I downloaded and installed your Splunk Monitoring application. When I bring up the app from the apps menu within Splunk, I see the following error:

    “This view has a Splunk.Module.HiddenSearch module but it is configured with no child modules to push its changes to. This represents a configuration error.”

    Someone posted the same error on Splunk Answers, and someone wrote back and said to comment out line 24 and add line 78. But that totally does not work and causes yet another error (I saw this at http://splunk-base.staging.splunk.com/answers/5672/error-on-dashboard-splunk-agent-monitoring-app ).

    I have the monitoring script working okay, so it is the $SPLUNK_HOME/etc/apps/splunk_monitoring/default/data/ui/views/dashboard.xml that is apparently the problem. Suggestions?

  12. Garret, I’m not sure what to tell you. I wrote the app for an older version of Splunk and that dashboard module worked just fine. It’s possible that Splunk changed this in a newer version and it no longer works, but I’ve tested it on our version 4.2.2, build 101277 server and it seems to be working without issue. Have you tried talking to Splunk support?

  13. Great explanation. I would like to know how you create an index for remote files. I am using the Splunk forwarder to send some log files to be indexed on the Splunk indexer, but no matter what I do, they get indexed into the default main index.

  14. It’s been a long time since I was using Splunk to do this, but I think this documentation has the answer you are seeking, chandi.

    http://docs.splunk.com/Documentation/Splunk/6.0.2/Forwarding/Routeandfilterdatad

  15. Thanks a lot Josh for your response. I really appreciate it.
    I have looked at the document. Following it, I specified the files that I wanted to monitor in the inputs.conf file. I installed the Splunk forwarder on a lab machine and made the following changes:

    —————————inputs.conf————————————————
    [default]
    host = A5xxxxxx

    [script://$SPLUNK_HOME\bin\scripts\splunk-wmi.path]
    disabled = 0

    [monitor:///C:\Users\rq113d\Desktop\IVTUpdateLog]
    disabled = false
    followTail = 0
    index = main
    sourcetype = IVTUpdateLog
    host = A5370950

    [monitor:///C:\Users\rq113d\Desktop\indexIVTMAIN]
    disabled = false
    followTail = 0
    index = ivtumain
    sourcetype = test
    host = A5370950
    —————————END OF inputs.conf———————————————-

    I have set up port number 9991 fine in outputs.conf.

    ———————————outputs.conf———————————————-
    [tcpout]
    defaultGroup = default-autolb-group

    [tcpout:default-autolb-group]
    server = AXXXXXXX:9991

    [tcpout-server://AXXXXXXX:9991]

    —————————END OF outputs.conf———————————————-
    The problem is that I can see all of the information for the IVTUpdateLog files, since I have set the default index to main, but I don’t see any information for the indexIVTMAIN file because it uses the index called “ivtumain”.

    I have created the index ivtumain on the Splunk indexer but am not sure if I have done it right. On the Splunk indexer I used the default settings for the new index “ivtumain” for the local path and home path (Settings > Indexes > New > ivtumain).

    do you have any suggestions on that?
    Any help would be appreciated.

  16. How about Search Head Pooling and NFS performance tips?

    I’m in charge of deploying a 1.76 TB/day environment and we are facing many problems with SHP.

    Hope 6.2 does away with it and offers a better option.

  17. Splunk 6.2 now offers Search Head Clustering, and Search Head Pooling via NFS is deprecated. Looks like a real win:

    http://docs.splunk.com/Documentation/Splunk/6.2.0/DistSearch/AboutSHC

  18. Hi Team,
    I am new to Splunk. May I know how to get at the old log files?

    Thanks in advance

  19. Has anyone had any issues with the App for Unix and being able to see the Home and Metric data? When I connect to these panes I just get spinning cursors; no data populates. I have proper access to this data. I have tried several browsers: IE, Chrome, and Firefox. The display is slightly different in each browser, but no data is displayed.

    Any help would be appreciated.

  20. To Don: I had a similar experience with the Splunk Unix app running on clients but not forwarding to the Splunk indexer. To resolve it, restart Splunk on each client after the main Splunk page indicates that the Splunk_TA_nix bundle has been downloaded to the clients. That worked for me.

    # on each client
    $ splunk restart


