This project lets you quickly stand up a Splunk instance in Docker.
But what is Splunk? Splunk is a platform for big data collection and analytics. You feed events from syslog, webserver logs, or application logs into Splunk, and then use queries to extract meaningful insights from that data.
Paste either of these lines into your terminal:
```bash
bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-lab/master/go.sh)
bash <(curl -Ls https://bit.ly/splunklab)
```
...and the script will print out what directory it will ingest logs from, your password, and so on. Follow the on-screen instructions for setting environment variables and you'll be up and running in no time! Whatever logs you had sitting in your `logs/` directory will be searchable in Splunk with the search `index=main`.
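If you'd rather not wait for a prompt, you can set environment variables up front. For example, using `SPLUNK_PASSWORD` (the same variable used by the Docker command later in this README; treat this as an illustrative invocation):

```bash
# Set the admin password up front so the script doesn't have to ask for it:
SPLUNK_PASSWORD=my-strong-password bash <(curl -Ls https://bit.ly/splunklab)
```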
If you want to see neat things you can do in Splunk Lab, check out the Cookbook section.
Also, the script will create a directory called `bin/` with some helper scripts in it. Be sure to check them out!
- https://localhost:8000/ - Default port to log into the local instance. Username is `admin`, password is what was set when starting Splunk Lab.
- Splunk Dashboard Examples - Wanna see what you can do with Splunk? Here are some example dashboards.
- App dashboards can be stored on the local filesystem (they don't disappear when the container exits)
- Ingested data can be stored on the local filesystem
- Multiple REST and RSS endpoints "built in" to provide sources of data ingestion
- Integration with REST API Modular Input
- Splunk Machine Learning Toolkit included
- `/etc/hosts` can be appended to with local IP/hostname entries
- Ships with Eventgen to populate your index with fake webserver events for testing
These are screenshots with actual data from production apps which I built on top of Splunk Lab:
What can you do with Splunk Lab? Here are a few examples of ways you can use Splunk Lab:
- Drop your logs into the `logs/` directory, then run:

  `bash <(curl -Ls https://bit.ly/splunklab)`

- Go to https://localhost:8000/
- Ingested data will be written to `data/`, which will persist between runs.
`SPLUNK_DATA=no bash <(curl -Ls https://bit.ly/splunklab)`

- Note that `data/` will not be written to, and launching a new container will cause `logs/` to be indexed again.
- This will increase the ingestion rate on Docker for macOS, as there are some issues with the filesystem driver in Docker for macOS.
`SPLUNK_EVENTGEN=1 bash <(curl -Ls https://bit.ly/splunklab)`

- Fake webserver logs will be written every 10 seconds and can be viewed with the query `index=main sourcetype=nginx`. The logs are based on actual HTTP requests which came into the webserver hosting my blog.
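- For example, once Eventgen is running, a query along these lines will summarize the fake traffic (the `status` field name here is an assumption about how the nginx sourcetype gets parsed): `index=main sourcetype=nginx | stats count by status`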
- Edit a local hosts file:

  `ETC_HOSTS=./hosts bash <(curl -Ls https://bit.ly/splunklab)`

- This can be used in conjunction with something like Splunk Network Monitor to ping hosts that don't have DNS names, such as your home's webcam. :-)
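- The hosts file uses the standard `/etc/hosts` format; a hypothetical example (these IPs and names are made up for illustration) might look like:

  ```
  # ./hosts - extra entries for the container's /etc/hosts
  192.168.1.50    webcam
  192.168.1.1     router
  ```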
- Run any of the above with `PRINT_DOCKER_CMD=1` set, and the Docker command line that's used will be written to stdout.
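- For example: `PRINT_DOCKER_CMD=1 SPLUNK_EVENTGEN=1 bash <(curl -Ls https://bit.ly/splunklab)` (combining it with Eventgen here is just an illustration; it works with any of the invocations above).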
This would normally be done with the script `./bin/devel.sh` when running from the repo, but if you're running Splunk Lab just with the Docker image, here's how to do it:

```bash
docker run -p 8000:8000 -e SPLUNK_PASSWORD=password1 -v $(pwd)/data:/data -v $(pwd)/logs:/logs --name splunk-lab --rm -it -v $(pwd):/mnt -e SPLUNK_DEVEL=1 dmuth1/splunk-lab bash
```
This is useful mainly if you want to poke around in Splunk Lab while it's running. Note that you could always just run `docker exec -it splunk-lab bash` instead of doing all of the above. :-)
The following Splunk apps are included in this Docker image:
- REST API Modular Input (requires registration)
- Wordcloud Custom Visualization
- Slack Notification Alert
- Splunk Machine Learning Toolkit
All apps are covered under their own license. Please check the Apps page for more info.
Splunk has its own license. Please abide by it.
I put together this curated list of free sources of data which can be pulled into Splunk via one of the included apps:
- RSS
- REST (you will need to set `$REST_KEY` when starting Splunk Lab)
- Non-streaming
- Streaming
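Setting `$REST_KEY` follows the same pattern as the other environment variables in this README; for example (the key value itself is hypothetical):

```bash
REST_KEY=your-api-key-here bash <(curl -Ls https://bit.ly/splunklab)
```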
Since building Splunk Lab, I have used it as the basis for building other projects:
- SEPTA Stats
- Website with real-time stats on Philadelphia Regional Rail.
- Pulled down over 60 million train data points over 4 years using Splunk.
- Splunk Twint
- Splunk dashboards for Twitter timelines downloaded by Twint. This is now a part of the TWINT Project.
- Splunk Yelp Reviews
- This project lets you pull down Yelp reviews for venues and view visualizations and wordclouds of positive/negative reviews in a Splunk dashboard.
- Splunk Glassdoor Reviews
- Similar to Splunk Yelp Reviews, this project lets you pull down company reviews from Glassdoor and Splunk them.
- Splunk Telegram
- This app lets you run Splunk against messages from Telegram groups and generate graphs and word clouds based on the activity in them.
- Splunk Network Health Check
- Pings 1 or more hosts and graphs the results in Splunk so you can monitor network connectivity over time.
- Splunk Fitbit
- Analyzes data from your Fitbit
- Splunk for AWS S3 Server Access Logs
- App to analyze AWS S3 Access Logs
Here's all of the above, presented as a graph:
A sample app (along with instructions on how to use it) is in the `sample-app/` directory. Feel free to expand on it for your own apps.
HTTPS is turned on by default. Passwords such as `password` and `12345` are not permitted. Please, for the love of god, use a strong password if you are deploying this on a public-facing machine.
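If you need a strong password, a quick way to generate one (assuming `openssl` is installed, which it is on most systems) is:

```bash
# Generate a random, base64-encoded, 18-byte password:
openssl rand -base64 18
```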
Yes, you can!
First, install mkcert, then run `mkcert -install && mkcert localhost 127.0.0.1 ::1` to generate a local CA and a cert/key combo for localhost.
Then, when you run Splunk Lab, set the environment variables `SSL_KEY` and `SSL_CERT`, and those files will be pulled into Splunk Lab.
Example: `SSL_KEY=./localhost.key SSL_CERT=./localhost.pem ./go.sh`
TL;DR If you're on a Mac, use OrbStack.
If you're running Docker in Vagrant, or just plain Vagrant, you'll run into issues because Splunk does some low-level stuff with its data directory that will result in errors in `splunkd.log` that look like this:

```
11-15-2022 01:45:31.042 +0000 ERROR StreamGroup [217 IndexerTPoolWorker-0] - failed to drain remainder total_sz=24 bytes_freed=7977 avg_bytes_per_iv=332 sth=0x7fb586dfdba0: [1668476729, /opt/splunk/var/lib/splunk/_internaldb/db/hot_v1_1, 0x7fb587f7e840] reason=st_sync failed rc=-6 warm_rc=[-35,1]
```
To work around this, disable sharing of Splunk's data directory by setting `SPLUNK_DATA=no`, like this:

```bash
SPLUNK_DATA=no SPLUNK_EVENTGEN=yes ./go.sh
```
By doing this, any data ingested into Splunk will not persist between runs. But to be fair, Splunk Lab is meant for development usage of Splunk, not long-term usage.
Sure does! I built this on a Mac. :-)
For best results, run under OrbStack.
I wrote a series of helper scripts in `bin/` to make the process easier:

- `./bin/download.sh` - Download tarballs of various apps and split some of them into chunks.
  - If downloading a new version of Splunk, edit `bin/lib.sh` and bump the `SPLUNK_VERSION` and `SPLUNK_BUILD` variables.
- `./bin/build.sh [ --force ]` - Build the containers.
  - Note that this downloads packages from an AWS S3 bucket that I created. This bucket is set to "Requester Pays", so you'll need to make sure the `aws` CLI app is set up (see the sketch after this list).
  - If you are (re)building Splunk Lab, you'll want to use `--force`.
- `./bin/upload-file-to-s3.sh` - Upload a specific file to S3. For rolling out new versions of apps.
- `./bin/devel.sh` - Build and tag the container, then start it with an interactive bash shell.
  - This is a wrapper for the above-mentioned `go.sh` script. Any environment variables that work there will work here.
  - To force rebuilding a container during development, touch the associated Dockerfile in `docker/`. E.g. `touch docker/1-splunk-lab` to rebuild the contents of that container.
- `./bin/push.sh` - Tag and push the container.
- `./bin/create-1-million-events.py` - Create 1 million events in the file `1-million-events.txt` in the current directory.
  - If the file is not in `logs/` but is reachable from the Docker container, it can be oneshotted into Splunk with the following command: `/opt/splunk/bin/splunk add oneshot ./1-million-events.txt -index main -sourcetype oneshot-0001`
- `./bin/kill.sh` - Kill a running `splunk-lab` container.
- `./bin/attach.sh` - Attach to a running `splunk-lab` container.
- `./bin/clean.sh` - Remove the `logs/` and/or `data/` directories.
- `./bin/tarsplit` - Local copy of my package from https://github.com/dmuth/tarsplit
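As noted in the `./bin/build.sh` entry above, the S3 bucket is "Requester Pays", so the `aws` CLI needs working credentials. A minimal sketch (the bucket name and file path here are illustrative, not the real ones used by the script):

```bash
# One-time credential setup; stores keys under ~/.aws/
aws configure

# What a Requester Pays download looks like in general; bin/build.sh
# handles the actual bucket and paths for you:
aws s3 cp s3://example-bucket/splunk.tgz . --request-payer requester
```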
- Bump the version number and build number in `bin/lib.sh`.
- Run `./bin/build.sh`, using `--force` if necessary.
  - This can take several MINUTES, especially if no apps are cached locally.
- Run `SPLUNK_EVENTGEN=yes SPLUNK_ML=yes ./bin/devel.sh`.
  - This will build and tag the container, and spawn an interactive shell.
- Run `/opt/splunk/bin/splunk version` inside the container to verify the version number.
- Go to https://localhost:8000/ and verify you can log into Splunk.
  - Run the query `index=main earliest=-1d` and verify Eventgen events are coming in.
  - Go to https://localhost:8000/en-US/app/Splunk_ML_Toolkit/contents and verify that the ML Toolkit has been installed.
- Type `exit` in the shell to shut down the server.
- Run `./bin/push.sh` to deploy the image. This will take a while.
- Here's the layout of the `cache/` directory:
  - `cache/` - Where tarballs for Splunk and its apps hang out. These are downloaded when `bin/download.sh` is run for the first time.
  - `cache/deploy/` - When creating a specific Docker image, files are copied here (or rather, hardlinked to the files in the parent directory) so the Dockerfile can ingest them.
  - `cache/build/` - 0-byte marker files are written here when a specific container is built. On future builds, the age of each marker is checked against its Dockerfile: if the Dockerfile is newer, the container is (re-)built; otherwise it is skipped. This shortens a run of `bin/devel.sh` where no containers need to be built from 12 seconds on my 2020 iMac to 0.2 seconds. A minimal sketch of this check follows below.
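Here's that freshness check sketched out; the file names are illustrative, not the actual code from `bin/devel.sh`:

```bash
#!/bin/bash
# Minimal sketch of the cache/build/ marker-file check described above.
DOCKERFILE="docker/1-splunk-lab"
MARKER="cache/build/1-splunk-lab"

# -nt is true if the left file is newer than the right one (or the right one is missing)
if [ "$DOCKERFILE" -nt "$MARKER" ]; then
    docker build -f "$DOCKERFILE" -t dmuth1/splunk-lab .
    touch "$MARKER"   # record when this container was last built
else
    echo "Skipping build: $MARKER is newer than $DOCKERFILE"
fi
```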
I had to struggle with this for a while, so I'm mostly documenting it here.

When in devel mode, `/opt/splunk/etc/apps/splunk-lab/` is mounted to `./splunk-lab-app/` via `go.sh`, and the entrypoint script inside the container symlinks `local/` to `default/`. This way, any changes that are made to dashboards will be propagated outside of the container and can be checked in to Git.

When in production mode (e.g. running `./go.sh` directly), no symlink is created; instead, `local/` is mounted from whatever `$SPLUNK_APP` points to (default is `app/`), so that any changes made by the user show up on their host, with Splunk Lab's `default/` directory left untouched.
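Here's a minimal sketch of the devel-mode symlink logic described above (illustrative only; `SPLUNK_DEVEL` comes from the Docker command shown earlier, but the rest is an assumption about how the entrypoint is structured):

```bash
#!/bin/bash
APP_DIR="/opt/splunk/etc/apps/splunk-lab"

if [ "$SPLUNK_DEVEL" ]; then
    # In devel mode the app directory is bind-mounted from ./splunk-lab-app/,
    # so pointing local/ at default/ makes dashboard edits land in default/,
    # which is visible on the host and can be checked in to Git.
    rm -rf "$APP_DIR/local"
    ln -s "$APP_DIR/default" "$APP_DIR/local"
fi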
- The Docker containers are dmuth1/splunk-lab and dmuth1/splunk-lab-ml. The latter has all of the Machine Learning apps built into the image. Feel free to extend those for your own projects.
- If I run `./bin/create-test-logfiles.sh 10000` and then start Splunk Lab on a Mac, all of the files will be indexed without any major issues, but then the CPU will spin, and not because of Splunk.
  - The root cause is that the filesystem code for Docker volume mappings in Docker for macOS is VERY inefficient in terms of both CPU and memory usage, especially when there are 10,000 files involved. The overhead is just crazy. When reading events from a directory mounted through Docker, I see about 100 events/sec. When the directory is local to the container, I see about 1,000 events/sec, a 10x difference.
- The HTTPS cert is self-signed with Splunk's own CA. If you're tired of seeing a certificate error every time you connect to Splunk, you can follow the instructions at https://stackoverflow.com/a/31900210/196073 to allow self-signed certificates for `localhost` in Google Chrome.
  - Please understand the implications before you do this.
- Splunk N' Box - Splunk N' Box is used to create entire Splunk clusters in Docker. It was the first actual use of Splunk I saw in Docker, and gave me the idea that hey, maybe I could run a stand-alone Splunk instance in Docker for ad-hoc data analysis!
- Splunk, for having such a fantastic product which is also a great example of Operational Excellence!
- Eventgen is a super cool way of generating simulated real-looking data, which can be used to build dashboards for testing and training purposes.
- This text to ASCII art generator, for the logo I used in the script.
- The logo was made over at https://www.freelogodesign.org/
- Lars Wirzenius for a review of this README.
- Splunk is copyright by Splunk, Inc. Please stay within the confines of the 500 MB/day free license when using Splunk Lab, unless you brought your own license along.
- The various apps are copyright by the creators of those apps.
My email is [email protected]. I am also @dmuth on Twitter and Facebook!