Hunt Fast: Splunk and tstats
Intro
One of the aspects of defending enterprises that humbles me the most is scale. Enabling different log sources and sending those logs to some kind of centralized SIEM sounds relatively straightforward at a high level, but dealing with tens or even hundreds of thousands of endpoints presents us with huge challenges.
In this post I want to highlight a feature in Splunk that helps, at least in part, address the challenge of hunting at scale: data models and tstats.
What are data models?
According to Splunk's documentation:
A data model is a hierarchically structured search-time mapping of semantic knowledge about one or more datasets. It encodes the domain knowledge necessary to build a variety of specialized searches of those datasets. These specialized searches are used by Splunk software to generate reports for Pivot users.
From: https://docs.splunk.com/Documentation/Splunk/8.0.4/Knowledge/Aboutdatamodels
Another, more accessible way this was explained to me: data models take unstructured data and make it structured.
What benefit does this give us? Speed!
Accelerated searches using Splunk data models return results almost instantly, even across large data sets.
Setup
I won't go into huge detail about how to set up Splunk and data models, but you will need the following to get started:
A Splunk Enterprise Installation of some kind
A Splunk TA app that sends data to Splunk in a CIM (Common Information Model) format
The Windows and Sysmon Apps both support CIM out of the box
The Splunk CIM app installed on your Splunk instance, configured to accelerate the right indexes where your data lives (see the configuration sketch below)
In my example I'll be working with Sysmon logs (of course!)
Something to keep in mind: my CIM acceleration setup accelerates an index that only has Sysmon logs. If you are accelerating an index that has both Sysmon and other types of logs, you may see different results in your environment.
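For reference, the CIM app constrains each data model's acceleration to specific indexes via macros. A minimal sketch of the relevant setting, assuming your Sysmon data lives in an index named "sysmon" (the index name is just an example, and this is normally set through the CIM Setup page rather than by hand):

```
# Splunk_SA_CIM/local/macros.conf
# Constrain the Endpoint data model to the index holding Sysmon data
[cim_Endpoint_indexes]
definition = (index="sysmon")
```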
Queries
Let's dive right into some queries using data models. For my first example I want to keep it simple and run a query that shows which directories processes are launching from.
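A search along these lines does the trick (a sketch assuming the stock CIM Endpoint.Processes dataset, with Sysmon's CurrentDirectory field mapped to process_current_directory):

```
| tstats summariesonly=true values(Processes.process) AS process
    from datamodel=Endpoint.Processes
    by Processes.process_current_directory
```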
This looks a bit different than a traditional stats-based Splunk query, but in this case we are selecting the values of "process" from the Endpoint data model and grouping the results by the directory in which the process executed. This gives us results that look like:
The biggest advantage of data models is speed, so let's set the time range further back, pull more data, and see how long it takes. Running the same search over a period of "All Time" in my small test instance produced the following results:
Super. Fast.
By default the CIM app has field mappings for certain Sysmon events, but not every single field is mapped to a data model. If we wanted to look at ImageLoad events, for example, we would need to add our own dataset under the existing Endpoint data model.
Now with our new data model we can run a query like:
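Here's a sketch of that kind of search. The dataset name ImageLoads and the fields loaded_file and process are how I mapped Sysmon's ImageLoaded and Image fields in my custom dataset, so adjust the names to match your own data model:

```
| tstats summariesonly=true values(ImageLoads.loaded_file) AS loaded_dll
    from datamodel=Endpoint.ImageLoads
    by ImageLoads.process
```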
And get results that look like:
This gives us a nice view of what DLLs were loaded by particular processes, with results returned to us almost instantly across large data sets - cool!
Let's continue and look at Sysmon ProcessAccess events. I followed the same procedure as outlined above and made a new dataset for the ProcessAccess events, and now I can do something like:
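Again a sketch: ProcessAccess is my custom dataset, with field names mirroring Sysmon Event ID 10's SourceImage, TargetImage, and GrantedAccess:

```
| tstats summariesonly=true count
    from datamodel=Endpoint.ProcessAccess
    by ProcessAccess.SourceImage ProcessAccess.TargetImage ProcessAccess.GrantedAccess
| sort - count
```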
Which produces this result:
This is a really useful view for quickly grouping your Sysmon ProcessAccess events by the GrantedAccess field, which is handy if you're trying to quickly hunt for anomalous GrantedAccess values.
If we wanted to quickly narrow down the above search to show us what is injecting into lsass we can do so with the following:
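Same assumptions about the custom ProcessAccess dataset as above; the wildcard match on TargetImage does the narrowing:

```
| tstats summariesonly=true count
    from datamodel=Endpoint.ProcessAccess
    where ProcessAccess.TargetImage="*lsass.exe"
    by ProcessAccess.SourceImage ProcessAccess.GrantedAccess
| sort - count
```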
Which produces:
We can also apply the same idea to quickly hunt for anomalous values in the CallTrace field, with a query that looks something like:
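A sketch, using the same hypothetical ProcessAccess dataset:

```
| tstats summariesonly=true count
    from datamodel=Endpoint.ProcessAccess
    by ProcessAccess.SourceImage ProcessAccess.TargetImage ProcessAccess.CallTrace
| sort - count
```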
Which gives us the following:
This gives us a pretty clear picture of which processes are injecting where and what the corresponding call traces look like. We can expand on this concept a bit further with something like:
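The sketch below breaks the CallTrace into one row per frame and pulls each frame apart with a regex; the extracted field names (module, offset) are my own labels:

```
| tstats summariesonly=true count
    from datamodel=Endpoint.ProcessAccess
    where ProcessAccess.TargetImage="*lsass.exe"
    by ProcessAccess.SourceImage ProcessAccess.CallTrace
| eval call_trace=split('ProcessAccess.CallTrace', "|")
| mvexpand call_trace
| rex field=call_trace "(?<module>[^+]+)\+(?<offset>.+)"
| table ProcessAccess.SourceImage module offset
```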
Which produces the following results:
This query splits the CallTrace field on its "|" delimiter and then translates each line of the call trace into a human-readable format, which should make it easier to spot weird values in the CallTrace field.
tstats and Dashboards
I'm a bit of a rebel and like to use Splunk dashboards not just for visualizations, but to give myself a quasi hunting GUI. Putting together some of the queries we went over above, we can build out a simple dashboard that looks like:
The idea here is that I can quickly enter a process name that I'm interested in, and that token is passed to three separate searches, all using data models on the back end for almost instant results.
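The skeleton of such a dashboard in Simple XML looks roughly like this (trimmed to a single panel; the process_name token and the panel title are my own choices):

```xml
<form>
  <label>Process Hunting</label>
  <fieldset submitButton="false">
    <input type="text" token="process_name">
      <label>Process Name</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>Launch Directories</title>
      <table>
        <search>
          <query>| tstats summariesonly=true count
    from datamodel=Endpoint.Processes
    where Processes.process_name="$process_name$"
    by Processes.process_current_directory</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
        </search>
      </table>
    </panel>
  </row>
</form>
```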
Let's add another search to our dashboard that glues ProcessCreate and NetworkConnect events together.
We can start with the following tstats query:
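Here's a sketch of the idea: NetworkConnect is a custom dataset I created for Sysmon Event ID 3, process_guid assumes your Processes dataset maps Sysmon's ProcessGuid field, and the GUID value below is a placeholder:

```
| tstats summariesonly=true count
    from datamodel=Endpoint.Processes
    where Processes.process_guid="{PLACEHOLDER-GUID}"
    by _time span=1s Processes.parent_process_name Processes.process Processes.process_current_directory
| append
    [| tstats summariesonly=true count
        from datamodel=Endpoint.NetworkConnect
        where NetworkConnect.process_guid="{PLACEHOLDER-GUID}"
        by _time span=1s NetworkConnect.process NetworkConnect.dest_ip NetworkConnect.dest_port]
| sort _time
```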
This gives us results which look something like:
I love viewing events in this manner as it gives me a really clear picture of what happened on the host: PowerShell was launched via Explorer in the C:\Users\Administrator directory and then made some network connections. This query, however, specifies a GUID. What if we wanted to select one from an existing list?
Let's add a dropdown to our dashboard that pulls process and GUID information from Sysmon Event ID 1 events:
We select the Dropdown option, give our field a label, and add a token which will be passed to our search. We then set the label for the field to be a process name, but the value of the field to be a process GUID that gets passed to the search:
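In Simple XML the finished input ends up looking something like this (the token name process_guid is my choice, and the search assumes the Processes dataset carries process and process_guid):

```xml
<input type="dropdown" token="process_guid">
  <label>Process</label>
  <fieldForLabel>process</fieldForLabel>
  <fieldForValue>process_guid</fieldForValue>
  <search>
    <query>| tstats summariesonly=true count
    from datamodel=Endpoint.Processes
    by Processes.process Processes.process_guid
| rename Processes.* AS *</query>
    <earliest>-24h@h</earliest>
    <latest>now</latest>
  </search>
</input>
```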
Now we have a dropdown in our dashboard that gives us a list of processes based on their GUID (so if you launched PowerShell a bunch of times, you'll see multiple instances of PowerShell available for selection).
Now we select the process we're interested in and view our results:
The code for my entire dashboard is here. (Note: you may need to tweak some of your data models to make the searches work in your environment.)
Some Notes
I am not an SPL master, so there are probably cooler / more efficient ways to rework the queries I highlighted above.
My data model configuration is unique to my test environment; a full-blown production environment will require some engineering elbow grease to get the data that's in Splunk aligned with CIM.
Although I used Sysmon for my examples, these concepts are also applicable to normal Windows logs or any type of logs that exist in Splunk with fields that fit into a data model