Wrangle Your PowerShell Transcript Logs with Apache NiFi

Intro:

You are an enterprise defender who has done the right thing and enabled PowerShell transcript logging. Now you have a whole bunch of flat text files containing PowerShell transcript logs, filled with various application noise. What do you do now?

Enter Apache NiFi. Apache NiFi is a data routing and transformation system, available to download here: https://nifi.apache.org/download.html. This post will show you how to use Apache NiFi to extract relevant text from PowerShell transcript logs and send it to whatever logging system you have; I will be using Splunk as an example.

Installation: First, let's get Apache NiFi installed. You'll need to grab the download for NiFi as well as OpenJDK:

NiFi download: https://nifi.apache.org/download.html

OpenJDK: https://jdk.java.net/

In my example I will be running this on a Windows system, but NiFi supports various operating systems and has a Docker image as well.

Extract the contents of the OpenJDK archive into C:\Java, then edit your Path environment variable to include the bin directory under C:\Java:

If you've done this step correctly, you should be able to open a new Command Prompt and execute the java command.
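If you prefer to do this step from PowerShell, a minimal sketch looks like the following; the exact JDK folder name (jdk-21 here) is an assumption and will depend on the build you downloaded:

    # Assumed path; substitute the folder your OpenJDK build extracted to
    $jdkBin = 'C:\Java\jdk-21\bin'

    # Append the JDK bin directory to the machine-wide Path (run from an elevated prompt)
    $machinePath = [Environment]::GetEnvironmentVariable('Path', 'Machine')
    [Environment]::SetEnvironmentVariable('Path', "$machinePath;$jdkBin", 'Machine')

    # Also update the current session's Path so we can verify immediately
    $env:Path += ";$jdkBin"
    java -version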

Next, extract the contents of your Apache NiFi zip file into whatever directory you want, then navigate to the bin directory and execute 'run-nifi.bat'. You should see something similar to the following:
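From a Command Prompt, that step looks roughly like this; C:\nifi is an assumed location, so use whatever directory you extracted to:

    rem Assumes NiFi was extracted to C:\nifi
    cd C:\nifi\bin
    run-nifi.bat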

Making a Flow:

At this point NiFi should be running, and you should be able to browse to http://localhost:8080/nifi and see something like this:

The icons near the top-left NiFi logo are the various processors, funnels, and labels that Apache NiFi uses to do its data wizardry; you drag these elements onto the 'canvas'. For our purposes we will be making a basic flow using processors only. When a processor is dragged onto the canvas, Apache NiFi gives you options as to which processor you want to use. We will be starting with the "GetFile" processor:

Let's take a look at the properties of this processor:

We are telling this processor to look in the PSLogs directory, recursively. The Keep Source File property is set to false here, so Apache NiFi will delete the files when it is done processing them. Flip this to true if you are just testing your flow, but keep in mind that NiFi will continuously process your flow until you stop it.
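For reference, the relevant GetFile properties look roughly like this; the input directory is just an example path, so point it wherever your transcripts actually land:

    Input Directory        : C:\PSLogs
    Recurse Subdirectories : true
    Keep Source File       : false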

Next we want to use the ExtractText processor to perform regular expression matches against the contents of the logs that our GetFile processor read. Hover over the GetFile processor until the arrow icon appears, then drag it to the ExtractText processor:

When you successfully link the processors, a menu will appear asking which relationship you want to link. For the GetFile processor the only relationship type is success, so we select that option:

Next up, we want to configure the ExtractText processor to run certain regular expressions against the content that GetFile just read:

The + symbol allows us to enter our own properties. I have added three sets of regular expression matches for basic, potentially undesirable PowerShell commands, and have enabled DOTALL and Multiline modes.
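As a rough illustration only (these are stand-in expressions, not my exact rules), the dynamic properties might look something like this; each property name becomes the attribute the matches are written to:

    ps.download : (?i)(downloadstring|downloadfile|net\.webclient)
    ps.encoded  : (?i)(-encodedcommand|-enc\s)
    ps.invoke   : (?i)(invoke-expression|iex\s*\()

    Enable DOTALL Mode    : true
    Enable Multiline Mode : true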

In the Settings tab of the ExtractText processor, I have set it to automatically terminate the unmatched relationship, and to route matched flowfiles to our next processor, AttributesToJSON:

At this point we have Apache NiFi getting the contents of a file, performing regex matches against that content, then converting those matches into JSON format. Now that we have our matching data, we need to send it somewhere. While developing your flow, you may want to use the PutFile processor to simply write your flow contents to a file for testing. In my case I have ended this flow with a PutSplunk processor:
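For the AttributesToJSON processor, a configuration along these lines writes the attributes into the flowfile content; the values shown are assumptions rather than a prescription:

    Destination             : flowfile-content
    Attributes List         : (left empty, so every attribute is converted)
    Include Core Attributes : true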

Note that if you're using Splunk, you'll need to create an index and a listener first. In my case I have a listener set up on port 1234 with a source type of JSON without a timestamp.
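I set mine up through the Splunk web UI, but the roughly equivalent inputs.conf stanza would look something like this (the index name is just an example):

    [tcp://1234]
    index = ps_transcripts
    sourcetype = json_no_timestamp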

Since the PutSplunk processor is the last in my flow, I will tell it to terminate both success and failure relationships:
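On the NiFi side, the PutSplunk processor just needs to know where that listener lives, roughly as below; localhost assumes Splunk is running on the same machine as NiFi, so substitute your indexer or forwarder address otherwise:

    Hostname : localhost
    Port     : 1234
    Protocol : TCP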

The flow should resemble something like this when you're all done:

Now locate the play button in the "Operate" window in the NiFi interface and watch your flow do its thing:

Now let's take a look at the data in Splunk:

The field names highlighted correspond to the custom attributes you created in the ExtractText processor (I should have used much better names) and the field values contain the regular expression matches.

The file metadata is also included.

Here's what the data looks like with a little bit of formatting:
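To give a feel for the shape of the data (this is a mocked-up record, not a real capture, and the field names follow the example properties above), each event ends up as a flat JSON object mapping attribute names to their matched values:

    {
      "filename": "PowerShell_transcript.WORKSTATION01.abcd1234.20190101120000.txt",
      "path": "C:\\PSLogs\\",
      "ps.download": "downloadstring",
      "ps.invoke": "invoke-expression"
    }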

Notes:

  • My original use-case for looking at NiFi was to take a bunch of PowerShell transcript logs, do regex on them, and then send only matches to Splunk in order to save on quota

  • I am not an expert in NiFi at all, so there are probably more efficient ways to do what I'm trying to accomplish

  • Huge thank you to https://twitter.com/Wietze for helping me work through some NiFi issues

  • I have not yet tested this in a full-blown production environment, but from limited testing NiFi seems to chew through large transcript logs without issue

More on PowerShell logging:

  • https://www.fireeye.com/blog/threat-research/2016/02/greater_visibilityt.html

  • https://devblogs.microsoft.com/powershell/powershell-the-blue-team/
