I'm using a custom NanoBSD image with PacketFilter (PF) as my home firewall/gateway and indexing the PF activity into Splunk. Being able to report on and graph the firewall data provides valuable intelligence when, among other things, planning your next honeypot project.
The write-up that follows isn't specific to NanoBSD and should work with any PF installation (FreeBSD, OpenBSD, pfSense, etc.). I should also mention that I originally implemented this with real-time logging by running tcpdump on pflog0 and piping the output to logger. For the most part it worked, but Splunk would have random events that were missing the first line of the tcpdump output. Instead of burning cycles trying to determine where the random data was being dropped, I settled on a 5-minute "push" interval of the /var/log/pflog file to Splunk. I haven't encountered the issue since.
The PF-to-syslog portion of my implementation is based on the OpenBSD documentation for PF, with a couple of minor tweaks. The Splunk portion is completely custom and involved many nights of RegExp wrestling and LOTS of test packets with Nping. As a quick side note: if you're looking for a utility for writing and testing RegExps on OS X, check out RegExRX. Well worth the $4.99.
Let's start with a high-level overview of how a PF event flows from my NanoBSD firewall to Splunk:
- A script on the firewall runs from cron every 5 minutes, sends the pflog events to syslog, and rotates out the log
- The syslog service on the firewall forwards the events to Splunk via UDP
- The "sourcetype" of the incoming events defaults to "pflog"
- Splunk strips off the syslog timestamps (we only care about the timestamp from the packet that generated the PF event)
- We dynamically change the sourcetype based on the packet info: pflog_<PROTOCOL> (e.g. pflog_TCP)
- Field extraction is performed based on the final sourcetype
Configuring the Firewall
On the server where PF is running, edit the syslog config file to add the facility that PF events will be logged to, as well as the address of your remote Splunk server. If you are already using the local1 facility for other logs, I'd suggest picking an unused one (local2, local3, etc.). I find it minimizes complications when I know the only log traffic on that facility is from PF. The same goes for the listening port on the Splunk server. If UDP port 11514 is already in use, adjust the config below to suit your needs, but I highly recommend dedicating a UDP port in Splunk to PF logs. Again, I do this for my own sanity.
# Append the following line to the config:
local1.info @[SPLUNK_IP_ADDRESS]:11514
Once the config file is updated, restart the syslog service.
Next, create the /etc/pflogrotate script that will send PF events to syslog. This is the script recommended by the OpenBSD documentation, with a few tweaks. I've changed the tag parameter (-t) to "pflog" to reduce any ambiguity when regexp'ing strings, and changed the priority (-p) to match the local1 facility I configured in syslog.conf. I've also modified the tcpdump options so the output is verbose (-v) and includes an absolute date-and-time timestamp (-tttt) instead of the script's default of the time delta between packets.
#!/bin/sh
PFLOG=/var/log/pflog
FILE=/var/log/pflog5min.$(date "+%Y%m%d%H%M")
pkill -ALRM -u root -U root -t - -x pflogd
if [ -r $PFLOG ] && [ $(stat -f %z $PFLOG) -gt 24 ]; then
    mv $PFLOG $FILE
    pkill -HUP -u root -U root -t - -x pflogd
    tcpdump -n -e -v -tttt -r $FILE | logger -t pflog -p local1.info
    rm $FILE
fi
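A note on the -gt 24 test in the script: a pcap capture file begins with a 24-byte global header, so presumably the check is there to skip a pflog file that holds nothing but that header. Here is the same idea as a minimal Python sketch, for illustration only. The name has_packets is my own invention and isn't part of the script, and unlike the script it doesn't also check that the file is readable.

```python
import os

# Size of the fixed global header at the start of a pcap file.
PCAP_GLOBAL_HEADER_BYTES = 24


def has_packets(path):
    """Return True if the capture file is larger than a bare pcap header.

    Hypothetical helper that mirrors the script's
    `[ $(stat -f %z $PFLOG) -gt 24 ]` size test.
    """
    return os.path.getsize(path) > PCAP_GLOBAL_HEADER_BYTES
```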
Now edit /etc/crontab (or root's cron) to run the pflogrotate script every five minutes. If you're using crontab -e to edit root's cron, you'll need to remove the 'root' column from the example below:
# Send pflog to syslog every 5 mins.
0-59/5 * * * * root /bin/sh /etc/pflogrotate
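For anyone unfamiliar with cron's step syntax, 0-59/5 in the minute column means "every 5th minute in the range 0 through 59". The small Python sketch below expands just that one field form to show which minutes fire. expand_minute_field is a made-up helper for illustration, not a real cron parser, and it only understands the START-END/STEP shape used here.

```python
def expand_minute_field(field):
    """Expand a cron minute field of the form 'START-END/STEP' into a list.

    Hypothetical helper: it handles only the range-with-step form used in
    the crontab entry above, nothing else cron supports.
    """
    range_part, _, step = field.partition("/")
    start, _, end = range_part.partition("-")
    return list(range(int(start), int(end) + 1, int(step)))


if __name__ == "__main__":
    print(expand_minute_field("0-59/5"))
```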
Now that events are being sent every 5 minutes from the firewall, Splunk needs to be configured to receive the syslog traffic and index the events properly.
A UDP input needs to be created to match the port configured in syslog on the PF firewall. You can either do this through the GUI, if you're familiar with the steps, or simply create (or append) the following in the local inputs.conf file:
[udp://11514]
connection_host = ip
index = main
sourcetype = pflog
no_appending_timestamp = true
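One way to check that the new input is listening is to throw a hand-made datagram at it before involving the firewall at all. The sketch below uses Python's standard socket module. The function name, the sample message, and the address in the usage note are placeholders I picked for illustration.

```python
import socket


def send_test_event(host, port, message):
    """Send one UDP datagram containing `message` to host:port.

    Returns the number of bytes handed to the network stack. UDP gives no
    delivery confirmation, so success here does not prove Splunk got it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))
```

Calling something like send_test_event("192.0.2.50", 11514, "pflog: test event"), with your own Splunk address substituted, should produce an event you can find by searching index=main sourcetype=pflog, assuming the input above is in place and nothing between the two hosts is dropping the traffic.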
Next we create a Splunk transform (in transforms.conf) to dynamically change an event's "sourcetype" based on the protocol in the packet logged by PF. I chose to implement it this way so that I could create separate RegExps per protocol instead of trying to write a massive "one-size-fits-all". Again, I'm all about maintaining my sanity.
[sourcetype_pflog_by_proto]
DEST_KEY = MetaData:Sourcetype
REGEX = proto\s(\S+)
FORMAT = sourcetype::pflog_$1
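To make the transform's effect concrete, here is a rough Python imitation of what it does: find the word after "proto" and glue it onto "pflog_". This is only a sketch of the logic, not how Splunk implements it. The function name and the sample strings are made up, with the samples shaped like the "proto TCP (6)" fragment the RegExp expects.

```python
import re

# Same pattern as the REGEX line in the transform.
PROTO_RE = re.compile(r"proto\s(\S+)")


def assign_sourcetype(event, default="pflog"):
    """Return 'pflog_<PROTOCOL>' if the event names a protocol, else the default.

    Hypothetical stand-in for FORMAT = sourcetype::pflog_$1.
    """
    match = PROTO_RE.search(event)
    if match is None:
        return default
    return "pflog_" + match.group(1)


if __name__ == "__main__":
    # Made-up fragments of tcpdump output, for illustration only.
    for sample in ("flags [DF], proto TCP (6), length 60",
                   "flags [none], proto UDP (17), length 76",
                   "a line with no protocol in it"):
        print(assign_sourcetype(sample))
```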
Now for the "heavy lifting"... Splunk's props.conf file is where I define Splunk's behavior when indexing the PF events. The five most important options are:
- TIME_PREFIX: Defines where Splunk can find the timestamp to apply to the event. We want to use the tcpdump timestamp, not the one in the syslog header.
- SHOULD_LINEMERGE: The 'verbose' option in tcpdump inserts a linefeed into its output, so each line arrives at Splunk as a separate syslog event. This option directs Splunk to merge the lines back together. (Note: tcpdump on OS X provides the -g option to disable the linefeeds.)
- BREAK_ONLY_BEFORE: With the previous option enabled, we need to define where Splunk should break individual events out of the merged input. My RegExp simply looks for the "rule #" string that is in the first line of every PF event.
- SEDCMD-strip_ts: This runs a search-and-replace, based on the RegExp, on every event as it's being indexed. I wrote this mostly for cosmetics; it strips the syslog timestamp (and the pflog tag) from each line.
- TRANSFORMS: This tells Splunk to run the transform (defined in transforms.conf) that performs the dynamic "sourcetype" assignment.
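The SEDCMD expression is easy to try outside of Splunk. Below is a Python translation of s/.../​/g applied to a two-line event. The event text is made up and shaped to match my RegExp (a syslog-style "Mon DD HH:MM:SS pflog: " prefix on each line), so treat it as an illustration rather than a real capture.

```python
import re

# Python translation of the search half of the SEDCMD-strip_ts expression.
SYSLOG_PREFIX = re.compile(r"[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\spflog\:\s")


def strip_syslog_prefix(event):
    """Delete every syslog prefix in the event, like sed's /g flag does."""
    return SYSLOG_PREFIX.sub("", event)


# Made-up merged event: two syslog lines that belong to one PF log entry.
RAW_EVENT = (
    "Jan  1 12:00:05 pflog: 2012-01-01 12:00:01.123456 rule 5/0(match): "
    "block in on em0: (tos 0x0, ttl 64, id 12345, offset 0, flags [DF], "
    "proto TCP (6), length 60)\n"
    "Jan  1 12:00:05 pflog:     192.0.2.10.51234 > 198.51.100.5.22: "
    "Flags [S], length 0"
)

if __name__ == "__main__":
    print(strip_syslog_prefix(RAW_EVENT))
```

Only the tcpdump timestamp is left at the front of the event once the prefixes are gone.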
The three final sections, pflog_TCP, pflog_UDP and pflog_ICMP, each contain the RegExp I wrote for extracting specific fields from each event. The RegExps are based on my review of the tcpdump source code, specifically the methods for printing PF headers, and were then tested heavily by sending crafted packets at my firewall.
[pflog]
pulldown_type = true
maxDist = 3
TIME_PREFIX = pflog:\s+
MAX_TIMESTAMP_LOOKAHEAD = 33
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = rule\s\d+
SEDCMD-strip_ts = s/[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\spflog\:\s//g
TRANSFORMS = sourcetype_pflog_by_proto

[pflog_TCP]
EXTRACT-pflog_TCP = (?i)rule (?P<rule>\d+)(?:\.(?:\w+)?\.\d+)?/\d+\((?P<pf_reason>[\w-]+)\): (?P<pf_action>[\w-]+) (?P<pf_direction>[\w/]+) on (?P<interface>\w+): \(tos (?P<ip_tos>.*?), ttl (?P<ip_ttl>\d+), id (?P<ip_id>\d+), offset (?P<ip_offset>\d+), flags \[(?P<ip_flags>\w+)\], proto (?P<proto>\w+\s\(\d+\)), length (?P<ip_length>\d+)\)\R\s+(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) > (?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+): Flags \[(?P<tcp_flags>\S+)\]

[pflog_UDP]
EXTRACT-pflog_UDP = (?i)rule (?P<rule>\d+)(?:\.(?:\w+)?\.\d+)?/\d+\((?P<pf_reason>[\w-]+)\): (?P<pf_action>[\w-]+) (?P<pf_direction>[\w/]+) on (?P<interface>\w+): \(tos (?P<ip_tos>.*?), ttl (?P<ip_ttl>\d+), id (?P<ip_id>\d+), offset (?P<ip_offset>\d+), flags \[(?P<ip_flags>\w+)\], proto (?P<proto>\w+\s\(\d+\)), length (?P<ip_length>\d+)\)\R\s+(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) > (?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+)

[pflog_ICMP]
EXTRACT-pflog_ICMP = (?i)rule (?P<rule>\d+)(?:\.(?:\w+)?\.\d+)?/\d+\((?P<pf_reason>[\w-]+)\): (?P<pf_action>[\w-]+) (?P<pf_direction>[\w/]+) on (?P<interface>\w+): \(tos (?P<ip_tos>.*?), ttl (?P<ip_ttl>\d+), id (?P<ip_id>\d+), offset (?P<ip_offset>\d+), flags \[(?P<ip_flags>\w+)\], proto (?P<proto>\w+\s\(\d+\)), length (?P<ip_length>\d+)\)\R\s+(?P<src_ip>\d+\.\d+\.\d+\.\d+) > (?P<dst_ip>\d+\.\d+\.\d+\.\d+)
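If you'd like to poke at the TCP extraction without a Splunk instance handy, the RegExp ports to Python with one change: Python's re module doesn't support PCRE's \R, so the sketch below swaps it for an explicit newline alternation. The sample event is made up to fit the pattern and is not a real capture, and the constant names are my own.

```python
import re

# Header portion shared by the three EXTRACT patterns, copied from props.conf.
# Only change: \R (any newline, PCRE) becomes (?:\r\n|\n|\r) for Python.
PF_HEADER = (
    r"(?i)rule (?P<rule>\d+)(?:\.(?:\w+)?\.\d+)?/\d+\((?P<pf_reason>[\w-]+)\): "
    r"(?P<pf_action>[\w-]+) (?P<pf_direction>[\w/]+) on (?P<interface>\w+): "
    r"\(tos (?P<ip_tos>.*?), ttl (?P<ip_ttl>\d+), id (?P<ip_id>\d+), "
    r"offset (?P<ip_offset>\d+), flags \[(?P<ip_flags>\w+)\], "
    r"proto (?P<proto>\w+\s\(\d+\)), length (?P<ip_length>\d+)\)"
    r"(?:\r\n|\n|\r)\s+"
)

# TCP-specific tail: addresses, ports and TCP flags.
TCP_TAIL = (
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) > "
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+): "
    r"Flags \[(?P<tcp_flags>\S+)\]"
)

PFLOG_TCP = re.compile(PF_HEADER + TCP_TAIL)

# Made-up event, as it might look after the syslog prefix has been stripped.
SAMPLE_TCP_EVENT = (
    "2012-01-01 12:00:01.123456 rule 5/0(match): block in on em0: "
    "(tos 0x0, ttl 64, id 12345, offset 0, flags [DF], proto TCP (6), "
    "length 60)\n"
    "    192.0.2.10.51234 > 198.51.100.5.22: Flags [S], length 0"
)


def extract_tcp_fields(event):
    """Return a dict of named fields, or None if the event does not match."""
    match = PFLOG_TCP.search(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = extract_tcp_fields(SAMPLE_TCP_EVENT)
    for name, value in fields.items():
        print(f"{name} = {value}")
```

The UDP and ICMP patterns can be tried the same way by swapping the tail for the one from the matching stanza.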
Now restart Splunk, either through the GUI or from the command line.
I've included a couple of screenshots below to demonstrate the results. If you run into any issues replicating this, or discover a packet that my RegExp isn't parsing correctly, please leave a comment!