Passive OS Fingerprinting of Email with PF
Ever want more fodder for those SpamAssassin rules or Bayes statistics? The usual assumption is that our spam and viruses come from networks of infected zombie Windows machines, but is that really true? With passive OS fingerprinting, you can answer this question instantly and with minimal resource usage.
Tools
For this exercise, we will be using FreeBSD 5.4, PF, Exim, and some basic shell scripting.
- Why FreeBSD and not OpenBSD? Jails, but that’s another article entirely.
- What’s so special about PF? PF just happens to have integrated passive OS fingerprinting, so a simple keep state rule for each fingerprint allows us to use pfctl to see which source IP matches which fingerprint.
- What about (Postfix|Sendmail|Qmail|Exchange)? This will work with those too, but we happen to like Exim. Yes, you can do it with Exchange too, but again that’s another article.
Proof of Concept
Phase One – Get PF Fingerprinting
In order to use PF on FreeBSD 5.4, the pf kernel module must first be loaded and pf enabled:
server# kldload pf
server# pfctl -e
No ALTQ support in kernel
ALTQ related functions disabled
pf enabled
server#
You can disregard anything about ALTQ; that’s another article. Now that pf is loaded and enabled, we need some actual rules to do the matching. First, let’s create a minimal /etc/pf.conf:
ext_if="bge0"
set fingerprints "/etc/pf.os"
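(Note: You’ll want to change bge0 to your actual Ethernet interface, which you can find by running ifconfig -a.)
Next, append one pass rule for each fingerprint PF knows about. The rules are generated from the pfctl -so listing and then reversed, so that specific versions come before their catch-all class entry; with quick, the first matching rule wins: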
/sbin/pfctl -qso | sed 's,[[:space:]], ,g' | \
egrep -v '^(-----|Class)' | \
sed 's,\(.*\)[[:space:]],pass in quick proto tcp from any os "\1" to \$ext_if keep state label "\1",' | \
tail -r >> /etc/pf.conf
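The pipeline above relies on the BSD-only tail -r and appends straight to /etc/pf.conf. As a sketch (not the original method), the same transformation can be wrapped in a hypothetical pf_os_rules function that reads pfctl -so output on stdin and reverses the list with awk, so the generated rules can be reviewed before they go live. Like the original sed expression, it assumes every line printed by pfctl -so ends in a whitespace character.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original article): convert the
# fingerprint list printed by `pfctl -so` (read on stdin) into one pf rule
# per fingerprint, most specific entries first.
pf_os_rules() {
  # Tabs -> spaces, so "AIX<TAB>4.3<TAB>" becomes "AIX 4.3 ".
  sed 's,[[:space:]], ,g' |
    # Drop the two header lines.
    grep -Ev '^(-----|Class)' |
    # Strip the trailing space and wrap the name in a pass/keep state rule.
    sed 's,\(.*\)[[:space:]],pass in quick proto tcp from any os "\1" to $ext_if keep state label "\1",' |
    # Reverse the order portably (instead of BSD `tail -r`), so a version
    # entry is evaluated before the catch-all class entry it belongs to.
    awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}
```

For example, /sbin/pfctl -qso | pf_os_rules >> /etc/pf.conf appends the rules, and pfctl -nf /etc/pf.conf then parses the file without loading it.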
Now that pf.conf has rules, they can be loaded with pfctl -f /etc/pf.conf, and we can start to see some of the information we need (to load pf.conf automatically at boot, consult the FreeBSD Handbook):
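- pfctl -vvsl shows the traffic and packet counts per rule label
- pfctl -vvsr lists the actual rules and their current state matches (including the rule number, which is one part of the formula)
- pfctl -vvss shows the states currently being tracked (the other part of the formula)
The “formula” is simply this: find the state for a given source IP, note the number of the rule that created it, and read the os string from that rule.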
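For loading the ruleset at boot, which the Handbook covers in more detail, FreeBSD 5.3 and later ship an rc.d script for pf that is driven by /etc/rc.conf. A minimal fragment, assuming the stock variable names, might look like this; pf_rules already defaults to /etc/pf.conf, so that line only makes the choice explicit.

```shell
# /etc/rc.conf fragment (FreeBSD): enable pf at boot and load the ruleset.
pf_enable="YES"           # start pf via /etc/rc.d/pf at boot
pf_rules="/etc/pf.conf"   # ruleset to load (same as the default)
```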
Phase Two – Fingerprint Fetching Script
Information is nice, but we still need access to it from inside the MTA. For that, we’ll create a shell script (called get_os.sh in my case; don’t forget to chmod +x it) that prints the OS or ‘Unknown’:
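#!/bin/sh
# Usage: get_os.sh <ip-address>
# 1. Find the state whose remote end is this IP and pull out the number of
#    the rule that created it (with -v, pfctl prints "rule N" under each state).
tempvar=`pfctl -qvss | grep -F -A 2 " $1:" | egrep -m 1 -o 'rule[[:space:]][[:digit:]]+' | sed 's,rule ,,'`
if [ -z "$tempvar" ]; then
  echo "Unknown"
  exit 0
fi
# 2. Print the os "..." string from that rule; the trailing space keeps
#    rule 5 from also matching @50, @51, ...
pfctl -qvvsr | egrep -m 1 "^@$tempvar " | egrep -m 1 -o '"[^"]*"' | uniq | sed 's,",,g'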
Go on, try it out. SSH back into the server from somewhere (so that PF creates a fresh state through one of the new fingerprint rules) and run the script with your source IP (replacing the x’s with your IP, of course):
/usr/local/etc/exim/get_os.sh xxx.xxx.xxx.xxx
One last thing: the permissions on /dev/pf need to be changed from 600 to 644, because this script will run as mailnull:mail, which has permission to do nearly nothing:
chmod 644 /dev/pf
(Note: Yes, this is an enormous security problem, since any local user can now read PF’s rules and state table, but we’ll address it later. Remember, it’s a proof of concept, not a space shuttle.)
Phase Three – Putting it in Exim
There are lots of different ways to put this into Exim, but to keep things simple, we’ll put it in the acl_smtp_data ACL and just add a header for later processing:
warn message = X-OS-Fingerprint: ${run {/usr/local/etc/exim/get_os.sh $sender_host_address}{$value}{Unknown}} ($sender_host_address)
(Note: Make sure that your get_os.sh script is somewhere the Exim server can reach it and that it has permission to execute it. This usually means putting it in /usr/local/etc/exim and running chown mailnull:mail on it.)
All that’s left is to make Exim reload its configuration with the new ACL line:
kill -HUP `cat /var/run/exim.pid`
Each message that comes through should now carry a header naming the OS that PF fingerprinted for the sending host.
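Before wiring the header into any scoring, it can be handy to see which fingerprints actually show up. A rough sketch, assuming the messages sit in mbox files (or individual message files) that you can grep, and using a hypothetical helper name, counts each X-OS-Fingerprint value while ignoring the trailing (ip) part:

```shell
#!/bin/sh
# Hypothetical helper: tally X-OS-Fingerprint header values across the
# message files given as arguments, most frequent first.
os_fingerprint_tally() {
  grep -h '^X-OS-Fingerprint:' "$@" |
    # Keep only the OS name: drop the header name and any "(ip)" suffix.
    sed -e 's/^X-OS-Fingerprint:[[:space:]]*//' \
        -e 's/[[:space:]]*([^)]*)[[:space:]]*$//' |
    sort | uniq -c | sort -rn
}
```

For example, os_fingerprint_tally /var/mail/someuser (an example path) prints one count per fingerprint.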
Results
To determine some results, we tracked the accepted/rejected status of each e-mail, along with its fingerprint, for about one week. With all versions of each OS combined into a single group (sorted by total message count), the results were somewhat interesting:
| Operating System | Accepted | Rejected | Rejection Rate |
|---|---|---|---|
| Unknown | 83052 | 300356 | 78.34% |
| AIX | 2716 | 100601 | 97.37% |
| OpenBSD | 4111 | 22719 | 84.68% |
| Windows | 2827 | 2823 | 49.96% |
| Linux | 2946 | 706 | 19.33% |
| FreeBSD | 801 | 2056 | 71.96% |
| AOL | 4 | 772 | 99.48% |
| NetApp | 622 | 109 | 14.91% |
| PocketPC | 84 | 565 | 87.06% |
| MacOS | 80 | 135 | 62.79% |
| ULTRIX | 89 | 28 | 23.93% |
| OpenVMS | 3 | 91 | 96.81% |
| AXIS | 0 | 85 | 100.00% |
| OS/400 | 1 | 43 | 97.73% |
| Alteon | 0 | 37 | 100.00% |
| Tru64 | 2 | 27 | 93.10% |
| NewtonOS | 19 | 2 | 9.52% |
| IRIX | 7 | 14 | 66.67% |
| Clavister | 7 | 1 | 12.50% |
| SCO | 0 | 8 | 100.00% |
| BeOS | 0 | 6 | 100.00% |
| Contiki | 1 | 3 | 75.00% |
| HP-UX | 0 | 3 | 100.00% |
| Dell | 0 | 3 | 100.00% |
| BSD/OS | 0 | 1 | 100.00% |
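The last column is simply the rejected share of each group’s mail, rejected / (accepted + rejected). A small sketch that recomputes it, assuming a hypothetical tab-separated export with one name, accepted, rejected line per group (the original data lived in an SQL database whose schema isn’t shown):

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>accepted<TAB>rejected" lines and print
# each name with its rejection rate, rejected / (accepted + rejected).
reject_rate() {
  awk -F '\t' '{
    total = $2 + $3
    if (total > 0)
      printf "%s\t%.2f%%\n", $1, 100 * $3 / total
  }'
}
```

Feeding it the AIX row (2716 accepted, 100601 rejected) gives 97.37%, matching the table.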
Why isn’t Windows at the top? Why is AIX at the top? Contiki is an OS? Doesn’t Alteon run switches? These are all fine questions. The answer to many of them is that passive OS fingerprinting is not as accurate as active OS fingerprinting: the same passive fingerprint can match several different operating systems. For instance, my PowerBook shows up as NetApp 5.2.1, even though it’s much too shiny to be one of those.
So what’s the point if it’s not accurate? Well, here is the fun part about statistics: it doesn’t actually matter how correct the information is, as long as it’s consistent. Based on the fingerprint alone, correct or not, the data shows that over 97% of e-mail from hosts passively identified as AIX was rejected, so such mail is very likely spam (assuming the rest of the anti-spam system is accurate and the sample is large enough).
And since I know you are all wondering, let’s run a breakdown for Windows:
| Operating System | Accepted | Rejected | Rejection Rate |
|---|---|---|---|
| Windows 2000 RFC1323 | 1134 | 584 | 33.99% |
| Windows 98 noSACK | 1684 | 2113 | 55.65% |
| Windows 2000 SP3 | 16 | 91 | 85.05% |
| Windows NT | 0 | 41 | 100.00% |
It’s much less exciting than it should be, I know.[1]
Conclusion
Is it massively useful? Is it utterly pointless? Who’d have thought SCO only sent spam? Since when does AOL send almost nothing but spam? Such questions are not for me, but for e-mail administrators and armchair statisticians. What I do know is that when it comes to classifying e-mail, the more tools the merrier.
Scalability
You might be saying to yourself, “This gets executed for every single e-mail that comes through? You don’t seriously expect me to bog down my already overloaded server with more slow shell scripts, do you? You said minimal resource usage in the abstract!” Well, that’s true, I did. So just how long does it take to run that thing (on a 2.8 GHz Xeon)?
server# time ./fingerprinter.sh 69.2.123.45
OpenBSD 3.4 opera
0.014u 0.033s 0:00.05 80.0% 178+263k 0+0io 0pf+0w
server#
At about 0.05 seconds of wall-clock time, that’s plenty fast for most servers compared to the usual gauntlet of scanning and classification, and the shell script still leaves room for optimization by the shell-programming savvy. Of course, it’s not great for enormous mail volumes, but that’s why it’s only a proof of concept. For heavier loads, run it as a daemon listening on a UNIX socket and cache the lookups for whatever interval makes you and your load average happiest; the fingerprint for a given IP is unlikely to change very often.
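As a rough illustration of the caching idea, here is a hypothetical wrapper (the function name, cache directory, TTL and LOOKUP path are all assumptions) that remembers each IP’s fingerprint in a small file for a fixed number of minutes. It is a file-based stand-in for the socket daemon described above, not a replacement for it, and the cache directory must be writable by mailnull.

```shell
#!/bin/sh
# Hypothetical caching wrapper around get_os.sh.
# CACHE_DIR, TTL_MIN and LOOKUP are assumed names with assumed defaults.
get_os_cached() {
  cache_dir=${CACHE_DIR:-/var/tmp/os_fp_cache}
  ttl_min=${TTL_MIN:-60}
  lookup=${LOOKUP:-/usr/local/etc/exim/get_os.sh}
  ip=$1

  # Only accept strings that look like an IPv4/IPv6 address, so the value
  # can't escape the cache directory (no "/", no leading ".").
  case "$ip" in
    ''|.*|*[!0-9A-Fa-f.:]*) echo "Unknown"; return 0 ;;
  esac

  entry="$cache_dir/$ip"
  # find -mmin -N prints the file only if it was modified < N minutes ago.
  if [ -n "$(find "$entry" -mmin -"$ttl_min" 2>/dev/null)" ]; then
    cat "$entry"
    return 0
  fi

  result=$("$lookup" "$ip")
  # Cache only real fingerprints, so a missing state doesn't pin "Unknown".
  if [ -n "$result" ] && [ "$result" != "Unknown" ] &&
     mkdir -p "$cache_dir" 2>/dev/null; then
    # Write to a temp file and rename, so readers never see a partial entry.
    printf '%s\n' "$result" > "$entry.$$" && mv "$entry.$$" "$entry"
  fi
  printf '%s\n' "${result:-Unknown}"
}

get_os_cached "$1"
```

The ${run} call in the Exim ACL would then point at this wrapper instead of get_os.sh.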
Security
We all know that changing the permissions on /dev/pf is a cardinal sin, so let’s address that issue. This particular problem can be tackled in a number of ways; the most common is to use sudo (http://www.courtesan.com/sudo/) to let the mail user run pfctl (or the script itself) with root privileges. If you don’t like sudo, you could use a client/server setup with either a UNIX socket or a TCP server. Details on both of these are fodder for another article, though.
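A minimal sketch of the sudo route, assuming sudo is installed and pfctl lives at /sbin/pfctl: the script calls pfctl through sudo, and a sudoers entry (shown in the comments, to be added with visudo) lets mailnull run exactly the two read-only queries the script needs, so /dev/pf can stay at mode 600. The PFCTL override is a hypothetical hook for testing; the state and rule parsing mirrors the original get_os.sh.

```shell
#!/bin/sh
# get_os.sh variant that reaches pfctl through sudo, so /dev/pf keeps mode 600.
# Assumed sudoers entry (add with visudo). Listing the arguments restricts
# mailnull to exactly these two read-only invocations:
#   mailnull ALL = (root) NOPASSWD: /sbin/pfctl -qvss, /sbin/pfctl -qvvsr
get_os() {
  # PFCTL is a hypothetical override; left unquoted below on purpose so
  # "sudo /sbin/pfctl" splits into a command and its argument.
  pfctl_cmd=${PFCTL:-sudo /sbin/pfctl}
  ip=$1
  [ -n "$ip" ] || { echo "Unknown"; return 0; }

  # Number of the rule that created the state whose remote end is this IP.
  rule=$($pfctl_cmd -qvss | grep -F -A 2 " $ip:" |
    grep -Eo 'rule[[:space:]][0-9]+' | head -n 1 | sed 's/rule //')
  [ -n "$rule" ] || { echo "Unknown"; return 0; }

  # The first quoted string on rule "@N " is its os "..." fingerprint.
  os=$($pfctl_cmd -qvvsr | grep -E -m 1 "^@$rule " |
    grep -Eo '"[^"]*"' | head -n 1 | tr -d '"')
  printf '%s\n' "${os:-Unknown}"
}

get_os "$1"
```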
Difficulty
Setting up the proof of concept isn’t immensely difficult, though it does require some basic shell scripting as well as some familiarity with Exim and FreeBSD. For that, I give it a 5/10 on the difficulty scale. The difficulty rises when you address the security concerns or pursue further optimizations.
For Crazy People
You could set up a DNSBL that uses occasional active OS fingerprinting and timeouts to distribute aggregate IP/fingerprint mappings to a wider audience for analysis. You could write a plugin for your favorite MTA that talks directly to the kernel to retrieve PF information and absolutely minimize latency. Or, ignoring the fingerprinting aspect altogether, you could use PF tables to dynamically limit the number of TCP states a spamming server can create, or block it from connecting entirely, thereby reducing the load on the MTA. And no, those are not recommendations (though that last one might make a decent article).
Enjoy!
[1] Okay, so the numbers don’t actually add up to those in the report above. This information was taken from a live SQL database at slightly different times.
Good article. You could add a header to the mail and set up proper scoring in SpamAssassin based on the OS fingerprints.
One could add a header and score points directly, or score using a Bayes-type system in DSPAM or SpamAssassin. We opted to leave that part of the exercise to the reader, since there are a number of different ways to use the information.