Improving Network Reliability on FreeBSD
If you didn’t notice it during the FreeBSD 7.0/6.3 releases, FreeBSD got a great new tool from OpenBSD: the lagg device.
This device lets you set up links with failover, or combine them using LACP, and operating it is dead simple. Here’s an example rc.conf that just does basic link failover:
cloned_interfaces="lagg0"
ifconfig_bge0="up"
ifconfig_bge1="up"
ifconfig_lagg0="laggproto failover laggport bge1 laggport bge0 192.168.1.5 netmask 255.255.255.0"
Or, if you use 802.1q trunks:
cloned_interfaces="lagg0 vlan0"
ifconfig_em0="up"
ifconfig_em1="up"
ifconfig_lagg0="laggproto failover laggport em0 laggport em1"
ifconfig_vlan0="vlan 22 vlandev lagg0 192.168.1.5 netmask 255.255.255.0"
The only downside is that you need to write a quick Nagios plugin to check for dead links, but fortunately that’s easy enough to do as well.
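A minimal sketch of what such a dead-link check could look like is shown here. It is not the plugin we actually run. The method names `dead_lagg_ports` and `nagios_result` are made up for illustration, and the sketch assumes that a healthy member port reports `status: active` in ifconfig output, as in the examples in this post; the `no carrier` status in the canned sample is likewise an assumption about what a dead link reports.

```ruby
# Return the lagg member ports whose link status is not "active".
# ifconfig_text is the output of a plain `ifconfig` run (all interfaces).
def dead_lagg_ports(ifconfig_text, lagg = 'lagg0')
  status  = {}   # interface name => status string
  members = []   # interfaces that report "lagg: laggdev <lagg>"
  current = nil
  ifconfig_text.each_line do |line|
    if line =~ /^(\S+): flags=/
      current = $1
    elsif current && line =~ /^\s*status:\s*(.+?)\s*$/
      status[current] = $1
    elsif current && line =~ /^\s*lagg: laggdev (\S+)/
      members << current if $1 == lagg
    end
  end
  members.reject { |port| status[port] == 'active' }
end

# Turn the list of dead ports into a Nagios message and return code.
def nagios_result(dead)
  if dead.empty?
    ['OK: all lagg ports active', 0]
  else
    ["Critical: dead lagg ports: #{dead.join(', ')}", 2]
  end
end

# Canned sample (assumed) with one healthy and one dead member port.
sample = <<EOF
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    status: active
    lagg: laggdev lagg0
em4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    status: no carrier
    lagg: laggdev lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    status: active
    laggproto failover
EOF

message, rc = nagios_result(dead_lagg_ports(sample))
puts message
```

In a real plugin you would feed it the output of ifconfig instead of the canned sample and finish with `exit rc`, so that Nagios sees the return code.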
Note: We don’t currently use LACP, because we’ve had issues with it losing connectivity altogether after alternating link failures.
Example ifconfig output from a successful lagg failover setup:
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
ether 00:15:17:73:61:f4
media: Ethernet 100baseTX <full-duplex>
status: active
lagg: laggdev lagg0
em4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
ether 00:15:17:73:61:f4
media: Ethernet 100baseTX <full-duplex>
status: active
lagg: laggdev lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
ether 00:15:17:73:61:f4
inet 192.168.1.52 netmask 0xffffff00 broadcast 192.168.1.255
media: Ethernet autoselect
status: active
laggproto failover
laggport: em4 flags=0<>
laggport: em0 flags=5<MASTER,ACTIVE>
Example ifconfig output from a successful vlan and lagg combination:
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
ether 00:11:0a:30:21:04
media: Ethernet autoselect (1000baseTX <full-duplex>)
status: active
lagg: laggdev lagg0
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
ether 00:11:0a:30:21:04
media: Ethernet autoselect (1000baseTX <full-duplex>)
status: active
lagg: laggdev lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
ether 00:11:0a:30:21:04
media: Ethernet autoselect
status: active
laggproto failover
laggport: bge0 flags=0<>
laggport: bge1 flags=5<MASTER,ACTIVE>
vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=3<RXCSUM,TXCSUM>
ether 00:11:0a:30:21:04
inet 192.168.1.145 netmask 0xffffff00 broadcast 192.168.1.255
media: Ethernet autoselect
status: active
vlan: 22 parent interface: lagg0
Monitoring iLO2 with Nagios
We have a whole bunch of HP servers, purchased in large part for their excellent Lights-Out management software. We wanted to have Nagios monitor some basic things, like the status of the fans, internal temperatures, power supplies, and VRMs.
Fortunately, that turned out to be pretty easy.
The first step was to download the HP Lights-Out XML Perl Scripting Sample for Linux. I did this even though I don’t use Linux as a platform. The resulting file contains a bunch of sample XML scripts for accomplishing various goals, and a Perl script (locfg.pl) that submits them to an iLO2 processor.
I then used locfg.pl to submit the following XML to each of our servers, in order to create a user with no substantial privileges.
<RIBCL VERSION="2.0">
  <LOGIN USER_LOGIN="adminuser" PASSWORD="adminpass">
    <USER_INFO MODE="write">
      <ADD_USER
        USER_NAME="Nagios Monitor"
        USER_LOGIN="nagiosuser"
        PASSWORD="nagiospass">
        <ADMIN_PRIV value="N"/>
        <REMOTE_CONS_PRIV value="N"/>
        <RESET_SERVER_PRIV value="N"/>
        <VIRTUAL_MEDIA_PRIV value="N"/>
        <CONFIG_ILO_PRIV value="N"/>
      </ADD_USER>
    </USER_INFO>
  </LOGIN>
</RIBCL>
This script allowed me to quickly create unprivileged users on each of the iLO2 consoles.
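Submitting the same file to a list of servers is easy to script. The sketch below only builds and prints the commands. The host names and the file name `add_nagios_user.xml` are hypothetical, and the `-s` (server) and `-f` (input file) options are how I understand locfg.pl to be invoked, so check them against the usage text of your copy before running anything.

```ruby
# Build the argument list for one locfg.pl run against one iLO2.
# Assumption: locfg.pl takes "-s <server>" and "-f <xml file>".
def locfg_command(host, xml_file, locfg = 'locfg.pl')
  ['perl', locfg, '-s', host, '-f', xml_file]
end

# Hypothetical iLO2 host names; replace with your own.
hosts = %w[ilo-web1.example.com ilo-db1.example.com]

hosts.each do |host|
  cmd = locfg_command(host, 'add_nagios_user.xml')
  puts cmd.join(' ')
  # To actually run it: system(*cmd) or warn "locfg.pl failed for #{host}"
end
```

Passing the command as an array to `system` avoids any shell quoting problems with host names or paths.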
Armed with an unprivileged user, I set about writing the actual plugin, and ended up with this:
#!/usr/bin/env ruby
require 'optparse'
require 'socket'
require 'openssl'
require 'rexml/document'

# Command Line Options
options = {
  :server => nil
}
opts = OptionParser.new do |opt|
  opt.banner = "Usage: #{$0} [options]"
  opt.on('-s', '--server HOSTNAME', String, "Hostname or IP of the server to query") { |i| options[:server] = i }
end
opts.parse!(ARGV)
if not options[:server]
  $stderr.puts "Server must be specified"
  exit 3 # Nagios UNKNOWN
end

# iLO XML
xml_start = <<EOF
<RIBCL VERSION="2.22">
<LOGIN USER_LOGIN="nagiosuser" PASSWORD="nagiospass">
EOF
xml_end = <<EOF
</LOGIN>
</RIBCL>
EOF
xml_emhealth = <<EOF
<SERVER_INFO MODE="read">
<GET_EMBEDDED_HEALTH />
</SERVER_INFO>
EOF

error_cnt = 0
error_msg = ''
error_summary = ''

# Send the request to the iLO2 over SSL and read the whole reply
s = TCPSocket.open(options[:server], 443)
ssl = OpenSSL::SSL::SSLSocket.new(s, OpenSSL::SSL::SSLContext.new)
ssl.sync = true
ssl.connect
ssl.write("<?xml version=\"1.0\"?>\r\n")
ssl.write(xml_start)
ssl.write(xml_emhealth)
ssl.write(xml_end)
ssl.flush
res = ssl.readlines
ssl.close
s.close

# Pull the embedded health data out of the reply
doc = REXML::Document.new(res.join.match(/<GET_EMBEDDED_HEALTH_DATA>.*<\/GET_EMBEDDED_HEALTH_DATA>/m).to_s)
if ! doc.elements["GET_EMBEDDED_HEALTH_DATA"]
  # Without health data there is nothing to check; report it and stop
  puts "Critical: Unable to fetch embedded health data"
  exit 2
end

# Fans
doc.root.elements["FANS"].each_element('//FAN') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['ZONE'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']}\n"
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}."
  end
}

# Temperature sensors (sensors that are not present report 'n/a')
doc.root.elements["TEMPERATURE"].each_element('//TEMP') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok' and mod.elements["STATUS"].attributes['VALUE'] != 'n/a'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['LOCATION'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']} - #{mod.elements['CURRENTREADING'].attributes['VALUE']} #{mod.elements['CURRENTREADING'].attributes['UNIT']} (Caution/Critical: #{mod.elements['CAUTION'].attributes['VALUE']}/#{mod.elements['CRITICAL'].attributes['VALUE']})\n"
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}."
  end
}

# Voltage regulator modules
doc.root.elements["VRM"].each_element('//MODULE') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']}\n"
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}."
  end
}

# Power supplies
doc.root.elements["POWER_SUPPLIES"].each_element('//SUPPLY') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']}\n"
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}."
  end
}

# Report in Nagios format: one summary line, details below, status as exit code
if error_cnt == 0
  puts "OK: 0 problems"
  rc = 0
else
  puts "Critical: #{error_cnt} problems. #{error_summary}"
  puts error_msg
  rc = 2
end
exit rc
With that in place, I simply added all of the iLO2 hosts to a hostgroup and added a service that checks the group using the script. All my fans, power supplies and such are now monitored without any operating-system-level overhead.
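If you want to try the status checks without a live iLO2, the same logic can be run against canned XML. The following is a rough sketch under assumptions: the helper `failed_items` is hypothetical, and the sample document is hand-written, containing only the element names the plugin above reads (it is not a captured iLO2 reply).

```ruby
require 'rexml/document'

# Return "LABEL - STATUS" strings for every <item> under <section>
# whose STATUS value is not in ok_values.
def failed_items(doc, section, item, ok_values = ['Ok'])
  failures = []
  parent = doc.root.elements[section]
  return failures unless parent   # section not present in this document
  parent.each_element(item) do |mod|
    status = mod.elements['STATUS'].attributes['VALUE']
    next if ok_values.include?(status)
    failures << "#{mod.elements['LABEL'].attributes['VALUE']} - #{status}"
  end
  failures
end

# Hand-written sample data (assumed shape, see above).
sample = <<EOF
<GET_EMBEDDED_HEALTH_DATA>
  <FANS>
    <FAN><LABEL VALUE="Fan 1"/><ZONE VALUE="System"/><STATUS VALUE="Ok"/></FAN>
    <FAN><LABEL VALUE="Fan 2"/><ZONE VALUE="System"/><STATUS VALUE="Failed"/></FAN>
  </FANS>
  <TEMPERATURE>
    <TEMP><LABEL VALUE="Temp 1"/><STATUS VALUE="n/a"/></TEMP>
  </TEMPERATURE>
</GET_EMBEDDED_HEALTH_DATA>
EOF

doc = REXML::Document.new(sample)
problems = failed_items(doc, 'FANS', 'FAN') +
           failed_items(doc, 'TEMPERATURE', 'TEMP', ['Ok', 'n/a'])
puts problems
```

Keeping the check in a method that takes a parsed document makes it possible to test the plugin logic from a saved reply, separately from the SSL conversation.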
PostgreSQL scaling on 6.2 and 7.0
My previous posts have documented the utterly insane lack of scalability of PostgreSQL (at least according to that particular metric) on FreeBSD 6.2.
I ran the same test, on the same machine, using 7.0 with the 4BSD and the ULE schedulers. Postgres was tuned as in the previous posts, and WITNESS, INVARIANTS and all malloc debugging options were off.
Results:

Hopefully nobody will extrapolate this to mean more than it really does (it’s just one test, but it’s one that happens to matter to me.)
More is better?
Not always. I did some more tests, and confirmed that this particular workload actually gets slower as you add more than 2 cores.
sysbench --num-threads=${i} --test=oltp --pgsql-user=bench --pgsql-db=bench --db-driver=pgsql --max-time=60 --max-requests=0 --oltp-read-only=on run
Here are the results, with the tests run on absolutely identical hardware. (HP DL360, 2.00 GHz E5335 Xeon, 10K RPM SAS drives, 10 GB of RAM)

I know 6.2 isn’t the greatest for SMP, but I was still surprised to learn that we could get significantly lower overall performance by adding more processors.
I tried to test 7.0, but the 7.0 installer didn’t seem to recognize the iLO virtual keyboard tonight. Maybe tomorrow will go better.
What did I do wrong?
I’m hoping that somebody will look at this and say “hey, dummy, you did this all wrong.” I’m hoping that, because the results are shocking.
There has been a buzz about FreeBSD’s SMP performance compared to Linux’s, in particular around one 8-core SMP test using FreeBSD 7.0, sysbench, and MySQL. A number of people mentioned that there were known problems with this test and PostgreSQL, so I decided to see how bad the problems were.
The results are so shocking that my first thought was that I must have done something wrong.
The hardware:
Machine 1:
HP DL360 G5
1 5140 (dual core) 2.33 GHz (1333 FSB) Xeon
4 GB RAM
2 15K RPM SAS hard drives, running as mirrors through a P400i drive array
Machine 2:
HP DL360 G5
2 E5335 (quad core) 2.00 GHz (1333 FSB) Xeons
10 GB RAM
2 10K RPM SAS hard drives, running as mirrors through a P400i drive array
Operating System: Both machines are running FreeBSD 6.2-p3. After all, I wanted to see what performance levels to expect with production-quality software, not the 7.0 sweetness.
Kernel: Both are running identically configured SMP-enabled amd64 kernels.
sysctl.conf: both have the following
kern.ipc.shmmax=2147483647
kern.ipc.shmall=524288
kern.ipc.semmsl=512
kern.ipc.semmap=256
kern.ipc.somaxconn=2048
kern.maxfiles=65536
vfs.read_max=32
loader.conf: both have the following
kern.ipc.semmni=256
kern.ipc.semmns=2048
postgresql.conf: both have the following settings changed
shared_buffers = 1GB
work_mem = 64MB
maintenance_work_mem = 32MB
max_fsm_pages = 204800
random_page_cost = 3.0
effective_cache_size = 512MB
update_process_title = off
Both drive arrays are partitioned identically, and postgres was installed as the only running application in each case.
The test was run as follows:
sysbench --num-threads=${i} --test=oltp --pgsql-user=bench --pgsql-db=bench --db-driver=pgsql --max-time=120 --max-requests=0 --oltp-read-only=on run
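The `${i}` in that command is the thread count, which was varied from run to run. As a rough sketch, the series of commands could be generated like this; the method name `sysbench_cmd` is made up, and the list of thread counts is an assumption, since the exact values used are not recorded here.

```ruby
# Build the sysbench command line for one thread count, using the
# same options as the command shown above.
def sysbench_cmd(threads, max_time = 120)
  ['sysbench',
   "--num-threads=#{threads}",
   '--test=oltp',
   '--pgsql-user=bench',
   '--pgsql-db=bench',
   '--db-driver=pgsql',
   "--max-time=#{max_time}",
   '--max-requests=0',
   '--oltp-read-only=on',
   'run']
end

# Assumed thread counts; adjust to taste.
[1, 2, 4, 8, 16].each do |threads|
  puts sysbench_cmd(threads).join(' ')
end
```

Printing the commands first makes it easy to confirm that both machines get exactly the same invocations before any numbers are compared.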
And yet here are the results:

Yes, the 8-core, 10GB machine is massively slower than the 2-core 4GB machine.
This can’t be right, can it?
Followup: I ran this single-cpu test:
openssl speed rsa
1×2-Core 2.33GHz 5140 Xeon
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0003s   0.0000s   3905.9  45028.8
rsa 1024 bits   0.0009s   0.0001s   1164.6  18787.2
rsa 2048 bits   0.0049s   0.0002s    205.2   6332.2
rsa 4096 bits   0.0320s   0.0005s     31.2   1917.9
2×4-Core 2.00 GHz E5335 Xeon
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0003s   0.0000s   3381.1  38972.0
rsa 1024 bits   0.0010s   0.0001s   1003.5  15991.7
rsa 2048 bits   0.0057s   0.0002s    176.7   5481.7
rsa 4096 bits   0.0373s   0.0006s     26.8   1638.6
I wanted to see if this was simply one algorithm, or if other algorithms showed the same sort of slowdown, so I ran the following:
openssl speed md5 sha1 aes-256-cbc
The 2-core machine:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5             14172.75k    45816.42k   114118.73k   182293.03k   220175.36k
sha1            14901.45k    42795.69k    97387.27k   143547.69k   166264.67k
aes-256 cbc     91518.54k    94159.65k    93817.92k    92696.25k    93907.24k
The 8-core machine:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5             12305.58k    39149.96k    97745.88k   155771.63k   188252.02k
sha1            12740.06k    36648.90k    83318.76k   122733.05k   142119.67k
aes-256 cbc     78109.56k    80589.95k    81222.08k    81406.47k    81401.94k
The 8-core is a little slower than the 2-core again, but not by the enormous levels we see on the pgsql sysbench.
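To put a number on "a little slower", here is a small sketch that divides the 8-core figures by the 2-core figures. The values are copied from the output above (RSA signs per second and the 8192-byte column of the hash and cipher test); the method name `relative_speed` is just for illustration.

```ruby
# Single-threaded openssl speed results, copied from the runs above.
two_core = {
  'rsa 512'  => 3905.9,    'rsa 1024' => 1164.6,
  'rsa 2048' => 205.2,     'rsa 4096' => 31.2,
  'md5'      => 220175.36, 'sha1'     => 166264.67,
  'aes-256'  => 93907.24
}
eight_core = {
  'rsa 512'  => 3381.1,    'rsa 1024' => 1003.5,
  'rsa 2048' => 176.7,     'rsa 4096' => 26.8,
  'md5'      => 188252.02, 'sha1'     => 142119.67,
  'aes-256'  => 81401.94
}

# Ratio of the other machine's throughput to the baseline's, per test.
def relative_speed(baseline, other)
  baseline.map { |name, value| [name, other[name] / value] }.to_h
end

relative_speed(two_core, eight_core).each do |name, ratio|
  puts format('%-9s %.2f', name, ratio)
end
```

Every ratio comes out between 0.85 and 0.87, so a single core on the 8-core machine does roughly 14% less work per second. For what it’s worth, that is close to 2.00/2.33, the clock ratio if the E5335 runs at the 2.00 GHz listed in the "More is better?" post.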
When running RSA across multiple threads, the results are closer to what one would expect. The command I used to do this:
openssl speed rsa -multi N

And now I’m left with the question: Did I screw something up, or is this what Kris Kennaway was talking about when he wrote this message?
I must admit that, while I expected the performance to fall off, it never occurred to me that an 8-core system could be slower than a 2-core system right out of the gate.
Unscientific 15K v 10K SAS Drive Comparison
We’ve been buying a lot of new machines today, and almost all of them come with SAS drives. We bought 10K SAS drives for applications where disk performance obviously doesn’t matter, and 15K SAS drives for performance-critical applications, but what exactly did we get for an extra $200/drive?
I decided to do some wildly unscientific tests, using two HP DL360s, one with mirrored 10K RPM drives, the other with mirrored 15K RPM drives, both running FreeBSD 6.2/AMD64. Both machines are equipped with P400i drive controllers.
10K SAS Drives
# diskinfo -t /dev/da0
/dev/da0
512 # sectorsize
73372631040 # mediasize in bytes (68G)
143305920 # mediasize in sectors
17562 # Cylinders according to firmware.
255 # Heads according to firmware.
32 # Sectors according to firmware.
Seek times:
Full stroke: 250 iter in 1.016654 sec = 4.067 msec
Half stroke: 250 iter in 0.955639 sec = 3.823 msec
Quarter stroke: 500 iter in 1.960625 sec = 3.921 msec
Short forward: 400 iter in 1.444117 sec = 3.610 msec
Short backward: 400 iter in 1.457452 sec = 3.644 msec
Seq outer: 2048 iter in 2.067837 sec = 1.010 msec
Seq inner: 2048 iter in 2.069040 sec = 1.010 msec
Transfer rates:
outside: 102400 kbytes in 2.078970 sec = 49255 kbytes/sec
middle: 102400 kbytes in 2.598712 sec = 39404 kbytes/sec
inside: 102400 kbytes in 3.115841 sec = 32864 kbytes/sec
15K SAS Drives
# diskinfo -t /dev/da0
/dev/da0
512 # sectorsize
73372631040 # mediasize in bytes (68G)
143305920 # mediasize in sectors
17562 # Cylinders according to firmware.
255 # Heads according to firmware.
32 # Sectors according to firmware.
Seek times:
Full stroke: 250 iter in 0.723955 sec = 2.896 msec
Half stroke: 250 iter in 0.756001 sec = 3.024 msec
Quarter stroke: 500 iter in 1.255116 sec = 2.510 msec
Short forward: 400 iter in 0.680525 sec = 1.701 msec
Short backward: 400 iter in 0.878597 sec = 2.196 msec
Seq outer: 2048 iter in 2.055020 sec = 1.003 msec
Seq inner: 2048 iter in 2.062749 sec = 1.007 msec
Transfer rates:
outside: 102400 kbytes in 1.331560 sec = 76902 kbytes/sec
middle: 102400 kbytes in 1.848040 sec = 55410 kbytes/sec
inside: 102400 kbytes in 1.652754 sec = 61957 kbytes/sec
So far, so good. As expected, the results are faster for all the tasks that require seeking, and sequential transfer rates are considerably higher as well. I think I like these tiny lil drives.
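For anyone who wants the comparison as numbers rather than by eyeballing two listings, here is a rough sketch that parses the seek-time and transfer-rate lines of `diskinfo -t` output and divides one run by the other. The method names are made up, the regular expressions are written against the output format shown above, and only a few lines from each run are included in the demo.

```ruby
# Extract seek times (msec) and transfer rates (kbytes/sec) from
# `diskinfo -t` output, keyed by the label at the start of each line.
def parse_diskinfo(text)
  results = {}
  text.each_line do |line|
    case line
    when /^\s*(.+?):\s+\d+\s+iter\s+in\s+[\d.]+\s+sec\s+=\s+([\d.]+)\s+msec/
      results[$1] = $2.to_f
    when /^\s*(outside|middle|inside):\s+\d+\s+kbytes\s+in\s+[\d.]+\s+sec\s+=\s+(\d+)\s+kbytes\/sec/
      results[$1] = $2.to_f
    end
  end
  results
end

# Ratio of the second run to the first for every label both runs share.
def compare(first, second)
  first.keys.select { |k| second.key?(k) }.map { |k| [k, second[k] / first[k]] }.to_h
end

ten_k = parse_diskinfo(<<EOF)
Full stroke: 250 iter in 1.016654 sec = 4.067 msec
Short forward: 400 iter in 1.444117 sec = 3.610 msec
outside: 102400 kbytes in 2.078970 sec = 49255 kbytes/sec
EOF

fifteen_k = parse_diskinfo(<<EOF)
Full stroke: 250 iter in 0.723955 sec = 2.896 msec
Short forward: 400 iter in 0.680525 sec = 1.701 msec
outside: 102400 kbytes in 1.331560 sec = 76902 kbytes/sec
EOF

compare(ten_k, fifteen_k).each do |label, ratio|
  puts format('%-14s %.2f', label, ratio)
end
```

Keep in mind that the ratios point in opposite directions: for seek times a ratio below 1 means the 15K drive is faster, while for transfer rates a ratio above 1 does.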
Now let’s try some bonnie, with some big files. Let’s try bonnie++ -d /usr/home -u root -s 16g -n 256:65536:65536:16
10K SAS mirror
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
anemone.insides 16G 379 99 49303 13 14526 9 621 98 30751 7 423.1 94
Latency 22409us 394ms 1414ms 37000us 142ms 121ms
Version 1.93c ------Sequential Create------ --------Random Create--------
anemone.insidesyste -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
256:65536:65536/16 540 14 661 47 32500 63 588 15 117 8 8358 64
Latency 3045ms 175ms 506ms 2631ms 1841ms 946ms
1.93c,1.93c,anemone.insidesystems.net,1,1176110507,16G,,379,99,49303,13,14526,9,621,98,30751,7,423.1,94,256,65536,65536,,16,540,14,661,47,32500,63,588,15,117,8,8358,64,22409us,394ms,1414ms,37000us,142ms,121ms,3045ms,175ms,506ms,2631ms,1841ms,946ms
15k SAS mirror
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
noname.insidesy 16G 444 99 77857 19 22512 6 613 94 85113 15 490.1 15
Latency 20163us 2910ms 296ms 274ms 114ms 51627us
Version 1.93c ------Sequential Create------ --------Random Create--------
noname.insidesystem -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
256:65536:65536/16 1013 19 845 13 44078 86 959 18 103 3 13215 94
Latency 2763ms 800ms 57543us 2798ms 1805ms 259ms
1.93c,1.93c,noname.insidesystems.net,1,1176153578,16G,,444,99,77857,19,22512,6,613,94,85113,15,490.1,15,256,65536,65536,,16,1013,19,845,13,44078,86,959,18,103,3,13215,94,20163us,2910ms,296ms,274ms,114ms,51627us,2763ms,800ms,57543us,2798ms,1805ms,259ms
And with that, I was reasonably satisfied that we weren’t wasting money when we bought the 15K SAS drives.