Improving Network Reliability on FreeBSD

Posted by Kevin Way Fri, 27 Jun 2008 20:05:00 GMT

In case you missed it in the FreeBSD 7.0/6.3 releases, FreeBSD got a great new tool from OpenBSD: the lagg device.

This device lets you set up links with failover, or combine them using LACP, and the operation is dead simple. Here’s an example rc.conf that does a basic link failover:


cloned_interfaces="lagg0" 
ifconfig_bge0="up" 
ifconfig_bge1="up" 
ifconfig_lagg0="laggproto failover laggport bge1 laggport bge0 192.168.1.5 netmask 255.255.255.0" 

Or, if you use 802.1q trunks:

cloned_interfaces="lagg0 vlan0" 
ifconfig_em0="up" 
ifconfig_em1="up" 
ifconfig_lagg0="laggproto failover laggport em0 laggport em1" 
ifconfig_vlan0="vlan 22 vlandev lagg0 192.168.1.5 netmask 255.255.255.0" 

The only downside is that you need to write a quick Nagios plugin to check for dead links, but fortunately that’s easy enough to do as well.
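For the curious, here is a minimal sketch of what such a check could look like. It is not the plugin we run. It assumes that a member port with a dead link shows a "status:" line other than "active" in ifconfig output (typically "no carrier"), and the function names (interface_blocks, dead_lagg_ports, lagg_check) are made up for illustration.

```ruby
# Sketch of a Nagios-style check for dead lagg member links.
# Assumption: a member with a dead link reports a "status:" line other
# than "active" (for example "no carrier") in ifconfig output.

# Split ifconfig output into { "em0" => [lines...], ... }.
def interface_blocks(ifconfig_text)
  blocks = {}
  current = nil
  ifconfig_text.each_line do |line|
    # A new interface starts at column 0, e.g. "em0: flags=8843<...>".
    if line =~ /\A(\S+): flags=/
      current = $1
      blocks[current] = []
    end
    blocks[current] << line if current
  end
  blocks
end

# Return the member ports of `lagg` whose own status is not "active",
# or nil if the lagg interface itself is missing.
def dead_lagg_ports(ifconfig_text, lagg = 'lagg0')
  blocks = interface_blocks(ifconfig_text)
  lagg_lines = blocks[lagg]
  return nil if lagg_lines.nil?

  ports = lagg_lines.map { |l| l[/\A\s+laggport: (\S+)/, 1] }.compact
  ports.reject do |port|
    blocks.fetch(port, []).any? { |l| l =~ /\A\s+status: active\s*\z/ }
  end
end

# Map the result onto Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).
def lagg_check(ifconfig_text, lagg = 'lagg0')
  dead = dead_lagg_ports(ifconfig_text, lagg)
  return [3, "UNKNOWN: #{lagg} not found"] if dead.nil?
  return [0, "OK: all #{lagg} ports active"] if dead.empty?

  [2, "CRITICAL: #{lagg} dead ports: #{dead.join(', ')}"]
end
```

In a real plugin you would feed lagg_check the output of ifconfig, print the message, and exit with the returned code. The check looks at each member's own status line rather than the laggport flags, because in failover mode only one port carries the ACTIVE flag even when both links are healthy.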

Note: We don’t currently use LACP, because we’ve had some issues with it losing connectivity altogether, after alternating link failures.

Example ifconfig output from a successful lagg setup:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
    ether 00:15:17:73:61:f4
    media: Ethernet 100baseTX <full-duplex>
    status: active
    lagg: laggdev lagg0
em4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
    ether 00:15:17:73:61:f4
    media: Ethernet 100baseTX <full-duplex>
    status: active
    lagg: laggdev lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
    ether 00:15:17:73:61:f4
    inet 192.168.1.52 netmask 0xffffff00 broadcast 192.168.1.255
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: em4 flags=0<>
    laggport: em0 flags=5<MASTER,ACTIVE>


Example ifconfig output from a successful vlan and lagg combination:


bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:11:0a:30:21:04
    media: Ethernet autoselect (1000baseTX <full-duplex>)
    status: active
    lagg: laggdev lagg0
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:11:0a:30:21:04
    media: Ethernet autoselect (1000baseTX <full-duplex>)
    status: active
    lagg: laggdev lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:11:0a:30:21:04
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge0 flags=0<>
    laggport: bge1 flags=5<MASTER,ACTIVE>
vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=3<RXCSUM,TXCSUM>
    ether 00:11:0a:30:21:04
    inet 192.168.1.145 netmask 0xffffff00 broadcast 192.168.1.255
    media: Ethernet autoselect
    status: active
    vlan: 22 parent interface: lagg0

Monitoring iLO2 with Nagios

Posted by Kevin Way Wed, 05 Sep 2007 12:21:00 GMT

We have a whole bunch of HP servers, purchased in large part for their excellent Lights-Out management software. We wanted to have Nagios monitor some basic things, like the status of the fans, internal temperatures, power supplies, and VRMs.

Fortunately, that turned out to be pretty easy.

The first step was to download the HP Lights-Out XML Perl Scripting Sample for Linux. I did this even though I don’t use Linux as a platform. The resulting file contains a bunch of sample XML scripts for accomplishing various goals, and a perl script (locfg.pl) that submits them to an iLO2 processor.

I then used locfg.pl to submit the following XML to each of our servers, in order to create a user with no substantial privileges.


<RIBCL VERSION="2.0">
  <LOGIN USER_LOGIN="adminuser" PASSWORD="adminpass">
  <USER_INFO MODE="write">
    <ADD_USER 
      USER_NAME="Nagios Monitor" 
      USER_LOGIN="nagiosuser" 
      PASSWORD="nagiospass">
      <ADMIN_PRIV value ="N"/>
      <REMOTE_CONS_PRIV value ="N"/>
      <RESET_SERVER_PRIV value ="N"/>
      <VIRTUAL_MEDIA_PRIV value ="N"/>
      <CONFIG_ILO_PRIV value="N"/>
    </ADD_USER>
  </USER_INFO>
  </LOGIN>
</RIBCL>

This script allowed me to quickly create unprivileged users on each of the iLO2 consoles.

Armed with an unprivileged user, I set about writing the actual plugin, and ended up with this:


#!/usr/bin/env ruby
require 'optparse'
require 'socket'
require 'openssl'
require 'rexml/document'

# Command Line Options
options = {
  :server => nil
}

opts = OptionParser.new do |opt|
  opt.banner = "Usage: #{$0}  [options]" 
  opt.on('-s', '--server HOSTNAME', String, "Hostname or IP of the server to query") { |i| options[:server] = i }
end
opts.parse!(ARGV)

unless options[:server]
  $stderr.puts "Server must be specified" 
  exit 3 # Nagios UNKNOWN
end

# iLO XML
xml_start = <<EOF
<RIBCL VERSION="2.22">
  <LOGIN USER_LOGIN="nagiosuser" PASSWORD="nagiospass">
EOF

xml_end = <<EOF
  </LOGIN>
</RIBCL>
EOF

xml_emhealth = <<EOF
<SERVER_INFO MODE="read">
  <GET_EMBEDDED_HEALTH />
</SERVER_INFO>
EOF

error_cnt = 0
error_msg = ''
error_summary = ''

s = TCPSocket.open(options[:server], 443)
ssl = OpenSSL::SSL::SSLSocket.new(s, OpenSSL::SSL::SSLContext.new)
ssl.sync = true
ssl.connect
ssl.write("<?xml version=\"1.0\"?>\r\n")
ssl.write(xml_start)
ssl.write(xml_emhealth)
ssl.write(xml_end)
ssl.flush
# Join the response lines into one string before matching
res = ssl.readlines.join

ssl.close
s.close

doc = REXML::Document.new(res.match(/<GET_EMBEDDED_HEALTH_DATA>.*<\/GET_EMBEDDED_HEALTH_DATA>/m).to_s)

# Without health data doc.root is nil, so report the failure and stop here
if ! doc.elements["GET_EMBEDDED_HEALTH_DATA"]
  puts "Critical: Unable to fetch embedded health data" 
  exit 2
end

doc.root.elements["FANS"].each_element('//FAN') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['ZONE'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']}\n" 
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}." 
  end
}

doc.root.elements["TEMPERATURE"].each_element('//TEMP') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok' and mod.elements["STATUS"].attributes['VALUE'] != 'n/a'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['LOCATION'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']} - #{mod.elements['CURRENTREADING'].attributes['VALUE']} #{mod.elements['CURRENTREADING'].attributes['UNIT']} (Caution/Critical: #{mod.elements['CAUTION'].attributes['VALUE']}/#{mod.elements['CRITICAL'].attributes['VALUE']})\n" 
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}." 
  end
}

doc.root.elements["VRM"].each_element('//MODULE') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']}\n" 
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}." 
  end
}

doc.root.elements["POWER_SUPPLIES"].each_element('//SUPPLY') { |mod|
  if mod.elements["STATUS"].attributes['VALUE'] != 'Ok'
    error_cnt += 1
    error_msg += "#{mod.elements['LABEL'].attributes['VALUE']} - #{mod.elements['STATUS'].attributes['VALUE']}\n" 
    error_summary += "#{mod.elements['LABEL'].attributes['VALUE']}." 
  end
}

if error_cnt == 0
  puts "OK: 0 problems" 
  rc=0
else
  puts "Critical: #{error_cnt} problems. #{error_summary}" 
  puts error_msg
  rc=2
end

exit rc

With that in place, I simply added all of the iLO2 hosts to a hostgroup and added a service that checks that group using the script. All my fans, power supplies and such are now monitored without any operating-system-level overhead.

PostgreSQL scaling on 6.2 and 7.0

Posted by Kevin Way Wed, 11 Apr 2007 22:40:00 GMT

My previous posts have documented the utterly insane lack of scalability of PostgreSQL (at least according to that particular metric) on FreeBSD 6.2.

I ran the same test, on the same machine, using 7.0 with the 4BSD and the ULE schedulers. Postgres was tuned as in the previous posts, and WITNESS, INVARIANTS and all malloc debugging options were off.

Results:

Hopefully nobody will extrapolate this to mean more than it really does (it’s just one test, but it’s one that happens to matter to me).

More is better?

Posted by Kevin Way Wed, 11 Apr 2007 07:55:00 GMT

Not always. I did some more tests, and confirmed that this particular workload actually gets slower as you add more than 2 cores.

sysbench --num-threads=${i} --test=oltp --pgsql-user=bench --pgsql-db=bench --db-driver=pgsql --max-time=60 --max-requests=0 --oltp-read-only=on run
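The ${i} in that command is the thread count, swept across runs. A rough sketch of such a sweep follows. The thread counts listed are hypothetical, since the ones actually used aren't given here, and sysbench_command is just an illustrative helper name. It prints the commands rather than running them.

```ruby
# Build the sysbench command line shown above for a given thread count.
def sysbench_command(threads, max_time = 60)
  [
    'sysbench',
    "--num-threads=#{threads}",
    '--test=oltp',
    '--pgsql-user=bench',
    '--pgsql-db=bench',
    '--db-driver=pgsql',
    "--max-time=#{max_time}",
    '--max-requests=0',
    '--oltp-read-only=on',
    'run'
  ]
end

# Hypothetical thread counts; the post does not list the ones actually used.
THREAD_COUNTS = [1, 2, 4, 8, 16]

# Print each command instead of executing it, so the sketch is safe to run anywhere.
THREAD_COUNTS.each do |threads|
  puts sysbench_command(threads).join(' ')
end
```

To actually execute a run, replace the puts line with system(*sysbench_command(threads)) on a machine that has sysbench and the bench database set up.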

Here are the results, with the tests run on absolutely identical hardware (HP DL360, 2.00 GHz E5335 Xeon, 10K RPM SAS drives, 10 GB of RAM).

I know 6.2 isn’t the greatest for SMP, but I was still surprised to learn that we could get significantly lower overall performance by adding more processors.

I tried to test 7.0, but the 7.0 installer didn’t seem to recognize the iLO virtual keyboard tonight. Maybe tomorrow will go better.

What did I do wrong?

Posted by Kevin Way Tue, 10 Apr 2007 01:09:00 GMT

I’m hoping that somebody will look at this and say “hey, dummy, you did this all wrong.” I’m hoping that, because the results are shocking.

There has been a buzz about FreeBSD’s SMP performance compared to Linux’s, centered on one particular 8-core SMP test using FreeBSD 7.0, sysbench, and MySQL. A number of people mentioned that there were known problems with this test and PostgreSQL, so I decided to see how bad the problems were.

The results are so shocking that my first thought was that I must have done something wrong.

The hardware:

Machine 1:
HP DL360 G5
1 5140 (dual core) 2.33 GHz (1333 FSB) Xeon
4 GB RAM
2 15K RPM SAS hard drives, running as mirrors through an e400i drive array

Machine 2:
HP DL360 G5
2 E5335 (quad core) 2.00 GHz (1333 FSB) Xeons
10 GB RAM
2 10K RPM SAS hard drives, running as mirrors through an e400i drive array

Operating System: Both machines are running FreeBSD 6.2-p3. After all, I wanted to see what performance levels to expect with production-quality software, not the 7.0 sweetness.

Kernel: Both are running identically configured SMP-enabled amd64 kernels.

sysctl.conf: both have the following
kern.ipc.shmmax=2147483647
kern.ipc.shmall=524288
kern.ipc.semmsl=512
kern.ipc.semmap=256
kern.ipc.somaxconn=2048
kern.maxfiles=65536
vfs.read_max=32

loader.conf: both have the following
kern.ipc.semmni=256
kern.ipc.semmns=2048

postgresql.conf: both have the following settings changed
shared_buffers = 1GB
work_mem = 64MB
maintenance_work_mem = 32MB
max_fsm_pages = 204800
random_page_cost = 3.0
effective_cache_size = 512MB
update_process_title = off

Both drive arrays are partitioned identically, and postgres was installed as the only running application in each case.

The test was run as follows:
sysbench --num-threads=${i} --test=oltp --pgsql-user=bench --pgsql-db=bench --db-driver=pgsql --max-time=120 --max-requests=0 --oltp-read-only=on run

And yet here are the results:

Yes, the 8-core, 10GB machine is massively slower than the 2-core 4GB machine.

This can’t be right, can it?

Followup: I ran this single-cpu test:
openssl speed rsa
1×2-Core 2.33GHz 5140 Xeon
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0003s   0.0000s   3905.9  45028.8
rsa 1024 bits   0.0009s   0.0001s   1164.6  18787.2
rsa 2048 bits   0.0049s   0.0002s    205.2   6332.2
rsa 4096 bits   0.0320s   0.0005s     31.2   1917.9

2×4-Core 2.00 GHz E5335 Xeon
rsa  512 bits   0.0003s   0.0000s   3381.1  38972.0
rsa 1024 bits   0.0010s   0.0001s   1003.5  15991.7
rsa 2048 bits   0.0057s   0.0002s    176.7   5481.7
rsa 4096 bits   0.0373s   0.0006s     26.8   1638.6

I wanted to see if this was simply one algorithm, or if other algorithms showed the same sort of slowdown, so I ran the following:

openssl speed md5 sha1 aes-256-cbc
The 2-core machine:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              14172.75k    45816.42k   114118.73k   182293.03k   220175.36k
sha1             14901.45k    42795.69k    97387.27k   143547.69k   166264.67k
aes-256 cbc      91518.54k    94159.65k    93817.92k    92696.25k    93907.24k
The 8-core machine:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              12305.58k    39149.96k    97745.88k   155771.63k   188252.02k
sha1             12740.06k    36648.90k    83318.76k   122733.05k   142119.67k
aes-256 cbc      78109.56k    80589.95k    81222.08k    81406.47k    81401.94k

The 8-core is a little slower than the 2-core again, but not by the enormous levels we see on the pgsql sysbench.

When running RSA across multiple threads, the results are closer to what one would expect. The command I used to do this:


openssl speed rsa -multi N

And now I’m left with the question: Did I screw something up, or is this what Kris Kennaway was talking about when he wrote this message?

I must admit that, while I expected the performance to fall off, it never occurred to me that an 8-core system could be slower than a 2-core system right out of the gate.

Unscientific 15K v 10K SAS Drive Comparison

Posted by Kevin Way Mon, 09 Apr 2007 20:50:00 GMT

We’ve been buying a lot of new machines today, and almost all of them come with SAS drives. We bought 10K SAS drives for obviously performance-neutral applications, and 15K SAS drives for performance-critical applications, but what exactly did we get for an extra $200/drive?

I decided to do some wildly unscientific tests, using two HP DL360s, one with mirrored 10K RPM drives, the other with mirrored 15K RPM drives, both running FreeBSD 6.2/AMD64. Both machines are equipped with P400i drive controllers.

10K SAS drives
# diskinfo -t /dev/da0
/dev/da0
        512             # sectorsize
        73372631040     # mediasize in bytes (68G)
        143305920       # mediasize in sectors
        17562           # Cylinders according to firmware.
        255             # Heads according to firmware.
        32              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   1.016654 sec =    4.067 msec
        Half stroke:      250 iter in   0.955639 sec =    3.823 msec
        Quarter stroke:   500 iter in   1.960625 sec =    3.921 msec
        Short forward:    400 iter in   1.444117 sec =    3.610 msec
        Short backward:   400 iter in   1.457452 sec =    3.644 msec
        Seq outer:       2048 iter in   2.067837 sec =    1.010 msec
        Seq inner:       2048 iter in   2.069040 sec =    1.010 msec
Transfer rates:
        outside:       102400 kbytes in   2.078970 sec =    49255 kbytes/sec
        middle:        102400 kbytes in   2.598712 sec =    39404 kbytes/sec
        inside:        102400 kbytes in   3.115841 sec =    32864 kbytes/sec
15K SAS Drives
#  diskinfo -t /dev/da0
/dev/da0
        512             # sectorsize
        73372631040     # mediasize in bytes (68G)
        143305920       # mediasize in sectors
        17562           # Cylinders according to firmware.
        255             # Heads according to firmware.
        32              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   0.723955 sec =    2.896 msec
        Half stroke:      250 iter in   0.756001 sec =    3.024 msec
        Quarter stroke:   500 iter in   1.255116 sec =    2.510 msec
        Short forward:    400 iter in   0.680525 sec =    1.701 msec
        Short backward:   400 iter in   0.878597 sec =    2.196 msec
        Seq outer:       2048 iter in   2.055020 sec =    1.003 msec
        Seq inner:       2048 iter in   2.062749 sec =    1.007 msec
Transfer rates:
        outside:       102400 kbytes in   1.331560 sec =    76902 kbytes/sec
        middle:        102400 kbytes in   1.848040 sec =    55410 kbytes/sec
        inside:        102400 kbytes in   1.652754 sec =    61957 kbytes/sec

So far, so good. As expected, the results are faster for all the tasks that require seeking, and sequential transfer rates are considerably higher as well. I think I like these tiny lil drives.
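To put rough numbers on "faster", here is a small sketch that computes the percent change from the 10K figures to the 15K figures. The values are copied from the two diskinfo runs above; percent_change is an illustrative helper name, and a single run on each machine is hardly a rigorous sample.

```ruby
# Figures copied from the two diskinfo -t runs above: [10K value, 15K value].
SEEK_MSEC = {
  'Full stroke'    => [4.067, 2.896],
  'Half stroke'    => [3.823, 3.024],
  'Quarter stroke' => [3.921, 2.510],
  'Short forward'  => [3.610, 1.701],
  'Short backward' => [3.644, 2.196]
}

TRANSFER_KBYTES_PER_SEC = {
  'outside' => [49255, 76902],
  'middle'  => [39404, 55410],
  'inside'  => [32864, 61957]
}

# Percent change going from the 10K figure to the 15K figure.
# Negative is better for seek times, positive is better for transfer rates.
def percent_change(from, to)
  (to - from) * 100.0 / from
end

SEEK_MSEC.each do |name, (ten_k, fifteen_k)|
  puts format('%-15s seek time %+.1f%%', name, percent_change(ten_k, fifteen_k))
end

TRANSFER_KBYTES_PER_SEC.each do |name, (ten_k, fifteen_k)|
  puts format('%-15s transfer  %+.1f%%', name, percent_change(ten_k, fifteen_k))
end
```

The sequential seek rows are left out on purpose, since they are within about a percent of each other on both machines.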

Now let’s try some bonnie, with some big files: bonnie++ -d /usr/home -u root -s 16g -n 256:65536:65536:16

10K SAS mirror
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
anemone.insides 16G   379  99 49303  13 14526   9   621  98 30751   7 423.1  94
Latency             22409us     394ms    1414ms   37000us     142ms     121ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
anemone.insidesyste -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 256:65536:65536/16   540  14   661  47 32500  63   588  15   117   8  8358  64
Latency              3045ms     175ms     506ms    2631ms    1841ms     946ms
1.93c,1.93c,anemone.insidesystems.net,1,1176110507,16G,,379,99,49303,13,14526,9,621,98,30751,7,423.1,94,256,65536,65536,,16,540,14,661,47,32500,63,588,15,117,8,8358,64,22409us,394ms,1414ms,37000us,142ms,121ms,3045ms,175ms,506ms,2631ms,1841ms,946ms
15k SAS mirror
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
noname.insidesy 16G   444  99 77857  19 22512   6   613  94 85113  15 490.1  15
Latency             20163us    2910ms     296ms     274ms     114ms   51627us
Version 1.93c       ------Sequential Create------ --------Random Create--------
noname.insidesystem -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 256:65536:65536/16  1013  19   845  13 44078  86   959  18   103   3 13215  94
Latency              2763ms     800ms   57543us    2798ms    1805ms     259ms
1.93c,1.93c,noname.insidesystems.net,1,1176153578,16G,,444,99,77857,19,22512,6,613,94,85113,15,490.1,15,256,65536,65536,,16,1013,19,845,13,44078,86,959,18,103,3,13215,94,20163us,2910ms,296ms,274ms,114ms,51627us,2763ms,800ms,57543us,2798ms,1805ms,259ms

And with that, I was reasonably satisfied that we weren’t wasting money when we bought the 15K SAS drives.
