What did I do wrong?

Posted by Kevin Way Tue, 10 Apr 2007 01:09:00 GMT

I’m hoping that somebody will look at this and say “hey, dummy, you did this all wrong.” I’m hoping that, because the results are shocking.

There has been a buzz about FreeBSD’s SMP performance compared to Linux’s, particularly around one 8-core SMP test using FreeBSD 7.0, sysbench, and MySQL. A number of people mentioned that there were known problems with this test and PostgreSQL, so I decided to see how bad the problems were.

The results are so shocking that my first thought was that I must have done something wrong.

The hardware:

Machine 1:
HP DL360 G5
1 5140 (dual core) 2.33 GHz (1333 FSB) Xeon
4 GB RAM
2 15K RPM SAS hard drives, running as mirrors through an e400i drive array

Machine 2:
HP DL360 G5
2 E5335 (quad core) 2.33 GHz (1333 FSB) Xeons
10 GB RAM
2 10K RPM SAS hard drives, running as mirrors through an e400i drive array

Operating System: Both machines are running FreeBSD 6.2-p3. After all, I wanted to see what performance levels to expect with production-quality software, not the 7.0 sweetness.

Kernel: Both are running identically configured SMP-enabled amd64 kernels.

sysctl.conf: both have the following
kern.ipc.shmmax=2147483647
kern.ipc.shmall=524288
kern.ipc.semmsl=512
kern.ipc.semmap=256
kern.ipc.somaxconn=2048
kern.maxfiles=65536
vfs.read_max=32

loader.conf: both have the following
kern.ipc.semmni=256
kern.ipc.semmns=2048

postgresql.conf: both have the following settings changed
shared_buffers = 1GB                 
work_mem = 64MB
maintenance_work_mem = 32MB
max_fsm_pages = 204800   
random_page_cost = 3.0
effective_cache_size = 512MB
update_process_title = off

Both drive arrays are partitioned identically, and postgres was installed as the only running application in each case.

The test was run as follows, with ${i} set to the number of client threads for each run:
sysbench --num-threads=${i} --test=oltp --pgsql-user=bench --pgsql-db=bench --db-driver=pgsql --max-time=120 --max-requests=0 --oltp-read-only=on run
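
The ${i} in that command implies a loop over thread counts. A minimal Ruby sketch of such a loop follows; the thread counts listed are an assumption (the post does not say which values were used), and the script only prints each command rather than running it.

```ruby
# Thread counts are hypothetical; the post does not list the values used for ${i}.
THREAD_COUNTS = [1, 2, 4, 8, 16]

# Builds the sysbench argument list from the post for a given thread count.
def sysbench_command(threads)
  [
    "sysbench",
    "--num-threads=#{threads}",
    "--test=oltp",
    "--pgsql-user=bench",
    "--pgsql-db=bench",
    "--db-driver=pgsql",
    "--max-time=120",
    "--max-requests=0",
    "--oltp-read-only=on",
    "run"
  ]
end

THREAD_COUNTS.each do |threads|
  puts sysbench_command(threads).join(" ")
  # To actually run the benchmark instead: system(*sysbench_command(threads))
end
```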

And yet here are the results:

Yes, the 8-core, 10GB machine is massively slower than the 2-core 4GB machine.

This can’t be right, can it?

Followup: I ran this single-cpu test:
openssl speed rsa
1×2-Core 2.33GHz 5140 Xeon
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0003s   0.0000s   3905.9  45028.8
rsa 1024 bits   0.0009s   0.0001s   1164.6  18787.2
rsa 2048 bits   0.0049s   0.0002s    205.2   6332.2
rsa 4096 bits   0.0320s   0.0005s     31.2   1917.9

2×4-Core 2.33 GHz E5335 Xeon
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0003s   0.0000s   3381.1  38972.0
rsa 1024 bits   0.0010s   0.0001s   1003.5  15991.7
rsa 2048 bits   0.0057s   0.0002s    176.7   5481.7
rsa 4096 bits   0.0373s   0.0006s     26.8   1638.6

I wanted to see if this was simply one algorithm, or if other algorithms showed the same sort of slowdown, so I ran the following:

openssl speed md5 sha1 aes-256-cbc
The 2-core machine:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              14172.75k    45816.42k   114118.73k   182293.03k   220175.36k
sha1             14901.45k    42795.69k    97387.27k   143547.69k   166264.67k
aes-256 cbc      91518.54k    94159.65k    93817.92k    92696.25k    93907.24k
The 8-core machine:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              12305.58k    39149.96k    97745.88k   155771.63k   188252.02k
sha1             12740.06k    36648.90k    83318.76k   122733.05k   142119.67k
aes-256 cbc      78109.56k    80589.95k    81222.08k    81406.47k    81401.94k

The 8-core machine is again a little slower than the 2-core machine (roughly 13 to 15 percent on these single-process tests), but not by the enormous margin we see in the pgsql sysbench results.
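
To put a number on "a little slower", here is a small sketch that computes the per-test slowdown from a handful of the figures above. The selection of rows is mine; the numbers are copied straight from the openssl output.

```ruby
# Figures copied from the openssl speed output above: [2-core, 8-core].
RESULTS = {
  "rsa 512 sign/s"               => [3905.9, 3381.1],
  "rsa 4096 sign/s"              => [31.2, 26.8],
  "md5 8192 bytes (k/s)"         => [220175.36, 188252.02],
  "sha1 8192 bytes (k/s)"        => [166264.67, 142119.67],
  "aes-256 cbc 8192 bytes (k/s)" => [93907.24, 81401.94]
}

# Percentage by which the 8-core result falls short of the 2-core result.
def slowdown_percent(two_core, eight_core)
  (1.0 - eight_core / two_core) * 100.0
end

RESULTS.each do |name, (two_core, eight_core)|
  printf("%-30s %5.1f%% slower on the 8-core machine\n",
         name, slowdown_percent(two_core, eight_core))
end
```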

When running RSA across multiple parallel processes, the results are closer to what one would expect. The command I used to do this:


openssl speed rsa -multi N

And now I’m left with the question: Did I screw something up, or is this what Kris Kennaway was talking about when he wrote this message?

I must admit that, while I expected the performance to fall off, it never occurred to me that an 8-core system could be slower than a 2-core system right out of the gate.

DELETE FROM important WHERE overly broad condition;

Posted by Kevin Way Wed, 14 Mar 2007 03:20:00 GMT

A client had an interesting problem. They had a critical PostgreSQL database, and they accidentally ran an overly broad delete on a table.

And their newest backup was two weeks old.

Could we help? Of course we could help.

The procedure used was as follows:

1) Stop the database.

2) Back up the database files to a secure location.

3) Make a second copy of the database files, in a new, empty PostgreSQL instance.

4) Now, if you have a base backup taken prior to the error, and xlogs for the entire time period since, you can do a Point-In-Time Recovery with a properly configured recovery.conf. Unfortunately we didn’t have that. We had a two-week-old dump and 6 hours of xlogs.

5) Try to determine an approximate transaction ID of the mistake. We did this by looking at the timestamps of the files in pg_clog, finding the newest one last modified before the mistake, and then multiplying its name (a hexadecimal number) by 1048576, the number of transactions each pg_clog file covers.

6) Then we used that xid with pg_resetxlog, on the secondary database that we had created. It worked, and we had a view of the data. We then played a bit, to find an xid that was acceptably close to the point of failure.

7) From there, we just did a straight SQL COPY to get the data out of the database server we had just set up and load it into the original database.
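
The arithmetic in step 5 can be sketched in a few lines of Ruby. The segment name "0012" is a made-up example, and the constant assumes PostgreSQL’s default 8 kB block size.

```ruby
# Each pg_clog segment file covers 1,048,576 transaction IDs:
# 32 pages per segment * 8192 bytes per page * 4 transaction statuses per byte.
XACTS_PER_CLOG_SEGMENT = 32 * 8192 * 4

# pg_clog file names are hexadecimal segment numbers, so parse them in base 16.
def first_xid_in_segment(segment_name)
  segment_name.to_i(16) * XACTS_PER_CLOG_SEGMENT
end

# "0012" is a hypothetical file name, used only for illustration.
puts first_xid_in_segment("0012")  # prints 18874368
```

The result is only a lower bound for the segment, which is why step 6 involved trying several values with pg_resetxlog before settling on one.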

It wasn’t the most elegant solution, but it worked beautifully, and the client was pleased.

Here’s to hoping you never need this information!

A Smarter Cron - Strata

Posted by Kelley Reynolds Tue, 04 Jul 2006 04:07:00 GMT

In today’s increasingly electronic commercial environment, one of the most common scenarios confronting businesses is the transport and processing of files. A typical scenario might be the following:

  • At the end of the day, connect to a vendor FTP site and download a couple of files
  • Decrypt them using GPG
  • Process each line and import the information into a local reporting database
  • Construct a subset of the information into a new format
  • Encrypt the new files with a different key and e-mail them to a client
  • Upload yet another format of the information to a website for client download

Of course, none of these tasks is particularly difficult; it’s just a matter of creating a simple shell script and tossing it in the crontab. This is usually where things start to go progressively downhill and reality starts to assert itself. Strata can help fend off ugliness, but first let’s explore why it’s necessary.

The Downward Spiral

First, the vendor runs a little late a few times, causing the cron script to fail. A forward-thinking scripter will have built in e-mail notifications already, but if you didn’t, you do so now. So once every other week or so, an e-mail gets sent to you saying the script failed because the vendor slightly missed the deadline, and you have to either log in to the FTP site and wait for the file or run the program over and over until it snags the files and proceeds on its merry way. This seems alright but is tedious, and it’s still a manual step that galls you. Being the clever programmer you are, you decide to alter the script to be able to run more than once in a day and only give a warning if it’s past a certain time, so you can put it in the crontab to run every minute between 5 and 5:15 pm. This brilliant strategy works for a little while, and you forget about this part of your day and move on, thinking it’s a solved problem.

Every month or so, a new file gets added or a new client gets another file format for their custom, deprecated accounting system and eventually you have a very large, very important script that runs at the end of the day. Mission Critical. That’s why it’s so tragic when your script breaks just as you are ready to go home. For the first time ever, it takes your script more than a minute to download the FTP files because of accumulated data. Sadly, this means that another of your scripts starts up and begins downloading and processing the files. Suddenly, your local reporting database has some duplicate data in it for the day and every file that goes out afterward has some duplicate information in it throwing all downstream processes totally out of whack. You find this out when one of your largest clients calls and says that they are having trouble importing the file you made for them because their system is smarter than yours. Then another client. Then another. Being the quick-witted person you are, you have your team call everybody who receives those files and tell them that you are going to resend a clean copy, you wipe all the data for the day, re-run the script and all is well, but you vow never to let that happen again. Ever.

First thing next morning, you add a lockfile to the program so that it never runs concurrently. Thinking that will solve your issues, you let down your guard and in a week or two, the script dies in the middle for some reason .. maybe the reporting database was down or ran out of connections and the script couldn’t handle it. Here you are, right back where you were before the magic lockfile when you had to tell everybody that you had to resend the file. Trying to save face, you refer to the source of the script and manually run the pieces that had failed so that you don’t actually have to call anybody. You manage to finish the tasks, largely on time, and none of the clients seemed to notice. The rest of the team noticed though. They didn’t have these problems at the last place they worked, what’s your problem?

You resolve once more that this issue will never bother you again, so you spend the next week making a big, fancy diagram of data processing and flows, and proceed to write little atomic scripts for each action that check for concurrency, requirements to be able to run successfully, or for having already run successfully. Every action has notifications for each of the failure modes and also sends you success messages so that you know each task has completed successfully. You now have a special e-mail folder for these notifications, three directories full of scripts, two crontabs on different machines, and worried looks from the rest of your team at the end of the day. “How did it come to this?” you wonder to yourself. “How could the simple process of downloading and processing files have turned into this behemoth tangle of interdependent lockfile muck?” Strangely enough, that’s what the new guy wonders too as he tries to make sense of your ‘magical’ system you’ve neglected to document in a manual. It now takes a full 30 minutes to execute all of the scripts involved and it’s quite difficult to tell within that window precisely what stage everything is at, but at least it’s pretty reliable so that’s good, right?

Enter Strata

Depending on your level of scripting experience, you might enter the above story at any point, even at the end where there are no problems and every script is reliable and robust (possibly even well-documented). The essential problem remains though, which is that no matter how well documented or reliable each of those scripts is, it’s still a tangled mess without a clear status window. Strata solves this problem by having a centralized configuration file, dependency language, coordination script, and reporting (notifications are absent from this list, but I’ll get back to that). The whole concept is that there are dependencies, and things to do when all of the dependencies are met. The list of dependencies is in stratas.xml[1]:

    <stratas>
            <strata id="test1">
                    <condition type="weekday">1</condition>
            </strata>
            <strata id="test2">
                    <condition type="concurrent_id">
                            <strataid>test2</strataid>
                            <count>2</count>
                    </condition>
            </strata>
            <strata id="test3">
                    <condition type="fileexists">
                            <filename>strata.conf</filename>
                            <filename parser="glob">*.conf</filename>
                            <filename parser="strftime">strata.conf-%Y</filename>
                    </condition>
            </strata>
            <strata id="test4">
                    <condition type="lastrun">
                            <strataid>test2</strataid>
                            <before>2</before>
                            <exitcode>0</exitcode>
                            <output>has successfully</output>
                    </condition>
            </strata>
    </stratas>

The basic format of the file is pretty self-explanatory so we’ll skip right to explaining what each strata does. The test1 strata will pass only if it is Monday (weekday is a subset of the crontab dependency module). The test2 strata will pass only if zero or one copies of it are already running. The count can be changed to 1 for more typical concurrency checks, or 0 if you just want to be obscure. The test3 strata will pass only if there is a file named strata.conf, at least one file matching the glob *.conf, and a file called strata.conf-2006 (assuming you read this in the year 2006). The test4 strata is the most complicated of the above but it is still reasonably simple. It states that test2 must have run more than 2 minutes ago, had a 0 exitcode, and printed ‘has successfully’ to stdout. The lastrun condition is a powerful one that allows the strata to depend on one another in an extremely flexible way. But now that we have our strata defined, how do we run them? There is a script called stratarunner which runs commands defined in strata-runner.xml:
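
To make the lastrun semantics concrete, here is a rough Ruby sketch of how such a check could be evaluated. This is not Strata’s actual code: the Run struct and lastrun_passes? are hypothetical names, and the reading of before (finished more than N minutes ago) and after (finished within the last N minutes) is inferred from the descriptions and XML comments in this post.

```ruby
# Hypothetical record of one finished run; Strata's real schema is not shown here.
Run = Struct.new(:finished_at, :exitcode, :output)

# Returns true when the most recent run satisfies every requirement given.
# before/after are in minutes, mirroring the <before> and <after> elements.
def lastrun_passes?(last_run, now, before: nil, after: nil, exitcode: nil, output: nil)
  return false if last_run.nil?

  minutes_ago = (now - last_run.finished_at) / 60.0
  return false if before && minutes_ago <= before   # must be older than N minutes
  return false if after && minutes_ago > after      # must be newer than N minutes
  return false if exitcode && last_run.exitcode != exitcode
  return false if output && !last_run.output.include?(output)

  true
end

# The test4 condition from the example: ran more than 2 minutes ago,
# exited 0, and printed 'has successfully'.
now = Time.now
run = Run.new(now - 180, 0, "job has successfully finished")
puts lastrun_passes?(run, now, before: 2, exitcode: 0, output: "has successfully")
```

A negative="true" attribute would then simply invert the result, which is presumably how a "don’t run if we ran successfully in the last 2 hours" rule is expressed.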

    <runners>
            <defaults>
                    <strataconf>/usr/local/etc/stratas.xml</strataconf>
                    <database>sqlite:///var/db/strata/strata.db</database>
            </defaults>
            <runner>
                    <command>blah.sh</command>
                    <strataid>test1</strataid>
            </runner>
            <runner>
                    <command>blah.sh</command>
                    <strataid>test2</strataid>
            </runner>
            <runner>
                    <command>blah.sh</command>
                    <strataid>test3</strataid>
            </runner>
    </runners>

The defaults section defines which configuration file to use for each runner and which database to store the strata information in. Currently, SQLite and PostgreSQL are supported through a recreated-wheel abstraction in the library itself. The first runner will run the script blah.sh if the first strata passes, which is to say that it will run if it is Monday. The other two are precisely the same command, but with different strata. The reason for the two separate files is so that multiple programs can share the same strata, or a single great configuration file can be spread across different machines with interdependent runners on each machine.

A Concrete Example

Most of the small financial shops I’ve consulted for have a similar set of requirements:

  • Download data from custodian
  • Munge and import data into local system for client/website access
  • Send a version of the data to accounting

(don’t ask me why some accounting firms can’t take custodian data directly)

There can be more steps involved depending on the exact requirements for file processing and the number of service providers, but the concept is pretty much the same. Most small financial firms can’t (or choose not to) afford a full-time IT staff so the reliability and availability of these data flows is extremely important.

The configuration files

First, the stratas.xml file:

    <stratas>
        <strata id="download_file">
            <!-- Download the file between 6 and 8 pm during the week -->
            <condition type="crontab">* 18-20 * * 1-5</condition>
            <!-- Allow for it to take a long time downloading -->
            <condition type="concurrent_id">
                <strataid>download_file</strataid>
                <count>1</count>
            </condition>
            <!-- Only run if we haven't already downloaded today's account_info file -->
            <condition negative="true" type="fileexists">
                <filename parser="strftime">account_info.%Y%m%d</filename>
            </condition>
        </strata>
        <strata id="import_file">
            <!-- Import the file between 6 and 8 pm -->
            <condition type="crontab">* 18-20 * * 1-5</condition>
            <!-- Only run if the file has been downloaded -->
            <condition type="fileexists">
                <filename parser="strftime">account_info.%Y%m%d</filename>
            </condition>
            <!-- Don't run if we've run successfully in the last 2 hours -->
            <condition negative="true" type="lastrun">
                <strataid>import_file</strataid>
                <after>120</after>
                <exitcode>0</exitcode>
            </condition>
        </strata>
        <strata id="encrypt_file">
            <!-- Only run if the file has been downloaded -->
            <condition type="fileexists">
                <filename parser="strftime">account_info.%Y%m%d</filename>
            </condition>
            <!-- Only run if the file has not already been encrypted -->
            <condition negative="true" type="fileexists">
                <filename parser="strftime">newfile.%Y%m%d.gpg</filename>
            </condition>
            <!-- Only run if the download program has actually finished more than a minute ago -->
            <condition type="lastrun">
                <strataid>download_file</strataid>
                <before>1</before>
                <exitcode>0</exitcode>
            </condition>
        </strata>
        <strata id="send_to_accounting">
            <!-- Only run if the file has already been encrypted -->
            <condition type="fileexists">
                <filename parser="strftime">newfile.%Y%m%d.gpg</filename>
            </condition>
            <!-- Only run if the encryption is completely finished .. no half-written files -->
            <condition type="lastrun">
                <strataid>encrypt_file</strataid>
                <before>1</before>
                <exitcode>0</exitcode>
            </condition>
        </strata>
    </stratas>
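
For readers unfamiliar with crontab expressions like the one above, here is a minimal sketch of standard five-field matching in Ruby. It is not Strata’s implementation: field_matches? and crontab_matches? are hypothetical helpers, and the sketch handles only *, single numbers, ranges, and comma lists (no step values, names, or cron’s special day-of-month/day-of-week rule). Note that under standard cron semantics the hour range 18-20 matches through 20:59, slightly past the "8 pm" in the comment; whether Strata’s crontab module behaves the same way is an assumption.

```ruby
# Does one cron field (e.g. "*", "5", "18-20", "1,3,5") match the given value?
def field_matches?(field, value)
  field.split(",").any? do |part|
    if part == "*"
      true
    elsif part.include?("-")
      low, high = part.split("-").map(&:to_i)
      (low..high).cover?(value)
    else
      part.to_i == value
    end
  end
end

# Fields are: minute, hour, day of month, month, day of week (0 = Sunday).
def crontab_matches?(expression, time)
  minute, hour, day, month, weekday = expression.split
  field_matches?(minute, time.min) &&
    field_matches?(hour, time.hour) &&
    field_matches?(day, time.day) &&
    field_matches?(month, time.month) &&
    field_matches?(weekday, time.wday)
end

# Tuesday 4 July 2006, 18:30 falls inside the download window.
puts crontab_matches?("* 18-20 * * 1-5", Time.local(2006, 7, 4, 18, 30))
```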

Next, the strata-runners.xml file:

    <runners>
        <defaults>
            <strataconf>/usr/local/etc/stratas.xml</strataconf>
            <database>sqlite:///var/db/strata/strata.db</database>
        </defaults>
        <runner>
            <command>downloadscript.sh</command>
            <strataid>download_file</strataid>
        </runner>
        <runner>
            <command>importscript.rb</command>
            <strataid>import_file</strataid>
        </runner>
        <runner>
            <command>encryptscript.sh</command>
            <strataid>encrypt_file</strataid>
        </runner>
        <runner>
            <command>uploadscript.sh</command>
            <strataid>send_to_accounting</strataid>
        </runner>
    </runners>

The database

As previously mentioned, Strata records all of its status information into a database so we’d better set one up (in PostgreSQL). Log into the server and create a database (this will change depending on your application and environment .. this is for DarwinPorts on OS X):

psql -U postgres8 template1

create database strata;

Stratarunner should create the schema for you the first time it starts, and you can run a no-op by checking for running processes:

$ ruby /opt/local/lib/ruby/gems/1.8/gems/Strata-0.1/bin/stratarunner -p -d pgsql://postgres8@localhost/strata
NOTICE:  CREATE TABLE will create implicit sequence "status_id_seq" for serial column "status.id" 
NOTICE:  CREATE TABLE will create implicit sequence "tags_id_seq" for serial column "tags.id" 
No currently running processes
$

One final thing to note: make sure that you have the appropriate gem for your database type installed, whether it’s postgres, sqlite, or something else.

The Cron

For this application we’ll just set Strata to run every minute in the cron all day. This can be optimized by only running it for the parts of the day required, but it doesn’t take much juice.

* * * * * /usr/local/bin/stratarunner.rb

(Note: If you install via gem, you’ll have to change paths and/or make a symbolic link to the binary in whatever location your gems are stored. Also, you may need to manually specify the location of your conf files if they are not in the default location. Run stratarunner.rb -h for details.)

Et Voila

At this point, strata should be running every minute attempting to do whatever it has been set to do. There are a few things you should know for proper strata maintenance though:

  • stratarunner -p shows you the currently running processes
  • stratarunner -z <strataid> will reset the state of a program that Strata thinks is running that isn’t really. This can happen occasionally if a child script exits very strangely
  • stratarunner -e <strataid> will attempt to run just one strata instead of all of them, useful for debugging and setup.

Oh Yeah, Notifications

Notifications are trivial with Strata in that they are just a strata themselves. If some file isn’t downloaded by X o’clock, run an email script. Or a jabber script. Or sound a klaxon, whatever makes you the happiest and most notified.

Conclusion

We’ve used this setup in production since 2004 and it has worked great when there are lots of files that have to go lots of different places and they are all dependent on one another. It allows nice atomic creation of scripts and a custom reporting interface to be written based on the data in the SQL database. Though, wouldn’t it be nice if there were a directed graph of some sort on a web page that displayed exactly what the status was? Stay tuned …

[1] Leave me alone about the pluralization, I did it for differentiation in the XML document. If you don’t like it, you can alter the conf file source with a command-line switch.