PostgreSQL scaling on 6.2 and 7.0
My previous posts have documented the utterly insane lack of scalability of PostgreSQL (at least according to that particular metric) on FreeBSD 6.2.
I ran the same test, on the same machine, using 7.0 with the 4BSD and the ULE schedulers. Postgres was tuned as in the previous posts, and WITNESS, INVARIANTS and all malloc debugging options were off.
Results:

Hopefully nobody will extrapolate this to mean more than it really does (it’s just one test, but it’s one that happens to matter to me.)
More is better?
Not always. I did some more tests, and confirmed that this particular workload actually gets slower as you add more than 2 cores.
sysbench --num-threads=${i} --test=oltp --pgsql-user=bench --pgsql-db=bench --db-driver=pgsql --max-time=60 --max-requests=0 --oltp-read-only=on run
Here’s the results, with the tests run on absolutely identical hardware. (HP DL360, 2.00 GHz E5335 Xeon, 10K RPM SAS drives, 10 GB of RAM)

I know 6.2 isn’t the greatest for SMP, but I was still surprised to learn that we could get significantly lower overall performance by adding more processors.
I tried to test 7.0, but the 7.0 installer didn’t seem to recognize the iLO virtual keyboard tonight. Maybe tomorrow will go better.
What did I do wrong? 2
I’m hoping that somebody will look at this and say “hey, dummy, you did this all wrong.” I’m hoping that, because the results are shocking.
There have been a buzz about FreeBSD’s SMP performance compared to Linux’s, particularly a particular 8-core SMP test, using FreeBSD 7.0, sysbench, and mysql. A number of people mentioned that there were known problems with this test and PostgreSQL, so I decided to see how bad the problems were.
The results are so shocking, that my first thought was that I must have done something wrong.
The hardware:
Machine 1:
HP DL360 G5
1 5140 (dual core) 2.33 GHz (1333 FSB) Xeon
4 GB RAM
2 15K RPM SAS hard drives, running as mirrors through an e400i drive array
Machine 2:
HP DL360 G5
2 E5335 (quad core) 2.33 GHz (1333 FSB) Xeons
10 GB RAM
2 10K RPM SAS hard drives, running as mirrors through an e400i drive array
Operating System: Both machines are running FreeBSD 6.2-p3. After all, I wanted to see what performance levels to expect with production-quality software, not the 7.0 sweetness.
Kernel: Both are running identically configured SMP-enabled amd64 kernels.
sysctl.conf: both have the following
kern.ipc.shmmax=2147483647
kern.ipc.shmall=524288
kern.ipc.semmsl=512
kern.ipc.semmap=256
kern.ipc.somaxconn=2048
kern.maxfiles=65536
vfs.read_max=32
loader.conf: both have the following
kern.ipc.semmni=256
kern.ipc.semmns=2048
postgresql.conf: both have the following settings changed
shared_buffers = 1GB
work_mem = 64MB
maintenance_work_mem = 32MB
max_fsm_pages = 204800
random_page_cost = 3.0
effective_cache_size = 512MB
update_process_title = off
Both drive arrays are partitioned identically, and postgres was installed as the only running application in each case.
The test was run as follows:
sysbench --num-threads=${i} --test=oltp --pgsql-user=bench --pgsql-db=bench --db-driver=pgsql --max-time=120 --max-requests=0 --oltp-read-only=on run
And yet here are the results:

Yes, the 8-core, 10GB machine is massively slower than the 2-core 4GB machine.
This can’t be right, can it?
Followup: I ran this single-cpu test:
openssl speed rsa
1×2-Core 2.33GHz 5140 Xeon
sign verify sign/s verify/s
rsa 512 bits 0.0003s 0.0000s 3905.9 45028.8
rsa 1024 bits 0.0009s 0.0001s 1164.6 18787.2
rsa 2048 bits 0.0049s 0.0002s 205.2 6332.2
rsa 4096 bits 0.0320s 0.0005s 31.2 1917.9
2×4-Core 2.33 GHz E5335 Xeon
rsa 512 bits 0.0003s 0.0000s 3381.1 38972.0
rsa 1024 bits 0.0010s 0.0001s 1003.5 15991.7
rsa 2048 bits 0.0057s 0.0002s 176.7 5481.7
rsa 4096 bits 0.0373s 0.0006s 26.8 1638.6
I wanted to see if this was simply one algorithm, or if other algorithms showed the same sort of slowdown, so I ran the following:
openssl speed md5 sha1 aes-256-cbc
The 2-core machine:
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 14172.75k 45816.42k 114118.73k 182293.03k 220175.36k
sha1 14901.45k 42795.69k 97387.27k 143547.69k 166264.67k
aes-256 cbc 91518.54k 94159.65k 93817.92k 92696.25k 93907.24k
The 8-core machine:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 12305.58k 39149.96k 97745.88k 155771.63k 188252.02k
sha1 12740.06k 36648.90k 83318.76k 122733.05k 142119.67k
aes-256 cbc 78109.56k 80589.95k 81222.08k 81406.47k 81401.94k
The 8-core is a little slower than the 2-core again, but not by the enormous levels we see on the pgsql sysbench.
When running RSA across multiple threads, the results are closer to what one would expect. The command I used to do this:
openssl speed rsa -multi N

And now I’m left with the question: Did I screw something up, or is this what Kris Kennaway was talking about when he wrote this message?
I must admit, that while I expected the performance to fall off, it never occurred to me that an 8-core system could be slower than a 2-core system right out of the gate.
DELETE FROM important WHERE overly broad condition;
A client had an interesting problem. They had a critical PostgreSQL database, and they accidentally ran an overly broad delete on a table.
And their newest backup was two weeks old.
Could we help? Of course we could help.
The procedure used was as follows:
1) Stop the database.
2) Backup the database files to a secure location.
3) Make a second copy of the database files, to a new, empty postgresql instance.
4) Now if you have a dump prior to the error, and xlogs for the entire time period, you can do a Point-In-Time Recovery, with a properly configured recovery.conf. Unfortunately we didn’t have that. We had a two-week old dump and 6 hours of xlogs.
5) Try to determine an approximate transaction ID of the mistake. We did this by looking at timestamps of files on the newest pg_clog file, finding one that happened before the mistake, and then multiplying it’s name by 1048576.
6) Then we used that xid with pg_resetxlog, on the secondary database that we had created. It worked, and we had a view of the data. We then played a bit, to find an xid that was acceptably close to the point of failure.
7) From there, we just did a straight SQL COPY, to get the data from the db server we had just setup onto the original database, and to get it loaded.
It wasn’t the most elegant solution, but it worked beautifully, and the client was pleased.
Here’s to hoping you never need this information!