from the 1:0 dept.
After upgrading SpamAssassin on our three spam filters from 2.6 to 3.0, we had a massive performance problem on our file server sardona. The new Bayesian filter makes many more database requests, and the default BerkeleyDB store over NFS performed poorly. Last weekend we switched over to MySQL, and we are now seeing impressive performance.
BerkeleyDB accesses its files in 8 KB blocks. Over NFS, these read and write requests are not cached by the client, because caching could cause too many problems under concurrent access.
The server is equipped with a hardware RAID of 10 SCSI disks and can handle about 1000 to 1200 random 8 KB I/O operations per second. This may look unimpressive at first glance, but remember that NFS forces synchronous writes on the server. Each update of a BerkeleyDB record costs one read and one write request, so we could do at most about 500 updates per second.
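The update ceiling above is simple arithmetic, sketched here in Python as a back-of-envelope calculation. It assumes, as the text does, that every 8 KB block access costs one random disk I/O and that one record update costs exactly one read plus one write; the function and constant names are hypothetical, chosen only for illustration.

```python
# Back-of-envelope estimate of BerkeleyDB update throughput over NFS.
# Assumptions: each record update = one read + one write request, and
# each request costs one random 8 KB disk I/O (no client-side caching).

RANDOM_IOPS_LOW = 1000   # random 8 KB I/O per second, lower estimate
RANDOM_IOPS_HIGH = 1200  # random 8 KB I/O per second, upper estimate
IOS_PER_UPDATE = 2       # one read request + one write request


def max_updates_per_second(random_iops: int, ios_per_update: int = IOS_PER_UPDATE) -> float:
    """Upper bound on record updates/s when each update costs ios_per_update I/Os."""
    return random_iops / ios_per_update


low = max_updates_per_second(RANDOM_IOPS_LOW)
high = max_updates_per_second(RANDOM_IOPS_HIGH)
print(f"BerkeleyDB over NFS: roughly {low:.0f} to {high:.0f} updates per second")
```

This prints a range of roughly 500 to 600 updates per second, which is where the "about 500" figure comes from.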
Our MySQL server runs on nearly the same hardware. Processing 500 SQL statements per second (i.e., the limit for BerkeleyDB over NFS) now costs only about 10% system load. This includes transferring the statements over the network, parsing the SQL, optimizing the query, searching the indexes, seeking and accessing the data, and returning the result over the network.
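In this specific case, the MySQL solution is about 10 times faster than BerkeleyDB over NFS: if the load grows roughly linearly, 10% load at 500 statements per second suggests a capacity of about 5000 statements per second. And BerkeleyDB itself is already much faster than plain text files!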
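The factor of 10 comes from extrapolating the measured load. A rough sketch, assuming (this was not measured) that system load grows linearly with the statement rate all the way up to 100%; the names are hypothetical:

```python
# Rough extrapolation of MySQL capacity from the observed system load.
# Assumption (not measured): load is proportional to the statement rate,
# so the server saturates when load reaches 100%.

BDB_UPDATES_PER_SEC = 500   # ceiling of BerkeleyDB over NFS
MYSQL_STMTS_PER_SEC = 500   # statement rate observed on the MySQL server
MYSQL_LOAD_PERCENT = 10     # system load observed at that rate


def extrapolated_capacity(rate: float, load_percent: float) -> float:
    """Statements/s at 100% load, assuming load scales linearly with rate."""
    return rate * 100 / load_percent


mysql_capacity = extrapolated_capacity(MYSQL_STMTS_PER_SEC, MYSQL_LOAD_PERCENT)
speedup = mysql_capacity / BDB_UPDATES_PER_SEC
print(f"MySQL: ~{mysql_capacity:.0f} statements/s, about {speedup:.0f}x BerkeleyDB over NFS")
```

Real servers rarely scale perfectly linearly up to full load, so treat the result as an order-of-magnitude estimate rather than a benchmark.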
Keep such improvements in mind when distributing work across a cluster: every second your job waits for I/O, the CPU sits idle and the total running time of your calculation grows.
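To make the effect concrete, here is a toy model of a job that blocks on each database update before continuing. The CPU time and update count are made-up illustrative numbers, the MySQL rate is the linear extrapolation from the 10% load figure rather than a measured ceiling, and the model ignores any overlap between computation and I/O.

```python
# Toy model: wall-clock time of a job that cannot compute while waiting for I/O.
# All workload numbers are hypothetical and serve only as an illustration.


def wall_clock_seconds(cpu_seconds: float, db_updates: int, updates_per_sec: float) -> float:
    """CPU time plus the time spent blocked on sequential database updates."""
    io_wait = db_updates / updates_per_sec
    return cpu_seconds + io_wait


CPU_SECONDS = 60.0      # hypothetical pure computation time
DB_UPDATES = 300_000    # hypothetical number of record updates

for label, rate in [("BerkeleyDB over NFS", 500), ("MySQL (extrapolated)", 5000)]:
    total = wall_clock_seconds(CPU_SECONDS, DB_UPDATES, rate)
    idle = 1 - CPU_SECONDS / total
    print(f"{label}: {total:.0f} s wall clock, CPU idle {idle:.0%} of the time")
```

With these numbers the same job drops from 660 s (CPU idle about 91% of the time) to 120 s (idle 50%), even though the amount of computation is unchanged.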