I'm working on recreating and reloading our 'reverse index' that is used for site search. This reverse indexes 19 million forum posts and a few other smaller collections of text, generating one (weighted) record per unique "word" per object. That is how one reaches a billion rows: Average 50 unique "words" per "item" x 20 million "items" = 1 billion records.
I have some random timings for your amusement:
1. creating it in the first place.Surprisingly, this is the least dramatic part of the job. Reading all the plain text, stripping format, splitting it up into words, creating all the location objects in another (mere 20 million row) data table, and then dumping the final file as 19gb of text file takes "only" a couple of hours. It could be much faster if I ran this job 4 times in parallel (and pushed our mysql server to deliver the old posts more quickly).
2. Simply reading the entire file.Since it is 19gb and the disk array can do ~150mb/sec sequential read then obviously even reading the entire file once from end to end is going to take a few minutes to do. Modifying it or removing items is really tricky. So it better be right from the word go!
3. Loading this data into mysql.The fastest way to load data into a mysql table is to use batch inserts that to make large single transactions (megabytes each). The syntax is something like:
insert ignore into table (fields) values (....),(....),(...)
The 'ignore' is so one possible duplicate doesn't blow out the entire insert.
Inserting un-ordered into mysql/myisam like this, with perl creating the insert strings, can be done at a high rate of speed if the table does not have an index. Testing this shows it running in less than 1 hour (way over 300,000 rows per second inserted).
4. Sorting in mysql, space usage in myisam and InnoDBDoing a table sort in mysql is a killer. I don't actually know the runtime. I gave up after a couple of hours. I don't even know whether the runtime is exponential with the size of the table, or linear. Interestingly, sorting and then just selecting the 'top' items by limit is very quick, in the order of 10 minutes or so. This high speed for "limit N" sort is also reported by the excellent mysql performance blog:
how fast can you sort data with mysql?. The table in mysql takes little more space than the text file to store as myisam, but it balloons to more than 40gb of data under innodb, or 65gb if it is fragmented - and innodb optimize/defragment is famously slow! I'm afraid of running it. So I should sort the data before inserting it. I want maximum clustering for these 7 years worth of old data, even if new data will slowly degrade the clustering.
5. Sorting the data using gnu unix sortSorting a billion rows is handled by unix sort with sort/merge passes. It creates as many temporary files as you are prepared to allocate in memory. If you do:
sort -s 2g
(which instructs sort to use 2gb of memory) then sort creates 700mb temporary files at a rate of about one file every 2 minutes. The overall sort time is around one hour. If sort was multi-threaded it could be as many times faster as you have cores available.
6. Loading it all into mysql InnoDBFirst attempt, with two indexes, innodb quickly slows down to approx 5000 to 7000 rows per second. At 6000 rows a second fully loading this table would take over 24 hours! If speed further declined, this could stretch further. An unacceptable result with a lot of uncertainty during execution (will it ever finish?).
Removing a tiny secondary index (which nevertheless internally duplicates the PK), setting innodb_flush_log_at_trx_commit=0 from 2 (not recommended for production), setting innodb_flush_method=O_DIRECT, increasing the innodb_buffer_pool_size to 10g from 6g, further increasing the insert packet size to 25k rows per insert, and setting a write lock on the table, the insertion rate starts at 100,000 per second but quickly slows to ~50,000 per second where (hooray) it appears to be stable. If it can maintain this average it will be done in 5 hours. This is 10x faster. I suppose only one change was responsible for most of this improvement. I don't know which, however.
With 4 cores during this procedure approx 2 are occupied, and disk i/o averages about 15mb/sec write. (The perl loader that is reading the flat file and building the insert transactions is only taking 10% of one core).
Since there is still disk subsystem bandwidth (only a small percentage of time is disk-wait) plus with 50% idle net cpu capacity, bulk inserts like this are not fully taking advantage of more than two cores.
After "tuning" and during the bulk insert:
=====================================
070925 21:22:33 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 44 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 13714, signal count 13691
Mutex spin waits 0, rounds 483238, OS waits 5541
RW-shared spins 8377, OS waits 4122; RW-excl spins 8187, OS waits 3950
------------
TRANSACTIONS
------------
Trx id counter 0 505355010
Purge done for trx's n:o < 0 505352977 undo n:o < 0 0
History list length 5
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 24763, OS thread id 1151719744
MySQL thread id 98, query id 46658 localhost root
show innodb status
---TRANSACTION 0 0, not started, process no 24763, OS thread id 1150916928
MySQL thread id 97, query id 46590 orange.dslreports.com 192.168.1.200 dslreports
---TRANSACTION 0 505355009, ACTIVE 1 sec, process no 24763, OS thread id 1149913408 inserting, thread declared inside InnoDB 330
mysql tables in use 3, locked 3
3 lock struct(s), heap size 368, 0 row lock(s), undo log entries 11693
MySQL thread id 54, query id 46633 orange.dslreports.com 192.168.1.200 dslreports update
insert ignore into ReverseIndex (ReverseIndex_Word, ReverseIndex_Score, Location_ID, y) values ('afraid',0.5,35358162,2001),('afraid',0.5,35358237,2001),('afraid',0.5,35358371,2001),('afraid',0.5,35358565,2001),('afraid',0.5,35358973,2001),('afraid',0.5,3535963,2006),('afraid',0.5,35360270,2001),('afraid',0.5,3536096,2006),('afraid',0.5,35360979,2001),('afraid',0.5,35366742,2001),('afraid',0.5,35366837,2001),('afraid',0.5,35367484,2001),('afraid',0.5,35368671,2001),('afraid',0.5,35368926,2001),('afraid',0.5,35369066,2001),('afraid',0.5,35369144,2001),('afraid',0.5,35369736,2001),('afraid',0.5,
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
34941 OS file reads, 94666 OS file writes, 14986 OS fsyncs
0.91 reads/s, 17612 avg bytes/read, 28.39 writes/s, 10.50 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 5646, free list len 22374, seg size 28021,
0 inserts, 1249359 merged recs, 27611 merges
Hash table size 21249871, used cells 656746, node heap has 985 buffer(s)
18078.50 hash searches/s, 37474.85 non-hash searches/s
---
LOG
---
Log sequence number 37 4230141094
Log flushed up to 37 4226707564
Last checkpoint at 37 3650382906
0 pending log writes, 0 pending chkp writes
5899 log i/o's done, 2.32 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 11492247262; in additional pool allocated 10723840
Dictionary memory allocated 50960
Buffer pool size 655360
Free buffers 351098
Database pages 303277
Modified db pages 33971
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 39106, created 264171, written 316562
0.98 reads/s, 331.22 creates/s, 341.90 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
1 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
3 tablespace extents now reserved for B-tree split operations
Main thread process no. 24763, id 1140881728, state: sleeping
Number of rows inserted 50820193, updated 0, deleted 0, read 2
54743.46 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
before:
Per second averages calculated from the last 32 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 119377, signal count 118789
Mutex spin waits 0, rounds 24466870, OS waits 79493
RW-shared spins 30137, OS waits 14091; RW-excl spins 74491, OS waits 17320
------------
TRANSACTIONS
------------
Trx id counter 0 505351913
Purge done for trx's n:o < 0 505325357 undo n:o < 0 0
History list length 18
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 23767, OS thread id 1153526080
MySQL thread id 27375, query id 83255267 localhost root
show innodb status
---TRANSACTION 0 0, not started, process no 23767, OS thread id 1149110592
MySQL thread id 27384, query id 83255200 orange.dslreports.com 192.168.1.200 dslreports
---TRANSACTION 0 505351912, ACTIVE 0 sec, process no 23767, OS thread id 1150515520 inserting, thread declared inside InnoDB 163
mysql tables in use 3, locked 3
3 lock struct(s), heap size 368, 0 row lock(s), undo log entries 1841
MySQL thread id 25258, query id 83255261 orange.dslreports.com 192.168.1.200 dslreports update
insert ignore into ReverseIndex (ReverseIndex_Word, ReverseIndex_Score, Location_ID, y) values ('everything',0.5,38147190,2000),('everything',0.5,3814720,2006),('everything',0.5,38147203,2000),('everything',0.5,38147324,2000),('everything',0.5,38147344,2000),('everything',0.5,38147368,2000),('everything',0.5,38147432,2000),('everything',0.5,38147447,2000),('everything',0.5,3814753,2006),('everything',0.5,38147570,2000),('everything',0.5,38147600,2000),('everything',0.5,38147624,2000),('everything',0.5,38147672,2000),('everything',0.5,38147684,2000),('everything',0.5,38147760,2000),('everything
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 1
44374 OS file reads, 78902654 OS file writes, 383233 OS fsyncs
14.84 reads/s, 17280 avg bytes/read, 187.46 writes/s, 22.91 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 16778, free list len 8408, seg size 25187,
5103138 inserts, 2021625 merged recs, 38211 merges
Hash table size 12750011, used cells 7987928, node heap has 18521 buffer(s)
8803.85 hash searches/s, 5457.74 non-hash searches/s
---
LOG
---
Log sequence number 36 2142170188
Log flushed up to 36 2141719998
Last checkpoint at 36 1564120222
0 pending log writes, 0 pending chkp writes
77784071 log i/o's done, 3.16 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 6921245678; in additional pool allocated 10765312
Dictionary memory allocated 56840
Buffer pool size 393216
Free buffers 32
Database pages 374663
Modified db pages 270643
Pending reads 0
Pending writes: LRU 0, flush list 1, single page 0
Pages read 92038, created 2499921, written 13243868
15.66 reads/s, 32.69 creates/s, 815.47 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
1 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 23767, id 1140881728, state: sleeping
Number of rows inserted 400273553, updated 0, deleted 154959, read 131020195
6169.28 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
The server is mysql 5.1.21-beta-log running on x86_64 Suse 10 SP1 with 16gb of memory/ Intel 5140 cpus (2 x dual core) connected to the afore-benchmarked MD3000 disk enclosure. The filesystem is XFS, over two striped LUNs using LVM.