Cache pre-loading on mysqld startup

The following quirky dynamic SQL will scan each index of each table so that they’re loaded into the key_buffer (MyISAM) or innodb_buffer_pool (InnoDB). If you also use the PBXT engine, which has a row cache but no clustered primary key, you could incorporate some full table scans as well.

To make mysqld execute this on startup, create /var/lib/mysql/initfile.sql and make it owned by mysql:mysql.

SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(`',column_name,'`) FROM `',table_schema,'`.`',table_name,'` FORCE INDEX (`',index_name,'`)') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics WHERE table_schema NOT IN ('information_schema','mysql') AND seq_in_index = 1;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@global.group_concat_max_len;
and in my.cnf add a line in the [mysqld] block

init-file = /var/lib/mysql/initfile.sql
That’s all. mysqld reads that file on startup and executes each statement. Since we can do the whole select in a single (admittedly quirky) query and then use dynamic SQL to execute the result, we don’t need to create a stored procedure.

Of course this kind of simplistic “get everything” only really makes sense if the entire dataset+indexes fit in memory, otherwise you’ll want to be more selective. Still, you could use the above as a basis, perhaps using another table to provide a list of tables/indexes to be excluded – or if the schema is really stable, simply have a list of tables/indexes to be included instead of dynamically using information_schema.
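To illustrate the selective variant, here is a small Python sketch (my own illustration, not from the original post) of the same idea: generate one COUNT-per-index query but filter against an exclusion list. The sample rows and the `excluded` set are made-up stand-ins for `information_schema.statistics` rows and for the exclusion table suggested above.

```python
# Each tuple mimics an information_schema.statistics row with seq_in_index = 1:
# (table_schema, table_name, index_name, column_name). All names are invented.
indexes = [
    ("shop", "orders", "PRIMARY", "order_id"),
    ("shop", "orders", "idx_customer", "customer_id"),
    ("shop", "audit_log", "PRIMARY", "log_id"),  # big table we don't want preloaded
]

# Hypothetical exclusion list: (schema, table) pairs to leave cold.
excluded = {("shop", "audit_log")}

# One index-scanning SELECT per remaining index, then glue with UNION ALL,
# exactly as the GROUP_CONCAT construct does on the server side.
selects = [
    f"SELECT COUNT(`{col}`) FROM `{schema}`.`{table}` FORCE INDEX (`{idx}`)"
    for schema, table, idx, col in indexes
    if (schema, table) not in excluded
]
preload_sql = " UNION ALL ".join(selects)
print(preload_sql)
```

The same filtering could of course be done in pure SQL by joining information_schema.statistics against an exclusion table in the WHERE clause.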

Practical (albeit niche) application:

In a system with multiple slaves, a newly added slave starts with cold caches, but since load balancing sends it only part of the load, this often works out OK. However, some environments have dual masters and an application that cannot do read/write splitting to utilise slaves, so all the reads also go to the active master. Consequently, the passive master will have relatively cold caches (only rows/indexes that have been updated will be in memory), so in case of a failover the amount of disk reads for the many concurrent SELECT queries will go through the roof, temporarily slowing the effective performance to a dismal crawl: each query takes longer with the required additional disk access, and depending on the setup the server may even run out of connections, which in turn upsets the application servers. It would sort itself out eventually, but a) it looks very bad on the frontend and b) it may take a number of minutes.

The above construct prevents that scenario, and as mentioned it can be used as a basis to deal with other situations. Not many people know about the init-file option, so this is a nice example.

If you want to know how the SQL works, read on. The original line is very long so I’ll reprint it below with some reformatting:

SELECT GROUP_CONCAT(
         CONCAT('SELECT COUNT(`',column_name,'`)
                 FROM `',table_schema,'`.`',table_name,'`
                 FORCE INDEX (`',index_name,'`)')
         SEPARATOR ' UNION ALL ')
  INTO @sql
  FROM information_schema.statistics
 WHERE table_schema NOT IN ('information_schema','mysql')
   AND seq_in_index = 1;
The outer query grabs each regular db/table/index/firstcol name that exists in the server, writing out a SELECT query that counts all non-NULL values of the indexed column (so it must scan the index), forcing that specific index. We then abuse the versatile and flexible GROUP_CONCAT() function to glue all those SELECTs together, with “UNION ALL” in between. The result is a single very long string, so we need to raise the maximum allowed group_concat output beforehand to prevent truncation.
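The gluing step can be mimicked in ordinary code. The Python sketch below (my illustration, with invented schema/table/index names) does what GROUP_CONCAT(CONCAT(...) SEPARATOR ' UNION ALL ') does server-side, including the truncation at group_concat_max_len that the first SET SESSION statement guards against:

```python
# Sample information_schema.statistics rows (invented names):
# (table_schema, table_name, index_name, column_name), seq_in_index = 1 only.
rows = [
    ("mydb", "t1", "PRIMARY", "id"),
    ("mydb", "t1", "idx_name", "name"),
]

# CONCAT(...): each row becomes one index-scanning COUNT query.
pieces = [
    "SELECT COUNT(`%s`) FROM `%s`.`%s` FORCE INDEX (`%s`)" % (col, schema, table, idx)
    for schema, table, idx, col in rows
]

# GROUP_CONCAT(... SEPARATOR ' UNION ALL '): glue the pieces into one string.
sql = " UNION ALL ".join(pieces)

# MySQL silently truncates GROUP_CONCAT output at group_concat_max_len
# (default only 1024 bytes) -- hence raising it to 100MB beforehand.
group_concat_max_len = 1024
sql = sql[:group_concat_max_len]
print(sql)
```

With a real server's worth of tables and indexes the untruncated string easily exceeds the 1024-byte default, which is why the session variable is bumped before running the query.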

Filed under: Uncategorized by Arjen Lentz on Wednesday, December 1, 2010 | Social tagging: cache > failover > init-file > mariadb > master > mysql > preload > replication > slave

2 Responses to “Cache pre-loading on mysqld startup”

Morgan Tocker, on December 2nd, 2010 at 4:05 am Said:
Have you tested it with innodb plugin?

Presumably these queries only load the data into the old sublist of the LRU :-/

arjen, on December 2nd, 2010 at 11:56 pm Said:
Hey Morgan!
The two clients we developed this for are still on 5.0 (OurDelta builds) and also have the special case that all their data/indexes can fit in memory (in one case due to the fact that the masters run on very serious boxes).
Since they were our priority we haven’t yet tested it on a recent 5.1, but yes, I would expect all this stuff to end up in the old sublist since it’s only read in once and then not touched until regular db operations happen to use some of it. So innodb_old_blocks_pct might need to be adjusted, otherwise the pages will fall straight out again even with a large buffer pool. But if the entire dataset fits in RAM, things should just be tuned to take that into account.
