Score:1

MariaDB cache going cold overnight

cn flag

We are running a LAMP stack which includes MariaDB 10.5.15 on Centos 7. It's a dedicated server with 4 CPU's and 8GB RAM.

Never had a problem until the last few weeks, where I noticed slow queries taking 8-9 seconds in the slow query log.

These queries are against multiple INNODB databases, but always large tables (i.e. 10,000+ rows). They always occur between the hours of 6am and 8am.

The tables in question have indexes, and the same queries execute in less than a second ordinarily.

I found that by logging into the server at 8am on a Sunday and querying these large tables with simple select statements, they would take 8-9 seconds to execute. Then, for the rest of the day, even beyond midnight, any queries against that table would be fast.

6am - 8am would also be the time when things "come back to life" after a period of inactivity during the early hours.

It seems like there is some kind of cache that is going cold and needs to warm up, but I am not sure why this suddenly started happening after years of trouble free use. There is no huge process running on the server overnight that I know of, and the server is not under load when this happens.

We monitor the CPU load 24/7 and periodically check the number of connections to Apache and MariaDB which remain moderately low throughout the day. Typically there is around 3GB memory free excluding buffers and cache.

Edit

Embarrassingly, I discovered that the slow queries in question actually weren't using indexes and were doing a full table scan. An initial query can take several seconds, then for the rest of the day even the full table scan takes less than a second. I am assuming this is some kind of disk cache that is going cold overnight.

Although it might not seem unusual for a full table scan to have performance issues, it still seems odd that this suddenly became an issue out of the blue.

Score:0
ua flag

You seem to have ruled out a nightly dump and some slow process?

More debugging suggestions:

Lower the global setting of long_query_time and keep the slowlog turned on; you might catch the villain.

How often does your query run? What I am fishing for here is narrowing down the time when things get slow.

Check SHOW GLOBAL STATUS LIKE "Uptime" If that is less than 86400, then the MariaDB has restarted less than 24 hours ago. In this case, you should be able to pin the time nearly to the second. (But not identify the cause).

Show us the query, plus SHOW CREATE TABLE and EXPLAIN... We may be able to help you speed it up (eg, by a better index or a reformulation of the query), thereby eliminating the problem (though without solving the cause).

MrCarrot avatar
cn flag
`uptime` is several days ago (since we last restarted manually). I don't think reviewing the table schema will help because this affects multiple tables across multiple databases (all large, but all using appropriate indexes). Yesterday I ran a crobjob which called one of the queries every 15 minutes, hoping to pin point the time when things go wrong (I suspect 6am but could be earlier). This morning there are no slow queries logged. It could be that the cron job kept the database warm, or it could be coincidence. I'll leave it a few days and then reduce the `long_query_time` setting
MrCarrot avatar
cn flag
Embarrassingly, I discovered that the slow queries in question actually weren't using indexes and were doing a full table scan. An initial query can take several seconds, then for the rest of the day even the full table scan takes less than a second. I am assuming this is some kind of disk cache that is going cold overnight. Although it might not seem unusual for a full table scan to have performance issues, it still seems odd that this suddenly became an issue out of the blue.
ua flag
@MrCarrot - Disk caching... Do you have a hardware RAID controller? Are you running in the Cloud? If "no" to both, I would not expect a "disk cache" to be the answer. InnoDB uses its in-RAM "buffer_pool".
MrCarrot avatar
cn flag
The disk is an SSD drive with RAID1 (it is a dedicated server). I was worried perhaps the drive was failing or the RAID array under load but the hosting company claims it is okay. I guess it could be the buffer pool going cold.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.