Score:0

Is the slow down due to traffic every day near 12 pm

in flag

I checked the slow logs and I got only 4 queries in 2 hours and all of them were similar to this:

"SELECT HEX(uhash) AS uhash, vehid, IF(deleted = 0 AND follow_price_drop = 1, 1, 0) AS follow_price_drop, email, deleted 
       FROM wp_ product_favorite_count AS cfc 
       INNER JOIN wp_ product_favorite_user AS cfu ON cfc. product_favorite_user_uhash = cfu.uhash
       WHERE cfc.updated > '2021-09-23 12:49:02' OR cfu.updated > '2021-09-23 12:49:02'"

I checked top and htop and I often get 100 cpu usage on all 6 cpu cores.

Most of the CPU usage come from mysqld, so I logged the db:

https://pastebin.com/BBv7ngW5

iostat -xm 5 3 gave me:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.34    0.01    1.80    1.13    0.08   85.65

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda             39.75   720.61   79.81  192.29     0.99     3.57    34.30     0.02    0.09    0.19    0.04   0.09   2.53

^[[A^[[A^[[Aavg-cpu:  %user   %nice %system %iowait  %steal   %idle
          84.15    0.00    6.16    0.05    0.03    9.61

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.80    31.00   14.40   19.80     0.65     0.20    50.95     0.02    0.73    0.93    0.58   0.43   1.48

^[[A^[[Bavg-cpu:  %user   %nice %system %iowait  %steal   %idle
          84.54    0.00    4.95    0.10    0.05   10.36

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     2.40   22.60    1.60     1.77     0.02   151.40     0.02    1.02    1.04    0.75   0.64   1.56

ulimit -a

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128341
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128341
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I checked the general query log after checking the slow query log and was surprised that I got so many queries. When traffic is ordinary, I got: 136235 queries most of which are SELECT queries after 10 minutes. And when traffic is high, I got: 195650 queries in 10 minutes. I doubt it's 195650 visitors, but for some reason the calls are inside the general_log. The slow_query_log had only 4 queries and they didn't look like unoptimized queries. Is there anything else I should look at, or is this enough to surmise that it's from traffic and we should upgrade the server?

top roughly look like this, I couldn't capture it in time, but when it reached 95%+ cpu, the screen looked like this:

top - 13:04:51 up 1140 days, 19:59,  2 users,  load average: 26.57, 16.21, 8.92
Tasks: 429 total,  12 running, 421 sleeping,   0 stopped,   0 zombie
Cpu(s): 91.3%us,  1.6%sy,  0.0%ni, 65.7%id,  3.1%wa,  0.0%hi,  0.2%si,  0.1%st
Mem:  32877280k total, 31367584k used,  1509696k free,  3960824k buffers
Swap:        0k total,        0k used,        0k free,  3980580k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                 
14576 mysql     20   0 12.9g 8.5g 8424 S 951.6 27.2  18841:47 mysqld                                                  
 6032  martind     20   0  510m  65m 9160 S 61.4  0.2   2:49.40 php-fpm                                                  
 7329  martind     20   0  498m  63m 5556 R 57.6  0.2   0:47.15 php-fpm                                                  
 7321  martind     20   0  487m  52m 5532 R 46.1  0.2   0:45.18 php-fpm                                                  
 7160  martind     20   0  488m  52m 5540 R 44.1  0.2   1:02.67 php-fpm                                                  
 6031  martind     20   0  511m  67m 8076 S 42.2  0.2   2:50.87 php-fpm                                                  
 6696  martind     20   0  498m  63m 5700 S 38.4  0.2   1:36.38 php-fpm                                                  
 7283  martind     20   0  494m  59m 5268 S 34.5  0.2   0:46.19 php-fpm                                                  
 7314  martind     20   0  490m  55m 5536 R 33.0  0.2   0:44.22 php-fpm                                                  
 7330  martind     20   0  496m  60m 5436 R 26.4  0.2   0:46.82 php-fpm                                                  
 7305  martind     20   0  494m  58m 5572 R 25.4  0.2   0:48.85 php-fpm                                                  
 6706  martind     20   0  507m  62m 8060 S 13.7  0.2   1:40.55 php-fpm                                                  
 7276  martind     20   0  498m  63m 5264 S  7.7  0.2   0:49.89 php-fpm                                                  
17464 redis     20   0 4328m 2.3g  888 R  7.7  7.3   7827:30 redis-server                                             
 6402  martind     20   0  511m  67m 8056 S  5.8  0.2   2:15.21 php-fpm                                                  
 6405  martind     20   0  512m  69m 9204 S  5.8  0.2   2:14.32 php-fpm                                                  
 6703  martind     20   0  513m  67m 8056 S  5.8  0.2   1:39.40 php-fpm                                                  
 6705  martind     20   0  513m  68m 9040 S  5.8  0.2   1:36.18 php-fpm                                                  
 7303  martind     20   0  493m  57m 6556 S  5.8  0.2   0:47.04 php-fpm                                                  
 7304  martind     20   0  494m  59m 5264 S  5.8  0.2   0:48.70 php-fpm                                                  
 7323  martind     20   0  511m  67m 7772 S  5.8  0.2   0:45.53 php-fpm                                                  
24515 nginx     20   0  123m  66m 2452 S  5.8  0.2   7231:17 nginx                                                    
 6039  martind     20   0  507m  63m 8200 S  3.8  0.2   2:48.39 php-fpm                                                  
 6400  martind     20   0  511m  68m 8204 S  3.8  0.2   2:13.54 php-fpm                                                  
 6401  martind     20   0  510m  66m 9052 S  3.8  0.2   2:13.36 php-fpm                                                  
 6404  martind     20   0  512m  68m 9048 S  3.8  0.2   2:12.75 php-fpm 

So because there are so many SQL queries when it tends to slow down a lot, I am thinking it's caused by a high traffic. I checked the cronjobs (wordpress cronjobs and php cronjobs) and nothing seems to run when it slows, there might be a rsync process running at the same time, but the rsync process runs at all time, so I doubt it's caused by this. Is there anything I can check?

Wilson Hauck avatar
jp flag
For your slow query posted (early in question), could you post A) EXPLAIN SELECT ..... and for each table used, B) SHOW CREATE TABLE table_name; and C) SHOW TABLE STATUS WHERE name LIKE 'tbl_name'; ? You may want to consider creating 6G Swap Space to enable likely survival when busy (even if slow for a few seconds). You may wish to review this URL for some rsync considerations since you are using rsync at the same time - https://www.electricmonk.nl/log/2016/11/06/very-fast-mysql-slave-setup-with-zero-downtime-using-rsync/ Welcome to serverfault.
Wilson Hauck avatar
jp flag
What is your result of SELECT @@general_log; at this time? If the result is ON, is there a requirement to have the general_log on? Is there a reason your log_output is TABLE? On a hard unexpected crash, you will have not clue what would have been in your error log. That can be corrected with log_output=TABLE,FILE (and will take storage space on your media/storage device).
Wilson Hauck avatar
jp flag
Please post your php.cnf file for analysis. And your code that is used to 'connect', 'process', 'close' the connection.
Score:0
ua flag

Analysis of GLOBAL STATUS and VARIABLES:

Observations:

  • Version: 10.4.12-MariaDB
  • 32 GB of RAM
  • Uptime = 19d 23:11:43
  • It appears that you are running both MyISAM and InnoDB.
  • 240 QPS

The More Important Issues:

Change long_query_time to 1 so you can catch more queries in the slowlog. (You have 10 seconds now; this probably explains why you found only 4 queries.) There are several clues that some of the queries are running inefficiently. Here's a way to find such queries: http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog

Why are you using MyISAM? The values are confusing -- it is as if you [re]built an index for a large MyISAM table, but did not do much else. In most cases, it is better to use InnoDB.

innodb_buffer_pool_size can probably be increased to improve InnoDB query speed.

Be cautious about the general_log -- it fills up the disk rather rapidly.

The "Query Cache" is running inefficiently. I recommend turning it completely off: query_cache_type=off and query_cache_size=0.

Max_used_connections hit 152, indicating that a lot of users are connected at the same time. (This does not say that 152 queries were running simultaneously.)

Details and other observations:

Conversion from MyISAM to InnoDB ( Key_blocks_used * 1024 / key_buffer_size ) = 460 * 1024 / 128M = 0.35% -- Percent of key_buffer used. High-water-mark. -- Lower key_buffer_size (now 134217728) to avoid unnecessary memory usage.

( (key_buffer_size / 0.20 + innodb_buffer_pool_size / 0.70) ) = ((128M / 0.20 + 8192M / 0.70)) / 32768M = 37.7% -- Most of available ram should be made available for caching. -- http://mysql.rjweb.org/doc.php/memory

( general_log ) = general_log = ON -- Log (FILE or TABLE) of all queries run. -- Turn off the general_log (now ON) when not in use. That log can fill up the disk very rapidly.

( innodb_buffer_pool_size ) = 8,192 / 32768M = 25.0% -- % of RAM used for InnoDB buffer_pool -- Set to about 70% of available RAM. (To low is less efficient; too high risks swapping.)

( (key_buffer_size / 0.20 + innodb_buffer_pool_size / 0.70) ) = ((128M / 0.20 + 8192M / 0.70)) / 32768M = 37.7% -- (metric for judging RAM usage)

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096 -- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners (now 4). Also check for swapping.

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity ) = 200 -- When flushing, use this many IOPs. -- Reads could be slugghish or spiky.

( Innodb_log_writes ) = 43,856,157 / 1725103 = 25 /sec

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 137,804,939,264 / (1725103 / 3600) / 2 / 48M = 2.86 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 1,725,103 / 60 * 48M / 137804939264 = 10.5 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 50331648). (Cannot change in AWS.)

( innodb_flush_method ) = innodb_flush_method = fsync -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveats about O_ALL_DIRECT

( default_tmp_storage_engine ) = default_tmp_storage_engine =

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_adaptive_hash_index ) = innodb_adaptive_hash_index = ON -- Usually should be ON. -- There are cases where OFF is better. See also innodb_adaptive_hash_index_parts (now 8) (after 5.7.9) and innodb_adaptive_hash_index_partitions (MariaDB and Percona). ON has been implicated in rare crashes (bug 73890). 10.5.0 decided to default OFF.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( character_set_server ) = character_set_server = latin1 -- Charset problems may be helped by setting character_set_server (now latin1) to utf8mb4. That is the future default.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( Key_blocks_used * 1024 / key_buffer_size ) = 460 * 1024 / 128M = 0.35% -- Percent of key_buffer used . High-water-mark. -- Lower key_buffer_size (now 134217728) to avoid unnecessary memory usage.

( Key_writes / Key_write_requests ) = 19,978,377 / 40284646 = 49.6% -- key_buffer effectiveness for writes -- If you have enough RAM, it would be worthwhile to increase key_buffer_size (now 134217728).

( query_cache_size ) = 524,288 = 0.5MB -- Size of QC -- Too small = not of much use. Too large = too much overhead. Recommend either 0 or no more than 50M.

( Qcache_lowmem_prunes ) = 125,234,412 / 1725103 = 73 /sec -- Running out of room in QC -- increase query_cache_size (now 524288)

( Qcache_lowmem_prunes/Qcache_inserts ) = 125,234,412/146211296 = 85.7% -- Removal Ratio (frequency of needing to prune due to not enough memory)

( Qcache_not_cached ) = 78,413,835 / 1725103 = 45 /sec -- SQL_CACHE attempted, but ignored -- Rethink caching; tune qcache

( Qcache_hits / Qcache_inserts ) = 37,201,050 / 146211296 = 0.254 -- Hit to insert ratio -- high is good -- Consider turning off the query cache.

( Qcache_hits / (Qcache_hits + Com_select) ) = 37,201,050 / (37201050 + 282029692) = 11.7% -- Hit ratio -- SELECTs that used QC -- Consider turning off the query cache.

( Qcache_hits / (Qcache_hits + Qcache_inserts + Qcache_not_cached) ) = 37,201,050 / (37201050 + 146211296 + 78413835) = 14.2% -- Query cache hit rate -- Probably best to turn off the QC.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (524288 - 78344) / 82 / 16384 = 0.332 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 16384)

( Created_tmp_tables ) = 96,501,765 / 1725103 = 56 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.

( Created_tmp_disk_tables ) = 23,539,653 / 1725103 = 14 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 16777216) and max_heap_table_size (now 16777216). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( Created_tmp_disk_tables / Questions ) = 23,539,653 / 414140316 = 5.7% -- Pct of queries that needed on-disk tmp table. -- Better indexes / No blobs / etc.

( Select_full_join / Com_select ) = 30,333,225 / 282029692 = 10.8% -- % of selects that are indexless joins -- Add suitable index(es) to tables used in JOINs.

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (87669877 + 27242 + 0 + 0 + 1452911 + 0) / 1725103 = 52 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of HDD drives. If you have SSD, then this metric is probably fine.

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Max_used_connections / max_connections ) = 152 / 151 = 100.7% -- Peak % of connections -- increase max_connections (now 151) and/or decrease wait_timeout (now 28800). Or speed up queries.

( Connections ) = 11,987,448 / 1725103 = 6.9 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

( Connection_errors_accept + Connection_errors_internal + Connection_errors_peer_address + Connection_errors_select + Connection_errors_tcpwrap ) = 0 + 26 + 0 + 0 + 0 = 26 -- Connection errors other than max_connections. -- For more info, see SHOW GLOBAL STATUS LIKE 'Connection_errors%'

Abnormally small:

Created_tmp_files = 0.094 /HR
innodb_spin_wait_delay = 4

Abnormally large:

Aria_pagecache_writes = 34 /sec
Aria_transaction_log_syncs = 25,641
Com_show_warnings = 40 /HR
Connection_errors_internal = 0.054 /HR
Handler_read_key = 85109 /sec
Handler_tmp_update = 839 /sec
Innodb_buffer_pool_read_requests = 675158 /sec
Innodb_buffer_pool_read_requests / (Innodb_buffer_pool_read_requests + Innodb_buffer_pool_reads ) = 100.0%
Innodb_rows_updated = 356 /sec
performance_schema_max_cond_classes = 90

Abnormal strings:

Innodb_have_punch_hole = OFF
aria_recover_options = BACKUP,QUICK
disconnect_on_expired_password = OFF
ft_boolean_syntax = + -><()~*:
innodb_fast_shutdown = 1
log_output = TABLE
myisam_stats_method = NULLS_UNEQUAL
old_alter_table = DEFAULT
optimizer_trace = enabled=off
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.