Score:0

Can't stop/restart MariaDB after a server ran out of disk space?

ml flag

Using CentOS Stream release 8.

I've made space, but my site still responds with Fatal error: Uncaught PDOException: SQLSTATE[HY000] [2002] Connection refused in /var/www/sites/mysite/inc/DB.php

I've had similar issues before and a systemctl restart mysql would normally fix it, but that just appears to hang, doing nothing for minutes on end. Same with systemctl stop mysql.

Should I assume it's doing something under the hood, and give it more time?

Or do I need to do something else?

Here's /var/lib/mysql/mysql.err:

2022-11-18 19:15:04 1003402 [ERROR] Error writing file '/var/lib/mysql/slow_log' (errno: 28 "No space left on device")
2022-11-18 20:50:45 938512 [Warning] Aborted connection 938512 to db: 'mysite' user: 'root' host: '127.0.0.1' (Got timeout reading communication packets)
2022-11-18 20:50:45 906873 [Warning] Aborted connection 906873 to db: 'mysite' user: 'root' host: '127.0.0.1' (Got timeout reading communication packets)
2022-11-18 21:05:35 0 [Warning] mariadbd: Disk is full writing '/var/lib/mysql/aria_log.00000001' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60
2022-11-18 21:05:35 0 [Warning] mariadbd: Retry in 60 secs. Message reprinted in 600 secs
2022-11-18 21:15:35 0 [Warning] mariadbd: Retry in 60 secs. Message reprinted in 600 secs
2022-11-18 21:25:35 0 [Warning] mariadbd: Retry in 60 secs. Message reprinted in 600 secs
2022-11-18 21:35:35 0 [Warning] mariadbd: Retry in 60 secs. Message reprinted in 600 secs
2022-11-18 21:45:35 0 [Warning] mariadbd: Retry in 60 secs. Message reprinted in 600 secs
2022-11-18 21:55:35 0 [Warnin2022-11-19 13:05:51 0 [ERROR] mariadbd: Can't lock aria control file '/var/lib/mysql/aria_log_control' for exclusive use, error: 11. Will retry for 30 seconds
2022-11-19 13:06:21 0 [ERROR] mariadbd: Got error 'Could not get an exclusive lock; file is probably in use by another process' when trying to use aria control file '/var/lib/mysql/aria_log_co
2022-11-19 13:06:21 0 [ERROR] Plugin 'Aria' init function returned error.
2022-11-19 13:06:21 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
2022-11-19 13:06:21 0 [Note] InnoDB: Uses event mutexes
2022-11-19 13:06:21 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-11-19 13:06:21 0 [Note] InnoDB: Number of pools: 1
2022-11-19 13:06:21 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-11-19 13:06:21 0 [Note] InnoDB: Using Linux native AIO
2022-11-19 13:06:21 0 [Note] InnoDB: Initializing buffer pool, total size = 34359738368, chunk size = 134217728
2022-11-19 13:06:21 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-19 13:06:21 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1384180559603,1384180559603
2022-11-19 13:06:36 0 [Note] InnoDB: Read redo log up to LSN=1385773870592
2022-11-19 13:06:36 0 [Note] InnoDB: Ignoring data file './mysite/#sql-ib17516.ibd' with space ID 17501. Another data file called ./mysite/TBL_FULL_TEXTS.ibd exists with the same s
2022-11-19 13:06:36 0 [Note] InnoDB: Ignoring data file './mysite/#sql-ib17516.ibd' with space ID 17501. Another data file called ./mysite/TBL_FULL_TEXTS.ibd exists with the same s
2022-11-19 13:06:38 0 [Note] InnoDB: Starting final batch to recover 58536 pages from redo log.
2022-11-19 13:06:41 0 [Note] InnoDB: 128 rollback segments are active.
2022-11-19 13:06:41 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-11-19 13:06:41 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-11-19 13:06:41 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-11-19 13:06:41 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-11-19 13:06:41 0 [Note] InnoDB: 10.5.17 started; log sequence number 1385794764583; transaction id 262741173
2022-11-19 13:06:41 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-11-19 13:06:41 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-19 13:06:41 0 [ERROR] Could not open mysql.plugin table: "Unknown storage engine 'Aria'". Some plugins may be not loaded
2022-11-19 13:06:41 0 [ERROR] Failed to initialize plugins.
2022-11-19 13:06:41 0 [ERROR] Aborting

Those 2022-11-19 13:06 lines got written the first time I tried restart. After a while I ctrl-C'd it, and tried stop instead, which also appears to do nothing, and hasn't even written anything to the above log file.

The contents of /var/lib/mysql looks like:

-rw-rw---- 1 mysql mysql   32423936 Nov 19 04:59 aria_log.00000001
-rw-rw---- 1 mysql mysql         52 Nov 19 05:00 aria_log_control
-rw-rw---- 1 mysql mysql    5854594 Nov 14 12:12 ib_buffer_pool
-rw-rw---- 1 mysql mysql 1690304512 Nov 19 13:06 ibdata1
-rw-rw---- 1 mysql mysql 8589934592 Nov 19 13:06 ib_logfile0
-rw-rw---- 1 mysql mysql          0 Nov 12  2020 multi-master.info
drwx------ 2 mysql mysql       4096 Nov 12  2020 mysql
-rw-rw---- 1 mysql mysql      23122 Nov 19 13:06 mysql.err
srwxrwxrwx 1 mysql mysql          0 Nov 14 12:12 mysql.sock
-rw-r--r-- 1 root  root          15 Nov 12  2020 mysql_upgrade_info
drwx------ 2 mysql mysql       4096 Nov 12  2020 performance_schema
drwx------ 2 mysql mysql      16384 Nov 19 05:00 mysite
-rw-rw---- 1 mysql mysql 1028366336 Nov 19 04:55 slow_log
-rw-rw---- 1 mysql mysql          8 Nov 14 12:12 web1.pid

systemctl status mysql:

● mariadb.service - MariaDB 10.5.17 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: deactivating (final-sigterm) (Result: exit-code) since Sat 2022-11-19 05:00:05 GMT; 8h ago
     Docs: man:mariadbd(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 1695715 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 1966403 ExecStart=/usr/sbin/mariadbd $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
  Process: 1966249 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_ST>
  Process: 1966248 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 1966403 (code=exited, status=1/FAILURE)
   Status: "MariaDB server is down"
    Tasks: 8 (limit: 821928)
   Memory: 7.6G
   CGroup: /system.slice/mariadb.service
           ├─1950524 /usr/sbin/mariadbd
           └─1950718 /usr/sbin/mariadbd

Nov 19 13:05:51 web1.mysite.com systemd[1]: Starting MariaDB 10.5.17 database server...
Nov 19 13:05:51 web1.mysite.com systemd[1]: mariadb.service: Found left-over process 1950524 (mariadbd) in control group while starting unit. Ignoring.
Nov 19 13:05:51 web1.mysite.com systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Nov 19 13:05:51 web1.mysite.com systemd[1]: mariadb.service: Found left-over process 1950718 (mariadbd) in control group while starting unit. Ignoring.
Nov 19 13:05:51 web1.mysite.com systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Nov 19 13:05:51 web1.mysite.com mariadbd[1966403]: 2022-11-19 13:05:51 0 [Note] /usr/sbin/mariadbd (mysqld 10.5.17-MariaDB-log) starting as process 1966403 ...
Nov 19 13:05:51 web1.mysite.com mariadbd[1966403]: 2022-11-19 13:05:51 0 [Warning] Could not increase number of max_open_files to more than 32768 (request: 33323)
Nov 19 13:05:51 web1.mysite.com mariadbd[1966403]: /usr/sbin/mariadbd: One can only use the --user switch if running as root
Nov 19 13:06:44 web1.mysite.com systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 13:21:44 web1.mysite.com systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.

What's my best course of action? Thank you!

Codemonkey avatar
ml flag
I just killed the process IDs (there were two, oddly) and then performed `systemctl start mysql` and it instantly seems fine. Was this an ok thing to do? What checks/fixes should I now do?
ua flag
In some situations, `KILL processid` can be scarily slow. This seems to be because it is undoing whatever transaction it was in the middle of. Your case may not have involved that. Anyway, InnoDB claims to (and seems to be very good at) recovering from _any_ type of abort. Kill/abort/crash/etc -- all are recoverable.
Score:0
ua flag

A note about "making space" in Unix-based servers.

  1. A program opens a file (such as the slowlog)
  2. You rm the file.
  3. It vanishes from any ls.
  4. But the disk is not released until that program terminates (gracefully or not).

Since you can "see" the slowlog, I don't think you are stuck there. However, you should deal with that later.

A note about /tmp.

That is often allocated on a separate "disk partition". ALTERing a big table may fill up /tmp and claim, cryptically, "table is full".

But that does not seem to be the problem.

Look around at other files.

  • Clearing caches for other products
  • Removing other stuff
Codemonkey avatar
ml flag
I know why I ran out of space, I had a batch job making new thumbnails from photos, which are stored on another server. The script pulled a photo down, resized and saved the thumbnail, then went onto the next one. But I forgot the crucial "delete the photo" step. 100,000 photos in, I ran out of space.
ua flag
@Codemonkey - Were the big photos in a table (as opposed to separate files)? Alas, you can't get in to `DELETE`. I wonder if this is a use case arguing for "keep big images in separate files, not a db table".
Codemonkey avatar
ml flag
No, they're kept as separate files on another server, but I bring them over to this server to reprocess them. I have no qualms/concerns/queries about running out of space. I fully understand what went wrong, and had no problems with fixing the issue. It was how to recover the side-effect of a crashed MariaDB that I was asking about.
ua flag
In 22 years of MySQL crashes, your case seems to be the worst. Usually, reboot + `REPAIR` was the most that was needed to get the database going again. Yours seems to be "more stuck". File a bug at jira.mariadb.org
Codemonkey avatar
ml flag
I don't know if you saw the comment I put at the end of my question. I managed to trivially fix the database by manually killing the process and then starting the DB. I'm guessing a reboot would have had a similar effect. I just wasn't sure if there's anything I should be doing to check/repair anything, or if mariaDB will have done that silently under the hood, and "if the site seems fine, there's probably no issue".
ua flag
@Codemonkey - Thanks for pointing me there. This forum is quite sophisticated, but that is one thing it doesn't do. (I'll add a response there.)
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.