Configuring MySQL for SLURM

Ray

I'm having problems getting SLURM (for job scheduling) to work with a MySQL database. I was using this as a reference, but perhaps I misunderstood something in it. If someone can let me know what I've missed, that would be great...

This is SLURM 21.08 on Ubuntu 22.10. I'm using MySQL 8.0.32.

I previously had SLURM configured with job completion and accounting data stored in a file, and it seemed to work fine: the controller was up and I ran one or two jobs without problems.

Then, I switched to MySQL. My /etc/slurm/slurm.conf had these values updated:

 Job Completion Logging | MySQL
      JobCompLoc | slurm_complete_db
      JobCompHost | localhost
      JobCompPort | <blank>
      JobCompUser | slurm
      JobCompPass | ...some password...
 Job Accounting Storage | SlurmDBD
      AccountingStorageLoc | slurm_acct_db
      AccountingStorageHost | localhost
      AccountingStoragePort | <blank>
      AccountingStorageUser | slurm
      AccountingStoragePass | ...
      AccountingStoreFlags | job_script,job_env

And in /etc/slurm/slurmdbd.conf:

 AuthType=auth/munge
 DbdHost=xps8930
 DebugLevel=info
 StorageHost=xps8930
 StorageLoc=slurm_acct_db
 StoragePass=...
 StorageType=accounting_storage/mysql
 StorageUser=slurm
 LogFile=/var/log/slurm/slurmdbd.log
 PidFile=/run/slurmdbd.pid
 SlurmUser=slurm

I've created two MySQL databases and a user called "slurm", and granted privileges as follows:

CREATE DATABASE slurm_complete_db DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
CREATE DATABASE slurm_acct_db DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
CREATE USER 'slurm'@'%' IDENTIFIED WITH caching_sha2_password BY '' ;
GRANT ALL ON slurm_complete_db.* TO 'slurm'@'%';
GRANT ALL ON slurm_acct_db.* TO 'slurm'@'%';

I confirmed using the "show engines" command that InnoDB support is enabled.

Since the databases are empty, I believe my next step ought to be configuring the database. In slurm.conf, I called my ClusterName "personal". So, I did this:

$ sacctmgr add cluster personal
sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacctmgr: error: Sending PersistInit msg: Connection refused

slurmd and slurmdbd are running (SLURM and MySQL are on the same computer):

$ ps -aef | grep slurm
root        1407       1  0 09:42 ?        00:00:08 /usr/sbin/slurmd -D -s
root        1857       1  0 09:43 ?        00:00:03 /usr/sbin/slurmdbd -D -s

In /var/log/slurm/slurmdbd.log, I see this:

[2023-01-26T18:06:02.541] error: mysql_real_connect failed: 2003 Can't connect to MySQL server on 'xps8930:3306' (111)
[2023-01-26T18:06:02.541] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.

In /var/log/slurm/slurmctld.log, I have this:

[2023-01-26T09:42:33.264] error: Configured MailProg is invalid
[2023-01-26T09:42:33.350] slurmctld version 21.08.5 started on cluster personal
[2023-01-26T09:42:36.121] error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
[2023-01-26T09:42:36.121] error: Sending PersistInit msg: Connection refused
[2023-01-26T09:42:36.153] accounting_storage/slurmdbd:  clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
[2023-01-26T09:42:36.153] error: Sending PersistInit msg: Connection refused
[2023-01-26T09:42:36.154] error: Sending PersistInit msg: Connection refused
[2023-01-26T09:42:37.456] No memory enforcing mechanism configured.
[2023-01-26T09:42:39.924] error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
[2023-01-26T09:42:39.924] fatal: You haven't inited this storage yet.

I'm not sure what I should do next or what steps I'm missing. I guess between slurmdbd and slurmctld, I should focus on slurmdbd first? Once it is working, then either slurmctld should come up and/or I can try to get it working.

Sorry for the long post! Any advice would be appreciated!

PS: The command munge -n | unmunge was successful.

Jos
Apparently, MySQL was up when you created the databases, but the `/var/log/slurm/slurmdbd.log` says it's down. What does `systemctl status mysql` say?
Ray
@Jos Thanks for the suggestion! `systemctl status mysql` says it is active, and I can confirm that I can log in on the command line with `mysql -u root -p`. It seems the problem is with `slurmdbd`, as if I missed something...
Jos
If Slurm keeps saying MySQL is down, it is either trying to reach the wrong server (xps8930) or the wrong port (3306).
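One way to test this from the SLURM host is a plain TCP probe of port 3306 under each host name. The following is only a minimal sketch: `can_connect` is a hypothetical helper, not part of SLURM or MySQL, and it checks only that something accepts connections on the port, not that the `slurm` user can log in.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (errno 111 on Linux), timeouts,
        # and host names that do not resolve.
        return False


if __name__ == "__main__":
    # Hosts to probe come from the command line; 3306 is MySQL's default port.
    hosts = sys.argv[1:] or ["127.0.0.1"]
    for host in hosts:
        state = "reachable" if can_connect(host, 3306) else "NOT reachable"
        print(f"{host}:3306 is {state}")
```

Saved under a name of your choice (say `probe.py`) and run as `python3 probe.py localhost xps8930`, it shows whether the two names behave differently. If `localhost` is reachable and `xps8930` is not, that would be consistent with the error 111 in `slurmdbd.log`.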
j4nd3r53n
One thing worth checking is which interface mysqld binds to. I think the default is 127.0.0.1, and I see you are trying to connect to xps8930, which I guess is the address of the NIC? If so, that's why it can't find your MySQL server.
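To see what the server is configured to bind to, you can look for `bind-address` in the `[mysqld]` section of the MySQL configuration. The sketch below is a hypothetical helper that scans configuration text for that option. It assumes a simple `key = value` layout without trailing inline comments, and the sample text is made up rather than taken from this thread. On Ubuntu the relevant file is usually `/etc/mysql/mysql.conf.d/mysqld.cnf`, but treat that path as an assumption and check your own install.

```python
import re


def mysqld_bind_addresses(cnf_text: str) -> list[str]:
    """Return the bind-address values set in the [mysqld] section of cnf_text."""
    addresses = []
    section = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and !include / !includedir directives.
        if not line or line.startswith(("#", ";", "!")):
            continue
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            section = header.group(1).strip().lower()
            continue
        if section != "mysqld" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # MySQL treats dashes and underscores in option names as equivalent.
        if key.strip().replace("_", "-").lower() == "bind-address":
            addresses.append(value.strip())
    return addresses


if __name__ == "__main__":
    # Hypothetical excerpt, not taken from the thread.
    sample = """
    [mysqld]
    bind-address = 127.0.0.1
    """
    print(mysqld_bind_addresses(sample))
```

If the only value found is 127.0.0.1, a connection to the address that xps8930 resolves to would be refused, which would match the log. In that case either `StorageHost` in `slurmdbd.conf` or the MySQL bind address would need to change.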