I want to install Slurm to properly manage my DIY HPC cluster, which I want to use for parallel simulations. The cluster has 3 nodes (1 master and 2 compute nodes).
All nodes run Ubuntu Server 20.04.
I followed the instructions from nekodaemon.com (I can't access the website right now), in the chapter "Slurm Quick installation for cluster on Ubuntu 20.04", but I removed the last line that they say to add on the compute node:
CgroupMountpoint=/sys/fs/cgroup
because it caused slurmd to fail on startup with this error:
Process: 46877 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
May 02 10:15:54 ben1 systemd[1]: Starting Slurm node daemon...
May 02 10:15:54 ben1 slurmd[46877]: error: _parse_next_key: Parsing error at unrecognized key: CgroupMountpoint
May 02 10:15:54 ben1 slurmd[46877]: error: Parse error in file /etc/slurm-llnl/slurm.conf line 149: "CgroupMountpoint=/sys/fs/cgroup"
May 02 10:15:54 ben1 slurmd[46877]: fatal: Unable to process configuration file
May 02 10:15:54 ben1 systemd[1]: slurmd.service: Control process exited, code=exited, status=1/FAILURE
May 02 10:15:54 ben1 systemd[1]: slurmd.service: Failed with result 'exit-code'.
May 02 10:15:54 ben1 systemd[1]: Failed to start Slurm node daemon.
After removing that line, I was able to start munge and Slurm on the master node. On the compute node, however, slurmd still does not start.
I run:
sudo systemctl start slurmd
I get:
Job for slurmd.service failed because the control process exited with error code.
See "systemctl status slurmd.service" and "journalctl -xe" for details.
Then I run journalctl -xe
and I get:
The job identifier is 22481 and the job result is failed.
May 02 10:48:48 ben1 sudo[47959]: pam_unix(sudo:session): session closed for user root
May 02 10:49:04 ben1 multipath[47985]: sdc: can't store path info
May 02 10:49:04 ben1 multipathd[771]: sdc: spurious uevent, path not found
May 02 10:49:04 ben1 multipathd[771]: uevent trigger error
May 02 10:49:05 ben1 multipath[47992]: sdc: can't store path info
May 02 10:49:06 ben1 multipathd[771]: sdc: spurious uevent, path not found
May 02 10:49:06 ben1 multipathd[771]: uevent trigger error
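The journalctl -xe output above is dominated by multipath messages, so the slurmd failure itself is not visible in it. Below is a minimal sketch of how the lines from a single program could be picked out of syslog-style output like this. The function name filter_by_program is hypothetical (it is not part of Slurm or systemd), and it assumes every relevant line carries the usual " program[pid]:" tag.

```shell
#!/bin/sh
# filter_by_program: hypothetical helper, not part of Slurm or systemd.
# Reads syslog-style lines on stdin and keeps only those whose tag is
# "<program>[<pid>]", e.g. "slurmd[46877]".
filter_by_program() {
    prog="$1"
    # -F treats the pattern as a literal string, so "[" is not a regex.
    # The leading space keeps "multipath[" from matching inside another name.
    grep -F " ${prog}["
}

# Example using two lines copied from the logs in this post.
printf '%s\n' \
    'May 02 10:49:04 ben1 multipathd[771]: uevent trigger error' \
    'May 02 10:15:54 ben1 slurmd[46877]: fatal: Unable to process configuration file' \
    | filter_by_program slurmd
```

On the compute node this could be used as "journalctl -xe | filter_by_program slurmd". Untagged explanatory lines such as "The job identifier is ..." would be dropped by this filter. Asking journald for the unit directly with "journalctl -u slurmd.service" should give the same information without any filtering.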