I would like to install SLURM on Clear Linux because of its good benchmarks. I followed the tutorial at https://docs.01.org/clearlinux/latest/tutorials/hpc.html. At the step "Create slurm.conf configuration file" I noticed that the slurmctld service did not start. The error was related to the slurm.conf file. This was in the log:
jul 11 19:20:00 slurm-controller slurmctld[615]: error: Ignoring obsolete FastSchedule=1 option. Please remove from your configuration.
jul 11 19:20:00 slurm-controller slurmctld[615]: fatal: SallocDefaultCommand has been removed. Please consider setting LaunchParameters=use_interactive_step instead.
I deleted the FastSchedule and SallocDefaultCommand lines from the config file. After that I added these lines:
LaunchParameters=use_interactive_step
InteractiveStepOptions="srun -n1 -N1 --pty --preserve-env --mpi=pmix_v3 $SHELL"
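The check that slurmctld performed here can be approximated before starting the daemon. Below is a minimal sketch, assuming Python 3, that scans the text of a slurm.conf for the two options rejected in the log above. The function name `find_obsolete_options`, the option table, and the sample configuration (including the SallocDefaultCommand value) are hypothetical and only for illustration; the table is not a complete list of options removed from Slurm.

```python
import re

# Options that slurmctld 20.11.8 complained about in the log above, with the
# hint the daemon printed. Illustrative only, not a complete list.
OBSOLETE_OPTIONS = {
    "fastschedule": "obsolete; remove it",
    "sallocdefaultcommand": (
        "removed; consider LaunchParameters=use_interactive_step"
    ),
}


def find_obsolete_options(conf_text):
    """Return (line_number, option, hint) for each obsolete option found."""
    findings = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        # Drop everything after '#', which starts a comment in slurm.conf.
        line = raw.split("#", 1)[0].strip()
        match = re.match(r"([A-Za-z_]+)\s*=", line)
        # Option names are compared case-insensitively (an assumption here).
        if match and match.group(1).lower() in OBSOLETE_OPTIONS:
            name = match.group(1)
            findings.append((lineno, name, OBSOLETE_OPTIONS[name.lower()]))
    return findings


if __name__ == "__main__":
    # Hypothetical sample; the real file would be read from /etc/slurm/slurm.conf.
    sample = (
        "ClusterName=linux\n"
        "FastSchedule=1\n"
        'SallocDefaultCommand="srun --pty $SHELL"\n'
    )
    for lineno, name, hint in find_obsolete_options(sample):
        print(f"line {lineno}: {name} is {hint}")
```

This only flags the option names; it does not validate the replacement values.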
After correcting that, slurmctld still fails to start, this time because of an undefined symbol in a shared object.
This is the log:
[2021-07-11T19:35:14.260] slurmctld version 20.11.8 started on cluster linux
[2021-07-11T19:35:14.261] cred/munge: init: Munge credential signature plugin loaded
[2021-07-11T19:35:14.262] debug: auth/munge: init: Munge authentication plugin loaded
[2021-07-11T19:35:14.262] select/cons_res: common_init: select/cons_res loaded
[2021-07-11T19:35:14.263] select/linear: init: Linear node selection plugin loaded with argument 1
[2021-07-11T19:35:14.263] select/cons_tres: common_init: select/cons_tres loaded
[2021-07-11T19:35:14.263] preempt/none: init: preempt/none loaded
[2021-07-11T19:35:14.264] debug: acct_gather_energy/none: init: AcctGatherEnergy NONE plugin loaded
[2021-07-11T19:35:14.264] debug: acct_gather_Profile/none: init: AcctGatherProfile NONE plugin loaded
[2021-07-11T19:35:14.264] debug: acct_gather_interconnect/none: init: AcctGatherInterconnect NONE plugin loaded
[2021-07-11T19:35:14.264] debug: acct_gather_filesystem/none: init: AcctGatherFilesystem NONE plugin loaded
[2021-07-11T19:35:14.265] debug2: No acct_gather.conf file (/etc/slurm/acct_gather.conf)
[2021-07-11T19:35:14.265] debug: jobacct_gather/none: init: Job accounting gather NOT_INVOKED plugin loaded
[2021-07-11T19:35:14.265] error: plugin_load_from_file: dlopen(/usr/lib64/slurm/prep_script.so): /usr/lib64/slurm/prep_script.so: undefined symbol: run_script
[2021-07-11T19:35:14.265] error: Couldn't load specified plugin name for prep/script: Dlopen of plugin file failed
[2021-07-11T19:35:14.266] error: prep_plugin_init: cannot create prep context for prep/script
[2021-07-11T19:35:14.266] fatal: failed to initialize prep plugin
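To pull the relevant failure out of a longer log, the dlopen error can be matched with a regular expression. This is a minimal sketch, assuming Python 3 and assuming other dlopen failures have the same shape as the one line shown above; the function name `undefined_symbols` is hypothetical.

```python
import re

# Matches lines such as:
#   error: plugin_load_from_file: dlopen(<path>): <path>: undefined symbol: <name>
# The pattern is derived from the single log line above (an assumption).
DLOPEN_ERROR = re.compile(
    r"dlopen\((?P<plugin>[^)]+)\):.*undefined symbol: (?P<symbol>\S+)"
)


def undefined_symbols(log_lines):
    """Return (plugin_path, symbol) for each dlopen undefined-symbol error."""
    results = []
    for line in log_lines:
        match = DLOPEN_ERROR.search(line)
        if match:
            results.append((match.group("plugin"), match.group("symbol")))
    return results


if __name__ == "__main__":
    log = [
        "[2021-07-11T19:35:14.265] error: plugin_load_from_file: "
        "dlopen(/usr/lib64/slurm/prep_script.so): "
        "/usr/lib64/slurm/prep_script.so: undefined symbol: run_script",
        "[2021-07-11T19:35:14.266] fatal: failed to initialize prep plugin",
    ]
    for plugin, symbol in undefined_symbols(log):
        print(f"{plugin} needs missing symbol {symbol}")
```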
Since the slurm.conf file shipped with the Clear Linux bundle (package) is outdated, I thought that a better configuration file might make the error disappear. My hypothesis was that I needed to load another plugin that provides the run_script symbol. So I created a new configuration file using https://slurm.schedmd.com/configurator.easy.html, but I got the same error.
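One way to test that hypothesis is to list which installed binaries or shared objects actually define run_script. The following is a minimal sketch, assuming Python 3 and that the binutils `nm` tool is installed; the helper names `parse_nm` and `who_defines` are hypothetical, and the files to inspect are passed on the command line because I do not know which file is supposed to provide the symbol.

```python
import subprocess
import sys

# nm type letters that mark a symbol as undefined in the inspected file.
UNDEFINED_KINDS = {"U", "w", "v"}


def parse_nm(output):
    """Split `nm -D` output into (defined, undefined) sets of symbol names."""
    defined, undefined = set(), set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        kind = parts[-2]
        name = parts[-1].split("@", 1)[0]  # drop any symbol version suffix
        if kind in UNDEFINED_KINDS:
            undefined.add(name)
        else:
            defined.add(name)
    return defined, undefined


def who_defines(symbol, paths):
    """Return the subset of paths whose dynamic symbol table defines symbol."""
    providers = []
    for path in paths:
        # Requires binutils; a non-zero exit simply yields empty output here.
        result = subprocess.run(
            ["nm", "-D", path], capture_output=True, text=True, check=False
        )
        defined, _ = parse_nm(result.stdout)
        if symbol in defined:
            providers.append(path)
    return providers


if __name__ == "__main__":
    # Usage: python3 who_defines.py run_script FILE [FILE ...]
    symbol, candidates = sys.argv[1], sys.argv[2:]
    found = who_defines(symbol, candidates)
    print(f"{symbol} is defined in: {found if found else 'none of the given files'}")
```

If none of the files from the bundle define the symbol, that would point to a packaging or build mismatch rather than to slurm.conf, but this is only a diagnostic aid and not a fix.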
Do you think this is a bug in SLURM, something missing in the configuration, or an error in how the bundle (package) I installed was compiled? I have noticed similar issues with precompiled SLURM packages in other Linux distributions, although they involve other shared objects and other symbols.
If the problem is Clear Linux, what is the best Linux distribution for SLURM?
I would appreciate any help you may give me. Thank you very much in advance.
Best regards,
Braulio J. Solano-Rojas