Slurm jobs undesirably get access to all threads

I have one Ryzen 9 5950X CPU (16 cores/32 threads), one Xeon Phi 7120P card, and the corresponding nodes/partitions defined in slurm.conf as:

NodeName=mic0 RealMemory=15000 Sockets=1 CoresPerSocket=61 ThreadsPerCore=4 State=UNKNOWN
PartitionName=compute Nodes=mic0 Default=YES MaxTime=INFINITE State=UP TRESBillingWeights="CPU=1.0,Mem=4.0G"

NodeName=amd RealMemory=10000 Sockets=1 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN 
PartitionName=fast Nodes=amd Default=No MaxTime=INFINITE State=UP TRESBillingWeights="CPU=4.0,Mem=4.0G"
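
(As a sanity check on these definitions, running slurmd -C on each node prints the topology Slurm actually detects, which can be compared against the configured values; the sample output below is illustrative, not captured from this machine.)

slurmd -C
# should print something along the lines of:
# NodeName=amd CPUs=32 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=...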

I want to run one task per core (or per thread) of the Ryzen CPU, but each task in my jobs gets access to all CPU threads. For example, after a job allocation with salloc -p fast -n 8 --threads-per-core=1 --mem=256mb, the command srun -l --cpu_bind=threads cat /proc/self/status | grep Cpus_allowed_list | sort -n displays:

0: Cpus_allowed_list:   0-31
1: Cpus_allowed_list:   0-31
2: Cpus_allowed_list:   0-31
3: Cpus_allowed_list:   0-31
4: Cpus_allowed_list:   0-31
5: Cpus_allowed_list:   0-31
6: Cpus_allowed_list:   0-31
7: Cpus_allowed_list:   0-31
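
(srun can also be asked to report the binding it applies; the verbose option of --cpu_bind is standard, though the exact message format shown here is illustrative:)

srun -l --cpu_bind=verbose,threads true
# given the behaviour above, each task would report an all-threads mask, roughly:
# cpu-bind=MASK - amd, task 0 0 [12345]: mask 0xffffffff set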

I want each task to use only one thread (or possibly one whole core). The same problem occurs with salloc -p fast -n 8 --ntasks-per-core=1 --mem=256mb.
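
With working binding I would instead expect each task to be pinned to its own hardware thread, i.e. output roughly like this (the exact CPU IDs depend on how the kernel numbers the hardware threads):

0: Cpus_allowed_list:   0
1: Cpus_allowed_list:   1
...
7: Cpus_allowed_list:   7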

In contrast to the Ryzen, everything works just fine with the Xeon Phi.

How can I fix this? Is there a mistake in slurm.conf or in the job allocation commands?

The Slurm version is 21.08.8-2. The OS is CentOS 7.

The complete slurm.conf (it is a very small "cluster", just a workstation):

ClusterName=cluster
SlurmctldHost=amd
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=1000 # don't use the qos factor
PriorityWeightTRES=CPU=1000,Mem=4000
PriorityFavorSmall=YES
AccountingStorageEnforce=associations,limits
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreFlags=job_comment
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
NodeName=mic0 RealMemory=15000 Sockets=1 CoresPerSocket=61 ThreadsPerCore=4 State=UNKNOWN
PartitionName=compute Nodes=mic0 Default=YES MaxTime=INFINITE State=UP TRESBillingWeights="CPU=1.0,Mem=4.0G"
#
NodeName=amd RealMemory=10000 Sockets=1 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
PartitionName=fast Nodes=amd Default=No MaxTime=INFINITE State=UP TRESBillingWeights="CPU=4.0,Mem=4.0G"
doneal24 commented:

It looks like you don't have `cgroups` defined/enabled in your configuration. Setting the parameter `TaskPlugin=task/cgroup` and having `ConstrainCores=yes` in your cgroup.conf may do what you need.
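
A minimal sketch of that suggestion, assuming the parameter names documented for Slurm 21.08 (pairing task/cgroup with task/affinity is the commonly documented combination; CgroupAutomount is an assumption about this install):

# slurm.conf -- replace the current TaskPlugin line
TaskPlugin=task/cgroup,task/affinity

# /etc/slurm/cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes

ProctrackType=proctrack/cgroup is often configured together with this, though it is a separate setting from the core constraint itself.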

Igor Popov (OP) replied:

@doneal24 I cannot do that: I run CentOS in WSL1, where setting up cgroups is a problem (it is not implemented). I have to use WSL rather than a bare-metal OS installation, because CentOS 7 (and old Red Hat Linux) supports the old technology (Xeon Phi), and in particular the old Intel compilers, which are the only compilers that target Xeon Phi with auto-vectorization support.

@doneal24 Additionally, it is not possible to install CentOS 7 bare-metal on the Ryzen 5950X (it is unsupported; a known issue). However, my Slurm configuration without cgroups works perfectly on WSL CentOS: if I issue `salloc -n 8 --ntasks-per-core=2`, the above `srun` command nicely prints the used cores 1, 2, 5, 6, 9, 10 (the Xeon Phi has 4 threads per core). So a good task-to-thread assignment seems possible even without cgroups; the problem is only with the Ryzen.