I'm trying to run PyTorch code that uses a DataLoader. The DataLoader can be configured to load data with several worker processes (which speeds up data loading a lot) by setting the num_workers argument to a positive number (https://pytorch.org/docs/stable/data.html#multi-process-data-loading).
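Roughly, the data-loading part of my code looks like this (the dataset here is just a stand-in for my real one, to show how I set num_workers):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset only for illustration; my real Dataset is different.
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# num_workers > 0 tells the DataLoader to spawn that many worker processes.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

for inputs, targets in loader:
    pass  # training step goes here
```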
When I adapt my code to not use the GPU, setting num_workers to a value well above 0 gives the desired result: several CPU cores are used, each running one worker process. But when I adapt the code to use the GPU, any num_workers value greater than 0 causes only 1 CPU core to be used, tied to the main process, which sits at 100% kernel utilization while the program makes no progress.

Regarding the Slurm script, I configure it with 1 node, 1 task, and x CPUs per task, x being the value I set for num_workers (following the instructions on this page - https://researchcomputing.princeton.edu/support/knowledge-base/pytorch - section "Data Loading using Multiple CPU-cores").
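For reference, this is roughly how I keep num_workers in sync with the Slurm allocation (SLURM_CPUS_PER_TASK is only set by Slurm when --cpus-per-task is given, so I fall back to 0 otherwise):

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; the real one is my own Dataset class.
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# Derive num_workers from the cpus-per-task value Slurm exposes,
# so the two values always match.
num_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", "0"))

loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    num_workers=num_workers)
```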
I've done many tests, but I can't solve this. The section "Single-process data loading (default)" (https://pytorch.org/docs/stable/data.html#single-process-data-loading-default), which immediately precedes the section I linked above on the PyTorch site, mentions that the resources used to share data between processes (e.g. shared memory, file descriptors) may be limited, which would prevent num_workers > 0 from working properly. Could that be what is happening here? If so, is there any configuration I should add to the Slurm script or to my code? It seems unlikely to me, though, since when I adapt my code to not use the GPU I have no problems at all with num_workers > 0.
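If shared memory or file-descriptor limits really are the cause, would something along these lines be the right kind of fix, or does it have to be handled on the Slurm side? This is only a sketch of the workaround the multiprocessing docs describe (switching the worker IPC sharing strategy); I haven't confirmed it applies to my case:

```python
import torch.multiprocessing as mp

# Switch from the Linux default ("file_descriptor") to "file_system",
# which the PyTorch docs suggest when file-descriptor limits are hit.
mp.set_sharing_strategy("file_system")
```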
Thanks in advance for your attention!