Understanding Memory Usage by PyTorch DataLoader Workers

When running a PyTorch training program with num_workers=32 for DataLoader, htop shows 33 python processes, each with 32 GB of VIRT and 15 GB of RES.

Does this mean that the PyTorch training is using 33 processes X 15 GB = 495 GB of memory? htop shows that only about 50 GB of RAM and 20 GB of swap are being used on the entire machine, which has 128 GB of RAM. So, how do we explain the discrepancy?

Is there a more accurate way of calculating the total amount of RAM being used by the main PyTorch program and all its child DataLoader worker processes?

Thank you

over 3 years ago · Santiago Trujillo

2 answers

Answer 1 · 2022-03-10T19:51:37.528Z

Python's standard library has a module called tracemalloc, which traces memory blocks allocated by Python: https://docs.python.org/3/library/tracemalloc.html

It provides:

Tracebacks showing where an object was allocated
Statistics on allocated memory per filename
Differences between two snapshots

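import tracemalloc

def do_something_that_consumes_ram_and_releases_some():
    # Placeholder workload: allocate a large list, then let it go out of scope.
    data = [0] * 1_000_000
    return len(data)

tracemalloc.start()
do_something_that_consumes_ram_and_releases_some()
# Show how much memory the code above still holds and its peak usage (both in bytes).
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")
tracemalloc.stop()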
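
The snapshot comparison mentioned above can be sketched as follows. This is a minimal example with a placeholder workload (the `retained` list is an assumption for illustration, not part of the question). Note that tracemalloc only sees allocations made through Python's allocators in the process where it was started, so it does not cover DataLoader worker processes, and memory allocated by native code such as tensor storage may not appear.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Placeholder workload (assumption for illustration): keep ~10 MB alive.
retained = [bytes(1000) for _ in range(10_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line; the largest changes come first.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```

Each printed entry shows a file and line number together with the change in allocated size and block count between the two snapshots.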

https://discuss.pytorch.org/t/measuring-peak-memory-usage-tracemalloc-for-pytorch/34067

Answer 2 · 2022-03-10T19:51:37.808Z

Does this mean that the PyTorch training is using 33 processes X 15 GB = 495 GB of memory?

Not necessarily. You have one main process plus several worker subprocesses, and the CPU has several cores. Each worker usually loads one batch at a time, so the next batch can already be loaded and ready by the time the main process needs it. This is where the speedup comes from.

I guess you should use far fewer num_workers.

It would also be useful to know your batch size, which you can tune for the training process as well.
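
To see how num_workers and batch size interact, a back-of-the-envelope estimate of the memory held by prefetched batches may help. DataLoader has a prefetch_factor parameter (default 2 when num_workers > 0), so up to num_workers * prefetch_factor batches can be waiting in memory at once. The sketch below assumes a hypothetical batch size of 256 and float32 samples of shape 3 x 224 x 224; neither figure comes from the question, and the estimate ignores any copy of the dataset itself held by each worker.

```python
def prefetched_batch_bytes(num_workers, prefetch_factor, batch_size, sample_bytes):
    """Rough estimate of bytes held by batches prefetched across all workers."""
    batches_in_flight = num_workers * prefetch_factor
    return batches_in_flight * batch_size * sample_bytes


# Hypothetical numbers for illustration only.
sample_bytes = 3 * 224 * 224 * 4  # one float32 sample of shape 3 x 224 x 224
estimate = prefetched_batch_bytes(
    num_workers=32, prefetch_factor=2, batch_size=256, sample_bytes=sample_bytes
)
print(f"{estimate / 2**30:.2f} GiB")  # prints "9.19 GiB"
```

Under these assumptions, halving num_workers halves the estimate, which is one reason to try a smaller value first.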

Is there a more accurate way of calculating the total amount of RAM being used by the main PyTorch program and all its child DataLoader worker processes?

I searched but could not find a concrete formula. I think it comes down to a rough estimate based on how many cores your CPU has, how much memory you have, and your batch size.
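
On the measurement question, one approach that may give a more realistic total is to sum PSS (proportional set size) instead of RES. RES counts every page shared between the main process and its workers once per process, so adding it up across 33 processes overcounts, whereas PSS divides each shared page among the processes that share it. The following is a minimal sketch, assuming Linux with /proc/<pid>/smaps_rollup available (kernel 4.14 or later) and permission to read it; the helper names are hypothetical and not part of PyTorch.

```python
from pathlib import Path


def parse_pss_kib(smaps_rollup_text):
    """Return the Pss value (in KiB) from the text of /proc/<pid>/smaps_rollup."""
    for line in smaps_rollup_text.splitlines():
        if line.startswith("Pss:"):
            return int(line.split()[1])
    raise ValueError("no Pss line found")


def parent_pid(stat_text):
    """Return the parent PID from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces or ')',
    # so parse only what follows the last ')': state, ppid, ...
    return int(stat_text.rsplit(")", 1)[1].split()[1])


def descendants(root_pid):
    """Return root_pid plus all of its descendant PIDs (assumes Linux /proc)."""
    children = {}
    for entry in Path("/proc").iterdir():
        if entry.name.isdigit():
            try:
                ppid = parent_pid((entry / "stat").read_text())
            except (OSError, IndexError, ValueError):
                continue  # process exited while scanning
            children.setdefault(ppid, []).append(int(entry.name))
    found, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        found.append(pid)
        stack.extend(children.get(pid, []))
    return found


def total_pss_gib(root_pid):
    """Sum PSS over root_pid and all its descendants, in GiB."""
    total_kib = 0
    for pid in descendants(root_pid):
        try:
            text = Path(f"/proc/{pid}/smaps_rollup").read_text()
            total_kib += parse_pss_kib(text)
        except (OSError, ValueError):
            continue  # process gone or not readable
    return total_kib / 1024**2
```

Calling total_pss_gib with the PID of the main training process (for example the one shown in htop, or os.getpid() from inside the script) should return one figure covering the main process and all DataLoader workers. Pages that have been swapped out are not part of Pss; smaps_rollup reports them separately in its SwapPss field.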

The right num_workers depends on what kind of machine you are using, what kind of dataset you have, and how much on-the-fly preprocessing your data requires.

HTH
