Understanding Memory Usage by PyTorch DataLoader Workers

When running a PyTorch training program with num_workers=32 for the DataLoader, htop shows 33 Python processes, each with 32 GB of VIRT and 15 GB of RES.

Does this mean that the PyTorch training is using 33 processes × 15 GB = 495 GB of memory? htop shows only about 50 GB of RAM and 20 GB of swap in use on the entire machine, which has 128 GB of RAM. So how do we explain the discrepancy?

Is there a more accurate way of calculating the total amount of RAM being used by the main PyTorch program and all its child DataLoader worker processes?

Thank you

11 months ago · Santiago Trujillo
2 answers

tracemalloc is a module in Python's standard library that traces memory blocks allocated by Python: https://docs.python.org/3/library/tracemalloc.html It can report:

  • Tracebacks for allocations
  • Statistics on memory use per filename
  • The diff between two snapshots
import tracemalloc

tracemalloc.start()
do_something_that_consumes_ram_and_releases_some()
# Show how much RAM the code above allocated, and its peak usage, in bytes
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current:,} B, peak: {peak:,} B")
tracemalloc.stop()

https://discuss.pytorch.org/t/measuring-peak-memory-usage-tracemalloc-for-pytorch/34067
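One caveat: tracemalloc only tracks allocations made through Python's own allocator, so memory allocated inside PyTorch's C++ core may not be counted. The snapshot comparison mentioned above can be sketched as follows (the list comprehension is just a stand-in for your real workload):

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in workload: allocate roughly a million Python ints
data = [i for i in range(1_000_000)]

after = tracemalloc.take_snapshot()

# Top 3 source lines ranked by newly allocated memory
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

Comparing snapshots tells you which lines allocated the memory, not just how much was allocated overall.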

11 months ago · Santiago Trujillo


Does this mean that the PyTorch training is using 33 processes × 15 GB = 495 GB of memory?

Not necessarily. On Linux the DataLoader workers are forked from the main process, so a large part of what htop reports as RES is memory shared between the parent and its children (the copy-on-write image of the parent, shared libraries, and so on). Summing RES counts those shared pages once per process, which is why 33 × 15 GB is far more than the machine actually uses. Each worker typically loads one batch, so the next batch can already be loaded and waiting by the time the main process asks for it; that prefetching is where the speed-up comes from.

That said, I suspect you should use far fewer workers than num_workers=32.

It would also be interesting to know your batch size, since you can tune that for the training process as well.

Is there a more accurate way of calculating the total amount of RAM being used by the main PyTorch program and all its child DataLoader worker processes?

I googled but could not find a concrete formula. In practice it is a rough estimate based on how many CPU cores your machine has, how much memory is available, and your batch size.
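As a sketch of one way to measure it, the third-party psutil package (an assumption on my part, not something the question mentioned) can sum the unique set size (USS) of the main process and all of its DataLoader children; unlike summing RES, USS does not double-count pages shared after fork():

```python
import os

import psutil  # third-party: pip install psutil


def total_uss_bytes(pid=None):
    """Sum USS (pages unique to each process) over a process and its children."""
    main = psutil.Process(pid if pid is not None else os.getpid())
    total = 0
    for proc in [main] + main.children(recursive=True):
        try:
            # memory_full_info() reads /proc/<pid>/smaps on Linux;
            # .uss counts only pages not shared with any other process
            total += proc.memory_full_info().uss
        except psutil.NoSuchProcess:
            pass  # a worker may have exited while we were iterating
    return total


print(f"total USS: {total_uss_bytes() / 1024 ** 2:.1f} MiB")
```

On Linux you can also read the Pss field from /proc/&lt;pid&gt;/smaps_rollup, which splits each shared page proportionally among the processes sharing it, and sum that across the process tree instead.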

The right num_workers depends on the machine you are running on, the dataset you are using, and how much on-the-fly pre-processing your data requires.

HTH

11 months ago · Santiago Trujillo