GPU Performance Benchmarking for Deep Learning
Introduction
Deep learning requires significant computational power to train models on large datasets. GPUs dominate this workload because their thousands of cores execute the matrix operations at the heart of training in parallel. In this tutorial, we compare the performance of three popular GPUs, the NVIDIA Tesla P40, Tesla P100, and GeForce RTX 3090, on a common training task.
Benchmarking Tools
There are several ways to benchmark GPU performance for deep learning. The most direct is to time a real training job in one of the two dominant frameworks, TensorFlow or PyTorch; both provide APIs for discovering GPUs and running models on them, as shown below.
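As a quick illustration of those APIs, here is a minimal sketch that lists the GPUs each framework can see (the exact output depends on your driver and hardware):

# PyTorch: enumerate the visible CUDA devices.
import torch
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# TensorFlow: list the physical GPU devices.
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))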
Setting Up the Environment
Before we start benchmarking the GPUs, we need to set up the environment with the necessary libraries and tools: a recent NVIDIA driver, a CUDA-enabled build of TensorFlow or PyTorch, and (for the PyTorch example used below) torchvision for the dataset.
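Before timing anything, it is worth confirming that the installed build was compiled with CUDA support and can actually see a GPU. A minimal check, shown here for PyTorch:

# Sanity-check the installation: a CPU-only build would report
# "CUDA build: None" and "GPU available: False".
import torch
print("PyTorch version:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())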
Running the Benchmark
To run the benchmark, we will use a common deep learning task: training a neural network on the MNIST dataset. We measure the wall-clock time to train for a fixed number of epochs on each GPU, keeping the model, batch size, and epoch count identical so the timings are directly comparable.
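Below is one possible harness, sketched in PyTorch (torchvision supplies MNIST); the model, batch size, and epoch count are arbitrary choices and should simply be kept identical across cards. The torch.cuda.synchronize calls matter: CUDA kernel launches are asynchronous, so without them the timer could stop before the GPU has finished its queued work.

import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def run_benchmark(device_index=0, epochs=3, batch_size=128):
    # Train a small MNIST classifier and report wall-clock time and
    # throughput. Use identical settings on every GPU you test.
    device = torch.device(f"cuda:{device_index}")
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True,
                        num_workers=2, pin_memory=True)
    model = nn.Sequential(nn.Flatten(),
                          nn.Linear(28 * 28, 512), nn.ReLU(),
                          nn.Linear(512, 10)).to(device)
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()

    torch.cuda.synchronize(device)   # start timing from an idle GPU
    start = time.perf_counter()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
    torch.cuda.synchronize(device)   # wait for all queued GPU work
    elapsed = time.perf_counter() - start
    n_images = epochs * len(train_set)
    print(f"{torch.cuda.get_device_name(device)}: "
          f"{elapsed:.1f} s, {n_images / elapsed:.0f} images/s")
    return elapsed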
P40 Benchmark
With the run_benchmark harness sketched above, benchmarking the P40 is a single call. The snippet below is one way to do it, assuming the P40 is CUDA device 0 on its host; verify the mapping with nvidia-smi first.
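# Assumes the P40 is CUDA device 0 here; check the index against
# `nvidia-smi` or torch.cuda.get_device_name(0) before trusting it.
p40_seconds = run_benchmark(device_index=0)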
P100 Benchmark
The P100 runs through the identical harness; only the device selection changes. The sketch below assumes the P100 is CUDA device 1 on a shared host (on a dedicated machine it would typically be device 0).
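# Assumes the P100 is CUDA device 1 on this machine; on a host with
# only the P100 installed, use device_index=0 instead.
p100_seconds = run_benchmark(device_index=1)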
RTX 3090 Benchmark
Finally, the same call benchmarks the RTX 3090. Again, the device index below is an assumption; adjust it to match your nvidia-smi output.
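# Assumes the RTX 3090 is CUDA device 2 here; adjust to your system.
rtx3090_seconds = run_benchmark(device_index=2)

Note that a small MNIST model may underutilize a card as fast as the 3090, so consider a larger model or batch size if the three cards come out closer than expected.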
Analysis
After running the benchmarks, compare the GPUs on training time or, equivalently, throughput. Raw speed is not the whole story: purchase price and power consumption matter too, since a slower card can still come out ahead on throughput per dollar or per watt.
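To fold price into the comparison, one simple metric is measured throughput divided by purchase price. A minimal sketch, where the throughput figures are placeholders to replace with your own measurements and the prices are example street prices:

# Placeholder numbers for illustration only: substitute your own
# measured throughput (images/s) and the prices you actually paid.
results = {
    "P40":      (3450.0, 162.0),
    "P100":     (4300.0, 149.0),
    "RTX 3090": (9800.0, 820.0),
}
for gpu, (throughput, price) in results.items():
    print(f"{gpu}: {throughput / price:.1f} images/s per dollar")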
Conclusion
In this tutorial, we have compared the performance of three GPUs, the P40, P100, and RTX 3090, for deep learning tasks. By benchmarking them under identical conditions, you can make an informed decision about which GPU to use for your deep learning projects.
Comments
TDD, great effort. The great P40 vs. P100 question has been heating up for a while now and is only growing. Good on you for getting good information out to the community.
Some observations:
Is the "Scaled Throughput per $ By GPU", chart accurate? Is it actually measuring 'Scaled CPU Throughput / $'? I could be misinterpreting the chart… forgive me if that's the case. Dividing the 'Scaled Throughput by GPU' number in the top chart (@17:23) by the prices you gave ($161.99, $149, $819.95):
P40 0.034508 / 0.03315
P100 0.043020 / 0.04101
RTX 3090 0.020684 / 0.01855
(Making the P100 about double the value of the RTX 3090 vis-à-vis scaled CPU throughput.)
Value is the meat and potatoes of these benchmarks for a lot of us, so I figured I'd ask for clarification in case others are as confused as I am.
A very minor observation, and again I might be misinterpreting: the graphic representation of the 'Broad Performance' chart (@17:23 for reference) doesn't seem to match the labelled scale on the left (or the relative proportions within a single bar either…). For instance, the blue 30.42 for the dual RTX 3090s looks way more than double the turquoise 16.96 for the single card, and also nowhere near the 47-ish labelled on the left of the chart. I'm no data scientist, so maybe exaggerating proportions is common practice to highlight differences?
You have the best videos around about the relative price:performance of these cards so keep 'em coming.
Confused and very new to this, but interested in a GPU build. Reading about NVLink on NVIDIA's site, it says it was first introduced with the P100, but the spec sheet you showed only lists "Yes" for the 3090. Does the model of the P100 you tested lack it? If so, it seems a P100 model with NVLink could still be a good deal. Great content. Thanks!
Again this is great content! Commenting for the Algo.
Very informative, but this guy talks like Elon; I have to watch at 1.25-1.5x speed.
I'm considering a second P40. Do you know if ComfyUI supports a dual-GPU setup?
What about Llama, Mistral, and Gemma training, fine-tuning, and inference? Maybe Karpathy's llm.c?
Really great video! I am considering working with ML and developing myself further in the field of AI. But the price of the P40 seems very strange to me; I feel like I haven't found a P40 anywhere on the internet that costs less than $200. Maybe it's because I live somewhere else. But more than $200 for a single P40 is a bit much for me. I'm still in school, so the price is a deciding factor.
Sorry to be that guy, but has anyone got a TL;DR?
Wow! Easy to follow, no gibberish, and pure information with clear and readable statistics!
I'm coming from Power BI as well, and the new company I'm working with is a Google shop, so I'd definitely be interested in a Looker Studio comparison if you're game.
Thanks for this! We've been wondering since Craft Computing mentioned it recently. But…
30:00 I don't follow your math here. First, the 2x P40 bar says 10.74 but sits on the graph at over 15. I believe this bar should stop at 10.74, which would not only show its true value; the height of the blue bar would then visually show the performance added by the second GPU.
Second, I can't see where the Throughput per Dollar numbers come from. The 3090 has T = 17 and $ = 820, so it should appear here at T/$ ≈ 0.02, or $/T ≈ 48. Where did the 141 come from?
Third, if you're going to look at running costs, then electricity is very important. In some locations, electricity costs can exceed server costs in well under a year. It's the reason this hardware is so cheap: companies can't afford to keep it running.
Wow… randomly stumbled upon the video. Thanks, super useful! I wish someone would put together a nice dataset like this, but with multi-GPU configs, NVLink vs. PCIe, and so on.
We can't hear you… 🙁
If possible, please upload an NVLink and PCIe extender video. It would be really helpful for understanding them.
For some reason, 3090s are going for $500 over here, and there are a lot of them on the market. Crypto crash or something? Anyway, that seems to change the calculations a lot in my case.
Thank you for your work. Do you have any plans to test the Tesla V100 16 GB? They go for half the price of the 3090 and support NVLink.
Oh wow, I'm in the market for a 3090 for LLMs, but I've been eyeing the P40s because of their price. I saw your videos on the R720s, and now I'm wondering if I can put a P40 in my R710.
I'm happy whenever you upload
Does PCIe 3.0 vs. PCIe 4.0 make a difference in this kind of setup?
Brilliant bud.