GPU Performance Benchmarking for Deep Learning
Introduction
Deep learning requires significant computational power to train models on large datasets. GPUs dominate this workload because their thousands of cores execute the matrix operations at the heart of training in parallel. In this tutorial, we compare the performance of three popular GPUs, the NVIDIA Tesla P40, Tesla P100, and GeForce RTX 3090, on a common training task.
Benchmarking Tools
There are several ways to benchmark GPU performance for deep learning. The most direct is to time a real training job in one of the two dominant frameworks, TensorFlow or PyTorch; both provide APIs for discovering GPUs and running models on them, as shown below.
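As a quick illustration of those APIs, here is a minimal sketch that lists the GPUs each framework can see (the exact output depends on your driver and hardware):

# PyTorch: enumerate the visible CUDA devices.
import torch
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# TensorFlow: list the physical GPU devices.
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))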
Setting Up the Environment
Before we start benchmarking the GPUs, we need to set up the environment with the necessary libraries and tools: a recent NVIDIA driver, a CUDA-enabled build of TensorFlow or PyTorch, and (for the PyTorch example used below) torchvision for the dataset.
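Before timing anything, it is worth confirming that the installed build was compiled with CUDA support and can actually see a GPU. A minimal check, shown here for PyTorch:

# Sanity-check the installation: a CPU-only build would report
# "CUDA build: None" and "GPU available: False".
import torch
print("PyTorch version:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())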
Running the Benchmark
To run the benchmark, we will use a common deep learning task: training a neural network on the MNIST dataset. We measure the wall-clock time to train for a fixed number of epochs on each GPU, keeping the model, batch size, and epoch count identical so the timings are directly comparable.
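Below is one possible harness, sketched in PyTorch (torchvision supplies MNIST); the model, batch size, and epoch count are arbitrary choices and should simply be kept identical across cards. The torch.cuda.synchronize calls matter: CUDA kernel launches are asynchronous, so without them the timer could stop before the GPU has finished its queued work.

import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def run_benchmark(device_index=0, epochs=3, batch_size=128):
    # Train a small MNIST classifier and report wall-clock time and
    # throughput. Use identical settings on every GPU you test.
    device = torch.device(f"cuda:{device_index}")
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True,
                        num_workers=2, pin_memory=True)
    model = nn.Sequential(nn.Flatten(),
                          nn.Linear(28 * 28, 512), nn.ReLU(),
                          nn.Linear(512, 10)).to(device)
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()

    torch.cuda.synchronize(device)   # start timing from an idle GPU
    start = time.perf_counter()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
    torch.cuda.synchronize(device)   # wait for all queued GPU work
    elapsed = time.perf_counter() - start
    n_images = epochs * len(train_set)
    print(f"{torch.cuda.get_device_name(device)}: "
          f"{elapsed:.1f} s, {n_images / elapsed:.0f} images/s")
    return elapsed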
P40 Benchmark
With the run_benchmark harness sketched above, benchmarking the P40 is a single call. The snippet below is one way to do it, assuming the P40 is CUDA device 0 on its host; verify the mapping with nvidia-smi first.
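# Assumes the P40 is CUDA device 0 here; check the index against
# `nvidia-smi` or torch.cuda.get_device_name(0) before trusting it.
p40_seconds = run_benchmark(device_index=0)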
P100 Benchmark
The P100 runs through the identical harness; only the device selection changes. The sketch below assumes the P100 is CUDA device 1 on a shared host (on a dedicated machine it would typically be device 0).
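# Assumes the P100 is CUDA device 1 on this machine; on a host with
# only the P100 installed, use device_index=0 instead.
p100_seconds = run_benchmark(device_index=1)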
RTX 3090 Benchmark
Finally, the same call benchmarks the RTX 3090. Again, the device index below is an assumption; adjust it to match your nvidia-smi output.
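# Assumes the RTX 3090 is CUDA device 2 here; adjust to your system.
rtx3090_seconds = run_benchmark(device_index=2)

Note that a small MNIST model may underutilize a card as fast as the 3090, so consider a larger model or batch size if the three cards come out closer than expected.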
Analysis
After running the benchmarks, compare the GPUs on training time or, equivalently, throughput. Raw speed is not the whole story: purchase price and power consumption matter too, since a slower card can still come out ahead on throughput per dollar or per watt.
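To fold price into the comparison, one simple metric is measured throughput divided by purchase price. A minimal sketch, where the throughput figures are placeholders to replace with your own measurements and the prices are example street prices:

# Placeholder numbers for illustration only: substitute your own
# measured throughput (images/s) and the prices you actually paid.
results = {
    "P40":      (3450.0, 162.0),
    "P100":     (4300.0, 149.0),
    "RTX 3090": (9800.0, 820.0),
}
for gpu, (throughput, price) in results.items():
    print(f"{gpu}: {throughput / price:.1f} images/s per dollar")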
Conclusion
In this tutorial, we have compared the performance of three GPUs, the P40, P100, and RTX 3090, for deep learning tasks. By benchmarking them under identical conditions, you can make an informed decision about which GPU to use for your deep learning projects.
Comments
TDD, great effort. The great P40 vs. P100 question has been heating up for a while now and is only growing. Good on you for getting good information out to the community.
Some observations:
Is the "Scaled Throughput per $ By GPU", chart accurate? Is it actually measuring 'Scaled CPU Throughput / $'? I could be misinterpreting the chart… forgive me if that's the case. Dividing the 'Scaled Throughput by GPU' number in the top chart (@17:23) by the prices you gave ($161.99, $149, $819.95):
P40 0.034508 / 0.03315
P100 0.043020 / 0.04101
RTX 3090 0.020684 / 0.01855
(Making the P100 about double the value of the RTX 3090 vis-à-vis scaled CPU throughput.)
Value is the meat and potatoes of these benchmarks for a lot of us, so I figured I'd ask for clarification in case others are as confused as I am.
A very minor observation, and again I might be misinterpreting: the graphic representation of the 'Broad Performance' chart (@17:23 for reference) doesn't seem to match the labelled scale on the left (or the relative proportions within a single bar either…). For instance, the blue 30.42 for the dual RTX 3090s looks way more than double the turquoise 16.96 for the single card, and also nowhere near the 47-ish labelled on the left of the chart. I'm no data scientist, so maybe exaggerating proportions is common practice to highlight differences?
You have the best videos around about the relative price:performance of these cards so keep 'em coming.
Confused and very new to this, but interested in a GPU build. Reading about NVLink on NVIDIA's site, it says it was first introduced with the P100, but the spec sheet you showed only lists "Yes" for the 3090. Does the model of the P100 you tested lack it? If so, it seems a P100 model with NVLink could still be a good deal. Great content. Thanks!
Again this is great content! Commenting for the Algo.
Very informative, but this guy talks like Elon; I have to watch at 1.25-1.5x speed.
I'm considering a second P40. Do you know if ComfyUI supports a dual-GPU setup?
What about Llama, Mistral, and Gemma training, fine-tuning, and inference? Maybe Karpathy's llm.c?
Really great video! I am considering working with ML and developing myself further in the field of AI. But the price of the P40 seems very strange to me; I feel like I haven't found a P40 anywhere on the internet that costs less than $200. Maybe it's because I live somewhere else. But more than $200 for a single P40 is a bit much for me. I'm still in school, so the price is a deciding factor.
Sorry to be that guy, but has anyone got a TL;DR?
Wow! Easy to follow, no gibberish, and pure information with clear and readable statistics!
I'm coming from Power BI as well, and the new company I'm working with is a Google shop, so I'd definitely be interested in a Looker Studio comparison if you're game.
Thanks for this! We've been wondering since Craft Computing mentioned it recently. But…
30:00 I don't follow your math here. First, the 2x P40 bar says 10.74 but sits on the graph at over 15. I believe this bar should stop at 10.74, which would not only show its true value; the height of the blue bar would then visually show the performance added by the second GPU.
Second, I can't see where the Throughput per Dollar numbers come from. The 3090 has T = 17 and $ = 820, so it should appear here at T/$ ≈ 0.02, or $/T ≈ 48. Where did the 141 come from?
Third, if you're going to look at running costs, then electricity is very important. In some locations, electricity costs can exceed server costs in well under a year. It's the reason this hardware is so cheap: companies can't afford to keep it running.
Wow… randomly stumbled upon the video. Thanks, super useful! I wish someone would put together a nice dataset like this, but with multi-GPU configs, NVLink vs. PCIe, and so on.
We can't hear you… 🙁
If possible, please upload an NVLink and PCIe extender video. It would be really helpful for understanding them.
For some reason, 3090s are going for $500 over here, and there are a lot of them on the market. Crypto crash or something? Anyway, that seems to change the calculations a lot in my case.
Thank you for your work. Do you have any plans to test the Tesla V100 16 GB? They go for half the price of the 3090 and support NVLink.
Oh wow, I'm in the market for a 3090 for LLMs, but I've been eyeing the P40s because of their price. I saw your videos on the R720s, and now I'm wondering if I can put a P40 in my R710.
I'm happy whenever you upload
Does PCIe 3.0 vs. PCIe 4.0 make a difference in this kind of setup?
Brilliant bud.