How Much Memory for 1,000,000 Threads in 7 Languages
When dealing with large-scale applications, it’s important to consider the memory requirements of running a very high number of threads. In this article, we’ll explore how much memory is needed to run 1,000,000 threads in seven different programming languages: Go, Rust, C#, Elixir, Java, Node, and Python.
Go
Go is known for its lightweight concurrency primitives (goroutines), which means the memory overhead of creating a very large number of them is relatively low. In our testing, running 1,000,000 threads in Go consumed approximately X megabytes of memory.
Rust
Rust is another language that is designed for high performance and low-level control. When running 1,000,000 threads in Rust, we found that it required around Y megabytes of memory.
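The post doesn't reproduce its benchmark code, but a minimal sketch of this kind of measurement in Rust, assuming tokio async tasks as the "threads" and a ten-second sleep as the workload (which is what the comments below describe), might look like this:

use std::time::Duration;

// Requires the tokio crate, e.g. tokio = { version = "1", features = ["full"] }
#[tokio::main]
async fn main() {
    let mut handles = Vec::with_capacity(1_000_000);
    for _ in 0..1_000_000 {
        // Each task is a small heap-allocated state machine,
        // not an OS thread with its own stack.
        handles.push(tokio::spawn(tokio::time::sleep(Duration::from_secs(10))));
    }
    // Memory would be sampled here, while all tasks are parked.
    for h in handles {
        h.await.unwrap();
    }
}

Spawning 1,000,000 raw OS threads instead would hit system limits on most machines, which is why benchmarks of this kind typically lean on async tasks or green threads.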
C#
C# is a popular language for building Windows applications, and .NET provides robust support for multithreading. Running 1,000,000 threads in C# consumed approximately Z megabytes of memory in our testing.
Elixir
Elixir is built on top of the Erlang virtual machine, which is known for its support of lightweight processes. When running 1,000,000 threads in Elixir, we found that it required around A megabytes of memory.
Java
Java has mature support for multithreading, and running 1,000,000 threads in Java consumed approximately B megabytes of memory in our testing.
Node
Node.js is known for its event-driven, non-blocking I/O model, which can handle a large number of concurrent connections. Running 1,000,000 threads in Node.js required around C megabytes of memory in our testing.
Python
Python’s Global Interpreter Lock (GIL) can limit the scalability of multithreading in some cases. When running 1,000,000 threads in Python, we found that it required approximately D megabytes of memory.
In conclusion, the memory requirements for running 1,000,000 threads vary across different programming languages. However, it’s important to note that the actual memory usage will depend on the specific application and the workload of the threads. Developers should carefully consider the memory overhead of multithreading when designing and optimizing their applications.
This really does prove C# is the best. The code as written is a bit wonky: Task.Run will run the code on its own thread, then the await will run the awaited call on its own thread. But this code won't start a thread per task, because it doesn't need to. Thread.Sleep does no CPU work, because at 10 seconds it's not going to use a spinlock, so it knows not to hold a thread for 10 seconds doing nothing. The whole program probably runs on 2 threads because it has no reason to use any more. It's smart enough to do that. The only thing in memory is the state of each task.
It is embarrassing that some other environments allow you to so easily allocate and hog resources to do nothing.
WebGL, 19 MB for a million threads
.NET can be very lightweight if that's your goal. You just need to know how.
This comparison is ass without comparing to C
No JDSL benchmark?
Not all infinities are equal.
And he tested on .NET 6! .NET 7/8 could yield even better results. I do have to agree that testing with timers isn't really appropriate. Piotr could have done some number crunching in there to stress the hardware's threading capabilities.
Buzz wants to go from infinity to a larger infinity.
Nope! The name is "the C#agen" 😛
Well, you can go to infinity and beyond, according to Neil deGrasse Tyson: https://www.youtube.com/watch?v=Ds2bMtJla70
I'm a fan of Rust but the Rust threads benchmark results don't add up. On Linux the thread stack size is generally configured to be between 1 MB and 10 MB, so creating 1,000,000 threads means reserving between 1 TB and 10 TB of virtual memory. I guess the author is measuring only actually committed memory? Also, the blog post links the source code of the benchmarks but they left out Rust…
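Since the original Rust source is missing, here is a sketch (my code, not the author's) of how a native-thread version could keep the per-thread reservation small, using std::thread::Builder to shrink the stack from the roughly 2 MiB that Rust's std threads currently reserve by default:

use std::{thread, time::Duration};

fn main() {
    let mut handles = Vec::with_capacity(1_000_000);
    for _ in 0..1_000_000 {
        // Reserve only 64 KiB of stack per thread instead of the default.
        let builder = thread::Builder::new().stack_size(64 * 1024);
        handles.push(
            builder
                .spawn(|| thread::sleep(Duration::from_secs(10)))
                .expect("failed to spawn thread"),
        );
    }
    for h in handles {
        h.join().unwrap();
    }
}

Even then, the interesting number is committed memory: the reserved stacks are just virtual address space, and only the pages a thread actually touches get committed.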
Just for fun, I tried creating threads in C++ in a similar fashion:
#include <atomic>
#include <thread>
#include <vector>

static std::atomic<int> toInc = 0;

int main()
{
    {
        // jthread joins automatically, so all 1,000,000 threads
        // are joined when the vector goes out of scope.
        std::vector<std::jthread> threads;
        threads.reserve(1'000'000);
        for (int i = 0; i < 1'000'000; ++i)
        {
            threads.emplace_back([]() { toInc++; });
        }
    }
}
Running on a CPU providing 8 cores, it took forever (we're talking about 15 minutes) to allocate the thread handles; the resulting peak memory consumed was 75 MB. Deallocating the thread handles took the same amount of time as creating them. So this test case depends heavily on what kind of platform/OS is in use.
Also, it's not advisable to use more threads than your hardware can handle on native cores. On my system the highest multithreaded performance was at 32 threads (including an i < 1'000'000 check inside each thread's lambda), and peak performance for this simple task was single-threaded (I guess because no locking on the atomic was necessary). Everything here is just observations and measurements.
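The same oversubscription point this comment makes, rendered as a Rust sketch for comparison (my code, under the same assumptions as the C++ above): cap the workers at the hardware's available parallelism and let them share the counter.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static TO_INC: AtomicUsize = AtomicUsize::new(0);

fn main() {
    // One worker per hardware thread instead of one thread per increment.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| {
                // Stop once the shared counter has passed the target;
                // the final value may overshoot slightly.
                while TO_INC.fetch_add(1, Ordering::Relaxed) < 1_000_000 {}
            });
        }
    });
}

std::thread::available_parallelism is the Rust analogue of C++'s std::thread::hardware_concurrency.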
Why does Prime hate Python so much?
Node.js, C#, and Python are basically cheating, which makes them really good in some situations where waiting and I/O are involved.
They're there to show you can write much worse programs with Go and Rust; thread management isn't a trivial problem.
Why do you use async/await for such a task?
10:32 He doesn't say how he does the memory profiling; you can get a lot wrong there with the .NET runtime. He also got the implementation wrong (see the first comment from JB-Dev on his page).
10:08 Why that old .NET version? 7.0.6 was current back in May 2023.
Comparing a million apples to a million oranges.
Haskell is the king for this. You have almost zero allocation overhead (just like most green-thread systems), since threads are just closures allocated on the heap, BUT they're also mostly preemptible like real system threads. Writing a web app with warp looks like sequential imperative code on real system threads, but it's automatically transformed internally to look more like wait/async/yield at specific points. It keeps up with Node.js in single-threaded benchmarks, and it automatically scales sideways due to the pure nature of the language. There are also other "threading" techs in Haskell called sparks (forking pure code evaluation) and data parallelism (thread pool + array partitions).