Software Drag Racing: C++ vs C# vs Python – Which Will Win?



Retired Microsoft Engineer Davepl writes the same ‘Primes’ benchmark in Python, C#, and C++ and then compares and explains the differences in the code before racing them head to head to see what the performance difference is like between the languages.

It appears the upload process does some volume leveling or loudness, so my apologies if you get startled during the intro. It was mixed down in the master, honest 🙂

Thanks to the Simpsons for the inevitable reference or two that I throw in now and then! See if you can spot both in this episode!

0:00 Start
2:08 The Primes Assignment
6:25 How a Sieve Works
7:32 Coding Begins
9:00 Python Version
12:50 C# Version
18:25 C++ Version
22:22 Charts and Graphs
23:17 Outtakes

I’ve placed the code up on GitHub for your reference without any warranty for any purpose!
https://github.com/davepl/Primes

I get a lot of questions about which keyboard I’m using as well as various other camera and studio equipment questions, so here are the highlights:

CORSAIR K70 RGB MK.2 Mechanical Gaming Keyboard (Cherry MX Blue Switches)
https://amzn.to/31UrUUD

Sony FX3 or A7SIII Cameras
https://amzn.to/31TRdWK
https://amzn.to/3wG9iG7

Aputure 120D Mark II Light and Light Dome II Mini
https://amzn.to/3uya8Ts
https://amzn.to/31XwBx2

Glide Gear TMP100 Prompter
https://amzn.to/3ux84Ll

40 thoughts on “Software Drag Racing: C++ vs C# vs Python – Which Will Win?”

  1. Thank you for taking me back to the Commodore PET. Our high school didn’t have any computers when I started, but in the 4th year they started showing up in shops, and I was one of the “regulars” using the demo model. Eventually a TRS-80 Model 1 was bought to help staff plan classes (Dutch schooling allows you to select a subset of classes for 4th to 6th year, so they have to juggle schedules to ensure optimal planning). I was then the “gang-leader” of those allowed to use it for the rest of the year. Exciting times.

  2. "It's like magic." Yeah, about that… I've never heard an explanation of why you only calculate to the square root of N that wasn't magic or mathematician speak. Anyone wanna throw down an explanation that assumes I'm barely competent with a scientific calculator?
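
One way to see it without the math jargon: if a number n can be split as a × b, the two factors cannot both be bigger than √n, because then their product would be bigger than n. So every composite number has a factor no larger than its square root, and a sieve that has crossed off the multiples of everything up to √N has already caught every composite up to N. A minimal Python sketch that checks this by brute force (the helper name `smallest_factor` is made up for illustration and is not taken from the video's code):

```python
import math

def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    for d in range(2, n + 1):
        if n % d == 0:
            return d

# Every composite number below 1000 has a smallest factor <= its square root.
for n in range(4, 1000):
    d = smallest_factor(n)
    if d != n:                      # n is composite
        assert d <= math.isqrt(n)   # equivalent to d * d <= n

print(smallest_factor(91), math.isqrt(91))  # 7 9
```

For 91 = 7 × 13, the small factor 7 sits below √91 ≈ 9.5, so crossing off multiples of 7 is enough; 13 never has to be tried.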

  3. Shouldn't it be "while (factor <= q)"? "factor < q" should think perfect squares are primes, methinks. Ah, no; never mind; I thought q was sqrt of the number; it is sqrt of the sieve size… never mind. So, for loops in Python can have two arguments or three arguments? Can't grasp the (factor * 3, …. , factor * 2) … I'm totally lost.
    UPDATE: Looking now at the C code, and I'm still lost. I don't understand what "runSieve" means. We are clearly not building the sieve in it; we are using it, but all these interactions between num and factor are hard to understand.
    Question: Where is the size of the sieve decided?
    Loaded question: Wouldn't it make sense to have the sieve double in size by powers of two as we go along? Like, say you start with a sieve of size 32… Compute it, apply it to 32~63, double the sieve size to 64, check for new non-primes in this upper range and update the sieve's upper half accordingly; apply this size 64 sieve to numbers in the 64 to 127 range; double the sieve to 128 bits; check for new non-primes in this upper range updating the upper half of the sieve; … wash and repeat?
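
On the `for` loop question: Python's `range` accepts one, two, or three arguments, namely `range(stop)`, `range(start, stop)`, and `range(start, stop, step)`. With a start of `factor * 3` and a step of `factor * 2`, the loop visits only the odd multiples of `factor` beyond `factor` itself, since even numbers are never candidates. A small sketch follows; the middle argument is elided in the comment above, so the `sieve_size` used here is an assumption, and both values are placeholders rather than the repo's actual variables:

```python
factor = 3
sieve_size = 40  # assumed stop value (exclusive), chosen only for illustration

# start at 3 * factor, stop before sieve_size, step by 2 * factor
odd_multiples = list(range(factor * 3, sieve_size, factor * 2))
print(odd_multiples)  # [9, 15, 21, 27, 33, 39]
```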

  4. Maybe I did something wrong (but I just did copy and paste), but I just ran the sieve in a simple way to find primes under 10 (unless I am mistaken there should be 5 [1,2,3,5,7]), but the prog (in Python) prints 4 [0, 1, 2, 3]. All I did was remove everything after Main Entry and replace it with

    sieve = prime_sieve(10)
    sieve.runSieve()
    print("—{}—".format(sieve.countPrimes()))
    for x in range(len(sieve.rawbits)):
        if sieve.rawbits[x]:
            print(x)

    So I am not sure the Python version is working properly (or, like I said, I may have done something wrong).
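
A possible explanation, offered as an assumption rather than a reading of the actual repo code: 1 is not a prime, so there are four primes under 10 (2, 3, 5, 7), and if the sieve keeps one flag per odd number, the printed values are array slots rather than the numbers themselves. The sketch below is a stand-alone illustration of that storage scheme; `odd_only_sieve` and its layout are hypothetical and may differ from `prime_sieve` in the repo:

```python
def odd_only_sieve(limit):
    """Sieve that keeps one flag per odd number: slot i stands for 2*i + 1."""
    flags = [True] * ((limit + 1) // 2)
    factor = 3
    while factor * factor <= limit:
        if flags[factor // 2]:
            # cross off odd multiples of factor, up to and including limit
            for num in range(factor * 3, limit + 1, factor * 2):
                flags[num // 2] = False
        factor += 2
    return flags

flags = odd_only_sieve(10)
slots = [i for i, is_set in enumerate(flags) if is_set]
# Slot 0 stands for 1, which is not prime; it is reused here to represent 2.
numbers = [2 if i == 0 else 2 * i + 1 for i in slots]
print(slots)    # [0, 1, 2, 3]
print(numbers)  # [2, 3, 5, 7]
```

Under that assumption, a count of 4 with slots 0 to 3 set would be the expected result for a limit of 10.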

  5. Your bit array in C++ is still only 8 bits wide in the 64-bit code. Would it make a difference to go for uint64_t as the base for the bit array using 64-bit shifting?

  6. The thing here (as always) is that the everlasting hunt for a silver bullet is pointless. What you need is the proper tool for the task, not an ultimate universal tool for everything. So, such benchmarks' relevancy is highly dubious.
    Since (C)Python naturally binds with C/C++ extremely well, what we do is use Python for the part of the job which is not performance critical and leave the number crunching to C or C++. That combo is IMO a very progressive technology.
    I have coded in C for ~26 years, in C++ for ~22 years and in Python for about 5 years. In their respective areas, they're all very powerful tools, but far from universal. But combined together, they cover most of the tasks I ever needed to do. Add a bit of JavaScript for web app FE and you've got your full stack tool set.

    One more note: despite the popular myth, Python is not all that easy. Like any other language (particularly such paradigm hybrids), Python contains certain "gotchas" when you delve deeper under the surface. That's just inevitable, because no programming language is "easy". And that's because programming is only as easy as the problem at hand is. Python might seem easy to a beginner, but that's only because you're doing simple things. When you start to tackle harder problems, the programming gets just as difficult as in anything else. In fact, what makes me a successful Python programmer (excuse me for bragging) is the very fact that I'm also a successful C/C++ programmer, and I can "see under the hood", which many of my younger colleagues simply can't.

  7. std::vector<bool> does pretty much exactly what you're trying to do with bit arrays. It will pack 8 values per byte and do some bit manipulation to access them as if it was a standard C array.

  8. I wouldn't use Python for any heavy lifting anyway 🙂 All I write these days is parsing/analysis of small datasets grabbed out of ticketing systems (jira, freshdesk), so it takes longer to fetch the data than to process it anyway 🙂 But I like the video – shows that one has to avoid limiting themselves to just one convenient language… (/me googling 'how to write rest api client with json parsing in c++' :))

  9. I keep hearing nowadays from various people (mostly lifetime C#/Java developers without much experience in C++) that "C#/Java is just as fast as C++ now, because of their highly tuned runtime optimizations, and the benchmarks show it". The problem with a statement like this is that it doesn't take into account the significant differences that show up in real-life systems, where there are complex interactions between modules and classes within an application or a library. In this context, the speed advantage of C++ over "dynamic" languages like C# and Java becomes even more pronounced. C++ with templates (i.e. statically bound polymorphism) would outperform C# and Java with the equivalent interactions between objects severalfold (3x or even more). This is why games and other compute-intensive apps/libs are all written in C++ (or C if the language simplicity is preferred), not in C# or Java, not to mention the problem with random garbage-collection events in C# and Java. I don't even include Python there because it is just slow even with simple single-loop benchmark routines, compared to the rest of these languages. Granted, 99% of applications that people develop don't need that extra performance anyway (because external factors like communicating over the network or reading from / writing to files or databases take up the vast majority (i.e. > 95%) of waiting time, and you can just use external C/C++ libraries for the actual heavy computing), so C# and Java (or even Python) are perfect middle-ground languages for the vast majority of apps that people develop, with various benefits like ease of coding, automatic memory safety, etc.

  10. I personally use this function for timing C++ code:

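    #include <chrono>
    #include <utility>

    using namespace std::chrono;

    // Calls f(args...) and returns its result paired with the elapsed time.
    template <class F, class... Args>
    auto tester(F f, Args&&... args) {
        auto pre = high_resolution_clock::now();
        auto res = f(std::forward<Args>(args)...);
        auto post = high_resolution_clock::now();
        return std::pair{ res, post - pre };
    }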

    std::forward lets each argument be used as a forwarding argument, so that if you pass a reference to the tester function, it is passed on to f as a reference (if I recall correctly). This is a trick that my real-time programming teacher showed me, and since he is on the C++ committee, I think it should be a reliable way to time it.

  11. I found it slightly funny when I saw the Python code. Clearly Dave is more of a C++ guy, since 'this' is often used. Generally 'self' is used in Python. Either way, if it works then it's OK.

  12. If you're OK with libraries, Boost has a dynamic_bitset class. Regular C++ STL has a specialisation for std::vector<bool> but I haven't had time to benchmark it.

  13. When writing tight loops in Python, you have to remember two things about the language:

    1. Attribute lookups and variable lookups not in the local namespace are slow.
    2. Calling functions is slow.

    Thus I was able to speed up the Python version in the repo from 39 iterations for limit 1_000_000 to ~150 iterations just by inlining the code from GetBits/ClearBits and creating a reference for this.rawbits and this.sieveSize in local variables (and by eliminating the superfluous check for index%2 in the inner loop).

    This speedup is achieved without any optimizations to the algorithm.
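
The two points above can be sketched side by side. This is a stand-alone illustration under assumed names (`SlowSieve`, `fast_sieve`, `get_bit`, `clear_bit` are hypothetical, not the repo's identifiers), and it makes no claim about the exact speedup; it only shows the same algorithm written with helper calls and attribute lookups in the inner loop versus inlined code on local variables:

```python
class SlowSieve:
    """Odd-only sieve using helper methods and attribute lookups in the loop."""

    def __init__(self, limit):
        self.limit = limit
        self.bits = [True] * ((limit + 1) // 2)  # slot i stands for 2*i + 1

    def get_bit(self, n):
        return self.bits[n // 2]

    def clear_bit(self, n):
        self.bits[n // 2] = False

    def run(self):
        factor = 3
        while factor * factor <= self.limit:
            if self.get_bit(factor):                      # method call
                for num in range(factor * 3, self.limit + 1, factor * 2):
                    self.clear_bit(num)                   # call + attribute lookup
            factor += 2
        return sum(self.bits)  # slot 0 (the number 1) stands in for the prime 2


def fast_sieve(limit):
    """Same algorithm with the helpers inlined and state held in locals."""
    bits = [True] * ((limit + 1) // 2)   # local name: no attribute lookup
    factor = 3
    while factor * factor <= limit:
        if bits[factor // 2]:
            for num in range(factor * 3, limit + 1, factor * 2):
                bits[num // 2] = False   # inlined clear_bit
        factor += 2
    return sum(bits)


print(SlowSieve(100_000).run(), fast_sieve(100_000))  # 9592 9592
```

Both versions return the same count, so any difference measured with something like the standard `timeit` module would come from call and lookup overhead rather than from the algorithm.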
