FTA: The loop example iterates 1 billion times, utilizing a nested loop:
u = ARGV[0].to_i
r = rand(10_000)
a = Array.new(10_000, 0)
(0...10_000).each do |i|
  (0...100_000).each do |j|
    a[i] += j % u
  end
  a[i] += r
end
puts a[r]
Weird benchmark. Hand-optimized, I guess this benchmark will spend over 99% of its time in the first two lines. If you do liveness analysis on array elements you’ll discover that it is possible to remove the entire outer loop, turning the program into:
u = ARGV[0].to_i
r = rand(10_000)
a = 0
(0...100_000).each do |j|
  a += j % u
end
a += r
puts a
Are there compilers that do this kind of analysis? Even though u isn’t known at compile time, that inner loop can be replaced by a few instructions, too, but that’s a more standard optimization that, I suspect, the likes of clang may be close to making.
Compilers don't do liveness analysis on individual array elements. It's too much data to keep track of and would probably only be useful in incorrect code like this.
I used to work on an AI compiler where liveness analysis of individual tensor elements actually would have been useful. We still didn't do it because the compilation time/memory requirements would be insane.
TruffleRuby could replace this with an O(1) operation, even when it’s part of a C extension.
I think most compilers could do that. That's a separate much easier optimisation.
Closed form that works for most cases:
result = (u * (u - 1) / 2) * (100000 / u) + (100000 % u) * (100000 % u - 1) / 2 + r
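For the skeptical, here's a quick sanity check of that formula against the inner loop (a sketch I'm adding, not from the thread): each full cycle of j % u sums to u*(u-1)/2, there are 100000/u full cycles, and the leftover 100000%u terms sum to (100000%u)*(100000%u - 1)/2.

  # Sketch: check the closed form against the inner loop for a few divisors.
  N = 100_000

  def loop_sum(u, n)
    (0...n).sum { |j| j % u }
  end

  def closed_form(u, n)
    full, rest = n.divmod(u)
    (u * (u - 1) / 2) * full + rest * (rest - 1) / 2
  end

  [1, 7, 40, 9_999].each do |u|
    raise "mismatch for u=#{u}" unless loop_sum(u, N) == closed_form(u, N)
  end
  puts "closed form matches the loop for the sampled divisors"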
The article refers to upcoming versions of Ruby. For the curious, looks[1] like ruby 3.4.0 will be released this Christmas, and ruby 3.5.0 next Christmas.
Also, I'm wondering what effect Python's minimal JIT [2] has coming for this type of loop. Python 3.13 needs to be built with the JIT enabled, so it would be interesting if someone who has built it runs the benchmarks.
[1] https://www.ruby-lang.org/en/downloads/releases/
[2] https://drew.silcock.dev/blog/everything-you-need-to-know-ab...
Ruby is always released on Christmas, it's a predictable and cute schedule.
But perf improvements can and do drop in point releases too, afair.
> There was a PR to improve the performance of `Integer#succ` in early 2024, which helped me understand why anyone would ever use it: “We use `Integer#succ` when we rewrite loop methods in Ruby (e.g. `Integer#times` and `Array#each`) because `opt_succ (i = i.succ)` is faster to dispatch on the interpreter than `putobject 1; opt_plus (i += 1)`.”
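If you want to see the dispatch difference that PR description is talking about, CRuby can disassemble small snippets for you; on builds with the specialized instructions you should see `opt_succ` in the first listing and `putobject 1` / `opt_plus` in the second (a quick sketch, not from the article):

  # Sketch: compare the YARV bytecode for the two increment styles.
  # Expect opt_succ in the first listing and putobject 1 / opt_plus in the
  # second, on CRuby builds with these specialized instructions enabled.
  puts RubyVM::InstructionSequence.compile("i = 0; i = i.succ").disasm
  puts RubyVM::InstructionSequence.compile("i = 0; i += 1").disasm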
I find myself using `#succ` most often for readability reasons, not just for performance. Here's an example where I use it twice in my UUID library's `#bytes` method to keep my brain in “bit slicing mode” when reading the code. I need to loop 16 times (`0xF.succ`) and then within that loop divide things by 256 (`0xFF.succ`): https://github.com/okeeblow/DistorteD/blob/ba48d10/Globe%20G...
Why do you find 0xF.succ better than 0x10 in this case?
Because of how I'm used to thinking of the internal 128-bit UUID/GUID value as a whole:
irb> 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF.bit_length => 128
... 0 to 127 < 128
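To make the `0xF.succ` / `0xFF.succ` idea concrete, here's a stripped-down sketch of the byte-slicing pattern (mine, not the actual DistorteD `#bytes` implementation):

  # Sketch: slice a 128-bit integer into 16 bytes, most-significant first,
  # staying in "bit slicing mode": 0xF.succ iterations, dividing by 0xFF.succ.
  value = 0x0123456789ABCDEF_0123456789ABCDEF
  bytes = []
  0xF.succ.times do
    value, low = value.divmod(0xFF.succ)   # divide by 256, keep the low byte
    bytes.unshift(low)
  end
  puts bytes.map { |b| format("%02X", b) }.join(" ")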
After all these years, I still love Ruby. Thank you Matz!
Super interesting. Actually I am also a contributor to https://github.com/bddicken/languages, and after I had tried to create a Lua approach I started to think of TruffleRuby, as it was mentioned somewhere. But unfortunately, when I ran main.rb there was virtually no significant difference between TruffleRuby and normal Ruby (sometimes normal Ruby was faster than TruffleRuby).
I am not sure if the benchmark numbers you provided showing the speed of TruffleRuby were taken after the changes that you made.
I would really appreciate it if I could verify the benchmark,
and maybe try to add it to the main https://github.com/bddicken/languages as a commit as well, because the TruffleRuby implementation actually being faster than Node.js and coming close to Bun or even Go is nuts.
This was a fun post to skim through, definitely bookmarking it.
With TruffleRuby you'll need to account for startup time and time to max. performance which vary with the native and JVM runtime configurations. See https://github.com/oracle/truffleruby
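One cheap way to see that warm-up effect (a generic sketch, nothing TruffleRuby-specific) is to time the same workload repeatedly inside one process; on a JIT runtime the early iterations should be noticeably slower than the steady state:

  require "benchmark"

  # Sketch: repeat the same workload in one process so JIT warm-up is visible.
  def workload(u)
    (0...100_000).sum { |j| j % u }
  end

  10.times do |i|
    t = Benchmark.realtime { 100.times { workload(40) } }
    puts format("iteration %2d: %.3fs", i, t)
  end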
Woah, Ruby has become fast, like really fast. What's even more impressive is TruffleRuby, damn!
It's Oracle https://github.com/oracle/truffleruby Double Damn!
It's open source under Eclipse Public License version 2.0, GNU General Public License version 2, or GNU Lesser General Public License version 2.1.
Making it easily fork-able should Oracle choose to do something users dislike.
Holy! I know TruffleRuby is open source but I somehow always thought Graal (which TruffleRuby is based on) wasn't open source.
Note that Rails doesn't work on Truffle and from what I understand, won't anytime soon.
Which is disappointing since it has the highest likelihood of making the biggest impact to Ruby perf.
Huh, what exactly doesn't work? Their own readme says "TruffleRuby runs Rails and is compatible with many gems, including C extensions." (https://github.com/oracle/truffleruby)
Truffle:
TruffleRuby is not 100% compatible with MRI 3.2 yet
Rails: Rails 8 will require Ruby 3.2.0 or newer
https://github.com/oracle/truffleruby
That doesn't mean Rails won't run on TruffleRuby. TruffleRuby may not implement 100% of MRI 3.2, but that doesn't mean it doesn't implement all the parts that Rails needs.
Is it possible that those two statements taken together means truffleruby can run rails 8?
Super interesting. I didn't know that YJIT was written in Rust.
It was initially written in C and then ported to Rust[0], which seems like it was a good idea. The downside is that it may not be enabled at build time if you don't have the right toolchain/platform, but that seems a good trade-off.
0: https://shopify.engineering/porting-yjit-ruby-compiler-to-ru...
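If you're not sure whether your build has it, you can check at runtime; `--yjit` turns it on, and the `RubyVM::YJIT` constant is only defined when support was compiled in (a small sketch):

  # Sketch: report whether this Ruby was built with YJIT and whether it is active.
  # Run with: ruby --yjit <this file>
  if defined?(RubyVM::YJIT)
    puts "YJIT built in, currently enabled: #{RubyVM::YJIT.enabled?}"
  else
    puts "this Ruby was built without YJIT support"
  end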
Another language comparison repo that's been going for longer with more languages https://github.com/niklas-heer/speed-comparison.
Another language comparison repo with hard-to-read presentation.
The chart axis labels and bar labels overlap each other, and there are no vertical grid lines.
Oh for a simple HTML table!
> Python was the slowest language in the benchmark, and yet at the same time it’s the most used language on Github as of October 2024.
Interesting that there seems to be a correlation between a language being slow and it being popular.
Now do it again, but include compile time and amortise across the number of executions expected for that specific build.
I say this as a pretty deep rust fanatic. All languages (and runtimes, interpreters, and compilers) are tools. Different problems and approaches to solving them benefit from having a good set at your disposal.
If you're building something that may only run a handful of times (which a lot of python, R, et al programs include) slow execution doesn't matter.
it's like food, people like it way more when you put sugar on top
by and large, Ruby is slow, but damn is it nice to code with, which is more appealing for newcomers
I think, for being an interpreted language, Ruby is quite fast now.
Because now a JIT is part of the picture, as it should be in any dynamic language that isn't only meant for basic scripting tasks.
Ruby was always faster than people gave it credit for.
Not really.
Work has been done to make faster Ruby language implementations.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
of course sorry, you're right, what I meant is that it's rather slow _in the grand scheme of things_
Slower languages are higher level thus easier to use.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Program performance is associated with the specific programming language implementation and the specific program implementation.
No one takes that site too seriously when judging real world programming, for a variety of reasons.
Do you really not understand that the exact same Java programs are likely to be 10x slower without JIT?
Different language implementation, so different performance.
> too seriously
Take those measurements just seriously enough.
Does that correlation hold if you look at let's say the top 20 popular languages?
No, because Java is #2, C++ is #4, C# is #7. People just really like Python for what it brings to the table.
Game changing for my advent of code solutions which look surprisingly similar
I'm a little surprised that Node is beating Deno. Interesting that Java would be faster than Kotlin since both run on jvm.
“Faster”.
> Ran each three times and used the lowest timing for each. Timings taken on an M3 Macbook pro with 16 gb RAM using the /usr/bin/time command. Input value of 40 given to each.
Not even using JMH. I highly doubt the accuracy of the “benchmark”.
That is one of the differences between a platform systems language, and guest languages.
You only have to check the additional bytecode that gets generated, to work around the features not natively supported.
Which difference? It is literally the same code, it doesn’t even use any Kotlin std goodies.
Yet, they don't generate the same bytecode, and that matters.
I mean, the JVM's been optimized specifically for Java since the Bronze Ages at this point, it's not that surprising
Although being slow, Python has a saving grace: it doesn't have a huge virtual machine like Java, so it can in many situations provide a better experience.
Does JavaME have a "huge virtual machine" ?
https://www.oracle.com/java/technologies/javameoverview.html
Do you mean CPython or PyPy or MicroPython or ?
> Does JavaME have a "huge virtual machine"
Yes, compared to Python.
> Do you mean CPython or PyPy
Python's standard virtual machine is called CPython; just look at the official web page.
I imagine we need a nuts and bolts definition of "virtual machine" before we can make a comparison.
>This got me thinking that it would be interesting to see a kind of “YJIT standard library” emerge, where core ruby functionality run in C could be swapped out for Ruby implementations for use by people using YJIT.
This actually makes me feel sad because it reminded me of Chris Seaton. The idea isn't new and Chris has been promoting it during his time working on TruffleRuby. I think the idea goes back even further to Rubinius.
It is also nice to see TruffleRuby being very fast and YJIT still has lots of headroom to grow. I remember one obstacle with it running rails was memory usage. I wonder if that is still the case.
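As a toy illustration of the quoted idea (my sketch, not any actual proposal or TruffleRuby code): a core-style method written in plain Ruby that a Ruby-level JIT can see into and specialize, unlike an opaque C builtin.

  # Toy sketch of "core functionality in Ruby instead of C": a pure-Ruby
  # stand-in for a builtin reduction. A Ruby-level JIT can inline and
  # specialize this loop, whereas a C implementation is a black box to it.
  module PureRubyCore
    def self.sum_ints(array)
      total = 0
      i = 0
      while i < array.length
        total += array[i]
        i += 1
      end
      total
    end
  end

  puts PureRubyCore.sum_ints((1..10).to_a)   # => 55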
One of the amazing things TruffleRuby does is handle C extensions like Ruby code, meaning the C is interpreted rather than compiled in the traditional sense.
This opens the way to JIT-compiling the C code so it runs way faster than it would as the author wrote it.
Amazing indeed!
Yup, Rubinius was probably the most widely known implementation of Ruby's standard library in Ruby. Too bad it was slower than MRI.
I think JRuby takes a similar approach.
It’s possible to write gems which will use underlying C on MRI or Java when running on JRuby.
It would be interesting to know if a “pure Ruby” approach would also help JRuby too.
I thought maybe mruby had a mostly Ruby stdlib - but I guess it's C ported over from MRI?
"In most ways, these types of benchmarks are meaningless. Python was the slowest language in the benchmark, and yet at the same time it’s the most used language on Github as of October 2024."
First, this indicates some sort of deep confusion about the purpose of benchmarks in the first place. Benchmarks are performance tests, not popularity tests. And I don't think I'm just jumping on a bit of bad wording, because I see this idea in its various forms a lot poking out in a lot of conversations. Python is popular because there are many aspects to it, among which is the fact that yes, it really is a rather slow language, but the positives outweigh it for many purposes. They don't cancel it. Python's other positive aspects do not speed it up; indeed, they're actually critically tied to why it is slow in the first place. If they were not, Python would not be slow. It has had a lot of work done on it over the years, after all.
Secondly, I think people sort of chant "microbenchmarks are useless", but they aren't useless. I find that microbenchmark actually represents some fairly realistic representation of the relative performance of those various languages. What they are not is totally determinative. You can't divide one language's microbenchmark on this test by another to get a "Python is 160x slower than C". This is, in fact, not an accurate assessment; if you want a single unified number, 40-50 is much closer. But "useless" is way too strong. No language is so wonderful on all other dimensions that it can have something as basic as a function call be dozens of times slower than some other language and yet keep up with that other language in general. (Assuming both languages have had production-quality optimizations applied to them and one of them isn't some very very young language.) It is a real fact about these languages, it is not a huge outlier, and it is a problem I've encountered in real codebases before when I needed to literally optimize out function calls in a dynamic scripting language to speed up certain code to acceptable levels, because function calls in dynamic scripting languages really are expensive in a way that really can matter. It shouldn't be overestimated and used to derive silly "x times faster/slower" values, but at the same time, if you're dismissing these sorts of things, you're throwing away real data. There are no languages that are just as fast as C, except gee golly they just happen to have this one thing where function calls are 1000 times slower for no reason even though everything else is C-speed. These performance differences are reasonably correlated.
> First, this indicates some sort of deep confusion about the purpose of benchmarks in the first place. Benchmarks are performance tests, not popularity tests.
I don't think it indicates a deep confusion. I think it leaves a simple point unsaid because it's so strongly implied (related to what you say):
Python may be very low in benchmarks, but clearly it has acceptable performance for a very large subset of applications. As a result, a whole lot of us can ignore the benchmarks.
Even in domains where one would have shuddered at this before. My students are launching a satellite into low earth orbit that has its primary flight computer running python. Yes, sometimes this does waste a few hundred milliseconds and it wastes several milliwatts on average. But even in the constrained environment of a tiny microcontroller in low earth orbit, language performance doesn't really matter to us.
We wouldn't pay any kind of cost (financial or giving up any features) to make it 10x better.
I wouldn't jump on it except for the number of times I've been discussing this online and people completely seriously counter "Python is a fairly slow language" with "But it's popular!"
Fuzzy one-dimensional thinking that classifies languages on a "good" and "bad" axis is quite endemic in this industry. And for those people, you can counter "X is slow" with "X has good library support", and disprove "X lacks good tooling" with "But X has a good type system", because all they hear is that you said something is "good" but they have a reason why it's "bad", or vice versa.
Keep an eye out for it.
"My students" - so there's really nothing on the line except a grade then, yeah? That's why you wouldn't pay any cost to make it 10x better, because there's no catastrophic consequence if it fails. But sometimes wasting a few milliwatts on average is the difference between success and failure.
I've built an autonomous drone using Matlab. It worked but it was a research project, so when it came down to making the thing real and putting our reputation on the line, we couldn't keep going down that route -- we couldn't afford the interpreter overhead, the GC pauses, and all the other nonsense. That aircraft was designed to be as efficient as possible, so we could literally measure the inefficiency from the choice of language in terms of how much it cost in extra battery weight and therefore decreased range.
If you can afford that, great, you have the freedom to run your satellite in whatever language. If not, then yeah you're going to choose a different language if it means extra performance, more runtime, greater range, etc.
> "My students" - so there's really nothing on the line except a grade then, yeah? That's why you wouldn't pay any cost to make it 10x better, because there's no catastrophic consequence if it fails. But sometimes wasting a few milliwatts on average is the difference between success and failure.
Years of effort from a large team is worth something, as is the tens of thousands of dollars we're spending. We expect a return on that investment of data and mission success. We're spending a lot of money to improve odds of success.
But even in this power constrained application, a few milliwatts is nothing. (Nearly half the time, it's literally nothing, because we'd have to use power to run heaters anyways. Most of the rest of the time, we're in the sun, so there's a lot of power around, too). The marginal benefit to saving a milliwatt is zero, so unless the marginal cost is also zero we're not doing it.
> That aircraft was designed to be as efficient as possible, so we could literally measure the inefficiency from the choice of language in terms of how much it cost in extra battery weight and therefore decreased range
If this is a rotorcraft of some sort, that seems silly. It's hard to waste enough power to be more than rounding error compared to what large brushless motors take.
If you have enough power from the sun and enough compute, are you really that resource constrained?
Let me ask you, why do you think most real-time mission critical projects are not typically done in Python?
> If this is a rotorcraft of some sort, that seems silly. It's hard to waste enough power to be more than rounding error compared to what large brushless motors take.
It was a glider trying to fly as long as possible, so no motors, no solar power either. It got to the point that we could not even execute the motion planner fast enough in Matlab given the performance demands of the craft, we had to resort to Mex, and at that point we might as well have been writing in C. Which we did.
otoh When performance doesn't matter, it doesn't matter.
otoh When the title is "Speeding up Ruby" we are kind-of presuming it matters.
> My students are launching a satellite into low earth orbit that has its primary flight computer running python. Yes, sometimes this does waste a few hundred milliseconds
Never mind performance, would it not be good to at least machine check some static properties? A dynamic language is not a good choice for anything mission critical IMHO.
Python has had Mypy and Pyright since forever.
Even with those retrofits, it's still a language designed for maximum flexibility and maximum ease of use. This has trade offs with regard to reasoning for correctness.
What’s your point? That their type checks are incomplete?
That Python makes the wrong trade-offs for mission critical software. This goes beyond just lacking static types.
Which trade offs do you make when you opt in for full static typing? Except for performance.
It does help that the Python ecosystem sees C and Fortran as being "Python".
> people sort of chant "microbenchmarks are useless", but they aren't useless.
They might be !
(They aren't necessarily useless. It depends. It depends what one is looking for. It depends etc etc)
> You can't divide one language's microbenchmark on this test by another to get a "Python is 160x slower than C".
Sure you can !
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
— and —
Table 4, page 139
https://dl.acm.org/doi/pdf/10.1145/3687997.3695638
— and then one has — "[A] Python is 160x slower than C" not "[THE] Python is 160x slower than C".
Something multiple and tentative not something singular and definitive.
> Benchmarks are performance tests, not popularity tests.
But presumably they're meant to test something that matters. And the popularity suggests that what's being tested in this case doesn't.
> But "useless" is way too strong. No language is so wonderful on all other dimensions that it can have something as basic as a function call be dozens of times slower than some other language and yet keep up with that other language in general.
And yet Python does keep up with C in general. You might object that when a Python-based system outperforms a C-based system it's not running the same algorithm, or it's not really Python, and that would be technically true, but seemingly not in a way that matters.
> if you're dismissing these sorts of things, you're throwing away real data
Everything is data. The most important part of programming is often ignoring the things that aren't important.
very true.
Also, for a lot of the areas where languages like python or ruby aren't great choices because of performance, they would also not be great choices because of the cost of maintaining untyped code, or in python's case the cost of maintaining code in a language that keeps making breaking changes in minor versions.
Script with scripting languages, build other things in other languages
It seems odd to willfully ignore the Crystal language when discussing Ruby and speeding it up. Granted, macro semantics mean something else, more like C macros, but the general syntax and flow of Crystal is basically Ruby. https://crystal-lang.org/
Amber and Lucky are 2 mature frameworks to give Rails a run for their money, and Kemal is your Sinatra.
Crystal is not Ruby. Full stop. It is not useful to anyone with an existing Ruby code base.
Mentioning Crystal would be odd since it has nothing to do with the article.
Will these Crystal frameworks allow me to share a single standalone binary with peers that allows them run the web application locally?
As per this article[0] seems that crystal produces statically linked binaries, so I think the answer is yes.
[0] https://crystal-lang.org/2020/02/02/alpine-based-docker-imag...
Woah, what a luxury
It seems like it's been a while since I've seen one of these language benchmark things.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/... seems like the latest iteration of what used to be a pretty popular one, now with fewer languages and more self-deprecation.
> fewer languages
Maybe you've only noticed the dozen in-your-face on the home page?
The charts have shown ~27 for a decade or so.
There's another half-dozen more in the site map.
> a fun visualization of each language’s performance
The effect is similar to dragging a string past a cat: complete distraction — unable to avoid focusing on the movement — unable to extract any information from the movement.
To understand the measurements, cover the "fun visualization" and read the numbers in the single column data table.
(Unfortunately we aren't able to scan down the column of numbers, because the language implementation name is shown first.)
Previously: <blink>
https://developer.mozilla.org/en-US/docs/Glossary/blink_elem...
It does visualise how big the difference is though
Cover up the single column of lang/secs and then try to read how big the difference is between java and php from the moving circles.
You would have no problem doing that with a bar chart.
Cover the labels on the histogram and try to read how big the difference is between java and php....
We can read the relative difference from the length of the bars because the bars are stable.
I can see the relative difference in speed between the two balls.
"The first principle is that you must not fool yourself and you are the easiest person to fool."
:-)
PHP looks much slower
The question is: How much slower?
We could try to count how many times the java circle crosses left-to-right and right-to-left, in the time it takes for the PHP circle to cross left-to-right once.
That's error prone but should be approximately correct after a couple of attempts.
That's work we're forced to do because the "fun visualization" is uninformative.
That might be your question, but then you can look at the numbers. No chart will be as exact.
If only we could look at the numbers without the uninformative distraction.
I found the animation informative
> I found the animation informative
Java was so fast it glowed orange!
I wonder if the distraction of the animation actually makes people slower at reading the information that is in the text column.
The animation serves its purpose -- it grabs attention.
Dart - I see it mentioned (and perf looks impressive), but is it widely adopted?
Also, would have loved to see LuaJIT (interpreted lang) & Crystal (static Ruby-like language) included just for comparison's sake.
It looks like a more complete breakdown is here. Crystal ranks just below Dart at 0.5413 (Dart was 0.5295). Luajit was 0.8056. I'm surprised Luajit does worse than Dart. Actually I am surprised Dart is beating out languages like C# too.
Dart's VM was designed by the team (I think not just the one guy, but maybe I'm wrong on that and it really is just Lars Bak) that designed most of the truly notable VMs that have ever existed: Self, the Strongtalk Smalltalk VM, Java HotSpot, and JavaScript's V8. It also features an ahead-of-time compiler mode in addition to a world-class JIT and interpreter, allowing for hot reload during development.
https://en.m.wikipedia.org/wiki/Lars_Bak_(computer_programme...
It was stuck with a bad rep for being the language that was never going to replace JavaScript in the browser, and then was merely a transpiler no one was going to use, before it found a new life as the language for Flutter, which has driven a lot of its syntax and semantics improvements since, with built-in VM support for extremely efficient object templating (used by the reactive UI framework).
Maybe that dozen lines of code isn't sufficient to characterize performance differences?
Nearly 25 years ago, nested loops and fibs.
https://web.archive.org/web/20010424150558/http://www.bagley...
https://web.archive.org/web/20010124092800/http://www.bagley...
It's been a long time since the benchmarks game showed those.
This nested loops microbenchmark only measures in-loop integer division optimizations on ARM64 - there are ARM64-specific division fault handling differences that introduce significant variance between compilers of comparable capability.
On x86_64 I expect the numbers would have been much closer and within measurement error. The top half is within 0.5-0.59s - there really isn't much you can do inside such a loop, almost nothing happens there.
As Isaac pointed out in a sibling comment - it's best to pick specific microbenchmarks, a selection of languages and implementations that interest you and dissect those - it will tell you much more.
Runtime startup isn't amortized.
How do you know?
The methodology is documented in the link of the comment I responded to.
Perhaps you mean that "the methodology" does not include an explicit step intended to amortize runtime startup.
Perhaps the tiny tiny programs none-the-less took enough time that startup was amortized.
I wonder why C++ isn't in that list but a bunch of languages no one uses are.
Been using pure Dart since last year, it's a lovely language that has its quirks. I like it.
It's fast and flexible.
Have you used it for anything other than Flutter? I recently did a Flutter project and I'm interested in using dart more now.
Yes, that's what I meant with pure Dart. I've created cli's with it and a little api-only server.
This kind of benchmark doesn't make sense for Python because it measures the speed of pure code written in the language. However, and here is the important point, most Python code relies on compiled libraries to run fast. The heavy lifting in ML code is done in C, and Python is used only as a glue language. Even for web development this is also the case: Python is mostly calling a bunch of libraries, many of them written in C.
That's not true. Sure, many hot-path functions dealing with tensor calculations are done in NumPy, but ETL and args/results are Python objects and functions. And most web development libs are pure Python (Flask, Django, etc.).
For performance, hot paths are the only ones that matter.
Sure, but only a small subset of problems have a hot path. You can easily offload huge tensor operations to C. That's the best possible case. More usually the "hot path" is fairly evenly distributed through your entire codebase. If you offload the hot path to C you'll end up rewriting the whole thing in C.
> "hot path" is fairly evenly distributed
No, hot paths are seldom fairly evenly distributed, even on non-numeric applications. In most cases they will be in a small number of locations.
Not in my experience.
Yeah, this is a benchmark of recursion and tight loops doing integer math on array members. Nontrivial recursion is nonidiomatic in Python, and tight loops doing integer math on array members will probably be done via one of the many libraries that do one or more of optimizing, jitting, or move those to GPU (Numpy, Taichi, Numba, etc.)
aka Python is as fast as C when it is C.
Any language with FFI (which is like all of them, these days) has the same exact issue, the only difference being how common it is to drop into C or other fast compiled language for parts of the code.
And this kind of benchmark is the one that tells you why this is different across different languages.
I don't know. Ruby is able to call C too so it's a wash?
Yet this particular blog post shows how Ruby-written-in-Ruby is faster than Ruby-written-in-C because it's more optimizable.
Yes, if you pull out all the optimization tricks for Python, it will be faster than vanilla Python. And yet it's still 6x slower (by my measurement) than naive code written in a compiled language like Rust without any libraries.