Being unable to completely give up math for computers, I am naturally drawn to Project Euler and as a result solved a ridiculous number of the problems posted there while learning Python. A few months ago (March 13), after reading Secrets of a Javascript Ninja, I decided to begin converting my solutions to Javascript. A month and a half later I came back to it, and then finally two months after that, I began to take it seriously.
After making this decision, I noticed the prime Sieve of Eratosthenes
was mighty fast when I ran it in Chrome, maybe even faster than my
beloved Python. I tabled the thought for a little, but never really
forgot it. So a few weeks ago (early August 2011), I finally got a
working install of node running on my machine and
was able to make more of this thought. (I say finally installed
because on two previous tries I gave up because of conflicts with my
version of gcc
, coupled with the fact that I had no good reason to use
node
.)
When I originally did the conversion, I had skipped problem 8, because
my implementation required pulling in the problem data as text from a
file. While hanging out with Boris and
Eric from on the Chrome Developer
Relations team, I decided to give it another go on node (xhr requests
not allowed) and found it to be quite simple with readFileSync
in the
in the node native fs
module. After witnessing this, over this weekend,
I decided to harness the power of V8 —
the Javascript engine that powers Chrome and node — and run all my
scripts locally with node. So over a two day period, I
hack-hack-hacked
my way into converting the Python solutions for problems 11 through 50
(the remaining unconverted) into their Javascript equivalents, while
also converting a good portion of my hefty
functions
module.
Once this was done, I had also found I could replace most of the nice
parts about Python with my own equivalent. For example, I was able to
replace functionality I needed from the Python set
datatype with
function uniq(arr) {
var result = {};
for (var i = 0, val; val = arr[i]; i++) {
result[val] = true;
}
return Object.keys(result);
};
and I was able to replace the (amazingly) useful Python handling of long
integers with a non-native node package called
bigint that uses libgmp
.
Of course, for Python's secret sauce — the list comprehension —
I was able to substitute enough filter
, reduce
, and map
statements to
almost make it seem like I had never left Pythonland. After doing all this,
I also ended up writing my own
operator.js
to replace the wonderful Python native module
operator, and my own
timer.js
to stand in for the Python native module
time.
Finally, I had working code and could do a side by side comparison of V8 and the Python interpreter.
Update: I added a column for PyPy, a just in time implementation of Python.
Here is what I found (averaging the runtime over 10 separate calls to each function, the results are):
Problem | Answer | Python | Javascript | Ratio (PY/JS) | PyPy |
---|---|---|---|---|---|
1\* | 233168 | 1804ms | 1215ms | 1.48 | 385ms |
2\* | 4613732 | 247ms | 102ms | 2.42 | 85ms |
3\* | 6857 | 4725ms | 1508ms | 3.13 | 582ms |
4 | 906609 | 8708ms | 149ms | 58.44 | 282ms |
5\* | 232792560 | 136ms | 186ms | 0.73 | 114ms |
6\* | 25164150 | 10ms | 4ms | 2.50 | 6ms |
7 | 104743 | 656ms | 12ms | 54.67 | 11ms |
8\* | 40824 | 18045ms | 5014ms | 3.60 | 7042ms |
9 | 31875000 | 610ms | 3ms | 203.33 | 8ms |
10 | 142913828922 | 6628ms | 167ms | 39.69 | 116ms |
11 | 70600674 | 49ms | 2ms | 24.50 | 11ms |
12 | 76576500 | 5127ms | 203ms | 25.26 | 100ms |
13\* | 5537376230 | 1795ms | 10710ms | 0.17 | 1423ms |
14 | 837799 | 5572ms | 1712ms | 3.25 | 362ms |
15\* | 137846528820 | 54ms | 18ms | 3.00 | 55ms |
16\* | 1366 | 1844ms | 265ms | 6.96 | 462ms |
17 | 21124 | 87ms | 4ms | 21.75 | 7ms |
18\* | 1074 | 2291ms | 1790ms | 1.28 | 1090ms |
19\* | 171 | 2254ms | 336ms | 6.71 | 342ms |
20\* | 648 | 1061ms | 9154ms | 0.12 | 374ms |
21 | 31626 | 18910ms | 1038ms | 18.22 | 728ms |
22 | 871198282 | 188ms | 7ms | 26.86 | 8ms |
23 | 4179871 | 83318ms | 1120ms | 74.39 | 1295ms |
24\* | 2783915460 | 206ms | 210ms | 0.98 | 139ms |
25 | 4782 | 5865ms | 35ms | 167.57 | 232ms |
26 | 983 | 28ms | 18ms | 1.56 | 4ms |
27 | -59231 | 645738ms | 22536ms | 28.65 | 28288ms |
28\* | 669171001 | 8509ms | 1037ms | 8.21 | 981ms |
29 | 9183 | 184ms | 96ms | 1.92 | 20ms |
30 | 443839 | 52167ms | 1037ms | 50.31 | 877ms |
31 | 73682 | 9606ms | 257ms | 37.38 | 154ms |
32 | 45228 | 206888ms | 12096ms | 17.10 | 4266ms |
33 | 100 | 300ms | 6ms | 50.00 | 15ms |
34 | 40730 | 7462ms | 2447ms | 3.05 | 247ms |
35 | 55 | 8617ms | 848ms | 10.16 | 242ms |
36 | 872187 | 189788ms | 2183ms | 86.94 | 3532ms |
37 | 748317 | 2389022ms | 71845ms | 33.25 | 61551ms |
38 | 932718654 | 506ms | 10ms | 50.60 | 12ms |
39 | 840 | 178ms | 6ms | 29.67 | 12ms |
40\* | 210 | 326ms | 202ms | 1.61 | 119ms |
41 | 7652413 | 2627ms | 133ms | 19.75 | 65ms |
42 | 162 | 65ms | 7ms | 9.29 | 8ms |
43 | 16695334890 | 38ms | 2ms | 19.00 | 2ms |
44 | 5482660 | 384013ms | 27744ms | 13.84 | 6621ms |
45\* | 1533776805 | 17ms | 4ms | 4.25 | 8ms |
46 | 5777 | 2864ms | 202ms | 14.18 | 65ms |
47 | 134043 | 400967ms | 12838ms | 31.23 | 4425ms |
48 | 9110846700 | 46ms | 16ms | 2.88 | 6ms |
49 | 296962999629 | 115ms | 8ms | 14.38 | 13ms |
50 | 997651 | 3277ms | 80ms | 40.96 | 51ms |
\*These were very quick to run, so the runtimes are the time taken to run 10000 times. |
As you'll notice, standard Python gets its butt kicked. I was kind of
saddened by this, but in the end, just giddy that our web is faster
because of it (90% of my life is digital) and also that we can do
scripting faster on the server side (attribute to
Boris Smus) because of the node
project
(thanks Ryan Dahl).
Standard Python is actually slower in 46 of the 50 problems. In 28 of the 46 node is faster by a factor of 10 or greater, in 9 of those 28 by a factor of 50 or greater and in 2 of the 9 by a factor of 100 or greater! The only 4 in which Python was faster were from the n = 10000 sample. In fact, I was able to pinpoint exactly why:
- #5 - My own Javascript implementation of
gcd
is slower than the native (from fractions import gcd
) Python library (resulting in a difference of 50 ms over 10000 iterations) - #13 - The node package
bigint
is slower than the Python nativelong int
(Javascript is slower by a factor of 6) - #20 - The node package
bigint
is slower than the Python nativelong int
(Javascript is slower by a factor of 8.5) - #24 - Having to perform two
slice
s is slower in Javascript than in Python and there is no good way to just remove one element (resulting in a difference of 4 ms over 10000 iterations; a little bit about that here)
So what, you ask, is that lesson I speak of? Well, Javascript didn't used to be this fast. How did it get that way? The brilliant and inspired people behind V8 rethought the Javascript compile steps and after much work, we now have code that is closer to the metal (attribute to: Eric Bidelman, i.e. closer to machine code) than we had ever had before. The use of just-in-time compilation and other incredible techniques has taken a formerly slow and clunky language (Javascript) which was used well beyond its original intended scope, and turned it into a damn fast dynamic language. Hopefully, this movement will make its way to Python and other dynamic languages and we can all have our code end up this close to the metal.
Update 1:
In response to the comments, I ran the same code on the
same machine, but with PyPy in place of Python. This is the direction I
hope standard Python goes in and commend the guys pumping out faster and
faster just in time implementations over at PyPy. I went through and
counted 20 node wins, 29 PyPy wins and 1 tie. (I reserve the right to
update the post with more detailed metrics.) While I do commend them,
the results don't really belong in this post because PyPy is still an
offshoot. (However, as I understand, they both have ties to C, as PyPy
uses gcc
to compile to bytecode and V8 is written in C++. Feel free to
supplement my knowledge in the comments.)
Update 2:
All benchmarking was run on my Mac Pro Desktop with a 3.2 GHz Quad-Core Intel Xeon processor and 4 cores for a total of12 GB 1066 MHz DDR3 memory. I used Python version 2.6.1, node version 0.4.9, and PyPy version 1.5 (running on top of Python 2.7.1 with GCC 4.0.1).