A few weeks ago I mentioned the ncrunch comparison of "mathematical programs for data analysis" in a comment in another thread. There is now a new, 5th release of that review. The systems reviewed are:
The review is skewed towards statistical computation and data manipulation, but it includes several interesting comparisons of the major computer algebra systems (CAS).
There is a comparative performance section, and the worksheets used for that benchmarking are available for download. Here is the Maple worksheet, which was used with Maple 11.
The Maple code used in the performance benchmarking could be improved for some of the individual tests. Maple's fast numerics are better than what the report indicates.
So here's an idea: if you have an improved version for one of the individual performance calculations, then perhaps post it here as a blog item or comment. It might help if there were only one mapleprimes blog/thread per problem (or else it could get confusing). Such entries could have easily searched names, like "ncrunch prob. 11, fibonacci", etc. A good contender should be both fast and simple. A blog item could start off with the original code used in the report. Giving the timings -- as run on your own machine -- for both the original and any candidate improvement would be helpful.
The performance comparisons are with systems like Matlab and Scilab which can only really do double-precision calculation on their own (without a symbolic toolbox, say). So it should be fair to use Maple's evalhf, and hardware float[8] datatype Matrices, etc.
Memory usage was not part of the performance comparison. That's a weakness of the analysis, as it might be significant.
Note: The individual test labelled "2000x2000 normal distributed random matrix^1000" is actually an elementwise powering operation and not a matrix powering (matrix-matrix product) calculation. Checking the Maple, Mathematica, and Matlab code indicates that this is consistent across the sources.
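To make the distinction concrete, here is a minimal sketch on a small Matrix. The 2x2 values and the exponent 2 are chosen for illustration only and are not taken from the benchmark worksheet.

```maple
# Illustrative 2x2 example; the values are arbitrary, not from the benchmark.
A := Matrix([[1., 2.], [3., 4.]], datatype=float[8]):

# Elementwise powering: each entry is raised to the power separately.
map(x -> x^2, A);        # entries 1., 4., 9., 16.

# Matrix powering: matrix-matrix multiplication.
A . A;                   # entries 7., 10., 15., 22.
```

The elementwise form costs one scalar power per entry, while the matrix form is a sequence of matrix-matrix products, so the two are not comparable as benchmarks.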
Comments
ncrunch prob.13, Gamma over matrix
For problem 13, "Gamma function over a 1500x1500 matrix".
The original code,
was reported as taking about 10000 sec on their "Intel Quad Core Q6600 processor with 2.4 GHz and 2 GB RAM running under Windows Vista Home".
Replacing
LinearAlgebra:-Map(GAMMA,a):
with
map[evalhf](GAMMA,a):
makes it take only 66 sec on a single-CPU AMD Athlon64 3200+ under 64bit Linux with Maple 11.02.
Using
map[evalhf,inplace](GAMMA,a):
brought it to 59 sec.
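For anyone who wants to try the comparison without waiting on the full-size problem, a scaled-down sketch along these lines should do. The 300x300 size and the 1.0..2.0 entry range are assumptions chosen for illustration (the range keeps the entries away from the poles of GAMMA), not the benchmark's settings.

```maple
# Sketch only: a 300x300 Matrix keeps the slow variant tolerable.
# Entries are drawn from 1.0..2.0 (an assumption) to avoid the poles of GAMMA.
a := LinearAlgebra:-RandomMatrix(300, generator=1.0..2.0,
                                 outputoptions=[datatype=float[8]]):

# LinearAlgebra:-Map acts in place, so hand it a copy.
b := LinearAlgebra:-Copy(a):
t := time():
LinearAlgebra:-Map(GAMMA, b):
t_Map := time() - t;

# map[evalhf] evaluates GAMMA in hardware floats and returns a new Matrix.
t := time():
c := map[evalhf](GAMMA, a):
t_evalhf := time() - t;

# The two results should differ only at about the level of the working precision.
LinearAlgebra:-Norm(b - c, infinity);
```

Timings will of course vary by machine, platform, and Maple version.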
Dave Linder Mathematical Software, Maplesoft
Great, but
How on earth is a mere mortal supposed to come up with that map[evalhf] ? I thought myself fairly Maple knowledgeable, and yet I would never have thought of using that. Ever. Plus, isn't it LinearAlgebra:-Map's job to do this efficiently??? The help page for LinearAlgebra[Map] makes no mention that sometimes using map would be better/faster.
When did map[evalhf] appear? Maple 10. Great, so I go re-re-read the updates for Maple 10. No hint whatsoever (other than that this exists) that this is massively faster! It is in the 'language changes' section, but not in the 'efficiency' section of the What's New.
What's my point? The point is that if you expect Maple users to really benefit from your hard work, then you have to document it properly.
Speaking of which, the help page for map:
I agree
I agree with what you write, Jacques, about the importance of good documentation of the system (and its changes... and its updates...).
I was considering that (at least) for float[8] datatype Matrices and Vectors LinearAlgebra:-Map could reasonably call map[evalhf,inplace].
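As a rough sketch of that idea, a user-level wrapper could dispatch on the datatype. FastMap is a hypothetical name and not part of Maple, and this is not how LinearAlgebra:-Map is actually implemented.

```maple
# Hypothetical helper (not part of Maple): choose the mapping routine by datatype.
FastMap := proc(f, M::rtable)
    if rtable_options(M, 'datatype') = 'float[8]' then
        # Hardware-float path; like LinearAlgebra:-Map, this overwrites M.
        map[evalhf, inplace](f, M);
    else
        # General path for all other datatypes.
        LinearAlgebra:-Map(f, M);
    end if;
end proc:
```

A call such as FastMap(erf, a) would then take the fast path whenever a has datatype float[8]. This only helps when f is something evalhf can evaluate, so a real implementation would need a fallback for functions it cannot handle.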
Dave Linder Mathematical Software, Maplesoft
ncrunch prob.14, erf over matrix
For problem 14, "Gaussian error function over a 1500x1500 matrix"
The original code,
was reported as taking about 4800 sec on their "Intel Quad Core Q6600 processor with 2.4 GHz and 2 GB RAM running under Windows Vista Home".
Replacing
LinearAlgebra:-Map(erf,a):
with
map[evalhf](erf,a):
makes it take only 17 sec on a single-CPU AMD Athlon64 3200+ under 64bit Linux with Maple 11.02.
Using
map[evalhf,inplace](erf,a):
brought it to 10 sec.
Dave Linder Mathematical Software, Maplesoft
poor performance
4800 seconds is an hour and twenty minutes! That is some obscene overhead for what boils down to a simple type check + software float. Then I looked at `evalf/erf`. What is the overhead of try/catch ? It seems to catch a numeric exception for the sole purpose of returning the same error.
Here is something I wish Maplesoft would keep in mind (and I'm sure you agree): a factor of 300 slowdown makes otherwise useful software useless. And it shows. It shows badly whenever someone tries to do anything "modest", let alone big. The fact is, it took 512000 CPU cycles to evaluate each call to erf.
Now Maple is interpreted and there are some underlying issues, but that number is just not defensible. I think in the long run, it would make sense to know the cost of software float arithmetic and functions. We should know, approximately, the number of cycles per bit of precision. And it better not be more than 10000. Nobody is going to take this stuff seriously otherwise. Serious projects do this. They estimate the cost. Then you get predictable performance. It also helps you argue that the software is fast. It doesn't matter that Maple is interpreted. We should know the cost of procedure calls and all that stuff in cycles (approximately).
(edit: I calculated the CPU cycles wrong)
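For anyone wanting to redo this kind of estimate, here is a rough sketch. It assumes the reported 4800 sec ran on a single 2.4 GHz core and that every Matrix entry costs one call to erf. The number of passes over the 1500x1500 Matrix is left as a parameter, since the result depends directly on it. CyclesPerCall is a made-up name for illustration.

```maple
# Back-of-envelope estimate; 'passes' is an assumption about the timed loop.
CyclesPerCall := proc(seconds, clockHz, n, passes)
    # total cycles divided by the total number of function calls
    evalf(seconds * clockHz / (passes * n^2));
end proc:

CyclesPerCall(4800, 2.4e9, 1500, 1);     # one pass:   about 5.12e6 cycles per call
CyclesPerCall(4800, 2.4e9, 1500, 100);   # 100 passes: about 5.12e4 cycles per call
```

The estimate moves by orders of magnitude with the assumed pass count, so it is worth stating that assumption alongside any cycles-per-call figure.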
Why so much slower here?
Just out of curiosity, why is the above code so much slower on my laptop (Core2 2.33GHz, 2GB RAM, 32bit WinXP)?
restart;
Digits := trunc(evalhf(Digits));
with(LinearAlgebra):
TotalTime:=0:
for i from 1 by 1 to 100 do
    a:=RandomMatrix(1500,density=0.5,generator=0.0..1.0,outputoptions=[datatype=float[8]]):
    t:=time():
    #Map(erf,a):
    map[evalhf,inplace](erf,a):
    TotalTime:=TotalTime+time()-t:
end do:
print(TotalTime);
yields 49.266 sec. As the processors should be roughly comparable, I wonder if it's the 64bit version that gives you a speed-up of more than a factor of 2!?
yes
I just tried it and yes, you're right. It's 10sec with 64bit Maple (Linux) and 42sec with 32bit Maple (Linux), run on the same machine and OS.
Sorry, I missed that. So only about 100 times faster than 4800 sec, then.
By the way, I have read here that you are interested in solving problems with high performance. Have you seen this note about OMP_NUM_THREADS? If you set that to the value 2 as a WinXP environment variable on your Core2 Duo (and reboot to get it to take effect) then you may see improvement in the hardware floating-point linear algebra examples. It could allow the MKL used by Maple to run two threads. But you'd have to measure wall clock time, since Maple's time() command will add the times taken by both threads, making it appear as if there's no difference.
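A minimal sketch of measuring both kinds of time follows. It assumes a Maple release in which time[real]() is available (older releases may not have it), and the 2000x2000 size is just an illustrative choice.

```maple
# Assumes a Maple release that provides time[real](); older releases may not.
getenv("OMP_NUM_THREADS");          # the setting as seen by the Maple session

A := LinearAlgebra:-RandomMatrix(2000, generator=0.0..1.0,
                                 outputoptions=[datatype=float[8]]):
cpu0  := time():                    # CPU time
wall0 := time[real]():              # elapsed (wall clock) time
B := A . A:                         # hardware-float matrix product
cpu  := time() - cpu0;
wall := time[real]() - wall0;
```

If two threads are really in use, one would expect wall to come out noticeably smaller than cpu for a large enough product.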
Dave Linder Mathematical Software, Maplesoft
Interesting to know
It is interesting to know that 32bit vs 64bit really makes such a difference. I actually didn't expect that...
I read your note www.mapleprimes.com/blog/dave-l/blas-in-maple#comment-7274
and I already use OMP_NUM_THREADS=2. But there you also mention the 'unofficial' upgrade to BLAS (under Linux). Is there a special reason why this should not be possible under Windows? If not, how could I use BLAS as the standard library in Maple? By the way, thanks for all these nice insider tricks! :-)
re: Interesting to know
In that note I was considering replacing the ATLAS BLAS that are already bundled with Maple. That might be relevant if one had an 8-core machine and was running Linux.
There is no dynamic mechanism in the shipped ATLAS binaries to handle varying the number of cores/cpus that get used: it's set and compiled in as a fixed parameter. Maple doesn't ship with an 8-core/cpu optimized ATLAS on Linux, so if one had such a machine then dropping in a replacement might be desirable. But on Windows with MKL it might be dynamically set, using that OMP_NUM_THREADS environment variable.
One cannot simply drop into Maple an updated set of .dll's to MKL. The bits that use it must be linked against it specifically. But there's less reason to want to do so, because of the above.
Dave Linder Mathematical Software, Maplesoft
Ok, thanks for clarifying.
Ok, thanks for clarifying. As you mentioned before, it might anyway be more worthwhile to identify the real Maple bottlenecks in one's code.
ncrunch prob.11, fibonacci numbers
For problem 11, "Calculation of 10,000,000 fibonacci numbers"
Original code:
with(combinat): with(LinearAlgebra):
frnd:=rand(100..1000):
A := [seq(frnd(),i=1..10000000)]:
time(evalf(Map(a->fibonacci(a), A)));
was reported as taking about 821.140 sec on their "Intel Quad Core Q6600 processor with 2.4 GHz and 2 GB RAM running under Windows Vista Home".
It took 1180 seconds on my 2.13GHz Core2 machine.
Since floating-point versions of the Fibonacci numbers are being asked for, it doesn't make sense to compute them exactly and then convert to floats. It looks like all the other platforms are computing them numerically:
frnd:=rand(100..1000):
A := Vector([seq(frnd(),i=1..10000000)], datatype=float[8]):
f:=subs(is5=evalhf(1/sqrt(5)),phi =evalhf((1+sqrt(5))/2), a->((phi^a-(-phi)^(-a))*is5));
time(map[evalhf](f, A));
takes 18.38s on my machine.
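As a quick sanity check that the closed form matches the exact values, something like the following sketch can be run on a few small indices. CheckBinet is a hypothetical helper name, and the indices 10, 20, 30 are arbitrary.

```maple
# Hypothetical helper: compare the closed form with combinat:-fibonacci for one n.
CheckBinet := proc(n)
    local phi, is5;
    phi := evalhf((1 + sqrt(5))/2);     # golden ratio as a hardware float
    is5 := evalhf(1/sqrt(5));
    # returns [n, exact value, floating-point closed form]
    [n, combinat:-fibonacci(n), (phi^n - (-phi)^(-n))*is5];
end proc:

seq(CheckBinet(n), n = [10, 20, 30]);
```

The exact values for those indices are 55, 6765, and 832040, and the floating-point column should agree with them up to rounding. Since the 1000th Fibonacci number is about 4.35e208, every value in the benchmark's range 100..1000 fits in a hardware double, though only the leading 15 or so digits are meaningful.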
John May
Mathematical Software
Maplesoft
tweak
While it isn't part of the comparison, we might as well use a more efficient way to generate the initial vector:
However, I doubt randomizing these has any significant effect on the actual measurement since we expect each integer to appear some 11,000 times.
Stupid question
Has someone gathered up all these improvements and sent them to the author of ncrunch? This seems like a worthwhile endeavour, in fact worthwhile enough for a corporate type to do it, rather than have an 'outsider' do it. I am sure that particular review has sold a lot of Mathematica and may well have sunk some Maple sales.