Some years ago, before the advent of the Statistics package, a colleague asked for a fast way to generate thousands of normally distributed random numbers in Maple. The suggestion that worked quickest and most easily (using existing, simple Maple Library routines) was to generate the random deviates with the usual formula associated with the distribution, but to replace the scalar values (representing the uniformly distributed input) with a whole Matrix of input values. That way, all the random deviates get generated at the same time. The reason it was fast is that all the arithmetic was done not only in precompiled external functions, but in fast BLAS functions tuned for L2 cache size, SIMD chipset extensions, and so on.
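For concreteness, here is a minimal sketch of that trick. It assumes the Box-Muller transform as the "usual formula", and a Maple release with the elementwise ~ operators (in older releases, zip over a pair of Matrices does the same job):

> # uniform input deviates, as whole hardware-float Matrices
> U1 := LinearAlgebra:-RandomMatrix(1000,1000,'generator'=0.0..1.0,
>     'outputoptions'=['datatype'='float[8]']):
> U2 := LinearAlgebra:-RandomMatrix(1000,1000,'generator'=0.0..1.0,
>     'outputoptions'=['datatype'='float[8]']):
> # Box-Muller: sqrt(-2*ln(u1))*cos(2*Pi*u2) is a standard normal deviate
> # when u1 and u2 are uniform on (0,1); here the formula is applied to
> # a million entries at once
> Z := sqrt~(-2.0 * ln~(U1)) *~ cos~(evalf(2*Pi) * U2):

The point is that the arithmetic gets driven over the whole float[8] Matrix in a handful of calls, rather than deviate by deviate at the interpreted level.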

The lesson there was that offloading repetitive floating-point arithmetic to LinearAlgebra can bring about a speed-up.

So, I was looking at the "principal component analysis" section of the Maple code used in this software performance review. I noticed that a rather large part of the time was being spent inside Statistics:-CovarianceMatrix. But an earlier part of that code had already set up the columns of the input Matrix and subtracted the column means, to get what it called centered data. It turns out that getting the covariance matrix from that is as easy as doing lots of pairwise dot products on the centered data. With the Vectors as columns of a Matrix M, this amounts to computing Transpose(M).M, scaled by 1/m where m is the number of rows. It then became apparent that this alternative way is much faster than CovarianceMatrix. The reason appears to be that CovarianceMatrix computes the individual entries one at a time, within a double loop of interpreted Library code. It breaks the original Matrix data up into columns, so as to pass them in pairs to an external routine which computes just a single entry. With a Matrix-Matrix multiplication, all of that arithmetic can instead benefit from a single tuned level-3 BLAS call.
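To spell out the identity: once each column has had its mean subtracted, entry (i,j) of the covariance matrix is just the dot product of centered columns i and j divided by m, which is exactly entry (i,j) of Transpose(M).M/m. Here is a tiny illustration (the Matrix A is made up purely for the example):

> A := Matrix([[1.0, 2.0], [3.0, 5.0], [5.0, 4.0]], 'datatype'='float[8]'):
> m := 3:
> C := copy(A):                      # copy, then center each column
> for j from 1 to 2 do
>     mu := Statistics:-Mean(A[1..m, j]);
>     C[1..m, j] := map(x -> x - mu, C[1..m, j]);
> end do:
> # a single pairwise dot product ...
> LinearAlgebra:-DotProduct(C[1..m, 1], C[1..m, 2]) / m;
> # ... agrees with entry (1,2) of the one Matrix-Matrix product
> LinearAlgebra:-Transpose(C) . C / m;

The same 1/m scaling is what the Cov procedure below uses.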

As can be seen below, the speed-up increases with the size of the input data.

> kernelopts(printbytes=false):      # suppress the "bytes used" messages
> Digits := trunc(evalhf(Digits)):   # work at hardware double precision

> Cov := proc(M)
> local i, m, n, c, means, tempV, res;
>     # ensure at least hardware double precision
>     Digits := max(Digits, trunc(evalhf(Digits)));
>     m, n := op(1, M);
>     # hardware-float working copy of the input data
>     c := Matrix(m, n, M, 'datatype' = 'float[8]');
>     # compute the mean of each column
>     for i from 1 to n do
>         means[i] := Statistics:-Mean(M[1 .. -1, i]);
>     end do;
>     # center the data: overwrite each column of c with (column - mean),
>     # writing straight back into c's column-major float[8] storage
>     tempV := Vector(m, 'datatype' = 'float[8]');
>     for i from 1 to n do
>         ArrayTools:-Fill(m, means[i], tempV, 0, 1);
>         LinearAlgebra:-LA_Main:-VectorAdd(tempV, c[1 .. m, i], -1, 1,
>             'inplace' = true, 'outputoptions' = []);
>         ArrayTools:-Copy(m, tempV, 0, 1, c, (i - 1) * m, 1);
>     end do;
>     # all the pairwise dot products in one level-3 BLAS call
>     res := LinearAlgebra:-Transpose(c) . c;
>     LinearAlgebra:-LA_Main:-MatrixScalarMultiply(res, evalf(1/m),
>         'inplace' = true, 'outputoptions' = []);
> end proc:

> M := LinearAlgebra:-RandomMatrix(200,200,'generator'=0.0..1.0,
> 'outputoptions'=['datatype'='float[8]']):
> st:=time(): cv1 := Cov(M): time()-st;
0.099

> st:=time(): cv2 := Statistics:-CovarianceMatrix(M): time()-st;
0.978

> LinearAlgebra:-Norm(cv1-cv2);
0.244463449456484502e-14

> M := LinearAlgebra:-RandomMatrix(10000,200,'generator'=0.0..1.0,
> 'outputoptions'=['datatype'='float[8]']):
> st:=time(): cv1 := Cov(M): time()-st;
1.078

> st:=time(): cv2 := Statistics:-CovarianceMatrix(M): time()-st;
22.094

> LinearAlgebra:-Norm(cv1-cv2);
0.126576304945475327e-14
