For problem 14, "Gaussian error function over a 1500x1500 matrix"

The original code,

with(LinearAlgebra):
TotalTime:=0:
for i from 1 by 1 to 100 do
a:=RandomMatrix(1500,density=0.5,generator=0.0..1.0,outputoptions=[datatype=float[8]]):
t:=time():
Map(erf,a):
TotalTime:=TotalTime+time()-t:
end do:
print(TotalTime);

was reported as taking about 4800 sec on their "Intel Quad Core Q6600 processor with 2.4 GHz and 2 GB RAM running under Windows Vista Home".

Replacing

LinearAlgebra:-Map(erf,a):

with

map[evalhf](erf,a):

makes it take only 17 sec on a single-CPU AMD Athlon64 3200+ under 64bit Linux with Maple 11.02.

Using

map[evalhf,inplace](erf,a):

brought it to 10 sec.
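
For reference, here is the whole benchmark with the fastest variant substituted in; it is the same loop as above, with only the Map line changed:

with(LinearAlgebra):
TotalTime := 0:
for i from 1 by 1 to 100 do
  a := RandomMatrix(1500, density=0.5, generator=0.0..1.0,
                    outputoptions=[datatype=float[8]]):
  t := time():
  # evalhf maps erf in hardware floats; inplace avoids allocating a new Matrix
  map[evalhf,inplace](erf, a):
  TotalTime := TotalTime + time() - t:
end do:
print(TotalTime);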

Dave Linder
Mathematical Software, Maplesoft

For problem 13, "Gamma function over a 1500x1500 matrix".

The original code,

with(LinearAlgebra):
TotalTime:=0:
for i from 1 by 1 to 100 do
a:=RandomMatrix(1500,generator=0.01..1.0,outputoptions=[datatype=float[8]]):
t:=time():
Map(GAMMA,a):
TotalTime:=TotalTime+time()-t:
end do:
print(TotalTime);

was reported as taking about 10000 sec on their "Intel Quad Core Q6600 processor with 2.4 GHz and 2 GB RAM running under Windows Vista Home".

Replacing

LinearAlgebra:-Map(GAMMA,a):

with

map[evalhf](GAMMA,a):

makes it take only 66 sec on a single-CPU AMD Athlon64 3200+ under 64bit Linux with Maple 11.02.

Using

map[evalhf,inplace](GAMMA,a):

brought it to 59 sec.

Dave Linder
Mathematical Software, Maplesoft

Take, for example, this Matrix U.

restart:
U := Matrix([[a,0,0],[0,0,0],[0,0,c]]);
with(LinearAlgebra):
Evals := Eigenvalues(U);
U1 := U - Evals[1]*IdentityMatrix(3):

Now apply that "trick" of dropping the bottom row of U1. If I've got the wrong idea from your post then please let me know.

V1 := LinearSolve(U1[1..2,1..3], Vector(2));
U.V1 - Evals[1].V1; # not all zero, so not right
That V1 doesn't look like the right parametrization of the eigenspace associated with the first eigenvalue. (Unless _t[3] is set to zero, but there is no mention of such a clever assignment in the trick as described. And how would one deduce that, anyway?)
But this looks fine.

V1_1 := LinearSolve(U1, Vector(3));
U.V1_1 - Evals[1].V1_1;
The trick of removing the last row of the Characteristic Matrix (evaluated at an eigenvalue) looks dubious to me.
As for whether using LinearSolve or NullSpace is better, that depends on what form one wants the eigenvectors in. If you want a basis of the eigenspace (like Eigenvectors returns in columns) as separate Vectors then use NullSpace. If you want a parametrized sum, in the case of an eigenspace with more than one Vector in its basis, then use LinearSolve.
The above is all general, with nothing about those side relations.
If you want to avoid mistakenly using as a pivot some entry which is (actually, or under some side relations) zero, then setting Testzero or Normalizer should work with each of these.
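
For instance, a common way to set those is the following (this is just a sketch; the particular simplification to use would depend on your side relations):

# treat anything that simplifies to zero as zero, when testing pivots
Testzero := proc(x) evalb(simplify(x) = 0) end proc:
# and, similarly, use a stronger normalization during elimination
Normalizer := simplify: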

LinearSolve(U1, Vector(3), method=LU);
LinearSolve(U1, Vector(3));
NullSpace(U);

The fact that using Normalizer (to include simplification with side relations) together with Eigenvectors doesn't always work so well is something that could be investigated.
Dave Linder
Mathematical Software, Maplesoft

In the FPS system of units, the default unit for length is the foot (ft). That's what the F in FPS stands for.
You can create and use a new system, where inches is used instead of feet.
Here's an example, where inches is used for 1-dimensional length.

restart:
with(Units[Natural]):
Units[AddSystem](FPS_with_inches, Units[GetSystem](FPS), inch);
Units[UseSystem](FPS_with_inches):
85000*pounds/inch^2;
36*lbs/ft;
Dave Linder
Mathematical Software, Maplesoft

Thanks, that's useful, to have something so concrete to consider.
What sizes (N, rho, d, trace_list) are most important? Do you expect to call the routines many, many times at small sizes, or fewer times at larger sizes?
Does your profiling show relatively how much overhead is within Feynman_permutation_matrix() and Permutation_matrix(), for Partial_trace()?
In Parametrize_SU_Euler(), the final loop looks heavy in terms of how many new Matrices get created (eventual garbage). One possibility might be to re-use Matrix temp each time through it (using ArrayTools:-Copy to get lambda[used_lambdas[k]] into it, and then scale it in-place). Hmm. Does that iterated scaling & matrix exponential simplify mathematically, I wonder? There are, what, N^2-1 matrix exponential calls?
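
A rough sketch of that re-use idea (an untested outline; I'm taking names like lambda, used_lambdas, and the scaling coefficients theta from your code as I understand it):

temp := Matrix(N, N, datatype=complex[8]):
result := LinearAlgebra:-IdentityMatrix(N):
for k to nops(used_lambdas) do
  # copy the generator into the workspace, then scale it in-place,
  # so no new Matrix is allocated inside the loop
  ArrayTools:-Copy(lambda[used_lambdas[k]], temp);
  LinearAlgebra:-MatrixScalarMultiply(temp, theta[k], inplace=true);
  result := result . LinearAlgebra:-MatrixExponential(temp);
end do: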
It might be worthwhile to look at the looping which creates used_lambdas. Interesting.
If there is just too much overhead (function calls, garbage collection, etc), how low-level would you be willing to go? For example, would you consider examining and re-using the matrix-exponential Library internal routine, altered to re-use workspace Matrices? How about direct BLAS calls from within Maple, instead of LinearAlgebra routines? Of course, those would be last resorts, if no other big improvements could be made.
Dave Linder
Mathematical Software, Maplesoft

Jacques' comments here are not off the mark. Working with Matrix blocks might be said to straddle an area between computational and abstract linear algebra (ALA). And ALA is a big area, with a number of challenging design issues, not the least of which would be to have it usefully mesh with the current computational functionality of LinearAlgebra.
I find it interesting that theoretical physicists have been showing the most interest in this area. We do read MaplePrimes content. And it's very helpful, adding insight for us on what users want to do.
Dave Linder
Mathematical Software, Maplesoft

Thanks for that bit of perspective, Jacques.
Dave Linder
Mathematical Software, Maplesoft

I look forward to seeing any of your results in this area.
An issue for me is that, right now, Sample(X,N) returns a Vector with either hardware float or software float datatype, according to UseHardwareFloats and Digits. The same is true for Sample(X), which I believe returns a procedure that produces a Vector whose datatype is determined at invocation time. That is, it isn't fixed at the time that the procedure output of Sample(X) is created.
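
To illustrate that first point, a small sketch (the reported datatypes depend on your UseHardwareFloats and Digits settings, so the comments only indicate what I'd expect):

X := Statistics:-RandomVariable(Normal(0,1)):
V := Statistics:-Sample(X, 5):
rtable_options(V, datatype);   # should be float[8] at default settings
Digits := 20:
W := Statistics:-Sample(X, 5):
rtable_options(W, datatype);   # software datatype, once Digits exceeds the hardware range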
But Statistics:-ExternalSupport:-DefineExternal("MapleNormalRandomSample") produces a procedure with a different characteristic: it returns Vectors of only a single datatype, and that datatype is determined when the procedure is created, not when it is called.
For example,

> kernelopts(opaquemodules=false):
> Statistics:-ExternalSupport:-Initialize():
> p_hw := Statistics:-ExternalSupport:-DefineExternal("MapleNormalRandomSample");
p := proc()
option call_external, define_external("STATS_MapleNormalRandomSample_HW",
MAPLE, LIB = "libstatshw.so");
call_external(0, 182945245392, true, args)
end proc
> Digits:=20:
> p_sw := Statistics:-ExternalSupport:-DefineExternal("MapleNormalRandomSample");
p := proc()
option call_external, define_external("STATS_MapleNormalRandomSample_MP",
MAPLE, LIB = "libstatsmp.so");
call_external(0, 182955154176, 0, args)
end proc

So routines p_hw and p_sw behave differently, each static w.r.t. the datatype and environmental settings like Digits and UseHardwareFloats. Statistics:-Sample(X,N), on the other hand, changes behaviour according to those settings.
Emitting a lean procedure, with as little argument checking and as few optional parameters as possible, but which also respects UseHardwareFloats and Digits on the fly, well, that's very close to what Sample(X) now returns.
P.S. Actually, it's a little bit weird that the software Vectors have datatype=anything instead of datatype=sfloat, but perhaps that's an attempt to be more flexible. It shouldn't affect these discussions.
Dave Linder
Mathematical Software, Maplesoft

Hi John,
We've been working on a KroneckerProduct routine. The two variations you just gave here, entered with exactly the same syntax, both produce the same result in a development version of Maple.
Dave Linder
Mathematical Software, Maplesoft

I'm trying to understand how close the following distinction comes to these ideas:
X := Statistics:-RandomVariable(Normal(0,1));
# numeric vector return, with overhead
V := Statistics:-Sample(X,5);
# or...
G := Statistics:-Sample(X);
# numeric vector return, with less overhead
V := G(5);
There is still some overhead in the returned procedure G. Consider this variant,
kernelopts(opaquemodules=false):
Statistics:-ExternalSupport:-Initialize();
p := Statistics:-ExternalSupport:-DefineExternal("MapleNormalRandomSample");
p(5,0,1);
But it seems that Statistics:-ExternalSupport:-DefineExternal will behave according to UseHardwareFloats and Digits. So an even leaner "trampoline" sort of reassignment might be tricky.
Dave Linder
Mathematical Software, Maplesoft
