Dave Linder

527 Reputation

19 years, 287 days
Maplesoft
Software Architect

Dave Linder
Mathematical Software, Maplesoft

Sample...

I'm trying to understand how close the following distinction comes to these ideas:

X := Statistics:-RandomVariable(Normal(0,1));
# numeric Vector return, with overhead
V := Statistics:-Sample(X,5);
# or...
G := Statistics:-Sample(X);
# numeric Vector return, with less overhead
V := G(5);

There is still some overhead in the returned procedure G. Consider this variant:

kernelopts(opaquemodules=false):
Statistics:-ExternalSupport:-Initialize();
p := Statistics:-ExternalSupport:-DefineExternal("MapleNormalRandomSample");
p(5,0,1);

But it seems that Statistics:-ExternalSupport:-DefineExternal will behave according to UseHardwareFloats and Digits. So an even leaner "trampoline" sort of reassignment might be tricky.

Dave Linder
Mathematical Software, Maplesoft
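The "return a sampling procedure so the setup cost is paid once" pattern above can be sketched outside Maple as well. This is only an illustration in Python, not how Statistics is implemented; the names here (make_sampler) are inventions for the sketch:

```python
import random

# Sketch of the pattern above: make_sampler plays the role of
# Statistics:-Sample(X) returning a procedure G, so each later call
# pays only for the generation loop (illustration only).
def make_sampler(mu=0.0, sigma=1.0):
    gauss = random.gauss          # one-time setup / lookup
    def sample(n):
        return [gauss(mu, sigma) for _ in range(n)]
    return sample

G = make_sampler()
V = G(5)                          # five approximately N(0,1) samples
```

The point of the pattern is the same in both languages: repeated calls to G(n) skip the per-call argument processing that a fresh top-level call would redo.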

understanding...

Often the replies of Doug and Georgios show that they have made an extra effort, both to understand what the original submitters were after and to illustrate their solutions. Good choices!

Dave L

ps. I have no part in the mentor awards selection process.

Hi Jacques,

Along with this other reply to an earlier post, you seem to be hinting that we could do much better with the dispatching in the Statistics package. We know that we should get rid of as much use of the Library-level ProcessOptions package as possible, going instead with the newer, more efficient parameter processing now in the kernel (see ?paramprocessing). It gets a little complicated because, for example, routines like Sample can produce either numeric Vectors or procedures that generate the same. Also, Statistics is a complicated package. Even amending Sample so that it can accept a Vector argument, for in-place re-use, is a bit involved.

I didn't see an example of statistical computations in the preprint that you cited here. May I ask whether your MapleMIX system already has examples related to what Maple's Statistics does?

best regards,
Dave Linder
Mathematical Software, Maplesoft

ping Dave L

zgemm (Linux)...

I happened to already have a code fragment for this idea. The example below does 32x32 complex Matrix^%H . Matrix multiplication 10000 times. It only works on Linux, but it shows right away the idea of using zgemm (f06zaf) directly. The benefits are avoiding the re-creation of Matrix rho at each iteration, and avoiding an actual transposition. The speedup for this example is about a factor of 2.

> # benchmark
> kernelopts(printbytes=false):
> with(LinearAlgebra):
>
> (st,ba,bu) := time(),kernelopts(bytesalloc),kernelopts(bytesused):
> (maxiter,N,d) := 10000,5,2:
> temp := RandomMatrix(d^N,d^N,outputoptions=[datatype=complex[8]]):
> for i from 1 to maxiter do
>   rho := LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(
>     LinearAlgebra:-LA_Main:-HermitianTranspose(temp,inplace=false,
>       outputoptions=[]),
>     temp,inplace=false,outputoptions=[]);
> od:
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;

        9.768, 4652204, 576723804

> # Using zgemm directly, in a brand new session.
> # zgemm does c <- alpha*a*b + beta*c (can transpose a or b).
> # Some bug prevents datatype=complex[8] on ARRAY parameters,
> # but this can be circumvented using ArrayTools:-ComplexAsFloat.
> zgemm := define_external( 'zgemm_',
>    TRANSA::string[1],
>    TRANSB::string[1],
>    M::REF(integer[4]),
>    N::REF(integer[4]),
>    K::REF(integer[4]),
>    ALPHA::REF(complex[8]),
>    A::ARRAY(1..M,1..K,order=Fortran_order,datatype=float[8]),
>    LDA::REF(integer[4]),
>    B::ARRAY(1..K,1..N,order=Fortran_order,datatype=float[8]),
>    LDB::REF(integer[4]),
>    BETA::REF(complex[8]),
>    C::ARRAY(1..M,1..N,order=Fortran_order,datatype=float[8]),
>    LDC::REF(integer[4]),
>    LIB="libatlas.so" ):
>
> kernelopts(printbytes=false):
> with(LinearAlgebra):
>
> (st,ba,bu) := time(),kernelopts(bytesalloc),kernelopts(bytesused):
> (maxiter,N,d) := 10000,5,2:
> rho := Matrix(d^N,d^N,datatype=complex[8]):
> temp := RandomMatrix(d^N,d^N,outputoptions=[datatype=complex[8]]):
> alpha := 1.0+0.0*I:
> beta := 0.0+0.0*I:
> rtemp := ArrayTools:-ComplexAsFloat(temp):
> rrho := ArrayTools:-ComplexAsFloat(rho):
> for i from 1 to maxiter do
>   zgemm("C","N",d^N,d^N,d^N,alpha,rtemp,d^N,rtemp,d^N,beta,rrho,d^N);
> od:
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;

        5.016, 3800392, 64592136

> Norm(temp^%H . temp - rho);

        0.

Some of the gain comes from not calling any LinearAlgebra routine at all. Setting maxiter,N,d to 10000,0,2 gives, for the two methods,

        5.432, 4193536, 164662908
        2.564, 3996964, 60722984

Setting maxiter,N,d to 5,10,2 gives, for the two methods,

        31.737, 488022752, 661812396
        11.404, 34006956, 34303948

This last case, with the larger 1024x1024 Matrices, shows how much the memory management is struggling. Memory fragmentation may be a cause.

Dave Linder
Mathematical Software, Maplesoft
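For readers without the BLAS at hand, what zgemm("C","N",...) computes here can be spelled out in a few lines of plain Python. This is an illustration only; the whole point of the Maple code above is to avoid doing this in an interpreted loop:

```python
# Pure-Python illustration of what the zgemm("C","N",...) call computes:
# rho <- temp^H . temp, without ever forming the transpose explicitly.
def herm_times_self(a):
    n = len(a)                       # a: n x n list of rows of complex numbers
    rho = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0j
            for k in range(n):
                # (A^H A)[i][j] = sum over k of conj(A[k][i]) * A[k][j]
                s += a[k][i].conjugate() * a[k][j]
            rho[i][j] = s
    return rho

temp = [[1 + 1j, 2 + 0j], [0j, 1j]]
rho = herm_times_self(temp)          # equals temp^H . temp
```

Note how the conjugate-transpose is realized just by swapping the index roles and conjugating, which is exactly what the "C" option to zgemm does internally.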

?OpenMaple,C,Examples...

There is quite a bit of information on the topic of external calling in the help page ?OpenMaple,C,Examples and in the pages which it cross-references.

Dave Linder
Mathematical Software, Maplesoft

A reason it was slow...

A reason for the relative slowness of the terser method you gave is that it forms a list of 1 million calls to rand(0..1), I believe. Those calls get evaluated only after the list is created. Compare with this:

> ba,bu,st := kernelopts(bytesalloc),kernelopts(bytesused),time():
> g := rand(0..1):
> N := 10^6:
> R := [ seq( add( g(), k=1..12 )-6, i=1..N ) ]:
> kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu,time()-st;

        11401176, 176104208, 10.168

So the memory-allocation difference between using a hardware-datatype Vector and a list for R isn't that large here.

Dave Linder
Mathematical Software, Maplesoft
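The quantity being generated, add(g(),k=1..12)-6, is a sum of twelve 0/1 draws shifted to mean zero, which is approximately normal by the central limit theorem (with variance 12*(1/4) = 3 here). The same loop can be rendered in Python for illustration; the function name is an invention for the sketch:

```python
import random

# Mirror of R := [seq(add(g(), k=1..12)-6, i=1..N)] above:
# each entry is a sum of 12 random 0/1 bits minus 6, so it has
# mean 0 and variance 3, and is roughly normal by the CLT.
def approx_normal_list(n):
    randbit = random.getrandbits
    return [sum(randbit(1) for _ in range(12)) - 6 for _ in range(n)]

R = approx_normal_list(10**5)
mean = sum(R) / len(R)
```

As in the Maple version, the cost is dominated by the N*12 individual random draws, not by the container holding the results.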

thank you...

Thank you, Douglas. We will be entering each issue in your Document into our database, and then examining them. I appreciate the time and effort you've taken in reporting all this.

I can say something about one of the issues. fsolve is generally set up to disallow mixing of operator-form and expression-form input. So, having defined

h := x -> minimize(n*x^(1/n), n = 1 .. 3);

the subsequent call fsolve(h-2) mixes those forms. Sure, 2 may act nicely as an operator, but in general such mixing could be unfortunate to allow.

Now, you may wonder why the examples with

f := x -> minimize(n*x, n = 1 .. 3);
g := x -> minimize(x^(1/n), n = 1 .. 3);

seem to work out. It's due to a behaviour of fsolve which I consider wrong in general: it will try to apply g (or f) to some dummy local variable r. For your g and f, that application results in min(r,r^(1/3)) and min(r,3*r) respectively. For h, minimize does not return such a result of a single min call.

I'd like to improve the error message, "Can't handle expressions with typed procedures". But I'd also like to stop fsolve from that application of the functional input to the dummy r; one can come up with examples where such application produces a wrong result. That would likely mean that your f and g examples would also generate an (improved) error message from fsolve. If that is worrisome, please consider the example fsolve(f-2-Pi). If allowed through, that mixed example would try to compute something like fsolve('f'(r)-2-Pi(r), r), which would be misguided.

best regards,
Dave Linder
Mathematical Software, Maplesoft
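The operator-form versus expression-form distinction has a rough analogue in ordinary programming languages: a function object and a constant are different kinds of thing, and the constant must be lifted to an operator before the two can be combined. A loose analogy in Python (not Maple semantics; minus_const is an invented helper for the sketch):

```python
# A function object and a number cannot be combined directly;
# the constant must be lifted to a constant operator first.
def f(x):
    # analogue of f := x -> minimize(n*x, n = 1 .. 3),
    # which reduces to min(x, 3*x)
    return min(x, 3 * x)

try:
    f - 2                      # mixing "operator" and "expression" form
    mixed_ok = True
except TypeError:
    mixed_ok = False           # Python refuses this outright

def minus_const(func, c):
    # lift the constant c to the operator x -> c, then combine pointwise
    return lambda x: func(x) - c

g = minus_const(f, 2)          # g is again pure operator form
```

Maple's symbolic engine sometimes coerces a constant to an operator silently, which is exactly why fsolve(h-2) can look plausible while being ill-formed.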

macros for escaped maple calls?...

Hi Jacques,

Does the vim maple-mode file have macros in it? I'm thinking of things like this:

- escaping to a new shell, which starts maple, pipes in the current buffer of maple source, and lands one at the maple prompt.
- escaping to a shell and piping the current buffer's source into a maple session, without leaving one at the prompt, but while stripping any leading # mark off lines which continue with a savelib() call.

You get the picture, I am sure. I was wondering whether such macros might be put in as commented-out suggestions in the vim maple-mode file. The maple sessions started by such macros might take a few nice -b options, too.

Dave Linder
Mathematical Software, Maplesoft
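As a sketch of what such commented-out suggestions might look like, assuming `maple` is on the PATH (the mapping names and exact flags here are assumptions, not actual maple-mode content):

```vim
" Hypothetical macros for the maple-mode file (names and flags assumed):
" pipe the current buffer into a quiet maple session and show the output,
nnoremap <Leader>mr :w !maple -q<CR>
" same, but first strip a leading # from lines continuing with savelib(),
nnoremap <Leader>ms :w !sed 's/^#\(.*savelib\)/\1/' \| maple -q<CR>
```

The second mapping is just a sketch of the savelib() idea from the list above; the sed pattern would need tuning to whatever comment convention the source files actually use.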

it's useful...

This is useful, Doug. By that I mean it's very useful to have straightforward examples which show what can be improved. That routine, RealRange_union, dates from about 1993. The bits about infinity were added at some later date. It's clear that you enjoyed analyzing it.

Another thing worth examining might be whether the use of eval@subs is really necessary in RealRange_ends.

Dave Linder
Mathematical Software, Maplesoft