kfli


MaplePrimes Activity


These are replies submitted by kfli

OK, I think the way to solve this is to somehow create a system of equations A*phi = s, and then use A^-1 to build a system of equations that depends only on s (and implicitly on phi).
Not entirely sure how to proceed with this knowledge though.
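
Something like this minimal sketch is what I mean, assuming A is an invertible Matrix and s is known (the 3x3 numbers are just placeholders):

with(LinearAlgebra):

# Stand-in 3x3 system A . phi = s; A and s are placeholders for whatever
# the real discretization produces.
A := Matrix([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]]):
s := Vector([1., 2., 3.]):

# Recover phi from s; LinearSolve avoids forming A^(-1) explicitly.
phi := LinearSolve(A, s);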

@dharr Awesome, that worked! I wonder where the factor of 2 comes from. I saw it in the Matlab example too, but most mathematical pages on the subject don't mention a factor of 2. Or maybe I missed it.

@acer Is it possible to compile several Maple procedures into one piece of C code? For example:

g := proc(test)
  ...
end proc:

f := proc(test)
  ...
end proc:

gf := proc(test)
  f(test);
  g(test);
end proc:

cgf := Compiler:-Compile(gf);
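
One workaround I can think of, assuming Compiler:-Compile only handles one self-contained procedure at a time: inline the helper bodies into a single procedure before compiling. A sketch, with placeholder float arithmetic standing in for the real bodies of f and g:

# Hypothetical: f(test) and g(test) folded into one compilable procedure.
gf_inlined := proc(test::float[8])::float[8];
  local r::float[8];
  r := test + 1.0;   # body of f, inlined (placeholder)
  r := r * 2.0;      # body of g, inlined (placeholder)
  return r;
end proc:

cgf := Compiler:-Compile(gf_inlined):
cgf(3.0);   # 8.0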

@acer Right, of course. I just thought there might be a general issue with Maple-to-C conversion that simply cannot be fixed. Obviously details matter when optimizing. The code is actually quite simple: loops with summation and multiplication of arrays. The specific code I'm working on is below. I have to call this code a bazillion times.

Product_3DC := proc(A :: Array(datatype=float[8]), B :: Array(datatype=float[8]), C :: Array(datatype=float[8]),
                  T :: integer, X :: integer, Y :: integer, XX :: integer, YY :: integer) 
local i::integer, j::integer, p::integer;
local t::integer, x::integer, y::integer;
local sum :: float[8];
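
# For each output entry (t,x,y), accumulate products of entries of A and B
# over all splittings of the indices, folding in the reflected index
# combinations (hence the groups of 2, 4, and 8 terms), then scale by 1/8.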

for t from 0 to T do
  for x from 0 to X-XX do
   for y from 0 to Y-YY do
     sum := 0;
     for p from 0 to t do
       for j from 0 to x do
         for i from 0 to y do
           sum := sum + B[t-p+1,x-j+1,y-i+1]*A[p+1,j+1,i+1];
         end do;
         for i from 1 to Y-y do
           sum := sum + B[t-p+1,x-j+1,y+i+1]*A[p+1,j+1,i+1] + B[t-p+1,x-j+1,i+1]*A[p+1,j+1,y+i+1];
         end do;
       end do;
       for j from 1 to X-x do
         for i from 0 to y do
           sum := sum + B[t-p+1,x+j+1,y-i+1]*A[p+1,j+1,i+1] + B[t-p+1,j+1,y-i+1]*A[p+1,x+j+1,i+1];
         end do;
         for i from 1 to Y-y do
           sum := sum + B[t-p+1,x+j+1,y+i+1]*A[p+1,j+1,i+1] + B[t-p+1,x+j+1,i+1]*A[p+1,j+1,y+i+1] 
                      + B[t-p+1,j+1,y+i+1]*A[p+1,x+j+1,i+1] + B[t-p+1,j+1,i+1]*A[p+1,x+j+1,y+i+1];
         end do;
       end do;
     end do;

     for p from 1 to T-t do
       for j from 0 to x do
         for i from 0 to y do
           sum := sum + B[t+p+1,x-j+1,y-i+1]*A[p+1,j+1,i+1] + B[p+1,x-j+1,y-i+1]*A[t+p+1,j+1,i+1];
         end do;
         for i from 1 to Y-y do
           sum := sum + B[t+p+1,x-j+1,y+i+1]*A[p+1,j+1,i+1] + B[t+p+1,x-j+1,i+1]*A[p+1,j+1,y+i+1]
                      + B[p+1,x-j+1,y+i+1]*A[t+p+1,j+1,i+1] + B[p+1,x-j+1,i+1]*A[t+p+1,j+1,y+i+1];
         end do;
       end do;
       for j from 1 to X-x do
         for i from 0 to y do
           sum := sum + B[t+p+1,x+j+1,y-i+1]*A[p+1,j+1,i+1] + B[t+p+1,j+1,y-i+1]*A[p+1,x+j+1,i+1] 
                      + B[p+1,x+j+1,y-i+1]*A[t+p+1,j+1,i+1] + B[p+1,j+1,y-i+1]*A[t+p+1,x+j+1,i+1];
         end do;
         for i from 1 to Y-y do
           sum := sum + B[t+p+1,x+j+1,y+i+1]*A[p+1,j+1,i+1] + B[t+p+1,x+j+1,i+1]*A[p+1,j+1,y+i+1] 
                      + B[t+p+1,j+1,y+i+1]*A[p+1,x+j+1,i+1] + B[t+p+1,j+1,i+1]*A[p+1,x+j+1,y+i+1] 
                      + B[p+1,x+j+1,y+i+1]*A[t+p+1,j+1,i+1] + B[p+1,x+j+1,i+1]*A[t+p+1,j+1,y+i+1] 
                      + B[p+1,j+1,y+i+1]*A[t+p+1,x+j+1,i+1] + B[p+1,j+1,i+1]*A[t+p+1,x+j+1,y+i+1];
         end do;
       end do;
     end do;
     C[t+1,x+1,y+1] := (1/8)*sum;
   end do;
  end do;
end do;

end:
cProduct_3DC:=Compiler:-Compile(Product_3DC, optimize, inmem=false):
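
For reference, a minimal driver for the compiled routine. From the indexing above I am assuming the Arrays are dimensioned (1..T+1, 1..X+1, 1..Y+1); the sizes and fill values here are made up:

T := 4:  X := 8:  Y := 8:
A := Array(1..T+1, 1..X+1, 1..Y+1, (t,x,y) -> evalf(1/(t+x+y)), datatype=float[8]):
B := Array(1..T+1, 1..X+1, 1..Y+1, (t,x,y) -> evalf(sin(t*x*y)), datatype=float[8]):
C := Array(1..T+1, 1..X+1, 1..Y+1, datatype=float[8]):

# XX = YY = 0 fills all of C; nonzero values shrink the x and y output ranges.
cProduct_3DC(A, B, C, T, X, Y, 0, 0):
C[1,1,1];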

 

@acer Thank you! That works; however, it wasn't much of an upgrade in performance for me, unfortunately. I didn't expect Maple to use the -O2 flag already. I'm curious, though, why the execution times of the Maple-compiled code and my native C code are so different. I expected native C to be faster, but not by much; as it is, they are not even close.

@acer That would be great! Yeah, I'm using Maple 2019 and Ubuntu 19.10 (I use Windows 10 at work).

@acer Right, OK. I'm trying to write a procedure that multiplies arrays, takes transposes, etc. I would like to compile this code so it runs quickly.

@Carl Love My misunderstanding. I edited my comment.

@Carl Love Exactly. So I need to retrieve the Intt arrays, for example. The way I have done it now is to turn the Intt rtable into a Vector, which I then pass to Grid:-Run(node, task, ...., 'assignto'=Intt[sx,sy]). But this is a problem, because if I need to do this several times I get an invalid-left-hand-side error. So I have to unassign Intt and reassign it every iteration.
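
Roughly the pattern I mean (the Grid:-Run calling sequence here is from memory, so check the help page; node, niter, and the argument list are placeholders):

for it from 1 to niter do
    # 'assignto' fails if the target already has a value, so clear it first.
    unassign('Intt'):
    Grid:-Run(node, task, [phi_EQ, X0, RMLt, q_IC, sx, sy], 'assignto'='Intt'):
    Grid:-Wait(node):
    # ... use Intt here before the next iteration overwrites it ...
end do: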

 

@Carl Love Thanks! I will try that. Maybe you can answer another question. My procedure creates several arrays that need to be combined into one long vector. The code that works is:
 

Threads:-Task:-Start( null, seq(seq(Task=[task,phi_EQ,X0,RMLt,q_IC[i,j],i,j],i=1..Nx),j=1..Ny) ):


phi_EQ is the "long" final vector. The task procedure computes sections, or arrays, in parallel and enters each into phi_EQ in the right place, so the task procedures do not conflict across subdomains. Shown below is a sample code. This of course will not run, but you can see what I'm trying to do. The equations are arbitrary, and the cDerivative function just takes an array as input and outputs another; same with cIntegral. I have tried many different syntaxes, or approaches, to do this with Grid:-Run, but I am struggling to assign the arrays to the final Vector.

task:=proc(phix,x,RMLt,q_IC,sx,sy)
global kappa,tM,xM,yM,NU,Neqs,Nx,BMAx;
local m,p,n,eq1,eq2,Intt,v,neq,ind;
local ux,vx;

ind := (1..tM+1, 1..xM+1, 1..yM+1);
for neq from 1 to Neqs do
v[neq] := Array(ind, (m,p,n)-> x[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+(m-1)*(xM+1)*(yM+1)+(p-1)*(yM+1)+(n-1)+1], datatype=float[8]);
od: 

ux  := Array(ind, datatype=float[8]); cDerivative_3XC(v[1],ux,BMAx[sx],tM,xM,yM):
vx  := Array(ind, datatype=float[8]); cDerivative_3XC(v[2],vx,BMAx[sx],tM,xM,yM):

eq1 := -ux - vx:        # elementwise Array arithmetic
eq2 := -2.0*ux + vx:

Intt[1] := Array(ind, datatype=float[8]); cIntegral_tC(eq1,Intt[1],RMLt,tM,xM,yM):
Intt[2] := Array(ind, datatype=float[8]); cIntegral_tC(eq2,Intt[2],RMLt,tM,xM,yM):

for neq from 1 to Neqs do
for m   from 0 to tM   do
for p   from 0 to xM   do
for n   from 0 to yM   do
if m=0 and p <= xM-2 and n <= yM-2 then 
  phix[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+m*(xM+1)*(yM+1)+p*(yM+1)+n+1]:=
         Intt[neq][1,p+1,n+1]+2*q_IC[neq,0,p,n]:

elif p > xM-2 or n > yM-2 then 
phix[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+m*(xM+1)*(yM+1)+p*(yM+1)+n+1]:= 0:

else
  phix[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+m*(xM+1)*(yM+1)+p*(yM+1)+n+1]:=
         Intt[neq][m+1,p+1,n+1]:
fi:

od: od: od: od:

end:

 

@Carl Love 

Yes, that works. I'm using Maple 2018. Any reason for the compound line other than aesthetics?

I must have made a mistake before, because I ran the different alternatives again and you are correct: Alt. 1 and Alt. 3 are the same. To be honest, I have no idea why it was showing a different result before. It was a simple copy, paste, and print of the result.

@kfli 
This also seems to be correct:

R := norm(df,2);
Q := df/~R;
gamma := Transpose(Q).fval/~R;
 

@Joe Riel

That's really great! I will be using that in all my codes from now on :D I was really missing the Matlab profiling function in Maple.  
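
For anyone who finds this later: I believe the tool meant here is CodeTools:-Profiling (I am not certain that is exactly what was suggested). A throwaway example:

# Profile a procedure, run it, then print per-statement statistics.
f := proc(n) local i, s; s := 0; for i to n do s := s + i^2 od; s end proc:

CodeTools:-Profiling:-Profile(f):
f(10^4):
CodeTools:-Profiling:-PrintProfiles(f);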

@Carl Love 

Hah, that's why I need people like you. I was tearing my hair out over that and it wasn't even the issue.

Well, the code does 'work' in the sense that it gives the right solution to the set of equations. There was only that weird issue where the procedure just stopped for no apparent reason, no errors or anything, and I thought it had to do with the procedure issue, but I was mistaken. I changed a few things, as you can see. I changed the QR decomposition to 'NAG' output, and I check the condition number of Q instead of R (because R is now a vector). I am not sure if that is mathematically the same thing, so it could be working slightly differently than the original, but at least the procedure now finishes.

I might have been confused about which line of the code you commented on. I was trying to implement MTM[ldivide]. I tried all three alternatives.

Alt. 1
R := norm(df,2);
Q := R/~df;
gamma := R/~Transpose(Q).fval;


Alt. 2
R := norm(df,2);
Q := MTM[ldivide](R,df);
gamma := MTM[ldivide](R,Transpose(Q).fval);


Alt. 3
gamma := Transpose(df).fval;

All give different results (ldivide is the correct one). I thought that Alt. 1 and Alt. 2 were equivalent, but I was wrong. Alt. 2 is from the original Matlab code. I thought ldivide was the same as '/~'. The idea is to solve df.gamma = fval with a QR decomposition.
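
A throwaway example of where Alt. 1 goes wrong, if my understanding of ldivide is right (a\M divides M by a; for scalar a that is M/a elementwise, not a/M):

v := Vector([4.0, 8.0]):
MTM[ldivide](2.0, v);   # 2\v = <2.0, 4.0>: divides v by 2
2.0 /~ v;               # <0.5, 0.25>: divides 2 by each entry (Alt. 1's mistake)
v /~ 2.0;               # <2.0, 4.0>: matches ldivide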

df . gamma = fval
Q . R . gamma = fval
gamma = R^(-1) . Q^T . fval
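
On a tiny overdetermined system, this is what the LeastSquares([Q,R], fval) call in my procedure does (gamma is protected at top level, hence gam; the numbers are made up):

with(LinearAlgebra):
df   := Matrix([[1., 0.], [1., 1.], [1., 2.]], datatype=float):
fval := Vector([1., 2., 4.], datatype=float):

Q, R := QRDecomposition(df, datatype=float, output='NAG'):
gam  := LeastSquares([Q, R], fval);   # solves df . gam = fval in the LS sense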

Side note: if we could make a procedure that updates the preexisting Q and R instead of recomputing the QR decomposition from scratch every time df changes, this code would run smooth as eggs.
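
For the record, here is the update I have in mind, as I understand the standard append-a-column scheme (MATLAB's qrinsert does something similar). Untested against the code above; it assumes Q has orthonormal columns and the new column is not already in their span:

# Given df = Q . R, append column v without refactoring:
# r = Q^T . v, w = v - Q . r, rho = ||w||_2; then [Q | w/rho]
# and the bordered R factor the augmented matrix [df | v].
QRAppendColumn := proc(Q::Matrix, R::Matrix, v::Vector)
    uses LinearAlgebra;
    local r, w, rho, k;
    r   := Transpose(Q) . v;
    w   := v - Q . r;
    rho := Norm(w, 2);
    k   := ColumnDimension(R);
    return < Q | w/rho >, < < R, Vector[row](k) > | < r, rho > >;
end proc: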

@Carl Love 
That's crazy. Thanks for the info. Where did you read this? As of right now I have changed some things so that the code works for all types of equations. It looks like this if you are interested.

 

restart:
Digits:=12:
Runtime:=time():
with(LinearAlgebra):
with(plots):
with(orthopoly):
with(ArrayTools):
    


N := 2:
phix := Vector(N):
X    := Vector(N):
phix[1] := cos(x[2]):
phix[2] := 3*cos(x[1]):
X[1] := 1.0:
X[2] := 2.0:

#Code AndersonAcceleration.
AndersonAcceleration:=proc(N,phi,X)
global x, xS, here1, here2;
local mMax, itmax, atol, rtol, droptol, beta, AAstart, res_hist, df, DGg, gamma, X0;
local DG, mAA, iter, gval,fval,res_norm, condDF, tol, f_old, g_old, y, i, k, Q, R, QRg, dfT, DGT;

(*
%------------------------------------------------------------------------
% Function xS = AndersonAcceleration(N,phi,x0).
%
% Fixed-point iteration with or without Anderson acceleration.
% 'phi' is the fixed point iteration map and 'xS' is the 
% solution, so that xS = phi(xS).
%
% Input arguments:
%    X0 = Initial value solution. Note: In this function this variable 
%        must be a column vector.
%       1. 'mMax' = maximum number of stored residuals (non-negative integer).
%       NOTE: 'mMax' = 0 => no acceleration. default=1000
%       2. 'itmax' = maximum allowable number of iterations. default=1000
%       3. 'atol' = absolute error tolerance. default=1.0e-6
%       4. 'rtol' = relative error tolerance. default=1.0e-3
%       5. 'droptol' = tolerance for dropping stored residual vectors to 
%       improve conditioning: If 'droptol' > 0, drop residuals if the
%       condition number exceeds droptol; if droptol <= 0,
%       do not drop residuals.
%       6. 'beta' = damping factor: If 'beta' > 0 (and beta ~= 1), then 
%       the step is damped by beta; otherwise, the step is not damped.
%       NOTE: 'beta' can be a function handle; form beta(iter), where iter 
%       is the iteration number and 0 < beta(iter) <= 1.
%       7. 'AAstart' = acceleration delay factor: If 'AAstart' > 0, start 
%       acceleration when iter = AAstart.
%
% Output:
% xS = Solution vector.
%
% The notation used is that of H.F. Walker: Anderson Acceleration:
% Algorithms and implementation
%------------------------------------------------------------------------
*)

mMax    := 100:
itmax   := 100:
atol    := 1.0e-8:
rtol    := 1.0e-6:
droptol := 1.0e12:
beta    := 1.0:
AAstart := 0:

# Initialize storage arrays and number of stored residuals.
DG := Matrix():
df := Matrix():
DGg := Vector(N);
QRg := Vector(N);
mAA := 0:

X0 := X;
for iter from 0 to itmax do

   x:=X0:
   gval := Vector(phi):
   fval := gval - X0:
   res_norm := norm(fval,2):

   # Set the residual tolerance on the initial iteration.
   if iter = 0 then
      tol := max(atol,rtol*res_norm):
   fi:
    
   # Convergence test, if converged the loop stops.
   if res_norm <= tol then
      print('iter',iter,res_norm);
      break;   # Breaks for-loop
   fi:
    
   # If res_norm is larger than 1e8 and iter > 5, stop: the iteration has diverged.
   if res_norm >1e8 and iter > 5 then
      print('no_convergence',res_norm);
      break; # Breaks for-loop, diverged
   fi:

   # Fixed point iteration without acceleration, if mMax == 0.
   if mMax = 0 or iter < AAstart then
      # We update X0 <- g(X0) to obtain the next approximate solution.
      X0 := gval:     
   else
      # With Anderson acceleration.
      # Update the df vector and the DG array.
      if iter > AAstart then
         if mAA < mMax or Size(df,2) = 1 then
            df := Concatenate(2,df,fval-f_old):
            DG := Concatenate(2,DG,gval-g_old):
         else 
            df := Concatenate(2,df[..,-1],fval-f_old):
            DG := Concatenate(2,DG[..,-1],gval-g_old):   
         fi:
         mAA := mAA + 1:
      fi:   # iter
      # We define the old g and f values for the next iteration
      f_old := fval;
      g_old := gval;

      if mAA = 0 then
         # Initialization
         # If mAA == 0, update X0 <- g(X0) to obtain the next approximate
         # solution. No least-squares problem is solved for iter = 0.
         X0 := gval;
      else
         if mAA > 1 then
            Q,R := QRDecomposition(df,datatype=float,output='NAG');
            if type(Q, 'Matrix'(square)) then
              condDF := ConditionNumber(Q);
            else
              condDF := 1;
            fi:  
            while condDF > droptol and mAA > 1 do
                if mAA = 2 then
                   df := convert(df[..,2..-1],Vector);
                   DG := convert(DG[..,2..-1],Vector);
                else
                   df := df[..,2..-1];
                   DG := DG[..,2..-1];
                fi:
                Q,R := QRDecomposition(df,datatype=float,output='NAG');
                mAA := mAA - 1;
                if type(Q, 'Matrix'(square)) then
                  condDF := ConditionNumber(Q);
                else
                  condDF := 1;
                fi:
            od:
            if Size(df,2) > 1 then
               gamma := LeastSquares([Q,R],fval);
            else
               R := norm(df,2);
               Q := R/~df;
               gamma := R/~Transpose(Q).fval;
            fi:
            
         else
            R := norm(df,2);
            Q := R/~df;
            gamma := R/~Transpose(Q).fval;
         fi:
 
         if Size(gamma,1) > 1 then
            DGg:=DG.gamma:
         else
            DGg:=DG*gamma;
         fi:
         
         # Update the approximate solution.
         X0 := gval - DGg;

         # Damping for non-zero beta
         if beta > 0 and beta <> 1 then
            if mAA = 1 then
               QRg := Q*R*gamma;
            else
               QRg := df.gamma;
            fi:
            X0 := X0 - (1-beta)*(fval - QRg);
         fi:# isa(beta ...
      fi: # mAA = 0
   fi:# mMax == 0

od:
X[1..N] := X0[1..N];
end:

AndersonAcceleration(N,phix,X):
 
