I was curious to see how much the compiler can help with problems like this.
It seems that for Carl's version (which is already very fast) the compiler is not very useful.
I used essentially my original version, modified so that it compiles.
Here it is:
Tri1:=proc(n::integer,V::Vector(datatype= float[8]),
X::Vector(datatype= float[8]),Y::Vector(datatype= float[8]),
Ax::float[8],Ay::float[8],Bx::float[8],By::float[8],Cx::float[8],Cy::float[8])
local k::integer,a::float[8],b::float[8];
for k to n do
a:=V[k]; b:=V[n+k];
if a+b>1 then a:=1-a;b:=1-b fi; # reflect (a,b) into the half-square a+b<=1
X[k]:=Ax+(Bx-Ax)*a+(Cx-Ax)*b;
Y[k]:=Ay+(By-Ay)*a+(Cy-Ay)*b
od;
end:
CTri1:=Compiler:-Compile(Tri1):
Tri:=proc(n,Ax,Ay,Bx,By,Cx,Cy)
local
V:=LinearAlgebra:-RandomVector(2*n, generator= 0..1., datatype= float[8]),
X:=Vector(n,datatype= float[8]), Y:=Vector(n,datatype= float[8]);
global CTri1;
CTri1(n,V,X,Y,Ax,Ay,Bx,By,Cx,Cy);
X,Y
end:
####
Ax:=-1: Ay:=0: # Triangle T = ABC
Bx:=0: By:=4:
Cx:=2: Cy:=1:
n:=6000: # number of random points
T:=CodeTools:-Usage(Tri(n,Ax,Ay,Bx,By,Cx,Cy),iterations=100):
memory used=195.55KiB, alloc change=18.38MiB, cpu time=940.00us, real time=920.00us, gc time=0ns
Compare with Carl's version (not compiled):
T:= CodeTools:-Usage(SampleTriangle([[-1,0],[0,4],[2,1]], 6000), iterations=100):
memory used=0.86MiB, alloc change=8.21MiB, cpu time=24.18ms, real time=6.52ms, gc time=1.25ms
Note that I used iterations=100 in both cases; with fewer iterations the timings were not stable enough.