epostma

1429 Reputation

18 Badges

14 years, 179 days
Maplesoft

Social Networks and Content at Maplesoft.com

I am the manager of the Mathematical Software Group, working mostly on the Maple library. I have been working at Maplesoft since 2007, mostly on the Statistics and Units packages and on contract work with industry users. My background is in abstract algebra, in which I completed a PhD at Eindhoven University of Technology. During my studies I was always searching for interesting questions at the crossroads of math and computer science. When I came to Canada in 2007, I had nothing but a work permit, some contacts at Maplesoft (whom I had met at a computer algebra conference a year earlier), and a plan to travel around beautiful Canada for a few months. Since Maplesoft is a company that solves more interesting math and computer science related questions than just about anywhere else, I was quite eager to join them, and after about three months, I could start.

MaplePrimes Activity


These are replies submitted by epostma

Hi alex_01,

I'm not sure I understand what you mean - I have a hunch but let me ask you to confirm it before I start looking at this in detail. Do you want something like a conditional correlation between X and Y given that X = X0, and then compare  that number for values of X0 that are in the tails versus near the median / mean / ... ?

Also, what assumptions can we make on (X, Y)? Can we assume they're in a bivariate normal distribution, or can we assume nothing at all about them, or something in between?

Erik Postma
Maplesoft. 

Does anybody see a way to write a Maple program that, when run, generates a graph that gives its own listing? Self-printing programs or quines (wikipedia) are well-known, but I haven't seen self-graphing ones before.

This could be a skeleton of the program:

n:=9609393799189588849716729621278527547150043396601293066515055192717028023952664246896428421743\
507181212671537827706233559932372808741443078913259639413377234878577357498239266297155171737169951\
652328905382216124032388558661840132355851360488286933379024914542292886670810961844960917051834540\
678277315517054053816273809676025656250169814820834187831638491155902256100036523513703438744618483\
787372381982248498634650331594100549747005931383392264972494617515457283667023697454610146559979337\
98537483143786841806593422227898388722980000748404719:
h:=17:
w:=106:
tupper:=(x,y)->1-floor(frac((floor(y/h)*2^(h*x-(y mod h)))/2)*2):
with(ImageTools):
View(Create(Array(0..h-1,-w..-1,(y,x)->tupper(x,y+n),datatype=float[8],order=C_order))):

Now we'd need to turn this textual representation into an image (using some font), read it in, find the corresponding numbers n, h, w as Robert explains, and find a fixed point of this map. However, it seems unlikely that this will work straightaway - as the number n grows, the needed screen real estate to display n grows as well, and probably more quickly than n itself. So we would need a way to specify n more compactly - using fewer pixels.

Any takers?

Erik Postma
Maplesoft.

Great answer, @longrob. One note: if you need to omit a row, it's even easier to just use

plot(N[2.., 2], N[2.., 3]);

Hope this helps,

Erik Postma
Maplesoft.

Great answer, @longrob. One note: if you need to omit a row, it's even easier to just use

plot(N[2.., 2], N[2.., 3]);

Hope this helps,

Erik Postma
Maplesoft.

You can also just do A[3] to obtain the third row.

Erik Postma
Maplesoft. 

You can also just do A[3] to obtain the third row.

Erik Postma
Maplesoft. 

@longrob : Sorry, I don't really have much windows experience. But I suggest you contact our support team, support@maplesoft.com. They deal with this sort of stuff on a regular basis. (If you're not in Canada or the US you could instead go to http://www.maplesoft.com/contact/international/ to find your local reseller.)

Best of luck,

Erik.

One case in which I could imagine getting a return value of type function would be if fsolve is unsuccessful in finding a root at full precision; it returns unevaluated in that case. (It does this because later manipulation might change the expression inside, and in that case it might be possible to find a solution when you re-evaluate.) A silly example:

fsolve(abs(x) = -1, x);

does some computation, sees that it can't find a root, and returns the original function call. Something similar can happen with roots that are numerically difficult to find with high precision. So, I echo @longrob's suggestion: it might be a good idea if you could show us exactly what fsolve gets as its arguments.

(As an aside - the opening bracket and closing parenthesis don't match. That must have been a copy-and-paste error.)

Hope this helps,

Erik Postma
Maplesoft.

One case in which I could imagine getting a return value of type function would be if fsolve is unsuccessful in finding a root at full precision; it returns unevaluated in that case. (It does this because later manipulation might change the expression inside, and in that case it might be possible to find a solution when you re-evaluate.) A silly example:

fsolve(abs(x) = -1, x);

does some computation, sees that it can't find a root, and returns the original function call. Something similar can happen with roots that are numerically difficult to find with high precision. So, I echo @longrob's suggestion: it might be a good idea if you could show us exactly what fsolve gets as its arguments.

(As an aside - the opening bracket and closing parenthesis don't match. That must have been a copy-and-paste error.)

Hope this helps,

Erik Postma
Maplesoft.

@longrob : Do any other commands involving Compiler:-Compile work? (See e.g. the ?Compiler/Compile help page.) If not, please look at the instructions at http://www.maplesoft.com/support/faqs/detail.aspx?sid=99396. On OS X, you'd have to install XCode; on 64-bit windows, you'll want to follow these instructions. The 32-bit windows installation comes with a working compiler and on Linux a compiler is usually already installed in the system and picked up automatically.

Erik.

@longrob : I'll answer your questions in random order.

  • Determining the time spent in different parts of a program is called profiling, and in Maple this functionality is provided by the  CodeTools:-Profiling  package - in particular, the CodeTools:-Profiling:-Profile and showstat commands. I think the help pages probably have enough detail to get you started (now that you know where to look! :) ).
  • The difference between versions 2 and 3 are twofold: first of all, the RandomVariables are created only once (creating such a random variable is not a big computation, but the administrative overhead is substantial) and then remembered. This is accomplished in the variableStorage Vector and the getVariables procedure. Secondly, using the profiling mentioned in the previous bullet point, I saw that the line defining expr also took substantial time. This could really only come from the sqrt function call. By enclosing sqrt in forward quotes, I prevented Maple from doing any actual computation there; when the expression enters Sample, and via that enters the external code, all computation happens there anyway.
  • I think leaving out uses S = Statistics; was simply a copy and paste error - I'm pretty sure I had it in my original version 3 code as well. But it doesn't matter - thanks for catching it!

Finally, I realized late last night that part of the evaluation in the external C code still happens inside Maple - the external code calls back into the interpreted Maple environment to evaluate part of the expressions. So I came up with the following:

ncubem4 := module()
local
    ModuleApply, normalSample, sampleStorage, processor;

    normalSample := Statistics:-Sample('Uniform(0, 1)');
    sampleStorage := Vector(0, 'datatype = float');

    processor := proc(s :: Vector(datatype = float[8]),
                      n :: integer,
                      N :: integer)
    local
        i :: integer,
        intermediate :: float[8],
        j :: integer,
        result :: float[8];

        result := 0.;
        for i to 2*n*N by 2*n do
            intermediate := 0.;
            for j from i to i + 2*n - 1 by 2 do
                intermediate := intermediate + (s[j] - s[j+1])^2;
            end do;
            result := result + sqrt(intermediate);
        end do;

        return result / N;
    end proc;

    processor := Compiler:-Compile(processor);

    ModuleApply := proc(n :: posint, N :: posint, m)
        if numelems(sampleStorage) < 2*n*N then
            # Sample storage is too small to contain sufficiently many
            # sampled numbers: allocate a new Vector.
            sampleStorage := normalSample(2*n*N);
        elif numelems(sampleStorage) > 2*n*N then
            # Sample storage is bigger than needed - fill only an
            # initial segment.
            normalSample(ArrayTools:-Alias(sampleStorage, [2*n*N]));
        else
            # Sample storage is exactly the right size - fill it up
            # with fresh samples.
            normalSample(sampleStorage);
        end if;

        return m * processor(sampleStorage, n, N);
    end proc;
end module;
CodeTools:-Usage(ncubem4(15, 1000, 1), iterations=1000):
memory used=1.20KiB, alloc change=0.75MiB, cpu time=2.30ms, real time=2.31ms

One caveat is that this will only work in hardware float mode; if you set Digits higher than evalhf(Digits) (which is currently 15 on all platforms) or you set UseHardwareFloats to false, the code will die miserably. However, this is stuff you shouldn't do with high precision anyway: the Monte Carlo scheme destroys the high precision anyway.

The numbers here are a bit skewed since this is my work machine instead of home - the numbers I get here for the other versions are: 200.64ms for ncubem, 31.55ms for ncubem2, 7.10ms for ncubem3. So here we have another factor of 3 over ncubem3.

Erik.

Nice work, longrob. By the way, here is a neat (I think) trick to make ncubem even faster. The Statistics:-Sample command can sample complicated algebraic expressions involving random variables; all of this is done in external code and therefore lightning fast. So we can do (version 2) :

ncubem2 := proc(n :: posint, N :: posint, m)
uses
S = Statistics;
local
expr,
i,
variables;

variables := [seq(S:-RandomVariable('Uniform'(0, m)), i = 1 .. 2*n)];
expr := sqrt(add((variables[2*i-1] - variables[2*i])^2, i = 1 .. n));
return S:-Mean(S:-Sample(expr, N));
end proc;

If we do this, on my home machine I get:

CodeTools:-Usage(ncubem(15, 1000, 1), iterations=100):
memory used=14.18MiB, alloc change=32.12MiB, cpu time=171.05ms, real time=173.70ms
CodeTools:-Usage(ncubem2(15, 1000, 1), iterations=100):
memory used=0.60MiB, alloc change=0 bytes, cpu time=13.76ms, real time=13.86ms

Further examination shows that about 1/3 of the time is spent generating the random variables (30 per call, in this example) and 1/3 of the time is spent computing the symbolic square root in expr, which returns unevaluated. This leads to the following optimization (version 3) :

ncubem3 := module()
local variableStorage, getVariables, ModuleApply;
variableStorage := Vector(0);

# Return at least k uniform(0, 1) random variables.
getVariables := proc(k)
local
i;
if numelems(variableStorage) < k then
for i from numelems(variableStorage) + 1 to k do
variableStorage(i) := Statistics:-RandomVariable('Uniform(0, 1)');
end do;
end if;

return variableStorage;
end proc;

ModuleApply := proc(n :: posint, N :: posint, m)
local
expr,
i,
variables;

variables := getVariables(2 * n);
expr := 'sqrt'(add((variables[2*i-1] - variables[2*i])^2, i = 1 .. n));
return S:-Mean(S:-Sample(expr, N));
end proc;
end module;
CodeTools:-Usage(ncubem3(15, 1000, 1), iterations=100):
memory used=360.52KiB, alloc change=0 bytes, cpu time=7.00ms, real time=7.06ms

A total speedup of about a factor 25.

Hope this helps,

Erik Postma 
Maplesoft. 

@Andreas Madsen : Well, there really is no difference between an exponential function and a power function, right? The equations y = a * b^x and y = a * exp(B * x) and y = exp(A + B * x) are all equivalent, if A = exp(a) and B = exp(b).

The nice thing about doing the log transform is that that turns it into a linear fitting problem. (Both the exponential and the power fit, since they are the same.) That means that we can discuss the transformed, linear, fitting problem instead of the original nonlinear one. A perfect fit for one problem clearly translates into a perfect fit for the other; we sort of assume that a good-but-not-perfect fit for the one will still translate into a good fit for the other (and actually, the measure of fit may be better in this case than in the original, as a friend recently pointed out to me). For the linear problem, we can talk about R^2, defined in terms of the residual sum of squares and total sum of squares in the log-transformed problem; for the nonlinear problem, we need to resort to pseudo-R^2, defined in terms of the residual sum of squares and total sum of squares in the original problem. Those are different numbers, and the values that result are different.

If instead of the R^2 of the linearized problem, we compute the pseudo-R^2 of the original, nonlinear, fitting problem, we will get a different answer. I have argued above that in some cases, the result from the linear fit would be a better answer, mathematically, but in other cases you might be more interested in the pseudo-R^2 of the nonlinear problem. If that is the case, you can use the residualsumofsquares procedure that you have defined; it computes the residual sum of squares of the original, nonlinear, fitting problem with respect to the solution found in the linearized problem.

If you want to compute the residual sum of squares of the linearized problem "by hand", you can do so as follows:

logY := log~(Y);
fitted := LinearFit([1, x], X, logY, x);
evalf(residualsumofsquares(fitted, X, logY, x); # returns 8.0527587...

So to answer your questions:
- The log transform can be used if it transforms the problem to a linear problem, which is the case for exponential and power functions (they're the same).
- If you are calculating R^2 and for that a residual sum of squares, first decide for which problem to compute them. I would think that if you can linearize the problem, then it's often best to do that first.

Hope this helps,

Erik Postma
Maplesoft

PS: in my code I used a Maple 15 addition, the kernel function ?numelems. I see that you are using ?ArrayTools/NumElems, which was available before but is a little slower. A good idea if you want to be compatible with earlier Maple versions.

@Andreas Madsen : Well, there really is no difference between an exponential function and a power function, right? The equations y = a * b^x and y = a * exp(B * x) and y = exp(A + B * x) are all equivalent, if A = exp(a) and B = exp(b).

The nice thing about doing the log transform is that that turns it into a linear fitting problem. (Both the exponential and the power fit, since they are the same.) That means that we can discuss the transformed, linear, fitting problem instead of the original nonlinear one. A perfect fit for one problem clearly translates into a perfect fit for the other; we sort of assume that a good-but-not-perfect fit for the one will still translate into a good fit for the other (and actually, the measure of fit may be better in this case than in the original, as a friend recently pointed out to me). For the linear problem, we can talk about R^2, defined in terms of the residual sum of squares and total sum of squares in the log-transformed problem; for the nonlinear problem, we need to resort to pseudo-R^2, defined in terms of the residual sum of squares and total sum of squares in the original problem. Those are different numbers, and the values that result are different.

If instead of the R^2 of the linearized problem, we compute the pseudo-R^2 of the original, nonlinear, fitting problem, we will get a different answer. I have argued above that in some cases, the result from the linear fit would be a better answer, mathematically, but in other cases you might be more interested in the pseudo-R^2 of the nonlinear problem. If that is the case, you can use the residualsumofsquares procedure that you have defined; it computes the residual sum of squares of the original, nonlinear, fitting problem with respect to the solution found in the linearized problem.

If you want to compute the residual sum of squares of the linearized problem "by hand", you can do so as follows:

logY := log~(Y);
fitted := LinearFit([1, x], X, logY, x);
evalf(residualsumofsquares(fitted, X, logY, x); # returns 8.0527587...

So to answer your questions:
- The log transform can be used if it transforms the problem to a linear problem, which is the case for exponential and power functions (they're the same).
- If you are calculating R^2 and for that a residual sum of squares, first decide for which problem to compute them. I would think that if you can linearize the problem, then it's often best to do that first.

Hope this helps,

Erik Postma
Maplesoft

PS: in my code I used a Maple 15 addition, the kernel function ?numelems. I see that you are using ?ArrayTools/NumElems, which was available before but is a little slower. A good idea if you want to be compatible with earlier Maple versions.

Agreed, the best way is to make sure that X, Y, Z do not occur in the matrix when you do the substitution.

Of course we don't know how the original poster created the expression in the first place; but if possible, it might be best not to assign a value to R in the first place. You could just use

eval(intermediateResult, R = sqrt(X^2 + Y^2 + Z^2));

whenever you need to display an intermediate result or something like that, but not make the assignment - so that such problems do not crop up in the first place.

Hope this helps,

Erik Postma
Maplesoft.

First 6 7 8 9 10 11 12 Last Page 8 of 21