sand15

reply...

Answered: sand15 1344

Yesterday at 5:39 PM

0 0

@janhardo

Of course Maple 2015 should have ALWAYS return an error when computing R := A.B.A^+

The problem is it didn't when A and B were floats.

In my procedure I mistakenly wrote Wres := (res . Rinv . res^+); where res is a vector column.
With the Maple 2015's bug above I get the correct result.
In your more recent version (I guess) this bug has been fixed, hence the error you are receiving..

Here is my corrected worksheet Iterative_Gaussian_Process_Emulator.mw
Apologies

@C_R In your comments you use L as ...

Answered: sand15 1344

December 22 2025

0 0

@C_R

In your comments you use L as a synomym for lambda. Correct? : right, sorry for my not having been consistent in my notations.

About your error:

In Maple 2015 X0, Y0, X and Y are column vectors: is it so with your Maple Version?
1. If not convert all of them into Vector[column],
2. if yes then :
  1. Download the attached file and duplicate it.
  2. Open them both in your Maple version.
  3. Keep the first one as the reference (do not execute it).
  4. Execute the second one.
  5. Tell me where the error occurs.

Compare_outputs_2015_your_version.mw

Thanks for your last comment and your de...

Answered: sand15 1344

December 21 2025

0 0

@C_R

In particular your informations about TuringBot are very interesting and I will havea look on it soon.

In case you would be interested in doing the same kind of thing in the future, here is a procedure which builds a GPE (Gaussian process Emulator) of an unknown scalar function f : X → Y (X being 1D) iteratively.
There is no stopping criterion: the decision to stop the process is yours as soon as you consider the GPE fits the data points well enough.

The process starts by selecting a small size subset of the data set (which may be interesting if this latter is large) and by constructing a first GPE based on this subset.
Next the process enters an iterative step where a new point (belonging to the data set) is added to the subset at each iteration and the GPE is rebuilt.
The selection of the new point is based on the prediction error of the GPE built at the previous iteration.

The whole process converges quite rapidly.
For your data on can get a very good result by using less than 15% of the data set.

Iterative_Gaussian_Process_Emulator.mw

RationalFractionFit with QR decompositio...

Answered: sand15 1344

December 18 2025

0 0

@dharr

The attached file contains a variant named QR_RationalFractionFit of procedure RationalFractionFit .
Compared to this latter I replaced the call to LinearFit by a the QRDecomposition of the matrix named inputs plus a resolution of the system R . C = Q⁺ . output using LinearSolve.

The attached file presents the [9, 9] case, which generates a Warning, model is not of full rank. You will see that the QR decomposition stabilizes the system an avoid the occurrenceof this warning.

The new QR_RationalFractionFit procedure is incomplete because it does not return the many statistics that LinearfFit returns. So there is still a lot of work to be done butnothing complicated here, simply some time to spend on it.

QR_RationalFractionFit.mw

Using this QR_RationalFractionFit procedure enables to compute the AIC criterion up to values greater than those at which you had to stop because ot the warnings.
The best model from the AIC criterion point of view has a numerator of degree dn = 11 and a denominator of degree dd = 13, corresponding to a 23-parameter model.
Comparing this number to the 101 points the data set contains means that this "best" model has a complexity of about 23% and so that it does not overfit the data.

Note that the fittings have been done up to the [20, 20] rational fraction without receiving any warning.

Thanks for your intervention...

Answered: sand15 1344

December 17 2025

0 0

@dharr

A few points:

The Warning, model is not of full rank comes ftom the fact Fit and LinearFit both use the same normal equations method to assess the regression coefficients C.
In my code I defined a matrix named inputs and a vector named output.
The normal equations are (inputs . inputs⁺) . C = inputs⁺ . output.
When fitting a polynomial function using the normal equation method the condition number of the matrix (inputs . inputs⁺)
generally increases with the degree of the polynomial, which may esult in the warning above.
For @C_R data this warning appears when the confition number is abour 10³⁰ for Digits = 15, that is for rational fractions[8, 7 and beyond], [9, 3 and beyond] and so on.
It is a shame that Fit or LinearFit do not propose more stable methods than the normal equations one to prevent these warnings (the QR decomposition of (inputs . inputs⁺) for instance).
You write "but more recently switched to AIC, though I have the impression that it still overfits more that I would like".
You're right, this sometimes happen with AIC. There also exist an AICc ('c' stand for 'corrected'), ften used for "small" samples, and a BIC (B stands for Bayesian) which generally compares favourably to AIC.
But yes, AIC is not a panacea and it is sometimes necessary to compare AIC results to results from another selection strategy to confirm its conclusions.
You write "At the end of the day your data looks suspiciously noise-free, and the key issue may be more of finding what type of function would look like that".
You're right, and this was the reason of what I said (I was writting a quite detailed comment to your question) in my first comment to @C_R.

I'm still working on this"detailed comment" but in a few words statistical regression contents two stages: the first one is the resilution of, let's say, normal equations (which is pure linear algebra) and the second one if the inferential stage where we try for instance to assess the significance of theregression coeficients... and this is Statistics.
To do inference we need errors, and in return this is the existence of these errors which justify the linear algebra stage, at least if some hypothesis are considered as true a priori (and the inference is also aimed to verify their acceptability... in fact it is like the snake which bites its own tail).
The reciprocal is that without errors we should not even speak of statistical regression. Of course, the linear algebra is still possible but its has nothing to do with statistics.

At the end of the eighties Sachs, Welch, Mitchell and Wynn (SWMW) "rediscovered" an old method used in mining exploration. They found it particularly well suited to build surrogates of computer code outputs, in simple words to buit compact representations of data provides by some code, and thus noiseless.
The method SWMW developped is named Gaussian Process Emulation (GPE), or simply Kriging in honour to geologist engineer D.G. Krige who first introduced it.
A Kriging surrogate is a Gaussian process whose particularity is to pass through the data points (the argument being that if we trust the code which produced the data there is no reason not to pass through these points), so it is an interpolator, but il also statistically predicts what can happen out of these points.
The CurveFitting package proposes a Kriging method since Maple 2019, but its implementation is the one used in geostatistics and not in the analysys of computer code outputs (it's more Krige's method than SWMS's).

I had thought using for @C_R problem, but GPE is not recommended for points so densely distributed points (in fact it will wotk well with only 10 points for instance).
More of this the formal expression of the surrogate is that complex (it involves matrix vector products whose dimensions equalthe number of points) that it is never explicitely given... and I thought this wasn'twas @C_R expected.
Nevertheless GPE corresponds to some kind of linear regression on noiseless data which nevertheless authorizes inference.

Not yet the regression/interpolation dis...

Answered: sand15 1344

December 16 2025

0 0

@C_R

I did some stuff around a trig unction with modulated frequency and amplitude.

To try and find such an acceptable guess I began using Explore, starting with something like C.sin(f(x)) where f(x) is a function of a few parameters.
Next I investigated more complex functions like G(x) = t(x) + a(x).sin(f(x)) where a(x) and and t(x) are also functions on a few parameters.
Note that the codes I provide are quite easy to modify to handle other expressions of G(x) than those I used... just do the things carefully.
After some trials I got an expession which looks like the fits I provided in my answer and could be, if lucky, fit quite well your data.

As G(x) is a highly non linear expression with 8 parameters I naturally tried using NLPSolve to minimize the sum of the squared residuals Y_n - G(X_n) over some a priori 8-dimensional parameter domain (here the (X_n, Y_n) are the points in your data set).

However NLPSolve sometimes fails and when not it still has the drawback to be a local optimization procedure, meaning there is no guarantee that its solution is the best one.
One idea could have been to run NLPSolve with many different initial points and take the best result found.

Nevertheless I prefered writting a simple Genetic Algorithm to perform global minimization.

You will find some results in the attached files which differ only by the expresison of the guesses G(x).
To illustrate my purpose here is an example of a t(x) + a(x).sin(f(x)) fit you can obtain:

TAF_1.mw
TAF_2.mw

My conclusion
Finding a good function "modulated trig" function is very chalenging and require spending a lot of time (not to speak about the coding of an elementary genetic algorithm).
This must be put in balance with the simplicity od finding a rational function which fits the dat very well.
In my opinion, unless you have strong "physical" reasons to fit a modulated trig function (for instance you could know about the physical process which generates your data), I believe it' a waste of time to find one when simple alternatives are possible.
In the contrary, if you have those strong reasons it is worth using my code(s) to try and find such function.

A last point
Even after years of practice in data analysis I didn't know about symbolic regression (not I never heard about it in the congresses or symposia I attended).
But the subject of model selection is not unfamiliar to me and is very classical issue in the statistics community.
Basically it is adressed in two ways (I oversimplify the topic to keep it simple):

A and B being two models, the one consider as the best is the one whose some criterion based on its fidelity to the data and its complexity (roughly the number of parameters -degrees of freedom- it contains) has the best value. Search for instance Akaike Information Criterion on the Web.
For any model we must distinguish between two types of errors.
The error most known measures the discrepancy to the data (the higher this error the lower the fidelity to the data): an example is the residual variance in linear regression. This error is name representation error.
Another error measures the capabiliy of this model to predict data that were not accounted for to fit the model. This error is name generalization error.
Between two (or more) competing models A and B the best one is the ine which realizes the best tradeoff between these two errors. As a rule the closest to the data (representation error) a model is the higher its generalization error, so the idea of some compromise.
The methods used to assess the generalization error are more or less all based on a splitting of the initial data set into a training base (on which the models will be fitted) and a test base (on which their generalization errors will be assessed).
Common splitting methods are Leave One Out, Leave k Out, Bootstrap to cite a few

More or less all model selection strategies use some of these ingredients.
I will tell you more in the paper I have in mind.
See you soon.

Thanks for your reply...

Answered: sand15 1344

December 14 2025

0 0

@C_R

Concening the "model selection" stuff: I'm working on it.
Same thing for a detailed comment about regression/interpolation and the different associated points of views.

It will take me a few to finalize this, see you soon.

@C_R You write "My requireme...

Answered: sand15 1344

December 13 2025

0 0

@C_R

You write "My requirement: An algorithm (not me) that searches for a model that fits a given data set best."

I understand what you likely have in mind saying this but I believe a little clarification is needed.

Indeed the term fit is unclear.
If you consider @Ronan proposal, CurveFitting will give you a continuous function which passes exactly to your (101) points, giving then a perfect representation of tha data.
If you consider @dharr proposal instead, a statistical fit (linear or not) will give you a continuous function which is close, in some sense (often in the least squares sense but other norms, or metrics, can be used) your data points (note that any statical linear model with 101 parameters passes exactly by yout 101 points).

This poses the question of what a good fit is.
From the representation point of view interpolation is perfect because the sum of the residuals "data minus model values" is 0, but classical interpolation methods have generally no generalization capabilities. This means that they may be absolutely incapable to predict what the experimental output would be for a new input, even if this input is close to an input in the data set used to build the interpolation model.

The only noticeable exception is Gaussian Process Emulation, aka Kriging, which is a statistical method that unfortunately Maple locates in the CurveFitting package.
This method enables controlling the prediction error out of the points used to construct (fit) the model.

The least squares point of view is a strategy where we satisfy ourselves with a model which simply passes reasonably close to the data points. As we do not force the model to "bend" itself to passes through these points we generally obtain smoother graphs than with interpolation (let's say with much smaller overshoots).
But this is to the detriment of the fidelity to the data for the residual sums of squre is no longer equal to zero as it was with interpolation models.

Advancet least squares point of view try to realize a balance between the fidelity to the data and a contained generalization error (a thinkg Kriging does naturally... provided it is correctly tunned, which is another issue). Stepwise regression, LASSO regression (you can browse those terms on Wikipedia if you are interested) are two methods among many to realize this balance.

So the core question is indeed "What a good fit is?"

_________________________________________________________________________

Let's skip this philosophical question and let's come back to your problem.

You wrote "My best guess for the dataset above is a periodic function with a modulated argument.".

Does this idea of function with a modulated argument come from your knowledge of the physical process which generates your data, or it is something which came to you by observing a scatterplot of these data?
This question does matter, because in the first case we must try and fir a function of some given structure, while in the second case one may feel free to propose something else (let's say, my best couldn't be yours).

In the attached file I propose you to fit, in the least squares sense, rational fractions to your data.
Indeed those models have two strong interesting properties: they are quite versatile, and they are linear under a proper reparameterization.
In the attached file the rational fraction R(x) to fit is coded [P, Q] where:

.

If P (resp. Q) is a nonegative integer, then numer(R(x)) (resp. denom(R(x)) ) is a dense polynomial of degree P (resp. Q) with indeterminate x.
If P (resp. Q) is a list of nonnegative integer, then numer(R(x)) (resp. denom(R(x)) ) is a sparse polynomial whose monomials have degrees in P (resp. Q)

You will see these rational fractions can fit your data reasonnably well provided the degrees P and Q are high enough

RationalFractionFit.mw

@C_R You write "My requireme...

Answered: sand15 1344

December 13 2025

0 0

@C_R

You write "My requirement: An algorithm (not me) that searches for a model that fits a given data set best."

I understand what you likely have in mind saying this but I believe a little clarification is needed.

Indeed the term fit is unclear.
If you consider @Ronan proposal, CurveFitting will give you a continuous function which passes exactly to your (101) points, giving then a perfect representation of tha data.
If you consider @dharr proposal instead, a statistical fit (linear or not) will give you a continuous function which is close, in some sense (often in the least squares sense but other norms, or metrics, can be used) your data points (note that any statical linear model with 101 parameters passes exactly by yout 101 points).

This poses the question of what a good fit is.
From the representation point of view interpolation is perfect because the sum of the residuals "data minus model values" is 0, but classical interpolation methods have generally no generalization capabilities. This means that they may be absolutely incapable to predict what the experimental output would be for a new input, even if this input is close to an input in the data set used to build the interpolation model.

The only noticeable exception is Gaussian Process Emulation, aka Kriging, which is a statistical method that unfortunately Maple locates in the CurveFitting package.
This method enables controlling the prediction error out of the points used to construct (fit) the model.

The least squares point of view is a strategy where we satisfy ourselves with a model which simply passes reasonably close to the data points. As we do not force the model to "bend" itself to passes through these points we generally obtain smoother graphs than with interpolation (let's say with much smaller overshoots).
But this is to the detriment of the fidelity to the data for the residual sums of squre is no longer equal to zero as it was with interpolation models.

Advancet least squares point of view try to realize a balance between the fidelity to the data and a contained generalization error (a thinkg Kriging does naturally... provided it is correctly tunned, which is another issue). Stepwise regression, LASSO regression (you can browse those terms on Wikipedia if you are interested) are two methods among many to realize this balance.

So the core question is indeed "What a good fit is?"

_________________________________________________________________________

Let's skip this philosophical question and let's come back to your problem.

You wrote "My best guess for the dataset above is a periodic function with a modulated argument.".

Does this idea of function with a modulated argument come from your knowledge of the physical process which generates your data, or it is something which came to you by observing a scatterplot of these data?
This question does matter, because in the first case we must try and fir a function of some given structure, while in the second case one may feel free to propose something else (let's say, my best couldn't be yours).

In the attached file I propose you to fit, in the least squares sense, rational fractions to your data.
Indeed those models have two strong interesting properties: they are quite versatile, and they are linear under a proper reparameterization.
In the attached file the rational fraction R(x) to fit is coded [P, Q] where:

.

If P (resp. Q) is a nonegative integer, then numer(R(x)) (resp. denom(R(x)) ) is a dense polynomial of degree P (resp. Q) with indeterminate x.
If P (resp. Q) is a list of nonnegative integer, then numer(R(x)) (resp. denom(R(x)) ) is a sparse polynomial whose monomials have degrees in P (resp. Q)

You will see these rational fractions can fit your data reasonnably well provided the degrees P and Q are high enough

RationalFractionFit.mw

You're right...

Answered: sand15 1344

December 13 2025

0 0

@dharr

I missed this fundamental point that the equations are evidently not independent.

@salim-barzani

Apologies for what I wrote in my initial answer: I should have paid more attention to the way your 17 equations were built and not speak from a general point of view as @dharr commented.

Both of you: here is a little attempt to provide an analysys of the solutions @dharr got. A few points

All the 28 solutions contain at least one equality of the form 'p = p' where p is any of the 8 parameters.
All the 28 solutions contain at least one equality of the form 'p = 0' where p is any of the 8 parameters.
All the 28 solutions depend on both epsilon, k, h.
I suggest droping the 'explicit' option in solve to get more synthetic representations of the solutions

analysis_sand15.mw

To complete dharr's comment...

Answered: sand15 1344

December 11 2025

0 0

Read carefully the attached file plume_work_sand15.mw to see what questions your problem rises and understand wfy pdsolve returns an error.

This said, my bet is your problem constitutes a model for atmospheric pollution due to the release of a pollutant plume.
Am I right?
If it is so those models are generally transient ones as yours, assuming X and Y are space co-ordinates is a stationnary one... and this is the main thing I don't understand.
See for instance this paper.

If you took your equations from some article can you provide us with?

If you solve a stationnary problem then, obviously, your boundary conditions are not correct (read the attached file)

Great!...

Answered: sand15 1344

December 09 2025

0 0

@dharr

I hadn't gone that far.
You should convert your comment into an answer so that I can vote for it.

Thanks...

Answered: sand15 1344

December 05 2025

0 0

@WD0HHU

for your comment

@WD0HHU @salim-barzani @acerI...

Answered: sand15 1344

December 05 2025

0 0

@WD0HHU @salim-barzani @acer

In case you would be interested in plotting Re(solnum) or Im(solnum) the same I did here is a worksheet that could suit you (please adapt it using the @acer's correction for Maple 2025)

plot-help_sand15_(2).mw

Here is the pattern which replicates periodically in the z direction (explanations within the attached worksheet)

And its z-periodization

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

Ask a Question

Create a Post

1344 Reputation

15 Badges

MaplePrimes Activity

These are replies submitted by sand15

reply...

Got it!!!...

@C_R In your comments you use L as ...

Thanks for your last comment and your de...

RationalFractionFit with QR decompositio...

Thanks for your intervention...

Not yet the regression/interpolation dis...

Thanks for your reply...

@C_R You write "My requireme...

@C_R You write "My requireme...

You're right...

To complete dharr's comment...

Great!...

Thanks...

@WD0HHU @salim-barzani @acerI...

Save this setting as your default sorting preference?

Ask a Question

Create a Post

Generating PDF…

Save this setting as your default sorting preference?
Note: You can change your preference any time in your account settings
Don't show this again

From:
To:

Custom Message (optional):