mmcdara


MaplePrimes Activity


These are Posts that have been published by mmcdara

I'm particularly interested in data analysis and, more specifically, in the statistical analysis of computer code outputs.

One of the main activities of this very broad field is called Uncertainty Propagation. In a few words, it consists in perturbing the inputs of a computational code in order to understand (and quantify) how these perturbations propagate to the outputs of this code.

At the core of uncertainty propagation is the ability to generate large numbers of "random" variations of the inputs. Since these inputs can number in the tens, the first problem is to generate "random" points in a space of potentially very high dimension.
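To make the idea concrete, here is a minimal Monte Carlo sketch of uncertainty propagation in Maple. The "computational code" f and the input distributions are purely hypothetical, chosen so that the exact output mean is known (1 + 0.01 + 2/Pi, about 1.6466); a real study would replace f by a call to the actual code.

```maple
with(Statistics):

# Hypothetical "computational code" with 3 uncertain inputs and 1 output
f := (x1, x2, x3) -> x1 + x2^2 + sin(x3):

# Assumed (independent) input uncertainties
N  := 10^4:
X1 := Sample(Normal(1, 0.1), N):
X2 := Sample(Normal(0, 0.1), N):
X3 := Sample(Uniform(0, evalf(Pi)), N):

# Propagation: run the code on every random input point
Y := f~(X1, X2, X3):

# Quantification of the output uncertainty
Mean(Y), StandardDeviation(Y);
```

Every one of the N evaluations consumes 3 pseudo-random numbers, which is why the quality of the generator matters.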

Even among my mathematician colleagues, an impressive number are completely unaware of how "random" numbers are generated. I guess that a lot of MaplePrimes users are too. My purpose is not to give a course on this topic; the available literature is vast enough for anyone interested to find information at any level of complexity.
Among those who have some knowledge of Pseudo Random Number Generators (PRNGs), only a few know that a PRNG has to pass severe tests ("tests of randomness") before the streams of numbers it generates can be qualified as "reasonably random" and the PRNG can therefore be released.

One of the most famous examples of a bad PRNG is "randu" (IBM, 1966, probably used in Fortran libraries for more than 30 years), the very PRNG that Knuth himself called the "infamous generator".
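The weakness of randu is easy to exhibit. Its recurrence is x[k+1] = 65539*x[k] mod 2^31, and since 65539 = 2^16+3, one gets 65539^2 = 6*65539 - 9 (mod 2^31); hence every output satisfies x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31), which is why consecutive triples fall on only 15 planes of the unit cube. The small sketch below (the procedure name randu is just a local choice) checks this relation on 1000 outputs.

```maple
# randu: x[k+1] = 65539*x[k] mod 2^31 (the seed should be odd)
randu := proc(seed::posint, n::posint)
  local x, k, L;
  x := seed;
  L := Vector(n);
  for k to n do
    x := irem(65539*x, 2^31);
    L[k] := x;
  end do;
  convert(L, list)
end proc:

L := randu(1, 1000):

# Every triple obeys x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31): returns true
andmap(k -> modp(L[k+2] - 6*L[k+1] + 9*L[k], 2^31) = 0, [seq(k, k = 1 .. nops(L)-2)]);
```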

These tests of randomness are generally gathered in dedicated libraries, and Diehard is probably one of the best known of them.
Diehard was originally developed by George Marsaglia more than twenty years ago and is still widely used today.

I recently decided to test Maple's PRNG, named "Mersenne Twister". Not because I have doubts about the quality of the work done by Maplesoft, but first because it can do no harm to publish quantitative information that lets everyone know that Maple uses a proven PRNG, and second because the (very simple) approach used here can fill the gaps I mentioned above.

Mersenne Twister (often dubbed mt19937) is considered a very good PRNG; it is used in a lot of applications (including finance, where it is not so rare to sample input spaces of dimension larger than 1000... OK, I know, mt19937 is often considered a poor candidate for cryptographic applications, but that is not my concern here).

I have thus decided to spend some time running the Diehard test suite on a sequence of integers generated by RandomTools[MersenneTwister].


 

restart:


DIEHARD test suite for Pseudo Random Number Generators (PRNG)

Reference: http://webhome.phy.duke.edu/~rgb/General/dieharder.php

The installation procedure (Mac OSX) can be found here
    https://gist.github.com/blixt/9abfafdd0ada0f4f6f26
or here
    http://macappstore.org/dieharder/

For other operating systems, please search the web.


dieharder [-h]   # for inline help
dieharder -l     # to list all the available tests




A description of the many tests can be found here:
    https://en.wikipedia.org/wiki/Diehard_tests
    https://sites.google.com/site/astudyofentropy/background-information/the-tests/dieharder-test-descriptions
    https://www.stata.com/support/cert/diehard/randnumb_mt64.out

General theory about PRNG testing can be found here (a reference among many):
    http://liu.diva-portal.org/smash/get/diva2:740158/FULLTEXT01.pdf

or here (more oriented to the NIST test suite)
    https://www.random.org/analysis/Analysis2005.pdf
    https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-22r1a.pdf



In a terminal window, execute the following commands for exhaustive testing ("-a" option).
The "-g 202" option means that the generator is replaced by a text format input file
(use dieharder -h for more details).

cd //..../Desktop/DIEHARD

dieharder -g 202 -f SomeAsciiFile -a > //..../Desktop/DIEHARD/TheResultFile.txt

Be careful: the complete test suite takes several hours (about 5 on my computer).



__________________________________________________________________________________
 


Maple's Mersenne Twister Generator

Maple help page: RandomTools[MersenneTwister][GenerateInteger]
(see the references to the Mersenne Twister PRNG included there).

Note: in what follows, this generator is called mt19937.


The Mersenne Twister is implemented in many software packages.
It is highly likely that this PRNG (and the others these packages offer) has been intensively
tested with one of the existing PRNG testing libraries.
Unfortunately, only a few vendors have made the results of these tests public (probably because
the implementation itself is rarely questioned... but a typo in the code is always a possibility).

One exception is the STATA software.
A summary of the results can be found here
   https://www.stata.com/support/cert/diehard/.
A complete description of the results of the tests passed is given here
   https://www.stata.com/support/cert/diehard/randnumb_mt64.out

A classical summary of the performance of mt19937 can be found here:

   http://www2.ic.uff.br/~celso/artigos/pjo6.ps

and the results table given in that document is the reference pattern (P means "Passed", F means "Failed").


____________________________________________________________________________


In the Maple code below, a sequence of N UnsignedInt32 numbers is generated with
Maple's Mersenne Twister and the result is exported to an ASCII file.
The seed is set to 1 (SetState(state=1)) in order to compare, for a small value of N (say N=10),
the sequence produced by Maple's mt19937 with the sequence of the same length generated
by Diehard's mt19937.
To generate the latter sequence and save it in file Diehard_mt19937, just run in a terminal window
the command (-S 1 means "seed = 1", -t 10 means "a sequence of length 10"):
   dieharder -S 1 -B -o -t 10 > Diehard_mt19937

About the value of N:

In http://webhome.phy.duke.edu/~rgb/General/dieharder.php it is recommended that N be at least
2.5 million; STATA used N=3 million.
Other web sources say this value is too small.
For N=10 million, Maple's mt19937 does not pass the tests successfully.
I used N=50 million here (the resulting ASCII file is 537 MB).



Name of the input file.

The file generated by Maple is named Maple_mt19937_N=5e7.txt



One important point is the preamble of a valid input file.

This preamble must have 6 lines (the value 10 after "count:" must be set to the value of N).
A valid preamble has the form:

#==================================================================
# some text indicating the generator used
#==================================================================
type: d
count: 10
numbit: 32

As Maple_mt19937_N=5e7.txt is generated with an ExportMatrix command, this preamble is added
by hand.
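Instead of editing a 537 MB file by hand, the preamble could also be written directly from Maple. The sketch below does it with fopen/fprintf; the procedure name WriteDiehardInput and the label text are hypothetical, and writing 5*10^7 lines one by one with fprintf may well be slower than ExportMatrix. The N x 1 Matrix S built further down can be passed as its column, S[.., 1].

```maple
# Hypothetical helper: writes a dieharder "-g 202" input file, preamble included
WriteDiehardInput := proc(fname::string, S::{list, Vector}, label::string)
  local fd, k;
  fd := fopen(fname, WRITE, TEXT);
  fprintf(fd, "#==================================================================\n");
  fprintf(fd, "# %s\n", label);
  fprintf(fd, "#==================================================================\n");
  fprintf(fd, "type: d\n");
  fprintf(fd, "count: %d\n", numelems(S));   # must equal the number of values
  fprintf(fd, "numbit: 32\n");
  for k to numelems(S) do
    fprintf(fd, "%d\n", S[k]);               # one unsigned 32-bit integer per line
  end do;
  fclose(fd);
  fname
end proc:

# Usage: small N with seed 1, to be compared with Diehard's mt19937
with(RandomTools[MersenneTwister]):
SetState(state = 1):
WriteDiehardInput("Maple_mt19937_N=10.txt",
                  [seq(GenerateUnsignedInt32(), i = 1 .. 10)],
                  "generator Maple mt19937, seed = 1");
```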
 


Running multiple Diehard tests

To run the same tests used to qualify STATA's Mersenne Twister, open a terminal window,
go to the directory that contains input file Maple_mt19937_N=5e7.txt and run this script:

 for i in {0,1,2,3,4,8,9,10,11,12,13,14,15,16}; do
    dieharder -g 202 -f Maple_mt19937_N=5e7.txt -d $i >> Diehard___Maple_mt19937_N=5e7
 done

The results are then appended to the ASCII file Diehard___Maple_mt19937_N=5e7.

 

with(RandomTools[MersenneTwister]):

dir := cat("/", currentdir(), "Desktop/DIEHARD/"):
InputFile := cat(dir, "Maple_mt19937_N=5e7.txt"):

SetState(state=1);

N := 5*10^7:

st := time():
S := convert([seq(GenerateUnsignedInt32(), i=1..N)], Matrix)^+;
time()-st;

S := [ 50000000 x 1 Matrix, Data Type: anything, Storage: rectangular, Order: Fortran_order ]

 

84.526


st := time():
ExportMatrix(InputFile, S, format=rectangular, mode=ascii);
time()-st;

537066525

 

61.926



Diehard's results


Full test suite (about 5 hours of computational time)

Command:
dieharder -g 202 -f Maple_mt19937_N=5e7.txt -a > Diehard___ALL___Maple_mt19937_N=5e7


The results are compared to those obtained with Diehard's own mt19937.
Two approaches are used:

  - 1 - First, a stream of PRNs is generated and stored in an ASCII file (just as we did with Maple).
         The whole test suite is then run on this file.
         Commands (-g 013 codes for mt19937):

         dieharder -S 1 -g 013 -o -t 50000000 > Diehard_mt19937_N=5e7.txt
         dieharder -g 202 -f Diehard_mt19937_N=5e7.txt -a > Diehard___ALL___Diehard_mt19937_N=5e7



  - 2 - The whole suite is run by invoking mt19937 directly ("online").
         Commands:
         dieharder -S 1 -g 013 -t 50000000 -a > Diehard___ALL___Online


A UNIX diff command was used to verify that the two files Maple_mt19937_N=5e7.txt and
Diehard_mt19937_N=5e7.txt were identical (they were).
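The same check can also be sketched inside Maple, for instance on the first values of both files. This assumes the ASCII layout of a -g 202 input file described above (six preamble lines, then one integer per line); ReadDiehardFile is a hypothetical helper name.

```maple
# Hypothetical helper: returns the first n integers of a "-g 202" ASCII file,
# skipping the preamble ("#..." lines, "type:", "count:", "numbit:")
ReadDiehardFile := proc(fname::string, n::posint)
  local line, vals, k;
  vals := Vector(n);
  k := 0;
  line := readline(fname);                 # readline returns 0 at end of file
  while line <> 0 and k < n do
    line := StringTools:-Trim(line);
    if line <> "" and StringTools:-IsDigit(line) then
      k := k + 1;
      vals[k] := parse(line);
    end if;
    if k < n then line := readline(fname) end if;
  end do;
  try fclose(fname) catch: end try;        # the file may already be closed at EOF
  convert(vals[1 .. k], list)
end proc:

# Usage: compare the first 1000 values of both files
M1000 := ReadDiehardFile("Maple_mt19937_N=5e7.txt", 1000):
D1000 := ReadDiehardFile("Diehard_mt19937_N=5e7.txt", 1000):
evalb(M1000 = D1000);
```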

Note that Diehard does not respond identically depending on whether the stream of random numbers is read from a file
or generated online (the latter, case - 2 -, seems to give better results).

Summary (114 tests):
   - * - Maple's and Diehard's mt19937 respond exactly the same way when the stream of random
          numbers is read from an ASCII file (8 tests failed (******) and 6 weak (**)).
   - * - Diehard's mt19937 fails no test and is weak on 4 tests when the stream is generated online.
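The Assessment labels can be reproduced from the p-values. The sketch below assumes dieharder's default thresholds (WEAK when p < 0.005 or p > 0.995, FAILED when p < 10^(-6) or p > 1 - 10^(-6)); these values are an assumption on my part, although they agree with every line of the Assessment columns in the results table. Assess is a hypothetical name.

```maple
# Assumed thresholds: weak_level = 0.005, fail_level = 1e-6 (two-sided)
Assess := proc(p::numeric, weak_level::numeric := 0.005, fail_level::numeric := 0.000001)
  if p < fail_level or p > 1 - fail_level then
    "FAILED"
  elif p < weak_level or p > 1 - weak_level then
    "WEAK"
  else
    "PASSED"
  end if
end proc:

# A few p-values taken from the table
map(Assess, [0.9912651, 0.9984397, 0.0000074, 0.0000000]);
```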
 

 

restart:

dir := currentdir():
FromMapleFile     := cat(dir, "Diehard___ALL___Maple_mt19937_N=5e7"):
FromDiehardFile   := cat(dir, "Diehard___ALL___Diehard_mt19937_N=5e7"):
FromDiehardNoFile := cat(dir, "Diehard___ALL___Online"):


printf("                           ======================|======================|======================|\n"):
printf("                          |   From Maple's file  | From Diehard's File  | Diehard online test  |\n"):
printf("==========================|======================|======================|======================|\n"):
printf("          test       ntup | p.value   Assessment | p.value   Assessment | p.value   Assessment |\n"):
printf("==========================|======================|======================|======================|\n"):


for k from 1 to 9 do
  LMF  := readline(FromMapleFile):
  LDF  := readline(FromDiehardFile):
  LDNF := readline(FromDiehardNoFile):
end do:


while LMF <> 0 do
  if StringTools:-Search("|", LMF) > 0 then
    res := StringTools:-StringSplit(LMF, "|")[[1, 2, 5, 6]];
    printf("%-20s  %3d | %1.7f ", res[1], parse(res[2]), parse(res[3]));
      if StringTools:-Search("WEAK"  , res[4]) > 0 then printf("    **     |")
    elif StringTools:-Search("FAILED", res[4]) > 0 then printf("  ******   |")
    else printf("  PASSED   |")
    end if:
  end if:
  LMF  := readline(FromMapleFile):

  if StringTools:-Search("|", LDF) > 0 then
    res := StringTools:-StringSplit(LDF, "|")[[5, 6]];
    printf(" %1.7f ", parse(res[1]));
      if StringTools:-Search("  WEAK"  , res[2]) > 0 then printf("     **    |")
    elif StringTools:-Search("  FAILED", res[2]) > 0 then printf("   ******  |")
    else printf("   PASSED  |")
    end if:
  end if:
  LDF  := readline(FromDiehardFile):
                     
  if StringTools:-Search("|", LDNF) > 0 then
    res := StringTools:-StringSplit(LDNF, "|")[[5, 6]];
    printf(" %1.7f ", parse(res[1]));
      if StringTools:-Search("WEAK"  , res[2]) > 0 then printf("     **    |")
    elif StringTools:-Search("FAILED", res[2]) > 0 then printf("   ******  |")
    else printf("   PASSED  |")
    end if:
    printf("\n"):
  end if:
  LDNF := readline(FromDiehardNoFile):


end do:

                           ======================|======================|======================|
                          |   From Maple's file  | From Diehard's File  | Diehard online test  |
==========================|======================|======================|======================|
          test       ntup | p.value   Assessment | p.value   Assessment | p.value   Assessment |
==========================|======================|======================|======================|
   diehard_birthdays    0 | 0.9912651   PASSED   | 0.9912651    PASSED  | 0.8284550    PASSED  |
      diehard_operm5    0 | 0.1802226   PASSED   | 0.1802226    PASSED  | 0.5550587    PASSED  |
  diehard_rank_32x32    0 | 0.3099035   PASSED   | 0.3099035    PASSED  | 0.9575440    PASSED  |
    diehard_rank_6x8    0 | 0.2577249   PASSED   | 0.2577249    PASSED  | 0.3915666    PASSED  |
   diehard_bitstream    0 | 0.5519218   PASSED   | 0.5519218    PASSED  | 0.9999462      **    |
        diehard_opso    0 | 0.1456442   PASSED   | 0.1456442    PASSED  | 0.7906533    PASSED  |
        diehard_oqso    0 | 0.4882425   PASSED   | 0.4882425    PASSED  | 0.9574014    PASSED  |
         diehard_dna    0 | 0.0102880   PASSED   | 0.0102880    PASSED  | 0.5149193    PASSED  |
diehard_count_1s_str    0 | 0.1471956   PASSED   | 0.1471956    PASSED  | 0.9517290    PASSED  |
diehard_count_1s_byt    0 | 0.1158707   PASSED   | 0.1158707    PASSED  | 0.1568255    PASSED  |
 diehard_parking_lot    0 | 0.1148982   PASSED   | 0.1148982    PASSED  | 0.1611173    PASSED  |
    diehard_2dsphere    2 | 0.9122204   PASSED   | 0.9122204    PASSED  | 0.2056657    PASSED  |
    diehard_3dsphere    3 | 0.9385972   PASSED   | 0.9385972    PASSED  | 0.3620517    PASSED  |
     diehard_squeeze    0 | 0.2686977   PASSED   | 0.2686977    PASSED  | 0.8611266    PASSED  |
        diehard_sums    0 | 0.1602355   PASSED   | 0.1602355    PASSED  | 0.5103248    PASSED  |
        diehard_runs    0 | 0.1235328   PASSED   | 0.1235328    PASSED  | 0.9402086    PASSED  |
        diehard_runs    0 | 0.6341956   PASSED   | 0.6341956    PASSED  | 0.3274267    PASSED  |
       diehard_craps    0 | 0.0243605   PASSED   | 0.0243605    PASSED  | 0.1844482    PASSED  |
       diehard_craps    0 | 0.2952043   PASSED   | 0.2952043    PASSED  | 0.1407422    PASSED  |
 marsaglia_tsang_gcd    0 | 0.0000000   ******   | 0.0000000    ******  | 0.5840531    PASSED  |
 marsaglia_tsang_gcd    0 | 0.0000000   ******   | 0.0000000    ******  | 0.8055035    PASSED  |
         sts_monobit    1 | 0.9397218   PASSED   | 0.9397218    PASSED  | 0.9018886    PASSED  |
            sts_runs    2 | 0.8092469   PASSED   | 0.8092469    PASSED  | 0.2247600    PASSED  |
          sts_serial    1 | 0.2902851   PASSED   | 0.2902851    PASSED  | 0.9223063    PASSED  |
          sts_serial    2 | 0.9541680   PASSED   | 0.9541680    PASSED  | 0.6140772    PASSED  |
          sts_serial    3 | 0.4090798   PASSED   | 0.4090798    PASSED  | 0.2334754    PASSED  |
          sts_serial    3 | 0.5474851   PASSED   | 0.5474851    PASSED  | 0.7370361    PASSED  |
          sts_serial    4 | 0.7282286   PASSED   | 0.7282286    PASSED  | 0.2518826    PASSED  |
          sts_serial    4 | 0.9905724   PASSED   | 0.9905724    PASSED  | 0.6876253    PASSED  |
          sts_serial    5 | 0.8297711   PASSED   | 0.8297711    PASSED  | 0.2123014    PASSED  |
          sts_serial    5 | 0.9092172   PASSED   | 0.9092172    PASSED  | 0.3532615    PASSED  |
          sts_serial    6 | 0.4976615   PASSED   | 0.4976615    PASSED  | 0.9967160      **    |
          sts_serial    6 | 0.9853355   PASSED   | 0.9853355    PASSED  | 0.5537414    PASSED  |
          sts_serial    7 | 0.9675717   PASSED   | 0.9675717    PASSED  | 0.3804243    PASSED  |
          sts_serial    7 | 0.4446567   PASSED   | 0.4446567    PASSED  | 0.0923678    PASSED  |
          sts_serial    8 | 0.7254384   PASSED   | 0.7254384    PASSED  | 0.4544030    PASSED  |
          sts_serial    8 | 0.8984816   PASSED   | 0.8984816    PASSED  | 0.7501155    PASSED  |
          sts_serial    9 | 0.8255134   PASSED   | 0.8255134    PASSED  | 0.4260288    PASSED  |
          sts_serial    9 | 0.6609663   PASSED   | 0.6609663    PASSED  | 0.5622308    PASSED  |
          sts_serial   10 | 0.9984397     **     | 0.9984397      **    | 0.5789212    PASSED  |
          sts_serial   10 | 0.7987434   PASSED   | 0.7987434    PASSED  | 0.8599317    PASSED  |
          sts_serial   11 | 0.5552886   PASSED   | 0.5552886    PASSED  | 0.3546752    PASSED  |
          sts_serial   11 | 0.4417852   PASSED   | 0.4417852    PASSED  | 0.5042245    PASSED  |
          sts_serial   12 | 0.3843880   PASSED   | 0.3843880    PASSED  | 0.6723639    PASSED  |
          sts_serial   12 | 0.1514682   PASSED   | 0.1514682    PASSED  | 0.9428701    PASSED  |
          sts_serial   13 | 0.5396454   PASSED   | 0.5396454    PASSED  | 0.5793677    PASSED  |
          sts_serial   13 | 0.9497671   PASSED   | 0.9497671    PASSED  | 0.3370774    PASSED  |
          sts_serial   14 | 0.3616613   PASSED   | 0.3616613    PASSED  | 0.4372343    PASSED  |
          sts_serial   14 | 0.3996251   PASSED   | 0.3996251    PASSED  | 0.5185021    PASSED  |
          sts_serial   15 | 0.3847188   PASSED   | 0.3847188    PASSED  | 0.3188851    PASSED  |
          sts_serial   15 | 0.1012968   PASSED   | 0.1012968    PASSED  | 0.1631942    PASSED  |
          sts_serial   16 | 0.9974802     **     | 0.9974802      **    | 0.6645914    PASSED  |
          sts_serial   16 | 0.1157822   PASSED   | 0.1157822    PASSED  | 0.3465564    PASSED  |
         rgb_bitdist    1 | 0.4705599   PASSED   | 0.4705599    PASSED  | 0.8627740    PASSED  |
         rgb_bitdist    2 | 0.7578920   PASSED   | 0.7578920    PASSED  | 0.3296790    PASSED  |
         rgb_bitdist    3 | 0.9934502   PASSED   | 0.9934502    PASSED  | 0.5558012    PASSED  |
         rgb_bitdist    4 | 0.3674201   PASSED   | 0.3674201    PASSED  | 0.1607977    PASSED  |
         rgb_bitdist    5 | 0.7930273   PASSED   | 0.7930273    PASSED  | 0.9999802      **    |
         rgb_bitdist    6 | 0.8491477   PASSED   | 0.8491477    PASSED  | 0.3774760    PASSED  |
         rgb_bitdist    7 | 0.1537432   PASSED   | 0.1537432    PASSED  | 0.4715169    PASSED  |
         rgb_bitdist    8 | 0.9454030   PASSED   | 0.9454030    PASSED  | 0.9890644    PASSED  |
         rgb_bitdist    9 | 0.2017856   PASSED   | 0.2017856    PASSED  | 0.0571014    PASSED  |
         rgb_bitdist   10 | 0.9989305     **     | 0.9989305      **    | 0.4575834    PASSED  |
         rgb_bitdist   11 | 0.4441883   PASSED   | 0.4441883    PASSED  | 0.4960057    PASSED  |
         rgb_bitdist   12 | 0.7074388   PASSED   | 0.7074388    PASSED  | 0.6808850    PASSED  |
rgb_minimum_distance    2 | 0.9604056   PASSED   | 0.9604056    PASSED  | 0.8859729    PASSED  |
rgb_minimum_distance    3 | 0.5143592   PASSED   | 0.5143592    PASSED  | 0.3266204    PASSED  |
rgb_minimum_distance    4 | 0.3779106   PASSED   | 0.3779106    PASSED  | 0.3537417    PASSED  |
rgb_minimum_distance    5 | 0.4861264   PASSED   | 0.4861264    PASSED  | 0.9032057    PASSED  |
    rgb_permutations    2 | 0.9206310   PASSED   | 0.9206310    PASSED  | 0.8052940    PASSED  |
    rgb_permutations    3 | 0.9299743   PASSED   | 0.9299743    PASSED  | 0.2209750    PASSED  |
    rgb_permutations    4 | 0.8330345   PASSED   | 0.8330345    PASSED  | 0.5819945    PASSED  |
    rgb_permutations    5 | 0.2708879   PASSED   | 0.2708879    PASSED  | 0.9276941    PASSED  |
      rgb_lagged_sum    0 | 0.0794660   PASSED   | 0.0794660    PASSED  | 0.9918681    PASSED  |
      rgb_lagged_sum    1 | 0.5279555   PASSED   | 0.5279555    PASSED  | 0.1304600    PASSED  |
      rgb_lagged_sum    2 | 0.0433872   PASSED   | 0.0433872    PASSED  | 0.1149961    PASSED  |
      rgb_lagged_sum    3 | 0.0028004     **     | 0.0028004      **    | 0.2731577    PASSED  |
      rgb_lagged_sum    4 | 0.0000074     **     | 0.0000074      **    | 0.8978870    PASSED  |
      rgb_lagged_sum    5 | 0.1332411   PASSED   | 0.1332411    PASSED  | 0.2065880    PASSED  |
      rgb_lagged_sum    6 | 0.0412128   PASSED   | 0.0412128    PASSED  | 0.7611867    PASSED  |
      rgb_lagged_sum    7 | 0.0225446   PASSED   | 0.0225446    PASSED  | 0.4810145    PASSED  |
      rgb_lagged_sum    8 | 0.0087433   PASSED   | 0.0087433    PASSED  | 0.3120378    PASSED  |
      rgb_lagged_sum    9 | 0.0000000   ******   | 0.0000000    ******  | 0.1334315    PASSED  |
      rgb_lagged_sum   10 | 0.4147842   PASSED   | 0.4147842    PASSED  | 0.2334790    PASSED  |
      rgb_lagged_sum   11 | 0.0206564   PASSED   | 0.0206564    PASSED  | 0.6491578    PASSED  |
      rgb_lagged_sum   12 | 0.0755835   PASSED   | 0.0755835    PASSED  | 0.5332069    PASSED  |
      rgb_lagged_sum   13 | 0.3112028   PASSED   | 0.3112028    PASSED  | 0.4194447    PASSED  |
      rgb_lagged_sum   14 | 0.0000000   ******   | 0.0000000    ******  | 0.2584573    PASSED  |
      rgb_lagged_sum   15 | 0.0890059   PASSED   | 0.0890059    PASSED  | 0.0007064      **    |
      rgb_lagged_sum   16 | 0.2962076   PASSED   | 0.2962076    PASSED  | 0.1344984    PASSED  |
      rgb_lagged_sum   17 | 0.2696070   PASSED   | 0.2696070    PASSED  | 0.2242021    PASSED  |
      rgb_lagged_sum   18 | 0.0826388   PASSED   | 0.0826388    PASSED  | 0.0450341    PASSED  |
      rgb_lagged_sum   19 | 0.0000000   ******   | 0.0000000    ******  | 0.5508302    PASSED  |
      rgb_lagged_sum   20 | 0.0101437   PASSED   | 0.0101437    PASSED  | 0.4290150    PASSED  |
      rgb_lagged_sum   21 | 0.1417859   PASSED   | 0.1417859    PASSED  | 0.1624411    PASSED  |
      rgb_lagged_sum   22 | 0.0160264   PASSED   | 0.0160264    PASSED  | 0.5204838    PASSED  |
      rgb_lagged_sum   23 | 0.0535167   PASSED   | 0.0535167    PASSED  | 0.6571892    PASSED  |
      rgb_lagged_sum   24 | 0.0000000   ******   | 0.0000000    ******  | 0.8578906    PASSED  |
      rgb_lagged_sum   25 | 0.8453426   PASSED   | 0.8453426    PASSED  | 0.3568988    PASSED  |
      rgb_lagged_sum   26 | 0.2113484   PASSED   | 0.2113484    PASSED  | 0.9755715    PASSED  |
      rgb_lagged_sum   27 | 0.1903762   PASSED   | 0.1903762    PASSED  | 0.4356739    PASSED  |
      rgb_lagged_sum   28 | 0.0733066   PASSED   | 0.0733066    PASSED  | 0.8354990    PASSED  |
      rgb_lagged_sum   29 | 0.0000000   ******   | 0.0000000    ******  | 0.1716599    PASSED  |
      rgb_lagged_sum   30 | 0.0932124   PASSED   | 0.0932124    PASSED  | 0.0732090    PASSED  |
      rgb_lagged_sum   31 | 0.0000000   ******   | 0.0000000    ******  | 0.3497910    PASSED  |
      rgb_lagged_sum   32 | 0.0843455   PASSED   | 0.0843455    PASSED  | 0.5441949    PASSED  |
     rgb_kstest_test    0 | 0.4399862   PASSED   | 0.4399862    PASSED  | 0.9766581    PASSED  |
     dab_bytedistrib    0 | 0.0748312   PASSED   | 0.0748312    PASSED  | 0.7035800    PASSED  |
             dab_dct  256 | 0.0919474   PASSED   | 0.0919474    PASSED  | 0.3985889    PASSED  |
        dab_filltree   32 | 0.1227533   PASSED   | 0.1227533    PASSED  | 0.7390925    PASSED  |
        dab_filltree   32 | 0.6819630   PASSED   | 0.6819630    PASSED  | 0.1773611    PASSED  |
       dab_filltree2    0 | 0.1774773   PASSED   | 0.1774773    PASSED  | 0.2088828    PASSED  |
       dab_filltree2    1 | 0.1718216   PASSED   | 0.1718216    PASSED  | 0.2257006    PASSED  |
        dab_monobit2   12 | 0.9999881     **     | 0.9999881      **    | 0.8084149    PASSED  |

 

 


 

Download DIEHARD_test_of_MAPLE_MersenneTwister.mw

A lot of supplementary details are given in the attached file.
I let the readers discover by themselves if Maple's implementation of the Mersenne Twister PRNG is correct or not.
Beyond this exercise, I hope this work will be useful to people who could be tempted to test their own generator.

 

 

In the applications I am working on, information is often represented by hierarchical tables (that is, tables where some entries can themselves be tables, and so on).
To help people understand how this information is organized, I thought of representing such a hierarchical table as a tree graph.
Once this graph is built, it becomes very simple to find where a "terminal leaf" (an entry which is no longer a table) is located in the original table (by location I mean the sequence of indices for which the entry is this "terminal leaf").

The code provided here is unpretentious, and I have no doubt that people here will be able to improve it.
I am publishing it because I thought other people could face the same kind of problem.
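For comparison, the same information (the path of indices leading to each terminal leaf) can also be collected by a direct recursion, without building a graph. This is an alternative technique, not the method used below; the procedure name LeafPaths and the small table tt are hypothetical.

```maple
# Returns a sequence of [path, value] pairs, one per terminal leaf
LeafPaths := proc(T, path::list := [])
  local i, res;
  res := NULL;
  for i in [indices(T, nolist)] do
    if type(T[i], table) then
      res := res, LeafPaths(T[i], [op(path), i]);   # go one level deeper
    else
      res := res, [[op(path), i], T[i]];            # terminal leaf reached
    end if;
  end do;
  res
end proc:

tt := table([a1 = 1, a2 = table([a21 = 2, a22 = table([a221 = 3])])]):
{LeafPaths(tt)};

# A path can then be turned back into the stored value: tt[a2][a22][a221]
foldl((U, j) -> U[j], tt, a2, a22, a221);
```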


 

restart

with(GraphTheory):
interface(version);

`Standard Worksheet Interface, Maple 2015.2, Mac OS X, December 21 2015 Build ID 1097895`


gh := proc(T)
  global s, counter, types:
  local  i:
  if type(T, table) then
    for i in [indices(T, nolist)] do
      if type(T[i], table) then
         s := s, op(map(u -> [i, u], [indices(T[i], nolist)] ));
      else
         counter := counter+1:
         types   := types, _Z_||counter = whattype(T[i]);
         s       := s, [i, _Z_||counter];
      end if:
      thisproc(T[i]):
    end do:
  else
    return s
  end if:
end proc:

t := table([a1=[alpha=1, beta=2], a2=table([a21=2, a22=table([a221=x, a222=table([a2221={1, 2, 3}, a2222=Matrix(2, 2), a2223=u3, a2224=u4])])]), a3=table([a31=u, a32=v])]);

global s, counter, types:
s       := NULL:
counter := 0:
types   := NULL:

ghres := gh(t):
types := [types]:

t := table([a1 = [alpha = 1, beta = 2], a3 = table([a32 = v, a31 = u]), a2 = table([a22 = table([a222 = table([a2222 = (Matrix(2, 2, {(1, 1) = 0, (1, 2) = 0, (2, 1) = 0, (2, 2) = 0})), a2223 = u3, a2221 = {1, 2, 3}, a2224 = u4]), a221 = x]), a21 = 2])])



These 3 lines determine the set of edges of the form ['t', v] that are not captured by procedure gh.
They correspond to the "first level" indices of table t (v in {a1, a2, a3} in the example above).

L := convert(op~(1, [ghres]), set):     
R := convert(op~(2, [ghres]), set):
FirstLevelEdges := map(u -> ['t', u], L union R minus R):


Complete the set of the edges, build the graph representation TG of table t and draw TG.

edges := convert~({ghres, FirstLevelEdges[]}, set):
TG := Graph(edges):

HighlightVertex(TG, Vertices(TG), white):
p := DrawGraph(TG, style=tree, root='t'):
 


The subs(types, p) line below replaces the "terminal leaves" named _Z_n by their types.

eval(t);

p       := subs(types, p):
enlarge := plottools:-transform((x,y) -> [3*x, y]):

plots:-display(enlarge(p), size=[1000, 400]);

table([a1 = [alpha = 1, beta = 2], a3 = table([a32 = v, a31 = u]), a2 = table([a22 = table([a222 = table([a2222 = (Matrix(2, 2, {(1, 1) = 0, (1, 2) = 0, (2, 1) = 0, (2, 2) = 0})), a2223 = u3, a2221 = {1, 2, 3}, a2224 = u4]), a221 = x]), a21 = 2])])

 

 


This procedure is used to find the "indices path" to a terminal leaf.
FindLeaf is then applied to all the terminal leaves.

FindLeaf := proc(TG, leaf)
   local here:
   here := GraphTheory:-ShortestPath(TG, 't', leaf)[1..-2]:
   here := cat(convert(here[1], string), convert(here[2..-1], string)):
   here := StringTools:-SubstituteAll(here, ",", "]["):
   here := parse(here);
end proc:

# where is a2221

printf("%a\n", FindLeaf(TG, a2221));

t[a2][a22][a222]

 

 


 

Download Table_Unfolding_2.mw

 

Looking for fast approximate formulas to compute (a huge number of) quantiles of a Gaussian random variable (here the standard one, but the extension to any Gaussian RV is straightforward), I found a few of them in the Abramowitz and Stegun handbook, page 933, relations 26.2.22 and 26.2.23.
Each approximation model is expressed as a rational fraction, the second one being the more accurate.
The models depend on 4 and 6 parameters respectively, which were estimated (I guess) through a least-squares-like method.

See here for online access: http://people.math.sfu.ca/~cbm/aands/page_933.htm.
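Both models can be written in the single form implemented by the procedure J further down, with r = 1 for 26.2.22 (coefficients a_0, a_1, b_1, b_2) and r = 2 for 26.2.23 (the book names these coefficients c_0, c_1, c_2, d_1, d_2, d_3). Abramowitz and Stegun state it for p = 1 - alpha in (0, 0.5]:

$$
x_\alpha \;\approx\; t \;-\; \frac{\displaystyle\sum_{k=0}^{r} a_k\, t^k}{1 + \displaystyle\sum_{k=1}^{r+1} b_k\, t^k},
\qquad t = \sqrt{-2\ln(1-\alpha)}, \qquad \tfrac{1}{2} \le \alpha < 1 .
$$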

These approximations, and especially the most accurate one (formula 26.2.23), still seem to be widely used today(1) (see for instance https://www.johndcook.com/blog/normal_cdf_inverse/ ).

As an amusement, I decided to compute the best fit with the Statistics:-NonlinearFit procedure and a sample of (probability, quantile) points where the probability ranges in [0.5, 1-1/1000] (the range used in formulas 26.2.22 and 26.2.23 is (0, 0.5], but by symmetry this is not an issue).
Surprisingly, Statistics:-NonlinearFit returned, for both formulas, parameter estimates substantially different from those given in Abramowitz & Stegun's book. One reason could be that the points I used for the fits were not the ones they used (unfortunately, they give no information about this).

More interestingly, whatever the formula I refitted, NonlinearFit produced an approximation whose absolute error was about two orders of magnitude smaller than that of the approximation provided by Abramowitz and Stegun.
For instance, they state that the most accurate formula (26.2.23) has an absolute approximation error less than 4.5*10^(-4), whereas I obtained a value around 10^(-6)!

(1) To get an idea of how persistent the use of formula 26.2.23 is, just type the value 2.515517 of its parameter c[0] into any search engine.
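The 4.5*10^(-4) bound quoted for 26.2.23 can be checked directly against Maple's numeric quantile. A minimal sketch, assuming a uniform grid of 1001 probabilities in [0.5, 0.9999] (grid and procedure name AS2623 are my own choices):

```maple
with(Statistics):

# Abramowitz & Stegun 26.2.23 with the book's coefficients,
# written for alpha in [0.5, 1), i.e. p = 1 - alpha in (0, 0.5]
AS2623 := proc(alpha)
  local t;
  t := sqrt(-2*ln(1 - alpha));
  t - (2.515517 + 0.802853*t + 0.010328*t^2)
      / (1 + 1.432788*t + 0.189269*t^2 + 0.001308*t^3)
end proc:

# Maximum absolute error on the grid
P   := [seq(0.5 + 0.4999*k/1000, k = 0 .. 1000)]:
err := max(seq(abs(AS2623(a) - Quantile(Normal(0, 1), a, numeric)), a in P));
```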


In the plots below the gray rectangle refers to the region where the approximate ICDF is used for extrapolation.
 

restart:

with(Statistics):

cdf := unapply(evalf(CDF(Normal(0, 1), x)), x):
X   := [seq(0..5, 0.1)]:
A   := cdf~(X):
T  := alpha -> sqrt(-2*log(1-alpha)):
q  := Quantile~(Normal(0, 1), A):
Aq := convert([A,q], Matrix)^+:

r := 1:

J := z -> z - add(a__||k*z^k, k=0..r)/(1+add(b__||k*z^k, k=1..r+1)):


model  := J(T(alpha)):
NL_fit := unapply(NonlinearFit(model, Aq, alpha), alpha);


# these lines are for estimating the performances
B  := Sample(Uniform(0.5, 1), 10^4):
CodeTools:-Usage(Quantile~(Normal(0, 1), B)):
CodeTools:-Usage(Quantile~(Normal(0, 1), B, numeric)):
CodeTools:-Usage(NL_fit~(B)):
#-----------------------------------------------------
Y  := [seq(0..6, 0.01)]:
B  := cdf~(Y):
R1 := Quantile~(Normal(0, 1), B, numeric):
R2 := NL_fit~(B):

plots:-display(
  ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),
  plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

proc (alpha) options operator, arrow; (-2*ln(1-alpha))^(1/2)-(HFloat(2.5454311687345044)+HFloat(0.8058592540791468)*(-2*ln(1-alpha))^(1/2))/(1+HFloat(1.4689746699940707)*(-2*ln(1-alpha))^(1/2)-HFloat(0.34455942407858625)*ln(1-alpha)) end proc

 

memory used=170.31MiB, alloc change=76.01MiB, cpu time=3.06s, real time=3.05s, gc time=54.87ms

memory used=171.59MiB, alloc change=256.00MiB, cpu time=3.12s, real time=3.03s, gc time=154.77ms

memory used=8.24MiB, alloc change=0 bytes, cpu time=95.00ms, real time=95.00ms, gc time=0ns

 

 

r := 2:

 
J := z -> z - add(a__||k*z^k, k=0..r)/(1+add(b__||k*z^k, k=1..r+1)):


model  := J(T(alpha)):
NL_fit := unapply(NonlinearFit(model, Aq, alpha), alpha);


# these lines are for estimating the performances
B  := Sample(Uniform(0.5, 1), 10^4):
CodeTools:-Usage(Quantile~(Normal(0, 1), B)):
CodeTools:-Usage(Quantile~(Normal(0, 1), B, numeric)):
CodeTools:-Usage(NL_fit~(B)):
#-----------------------------------------------------


Y  := [seq(0..6, 0.01)]:
B  := cdf~(Y):
R1 := Quantile~(Normal(0, 1), B, numeric):
R2 := NL_fit~(B):

plots:-display(
  ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),
  plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

proc (alpha) options operator, arrow; (-2*ln(1-alpha))^(1/2)-(HFloat(2.9637294443959394)+HFloat(4.527738737327481)*(-2*ln(1-alpha))^(1/2)-HFloat(0.9571637188191973)*ln(1-alpha))/(1+HFloat(3.472400103322335)*(-2*ln(1-alpha))^(1/2)-HFloat(3.426536241250657)*ln(1-alpha)+HFloat(0.08875278252087411)*(-2*ln(1-alpha))^(3/2)) end proc

 

memory used=170.09MiB, alloc change=32.00MiB, cpu time=3.29s, real time=3.11s, gc time=268.60ms

memory used=170.85MiB, alloc change=0 bytes, cpu time=3.23s, real time=3.10s, gc time=201.52ms
memory used=10.76MiB, alloc change=0 bytes, cpu time=127.00ms, real time=127.00ms, gc time=0ns

 

 

# Optimized "r=2" computation

z_fit := simplify(subs(alpha=-exp(-(1/2)*z^2)+1, NL_fit(alpha))) assuming z > 0:
z_fit := unapply(convert~(%, horner), z);

p := proc(alpha)
  local z:
  z := sqrt(-2*log(1-alpha)):
  z_fit(z):
end proc:

R3 := CodeTools:-Usage(p~(B)):

plots:-display(
  ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),
  plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

proc (z) options operator, arrow; (-2.963729444+(-3.527738737+(2.993818244+(1.713268121+0.8875278252e-1*z)*z)*z)*z)/(1.+(3.472400103+(1.713268121+0.8875278252e-1*z)*z)*z) end proc

 

memory used=1.67MiB, alloc change=0 bytes, cpu time=14.00ms, real time=15.00ms, gc time=0ns

 

 


AS stands for Abramowitz & Stegun

J_AS := unapply(normal(eval(J(t), [a__0=2.515517, a__1=0.802853, a__2=0.010328, b__1=1.432788, b__2=0.189269, b__3=0.001308])), t):
J_AS(t);


# for comparison:

print():
z_fit := simplify(subs(alpha=-exp(-(1/2)*z^2)+1, NL_fit(alpha))) assuming z > 0:
map(sort, %, z);

plot([z_fit(z), J_AS(z)], z=0.5..1, color=[blue, red], legend=[mmcdara, Abramowitz_Stegun], gridlines=true);

print():
R2_AS := CodeTools:-Usage(J_AS~(T~(B))):
print():


plots:-display(
  ScatterPlot(R1, log[10]~(abs~(R2_AS-~R1)), legend=Abramowitz_Stegun, gridlines=true, size=[700, 400]),
  ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red),
  plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

(0.1308000000e-2*t^4+.1892690000*t^3+1.422460000*t^2+.1971470000*t-2.515517000)/(0.1308000000e-2*t^3+.1892690000*t^2+1.432788000*t+1.)

 

 

(0.8875278252e-1*z^4+1.713268121*z^3+2.993818244*z^2-3.527738737*z-2.963729444)/(0.8875278252e-1*z^3+1.713268121*z^2+3.472400103*z+1.)

 

 

 

memory used=2.92MiB, alloc change=0 bytes, cpu time=25.00ms, real time=25.00ms, gc time=0ns

 

 

 


 

Download InverseNormalCDF.mw

 

 
