Question: Use of 2-D FFT of 'large' matrices: inefficient?!

Here is a test: 

For small matrices, apart from the first call, the performance is almost perfect (🎉!). 
As a comparison, an equivalent test may be performed in modern Python:

As you can see, for 1024×1024, 2048×2048, 4096×4096, 8192×8192, and 16384×16384 matrices, Maple's performance gets pretty poor. Is the FFT procedure not well optimized for larger matrices? I do have read the Fourier Transforms in Maple, yet I cannot find any information on this subject. 
In accordance with the following output 

showstat(DiscreteTransforms::FFT_complex8, 3):

FFT_complex8 := proc()
       ...
   3   :-DiscreteTransforms:-FFT_complex8 := LinkExternal('hw_FFT',2003);
       ...
end proc

it appears that the code hasn't been developed for 20 years. Is it possible to improve the performance of the FFT built into Maple in order that the computation on such a 2¹⁴×2¹⁴ matrix can be achieved in about twenty seconds (rather than in two minutes)?

Note. For these matrices, exact transform results (see below) can be obtained symbolically.

for n from 0 to 12 do
    m := LinearAlgebra:-HankelMatrix(<$ (1 + 1 .. 2**n + 2**n)>, datatype = complex[8], shape = []): gc();
    print(n, andseq(abs(_) < HFloat(1, -10, 2), _ in SignalProcessing:-FFT(m, normalization = none, inplace = true) - Matrix(2^n, <2^(2*n)*(2^n + 1), <'2^(2*n - 1)*(:-cot((k - 1)/2^n*Pi)*I - 1)' $ 'k' = 1 + 1 .. 2^n>>, shape = symmetric, storage = sparse, datatype = complex[8])) (* faster than `rtable_scanblock` and `ArrayTools:-IsZero` and much faster (🎊!) than `comparray` and `verify/Matrix` with testfloat *) )
od:
 = 
                            0, true

                            1, true

                            2, true

                            3, true

                            4, true

                            5, true

                            6, true

                            7, true

                            8, true

                            9, true

                            10, true

                            11, true

                            12, true

However, the main goal is to test the numerical efficiency of Maple's fast Fourier transform algorithm.

Please Wait...