sortLL1 := proc(L::listlist(nonnegint))
local B,X,n,x,j;
B := max(map(op,L)[])+1;
n := nops(L[1]);
X := [seq(B^(n-j), j=1..n)];
SORT_W_ATTR(Float(inner(X,x)), x, x=L, `<`);
end proc:
The variable B is assigned one more than the largest integer in L, the listlist of integers. The variable X is assigned the list [Bn-1,Bn-2,...,B,1], which is dot multiplied (using the undocumented builtin procedure, inner) with each sublist in L to compute the key. Because B is larger than the largest integer in L, the computed key is guaranteed to be injective and monotonic. SORT_W_ATTR does the actual sorting. It uses the technique of sorting with attributes, which is fast because it does not need to compute the inverse of the scalar.
If you do not use tty maple, you can manually expand the macro in the assignment to sortLL1. The equivalent line is$define SORT_W_ATTR(k,x,e,F) \ map(attributes, sort([seq(setattribute(k,x),e)],F))
map(attributes, sort([seq(setattribute(Float(inner(X,x)),x),x=L)],`<`));
While this method is impressively fast, it has a practical problem. As B and n (the length of the sublists) increase, the time required to compute the scalar product increase. Theoretically one would expect this to be O(Bn m log m), where m is the number of sublists in L.
Can we do better? My first thought was to improve the vector, X, used in the dot product. Using powers of B for all the components is too conservative. Here is a modified approach that uses the maximum value in each column (think of L as a matrix) to form the base for each component in X.
sortLL2 := proc(L::listlist(nonnegint))
local M,n,B,X,C,k,x;
M := map(lst -> max(op(lst)), `kernel/transpose`(L));
n := nops(M);
B := rtable(1..n);
C := 0;
for k from n to 1 by -1 do
B[k] := C + 1;
C := M[k]*B[k] + C;
end do;
X := [seq(B[k], k=1..n)];
SORT_W_ATTR(Float(inner(X,x)), x, x=L, `<`);
end proc:
While it takes longer to compute this X, that is only done once, so the resulting speed-up in the scalar products for large lists can be significant. Alas, for uniformly distributed random integers, the effect is miniscule.
After reflecting on this during my daily walk, I came up with a simpler way to compute an appropriate key, using string catenation rather than arithmetic. The following example shows how a listlist of positive integers is converted to a corresponding list of strings.
Here is the code that implements this conversion.[ [ 3, 400, 2 ] → "103" "1400" "1002" → "10314001002" , [ 20, 31, 203 ] → "120" "1031" "1203" → "12010311203" , [ 0, 12, 12 ] → "100" "1012" "1012" → "10010121012" ]
sortLL3 := proc(L::listlist(nonnegint))
local X,x,n,i;
X := map(x -> 10^length(max(op(x))), `kernel/transpose`(L));
n := nops(X);
SORT_W_ATTR(cat("", seq(X[i]+x[i], i=1..n)), x, x=L, 'lexorder');
end proc:
A minor improvement can be made
by using Maple's ability to directly add sequences, term by term:
sortLL4 := proc(L::listlist(nonnegint))
local X,x;
X := seq(10^length(max(op(x))), x = `kernel/transpose`(L));
SORT_W_ATTR(cat("", X+x[]), x, x=L, 'lexorder');
end proc:
sortLL5 := proc(L::listlist(integer))
local X,x;
X := seq(2*10^length(max((-min,max)(op(x)))), x = `kernel/transpose`(L));
SORT_W_ATTR(cat("", X+x[]), x, x=L, 'lexorder');
end proc:
sortLL4([[3,4],[1,2]]);
[[1,2],[3,4]]
attributes("1112");
[1,2]
If two threads called sortLL5 concurrently, a string attributed by one thread could be changed by the other. The call to map(attributes, . ) would then insert a sublist from one thread into the output of the other.
Currently, very few Maple routines use threads, and not all the existing code is thread-safe. However, in the future, as multicore devices become popular, I expect that thread-safety will be important.
The Threads package provides mutexes, which can prevent collisions in data structures however, in this case, because strings are inherently global, there is no robust way to use mutex's to avoid collisions. To avoid the thread-safety issue, attributes must not be applied to strings.
sortLL6 := proc(L::listlist(integer))
local X,x;
X := seq(2*10^length(max((-min,max)(op(x)))), x = `kernel/transpose`(L));
SORT_W_ATTR([cat("", X+x[])], x, x=L, 'lexorder'[1]);
end proc:
Alas, this modification reduces the speed of the routine by about a factor of 2,
compared to the non thread-safe version. However, it is still faster,
and uses less memory, than sortLL1.
Here are plots comparing the time and memory usage of each routine. Integers are uniformly distributed from 1 to max.
Fix n=5 (number of integers in each list)
max=10^5
Vary m from 10 to 10^3
Fix m=100 (number of sublists)
max=10^5
Vary n from 1 to 100
Fix m=100 (number of sublists)
n=5 (number of integers in each list)
Vary max from 10 to 10^4
These plots were created using the Tine package. Here is the Maple Code used to assign and measure the procedures described here.
Comments
sortLL6 is nearly perfect!
Joe,
I'm quite impressed by sortLL6, as it nearly does what I've been needing, namely, a procedure which will sort a list of lists of floats. Can your procedure be adapted to do this? Also, how would one sort on a position other than the first in the nested lists? For instance, given
foo := [[-50001, 1], [61, -99], [303, 4]]
how would one sort on the index 2 position to give
[[61, -99], [-50001, 1], [303, 4]] ?
Once again, thanks for the extremely clever and efficient solution to sorting integer listlists.
Cheers,
Eric.
Partial answer
I'm not sure how you want to sort the list of lists of floats. Could you give an example?
Sorting a list of lists using a particular element as the key is easy. To make it fast use sorting with attributes. Here is how you can do that with your given example,
soarting floats
Joe,
I have a MonteCarlo model which generates large (>2^17) lists of lists of pairs of floats. Here is a tiny example:
[
[2.224000000, 2.018000000],
[3.184000000, 2.034000000],
[6.240000000, 1.876000000],
[2.824000000, 3.066000000],
[4.088000000, 4.030000000]
]
I would like to do an ascending numerical sort of the listlist, indexing on a position in the list element, [float,float]. Sample output sorting on the first position would be:
[
[2.224000000, 2.018000000],
[2.824000000, 3.066000000],
[3.184000000, 2.034000000],
[4.088000000, 4.030000000],
[6.240000000, 1.876000000]
]
and a sample using the second position would be:
[
[6.240000000, 1.876000000],
[2.224000000, 2.018000000],
[3.184000000, 2.034000000],
[2.824000000, 3.066000000],
[4.088000000, 4.030000000]
]
Ideally, a procedure which could be called as follows would be ideal:
:=sortLL(,),
where elements are lists of floats, and is the index of the element in the float list on which the sort will be done. The float lists need not be confined to just 2 elements, as in my example, but could be of arbitrary length. That length, however, should be the same for all float lists in the particular listlist being sorted.
Thanks for your example of how to select a different index. That answers part of the question, but I still am unclear how to procede with float instead of integer.
Cheers,
Eric.
sorting floats (correction)
Joe,
I used angle braces in my last post, and they got swallowed by the blog formatter. The last part should read as follows:
=============
Ideally, a procedure which could be called as follows would be ideal:
:=sortLL("listlist","position"),
where "listlist" elements are lists of floats, and "position" is the index of the element in the float list on which the sort will be done. The float lists need not be confined to just 2 elements, as in my example, but could be of arbitrary length. That length, however, should be the same for all float lists in the particular listlist being sorted.
...
=============
Thanks,
Eric.
my "sorting floats" solution
Joe,
After reading the manual more thoroughly on "sort", I came up with the following:
sortLLe := proc ( L::( listlist(numeric) ), i )
# This procedure sorts a list of lists of numbers in ascending order;
# i is the sort index within the list element.
sort( L, (a, b) -> evalb(a[i] < b[i] ) );
end proc;
It seems to do the job, although it's probably not as fast as any of the solutions you've given.
Cheers,
Eric.
nice
That will, indeed, work. As you suspect, it is not the fastest. The principal reason that it is slow is that every comparison, and there are typically n*log(n) of them, requires a call to the user procedure (x,y) -> evalb(x[i]<y[i]).
The searching-with-attributes that I mentioned above avoids that by first extracting the key from the list than using the builtin sort, which doesn't require an external comparator. The trick is that the computed key for each row (the value in the selected position) is tagged (attributed) with the entire row. After sorting the keys we merely extract the attributes.
Conceptually a better way to do this is to use an inplace-sort on an Array of floats; that can be accomplished with external compiled routines and should be quite a bit faster. I'm looking into doing that.
"slicing" a sorted list
Joe,
This is related to our previous discussion regarding sorted listlists.
Given a sorted list of numeric lists, I would like to slice it into two lists: one comprising the left half of the sorted listlist in which an element of the numeric list element is less than a certain value, and the other, the right half remaining.
I have been using a loop to do this, but, as you might guess, it's exceedingly slow:
L:=NULL:
R:=NULL:
for i from 1 to Sall do
if (S[i][2]>=cutoff) then
R:=R,S[i]:
fi:
od:
I am sure that there are tools in Maple which would allow me to do this more simply and faster. Can you suggest them?
Thanks,
Eric.
Array or Matrix
Eric, I suggest that you abandon using lists of lists for this. I would suggest using either an Array or a Matrix, and perhaps with float[8] datatype for maximal speed and minimal memory footprint.
For one thing, lists of lists are inherently bad for sorting, due to immutability. (Sorting lists of lists via attributes is neat, but the very fact that acrobatics are required makes it suspect.)
Extraction of columns or rows of a Matrix/Array is easy -- there are several nice ways and even nice syntax. If you sort a column, and store the permutations as a side-effect, then you can apply those permutations to all the Array/Matrix at once. There are easy ways to do that too, with nice syntax.
And, if the Array/Matrix is float[8] datatype then you can using Maple's Compiler on procedures that you write to do these tasks. Such a compiled routine could be the fastest approach.
Slicing an Array/Matrix (by sorted values in a named column, or not) is also very easy and relatively fast.
acer
"slicing" a sorted list
Joe,
This is related to our previous discussion regarding sorted listlists.
Given a sorted list of numeric lists, I would like to slice it into two lists: one comprising the left half of the sorted listlist in which an element of the numeric list element is less than a certain value, and the other, the right half remaining.
I have been using a loop to do this, but, as you might guess, it's exceedingly slow:
L:=NULL:
R:=NULL:
for i from 1 to Sall do
if (S[i][2]<=cutoff) then
R:=R,S[i]:
else
L:=L,S[i]:
fi:
od:
I am sure that there are tools in Maple which would allow me to do this more simply and faster. Can you suggest them?
Thanks,
Eric.
select and remove
This sounds like a job for hree of my favorite time-saving Maple commands: select, remove, or selectremove. One of the benefits of these commands is that they do not assume the original data is sorted.
Here's an example of how they can be used in a problem like yours.
First generate some data
Then, separate the list into those elements whose second element is less than 5 or greater than or equal to 5
The key to using select is coming up with the correct test for deciding if each element is selected or removed. In many cases this can be a builtin Maple command, such as isprime or has. For example, to select the real roots of a solution, I do the following:
I hope this helps with your problem,
Doug
explanans
Doug's solution, using selectremove, is the proper approach. I'm responding mainly to point out the reason that your approach is so slow. It isn't so much because it uses a do loop, but rather because you are building up a sequence (two sequences) term by term in the loop. That is, as the loop progresses, R becomes
Each new sequence must be created and stored. For a final sequence of n terms, this is O(n^2) in time and memory usage. The standard method for avoiding this is to add the terms to a table and then, when complete, convert the table to a list. For example
Of course, using selectremove is the better approach here...