Building a Set

I recently discovered a minor variation on the technique of building a set using a table. The purpose for using a table rather than inserting new items directly into a set is that, in a loop, the latter technique is O(n2) rather than O(n).  The way I would normally do this is to assign a counter and an empty table, and then, in a loop, compute the new element, increment the counter, and insert the element into the table at the counter index.  For example,

T := table();
cnt := 0:
while not finished() do
     x := compute();
     if acceptable(x) then
         cnt := cnt+1;
         T[cnt] := x;
     end if;
end do;
# convert T to a set:
S := {seq(T[i], i = 1..cnt)};

Because we are creating a set---so order is not important and terms occur once---a minor optimization is possible.  Rather than saving the value as an entry in the table, save it as an index, with a NULL entry. This eliminates the need for a counter.

T := table();
while not finished() do
    x := compute();
    if acceptable(x) then
          T[x] := NULL;
    end if;
end do;
S := {indices(T,'nolist')};

Note that with the previous method we could have used

S := {entries(T,'nolist')};

Comments

alec's picture

nolist

Joe,

Is nolist faster than map(op,{indices(T)}) or map(op,[entries(T)])?

Alec

Yes

Yes.  In order of speed, with fastest first

 {indices(T,'builtin')}
 {seq(T[i], i=1..cnt)}
 map(op,{indices(T)})

The difference isn't huge, say 5% between each. 

The chief appeal of using the index rather than the entry to save the value is that the code is slightly smaller; no need for allocating and incrementing a counter.  By the way, your route-finding post inspired this, I noticed that you used T[val] := val (or something similar), with val as a dummy index, thought that a clever way to avoid a counter, then realized the entry could be eliminated (made NULL).

alec's picture

Counter

Yes, I thought there about assigning it to NULL, too, but then I would have to use indices, and I tried to write a clean code (meaning that as an instructional example :) Both assigning to NULL and using indices would make code more obfuscated than I wanted. Generally, using indices seem like a good idea. I think, I saw that in someody's else post on this site recently - either Jacques Carette, or Robert Israel, or acer.

I've asked about map and op because it is used in convert/list. Using your version, with nolist, would be faster there.

Alec

clarity

I tend to use convert/list because it shows intent, and thus is clearer, unless there is a significant performance reason for doing otherwise.  On that grounds, this technique is questionable, because for most uses the performance gain is negligible.  Still, eliminating the need for a counter has a certain appeal.

Confusing wording

Interesting, but I was initially confused by your wording.   I thought for a minute you were claiming your modification itself changed O(n^2) to O(n) rather than what I assume you mean which is that using tables in some way is O(n) as opposed to incrementally building a set which is O(n^2).

By the way, if one can easily get some sort of upper bound for the number of entries, is it any quicker to use an Array rather than the table?

 

 

agreed

Yes, on rereading my opening sentence I agree that it is not clear.  As you correctly concluded, both methods using a table are O(n).

If you have an upper bound on the number of entries, using an rtable instead of a table with the "standard" method does use slightly less memory, however, I don't observe a difference in speed.

Addendum: I've edited the blog entry to clarify the meaing.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
}