Recently there was some discussion in the Maplesoft office about unisex baby names (that is, names that are nearly as likely to belong to males as to females). Whenever names come up, I usually head to the US Social Security Administration's wonderful baby names site. It has data on the top 1000 male and female names for children born in the US each year, going back more than a century (the top 1000 covers about 80% of names). The site slices the data a little, by US state and by popular names for twins and such, but it does not include data on unisex names. So, I applied Maple to the task.

If I wanted data for just one year, I could just load the page in Firefox and save the HTML source. This is what I did initially. I then edited the HTML to get rid of everything except the table of names. I called the file "2006names.html" and then loaded it into Maple using XMLTools.

with(XMLTools):
# Parse the HTML table into Maple's internal XML format
xt := ParseFile("2006names.html"):

# break out the row and column elements of the table into a nested list
thedata := map(GetChildByName, GetChildByName(xt, "tr"), "td");

# remove empty elements
thedata := remove(x->x=[], thedata):

# convert the table elements to strings - all the XML is now gone:
thedata := map(x->map(y->TextText(GetChild(y,1)), x), thedata);

# build name -> count tables; strip thousands separators before parsing the counts
maletable := table(map(x -> (x[2] =
         parse(StringTools:-SubstituteAll(x[3], ",", ""))), thedata));
femaletable := table(map(x -> (x[4] =
         parse(StringTools:-SubstituteAll(x[5], ",", ""))), thedata));
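
The counts presumably arrive as strings with thousands separators, which is why the commas are stripped before parse is called. A minimal sketch of that one step, using a made-up count string rather than a row from the real page:

# hypothetical count string, formatted with a thousands separator
s := "21,345":
n := parse(StringTools:-SubstituteAll(s, ",", ""));
                n := 21345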

At this point, I had the name data in two tables, with names (as strings) for indices and counts for entries.

maletable["Colin"];
                3850    

Now, all that is left is to combine the tables. I chose to count a name as "unisex" if there are fewer than twice as many males with the name as females, or vice versa. So, build a table collecting all such names:

unisextable := table();
for x in indices(maletable, 'nolist') do
    if assigned(femaletable[x]) and 
       min(femaletable[x],maletable[x]) / 
       max(femaletable[x],maletable[x]) > 1/2 
    then
        unisextable[x] := femaletable[x] + maletable[x];
    end if;
end do:
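
To make the threshold concrete, here is a small sketch of the same ratio test pulled out into a helper. The name isunisex and the counts are made up for illustration; they are not taken from the SSA data.

# hypothetical helper: true when the rarer sex has more than half
# the count of the more common one
isunisex := (m, f) -> evalb(min(m, f) / max(m, f) > 1/2):

isunisex(600, 400);    # ratio is 2/3
                true
isunisex(900, 300);    # ratio is 1/3
                false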

Just for fun, I decided to combine similar-sounding names as well, using the Metaphone algorithm.

unisexhash := table();
#Combine Homophones
for x in indices(unisextable, 'nolist') do
    mp := StringTools:-Metaphone(x);
    if assigned(unisexhash[mp]) then
        unisexhash[mp] := [unisexhash[mp][1] + unisextable[x], 
                           unisexhash[mp][2] union {x}];
    else
        unisexhash[mp] := [unisextable[x], {x}];
    end if;
end do:
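
The idea is that Metaphone reduces a name to a rough phonetic key, so spellings that sound alike land in the same slot of the table. A quick check one could run, assuming (as their grouping in the 2006 list suggests) that these two spellings share a key:

# compare the phonetic keys of two spellings of the same name
k1 := StringTools:-Metaphone("Payton"):
k2 := StringTools:-Metaphone("Peyton"):
evalb(k1 = k2);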

Now sort the unisex names in order of popularity:

unisexlist := sort([entries(unisexhash, 'nolist')], (x,y)->x[1]>y[1]):

and print the results:

i:=1:
for x in unisexlist do
    if nops(x[2]) = 1 then
        printf("#%2.d - %4.d - %s\n", i, x[1], x[2][]);
    else
        printf("#%2.d - %4.d - %s, %s\n", i, x[1], x[2][1], x[2][2]);
    end if;
    i:=i+1;
end do:

to get our list of the most popular unisex names in 2006:

# 1 - 9416 - Riley
# 2 - 8634 - Payton, Peyton
# 3 - 4168 - Dakota
# 4 - 2678 - Casey, Kasey
# 5 - 2306 - Skyler
# 6 - 1321 - Harley
# 7 - 1260 - Amari
# 8 - 1255 - Justice
# 9 - 1116 - Rowan
#10 - 1072 - Jaylin
#11 - 1065 - Jessie
#12 -  719 - Dominique
#13 -  598 - Armani
#14 -  508 - Finley
#15 -  484 - Shea
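
One caveat with the printing loop above: it shows at most two spellings per group, so a third homophone would be silently dropped. A possible variant, which is not the code that produced the list above, joins however many names a group has using StringTools:-Join. The sample list and its counts below are made up so the snippet stands alone.

# hypothetical data in the same shape as unisexlist: [total, {names}]
samplelist := [[100, {"Casey", "Kasey", "Kacey"}], [50, {"Shea"}]]:

i := 1:
for x in samplelist do
    # x[2] is a set of names; convert it to a list and join with commas
    printf("#%2d - %4d - %s\n", i, x[1], StringTools:-Join([op(x[2])], ", "));
    i := i + 1;
end do: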

Of course, I was not satisfied with names from just one year. But downloading and editing the HTML files by hand is a pain if you want more than a couple of years. So, I used Maple's Sockets and StringTools packages to do the work for me. This post is getting lengthy, though, so I will save that for Part 2.

