[personal profile] dr_whom
So suppose I'm interested in studying (or perhaps more likely, assigning my students to study) popular androgynous names—i.e., names that belong to both a lot of males and a lot of females, for some values of "a lot". Suppose for instance I'm interested in questions like "have androgynous names gotten more popular over time?" or "what are the phonological characteristics that make a name likely to be a popular androgynous name?"

I could look at the variables of popularity and androgyny independently. For instance, I could look at the top n most popular names, and see how many of them seem to be above some threshold for androgyny: e.g., in 2013 data, Avery is the 33rd most popular name at about 11,000 individuals, and it's 82% female and 18% male; Jordan is 59th at 8,000 individuals, and 15% female and 85% male, and so on. Doing this, setting somewhat arbitrary cutoffs, there are 50 names with more than 1000 individuals of whom no more than 90% are of the same sex.
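To make the two-cutoff approach concrete, here is a minimal sketch. The counts for Avery and Jordan are back-calculated from the rounded figures above, the third name is made up to show a rejection, and the function name and defaults are just illustrative choices.

```python
def is_popular_androgynous(f, m, min_total=1000, max_share=0.90):
    """True if the name has more than min_total bearers and
    neither sex makes up more than max_share of them."""
    total = f + m
    if total <= min_total:
        return False
    return max(f, m) / total <= max_share

# (female, male) counts; approximate, derived from the rounded 2013 figures.
counts = {
    "Avery": (9100, 2000),        # about 11,000 bearers, 82% female
    "Jordan": (1200, 6800),       # about 8,000 bearers, 15% female
    "Hypothetical": (50, 9950),   # made-up, heavily gendered name
}

selected = [name for name, (f, m) in counts.items()
            if is_popular_androgynous(f, m)]
print(selected)
```

Both cutoffs are keyword arguments precisely because they are arbitrary; moving either one changes which names make the list.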

But I find that a little unsatisfying. First of all, it requires two arbitrary cutoffs. Second, there's no direct intercomparability between names. Should Jordan be regarded as a stronger or weaker contribution to the inventory of "popular androgynous" names than, say, Charlie (given name, not nickname), which is substantially more androgynous (46% female) but substantially less popular in total (2900 individuals)? If an androgynous name becomes more popular over time but less gender-balanced, does that mean the popularity of androgynous naming is increasing or decreasing?

To simplify questions like these, I'd like to have a composite index of some kind—to have a single quantity which measures the extent to which a name is both androgynous and popular.

Lieberson et al. (2000) quantify the prevalence of androgynous naming over the population as a whole by means of the following computation: for each girl, calculate the percentage of people sharing her name who are boys, and then average that figure over all the girls in the population. (Or vice versa.) This calculation for the population as a whole immediately suggests a combined androgyny-cum-popularity index for each individual name: instead of averaging over the population as a whole, you sum over the number of bearers of each name and get fm/(f+m), where f and m are the number of girls and boys with each name. (This probably wants to be scaled to normalize with respect to the number of individuals in the data, so that different years of birth can be directly compared.)

This has the obviously desirable properties for a composite index: it's symmetrical with respect to m and f, and it increases as f+m increases (holding f/m constant) and as f/m approaches 1 (holding f+m constant). In the 2013 data, the most androgynous-popular names by this measure are Riley (f = 4900, m=2500), Avery (f=9100, m=2000), and Peyton (f=4500, m=1800), which all seem like reasonably good candidates for both popular and relatively gender-balanced.
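A quick sketch of the fm/(f+m) index applied to the three names just mentioned, using the rounded 2013 counts quoted above (so the values are approximate). The function and variable names are mine.

```python
def composite_index(f, m):
    """Androgyny-cum-popularity index fm/(f+m)."""
    return f * m / (f + m)

# (female, male) counts, rounded as quoted in the post.
counts_2013 = {
    "Riley": (4900, 2500),
    "Avery": (9100, 2000),
    "Peyton": (4500, 1800),
}

ranked = sorted(counts_2013,
                key=lambda name: composite_index(*counts_2013[name]),
                reverse=True)
for name in ranked:
    print(name, round(composite_index(*counts_2013[name])))

# The two properties claimed above, checked on one example:
f, m = counts_2013["Riley"]
print(composite_index(f, m) == composite_index(m, f))        # symmetric in f and m
print(composite_index(2 * f, 2 * m) > composite_index(f, m)) # grows with f+m at fixed f/m
```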

Looking over the data as a whole, though, I feel like this formula gives popular a little too much weight relative to androgynous. In 1983 data, seemingly obviously masculine names like Michael, David, Matthew, and Christopher are in the top 25 for fm/(f+m) values—and not because they're much more androgynous than they look, but just because they're so popular as to compensate for the tininess of the fraction of girls with the same names, each well under 1%. I mean, I went to high school with a girl named Michael who was born in 1982, so I know some of these are real people, but I wouldn't be surprised if more of these numbers are due to typos and input errors than actual girls with these names. So this is just a gut feeling, but I'd be happier with a formula that's less susceptible to small fractions of large names.

So the formula I want is probably something like (f+m) e^(-(log(f/m))^2): the exponential just converts the ratio of f to m into a scale of gender-balancedness from 0 to 1 in a smooth way, and then we multiply that by raw popularity. This formula gives Avery a higher index than Riley (despite it being more popular and less gender-balanced), but at least it kicks Michael, Christopher, David, and Matthew out of the top 25 in 1983 (they're all still in the top 100, though).
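Here is a sketch of that formula on the rounded 2013 counts for Avery and Riley quoted earlier. The base of the logarithm isn't specified above, so the sketch takes it as a parameter; that parameter, and the function name, are my assumptions.

```python
import math

def gaussian_index(f, m, log=math.log):
    """(f+m) * exp(-(log(f/m))^2), with the log function passed in."""
    return (f + m) * math.exp(-log(f / m) ** 2)

avery = (9100, 2000)   # (female, male), rounded 2013 counts
riley = (4900, 2500)

for label, log in [("natural log", math.log), ("log base 10", math.log10)]:
    a = gaussian_index(*avery, log=log)
    r = gaussian_index(*riley, log=log)
    print(f"{label}: Avery {a:.0f}, Riley {r:.0f}")
```

With these rounded counts the ordering of the two names flips depending on the base: base 10 puts Avery ahead, natural log puts Riley ahead. That is an observation about this sketch and says nothing certain about which base was actually used.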

So. Do you think I'm on the right track for a good composite index to use? Should I be using a composite index at all? Should I suck it up and admit the 266 girls named "John" in 1983 make it an androgynous name? Any other suggestions?

Date: 2014-10-01 06:16 am (UTC)
From: [identity profile] dumble.livejournal.com
I doubt that a composite index is really a good idea here. I don't know exactly what you're going to do with these names, but popularity and androgyny will likely have different effects on whatever it is, and moreover, there could be some interesting relationship between popularity and androgyny themselves that you'd miss out on by amalgamating them (e.g. maybe more popular names tend to be either more or less androgynous).

I've been working with something similar lately with my binomial expressions, which have (what I call) both an "overall frequency" (how often do you form a binomial with these two words in whatever order) and a "relative frequency" for each order (what proportion of the time a binomial appears in the given order). There's an interesting relationship between overall frequency and relative frequency (namely that more overall-frequent expressions tend to be more polarized), and these two measures also interact non-trivially in their effects on other behavioral measures.

I'd be glad to talk about this in more detail if you want!

Date: 2014-10-01 10:18 pm (UTC)
From: [identity profile] dumble.livejournal.com
If the immediate goal is just to come up with a list of popular androgynous names, I think your first idea (using thresholds for both popularity and androgyny) is the way to go. Even if you have a composite index, you're still going to have to set some arbitrary threshold for what values of that index constitute "popular-androgynous" enough.

Date: 2014-10-01 12:00 pm (UTC)
From: [identity profile] midnight-sidhe.livejournal.com
How far back are you going with this? Are you going to have them start with the Merediths, or earlier, or later?

One of the things that has always fascinated me about this is the way the definition of which phonological characteristics make a name "androgynous" has definitely changed, even though a lot of the semantic-ish categories haven't.

Date: 2014-10-01 10:25 pm (UTC)
From: [identity profile] midnight-sidhe.livejournal.com
If you decide to stick with the very recent things, be sure to make it clear to the students that this is a study of increasing popularity in a certain set of androgynous names, rather than the phenomenon of androgyny.

I'd be really interested to see a long-term study of this -- it's clearly been a thing since sometime in the early twentieth century at least.

Date: 2014-10-03 07:46 pm (UTC)
From: [identity profile] miraclaire.livejournal.com
I'd love to see a long-term study of this, too!

Date: 2014-10-01 03:45 pm (UTC)
From: [identity profile] jason brodsky (from livejournal.com)
I think fm/(f+m)^2 is a very reasonable measure of androgyny, since that's the uncertainty on anyone's guess whether a given Avery or Michael is m/f. That seems like a good social definition of the androgyny of the name.
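To spell that out: if p is the female share f/(f+m), then fm/(f+m)^2 is p(1-p), the variance of a single yes/no draw with probability p. A small sketch follows; the function name is mine and the Avery counts are the rounded 2013 figures from the post.

```python
def androgyny(f, m):
    """fm/(f+m)^2, written as p*(1-p) with p the female share."""
    p = f / (f + m)
    return p * (1 - p)

print(androgyny(1, 1))        # perfectly balanced name: the maximum, 0.25
print(androgyny(9100, 2000))  # Avery, rounded 2013 counts
print(androgyny(1, 999))      # heavily gendered name: close to 0
```

Note that this quantity depends only on the ratio of f to m, so it carries no popularity information at all until it gets multiplied by f+m.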

I think the problem kicks in when you multiply it by the popularity of the name. I'm not sure your search really considers two names equivalent if one is twice as popular but half as androgynous.

One really simple measure that has the desirable properties you mention but seems a bit wonky in other ways is: what names have the largest population of their minority gender? The least desirable property is that if 1000 female Rileys changed their name to Sue, Riley wouldn't lose any of this metric. That said: if 1M Sues changed to Rileys, we probably don't want Riley to go up.
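A tiny sketch of the minority-count measure and the two name-change scenarios, using the rounded 2013 Riley counts from the post. The function name is mine.

```python
def minority_count(f, m):
    """Number of bearers belonging to the name's less common sex."""
    return min(f, m)

riley = (4900, 2500)                    # (female, male)
girls_leave = (4900 - 1000, 2500)       # 1000 female Rileys become Sue
sues_arrive = (4900 + 1_000_000, 2500)  # 1M Sues become Rileys

print(minority_count(*riley))
print(minority_count(*girls_leave))
print(minority_count(*sues_arrive))
```

All three values are the same because only the majority count moved, which is exactly the insensitivity described above, in both its unwelcome and its welcome form.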

Other vaguer thoughts: can you calibrate the number of typos in the database using a name that really truly is only given to one gender (e.g., by targeting a demographic unlikely to get inventive about names)? If so, that number can probably adjust the fm/(f+m) metric in a way that kicks Michael out of the top 25.

Date: 2014-10-02 02:16 am (UTC)
From: (Anonymous)
I don't think there is any principled way to choose one formula for the composite over another. If I had to choose one, your (f+m) exp(-(log(f/m))^2) seems pretty good: I definitely think you want something of the form (f+m) g(f/m), with g(f/m) = g(m/f) and g decaying rapidly as f/m goes to 0 or infinity. Your choice of g(x) = exp(-(log x)^2) seems more natural than x/(1+x)^2, which was your first suggestion: Gaussians are common everywhere, and the rapid decay as x goes away from 1 seems good to me. Note, though, that (f+m) exp(-a (log(f/m))^2) is just as well motivated for any other positive a.
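For anyone who wants to check those claims numerically, here is a small sketch of the two candidate g functions. The function names are mine, and the natural log is an arbitrary choice (a different base just rescales a).

```python
import math

def g_gauss(x, a=1.0):
    """exp(-a * (log x)^2), for any positive a."""
    return math.exp(-a * math.log(x) ** 2)

def g_rational(x):
    """x / (1 + x)^2."""
    return x / (1 + x) ** 2

f, m = 4500, 1800   # Peyton, rounded 2013 counts from the post
x = f / m

# Both satisfy g(x) = g(1/x), so swapping f and m changes nothing.
print(math.isclose(g_gauss(x), g_gauss(1 / x)))
print(math.isclose(g_rational(x), g_rational(1 / x)))

# (f+m) * x/(1+x)^2 with x = f/m is the earlier fm/(f+m) index.
print(math.isclose((f + m) * g_rational(x), f * m / (f + m)))

# The Gaussian version falls off much faster away from x = 1.
print(g_gauss(100), g_rational(100))
```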

Here is what I would do instead. Go through the top 1000 (or whatever you have data for) names and find the records for androgyny. (fake data follows)

Name      # of occurrences (f+m)    androgyny ratio (min(f/m, m/f))
Noah      1000000                   0.01
Sue       900000                    0.05
...
Peyton    7300                      0.4
...
Yasha     20                        1

So, as you go down the table, the popularity declines and the androgyny ratio increases. Any name N which is both less popular and less androgynous than some other name N' gets discarded.
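The discarding step can be sketched in a few lines. The rows are the fake data from the table above plus one extra made-up row ("Dominated") added only to show a name being thrown out; the function name is mine.

```python
def frontier(rows):
    """Keep each (name, popularity, androgyny) row unless some other row
    is both more popular and more androgynous."""
    kept = []
    for name, pop, andro in rows:
        dominated = any(p2 > pop and a2 > andro for _, p2, a2 in rows)
        if not dominated:
            kept.append((name, pop, andro))
    # Most popular first, as in the table.
    return sorted(kept, key=lambda row: row[1], reverse=True)

fake_rows = [
    ("Noah", 1_000_000, 0.01),
    ("Sue", 900_000, 0.05),
    ("Peyton", 7300, 0.4),
    ("Dominated", 5000, 0.02),   # hypothetical: Peyton beats it on both counts
    ("Yasha", 20, 1.0),
]

for row in frontier(fake_rows):
    print(row)
```

This is the quadratic-time version, which is fine for a list of about 1000 names.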

You have now found the popularity/androgyny frontier. Make a scatter plot and try to find a curve

F(popularity, androgyny) = constant

that roughly fits the frontier, for F(x,y) some fairly simple function. Then F is your composite index.

David Speyer

Date: 2014-10-02 05:18 pm (UTC)
From: (Anonymous)
Fun, thanks! And much shorter than I had guessed. The most popular+androgynous name must be on this list.

My proposal before seeing data was that I wanted to say these should be all roughly equally popular-androgynous, at least after discarding the ends of the list. I think I feel pretty good about that claim for Logan-Milan.

Charlie is a weird case though. Presumably, what we are seeing is that parents of boys usually name their son Charles and call him Charlie, while parents of girls actually give the name Charlie. In general, nicknames are a tricky issue for you, I'd think (Pat=Patrick=Patricia, Sandy=Alexander=Alexandra, etc.)

I might play with this data while procrastinating.

David

Date: 2014-10-02 05:59 pm (UTC)
From: (Anonymous)
I imagine your final column is min(f/(f+m), m/(f+m)), not min(f/m,m/f) as in my fake example? (Since it ends at 0.5, not 1.) That confused me in a few places that I don't have time to go back and fix right now; you should check my work for missing factors of 2.
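For reference, the two scales convert into each other directly: if r = min(f/m, m/f) then the minority share is r/(1+r). A short sketch, with function names of my own choosing:

```python
def minority_ratio(f, m):
    """min(f/m, m/f): runs from 0 to 1."""
    return min(f, m) / max(f, m)

def minority_share(f, m):
    """min(f/(f+m), m/(f+m)): runs from 0 to 0.5."""
    return min(f, m) / (f + m)

def ratio_to_share(r):
    """Convert the 0-to-1 ratio scale to the 0-to-0.5 share scale."""
    return r / (1 + r)

f, m = 4500, 1800   # Peyton, rounded 2013 counts from the post
print(minority_ratio(f, m), minority_share(f, m))
print(ratio_to_share(minority_ratio(f, m)))   # same as the share
print(ratio_to_share(1.0))                    # a perfectly balanced name
```

The conversion is not linear, so it is more than a factor of 2: a ratio of 0.4 corresponds to a share of about 0.29, not 0.2.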

David

Date: 2014-10-02 08:58 pm (UTC)
From: (Anonymous)
I'm not claiming one is particularly better than the other, just that it confused me. (It could affect the final result though, since f/(f+m) is not a linear function of f/m.)

How much do you actually want to think about this? It would be fun to write a note that said "Riley is the most androgynous name", and would probably get linked a bit, but I don't know how worthwhile it actually is.

One of my bad habits is that when I have two difficult courses to prepare, an NSF grant application due (turned in today, yay!) and not enough time for either because of the High Holidays -- I start looking for a fun new project to think about. Then I don't actually finish the fun new project because I have so many old projects.

David

Date: 2014-10-02 05:18 pm (UTC)
From: (Anonymous)
"popular-androgynous" is a hyphen, not a minus sign. (I changed to + to avoid ambiguity in one case but not the other.)

Date: 2014-10-02 05:37 pm (UTC)
From: (Anonymous)
That is some amazingly linear data!

http://www.math.lsa.umich.edu/~speyer/NamePlot.png

The line is

androgyny = 0.5 - 0.0000275 frequency

I put the 0.5 into my model by hand but, if I take a best fit line without putting it in, I still get 0.504. Inverting the model*,

frequency = 18000 (1-2*androgyny)

In other words, a completely gendered name, which was maximally popular in all other ways, would get about 18000 recipients. Each 0.1 increase in androgyny loses you about 3600 recipients. (Again, restricting to the names which are most popular at that androgyny level.)
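The arithmetic behind those two numbers, as a short sketch using the unrounded slope from the fitted line above (variable and function names are mine):

```python
INTERCEPT = 0.5
SLOPE = 0.0000275   # androgyny lost per additional recipient

def frontier_frequency(androgyny):
    """Invert androgyny = INTERCEPT - SLOPE * frequency."""
    return (INTERCEPT - androgyny) / SLOPE

print(frontier_frequency(0.0))                            # completely gendered name
print(frontier_frequency(0.0) - frontier_frequency(0.1))  # cost of 0.1 androgyny
```

The unrounded values come out a little above 18000 and 3600, which is the rounding the footnote refers to.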

Given this, I like

frequency + 36000 androgyny = (f+m) + 36000 min(f,m)/(f+m)

as a measure -- I think of it as "how popular the name would be, if there were no androgyny penalty".
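A sketch of this measure on the rounded 2013 counts quoted in the post. It assumes androgyny means the minority share min(f, m)/(f+m), i.e. the 0-to-0.5 scale that the fitted line's 0.5 intercept implies; the function name is mine.

```python
def penalty_free_popularity(f, m, weight=36000):
    """frequency + weight * androgyny, with androgyny taken as the
    minority share min(f, m)/(f+m) (an assumption, see above)."""
    return (f + m) + weight * min(f, m) / (f + m)

# (female, male) counts, rounded as quoted in the post.
counts_2013 = {
    "Riley": (4900, 2500),
    "Avery": (9100, 2000),
    "Peyton": (4500, 1800),
}

ranked = sorted(counts_2013,
                key=lambda name: penalty_free_popularity(*counts_2013[name]),
                reverse=True)
for name in ranked:
    print(name, round(penalty_free_popularity(*counts_2013[name])))
```

The weight of 36000 only makes sense for this one year's fit; another year's data would need its own line.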

Disclaimer: Pure mathematician here, no formal training in statistics.

* These equations are not mathematically inverse, because I am trying to pay token respect to significant digits.

Date: 2014-10-03 07:50 pm (UTC)
From: [identity profile] miraclaire.livejournal.com
Another thing is spelling variants that indicate whether a name is masculine or feminine... I've been kind of surprised at the number of people who don't understand that Jesse/Jessie are different (and aggravated that people we interact with on a regular basis don't bother to learn it). Not to mention the number of comments about how crazy and modern and gender-neutral Jesse's name is, when we named him after my grandfather, who was given the name in May 1919.

Date: 2014-10-05 10:38 am (UTC)
From: [identity profile] miraclaire.livejournal.com
Haha -- that'll show me :)
