Some pointless calculations about letters
Jun. 29th, 2022 09:51 pm
If you write out the alphabet:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
...obviously each letter is used the same number of times: 1/26 of the entire string, about 3.8% of the time.
If you replace each letter with its NATO alphabet symbol:
ALFA BRAVO CHARLIE DELTA ECHO FOXTROT GOLF HOTEL INDIA JULIETT KILO LIMA MIKE NOVEMBER OSCAR PAPA QUEBEC ROMEO SIERRA TANGO UNIFORM VICTOR WHISKEY XRAY YANKEE ZULU
...then the appearances of the letters in that 138-letter string are as follows:
A: 14 (10.1%)
B: 3 (2.2%)
C: 5 (3.6%)
D: 2 (1.4%)
E: 15 (10.9%)
F: 4 (2.9%)
G: 2 (1.4%)
H: 4 (2.9%)
I: 11 (8.0%)
J: 1 (0.7%)
K: 4 (2.9%)
L: 9 (6.5%)
M: 5 (3.6%)
N: 5 (3.6%)
O: 14 (10.1%)
P: 2 (1.4%)
Q: 1 (0.7%)
R: 11 (8.0%)
S: 3 (2.2%)
T: 8 (5.8%)
U: 5 (3.6%)
V: 3 (2.2%)
W: 1 (0.7%)
X: 2 (1.4%)
Y: 3 (2.2%)
Z: 1 (0.7%)
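If you'd rather not count by hand, here's a minimal sketch of the tally above in Python. The names `NATO` and `first_iteration_counts` are just ones I made up for illustration; the spellings are the ones used in the string above (ALFA, JULIETT, XRAY).

```python
from collections import Counter

# The 26 code words, spelled as in the string above.
NATO = ("ALFA BRAVO CHARLIE DELTA ECHO FOXTROT GOLF HOTEL INDIA JULIETT KILO "
        "LIMA MIKE NOVEMBER OSCAR PAPA QUEBEC ROMEO SIERRA TANGO UNIFORM "
        "VICTOR WHISKEY XRAY YANKEE ZULU").split()


def first_iteration_counts():
    """Count each letter in the 26 code words written end to end."""
    return Counter("".join(NATO))


counts = first_iteration_counts()
total = sum(counts.values())  # length of the string, ignoring spaces
for letter in sorted(counts):
    print(f"{letter}: {counts[letter]} ({100 * counts[letter] / total:.1f}%)")
```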
And then, if you take that string, and replace each letter in it with its NATO alphabet symbol, so ALFA becomes ALFA LIMA FOXTROT ALFA and BRAVO becomes BRAVO ROMEO ALFA VICTOR OSCAR and so on...
...and you iterate that process, each time replacing each letter with its NATO alphabet symbol, it looks like the percentages asymptotically converge toward values approximated by:
A: 15.8%
B: 0.8%
C: 5.9%
D: 2.1%
E: 9.2%
F: 4.3%
G: 1.0%
H: 3.8%
I: 8.3%
J: 0%
K: 1.4%
L: 7.6%
M: 5.1%
N: 3.2%
O: 13.0%
P: 0%
Q: 0%
R: 9.1%
S: 3.3%
T: 3.9%
U: 0%
V: 1.0%
W: 0%
X: 1.1%
Y: 0.3%
Z: 0%
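The iteration can be sketched in Python without ever building the (enormous) strings, since only the letter counts matter: each letter in one round contributes the letters of its code word to the next. This is a rough sketch, not necessarily how the numbers above were produced; `WORDS`, `expand`, `iterate`, and `percentages` are names invented for illustration, and the choice of 40 rounds is arbitrary.

```python
from collections import Counter

# Map each letter to its code word, spelled as in the string above.
WORDS = {w[0]: w for w in (
    "ALFA BRAVO CHARLIE DELTA ECHO FOXTROT GOLF HOTEL INDIA JULIETT KILO "
    "LIMA MIKE NOVEMBER OSCAR PAPA QUEBEC ROMEO SIERRA TANGO UNIFORM "
    "VICTOR WHISKEY XRAY YANKEE ZULU").split()}


def expand(counts):
    """One substitution round: every letter is replaced by its code word."""
    new = Counter()
    for letter, n in counts.items():
        for ch in WORDS[letter]:
            new[ch] += n
    return new


def iterate(steps):
    """Start from the plain alphabet and apply `steps` substitution rounds."""
    counts = Counter({letter: 1 for letter in WORDS})
    for _ in range(steps):
        counts = expand(counts)
    return counts


def percentages(counts):
    """Share of each letter, in percent, in alphabetical order."""
    total = sum(counts.values())
    return {letter: 100 * counts[letter] / total for letter in sorted(WORDS)}


# 40 rounds is an arbitrary choice; Python integers don't overflow.
for letter, pct in percentages(iterate(40)).items():
    print(f"{letter}: {pct:.1f}%")
```

Working with counts instead of strings keeps each round to a few hundred additions, so you can run as many rounds as you like and watch where the first decimal place settles.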
I haven't proven those or calculated the exact limits or anything; I just iterated the process a bunch of times until the first decimal place stopped changing.
I think it's interesting how the limits differ from the percentages in the first iteration, the string ALFA BRAVO etc. For instance, in the first iteration E is the most common letter at 10.9%, but in the limit A and O are both well above E. That's because a lot of the mass of E in the first iteration comes from words like JULIETT, QUEBEC, and YANKEE, which don't propagate themselves much (if at all) in further iterations, whereas A appears in the words for more frequently occurring letters, such as OSCAR and INDIA, plus an extra time in ALFA.
C and U both start out at the same rate in the first iteration, but C stabilizes at a relatively high 5.9% (well above average!) while U approaches zero. C appears in the highly frequent ECHO and OSCAR, while U appears only in JULIETT, QUEBEC, and ZULU (aside from its own word, UNIFORM), and J, Q, and Z don't appear in any other letter's word. So U just adds a paltry four occurrences per iteration while most letters are increasing exponentially.
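To see the C versus U contrast in raw counts rather than percentages, here's a small Python sketch that tracks both letters over a few rounds. The names `WORDS`, `step`, and `history` are made up for illustration, and six rounds is an arbitrary cutoff.

```python
# Map each letter to its code word, spelled as in the string above.
WORDS = {w[0]: w for w in (
    "ALFA BRAVO CHARLIE DELTA ECHO FOXTROT GOLF HOTEL INDIA JULIETT KILO "
    "LIMA MIKE NOVEMBER OSCAR PAPA QUEBEC ROMEO SIERRA TANGO UNIFORM "
    "VICTOR WHISKEY XRAY YANKEE ZULU").split()}


def step(counts):
    """One substitution round on a dict of letter counts."""
    new = dict.fromkeys(WORDS, 0)
    for letter, n in counts.items():
        for ch in WORDS[letter]:
            new[ch] += n
    return new


counts = dict.fromkeys(WORDS, 1)  # round 0: the plain alphabet
history = [counts]
for _ in range(6):
    counts = step(counts)
    history.append(counts)

for i, c in enumerate(history):
    print(f"round {i}: C={c['C']}  U={c['U']}  total={sum(c.values())}")
```

U picks up one occurrence each from J and Q, two from Z, and keeps its own through UNIFORM, so it goes up by exactly four each round, while C is fed by E and O, which are themselves growing quickly.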
I don't know if this is useful for anyone, or anything, but I did the calculations, and figured I should record the result someplace.