Home
Theory
Lossless
VQ
Speech
Image
Download
Links

Data-Compression.com

Info on Grad Schools:
Gradschools.com Banner

Buy Books, CDs, & Videos:
In Association with Amazon.com

Digital Camera:
Digital Camera

Rio MP3 Player:
Diamond Rio 500 Portable MP3 Player
Diamond Rio 500 Portable MP3 Player

Memory Upgrade:

$1.23 per issue:
PC Magazine

Ebay Auction:
Find your computer at eBay

$1.83 per issue:
Linux Journal

Check out great fares at Orbitz
People Search No Info No Charge only $19.95!

Statistical Distributions of English Text


Contents
  1. First-Order Statistics
  2. Second-Order Statistics
  3. Third-Order Statistics
  4. Notes

I. First-Order Statistics
 

|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
|    a    |    b    |    c    |    d    |    e    |    f    |    g    |    h    |    i    |    j    |    k    |    l    |    m    |    n    |    o    |    p    |    q    |    r    |    s    |    t    |    u    |    v    |    w    |    x    |    y    |    z    |  SPACE  |
|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
 0.0651738 0.0124248 0.0217339 0.0349835 0.1041442 0.0197881 0.0158610 0.0492888 0.0558094 0.0009033 0.0050529 0.0331490 0.0202124 0.0564513 0.0596302 0.0137645 0.0008606 0.0497563 0.0515760 0.0729357 0.0225134 0.0082903 0.0171272 0.0013692 0.0145984 0.0007836 0.1918182 

 

II. Second-Order Statistics
 
  The second number in the first row is the conditional probability P(Xi=b|Xi-1=a)=0.0228302
 

     ||=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
     ||    a    |    b    |    c    |    d    |    e    |    f    |    g    |    h    |    i    |    j    |    k    |    l    |    m    |    n    |    o    |    p    |    q    |    r    |    s    |    t    |    u    |    v    |    w    |    x    |    y    |    z    |  SPACE  |
     ||=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
a    ||0.0002835 0.0228302 0.0369041 0.0426290 0.0012216 0.0075739 0.0171385 0.0014659 0.0372661 0.0002353 0.0110124 0.0778259 0.0260757 0.2145354 0.0005459 0.0195213 0.0001749 0.1104770 0.0934290 0.1317960 0.0098029 0.0306574 0.0088799 0.0009562 0.0233701 0.0018701 0.0715219 
b    ||0.0580027 0.0058699 0.0000791 0.0022625 0.3416714 0.0002057 0.0004272 0.0003639 0.0479084 0.0076894 0.0000000 0.1150560 0.0012816 0.0003481 0.0966553 0.0000158 0.0000000 0.0740301 0.0226884 0.0107430 0.1196127 0.0011550 0.0000316 0.0000000 0.0864502 0.0000000 0.0074521 
c    ||0.1229841 0.0000271 0.0215451 0.0005246 0.1715916 0.0000090 0.0000000 0.1701716 0.0565490 0.0000000 0.0453966 0.0488879 0.0000000 0.0000362 0.1759242 0.0000090 0.0017185 0.0376812 0.0010492 0.0906756 0.0358361 0.0000000 0.0000000 0.0000000 0.0041969 0.0000090 0.0151774 
d    ||0.0280345 0.0005057 0.0002585 0.0081086 0.1224833 0.0006799 0.0054844 0.0007080 0.0794902 0.0003484 0.0001911 0.0092662 0.0021466 0.0030456 0.0397283 0.0001630 0.0000225 0.0178918 0.0307037 0.0009159 0.0178805 0.0027759 0.0013655 0.0000000 0.0076478 0.0000000 0.6201541 
e    ||0.0545873 0.0012798 0.0224322 0.0843434 0.0317097 0.0085640 0.0052834 0.0017762 0.0127186 0.0002605 0.0010967 0.0339975 0.0186268 0.0815271 0.0032334 0.0101307 0.0021424 0.1307517 0.0712793 0.0241537 0.0014289 0.0157312 0.0070879 0.0105139 0.0125997 0.0001831 0.3525610 
f    ||0.0638579 0.0002384 0.0003179 0.0002086 0.0928264 0.0500293 0.0000199 0.0000993 0.0820576 0.0000000 0.0000199 0.0266638 0.0000397 0.0000894 0.1545186 0.0001689 0.0000099 0.0825344 0.0039539 0.0341940 0.0334986 0.0000099 0.0001987 0.0000000 0.0015200 0.0000000 0.3729250 
g    ||0.0592435 0.0003842 0.0005205 0.0020078 0.1482326 0.0002727 0.0101631 0.1420108 0.0501091 0.0000248 0.0000372 0.0395122 0.0029870 0.0127906 0.0573224 0.0005577 0.0000000 0.0884686 0.0261142 0.0062466 0.0256309 0.0000372 0.0003470 0.0000000 0.0032720 0.0001363 0.3235710 
h    ||0.1580232 0.0007737 0.0020460 0.0005185 0.4597035 0.0004627 0.0000359 0.0000718 0.1252667 0.0000000 0.0000040 0.0014278 0.0013042 0.0012922 0.0700557 0.0000439 0.0003191 0.0117178 0.0022056 0.0297253 0.0131497 0.0000000 0.0010290 0.0000000 0.0072309 0.0000000 0.1135928 
i    ||0.0166996 0.0069144 0.0486793 0.0363474 0.0480664 0.0271435 0.0307856 0.0000775 0.0004826 0.0000035 0.0073125 0.0526842 0.0412929 0.2618995 0.0497818 0.0062698 0.0004333 0.0437620 0.1157982 0.1198384 0.0007010 0.0235788 0.0000211 0.0018810 0.0000000 0.0032265 0.0563193 
j    ||0.2106638 0.0000000 0.0000000 0.0000000 0.1906420 0.0000000 0.0000000 0.0000000 0.0004353 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.2644178 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3299238 0.0000000 0.0000000 0.0000000 0.0002176 0.0000000 0.0036997 
k    ||0.0169234 0.0011671 0.0005058 0.0017118 0.3321662 0.0041628 0.0004669 0.0007781 0.1300965 0.0000000 0.0003112 0.0185963 0.0009726 0.1009570 0.0113601 0.0012060 0.0000000 0.0004279 0.0613523 0.0022954 0.0029956 0.0000000 0.0041239 0.0000000 0.0086757 0.0000000 0.2987473 
l    ||0.1016800 0.0005515 0.0020459 0.0668636 0.1657445 0.0134024 0.0011801 0.0001542 0.1107889 0.0000119 0.0053728 0.1355180 0.0055389 0.0009726 0.0826499 0.0022654 0.0000059 0.0018443 0.0230153 0.0180635 0.0144461 0.0041630 0.0025797 0.0000000 0.0968765 0.0000237 0.1442414 
m    ||0.1539307 0.0285939 0.0001653 0.0025384 0.2496134 0.0017798 0.0000195 0.0003015 0.0877464 0.0000195 0.0000000 0.0015756 0.0221846 0.0029567 0.1098532 0.0485124 0.0000000 0.0169910 0.0249954 0.0008461 0.0385435 0.0000292 0.0001167 0.0000000 0.0505257 0.0000000 0.1581614 
n    ||0.0240107 0.0005432 0.0423173 0.1767352 0.0849166 0.0053036 0.1188694 0.0028799 0.0295789 0.0012223 0.0071353 0.0087755 0.0006582 0.0085073 0.0653564 0.0003343 0.0009716 0.0004144 0.0427003 0.0956004 0.0093814 0.0033500 0.0008497 0.0003343 0.0121150 0.0001288 0.2570099 
o    ||0.0083175 0.0072923 0.0127087 0.0203076 0.0029439 0.1135873 0.0060659 0.0018527 0.0087857 0.0001978 0.0106912 0.0268647 0.0580447 0.1459838 0.0330625 0.0138659 0.0002308 0.1175433 0.0322680 0.0492657 0.1337201 0.0164801 0.0488371 0.0005374 0.0033923 0.0008571 0.1262960 
p    ||0.1284508 0.0004427 0.0004427 0.0004713 0.2213542 0.0001428 0.0000857 0.0221226 0.0538854 0.0000286 0.0001143 0.0957597 0.0010854 0.0005856 0.1212242 0.0607692 0.0000000 0.1362487 0.0222939 0.0408603 0.0270926 0.0000000 0.0011711 0.0000000 0.0042274 0.0000000 0.0611405 
q    ||0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0002284 0.0002284 0.0000000 0.9949749 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0045683 
r    ||0.0733524 0.0032081 0.0116789 0.0284070 0.2345530 0.0056616 0.0107385 0.0026432 0.0792432 0.0000435 0.0087196 0.0117263 0.0192448 0.0221961 0.0919374 0.0048043 0.0000316 0.0189406 0.0459213 0.0421561 0.0173721 0.0070603 0.0019873 0.0000040 0.0284504 0.0055945 0.2243241 
s    ||0.0349781 0.0006441 0.0157796 0.0015208 0.1179849 0.0010558 0.0004688 0.0569819 0.0506053 0.0000495 0.0053780 0.0114497 0.0065520 0.0022488 0.0491264 0.0287844 0.0008309 0.0001906 0.0463897 0.1269191 0.0330152 0.0000800 0.0053856 0.0000000 0.0020925 0.0000000 0.4014880 
t    ||0.0393295 0.0001590 0.0037195 0.0000674 0.0892434 0.0009218 0.0000404 0.3352928 0.0666758 0.0000054 0.0000162 0.0146273 0.0009110 0.0011051 0.0913053 0.0000809 0.0000027 0.0310281 0.0245378 0.0171177 0.0185732 0.0000027 0.0078702 0.0000000 0.0121422 0.0002776 0.2449470 
u    ||0.0261517 0.0181796 0.0459729 0.0223272 0.0308931 0.0058765 0.0505571 0.0000699 0.0298191 0.0000087 0.0001572 0.1066327 0.0308669 0.1156002 0.0020170 0.0448465 0.0001746 0.1626908 0.1207345 0.1249869 0.0000349 0.0009343 0.0002008 0.0008819 0.0002969 0.0010042 0.0580839 
v    ||0.1022242 0.0000000 0.0000000 0.0049559 0.6796927 0.0000000 0.0000000 0.0002371 0.1467561 0.0000000 0.0000000 0.0001423 0.0000000 0.0128284 0.0429195 0.0000000 0.0000000 0.0008299 0.0003083 0.0000000 0.0025847 0.0005928 0.0000000 0.0000000 0.0038888 0.0000000 0.0020393 
w    ||0.1832539 0.0003329 0.0002984 0.0018938 0.1605624 0.0013085 0.0000344 0.1893372 0.1788924 0.0000000 0.0005050 0.0089412 0.0002755 0.0372798 0.0933831 0.0000803 0.0000115 0.0082066 0.0126485 0.0018135 0.0011707 0.0000000 0.0003214 0.0000000 0.0006887 0.0000000 0.1187604 
x    ||0.0600144 0.0000000 0.1573582 0.0010050 0.0554200 0.0000000 0.0001436 0.0132089 0.1122757 0.0000000 0.0000000 0.0014358 0.0001436 0.0000000 0.0055994 0.2157933 0.0031587 0.0000000 0.0027279 0.2360373 0.0195262 0.0051687 0.0001436 0.0093324 0.0020101 0.0000000 0.0994975 
y    ||0.0072178 0.0039321 0.0011985 0.0020738 0.0562745 0.0015217 0.0003097 0.0007137 0.0141393 0.0000135 0.0000269 0.0031914 0.0039051 0.0022488 0.1205478 0.0027875 0.0000000 0.0048882 0.0324935 0.0109613 0.0005925 0.0000673 0.0016025 0.0001347 0.0000943 0.0002020 0.7288617 
z    ||0.4219769 0.0007526 0.0060211 0.0067737 0.3038133 0.0000000 0.0000000 0.0005018 0.0709985 0.0002509 0.0000000 0.0198194 0.0000000 0.0000000 0.0730055 0.0000000 0.0000000 0.0002509 0.0017561 0.0005018 0.0037632 0.0010035 0.0000000 0.0000000 0.0100351 0.0268440 0.0519318 
SPACE||0.1062437 0.0444502 0.0391600 0.0282947 0.0213084 0.0400793 0.0171783 0.0606047 0.0678165 0.0034660 0.0045451 0.0243019 0.0406429 0.0234882 0.0649920 0.0273498 0.0022208 0.0214068 0.0704687 0.1460781 0.0092399 0.0079497 0.0606385 0.0001107 0.0114638 0.0002911 0.0562102 
     ||=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|

 

III. Third-Order Statistics
 
  The fourth number in the second row is the conditional probability P(Xi=d|Xi-1=b,Xi-2=a)=0.0085877
  For brevity, we include only the first 5 rows.
 
  The entire matrix (729 rows x 27 columns) can be downloaded from the third-order statistic file.
 

     ||=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
     ||    a    |    b    |    c    |    d    |    e    |    f    |    g    |    h    |    i    |    j    |    k    |    l    |    m    |    n    |    o    |    p    |    q    |    r    |    s    |    t    |    u    |    v    |    w    |    x    |    y    |    z    |    S    |
     ||=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
a a  ||0.0000000 0.0000000 0.0106383 0.3297872 0.0000000 0.0106383 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0212766 0.4361702 0.1063830 0.0000000 0.0000000 0.0000000 0.0425532 0.0106383 0.0000000 0.0106383 0.0000000 0.0106383 0.0000000 0.0000000 0.0000000 0.0106383 
a b  ||0.0075307 0.0140045 0.0000000 0.0085877 0.0202140 0.0000000 0.0000000 0.0029066 0.1762452 0.0014533 0.0000000 0.4001850 0.0000000 0.0019818 0.2522130 0.0000000 0.0000000 0.0146651 0.0622275 0.0000000 0.0192892 0.0000000 0.0000000 0.0000000 0.0085877 0.0000000 0.0099088 
a c  ||0.0066204 0.0000817 0.0963629 0.0006539 0.2368615 0.0000000 0.0000000 0.1985288 0.0402942 0.0000000 0.1439313 0.0111157 0.0000000 0.0001635 0.0089089 0.0000000 0.0155292 0.0263997 0.0003269 0.1977115 0.0092358 0.0000000 0.0000000 0.0000000 0.0048222 0.0000000 0.0024520 
a d  ||0.0304960 0.0001415 0.0000708 0.0352367 0.1144131 0.0038916 0.0004245 0.0027595 0.0417463 0.0038916 0.0000000 0.0098351 0.0241987 0.0031840 0.0245525 0.0000000 0.0000708 0.0103304 0.0175476 0.0022642 0.0116040 0.0342461 0.0005661 0.0000000 0.0419585 0.0000000 0.5865704 
a e  ||0.0345679 0.0000000 0.0024691 0.0098765 0.0074074 0.0000000 0.0222222 0.0000000 0.0000000 0.0000000 0.0000000 0.1753086 0.0024691 0.0123457 0.1135802 0.0000000 0.0049383 0.0839506 0.0074074 0.0049383 0.0246914 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.4938272 
     ||=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|

 
 

IV. Notes
 

  • The above statistics are derived from these classics:
    1. Origin of Species (Charles Darwin)
    2. The Voyage of the Beagle (Charles Darwin)
    3. Jane Eyre (Charlotte Bronte)
    4. Wuthering Heights (Emily Bronte)
    5. Tarzan of the Apes (Edgar Rice Burroughs)
    6. The Return of Tarzan (Edgar Rice Burroughs)
    7. Paradise Lost (John Milton)
    All of these are available at www.literature.org.
  • All upper-case letters were converted to lower-case.
  • All numbers and all special characters were removed.
  • A carriage return is treated as a space.
  • There are exactly 5086936 characters in these files.
  • The first-order statistics are comparable with Table 2.1 in Blahut's Information Theory book.
  • Unless every elements of a row are zeros, the row sum should be approximately one. There may be round-off errors.

 

Home
Theory
Lossless
VQ
Speech
Image
Download
Links

Uncle Sam

Copyright © 2000-2001, Nam Phamdo