2011-09-25
Elias
- Installing Graphviz on my laptop to view the FSTs drawn in .dot format.
Chapter 3 Exercises
3.2
FST from book:
---3-17.dot---
digraph 3.17 {
rankdir=LR
q0 [shape=doublecircle]
q1 [shape=doublecircle]
q2 [shape=doublecircle]
q0 -> q0 [label="^:ε\nother\n#"]
q0 -> q1 [label="z,s,x"]
q1 -> q0 [label="#\nother"]
q1 -> q1 [label="z,s,x"]
q1 -> q2 [label="^:ε"]
q2 -> q0 [label="#\nother"]
q2 -> q1 [label="z,x"]
q2 -> q5 [label="s"]
q5 -> q2 [label="^:ε"]
q5 -> q1 [label="z,s,x"]
q5 -> q0 [label="other"]
q2 -> q3 [label="ε:e"]
q3 -> q4 [label="s"]
q4 -> q0 [label="#"]
}
Extended FST for "sh" and "ch":
---3-17e.dot---
digraph 3.2 {
rankdir=LR
q0 [shape=doublecircle]
q1 [shape=doublecircle]
q2 [shape=doublecircle]
q0 -> q0 [label="^:ε\nother\n#"]
q0 -> q1 [label="z,s,x,\nsh,ch"]
q1 -> q0 [label="#\nother"]
q1 -> q1 [label="z,s,x,\nsh,ch"]
q1 -> q2 [label="^:ε"]
q2 -> q0 [label="#\nother"]
q2 -> q1 [label="z,x"]
q2 -> q5 [label="s"]
q2 -> q6 [label="sh,ch"]
q6 -> q2 [label="^:ε"]
q6 -> q1 [label="z,s,x,\nsh,ch"]
q5 -> q2 [label="^:ε"]
q5 -> q1 [label="z,s,x,\nsh,ch"]
q5 -> q0 [label="other"]
q2 -> q3 [label="ε:e"]
q3 -> q4 [label="s"]
q4 -> q0 [label="#"]
}
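Sanity check for 3.2: a minimal Python sketch of the mapping the extended transducer is meant to compute, assuming intermediate forms like 'fox^s#' where '^' is the morpheme boundary and '#' the word boundary. This is a regex stand-in, not a transducer and not NLTK code; the function name e_insertion is made up for illustration. It only gives expected outputs to compare the drawn FST against.

```python
import re


def e_insertion(intermediate):
    """Map an intermediate form such as 'fox^s#' to its surface form."""
    # Insert 'e' where a morpheme boundary follows z, s, x, sh, or ch
    # and is followed by the suffix 's' and the word boundary '#'.
    with_e = re.sub(r"(ch|sh|[zsx])\^(?=s#)", r"\1e", intermediate)
    # Any remaining morpheme boundary is deleted (the ^:ε arcs).
    return with_e.replace("^", "")


if __name__ == "__main__":
    for form in ["fox^s#", "church^s#", "dish^s#", "cat^s#"]:
        print(form, "->", e_insertion(form))
```

Any path through the FST that disagrees with this on a simple test word is probably worth a second look.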
3.3
digraph kins {
rankdir=LR
q0 [shape=doublecircle]
q1 [shape=doublecircle]
q2 [shape=doublecircle]
q0 -> q0 [label="^:ε\n#\nother"]
q0 -> q1 [label="c"]
q1 -> q0 [label="#\nother"]
q1 -> q2 [label="^:ε"]
q2 -> q3 [label="ε:k"]
q3 -> q4 [label="ing"]
q4 -> q0 [label="#"]
}
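Sanity check for 3.3: a minimal Python sketch of the mapping the graph above is aiming for, assuming the same '^' and '#' boundary symbols as in 3.2. It covers only the -ing case drawn in the graph. It is a regex stand-in rather than a transducer, and the name k_insertion is made up for illustration.

```python
import re


def k_insertion(intermediate):
    """Map an intermediate form such as 'panic^ing#' to its surface form."""
    # Insert 'k' after a stem-final 'c' when the suffix is 'ing'.
    with_k = re.sub(r"c\^(?=ing#)", "ck", intermediate)
    # Any other morpheme boundary is simply deleted (the ^:ε arcs).
    return with_k.replace("^", "")


if __name__ == "__main__":
    for form in ["panic^ing#", "walk^ing#", "panic^s#"]:
        print(form, "->", k_insertion(form))
```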
3.5
Trying to draw an FST for the Soundex algorithm in .dot format and to implement it in Python with NLTK.
Started on the written FST (soundex.py). Still buggy and not done. Uploaded the current version.
---Soundex Algorithm---
Given a last name:
Remove all non-initial instances of ['a','e','h','i','o','u','w','y']
Replace all ['b','f','p','v'] with '1'
Replace all ['c','g','j','k','q','s','x','z'] with '2'
Replace all ['d','t'] with '3'
Replace all ['l'] with '4'
Replace all ['m','n'] with '5'
Replace all ['r'] with '6'
Replace all sequences of the same number with a single number
Convert to Letter-Digit-Digit-Digit. Drop digits past the third or add trailing zeros if necessary.
E.g.: 'Zeidan' -> Z350
      'Mahoney' -> M500
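
Reference sketch of the steps above in plain Python, to have known-good outputs while debugging the FST version. This is not soundex.py and does not use NLTK. Assumptions: the first letter is kept as a letter and is not itself turned into a digit, only ASCII letters are considered, and digits past the third are dropped. The function name soundex and the table names are made up for illustration.

```python
import re

# Letter classes copied from the steps above.
DROPPED = set("aehiouwy")
CODES = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in CODES.items() for ch in letters}


def soundex(name):
    """Encode a last name following the steps listed in these notes."""
    # Assumption: ignore anything that is not an ASCII letter.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first, rest = letters[0], letters[1:]
    # Drop non-initial a, e, h, i, o, u, w, y and map the rest to digits.
    digits = "".join(LETTER_TO_DIGIT[ch] for ch in rest if ch not in DROPPED)
    # Collapse each run of the same digit into a single digit.
    digits = re.sub(r"(\d)\1+", r"\1", digits)
    # Letter-Digit-Digit-Digit: pad with zeros, then keep three digits.
    return first.upper() + (digits + "000")[:3]


if __name__ == "__main__":
    for name in ["Zeidan", "Mahoney"]:
        print(name, "->", soundex(name))
```

Both examples from the notes come out as expected (Z350 and M500). Because the vowels are removed before runs are collapsed, equal digits that were separated by a vowel also merge, which is how the steps read as written.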