2011-09-18
Elias
Book notes
Chapter 3
Intro
- Morphological parsing
- Finite-state transducer
- Common lemma of words
- Lemmatization
- "sang", "sung", "sings" ==> "sing"
3.1
Morphology
- Morpheme: meaning-bearing sub-unit of a word, eg: "cats": 'CAT' + 'S'
- Affixes: ENG: no more than 4 or 5.
- Agglutinative languages: stacking multiple affixes (eg: Turkish, Finnish, German, Inuktitut)
- Clitic: Morpheme that acts syntactically like a word, but is reduced in form and attached to another word ('ve in I've, and other contractions)
- Inflection:
- English nouns: Plural and possessive. (-s and '+ -s)
- English verbs: Regular: 4 forms: stem, -s form, -ing participle, -ed (past) participle. Irregular: 5-8 forms: stem, -s form, -ing participle, preterite, past participle.
- Gerund: Verb as noun (raining...)
- Derivational
- A combination of a stem and a grammatical morpheme, usually resulting in a different class of word.
- Clitics
- am -> 'm, have -> 've, are -> 're, ...
3.2
Input English Morphological Parse
----------------------------------------
cats cat +N +Pl
cat cat +N +Sg
cities city +N +Pl
geese goose +N +Pl
. .
. .
. .
from p53
- Building a morphological parser:
- Lexicon: List of stems and affixes, w/ basic info about them (N, V, etc...)
- Morphotactics: Model of morpheme ordering that explains which clases of morphemes can follow other classes inside a word. (eg: Pl follows N)
- Ortho. rules: Spelling rules used to model changes that occur in a word, us. when two morphemes combine. (eg: y -> ie [city -> cities])
3.3
Constructing FS Lex.
- Eng. derivational morphology sig. more complex than inflectional morphology.
- Morphological recognition
- Det. when an input str of letters makes up a legitimate English word.
- Eg: Noun recog FSA (fig. 3.7 p 56)
3.4-3.7
FS Transducers
- As recognizer: FST that takes a pair of str as input and output; accept if str-pair is in the str-pair language, reject if not.
- As generator: Machine that outputs pairs of str of the language; output: y/n and pair of output str.
- As translator: Reads a str and outputs another str
- As set relater: Computes relationships between sets.
--> For MP: FST as translator: in: str of letters, out: str of morphemes
FST Definition:
Q finite set of states, q0, q1, q2, ..., qn
∑ finite set corresponding to input alphabet
Δ finite set corresponding to output alphabet
q0 ∈ Q start state
F ⊆ Q set of final states
δ(q,w) transition function/matrix between states
σ(q,w) output function giving set of possible output strs for each state and input.
- FSA isomorphic to regular languages, FST isomorphic to regular relations
- FST closed under union (q0, q1 ∈ Q, q0+q1 ∈ Q)
- Inversion: q0<->out
- FST as parser -> FST as generator
- Composition: T1: I1->O1, T2: O1->O2 => (T1∘T2): I1->O2
3.5
Lexical entry for 'goose', geese->goose g:g o:e o:e s:s e:e
Exercises
3.2
3.3/4
(Will probably be drawn out on paper first)
3.5
(Will probably be drawn out on paper first)
3.9
Jim