2012-02-08

The Rosetta Stone

The same text in three scripts: Egyptian hieroglyphs, Egyptian Demotic, and Ancient Greek

Challenges

1. Lexical ambiguity

One wordform, many meanings:
- (en) Kill: (fr) Tuer, terminer;
- (en) Book: (fr) Livre, réserver;
- (en) Chair: (fr) Chaise, Président; ...
- (en) Buffalo: (fr) (ville de) Buffalo, NY; buffle; bison
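Choosing among senses needs context. As a minimal sketch, assuming a hypothetical list of determiners, a single "preceding word" rule separates the noun and verb senses of "book". It does nothing for "chair", where both French translations are nouns.

```python
# Hypothetical toy rule: choose a French translation of "book"
# by looking only at the preceding word.
DETERMINERS = {"a", "an", "the", "this", "my"}  # assumed, tiny list

def translate_book(prev_word: str) -> str:
    """Return a French translation of 'book' based on the word before it."""
    if prev_word.lower() in DETERMINERS:
        return "livre"      # noun sense: "the book"
    return "réserver"       # otherwise assume the verb sense: "to book"

print(translate_book("the"))  # livre
print(translate_book("to"))   # réserver
```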

2. Differing word orders

(en) [Subject] – ((trans.) Verb) – {Object}: ("IBM bought Lotus", "[The dog] (bit) {the boy}")
- (ja) [Subject] – {Object} – ((trans.) Verb): ("IBM Lotus bought", "[The dog] {the boy} (bit)")
(en) [Det.] – (Adj.) – {Noun}: ("[The] (red) {house}")
- (fr) [Det.] – {Noun} – (Adj.): ("[The] {house} (red)", "la maison rouge")
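Once the constituents have been identified, these differences amount to a change in ordering. A minimal sketch, assuming the roles (subject, verb, object, determiner, adjective, noun) were already labeled by some earlier analysis step; the role tags and the ORDERS table are hypothetical. It only rearranges the English words, reproducing the glosses above.

```python
# Canonical constituent orders per language (hypothetical toy table).
ORDERS = {
    ("clause", "en"): ["S", "V", "O"],
    ("clause", "ja"): ["S", "O", "V"],
    ("np", "en"): ["Det", "Adj", "N"],
    ("np", "fr"): ["Det", "N", "Adj"],
}

def reorder(tagged, kind, target):
    """tagged maps a role tag to its words; return them in the target order."""
    return " ".join(tagged[role] for role in ORDERS[(kind, target)])

print(reorder({"S": "IBM", "V": "bought", "O": "Lotus"}, "clause", "ja"))
# IBM Lotus bought
print(reorder({"Det": "the", "Adj": "red", "N": "house"}, "np", "fr"))
# the house red
```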

3. Unpreserved syntax

(es) La botella entró a la cueva flotando
(the bottle entered the cave floating)
- The bottle floated into the cave

4. Syntactic ambiguity

"Visiting relatives can be a nuisance."
- "Visiter la famille peut être ennuyant" (visiting one's family can be annoying)
- "Les membres de la famille qui visitent peuvent être ennuyants" (family members who visit can be annoying)

5. Idiosyncrasies

"Burn the midnight oil"
- "Travailler tard" (work late)
"Faire la gueule" ("do the snout")
- "Sulk"

Classical Machine Translation: Dictionaries

A few hand-written rules of the form "if the preceding (or following) word is X, then translate as Y":

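    # Translate 'much' or 'many' into Russian (direct, rule-based translation).
    # NOUNS and PREPOSITIONS are tiny hypothetical word lists standing in
    # for a real part-of-speech lookup.
    NOUNS = {"money", "time", "books", "people"}
    PREPOSITIONS = {"with", "of", "in", "about"}

    def translate_much_many(prev, word, following):
        if prev == "how":
            return "skol’ko"
        if prev == "as":
            return "stol’ko zhe"
        if word == "much":
            if prev == "very":
                return None          # rule yields no translation (nil)
            if following in NOUNS:
                return "mnogo"
            return None              # no rule covers this case
        # otherwise the word is 'many'
        if prev in PREPOSITIONS and following in NOUNS:
            return "mnogii"
        return "mnogo"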

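No analysis:
- word order can come out wrong, since words are translated one at a time in source order
- part of speech is not determined from context, so an ambiguous word gets one fixed translation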
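To see the word-order problem concretely, here is a word-for-word dictionary translator with no analysis at all. The three-entry lexicon is a hypothetical toy, and it also ignores gender agreement (it always maps "the" to "la").

```python
# Hypothetical word-for-word English->French dictionary: no parsing,
# no part-of-speech tagging, one lookup per word in source order.
LEXICON = {"the": "la", "red": "rouge", "house": "maison"}

def word_for_word(sentence):
    """Translate each word independently; unknown words pass through unchanged."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

print(word_for_word("The red house"))  # la rouge maison (French wants "la maison rouge")
```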

Classical Machine Translation: Transfer-based approach

The transfer-based approach involves three phases:
- Analysis: e.g., build syntactic parse trees of the source sentence.
- Transfer: e.g., convert the source-language parse tree to a target-language parse tree.
- Generation: e.g., produce an output sentence from the target-language parse tree.
These systems can involve fairly deep analysis, including semantic analysis.
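The three phases can be sketched on the "the red house" example. This is a toy under stated assumptions: a hypothetical three-word lexicon with part-of-speech tags, a flat noun-phrase "tree", and a single transfer rule that moves adjectives after the noun. Real transfer systems work on full parse trees with many rules.

```python
# Hypothetical lexicon: English word -> (part of speech, French word).
LEXICON = {"the": ("Det", "la"), "red": ("Adj", "rouge"), "house": ("N", "maison")}

def analyze(sentence):
    """Analysis: tag each word and build a flat NP tree (label, children).
    Unknown words raise KeyError in this toy."""
    return ("NP", [(LEXICON[w][0], w) for w in sentence.lower().split()])

def transfer(tree):
    """Transfer: reorder English Det Adj N to French Det N Adj
    and swap each English word for its French equivalent."""
    label, children = tree
    det = [c for c in children if c[0] == "Det"]
    adj = [c for c in children if c[0] == "Adj"]
    noun = [c for c in children if c[0] == "N"]
    return (label, [(tag, LEXICON[w][1]) for tag, w in det + noun + adj])

def generate(tree):
    """Generation: read the words off the target-language tree."""
    return " ".join(word for _, word in tree[1])

print(generate(transfer(analyze("The red house"))))  # la maison rouge
```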

Source

https://courseware.stanford.edu/pg/courses/lectures/214428
- Knight's Statistical MT Workbook
- Knight's FAQ: a short correction to the workbook
- Lecture slides 1-5 cover this material; the essence is the chain of "IBM Models 1-5".

Jim says

I suggest that, to really understand this stuff (and explain it to me), you try to code, likely in Python, the simplest toy model you can invent that implements this chain of counting/probability calculations (Model 1 -> Model 2 -> ...), as described in the "Knight: Statistical MT Workbook". (Note that there's an FAQ-related document which looks like it's actually a correction.)
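A first step could be IBM Model 1 on its own. The sketch below is a minimal expectation-maximization (EM) trainer for the word-translation table t(f | e), under stated assumptions: a three-sentence French/English toy corpus chosen here for illustration, no NULL word (a simplification), and a fixed number of iterations.

```python
from collections import defaultdict

# Illustrative toy corpus of (French, English) sentence pairs.
corpus = [
    ("la maison".split(), "the house".split()),
    ("la maison bleue".split(), "the blue house".split()),
    ("la fleur".split(), "the flower".split()),
]

def train_model1(pairs, iterations=10):
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Start with t(f | e) uniform over the French vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                # E-step: split one count for f among the English words it may align to.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

t = train_model1(corpus)
for e in ["the", "house", "blue", "flower"]:
    best = max((p, f) for (f, e2), p in t.items() if e2 == e)
    print(f"{e:>6} -> {best[1]} ({best[0]:.2f})")
```

On this corpus the counts steadily shift toward la/the, maison/house, bleue/blue and fleur/flower. Models 2-5 then add further parameters (alignment/distortion, fertility) on top of a table like this.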

http://cs.marlboro.edu/courses/spring2012/jims_tutorials/elias/2012-02-08
last modified Wednesday February 8 2012 11:01 am EST

Jim's Tutorials
