Jim's Pig Latin
Assignment
Working with the others in your group, design and implement an
industrial strength piglatin text conversion utility, which you will
present in class. Include:
- A specification of what your utility does.
- Documentation on how to install and use it.
- Test cases and examples.
I also want you to think of this effort as an exercise in project
management, so be thinking about how you're making decisions, how
you're sharing the work, and what software tools you're choosing to use.
Design
Research
Googling "pig latin" and following links on those pages I find:
I chose to follow rules very similar to Donelly's, and used his transformation tool
to generate some examples.
But I also want to do better than he allows with utf-8 characters like ä or ó.
Here are some resources I used to think about that.
Rules
These are the text to Pig Latin transformation rules I'll implement.
- Words that start with a vowel (a,e,i,o,u) will have "way" appended
- or => orway
- apple => appleway
- Otherwise move initial consonant cluster to end then append "ay"
- cat => atcay
- then => enthay
- school => oolschay
- "y" is a special case: within a word it will be treated like a vowel.
- my => ymay
- style => ylestay
- yellow => ellowyay
- ... but I'm not at all sure that will "do the right thing" in all cases.
- TODO: think about Y more.
- "qu" is a special case, it isn't broken up at the start of a word.
- Punctuation and whitespace are not changed.
- Capitalization should be moved to the appropriate new first letter.
- My name is Smith. => Ymay amenay isway Ithsmay.
- Hyphenated words are treated as two words.
- Pure numbers (23, 0xead, 3.22, 1.23e-7) are not changed.
- (23, 0x77ff, 1.23, 7.7e-16) => (23, 0x77ff, 1.23, 7.7e-16)
- Mixed letters and numbers: leading numbers are vowels; embedded aren't.
- one2 s78 7th => one2way 78say 7thway
- Vowels with diacritic marks (e.g. á, é, í, ó, ú, ü, è, Ë) are still vowels.
- TODO: Find a tool to do utf-8 diacritic extraction,
or explicitly generate a list of vowel × capitalization × diacritic characters.
- Words with only consonants are prepended with A and appended with ay.
- His name is Dr. Jones. => Ishay amenay isway Adray. Onesjay.
- ... I just didn't like what happens to Dr without this.
- ... but this will make the transformation difficult to invert.
- Contractions are treated as one word.
- And I can't stand him. => Andway Iway an'tcay andstay imhay.
- Punctuation including brackets is left unchanged.
- TODO: if we want to treat HTML differently, we'll need a preprocessor.
- (foo) => (oofay)
- Words in non-European alphabets are left unchanged.
- He said "Сказки братьев Гримм" on the 12th of month 7. =>
Ehay aidsay "Сказки братьев Гримм" onway ethay 12thway ofway onthmay 7.
- Alle mensen worden vrij en gelijk in waardigheid en rechten geboren. =>
Alleway ensenmay ordenway ijvray enway elijkgay inway aardigheidway
enway echtenray eborengay.
- Bạn thì sao? Tôi nhớ bạn lắm => ? (some of these are European ...)
- TODO: better define what to do with words that mix
Latin & non-Latin symbols.
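The core word-level rules above (vowel start, consonant cluster, "y", "qu", capitalization, diacritics, consonant-only words) can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `pig_word`, `is_vowel` and `base_letter` are hypothetical names, the input is assumed to be a single all-letter word in composed (NFC) form, "qu" is kept together whenever it ends the initial cluster, and all-caps words, digits, hyphens and non-Latin scripts are not handled. It uses the standard library's `unicodedata.normalize` as one possible answer to the diacritic TODO.

```python
import unicodedata

VOWELS = set("aeiou")

def base_letter(ch):
    """Return ch lowercased with diacritics stripped, e.g. 'Á' -> 'a'."""
    # NFD splits a composed character into base letter + combining marks.
    return unicodedata.normalize("NFD", ch)[0].lower()

def is_vowel(word, i):
    """True if word[i] acts as a vowel; 'y' counts only after position 0."""
    b = base_letter(word[i])
    return b in VOWELS or (b == "y" and i > 0)

def pig_word(word):
    """Convert one all-letter word to Pig Latin (hypothetical helper)."""
    if not word:
        return word
    # n = length of the initial consonant cluster.
    n = 0
    while n < len(word) and not is_vowel(word, n):
        n += 1
    if n == 0:
        # Starts with a vowel: just append "way".
        return word + "way"
    if n == len(word):
        # Consonants only: prepend "a", append "ay".
        result = "a" + word.lower() + "ay"
    else:
        # Assumption: keep "qu" together when it ends the cluster.
        if base_letter(word[n - 1]) == "q" and base_letter(word[n]) == "u":
            n += 1
        result = word[n:] + word[:n].lower() + "ay"
    # Move capitalization to the new first letter.
    if word[0].isupper():
        result = result[0].upper() + result[1:]
    return result

print(pig_word("Smith"))   # Ithsmay
print(pig_word("style"))   # ylestay
print(pig_word("queen"))   # eenquay
```

Stripping diacritics only to classify a letter, while leaving the original characters in the output, avoids having to enumerate every accented vowel by hand.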
stack & API
- Code as a python3 module (python2 has worse utf-8 support)
- ... with doctests (which is simple though repetitive for this use case)
- imjaysigpayatinlay.py (i.e. jim's pig latin)
- API : piglatin = text_to_piglatin(text)
- Put in a Makefile to run the tests and save the output.
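For illustration, the module-with-doctests layout might look something like this. It is only a skeleton under stated assumptions: the body of `text_to_piglatin` here is a deliberately tiny stand-in that handles a single lowercase word, so that the two doctests have something to run against; the real rules are far more involved.

```python
"""Skeleton of imjaysigpayatinlay.py showing the doctest layout only."""
import doctest
import re

def text_to_piglatin(text):
    """Return text converted to Pig Latin (stand-in: one lowercase word).

    >>> text_to_piglatin("apple")   # word starts with vowel
    'appleway'
    >>> text_to_piglatin("cat")     # word starts with consonant
    'atcay'
    """
    # Leading run of non-vowel characters (may be empty).
    cluster = re.match(r"[^aeiou]*", text).group(0)
    if not cluster:
        return text + "way"
    return text[len(cluster):] + cluster + "ay"

if __name__ == "__main__":
    # verbose=True prints the "Trying: / Expecting: / ok" transcript.
    doctest.testmod(verbose=True)
```

Running the file directly with python3 then prints one "Trying / Expecting" entry per example plus a summary, which is the kind of output a Makefile target can redirect into a file.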
status
$ tree
.
├── Makefile
├── imjaysigpayatinlay.py
├── readme.md # this file
└── test_output.txt
$ make # create test_output.txt
$ more test_output.txt # ... and look at it.
Thu Feb 8 02:22:43 EST 2018
Trying:
text_to_piglatin("apple") # word starts with vowel
Expecting:
'appleway'
ok
Trying:
text_to_piglatin("cat") # word starts with consonant
Expecting:
'atcay'
ok
...
Failed example:
text_to_piglatin("(foo) [bar]") # parens and brackets
Expected:
'(oofay) [arbay]'
Got:
'oo)(fay ar][bay'
...
21 tests in 4 items.
7 passed and 14 failed.
***Test Failed*** 14 failures.
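One plausible reading of the failed bracket example: the "Got" string is what comes out when text is split on whitespace and the brackets are then treated as part of the consonant cluster. That is a guess at the cause, not something the transcript confirms. The sketch below reproduces that output and shows one possible alternative, substituting only runs of letters with a regular expression so that everything else is copied through untouched. `apply_to_words`, `toy_word` and `WORD` are hypothetical names, and `toy_word` is a minimal stand-in for the full rules. The pattern as written would also match non-Latin words and would not implement the mixed letter-and-digit rule, so both would need extra handling.

```python
import re

# Runs of letters, optionally joined by apostrophes so that contractions
# such as "can't" stay one word. Digits, hyphens, brackets and other
# punctuation fall outside the pattern and are left exactly as they are.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def toy_word(word):
    """Minimal stand-in for the word rules: vowel start or cluster move."""
    i = 0
    while i < len(word) and word[i].lower() not in "aeiou":
        i += 1
    return word + "way" if i == 0 else word[i:] + word[:i] + "ay"

def apply_to_words(text, convert_word):
    """Apply convert_word to each matched word, keeping all other text."""
    return WORD.sub(lambda m: convert_word(m.group(0)), text)

# Whitespace splitting drags the brackets along with the consonants:
print(" ".join(toy_word(w) for w in "(foo) [bar]".split()))  # oo)(fay ar][bay

# Substituting letter runs only leaves brackets where they were:
print(apply_to_words("(foo) [bar]", toy_word))               # (oofay) [arbay]
```

Passing the word converter in as an argument keeps tokenization separate from the transformation rules, so each can be tested on its own.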