assignment 2 comments
Jim says
The coding parts were generally OK -
you engaged the questions and said
some reasonable things.
The regex piece generally felt somewhat sloppy:
- While I didn't say this explicitly, it would have been good to describe what you were trying to match.
- And after providing code, it would have been nice to see test strings (success & failure) and output.
- The "." character in a Python regex (and in most other languages) almost always means "match any character"; a literal dot has to be escaped as "\." - almost everyone missed that.
- Adding an optional "www\." (or even worse www.) to the start of a URL regex doesn't actually accomplish much ...
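Here is a small sketch of the kind of test I had in mind, with success and
failure strings and their output. The patterns and the "host" regex are my own
made-up examples, not anyone's submission, and the host pattern is a simplifying
assumption about what a URL regex might already contain.

```python
import re

# Hypothetical patterns, made up for illustration (not from any submission).
unescaped = re.compile(r"www.example.com")   # "." matches any character
escaped = re.compile(r"www\.example\.com")   # "\." matches only a literal dot

# Test strings: one that should succeed, two that should fail.
tests = ["www.example.com", "wwwXexampleYcom", "www.example.org"]
results = {
    s: (bool(unescaped.fullmatch(s)), bool(escaped.fullmatch(s))) for s in tests
}
for s, (u, e) in results.items():
    print(f"{s:18} unescaped={u!s:5} escaped={e}")

# Assumed, simplified host pattern: two or more dot-separated labels.
host = r"[a-z0-9-]+(\.[a-z0-9-]+)+"
with_www = r"(www\.)?" + host

# If the host pattern already accepts "www" as a label, the optional
# prefix does not change which strings are accepted.
verdicts = {}
for s in ["www.example.com", "example.com", "localhost"]:
    verdicts[s] = (bool(re.fullmatch(host, s)), bool(re.fullmatch(with_www, s)))
    print(f"{s:18} host={verdicts[s][0]!s:5} with_www={verdicts[s][1]}")
```

The unescaped pattern happily accepts "wwwXexampleYcom", which is the bug
almost everyone had.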
I looked into the email spec, and can trivially prove that valid email
addresses form a regular language: all valid emails are under 254
characters. Therefore the language is finite, and therefore regular.
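The step from "finite" to "regular" can be sketched in code: any finite list
of strings is matched exactly by one big alternation. The three-string
"language" below is a toy stand-in I made up, and the argument also quietly
assumes a finite alphabet, so that a length bound really does give a finite set.

```python
import re

# A toy finite language (hypothetical; stands in for "all valid addresses").
language = ["a@b", "ab@c", "a@bc"]

# Escape each string so regex metacharacters are taken literally,
# then join them with "|" to get one regex for the whole language.
pattern = re.compile("|".join(re.escape(w) for w in language))

def in_language(s: str) -> bool:
    """Return True if s is exactly one of the listed strings."""
    return pattern.fullmatch(s) is not None

for s in language + ["abc", "a@b@c"]:
    print(f"{s:6} {in_language(s)}")
```

Correct in principle, though the resulting regex for real email addresses
would be astronomically large, so it says nothing about practicality.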
The Wikipedia article "email address" goes into some
of the complications of email addresses. The definition
and what's allowed has changed somewhat over time,
which means it's messy to even decide what the spec is.
If "obsolete" forms are allowed, then it's the comments
within parentheses (comments in email addresses??)
which can apparently be nested (??!?) which make
it impractical to match with a regex. (See "validation
and verification" in the Wikipedia article.)
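The trouble with nesting is that checking it takes a counter, and a regex in
the textbook sense has no counter. A minimal sketch, assuming comments are
delimited by "(" and ")" and ignoring quoted strings and backslash escapes
(the function name is my own):

```python
def comments_balanced(text: str) -> bool:
    """Check that "(" and ")" nest properly in text.

    Simplifying assumption: quoted strings and backslash escapes,
    which the real spec also has to deal with, are ignored here.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1            # entering a (possibly nested) comment
        elif ch == ")":
            depth -= 1            # leaving a comment
            if depth < 0:         # a ")" with no matching "("
                return False
    return depth == 0             # every "(" must have been closed

for s in ["john(a (nested) comment)", "john(unclosed (comment)", "john)("]:
    print(f"{s!r:28} {comments_balanced(s)}")
```

The depth can grow without bound, which is exactly what a finite automaton
cannot track.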
The modern form of an email address is pretty simple:
localpart@domain
though localpart can have quoted strings and
many other funky characters ... which not all
web forms and the like accept.
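To make that concrete, here is a rough sketch of the sort of simplified
localpart@domain pattern a web form might use. The pattern and the name
SIMPLE_EMAIL are my own assumptions, not the spec, and the pattern
deliberately leaves out quoted local parts.

```python
import re

# Assumed "web form" style pattern: an unquoted local part, an "@",
# then two or more dot-separated domain labels. This is NOT the full spec.
SIMPLE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")

def looks_like_email(s: str) -> bool:
    """Return True if s matches the simplified pattern above."""
    return SIMPLE_EMAIL.fullmatch(s) is not None

tests = [
    "user+tag@example.com",       # plain local part: accepted
    '"john..doe"@example.org',    # quoted local part: rejected by this pattern
    "no-at-sign.example.com",     # no "@" at all: rejected
]
for s in tests:
    print(f"{s:26} {looks_like_email(s)}")
```

The second test string is the interesting one: a quoted local part gets
rejected simply because the pattern never considered quotes.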
It might be worth talking about the email spec a bit, in part because
it leads into our next grammar conversation.