notation
========

  math      definition      code
  ----      ----------      ----
  Σ         alphabet
  A∪B       set union       A|B
  ∅         empty set       {}        (where the curlys delimit the set)
  ε         empty string    '' or ""  (where ' or " is the string delimiter)
  A○B       concatenation   AB
  A∪B○C                     A|BC
  A*                        A*        (same!)
  (A)B      grouping        (A)B      (same!)

Then we can define

  A+      =  "one or more"    =  AA*
  A?      =  "zero or one"    =  (A|ε)
  .       =  "any one char"   =  (a|b|c|...)   list all symbols of Σ
  [abc]   =  "char class"     =  (a|b|c)
  [^abc]  =  "inverse class"  =  (d|e|f|...)   list all symbols of Σ except those

regular expression
==================

  ∪   union          (also spelled "|" in coding regex)
  ∩   intersection
  ∅   empty set      (also {} in coding)

coding conventions
==================

* Since regexes are often used in programming languages to match only
  part of a string - and perhaps return or remember what was matched -
  the zero-width-match symbols ^ and $ are often added to match the
  start and end of the string. This doesn't change the power of the
  math regexes; it just changes what the regex means: either part of a
  string or the whole string. (It does, however, open up an ambiguity
  for the coding case where you may want to find the substring that
  matches. If the regex is A.*A, then in a string like ABBACCA the
  match could be either ABBA or ABBACCA, both of which are consistent
  with A.*A. So then you need to worry about "lazy" and "greedy"
  versions of the operators.)

* Many languages treat lines and strings somewhat differently, and so
  often "." doesn't match the newline character "\n". Again, this
  doesn't change the power of the math regex; it just treats one
  character (newline) in a special way.

* "Remember this" operators are often added to coding regexes, often
  just parens, along with ways to match what was seen previously. So
  for example "(.)\1" could mean "any single character twice", where \1
  means "same as whatever was in the first parens". With this addition,
  these are *not* math regular expressions.
  This allows *more* expressive power than the formal math notion of
  "regular expression", which cannot capture the
  same-as-that-other-thing notion. (Although (.)\1 could be done by
  explicitly listing every doubled symbol of the finite alphabet:
  (aa|bb|cc|...).)

Note that in both versions, "." means "any single character", ".."
means "any character followed by any second character", and ".*" means
essentially "any character string".
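
The coding conventions above can be tried out directly. What follows is a small sketch, assuming Python and its built-in re module (these notes don't name a language, so that choice is mine). It shows the greedy and lazy readings of A.*A on ABBACCA, the effect of ^ and $, "." versus newline, and the (.)\1 backreference next to an explicit listing over an assumed three-letter alphabet {A, B, C}.

```python
import re

text = "ABBACCA"

# Greedy vs. lazy: both results are consistent with A.*A.
greedy = re.search(r"A.*A", text).group()    # "*" takes as much as it can
lazy = re.search(r"A.*?A", text).group()     # "*?" takes as little as it can

# Anchors: ^ and $ switch from "part of the string" to "the whole string".
partial = re.search(r"BB", text) is not None     # found as a substring
whole = re.search(r"^BB$", text) is not None     # the whole string is not BB

# "." does not match "\n" by default; the re.DOTALL flag changes that.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None

# Backreference: \1 means "same as whatever the first parens matched".
doubled = re.search(r"(.)\1", text).group()

# Without a backreference, over the assumed alphabet {A, B, C}:
# list every doubled symbol explicitly.
listed = re.search(r"AA|BB|CC", text).group()

print(greedy, lazy)          # ABBACCA ABBA
print(partial, whole)        # True False
print(dot_default, dot_all)  # False True
print(doubled, listed)       # BB BB
```

The explicit listing only works because the alphabet is finite and the repeated piece has a fixed length; something like (.*)\1 has no such finite listing.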