==In formal language theory== Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Given a finite alphabet ? the following constants are defined: *(''empty set'') {{Unicode|?}} denoting the set {{Unicode|?}} *(''empty string'') ? denoting the set {?} *(''[[Literal string|literal character]]'') ''a'' in ? denoting the set {''a''} and the following operations: * (''concatenation'') ''RS'' denoting the set { ?? | ? in ''R'' and ? in ''S'' }. For example {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}. *(''alternation'') ''R|S'' denoting the set union of ''R'' and ''S''. *(''[[Kleene star]]'') ''R''* denoting the smallest [[superset]] of ''R'' that contains ? and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in ''R''. For example, {"ab", "c"}* = {?, "ab", "c", "abab", "abc", "cab", "cc", "ababab", ... }. The above constants and operators form a [[Kleene algebra]]. Many textbooks use the symbols {{Unicode|?}}, {{Unicode|+}}, or {{Unicode|?}} for alternation instead of the vertical bar. To avoid brackets it is assumed that the Kleene star has the highest priority, then concatenation and then set union. If there is no ambiguity then brackets may be omitted. For example, (ab)c is written as abc and a|(b(c*)) can be written as a|bc*. '''Examples:''' *a|b* denotes {?, ''a'', ''b'', ''bb'', ''bbb'', ...} *(a|b)* denotes the set of all strings consisting of any number of ''a'' and ''b'' symbols, including the empty string *b*(ab*)* the same *ab*(c|?) denotes the set of strings starting with ''a'', then zero or more ''b''s and finally optionally a ''c''. *((a|ba(aa)*b)(b(aa)*b)*(a|ba(aa)*b)|b(aa)*b)*(a|ba(aa)*b)(b(aa)*b)* denotes the set of all strings which contain an even number of ''b''s and an odd number of ''a''s. Note that this regular expression is of the form (''X'' ''Y''*''X'' U ''Y'')*''X'' ''Y''* with ''X'' = a|ba(aa)*b and ''Y'' = b(aa)*b. The formal definition of regular expressions is purposefully parsimonious and avoids defining the redundant quantifiers ? and +, which can be expressed as follows: ''a''+= aa*, and ''a''? = (?|a). Sometimes the complement operator ~ is added; ~''R'' denotes the set of all strings over ? that are not in ''R''. The complement operator is redundant: it can always be expressed by only using the other operators. Regular expressions in this sense can express exactly the class of languages accepted by [[Finite state automaton|finite state automata]]: the [[regular language]]s. There is, however, a significant difference in compactness: some classes of regular languages can only be described by automata that grow [[Exponential growth|exponentially]] in size, while the length of the required regular expressions only grow linearly. Regular expressions correspond to the type 3 [[Formal grammar|grammars]] of the [[Chomsky hierarchy]] and may be used to describe a [[regular language]]. We can also study expressive power within the formalism. As the example shows, different regular expressions can express the same language: the formalism is redundant. It is possible to write an [[algorithm]] which for two given regular expressions decides whether the described languages are essentially equal, it reduces each expression to a minimal deterministic finite state automaton and determines whether they are [[isomorphic]] (equivalent). To what extent can this redundancy be eliminated? Can we find an interesting subset of regular expressions that is still fully expressive? Kleene star and set union are obviously required, but perhaps we can restrict their use. This turns out to be a surprisingly difficult problem. As simple as the regular expressions are, it turns out there is no method to systematically rewrite them to some normal form. They are not finitely axiomatizable. So we have to resort to other methods. This leads to the [[star height problem]]. It is worth noting that many real-world "regular expression" engines implement features that cannot be expressed in the regular expression algebra; see [[#Patterns for irregular languages|below]] for more on this.