Regular expressions (commonly called regex) are a type of grammar that creates strings (here in the formulaic sense). In a sense, a regular expression is a mathematical formula that creates strings. They are especially useful in string pattern-matching. More formally, a regex defines a regular language
Definition
Regular Expression ^definition
A regular expression, , over an alphabet is defined as:
- (sometimes replaced with ) means OR: defines the language
- means concatenation: (commonly the circle is omitted) defines the language
- means the Kleene star: defines the language
defines the (regular) language generated from the regex
Some examples of regexes:
- defines the language
- defines the language
Since regexes borrow all the same operations on languages, we can ‘pull’ out the :
- (Pay attention! The empty word is not the empty set!)
Binding Power
Just like in formula parsing, regexes have a very, strict, well-formed syntax, that is unambiguous, as well as a slightly looser syntax. For example:
In our looser syntax, we ignore unnecessary parentheses, as well as treat
defines the exact same language as , but is much more simpler. However, this is only possible when we understand the binding powers of the operations for languages:
Expression Tree
Like formula parsing, we can optionally construct an expression tree to visually see the hierarchy in a more complex regex:
Considered the regex below:
%%🖋 Edit in Excalidraw, and the dark exported image%%
Theorems
T1: A language is regular iff it can be described by a regular expression. ^t1
Let be a language. Then:
Proof equivalence, we need to prove both 'sides': being regular implies that there is a regex that yields the same language and that if we have a regex for some language , it implies that is regular.
Since this statement is an
Start by the left equivalence (): is a regular language There is a regular expression that corresponds to .
We can use the definition of a regular language:
So, we just need to construct a DFA that recognises .
We use a proof by induction:
Base Case(s):
We can construct trivial automata that recognise :
Equivalent DFA
%%🖋 Edit in Excalidraw, and the dark exported image%%
%%🖋 Edit in Excalidraw, and the dark exported image%%
%%🖋 Edit in Excalidraw, and the dark exported image%%Inductive Step: We can use the existing proofs that union, concatenation and Kleene star are closed with respect to regular languages. Just use the same DFA used in the proofs of these theorems.
#todo finish both sides of implicaiton
Examples
1: Deriving the equivalent language from a regular expression
Find the associated language of the regular expression given below:
Solution
When ‘parsing’ a regular expression, it helps to consider the innermost regex first.
- gives the language
- gives the language
- Taking the Kleene star of that all binary strings of even length.