Regular Expression

Regular expressions (commonly called regexes) are a type of grammar that generates strings (here in the formal sense). In a sense, a regular expression is a mathematical formula that describes a set of strings. They are especially useful for string pattern matching. More formally, a regex defines a regular language.

Definition

Regular Expression ^definition

A regular expression, $R$, over an alphabet $Σ = \{a_{1}, a_{2}, \dots, a_{n}\}$ is defined as:
$R := a_{1} ∣ a_{2} ∣ \dots ∣ a_{n} ∣ ϵ ∣ \emptyset ∣ (R \cup R) ∣ (R \circ R) ∣ (R^{*})$

$\cup$ (sometimes replaced with $∣$) means OR: $(1 \cup 0)$ defines the language $L = \{0, 1\}$

$\circ$ means concatenation: $(0 \circ 1) = (01)$ (the circle is commonly omitted) defines the language $L = \{01\}$

$^{*}$ means the Kleene star: $0^{*}$ defines the language $L = \{ϵ, 0, 00, 000, \dots\}$

$L (R)$ denotes the (regular) language generated by the regex $R$

Some examples of regexes:

$ca \circ (t \cup b \cup p)$ defines the language $L = \{cat, cab, cap\}$
$0 \circ 1^{*}$ defines the language $L = \{0, 01, 011, 0111, 01111, \dots\}$
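
The two examples above can be checked with Python's `re` module, as a rough sketch. It assumes the usual translation of the textbook operators into `re` syntax: union becomes `|`, concatenation becomes juxtaposition, and the Kleene star stays `*`. The helper name `in_language` is made up for this illustration.

```python
import re

# Textbook operators mapped onto Python's `re` syntax (an assumption of this sketch):
#   union -> |, concatenation -> juxtaposition, Kleene star -> *
cat_like = re.compile(r"ca(t|b|p)")
zero_then_ones = re.compile(r"01*")

def in_language(pattern, word):
    """True when the whole word (not just a substring) matches the pattern."""
    return pattern.fullmatch(word) is not None

print([w for w in ["cat", "cab", "cap", "car", "ca"] if in_language(cat_like, w)])
print([w for w in ["0", "01", "0111", "1", "010"] if in_language(zero_then_ones, w)])
```

`fullmatch` is used instead of `search` because membership in a language is about the whole word, not about some substring of it.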

Since regexes borrow the same operations that are defined on languages, we can ‘pull’ the $L$ inside each operation:

$L (a) = \{a\}$
$L (ϵ) = \{ϵ\}$
$L (\emptyset) = \emptyset$ (Pay attention! The empty word is not the empty set!)
$L (R_{1} \cup R_{2}) = L (R_{1}) \cup L (R_{2})$
$L (R_{1} \circ R_{2}) = L (R_{1} R_{2}) = L (R_{1}) \circ L (R_{2})$
$L (R^{*}) = (L (R))^{*}$
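
These equations translate almost line for line into a recursive function. The sketch below is one possible encoding, not part of the original notes: the class names (`Sym`, `Eps`, `Empty`, `Union`, `Concat`, `Star`) and the `max_len` cut-off are assumptions made for illustration. The cut-off is needed because $L(R^{*})$ is usually infinite, so only words up to a given length are listed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:      # a single symbol a in the alphabet
    char: str

@dataclass(frozen=True)
class Eps:      # the empty word
    pass

@dataclass(frozen=True)
class Empty:    # the empty language
    pass

@dataclass(frozen=True)
class Union:
    left: object
    right: object

@dataclass(frozen=True)
class Concat:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    inner: object

def language(r, max_len):
    """Return every word of L(r) whose length is at most max_len."""
    if isinstance(r, Sym):
        return {r.char} if max_len >= 1 else set()
    if isinstance(r, Eps):
        return {""}                      # L(eps) contains one word: the empty one
    if isinstance(r, Empty):
        return set()                     # L(empty set) contains no words at all
    if isinstance(r, Union):
        return language(r.left, max_len) | language(r.right, max_len)
    if isinstance(r, Concat):
        return {u + v
                for u in language(r.left, max_len)
                for v in language(r.right, max_len)
                if len(u) + len(v) <= max_len}
    if isinstance(r, Star):
        base = language(r.inner, max_len)
        words, frontier = {""}, {""}
        while frontier:                  # keep appending until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u) + len(v) <= max_len} - words
            words |= frontier
        return words
    raise TypeError(f"not a regex node: {r!r}")

print(sorted(language(Concat(Sym("0"), Star(Sym("1"))), 3)))
```

Note how `Eps` and `Empty` return different sets, which mirrors the warning above that the empty word is not the empty set.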

Binding Power

Just like in formula parsing, regexes have a strict, well-formed syntax that is unambiguous, as well as a slightly looser syntax. For example:

$R_{1} = ((a \circ b) \cup c)$ and $R_{2} = ab \cup c$

In our looser syntax, we ignore unnecessary parentheses and write $a \circ b$ as $ab$.

$R_{2}$ defines exactly the same language as $R_{1}$, but is much simpler. However, this is only possible when we understand the binding powers of the operations on languages:

Binding Powers
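
Assuming the conventional ordering (the Kleene star binds tightest, then concatenation, then union), the effect of binding power can be observed with Python's `re` module, whose `*`, juxtaposition and `|` follow that same ordering. The helpers `words` and `same_language` are hypothetical names for this sketch, and the comparison only covers short words, so it is a sanity check rather than a proof.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Yield every word over `alphabet` with length <= max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def same_language(p, q, alphabet="abc", max_len=4):
    """Compare two patterns on all short words (a bounded check, not a proof)."""
    return all(
        (re.fullmatch(p, w) is None) == (re.fullmatch(q, w) is None)
        for w in words(alphabet, max_len)
    )

# Concatenation binds tighter than union: ab|c reads as (ab)|c, not a(b|c).
print(same_language(r"ab|c", r"(ab)|c"))
print(same_language(r"ab|c", r"a(b|c)"))

# Star binds tighter than concatenation: ab* reads as a(b*), not (ab)*.
print(same_language(r"ab*", r"a(b*)"))
print(same_language(r"ab*", r"(ab)*"))
```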

Expression Tree

As in formula parsing, we can optionally construct an expression tree to visualise the hierarchy in a more complex regex:

Consider the regex below:

[Figure: a regex and its expression tree]
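
As a rough illustration of what such a tree looks like, the sketch below prints one for $R_{1} = ((a \circ b) \cup c)$ from the Binding Power section (not the regex from the figure). The nested-tuple encoding and the `render` function are assumptions made for this example: an inner node is a tuple `(operator, child, ...)` and a leaf is a plain string.

```python
def render(node, depth=0):
    """Return the lines of an indented expression tree, root first."""
    indent = "  " * depth
    if isinstance(node, str):            # leaf: a symbol of the alphabet
        return [indent + node]
    op, *children = node                 # inner node: operator plus operands
    lines = [indent + op]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

# ((a concat b) union c): the operator that binds loosest ends up at the root.
tree = ("union", ("concat", "a", "b"), "c")
print("\n".join(render(tree)))
```

The operator with the weakest binding power sits at the root, and the tightest-binding operators sit closest to the leaves.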

Theorems

T1: A language is regular iff it can be described by a regular expression. ^t1

Let $L$ be a language. Then:
$L \text{ is a regular language} ⟺ \exists \text{ a regular expression } R \text{ such that } L (R) = L$

Proof

Since this statement is an equivalence, we need to prove both directions: if $L$ is regular, then there is a regex $R$ that yields the same language; and if a regex $R$ describes a language $L$, then $L$ is regular.

Start with the right-to-left direction ($⟸$): $L$ is a regular language $⟸$ there is a regular expression $R$ that corresponds to $L$.

We can use the definition of a regular language: a language is regular if some DFA recognises it.

So, we just need to construct a DFA that recognises $L$.

We use a proof by induction:

Base Case(s): $R = a \in Σ ∣ ϵ ∣ \emptyset$

We can construct trivial automata that recognise $L (R)$ :

$R$	$L (R)$	Equivalent DFA
$a \in Σ$	$\{a\}$	[Figure: DFA accepting only the word $a$]
$ϵ$	$\{ϵ\}$	[Figure: DFA accepting only the empty word]
$\emptyset$	$\emptyset$	[Figure: DFA accepting no words]
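
The three base-case automata can be sketched in a few lines. This is one possible encoding, not the automata drawn in the notes: `make_dfa` is a hypothetical helper, the alphabet is assumed to be $\{a, b\}$, and any transition that is not listed is assumed to lead to a non-accepting trap state called `"dead"`.

```python
def make_dfa(transitions, start, accepting):
    """Build an acceptor from a partial transition table.

    Missing transitions fall into an implicit, non-accepting "dead" state,
    which keeps the tables short (an assumption of this sketch).
    """
    def accepts(word):
        state = start
        for ch in word:
            state = transitions.get((state, ch), "dead")
        return state in accepting
    return accepts

# R = a: one step on 'a' reaches the only accepting state.
dfa_symbol = make_dfa({("q0", "a"): "q1"}, "q0", {"q1"})

# R = epsilon: the start state accepts, and reading anything leaves it for good.
dfa_epsilon = make_dfa({}, "q0", {"q0"})

# R = empty set: there is no accepting state, so nothing is accepted.
dfa_empty = make_dfa({}, "q0", set())

for name, dfa in [("a", dfa_symbol), ("epsilon", dfa_epsilon), ("empty", dfa_empty)]:
    print(name, [w for w in ["", "a", "b", "aa"] if dfa(w)])
```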
Inductive Step: $R = R \cup R ∣ R \circ R ∣ R^{*}$

We can use the existing proofs that regular languages are closed under union, concatenation and Kleene star. Just reuse the automata constructed in the proofs of those theorems.
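
For the union case, one standard way to obtain the automaton is the product construction, sketched below. Whether the closure proofs referred to here use exactly this construction is an assumption; the dictionary layout and the names `single_symbol_dfa`, `union_dfa` and `accepts` are invented for the example. Concatenation and the Kleene star are usually handled via NFAs and are not shown.

```python
from itertools import product

def single_symbol_dfa(symbol, alphabet):
    """Total DFA over `alphabet` that accepts exactly the one-letter word `symbol`."""
    delta = {}
    for ch in alphabet:
        delta[("start", ch)] = "seen" if ch == symbol else "dead"
        delta[("seen", ch)] = "dead"
        delta[("dead", ch)] = "dead"
    return {"states": {"start", "seen", "dead"}, "delta": delta,
            "start": "start", "accepting": {"seen"}}

def union_dfa(d1, d2, alphabet):
    """Product construction: run both DFAs in lockstep, accept if either accepts."""
    states = set(product(d1["states"], d2["states"]))
    delta = {((p, q), ch): (d1["delta"][(p, ch)], d2["delta"][(q, ch)])
             for (p, q) in states for ch in alphabet}
    accepting = {(p, q) for (p, q) in states
                 if p in d1["accepting"] or q in d2["accepting"]}
    return {"states": states, "delta": delta,
            "start": (d1["start"], d2["start"]), "accepting": accepting}

def accepts(dfa, word):
    state = dfa["start"]
    for ch in word:
        state = dfa["delta"][(state, ch)]
    return state in dfa["accepting"]

# DFA for (0 union 1), built from the DFAs for 0 and for 1.
zero_or_one = union_dfa(single_symbol_dfa("0", "01"), single_symbol_dfa("1", "01"), "01")
print([w for w in ["", "0", "1", "00", "01"] if accepts(zero_or_one, w)])
```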


#todo finish both sides of implication

Examples

1: Deriving the equivalent language from a regular expression

Find the language associated with the regular expression given below:
$((0 \cup 1) \circ (0 \cup 1))^{*}$

Solution

When ‘parsing’ a regular expression, it helps to consider the innermost regex first.

$(0 \cup 1)$ gives the language $\{0, 1\}$

$(0 \cup 1) \circ (0 \cup 1)$ gives the language $\{0, 1\} \circ \{0, 1\} = \{00, 01, 10, 11\}$

Taking the Kleene star of that gives all binary strings of even length (including the empty word $ϵ$).
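
The result can be sanity-checked by brute force. The sketch below writes the regex built in the steps above, $((0 \cup 1) \circ (0 \cup 1))^{*}$, in Python `re` syntax and compares it against the claim "even length" for every binary word up to length 8. The bound of 8 and the helper name `binary_words` are arbitrary choices for this illustration, so this is a check on short words rather than a proof.

```python
import re
from itertools import product

pattern = re.compile(r"((0|1)(0|1))*")

def binary_words(max_len):
    """Yield every binary word with length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)

# Words on which "matches the regex" and "has even length" disagree.
mismatches = [w for w in binary_words(8)
              if (pattern.fullmatch(w) is not None) != (len(w) % 2 == 0)]
print(mismatches)
```

An empty list means the regex and the description "binary strings of even length" agree on every word that was tested.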

Questionably Accurate Notes
