CFGs are used in many markdown file formats, such as JSON and XML. In fact, the definition of regular expression is actually a CFG. The ability to recursively define a set of rules gives CFGs much more power than regular expressions.

Definition

Context-free Grammar (CFG) ^definition

A context-free grammar is defined as a 4-tuple:

  • = An alphabet. In the context of CFGs, this is also referred to a set of terminals (since they do not recursively depend on anything, think of base cases in recursion
  • = A finite set of variables (also called non-terminals, for obvious reasons). These act like recursive expressions.
  • = A finite set of production rules, written in the form . takes in a variable and returns a string containing terminals (belonging to ) and/or more variables (belonging to ).
    • can (and usually is) self-referential. A production rule is valid.
    • Another common notation used is that when we use a production rule, on a string to get , we can write it as :
  • = The start variable

Note the similarities between production rules and logical substitution, with the only difference being that a logical substitution applies to the entire formula, replacing every instance, while a production rule replaces once instance at a time.

Derivation ^definition-derivation

Let and be two strings containing variables and/or terminals (). Also let be similar strings containing variables and/or terminals. Then derives iff:

z

x \Rightarrow^* z

* We say that $c$ is *derived from* $p$ * The derivation must be *finite* (as in $y_{k}$ is defined for some natural number) * Note the similarities to [resolution proofs](Resolution%20Proof.md)
  • Strings in are known to be in sentential form (because they still need to recursed further, so not quite a sentence)
  • A string in is a sentence (since it contains all terminals).

Language of a CFG ^definition-language

The language, generated by a CFG, (with start variable ) is given by:

Ambiguity

Ambiguity

A string is ambiguous with respect to a grammar, , iff has more than one parse tree for . That is, the parse tree for is not unique.

A grammar, is ambiguous if it contains at least one ambiguous string.

Backus–Naur Form (Shorthand Notation)

A way to simplify writing the production rules, , is by using the Backus-Naur Form, where:

Becomes:

A further simplification is to only define the production rules for a CFG, because anything on the left of the arrow must be a variable, and anything on the right (that isn’t already a variable), must belong to the alphabet.

For example:

results in:

Generating A Language

Generating a language from a CFG is very straightforward. Let be a (initially empty) string and

  1. , set the string to be the start variable of
  2. While contains at least one variable:
    1. Pick any one production rule from and apply it to one instance/occurrence of the variable. If contains multiple variable, pick one. For example, if a variable is encountered and contains , replace one occurrence of in with .
  3. When has no more variables (contains only terminals), stop.

Applying every permutation production rules across every variable occurrence will generate a langage from .

Parse Tree

Context-free grammars can easily be depicted by an expression tree, due to its recursive nature.

The expression tree is a rooted tree where

  • An expression tree for a CFG must have the start variable, as the root.
  • Each node is either a terminal or variable.
  • Each non-leaf node and its children is associated with a production rule. In particular, if the production rule is A→wA→w then AA is the label of the non-leaf node and the symbols in ww are the labels of the children, in left to right order.
  • Each leaf is a terminal. The concatenation of the leaf labels, in left to right order, gives the string

For example, take the CFG defining arithmetic expressions. The expression tree (for a given expression) would be of the form: %%🖋 Edit in Excalidraw, and the dark exported image%%

Examples

E1: Arithmetic BODMAS as a CFG ^e1

Any arithmetic expression can easily be expressed as a CFG:

    • needs to also include our operators and brackets, since being a language, we have no notion of ‘evaluating’ an expression (no semantics), but we have a very strict syntax.
  • : For defined arthmatic operators versus expressions

R:\quad\text{op} \to + \ \vert \ - \ \vert \ \times \ \vert \ \div \ \vert \ \quad\quad \text{expr} \to 0 \ \vert \ 1 \ \vert \ \dots \ \vert \ 9 \ \vert \ \text{(expr)}\ \vert \ \text{expr op expr}

* $S = \text{expr}$

E2: CFG for equal number of 0s and 1s in order

The context-free language given below can be expressed as a CFG:

We can express this using only the production rules: