%%🖋 Edit in Excalidraw, and the dark exported image%%

They are evaluated i n a ‘bottom-up’ approach, with the subtrees of a node being evaluated before the node itself. This is similar to post-order traversal, except nodes are evaluated instead of just being visited.

Generation

Query Blocks

The query optimiser simplifies a complex query into it’s basic query blocks:

  • A query block is any statement that starts with a SELECT operation.
  • The optimiser attempts to optimise innermost blocks first, that is subqueries first.
---Outer block
SELECT name FROM A
	WHERE age IN
	--Inner block
	SELECT MAX(age) FROM B
Conversion to Relational Algebra expressions

Each query block is then converted to it’s corresponding relational algebra expression:

  • SELECT attribute FROM S is converted to
  • WHERE conditiion is converted to
    • Any where condition which compares primary keys gets converted to a natural join

Relational algebra equivalences are used to control the order of operations:

Equivalences

Automatic Conversations
  • Cross Product + Selection over equal ids = Natural Join:
  • Selection is distributed over joins (to reduce join cost) when possible:
  • Projection is distributed over joins when possible. Ensure that the subset of attributes to be projected is in the ‘outer’ loop:

Cost Estimation

For every generated query plan, the cost estimator must estimate:

  • The size of the result for each operation in the Query Plan Tree (in pages)
  • The cost of each operation in the tree. See query optimisation for various methods

It uses catalogues to do this:

For each block, the maximum size of the result is given by the product of the number of tuples in the FROM clause. The maximum is when a cartesian product (cross product) is performed.

Like in query optimisation, we can use the Reducing Factor: Reducing Factor

When selecting from a single table , with predicates, we have the estimated size:

When combining it with joins, of tables, we have:

Note that there are no reduction factors if there is no selections being performed.

Reducing Factor
PredicateWhy?
is the number of distinct values
If we match against a certain key, then only elements which are equal to that key are returned.
The domain of values is . Since we only match the
fraction of the entire domain from to , we do the final fraction
Opposite of the above
(Joins)
ElseA magic number, Chosen only when no other option is available, is not very accurate
Single-Relation Queries Cost (No Joins)

When querying over a single relation/table, the query optimiser tries multiple access paths to obtain the final result:

  • Linearly scan through the unsorted data, checking all predicates at each instance.
  • Indexed searching over a primary key
  • Clustered Index matching one or more predicates.
  • Unclustered index matching one or more predicates.
Multi-relation Queries Cost (Joins)

For join operations:

  1. Select the order of joining (since join is commutative). There are orderings (where is the number of relations)
  2. For each join, select a joining algorithm. Commonly used are Hash Sort, Sort-Merge Join, and Block-Based Nested Loop Join
  3. For each relation, treat as a single-relation query and use access methods as above

In total, the number of possible query plans are:

Since there are possible join orderings, the possible configurations for the query plan tree increase very quickly. As such, a property is enforced on the tree, namely that it becomes a left-deep join tree. This also removes the need for any auxiliary arrays %%🖋 Edit in Excalidraw, and the dark exported image%%

#todo Examples: