A database differs from standard filesystem in it’s implementation. It stores files, which are not the same as standard files found in filesystem, but are collections of pages that store records. Depending on the file structure, a database be optimized for different functions.
File
In a database, a file is collection of pages which contain records. More specifically, a file is a doubly-linked list where a page is a node connected to other pages. Pages can be considered as arrays of records.
%%🖋 Edit in Excalidraw, and the light exported image%%
A DBMS can implement the CRUD (Create Read Update Delete) principle only if it can:
- Read a particular record (using it’s record id)
- Insert, delete & modify a record
- Scan all records. It should also be able to use conditions to scan particular records (such as when constructing SQL queries)
The common file structures are:
Analysing Runtimes
In order to properly analyse what makes a file structure optimized, we first need to identify a basic operation, similar to analysing time complexity of algorithms.
The basic operation for database systems is bringing a page from HDD to RAM (which is require in order to process it). This is also known as the number of page accesses also known as disk I/O operations.
Heap File
- Simplest of the three file structures
- Pages are allocated (and de-allocated) as records grow and shrink. That is, it is a dynamic data structure
- Not sorted
%%🖋 Edit in Excalidraw, and the light exported image%%
Operation | Speed | Why? |
---|---|---|
Insertion | Fast | Because it is basically an unsorted linked list, has very fast insertion and deletion operations as we can simply add new pages to the end of the file |
Deletion | Fast | Same process as insertion |
Querying an attribute / Return pages satisfying some condition | Slow | Need to linear search through all pages. |
Sorted File
- Like heap file, but records (and hence pages) are sorted based on some attribute.
- Very fast for range queries, but hard to maintain, since records (and sometimes pages) need to be reordered every time a new record is inserted
%%🖋 Edit in Excalidraw, and the light exported image%%
Operation | Speed | Why? |
---|---|---|
Insertion | Slow | Need to resort the file every time a new record is inserted |
Deletion | Fast | Deleting a record still keeps the file sorted. |
Querying an attribute / Return pages satisfying some condition | Fast | Can binary search through records (and pages) |
Indexed File
Another file structure uses indexes to allow faster file access.