Databases need to be regularly backed up to prevent loss of data. Backup strategies are methods of backing up that trade storage space for reliability.
Why Backup?
To protect data from:
- Human Error: Accidentally applying
DROP
orDELETE
on a table - Hardware or Software malfunction
- Application bugs
- Crashes
- Component failure
- Malicious attacks
- SQL Injections
- Natural disasters
- Government/Legal changes
Failure
Failures can be divided into the following categories:
- Statement failure: Syntactically incorrect, e.g. mistyped SQL statement
- User Process failure: The process doing the work fails (errors, dies)
- Network failure: Network failure between the user and the database
- User error: User accidentally drops the rows, table, database
- Memory failure: Memory fails, becomes corrupt
- Media Failure: Disk failure, corruption, deletion
Backup
Simply put, a backup is simply a copy of data that can be restored. In most cases, this can be done by simply copying (and overwriting) a backup over the existing data.
Recovery is the process of applying a backup or using log files to try to restore lost or old data.
There are many classifications of backups:
- Physical vs Logical
- Online vs Offline
- Full vs Incremental
- Onsite vs Offsite
Backup Types
Physical
- Raw copy of files and pages in database
- Suitable for large databases, when fast recovery is needed
- Ideally, should also be an offline (cold) backup.
- Includes transaction log
- Only portable to machines of the same configuration (OS, application, etc.)
- Restore process:
- Shut down DBMS
- Copy (& overwrite backup)
- Restart DBMS
Logical
- Done via SQL queries
- Slower than a physical backup
- Hardware independent, since using SQL not OS copy
- Output takes up more storage than physical
- Does not include config or logs
- Server is available during backup i.e. can be an online backup
- In MySQL, done via either:
- MySQLDump
SELECT [query] INTO OUTFILE
- Restore process:
- mysqlimport
- or
LOAD DATA INFILE
Online
- Also called hot backups
- Occurs when database is live (can still be used)
- Need to ensure proper locking is in place to ensure integrity of data
Offline
- Also called cold backups
- Results in downtimes (when no users can use data)
- Easier to perform, but should not be done in applications that should have very low downtimes (hospitals, for example)
- Sometimes, backups are taken from a replication server, which is a copy of the original database (and synced)
Full vs Incremental
- A full backup contains all the data in the database
- Recovery is simple, just copy the backup
- Incremental backups only store any changes made since last backup
- In most cases, these are just the transaction logs
- Recovery requires applying all incremental backups from failure.
- Recovery process:
- Freeze database
- Copy backup log files
- Ask database to re-apply log files
A backup policy is the schedule of full and incremental backups to be applied. A good backup policy conducts backups when the database load is low (not many users using it).
An example of a backup policy is doing full backups on the weekends, and incremental backups on the weekdays.
Onsite vs Offsite
I