I Don’t Need Backup
The heading above is the voice of a novice who trusts that the many moving parts of their server infrastructure will never fail. That is idealistic: systems fail for all sorts of reasons, such as a power surge or an errant piece of code that wipes out your data.
We discuss various backup options and the fundamentals of a good backup policy.
3-2-1 Rule of Backups
The photographer Peter Krogh famously said, “There are two kinds of people in the world – those who have had a hard drive failure, and those who will.”
He goes on to espouse the 3-2-1 backup rule, which we paraphrase here:
3 – Keep three copies of your data (including the source)
2 – Store them on two different types of media
1 – Keep one copy offsite
In practice, it is sometimes hard to keep data on different media. The intent behind the rule is to reduce the probability that every copy fails at once. Two disk drives would probably have the same probability of failure, whereas a disk drive and an optical disc would fail at different rates. If a second medium is not possible, go ahead with a single one. At a minimum, your offsite backup should be in a different data center.
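The rule can be expressed as a quick check over an inventory of copies. This is only a sketch: the Copy record, the medium labels and the site names (dc-east, dc-west) are hypothetical and not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str    # hypothetical label for the copy
    medium: str  # e.g. "disk", "optical", "tape"
    site: str    # where the copy physically lives


def satisfies_3_2_1(copies, primary_site):
    """Return True if the copies (source included) meet the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.site != primary_site for c in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    Copy("source", "disk", "dc-east"),
    Copy("local-backup", "optical", "dc-east"),
    Copy("remote-backup", "disk", "dc-west"),
]
print(satisfies_3_2_1(copies, primary_site="dc-east"))
```

If you have to settle for a single medium, as discussed above, the media check would fail while the offsite check still tells you whether the minimum is met.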
What Should You Back Up
This seems like a no-brainer, but depending on size and time constraints, it is important to strike the right balance between backing up everything and backing up just the basics.
Consider an ecommerce store where orders are processed 24×7. If you back up the entire database every 10 minutes, you will probably end up with recurring periods of downtime and a disk that fills up rapidly. On the other hand, if you back up just once a day, a system crash can cost you up to 24 hours of orders and leave hundreds of customers unhappy.
When determining what to back up, take into account how the data is used and how long it would take to back up and restore. For example, you may choose not to back up the application code on the server because it is always deployed from a Git server and you could redeploy it at the touch of a button. This does not mean that you shouldn't back up your Git repositories, which is a chapter in itself.
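To put rough numbers on that trade-off, the arithmetic can be sketched as below. The figure of 30 orders per hour is purely an assumption for illustration; substitute your own volume.

```python
def worst_case_orders_lost(orders_per_hour, backup_interval_minutes):
    """Orders placed since the last backup if a crash hits just before the next one."""
    return orders_per_hour * backup_interval_minutes / 60


# Hypothetical store taking 30 orders per hour, at three backup intervals.
for minutes in (10, 60, 24 * 60):
    print(minutes, "min interval:", worst_case_orders_lost(30, minutes), "orders at risk")
```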
Backup Types
Backups come in different shapes and forms. We will cover three key types:
Full Backup: This is a complete backup of everything: your code, databases, assets and even some logs if you choose. It takes the most space, but restoring from a full backup is the easiest. On a VPS, a full backup can be as simple as a shell script that performs a series of cp commands and is scheduled through cron. You could also use a tool such as rsync.
Incremental Backup: In this form of backup, you start with a full backup, and each subsequent backup contains only the changes (additions, modifications and, where applicable, deletions) since the last backup. Utilities such as Bacula, Back In Time and DAR can be used here. Restoring from an incremental backup requires you to restore the full backup and then every incremental backup taken since, in order.
Differential Backup: This is similar to an incremental backup; the key difference is that the delta being backed up is computed against the last full backup and not against the most recent backup. Most utilities that support incremental backups also support differential backups. To restore, start from the full backup and apply the latest differential backup to it.
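As an alternative to a cp-based shell script, a full backup can be sketched with Python's standard tarfile module. The function name and the paths in the usage comment are hypothetical; the backup directory is assumed to live outside the source directory so the archive does not include itself.

```python
import tarfile
import time
from pathlib import Path


def full_backup(source_dir, backup_dir):
    """Archive everything under source_dir into a timestamped .tar.gz file."""
    source = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"full-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the source folder
        tar.add(source, arcname=source.name)
    return archive


# Hypothetical usage, e.g. from a script scheduled through cron:
# full_backup("/var/www/site", "/mnt/backups")
```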
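The idea behind an incremental run can be sketched as copying only the files modified since the previous backup. This is a simplification and not how Bacula, Back In Time or DAR work internally: it relies on file modification times, does not record deletions, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def incremental_backup(source_dir, dest_dir, last_backup_time):
    """Copy files changed after last_backup_time (seconds since the epoch)."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    copied = []
    for path in source.rglob("*"):
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            target = dest / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Passing the time of the last full backup instead of the time of the most recent backup turns the same sketch into a differential run.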
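The difference in restore effort between the two schemes can be sketched as follows. The history format (a list of label and kind pairs, oldest first) and the day labels are assumptions made for this example.

```python
def restore_chain(backups, scheme):
    """Return the backup labels to apply, in order, to reach the latest state.

    backups: list of (label, kind) tuples, oldest first, kind is "full" or "delta".
    scheme:  "incremental" or "differential", describing how the deltas were taken.
    """
    kinds = [kind for _, kind in backups]
    if "full" not in kinds:
        raise ValueError("a restore needs at least one full backup")
    last_full = len(kinds) - 1 - kinds[::-1].index("full")
    full_label = backups[last_full][0]
    deltas = [label for label, _ in backups[last_full + 1:]]
    if scheme == "incremental":
        # every delta since the full backup is needed, in order
        return [full_label] + deltas
    if scheme == "differential":
        # the latest delta already contains all changes since the full backup
        return [full_label] + deltas[-1:]
    raise ValueError(f"unknown scheme: {scheme}")


history = [("sun", "full"), ("mon", "delta"), ("tue", "delta"), ("wed", "delta")]
print(restore_chain(history, "incremental"))   # ['sun', 'mon', 'tue', 'wed']
print(restore_chain(history, "differential"))  # ['sun', 'wed']
```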
Sizing your Backups
When it comes to backups, make sure that your zeal for backing up doesn’t lead to disk space shortages or clog your network link.
If you are backing up a 50 GB volume of application, content and data, you will need another 50 GB for the full backup. Assuming a 5% daily increment in data (2.5 GB per day), you will need the following space by the end of the week:
Item | Size |
Original Data | 50 GB |
Full Backup (Day 0) | 50 GB |
Incremental Backup Day 1 | +2.5 GB |
Incremental Backup Day 2 | +2.5 GB |
Incremental Backup Day 3 | +2.5 GB |
Incremental Backup Day 4 | +2.5 GB |
Incremental Backup Day 5 | +2.5 GB |
Incremental Backup Day 6 | +2.5 GB |
Total Data | 115 GB |
With differential backups, you will need even more space, because each differential contains all the changes since the full backup:
Item | Size |
Original Data | 50 GB |
Full Backup (Day 0) | 50 GB |
Differential Backup Day 1 | +2.5 GB |
Differential Backup Day 2 | +5.0 GB |
Differential Backup Day 3 | +7.5 GB |
Differential Backup Day 4 | +10.0 GB |
Differential Backup Day 5 | +12.5 GB |
Differential Backup Day 6 | +15.0 GB |
Total Data | 152.5 GB |
If you retain backups for longer, the space requirements keep increasing. Determine your backup frequency based on the number of copies you will retain, the type of backup and the available space.
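The totals in the two tables can be reproduced with a few lines of arithmetic. The sketch below follows the same simplifying assumptions as the tables: each day changes a fixed percentage of the original volume, every backup is retained, and no compression or deduplication is applied. The function name is hypothetical.

```python
def backup_space_gb(original_gb, daily_change_percent, days, scheme):
    """Total space for the original data, one full backup and `days` daily deltas."""
    daily_delta = original_gb * daily_change_percent / 100
    if scheme == "incremental":
        # each backup holds one day of changes
        deltas = [daily_delta for _ in range(days)]
    elif scheme == "differential":
        # each backup holds all changes since the full backup
        deltas = [daily_delta * day for day in range(1, days + 1)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # original data + full backup + every retained delta
    return original_gb + original_gb + sum(deltas)


print(backup_space_gb(50, 5, 6, "incremental"))   # 115.0
print(backup_space_gb(50, 5, 6, "differential"))  # 152.5
```

Changing the days argument shows how quickly differential backups grow when the interval between full backups is stretched.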
For full storage computations, consider using an online backup storage calculator.
The other aspect that goes hand-in-hand with storage is the network connection between the main data and the offsite backup. Even if the offsite server is used only for storage, you need to account for the network bandwidth used to transmit your current data to it. Network connectivity between data centers is typically better than between a data center and your end users, but a remote backup could still easily take up most of your network throughput. Try to schedule remote backups at a time when regular traffic to your main server is not at its peak. If that is not possible, add a throttle rule to your backup process. For example, rsync lets you pass a --bwlimit parameter to throttle transmission speeds.
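A rough way to plan the transfer window is to estimate how long a backup takes at a given throttle and then build the matching rsync command. In this sketch the helper names, the 5120 KiB/s limit and the destination host are hypothetical, and a GB is treated as 1024 × 1024 KiB. The command is only assembled, not executed.

```python
def transfer_hours(size_gb, rate_kib_per_s):
    """Estimated hours to send size_gb at a sustained rate in KiB per second."""
    size_kib = size_gb * 1024 * 1024
    return size_kib / rate_kib_per_s / 3600


def throttled_rsync_command(source, destination, rate_kib_per_s):
    """Build an rsync argument list with a bandwidth cap.

    rsync reads a bare --bwlimit number as units of 1024 bytes per second.
    """
    return ["rsync", "-az", f"--bwlimit={rate_kib_per_s}", source, destination]


# A 2.5 GB daily increment capped at 5120 KiB/s (about 5 MiB/s).
print(round(transfer_hours(2.5, 5120) * 60, 1), "minutes")
print(throttled_rsync_command("/var/backups/", "backup@offsite.example.com:/backups/", 5120))
```

The resulting list can be passed to subprocess.run once the source and destination are replaced with real paths.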
Backup Policy
A good backup policy provides clear guidelines on what needs backing up, when it needs to be backed up, and how to restore when catastrophe strikes. Make sure your backup policy covers the following sections:
The Method: This covers how you will back up (software tools, media types etc.).
Scope: What needs to be backed up? Do you also want to back up logs (often required for PCI compliance)?
Frequency: When do you back up data? Is it hourly, daily or, at the relaxed extreme, weekly?
The Guardians: Who can access your backups? Backups aren’t of much use if they can be easily modified by everyone. Limit access to administrators to avoid accidental erasures.
Data Protection: How is your backup protected? Will you use data encryption? If so, at what level?
Restoration: This section outlines how you can recover. This will be guided by the method you use for backing up (full vs incremental etc.).
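One way to keep a policy honest is to store it as structured data and check that no section has been left blank. The section keys and the draft values below are hypothetical; they simply mirror the list above.

```python
REQUIRED_SECTIONS = (
    "method",
    "scope",
    "frequency",
    "guardians",
    "data_protection",
    "restoration",
)


def missing_sections(policy):
    """Return the required sections that are absent or empty in the policy."""
    return [
        name for name in REQUIRED_SECTIONS
        if not str(policy.get(name, "")).strip()
    ]


draft_policy = {
    "method": "nightly rsync to an offsite server",
    "scope": "database dumps, uploaded assets, logs",
    "frequency": "daily",
    "guardians": "administrators only",
}
print(missing_sections(draft_policy))  # ['data_protection', 'restoration']
```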
Over time, backing up data might become second nature. A backup policy lets you sleep well at night, knowing that if something fails, you can recover without panicking.