How we did backups at ISPsystem. Part one

The story of how ISPsystem developed a backup solution, told by development manager Alexander Bryukhanov.


All users are divided into three groups:
those who do not make backups,
those who already make them,
and those who verify the backups they have made.


Some readers will simply find this story amusing; others will recognize themselves in it. This is a story about how things were at ISPsystem 15 years ago, how and why they changed, and where we ended up. In the first part, I'll tell how we started developing a backup solution for virtual servers.

So, it's the early 2000s. OpenVZ and KVM do not exist yet. FreeBSD Jail comes out, and on its basis we are the first to build a solution for providing virtual server services.

If you have data, you also have a problem: how do you avoid losing it?

At first we got by with plain archiving of the virtual server's files, since UnionFS made that possible.

There was one tricky detail, though: when you delete a file that came from the template, a so-called WHITEOUT entry is created, which tar does not see. So when restoring from such a backup, deleted files (if nothing else had been created in their place) rose from the ashes of the template.

The service, as they say, took off, and we moved to incremental archives. On FreeBSD, tar could produce them out of the box.
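The scheme of a weekly full archive plus daily incrementals can be sketched roughly like this. This is a hypothetical illustration using GNU tar's `--listed-incremental` snapshot mechanism; the FreeBSD tar of that era had its own comparable incremental mode, and all paths here are made up:

```shell
#!/bin/sh
set -e
SRC=$(mktemp -d)   # stand-in for a virtual server's filesystem
DST=$(mktemp -d)   # stand-in for the backup directory
echo 'one' > "$SRC/a.txt"

# Full backup: -g creates a snapshot file recording what was archived.
tar -czf "$DST/full.tgz" -g "$DST/state.snar" -C "$SRC" .

# A day later a new file appears; only changes go into the incremental archive.
echo 'two' > "$SRC/b.txt"
tar -czf "$DST/incr.tgz" -g "$DST/state.snar" -C "$SRC" .

tar -tzf "$DST/incr.tgz"   # the new b.txt is here, the unchanged a.txt is not
```

The catch the article runs into later is already visible here: to restore, you need the full archive plus every incremental after it, and a full pass still has to read everything.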

The service kept growing. Our clients' servers were counted not in units but in racks (for the guys who started out with 56K internet and 20-square-meter rooms, that was very good). And over time, problems began to appear.

Problem One: CPU


Somewhere around this point we began looking at ready-made solutions. Apart from bacula, a very young product at the time, I found nothing suitable. We tried deploying it in one of the data centers, but it did not meet our expectations: it turned out to be quite difficult to configure, getting files out of it was not as convenient as out of an ordinary .tgz archive, and the performance was not impressive.

Lowering the priority of the backup process didn't lead to anything good either: the backups either failed to finish within a day or stopped being created altogether.

The solution lay on the surface: do the backup on a separate machine! Fortunately, a quick-and-dirty shell script was enough for that. We did it, and instead of an ordinary file server we got a full-fledged backup server. The CPU problem was solved! But then another one appeared.
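The idea fits in one pipeline: the node streams an uncompressed tar, and the CPU-hungry gzip runs on the backup server. The hostname and paths below are hypothetical, and this is only a sketch of the shape such a script could take, not the actual one:

```shell
#!/bin/sh
set -e
# On the node, the offload would look something like this (hypothetical host/paths):
#
#   tar -cf - -C /jails/vs101 . | ssh backup1 'gzip > /backup/vs101/daily.tgz'
#
# The same pipeline shape, runnable locally for illustration:
SRC=$(mktemp -d)
DST=$(mktemp -d)
echo 'hello' > "$SRC/site.conf"

# tar only reads and serializes; gzip (the expensive part) runs in a separate process
tar -cf - -C "$SRC" . | gzip > "$DST/daily.tgz"
tar -tzf "$DST/daily.tgz"   # list the archived entries to confirm it worked
```

With ssh in the middle, the gzip process lands on the backup server's CPU, which is exactly the point.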

Problem Two: Disk


Now the disk became the bottleneck, especially with weekly full backups. The answer was found quickly: we already had the previous copy, and it contained most of the files! So the next step was to take the contents of files for a backup not from the server, but from the previous copy. That is how the first implementation of ispbackup appeared. And it sped things up several times over!

Along the way, this let us solve the WHITEOUT problem: readdir() "does not see" the deleted files, but fts_read() does!

There was a catch, though: the gzip stream format simply is not designed for reading from the middle, and repacking the data is a rather resource-intensive task.

So we cut the backups into separate parts: each part contained a certain set of files in their entirety and had a limit on how far a file's start could be offset from the beginning of the archive. To avoid repacking files that were reused, whole parts of the previous backup could be carried into the new archive unchanged. Since reused parts could accumulate stale data, we implemented a backup "compaction" function to get rid of it.
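A toy illustration of the "parts" idea (all file names are made up): because each part is its own independent gzip stream, a part whose files did not change can be carried into the new backup byte-for-byte, with no repacking and no read from the client server at all.

```shell
#!/bin/sh
set -e
SRC=$(mktemp -d); DAY1=$(mktemp -d); DAY2=$(mktemp -d)
echo 'rarely changes' > "$SRC/cold.dat"
echo 'changes daily'  > "$SRC/hot.dat"

# Day 1: pack the file sets into separate, independently compressed parts.
tar -czf "$DAY1/part0.tgz" -C "$SRC" cold.dat
tar -czf "$DAY1/part1.tgz" -C "$SRC" hot.dat

# Day 2: only hot.dat changed. Reuse part0 as-is, repack only part1.
echo 'changed again' > "$SRC/hot.dat"
cp "$DAY1/part0.tgz" "$DAY2/part0.tgz"          # byte-for-byte reuse, no CPU spent
tar -czf "$DAY2/part1.tgz" -C "$SRC" hot.dat    # repack only what changed
```

The real ispbackup format additionally capped how far into a part a file could start, so individual files stayed cheap to extract; that detail is omitted here.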

We also got an amusing, unexpected bonus. "Hot" files gradually gathered in some parts and "cold" files in others, which optimized the process somewhat. It's nice when something good happens on its own :)

Problem Three: What if something went wrong?


If at some point something went wrong, a broken archive could be created and go unnoticed for months. In fact, right up until the moment you needed it ... Advice from our own bitter experience: if your data matters to you, check your backups.
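A minimal check in that spirit: at least make sure every archive can be read end-to-end. This sketch simulates a backup that was broken mid-write by truncating it (paths are hypothetical); a corrupted gzip stream makes tar exit with an error, which is enough to raise an alarm months before you actually need the archive.

```shell
#!/bin/sh
DST=$(mktemp -d)
echo 'payload' > "$DST/f"
tar -czf "$DST/good.tgz" -C "$DST" f
head -c 20 "$DST/good.tgz" > "$DST/bad.tgz"   # simulate a truncated archive

# Walk all archives; anything unreadable end-to-end gets reported.
for a in "$DST"/*.tgz; do
    if ! tar -tzf "$a" > /dev/null 2>&1; then
        echo "BROKEN: $a"
    fi
done
```

A real check should go further (trial restores, checksums against the source), but even this catches the silent mid-write failures described above.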

Epilogue


In general, the tool was propped up by crutches. But it worked!!! We lived happily for several years, and then hard drives fell sharply in price, and the size of virtual machines grew significantly (if not by orders of magnitude) in a short time.

For a while we resisted. We introduced a per-server setting (back it up daily, weekly, or not at all), but by then it was a losing battle. Backups gave way to reliable RAID or network storage, backed by close monitoring. KVM and OpenVZ appeared.

Instead of backing up all the files, we began writing a backup of user data for ISPmanager, but that is a completely different story.

For those interested, the ispbackup source code is available on GitHub.