MySQL file storage and quick distribution

I think many had the need to store files associated with the record in the table. It can be a picture for the news, an avatar, a file uploaded by the user - yes, anything. Usually, in this case, they just do it - the file lays in the file system, and the link to it is in the database record.
But such a classic campaign has many disadvantages:
  • files are not deleted when the corresponding database record is deleted
  • problems while trying to update the file
  • violation of synchronization between the database and file system during transaction rollback
  • when backing up and restoring information in the database, a desynchronization with the file system may occur
  • files are not subject to access restrictions imposed by the database

You can read more about the problems that arise when storing files separately from the database in the SQL Antipatterns presentation , section Phantom Files, page 60. By the way, the author of the presentation offers a solution - store the files directly in the database, in a BLOB field. True, the observation follows that this should be a balanced decision in each case. Indeed, with this method of storing files, the web server must call a script at each request that will extract the file from the database and give it to the user, which will inevitably negatively affect performance.
To find a solution to this problem, a brainstorming session was conducted and several options for solving the problem were invented:
  1. Before deleting an entry, do SELECT with the same condition and get the names of the files that need to be deleted. The problem is that if there are a lot of deleted files, this operation may take some time, and for a good time it is necessary to block the table for reading and writing, and in many cases this is unacceptable.
  2. Before deleting, set the record to be deleted for deleted records, obtain all records with this mark and delete the files associated with these records, and finally delete all records with this mark. The queries working with this table should be modified so that they do not select records with the flag set. Disadvantages - the need to edit many queries, in addition, in our project, records for deletion are selected by rather complicated SELECT, which cannot be converted into one UPDATE.
  3. The first two methods try to solve the problem of "lost" files when deleting records in the database, which occurs with the "classic" method of storing files, but they do not solve the remaining problems of this approach, so we tried to come up with solutions that use positive moments of storing files directly in the database and get rid of the disadvantages inherent in this approach.
  4. Use triggers . Unfortunately, MySQL does not have support for working with files in its language; such commands would have to be implemented independently, tinkering with the MySQL source code. Of the minuses - the files should be stored on the same host as the database, we did not find such ready-made solutions for MySQL, we did not need such ready-made solutions.
  5. Store files in the database, but give them directly to the web server, without PHP. You can implement this by writing a module to a web server (nginx for example) that would allow you to send files directly from MySQL or using the MySQLfs file system driver . This approach solves all the problems listed above, but its drawback is the additional overhead of storing files in MySQL.
  6. Specialized Storage Engine for MySQL, storing records as files.

Let us dwell in more detail on the last point. After all, what constitutes a file system is a specialized database, which, using the “file name” key, allows you to get a record — its contents. That is, you can implement your own data storage engine for MySQL, in which each record will have three fields:
CREATE TABLE `data_storage`.`files` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`path` VARCHAR( 255 ) ,
`data` BLOB
) ENGINE = FILES

Data can be inserted into such a table only in the `data` field, while they are simply saved in a file, a unique name is automatically generated for it (using the` id` field as a prefix) - for example, 764533, and in the `path` field it is automatically substituted the correct path MySQL has put our data in - for example, '/mnt/storage/mysqldata/76/45/33/764533_myfile.jpg'. Thus, data stored in such a table can be accessed as simple files, while MySQL will maintain data integrity. Thus, this way of storing files is devoid of almost all the drawbacks of the classical approach (except for access restrictions, but it can also be done using a simple script and the X-Accel-Redirect header nginx) and at the same time does not decrease performance when uploading files to clients.
The problem is small - we could not find a ready-made implementation of such a data storage engine for MySQL, although the idea is generally simple. Perhaps one of the habro-people will give a link to a ready-made implementation of such a storage engine, because the idea is floating on the surface, and it is certain that someone could already implement it.

This article was written in collaboration with viperet

PS moved to MySQL blog