Scale the database of a real project with SQL Azure Federations. Part 2: Input

  • Tutorial
The last time we looked at the theoretical part of SQL Azure Federations. What to think about and what to consider when migrating to SQL Azure Federations. I note that the essence is not even in the technology itself. If the task is to scale the database, it does not matter using Federations, MySQL Cluster or another way, the first thing you should think about is the database architecture. The database that needs to be scaled in the first place should be architecturally oriented to this.

So, back to our project. The subject area of ​​the database is personal finance accounting. The database diagram is shown in the figure.



As we see the database is quite simple. Each system object is an entity with basic properties (Id, Name, Description). Specific entities are the Account (inherited from it: Bank account, Credit card), the Spending category (inherited from it: Budget, as well as child categories) and Account transactions.

In addition to tables, the database contains some logic for adding entities to the database (framed in the form of stored procedures), as well as a couple of View, to display the results of typical queries to the database.

The source code for the SQL script to create the database can be found here .



It is clear that in a real project the number of artifacts in the database can be an order of magnitude greater, however, the migration of even such a small database can show the main rake that you may encounter when using SQL Azure Federations.

Analysis

Before rushing to migrate a database to use SQL Azure Federations, you need to determine which data in the database is logically independent. What type of data can be distributed across different databases. In fact, you must select a table whose data will be divided into several databases, the so-called federated table.



If we look at the structure of the database, then the first candidate for mind comes the basic table of entities (Entity). The script for creating this table is as follows:
CREATE TABLE Entity (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[Name] NVARCHAR(MAX) NOT NULL,
[Description] NVARCHAR(MAX) NOT NULL,
)

At first glance, this table is ideal for splitting. However, it is not. Yes, it does not contain foreign keys, it has a fairly simple structure, and the maximum number of records stored in it. However, according to the logic of the database, the data of the "successor tables" is associated with this table. That is, all entities created in the system have an entry in the entity table.

Consider this example. Suppose we break the data into a range of identifiers (ID field) of entities. Suppose records with IDs 1-50 are stored in the first shard, 51-100 in the second, and so on.



The user is trying to add a new record to the Operations table. Let it be data on the purchase of a package of milk. Assume the account entity ID is 1, the expense category ID is 6. We also assume that there are already 50 records in the first database, which means that the new record should get an identifier equal to 51, that is, get into the second shard.

The request to add new data to the table will look like this: The
request will be executed absolutely correctly. Let's try now to get a list of all operations for this account (ID = 1). There is a corresponding view in the database for this. Its code is as follows:
SELECT
Account_Entity.Description AS 'Account',
Operation_Entity.Name AS 'Operation',
Operation_Entity.Description,
Operation.Amount,
Operation.Currency,
Operation.Date
FROM
Operation
INNER JOIN Entity AS Operation_Entity ON Operation.EntityId = Operation_Entity.Id
INNER JOIN Account ON Operation.AccountId = Account.Id
INNER JOIN Entity AS Account_Entity ON Account.EntityId = Account_Entity.Id
INNER JOIN Category ON Operation.CategoryId = Category.Id
INNER JOIN Entity AS Category_Entity ON Category.EntityId = Category_Entity.Id
WHERE
Account.Id = 1

As we recall, the data of one account is stored on different shards. Therefore, for this query to return correct results, it must be performed on each shard separately.
USE FEDERATION Entities(EntityId = 1) WITH RESET, FILTERING = OFF
GO
...
USE FEDERATION Entities(EntityId = 51) WITH RESET, FILTERING = OFF
GO
...

I think there is no need to explain what a huge blow the application performance from this approach will be. If even the simplest query now needs to be executed twice! Obviously, the Entity table is not suitable for us.

If we recall the multi-tenant approach to database design, when each user works with his own database and their data does not intersect, then the question arises. Is it possible to implement something similar within the framework of SQL Azure Federations? Where each shard will contain the data of one user (Account). Indeed, this approach will be quite logical. From the point of view of business logic, it would look like this:
Let's say several family members use one program. Each of them maintains a budget separately. Also suppose the husband conducts his accounting, the wife - his. Thus, the husband’s data (AccountId = 1) does not overlap with the wife’s data (AccountId = 2). In this case, splitting into shards according to the table of accounts looks quite logical.



Adding a new account will correspond to adding a shard. Frequent operations, such as: working with a list of categories, operations, etc. will not lead to a decrease in productivity.
USE FEDERATION Accounts(AccountId = 1) WITH RESET, FILTERING = OFF
GO

After executing such a request, it’s immediately clear which user we are currently working with. So the same operation will be performed only once.

So, we examined two options for splitting an existing database. Now we have decided on which field we will logically divide the data into different databases. Next time, we will directly spread the data into different shards. Do not switch! Have a good start to the new work week. Thanks for your attention!