Scale the database of a real project with SQL Azure Federations. Part 3: Migration

  • Tutorial
I greet all the inhabitants of Habrahabr! Have a good new working week! So, we continue the process of database migration to use SQL Azure Federations. As you remember, the last time we decided on which table and on which field we will split the database into shards. Let’s finally do it!

Migration


So, we will split the database according to the table of accounts (Account), since the data stored in it and associated with it logically do not intersect with each other. Since we have a database creation script, let's try to adapt it to use SQL Azure Federations.

We assume that the database is already created in the Windows Azure Management Portal or through SQL Server Management Studio.

Open the script for creating objects in the database.
USE xPenses
GO

IF EXISTS (SELECT name FROM sysobjects where name = N'Operation') DROP TABLE Operation
...

The first thing to remove is the use of the USE operation, because this is one of the basic limitations of SQL Azure. One database - one connection. Instead, add requests to create a federation:
-- A database must be selected before executing this statement
CREATE FEDERATION Accounts(AccountId BIGINT RANGE)
GO

USE FEDERATION Accounts(AccountId = 1) WITH RESET, FILTERING = OFF
GO

Please note that for example, connecting using SSMS to SQL Azure Server, you need to select a database from the list to complete the query.



Thus, we will create a new federation, the data of which is distributed by the value of the account identifier (Account ID). Please note that at the moment no tables have been created in the database, that is, the AccountId field is not associated with any data set in real tables. The name of the field may also differ from the name of the field of the table by which the distribution will be performed.

Here we can see another, logical, limitation of SQL Azure Federations . The field over which the distribution will be carried out should be of the type INT, BIGINT, UNIQUEIDENTIFIER and VARBINARY.

After creating a federation, we need to select the first shard into which we begin to enter data. That is, a shard that stores the data of the first account (AccountId = 1).

See our script below. We need to modify the creation of the table of accounts so that SQL Azure knows that the data of this table by the Id field will be distributed by shards.
CREATE TABLE Account (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[EntityId] INTEGER NOT NULL FOREIGN KEY REFERENCES Entity(Id),
[Currency] NVARCHAR(3)
)

Thus, the table creation script turns into the following:
CREATE TABLE Account (
[Id] BIGINT NOT NULL,
[EntityId] INTEGER NOT NULL FOREIGN KEY REFERENCES Entity(Id),
[Currency] NVARCHAR(3),
CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
) FEDERATED ON (AccountId= Id)

So what has changed? The ID field type has become BIGINT. In addition, we lost the ability to automatically generate values ​​for this field when inserting a new record. This is another limitation of SQL Azure Federations. However, you can still use the DEFAULT keyword. For example, this will be useful if the type of the ID field is UNIQUEIDENTIFIER. In this case, we can declare the field this way:
Then when inserting new records into the table, we do not need to specify the ID of the created record. When working with other types, this logic should be implemented at the application level.

The next thing you should pay attention to is the declaration of the main key of the table. We need to explicitly indicate that the key being created will be clustered.

The last thing to do is to indicate with the FEDERATED ON keyword that this table will be federated. The data in it will be divided by the ID field.

So, with the creation of the table of accounts we figured out. We are going further. As you can see from the database scheme, the accounts table is the parent in relation to the tables “credit card” and “bank account”.





That is, the BankAccount and CreditCard tables have a foreign key to the Account table. Since the account table is now federated, it is impossible to ensure the integrity of links from one table to another.

This is another limitation of SQL Azure Federations . Federated table data tables cannot be referenced in other tables. That is, it will be necessary to remove all foreign key from the tables that refer to the table of accounts (Account).

Thus, the table creation script, for example, CreditCard, instead of the form:
Will take the form:
CREATE TABLE CreditCard (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[AccountId] BIGINT NOT NULL,
[Type] NVARCHAR(MAX)
CONSTRAINT CreditCardType CHECK (
[Type] = 'Visa'
OR [Type] = 'MasterCard'
OR [Type] = 'JCB'
OR [Type] = 'AmericanExpress'),
[Number] NVARCHAR(MAX)
)

That is, the integrity of the links in the records from the credit card table to the accounts table rests on the shoulders of the application logic.

If you try to execute the database creation script at this stage, then all the tables will be successfully created, however, in addition to the tables, the database also contains two procedures. The first of them - adding a new category does not require changes, because its logic does not go beyond the scope of one shard.

But the procedure for adding a new account (AddAccount) requires minor changes. So consider the source code for this procedure:
CREATE PROCEDURE AddAccount(
@Name NVARCHAR(MAX),
@Description NVARCHAR(MAX),
@Currency NVARCHAR(3),
@Instrument NVARCHAR(MAX),
@Type NVARCHAR(MAX),
@Number NVARCHAR(MAX)
)
AS
INSERT INTO Entity VALUES (@Name, @Description)
DECLARE @EntityId INTEGER = (SELECT Id FROM Entity WHERE Name = @Name AND Description = @Description)

INSERT INTO Account VALUES (@EntityId, @Currency)

IF (@Instrument = 'BankAccount')
BEGIN
INSERT INTO BankAccount VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END

IF (@Instrument = 'CreditCard')
BEGIN
INSERT INTO CreditCard VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END
GO

In fact, the changes that need to be made are fairly obvious. Since we have lost the ability to automatically generate the ID field value for an account, this logic rests on the shoulders of the application logic. That is, we need to make changes to the procedure heading:
Accordingly, inserting entries into the accounts table, instead of the code:
Now it will take another parameter (the ID of the account being created) as an input:
Probably the next logical question will be: is it possible to add the use command to the body of the procedure US Federation (USE FEDERATION)? Since, if we received the ID of the account we are creating, we know which federation we need to work with, which means we can immediately proceed to use the necessary shard:
USE FEDERATION Accounts(AccountId = @AcccountId) WITH RESET, FILTERING = OFF
GO

Unfortunately, if you do so, then SSMS will give you an error. The thing is that the AddAccount procedure is stored in a specific shard, which means that we do not have the ability to use USE FEDERATION. In addition, the use of USE FEDERATION is generally not possible in the procedures. The code for switching federations should be located "one level up".

The changes that need to be done in the database creation script are over. We can execute it without errors. As a result, one root database (federation root) and one shard (federation member) will be created.



Sharding


All that remains for us to do is to actually produce scaling databases. That is, to separate the data related to one account from the data of another account.

To do this, create a separate script:
The first thing this script does is switch to federation root, that is, it works within the xPenses database, since federation information (metadata) is stored there.

Then we indicate that we want to break the federation with the name Accounts, starting with the value of the AccountId field equal to 2. That is, the data of the first account remains in the already created shard, and the data of other accounts is transferred to the next shard. Please also note that we do not indicate anywhere that the data is broken down into the table of accounts. We only work with xPenses database metadata!



So, we execute this command and ... It is quite likely that when you refresh the Object Explorer window you will see the following:



Instead of one new shard, we got ... As many as 3! In fact, there is nothing unusual in this. The thing is that the data stored within the first shard is copied in accordance with the value of AccountId. That is, for an account with an ID of 1, you need to copy the data to one shard, with an ID of 2 to another. Naturally, this takes time. After SQL Azure redistributes the data into shards, we will see that we really have a database with federation metadata and two shards.



Now, if we need to let us split the data into 3 shards, for example, account data with an ID of 3 should be transferred to a separate shard, it will be enough for us to execute the following command:
ALTER FEDERATION Accounts SPLIT AT (AccountId = 3)
GO

Conclusion


We examined the migration process of a database creation script when using SQL Azure Federations. As you can see, most sharp corners are quite simple. However, a fairly large part of the database logic must be carried “higher”. We stopped at the database level. In real-world projects, it is highly recommended that you carefully analyze your domain and database architecture before you migrate to SQL Azure Federations.