Learning Tarantool + Lua

I want to share my experience of learning Tarantool. I will not list all of Tarantool's advantages and features; many articles already cover that (for example, this, this and this). This post is about how to get started with Tarantool and about some of the features and goodies you get out of the box.


The first thing I ran into was installation. I needed to install it on macOS for testing; few people are likely to do the same, but nonetheless. The package offered on the site would not install, either because of some dependencies or because my system has already been through more than one experiment. So I decided to build from source.

The build process is well described in the README. Do not forget to fetch the submodules if you checked the sources out from git. Also, when building on macOS, do not worry if not all tests pass: the documentation says this is normal.
If you want the console client, run cmake with the option -DENABLE_CLIENT=true. After make you get the server and, if you asked for it, the client: src/box/tarantool_box and client/tarantool/tarantool respectively.

Configure the server

As a starting point you can take the configuration from one of the tests, for example test/box/tarantool.cfg
One of the important parameters is slab_alloc_arena: the amount of memory Tarantool uses. I advise studying this parameter in more detail. It is also worth paying attention to rows_per_wal, so that you are not surprised by a large number of small files.
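As a rough illustration of the two parameters above, a config fragment might look like the one below. The values are placeholders, not recommendations. To my knowledge slab_alloc_arena is given in gigabytes and rows_per_wal is the number of records written to a single WAL file, but check the documentation for your version.

```
slab_alloc_arena = 0.5   # memory arena for tuples, in gigabytes (illustrative value)
rows_per_wal = 500000    # records per WAL file; a small value produces many small files
```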
Now let's get to the fun part. Tarantool only needs to know about indexes; it does not care what a tuple contains or how large it is. So in the config we describe only the indexes. The index types are covered in more detail in the documentation. The main points: when choosing an index, you need to understand exactly what it is for. A HASH index cannot be non-unique. TREE indexes work well for keeping a sorted list over non-unique values. Indexes can also be composite. Indexes are described per space, and there can be many spaces.
For example, imagine that we need a space with 5 fields. The first field is non-unique, while the first and second fields together are unique; we will use them for point lookups. The fourth field holds a value to sort by. So we build a unique index:
space[0].index[0].type = "HASH" # index type
space[0].index[0].unique = 1 # uniqueness flag
space[0].index[0].key_field[0].fieldno = 0 # field number in the tuple
space[0].index[0].key_field[0].type = "NUM" # data type
space[0].index[0].key_field[1].fieldno = 1
space[0].index[0].key_field[1].type = "NUM"

Next, an index for selecting a batch of records by the non-unique first field:
space[0].index[1].type = "TREE"
space[0].index[1].unique = 0
space[0].index[1].key_field[0].fieldno = 0
space[0].index[1].key_field[0].type = "NUM"

Then an index for selecting sorted records by the first and fourth fields (fieldno 0 and 3):
space[0].index[2].type = "TREE"
space[0].index[2].unique = 0
space[0].index[2].key_field[0].fieldno = 0
space[0].index[2].key_field[0].type = "NUM"
space[0].index[2].key_field[1].fieldno = 3
space[0].index[2].key_field[1].type = "NUM"

And do not forget that the space has to be enabled:
space[0].enabled = 1

Now let's describe another space that will store tuples of two fields: the first is unique, the second is not. This is a typical key-value store in its simplest form:
space[1].enabled = 1
space[1].index[0].type = "HASH"
space[1].index[0].unique = 1
space[1].index[0].key_field[0].fieldno = 0
space[1].index[0].key_field[0].type = "NUM"

With these settings Tarantool is ready to work. Initialize the storage and off you go.
> ./src/box/tarantool_box --init-storage
tarantool/src/box/tarantool_box: space 0 successfully configured
tarantool/src/box/tarantool_box: space 1 successfully configured
tarantool/src/box/tarantool_box: creating './00000000000000000001.snap.inprogress'
tarantool/src/box/tarantool_box: saving snapshot './00000000000000000001.snap'
tarantool/src/box/tarantool_box: done

As you can see, it created all the files in the directory we launched Tarantool from. If you want to change that, there is a work_dir parameter in the settings that you can set as you wish.

After that we start the server:
> ./src/box/tarantool_box --background
> ps xa | grep tarantool
 5627   ??  Us     0:10.55 tarantool/src/box/tarantool_box --background

Hooray! You can now start loading data and reading it back in the right order and by the criteria you need.

Getting started

I will not describe how to use the console client or go through each command in detail here; that could be the topic of a separate article. Instead I will focus on procedures in Lua. One of the interesting features, in my opinion, is stored procedures. With them you can build a kind of "black box" with business logic that can be changed easily and independently of the rest of the code, separating the technical part from the business model. I think Lua being built in is a strong hint that business logic is meant to live there.
At startup Tarantool tries to load the init.lua file, and that is where we will put our functions.

When writing functions, pay special attention to data types. Lua, like Perl, has no strict numeric types of its own, but because of how the protocol is implemented, numbers sent from Perl do not arrive in Lua the same way as numbers sent from the console client. So until the Tarantool client adapters support data types, you can always pass strings and convert them to numbers where needed.
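The "pass strings, convert where needed" approach can be sketched as a small helper in plain Lua. The name to_numbers is hypothetical and is not part of Tarantool; it only shows one way to validate and convert all arguments in a single place.

```lua
-- Hypothetical helper (not a Tarantool API): convert every argument to a number.
-- Returns a table of numbers, or nil if any argument cannot be converted.
function to_numbers(...)
    local out = {}
    for i = 1, select('#', ...) do
        -- extra parentheses keep only the i-th argument
        local n = tonumber((select(i, ...)))
        if n == nil then
            return nil
        end
        out[i] = n
    end
    return out
end
```

A procedure could then start with something like `local args = to_numbers(id, id2)` and return false when args is nil.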

Let's write a procedure that changes one field depending on the value of another field and the current date. This is a very standard task that usually ends up "smeared" across the code, and then the next refactoring introduces a bug in the date arithmetic.
function increase_score(id, id2)
        local id = tonumber(id) -- convert strings to numbers
        local fid = tonumber(id2)
        if(id == nil or fid == nil) then -- validate the input parameters
                return false
        end
        -- get the date as a date_time structure
        local dt = os.date("*t")
        -- get the timestamp for the current day
        -- (note: os.time uses 12:00 when the hour field is omitted)
        local cd = os.time({year = dt.year; month=dt.month; day=dt.day})
        -- fetch the record from space 0, searching by index 0
        local tup = box.select('0','0', id, fid)
        if( tup == nil ) then
                -- if there is no such record yet, create it
                box.insert('0', id, fid, cd, 100,cd)
        else
                local lu =  box.unpack('i', tup[2])
                local sc =  box.unpack('i', tup[3])
                local la =  box.unpack('i', tup[4])
                -- compute the value from the number of days since the last update
                local diffs = (math.floor(sc/((la-lu)/24/60/60+30))+1)*((cd-lu)/24/60/60)
                -- safety net, in case:
                -- - the last update was more than 30 days ago
                -- - rounding division followed by multiplication exceeded the original value
                if( diffs < 0 or diffs > sc ) then
                        diffs = sc
                end
                -- add the constant required by the task
                diffs = 100 - diffs
                -- update the record in space 0 by primary key
                box.update('0',{id;fid},'+p=p=p',3, diffs,2,cd,4,cd)
        end
end

As a result, from the point of view of an external system this whole procedure can be treated as a single atomic action.
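The arithmetic inside increase_score can be checked in isolation, without a running server. The sketch below copies only the formula into a plain Lua function; the name score_delta and the sample numbers are mine, chosen for illustration.

```lua
local DAY = 24 * 60 * 60

-- Hypothetical stand-alone copy of the formula from increase_score.
-- sc: current score, lu and la: the two stored timestamps, cd: today's timestamp.
-- Returns the value that would be added to the score field.
function score_delta(sc, lu, la, cd)
    local per_day = math.floor(sc / ((la - lu) / DAY + 30)) + 1
    local diffs = per_day * ((cd - lu) / DAY)
    -- same clamp as in the procedure
    if diffs < 0 or diffs > sc then
        diffs = sc
    end
    return 100 - diffs
end
```

For a score of 100 last updated two days ago (with la equal to lu), the per-day part is floor(100/30) + 1 = 4, so diffs is 8 and the function returns 92. After 40 days diffs would be 160, which is clamped to 100, so the function returns 0.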

Now a few words about performance. The call rate of this procedure reached 20,000 rps, with the CPU load at 67%.

Next, another nice feature. The update with the calculation described above is good, but besides the update, the task usually says that this data also has to be read back and, moreover, returned in sorted order. To avoid pushing the sorting onto an external system or sorting by hand, we use indexes.
function get_top(uid)
        local id = tonumber(uid) -- convert to a number
        if(id == nil) then  -- check that the input parameter is valid
                return false
        end
        local ret = {}
        -- Create an iterator over the index that walks in descending order starting from id.
        -- A feature of this iterator is that you may pass only some of the fields
        -- used in the index; the fields you do not pass can have any value.
        -- Keep in mind that the iterator can run past the key you passed and return
        -- tuples with smaller (or larger, depending on the iterator type) values.
        for v in box.space[0].index[2]:iterator(box.index.LE, id) do
                -- stop when:
                -- - the index has ended,
                -- - we have collected the 10 records we need,
                -- - we have gone past the requested id
                if( v == nil or #ret == 10 or box.unpack('i',v[0]) ~= id) then break end
                table.insert(ret, v)
        end
        return unpack(ret)
end

So we have a procedure that uses the index to return a list of tuples sorted in descending order. As for the sort order, look up box.index.LE in the documentation to see how the iterator types differ and how they work.
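To make the loop conditions in get_top easier to follow, here is a plain Lua simulation that does not use the box API at all. It assumes the index behaves like an array sorted in ascending order by (id, score); the function name top_for_id and the row layout are hypothetical.

```lua
-- Simulation only: rows must be sorted ascending by (id, score),
-- the way a TREE index on those two fields would order them.
function top_for_id(rows, id, limit)
    local ret = {}
    -- walking backwards emulates a descending iterator
    for i = #rows, 1, -1 do
        local row = rows[i]
        -- rows with a larger id lie before the iterator's starting position
        if row.id <= id then
            -- stop once we leave the requested id or have enough rows
            if row.id ~= id or #ret == limit then
                break
            end
            ret[#ret + 1] = row
        end
    end
    return ret
end
```

With rows for ids 1, 2 and 3, a call such as top_for_id(rows, 2, 10) returns only the rows for id 2, highest score first, and stops as soon as it reaches the first row with id 1.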

And most importantly, these procedures are called from the client like this:
> lua increase_score(1,2)
> lua get_top(1)

In the next article I will show how one common task can be solved with all of this, and cover the specifics of using the driver to talk to Tarantool from Perl.