Using Couchbase with PHP: an example

Disclaimer


This article does not urge you to drop everything you are used to and switch to Couchbase, regardless of your past experience and the pitfalls you have run into on your own projects. It is only meant as a brief description of how to use Couchbase Server together with PHP, and a little more. Some readers may find it interesting as an overview of what is possible, and perhaps also as a way to assess the technology's prospects.

What it is and how to use it


Couchbase is another NoSQL database, developed by Couchbase, Inc., and a direct descendant of CouchDB, inheriting both its traditions and its problems. It is a document-oriented database: each record is normally stored as a document. This is not a strict rule, and any value (up to and including binary BLOBs) can serve as a record, but the appeal of this database (and of others like it) lies in document-based storage.

Document-based storage means that all data is stored as so-called documents, i.e. sets of fields grouped into a single document according to common sense and the logic of the record itself. A user profile is a good example: a document with fields such as login, password, email and others. The standard storage format for a document is a JSON string. The creators chose this format deliberately, since it is popular, easy to parse and human-readable. But that is beside the point. What matters is that you have an idea of what a document is and what it looks like inside the database.
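To make the idea concrete: in PHP, a document is naturally built as an associative array and serialized with the standard `json_encode` function, and `json_decode` with its second argument set to `true` turns it back into an array. The minimal round trip below does not touch Couchbase at all. It only shows what the stored JSON string for the profile fields mentioned above would look like.

```php
<?php
// A user profile as a plain PHP associative array (fields from the text above).
$profile = [
    'login'    => 'megausername',
    'password' => 'my secured password!',
    'email'    => 'email@example.com',
];

// Serialize the array into the JSON string that would be stored as the document body.
$json = json_encode($profile);

// Decode it back; passing true returns an associative array instead of a stdClass object.
$restored = json_decode($json, true);

echo $json, PHP_EOL;
// prints {"login":"megausername","password":"my secured password!","email":"email@example.com"}
```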

Required components


To work with Couchbase from PHP, you need a few pieces of software:


Once all of this is installed and running, you can use a class called Couchbase. Its description can be found in the official Couchbase Git repository. I recommend adding that file to your project so that your favorite IDE can offer autocompletion.

It is also worth creating a separate bucket (the Couchbase analogue of a database) so that you do not clutter the standard default bucket. To do this, open http://localhost:8091/index.html#sec=buckets and click the "Create new bucket" button.

Start coding


Coding something abstract makes little sense, so let's take the concrete example mentioned above: a user profile. Suppose our user has a few fields: login, password and email, plus one implicit field, an integer identifier. As a JSON document (the identifier will live in the document's key rather than in its body), the profile looks like this:
{
  "login": "megausername",
  "password": "my secured password!",
  "email": "email@example.com"
}


First, let's see how to save this data to the database and how to read it back. It is quite simple, as the following example shows:
<?php

/**
 * This line creates a connection to the Couchbase database with all the required parameters, with a few
 * caveats. For example, the host names to connect to must be passed as an array, because Couchbase most
 * often runs as a cluster, and for a master-master cluster to work properly you will need to list both
 * hosts running Couchbase Server. The user name, database password and bucket name need no explanation,
 * while the last parameter says that the database connection does not have to be re-established on
 * every request (a persistent connection).
 */
$couchbase = new Couchbase(array('localhost'), 'couchbase_user', 'couchbase_password', 'users_bucket', TRUE);

/**
 * Create the document as an ordinary PHP array
 */
$document = array(
    'login' => 'megausername',
    'password' => 'my secured password!',
    'email' => 'email@example.com'
);

/**
 * Now save this record to the database. First, however, we need to generate a user ID, which Couchbase
 * handles somewhat unusually, because this database has no auto-increment fields. Instead, we keep a
 * counter in the database and increment it every time a user is created. Note that when the counter is
 * created, it starts from zero.
 */

$userId = $couchbase->increment('counter::users', 1, TRUE);

/**
 * Now we can actually save the user data. There are two options. The first is the add method, which tries
 * to create a new record in the database and returns an error if the record already exists. The second is
 * set, which overwrites the existing value for the given key, or creates a new record if none exists.
 * Which one to use is up to you, but here data integrity matters more, so we use add.
 */

try
{
    $couchbase->add("profile::{$userId}", json_encode($document));
}
catch(\CouchbaseException $e) // if an error occurs, an exception describing it is thrown
{
    // we simply log it and stop, since we don't want to handle it just yet
    error_log("A completely unexpected problem with Couchbase: {$e->getMessage()}");
    exit(1);
}

/**
 * Finally, we read the document back from the database with the get method. There is little to say about it,
 * except that it returns a JSON string rather than an array, and that string has to be decoded before you
 * can work with the data. Passing TRUE as the second argument of json_decode gives an associative array.
 */

try
{
    $userData = json_decode($couchbase->get("profile::{$userId}"), TRUE);
}
catch(\CouchbaseException $e) // if an error occurs, an exception describing it is thrown
{
    // we simply log it and stop, since we don't want to handle it just yet
    error_log("Couchbase problems again: {$e->getMessage()}");
    exit(1);
}
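The two points in the code above that need care are the counter-based ID and the difference between add and set. The sketch below models the semantics described in the comments with a hypothetical in-memory class. `InMemoryBucket` is an illustration written for PHP 8, not part of the Couchbase extension, and its return values are simplified to booleans and integers. In this model, `increment` with creation enabled starts a missing counter at zero, `add` refuses to overwrite an existing key, and `set` always writes.

```php
<?php
// Hypothetical in-memory stand-in for a bucket (PHP 8). It mimics only the
// semantics described in the article's comments; it is NOT the Couchbase API.
final class InMemoryBucket
{
    /** @var array<string, mixed> */
    private array $data = [];

    // If the counter is missing and $create is true, it is created with $initial
    // (0 by default) and that value is returned; otherwise $delta is added.
    public function increment(string $key, int $delta = 1, bool $create = false, int $initial = 0): ?int
    {
        if (!array_key_exists($key, $this->data)) {
            if (!$create) {
                return null; // missing counter and creation not allowed
            }
            $this->data[$key] = $initial;
            return $initial;
        }
        $this->data[$key] += $delta;
        return $this->data[$key];
    }

    // Store only if the key does not exist yet; never overwrite.
    public function add(string $key, string $value): bool
    {
        if (array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] = $value;
        return true;
    }

    // Store unconditionally: create the key or overwrite its value.
    public function set(string $key, string $value): void
    {
        $this->data[$key] = $value;
    }

    public function get(string $key): mixed
    {
        return $this->data[$key] ?? null;
    }
}

$bucket = new InMemoryBucket();
$firstId  = $bucket->increment('counter::users', 1, true); // 0: the counter starts at zero
$secondId = $bucket->increment('counter::users', 1, true); // 1

$created   = $bucket->add("profile::{$firstId}", json_encode(['login' => 'megausername']));
$duplicate = $bucket->add("profile::{$firstId}", json_encode(['login' => 'someone_else'])); // false

$bucket->set("profile::{$secondId}", json_encode(['login' => 'first']));
$bucket->set("profile::{$secondId}", json_encode(['login' => 'second'])); // overwrites silently
```

The second `add` leaves the first profile intact, which is exactly why the article prefers add when data integrity matters more than convenience.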


As you can see, there is nothing difficult in this example apart from generating the user ID. The difficulties begin as soon as you need to search.

What a view is and how to work with it


If you open the Couchbase Server control panel, you will notice a wonderful section called Views, with two subsections: “Development views” and “Production views”. As the name suggests, a view is a representation of the data (in Couchbase terms, a selection). For now the section is empty, but we will work out how to put something there.

A view is essentially an index whose construction rules are written in JavaScript. Yes, really. Indexes here are built according to logic you define, so they serve not only to enumerate records but also to carry meaning. For example, you can include in an index only users whose email is longer than n characters, or only documents that contain certain fields. JavaScript is the only language available for defining indexes, but it is more than enough. The index is updated either on demand (when data is requested) or automatically when database fragmentation reaches a certain percentage (set in the bucket settings). This also needs to be taken into account during development.
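To illustrate the two examples just mentioned, here is a sketch of a design document built in PHP, in the same array format that is passed to `setDesignDoc` later in this article. It contains two map functions. The first indexes only users whose email is longer than a fixed number of characters. The second indexes only documents that have both `login` and `email` fields. The threshold of 20 and the view names `long_emails` and `with_login_and_email` are hypothetical choices for the example.

```php
<?php
// Hypothetical threshold for the "email longer than n characters" example.
$minEmailLength = 20;

$designDocument = [
    'language' => 'javascript',
    'views'    => [
        // Index only users whose email is longer than $minEmailLength characters.
        'long_emails' => [
            'map' => 'function (doc, meta) {'
                . ' if (meta.type == "json" && doc.email && doc.email.length > ' . $minEmailLength . ') {'
                . ' emit(doc.email, null); } }',
        ],
        // Index only documents that contain both a login and an email field.
        'with_login_and_email' => [
            'map' => 'function (doc, meta) {'
                . ' if (meta.type == "json" && doc.login && doc.email) {'
                . ' emit(doc.login, doc.email); } }',
        ],
    ],
];

// The JSON string that would be handed to setDesignDoc().
$json = json_encode($designDocument);
```

The semantic rule lives entirely inside the map function. A document that fails the `if` check is simply never emitted, so it never appears in that view.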

There are two ways to create a view. The first is to write the JavaScript rules directly in the Development section of the control panel and then publish them to Production. The second is to call the setDesignDoc method from PHP with a description of the scripts, which then go straight to the Production section.

First, let's look at the script and what it consists of. The script is a function that receives the document's contents and its metadata. Let's create an index based on the user's login.
function (doc, meta) {
    // if the document is JSON and contains a login field
    if (meta.type == "json" && doc.login) {
        // extract the userId from the document key (it follows the double colon, as in profile::0)
        var userid = meta.id.split("::");
        // and add the resulting pair to the key/value index
        emit(doc.login, parseInt(userid[1]));
    }
}

The function above shows that only documents with a login field make it into the index (entries are added with the emit function). As you can see, the JavaScript function works as a callback that is applied to every record in the bucket. Note that a view can be created at any point in the bucket's life cycle. That is, if you need to add new functionality, you can simply add new views and carry on.
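To see what "applied to every record" means in practice, the sketch below reproduces the view's logic in plain PHP. The map callback is called once per stored item with its contents and metadata, every `emit` call adds a row, and the rows are then sorted by key (and by document id for equal keys), which is how view results are ordered. The `buildIndex` function and the sample bucket are illustrations only (PHP 8 syntax). Real indexing runs inside Couchbase Server in JavaScript, and the simple PHP comparison used here is a simplification of the collation Couchbase actually uses.

```php
<?php
/**
 * Hypothetical model of view indexing: call $map once per stored item with its
 * contents and metadata, collect every emitted row, and sort the rows by key,
 * then by document id.
 *
 * @param array<string, array{type: string, value: mixed}> $bucket document id => stored item
 */
function buildIndex(array $bucket, callable $map): array
{
    $rows = [];
    foreach ($bucket as $id => $item) {
        $meta = ['id' => $id, 'type' => $item['type']];
        // Each call to $emit appends one row for the current document.
        $emit = function ($key, $value) use (&$rows, $id): void {
            $rows[] = ['id' => $id, 'key' => $key, 'value' => $value];
        };
        $map($item['value'], $meta, $emit);
    }
    // Plain PHP comparison here; Couchbase itself uses its own key collation.
    usort($rows, fn (array $a, array $b): int => [$a['key'], $a['id']] <=> [$b['key'], $b['id']]);

    return ['total_rows' => count($rows), 'rows' => $rows];
}

// PHP port of the login map function from the article.
$loginMap = function ($doc, array $meta, callable $emit): void {
    if ($meta['type'] === 'json' && !empty($doc['login'])) {
        $parts = explode('::', $meta['id']); // "profile::0" => ["profile", "0"]
        $emit($doc['login'], (int) $parts[1]);
    }
};

// Illustrative bucket contents; the non-JSON item is skipped by the type check.
$bucket = [
    'profile::0' => ['type' => 'json', 'value' => ['login' => 'megausername', 'email' => 'email@example.com']],
    'profile::1' => ['type' => 'json', 'value' => ['login' => 'another', 'email' => 'another@example.com']],
    'avatar::0'  => ['type' => 'base64', 'value' => 'raw binary bytes'],
];

$index = buildIndex($bucket, $loginMap);
```

Adding a new view later amounts to running a different map callback over the same stored items, which is why views can be introduced at any point in a bucket's life.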

Now let's work out how to find a user's ID when we only know their login. To do this, we create a new index (right from the PHP code) and query it.
/**
 * Create the description of our view
 */
$designedDocument = array(
    'language' => 'javascript',
    'views' => array(
        'login' => array(
            'map' => 'function (doc, meta) {if (meta.type == "json" && doc.login) {var userid = meta.id.split("::"); emit(doc.login, parseInt(userid[1]));}}'
        )
    ),
);

/**
 * Save this design document to the database
 */
$couchbase->setDesignDoc('userFields', json_encode($designedDocument));


If you open the Couchbase control panel right after running this code, you will see (on large data sets) the index build progress in the upper right corner. Once it finishes, you can check the view by opening it in the control panel and clicking the "Show Results" button. The response contains the key/value pairs generated by the JavaScript callback.

From PHP, we can query the view like this:
// the response is an array, not a JSON string
$result = $couchbase->view('userFields', "login");

The response is an array with two elements: total_rows, the total number of records in the index, and rows, an array of arrays with our results in the form array('id' => 'profile::0', 'key' => 'megausername', 'value' => 0). In each row, id is the identifier of the document that matched, key is the key specified when the index was built, and value is the value we emitted.
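The response structure described above can be consumed with ordinary array functions. The sketch below uses a hand-written `$result` in that shape, with values mirroring the example row. In real code, `$result` would be the return value of the `view()` call. It shows how the `key`, `value` and `id` fields of each row are read.

```php
<?php
// A hand-written $result in the shape described above; in real code it would be
// the return value of $couchbase->view('userFields', 'login').
$result = [
    'total_rows' => 1,
    'rows' => [
        ['id' => 'profile::0', 'key' => 'megausername', 'value' => 0],
    ],
];

// Map each emitted key (the login) to its emitted value (the user ID).
$loginToUserId = array_column($result['rows'], 'value', 'key');

// The id field is the key of the matching document, e.g. for a later get() call.
$documentIds = array_column($result['rows'], 'id');

foreach ($result['rows'] as $row) {
    echo "{$row['key']} => user {$row['value']} (document {$row['id']})", PHP_EOL;
}
// prints megausername => user 0 (document profile::0)
```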

Keep in mind, however, that this returns the entire index, which is not very useful for searching. What if we only need a single identifier? Iterate over the whole index by hand? Of course not. For this, every view request accepts additional parameters. For example, to get only the ID of the user with a given login, we pass that specific key in the view request:
$result = $couchbase->view('userFields', "login", array('key'=>'megausername'));
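Wrapping the keyed query in a small function keeps the lookup in one place. In the sketch below, `findUserIdByLogin` is a hypothetical helper that uses only the `view()` call with the `key` option shown above and returns `null` when no row matches. So that the sketch runs without a server, it is exercised against `FakeViewConnection`, a hypothetical stand-in that filters a fixed set of rows by exact key (PHP 8 syntax).

```php
<?php
/**
 * Hypothetical helper: look up a user ID by login through the "login" view of
 * the "userFields" design document. $couchbase is the connection object from
 * the article, or any object with a compatible view() method.
 */
function findUserIdByLogin(object $couchbase, string $login): ?int
{
    $result = $couchbase->view('userFields', 'login', ['key' => $login]);
    if (empty($result['rows'])) {
        return null; // no document emitted this login
    }
    // If several documents emitted the same login, this takes the first row.
    return (int) $result['rows'][0]['value'];
}

// Hypothetical stand-in that filters a fixed index by exact key, for running without a server.
final class FakeViewConnection
{
    public function __construct(private array $rows) {}

    public function view(string $designDoc, string $view, array $options = []): array
    {
        $rows = $this->rows;
        if (array_key_exists('key', $options)) {
            $rows = array_values(array_filter($rows, fn (array $row): bool => $row['key'] === $options['key']));
        }
        return ['total_rows' => count($this->rows), 'rows' => $rows];
    }
}

$connection = new FakeViewConnection([
    ['id' => 'profile::0', 'key' => 'megausername', 'value' => 0],
    ['id' => 'profile::1', 'key' => 'another', 'value' => 1],
]);
$found   = findUserIdByLogin($connection, 'megausername');
$missing = findUserIdByLogin($connection, 'nobody');
```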


And here is the happy moment: the result contains only the record whose key equals 'megausername', and we can work with it. There is just one pitfall. As mentioned above, the index is not rebuilt when a record in the bucket is added or changed, but only when bucket fragmentation reaches a certain percentage.

Suppose we need to check that a login is unique before some operation, for example when registering a user. Naturally, we query this view and analyze the response. However, a user with the same name may have registered at that very moment, before the index was rebuilt. We would then be told that no such record exists in the index, and we would end up with a conflict. To avoid this, you can tell the view to bring the index up to date when you query it. All pending operations on the index are completed first, and only then is the query executed and the result returned. This is done by adding the stale option with the value FALSE, like this:
$result = $couchbase->view('userFields', "login", array('key'=>'megausername', 'stale'=>FALSE));


With this request, all pending index operations are completed first, and you get a result that reflects what is actually in the database. Keep this in mind whenever you work with data like this.

Conclusion


As you can see, working with Couchbase Server is not that difficult. What matters is studying the documentation thoroughly before you start, and thinking through and analyzing what you do. I won't urge everyone to switch to Couchbase, but I will say that, for me personally, being able to work with the database without fundamentally changing its structure when adding new fields was a very “tasty” factor in developing the system described above. Harsh reality, however, brought me back down to earth. In my case, the need arose to generate real-time statistics on individual fields and to interact with a statistics storage and analysis system running on MSSQL. This led to building “crutches” (workarounds) for the convenience of our DBA developers, which in effect meant duplicating the field management system in SQL. If you want to use NoSQL database engines in your projects, I advise starting with standalone projects that are not tied to your internal infrastructure, because integrating them painlessly is not realistic.

If you have any questions, I advise you to study the documentation, which can be found at www.couchbase.com/documentation.
If you are still interested in this topic, it can be explored at a somewhat more advanced level in the following articles: the methodology of moving from SQL thinking to NoSQL, how to implement GROUP BY, ORDER BY and other interesting things with Couchbase, and a deeper look at the optimization and design of document-oriented databases.