InterSystems Caché and NoSQL Technologies

Modern high-load applications have changed what is required of a DBMS: today they need effective technologies for building specialized solutions with guaranteed response times on large volumes of data. Yet despite the emergence of relatively new technologies such as NoSQL, the potential of long-standing approaches has not been fully realized.

High-load Internet projects and XTP (extreme transaction processing) applications have changed the requirements for DBMS technologies. The priorities are now simplicity of development, the ability to tailor the storage technology to a specific project, a constant response time as load grows, and a low cost of scaling and of processing large amounts of data.

In response to these needs the NoSQL movement arose: a new class of databases that promises developers fast application changes, low costs for scaling and for processing and storing large amounts of data, and high speed on relatively inexpensive hardware. These values have always been important for InterSystems technology. NoSQL databases almost always implement an application paradigm that differs from the usual one: a shift from the DBMS as an integration point for several applications to a DBMS for one application or one project, or even for a single specific task within a project.

Features common to NoSQL solutions include non-relational data models; APIs or access protocols that are simple compared to traditional ones; the ability to scale horizontally on demand across many servers for a certain set of operations; distributed storage of data on many servers; efficient use of distributed indexes and memory for queries; and a relaxed attitude to things that traditional DBMSs treat as serious and inviolable, namely data integrity and transactions.

Today there are more than a hundred NoSQL solutions, which differ in their approaches to scaling and distributed data storage and in the data models and storage schemes they support (including the storage implementation). Queries against stored data and their execution are a separate point of comparison: the NoSQL world does not yet have a standard query language, and the developer needs a clear understanding of how a given solution works in order to implement queries successfully.

At its core, the design of a NoSQL solution is aimed at coping either with large volumes of data or with their increased complexity. This idea is taken from a presentation by Emil Eifrem of Neo4j. Interestingly, he speaks of a theoretical isomorphism of data: the same information can be represented in different models, from graph to key/value. InterSystems demonstrates the feasibility of this approach with the Unified Data Model principle in Caché.

What NoSQL projects have in common is a widespread willingness to compromise on conflicting requirements that traditional DBMSs treated as dogma and could not give up: for example, dropping support for the full set of ACID properties in favor of horizontal scalability. The most popular explanation of why such trade-offs are natural is the CAP theorem. Interpreting it freely, a system cannot be reliable, fast, distributed and consistent all at once, although various combinations are possible.

Another source of compromise is the nature of the data in the problem being solved. Here the technology must be flexible and make maximum use of the features of the subject area. For example, if data processing can be parallelized on the “shared nothing” principle (no shared resources), this should be exploited for both storage and query execution. That requires a storage and distribution model built around this possibility. Relational databases leave almost no freedom of choice here and you have to use what is available; for example, you cannot place data on different servers on a per-unit basis. NoSQL, unlike traditional DBMSs, gives the developer more freedom to exploit the natural features of the task, for example for horizontal scaling. In return, the developer bears more responsibility for decisions about the data persistence architecture. NoSQL databases can also be compared, in part, by which mechanisms standard for traditional DBMSs they provide, or classified by the engineering solutions they offer as alternatives to traditional DBMS properties.

Despite the diversity of NoSQL projects, there is currently none that could safely be called a universal and comprehensive NoSQL platform; that would contradict the very principle of specialization that runs, explicitly or implicitly, through NoSQL. So if you are considering NoSQL approaches for your next project, you will most likely have to answer a number of questions and resolve many risks, for example: which data model to choose; how stable and mature the selected technology is; how extensive the code changes will be if you later replace the NoSQL solution with another, more efficient one; and whether the query language is complete and capable enough to meet the design requirements. It is also worth noting that many NoSQL technologies were created for one particular project and are to some extent lopsided, like a narrow specialist: while they meet the requirements and tasks of the original project perfectly, they may not suit your case very well.

For a first project that uses the NoSQL approach, a hybrid design for the data persistence subsystem is a wise choice. Two hybrid designs are possible: using both a NoSQL technology and a conventional DBMS in the same project, or using a single technology that supports the concepts of both worlds to the necessary extent. For the latter, InterSystems Caché offers a unique option: a hybrid technology platform that is mature, tested and supported.

The first similarity between NoSQL and InterSystems Caché, obvious from the name alone, is that both are non-relational. Caché is built on a model simpler than the relational one, named after its atomic elements: globals (in full, Global Persistent Variables). Globals have no schema, allow columns to be added dynamically, and store column values sparsely. At the level of globals you can, if you wish, use locks and transactions and set up distributed storage and partitioning. At the cost of some imprecision, you can think of a global as a structure similar to an associative array in PHP or a HashMap in Java.
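The associative-array analogy can be made concrete with a small sketch. The Python below is only a toy in-memory stand-in, not Caché's API: the class name `GlobalSketch` and its methods are hypothetical, and real globals are persistent and ordered, which this sketch does not attempt to reproduce. It shows only the schema-less, sparse, multi-subscript shape of the data.

```python
class GlobalSketch:
    """Toy in-memory stand-in for one global: values keyed by subscript tuples."""

    def __init__(self):
        self._nodes = {}  # maps a tuple of subscripts to the value stored there

    def set(self, *args):
        # The last argument is the value, everything before it is the subscript path.
        *subscripts, value = args
        self._nodes[tuple(subscripts)] = value

    def get(self, *subscripts, default=None):
        return self._nodes.get(tuple(subscripts), default)

    def kill(self, *subscripts):
        # Remove the node and everything stored beneath it.
        prefix = tuple(subscripts)
        for key in [k for k in self._nodes if k[:len(prefix)] == prefix]:
            del self._nodes[key]


person = GlobalSketch()
person.set(1, "name", "Ann")
person.set(1, "email", "ann@example.com")
person.set(2, "name", "Bob")
person.set(2, "phone", "555-0100")  # a new "column" appears without any schema change
```

Record 2 simply has no `email` node, so nothing is stored for it: `person.get(2, "email")` returns `None` rather than an empty column value.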

As a simple and flexible data model, globals provide an excellent basis for building the non-relational models used in NoSQL: key-value, extensible records, column-oriented and graph. Detailed examples of implementing popular NoSQL models are given in the article A Universal NoSQL Engine, Using a Tried and Tested Technology (a translation is available on Habr), whose authors offer solutions for four types of data models.

[Figure: an example implementation of bulk storage, from A Universal NoSQL Engine, Using a Tried and Tested Technology]

At the level of globals there is no declarative query language of the kind customary for relational databases. Queries are defined algorithmically: executing a query means executing code written in Caché ObjectScript, which provides a sufficient set of simple, efficient operations on data stored in globals. Caché ObjectScript is perhaps the only programming language whose syntax explicitly indicates where a variable is stored: in memory or, roughly speaking, on disk. Imagine that traditional platforms such as Java or .NET offered this; much of the problem of bridging the gap between the program and the database would simply disappear. After working with Caché, the absence of such a construct in general-purpose languages seems strange, since it is natural for code to work not only with variables in memory but also with stored variables. You also do not need to define structures in the database in advance: you work with them just as you do with variables in weakly typed languages.
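To illustrate what "a query is code" means, here is a minimal Python sketch of an algorithmic query that walks an index node by node. It is loosely modeled on the ordered subscript traversal that ObjectScript offers through its `$ORDER` function, but it is not ObjectScript: the helper `next_subscript`, the `nodes` layout and the `people_in` query are all hypothetical, and Python's ordering stands in for Caché's own collation rules.

```python
def next_subscript(nodes, prefix, current=None):
    """Return the next subscript after `current` directly under `prefix`, or None."""
    depth = len(prefix)
    candidates = sorted(
        {key[depth] for key in nodes if len(key) > depth and key[:depth] == prefix}
    )
    for subscript in candidates:
        if current is None or subscript > current:
            return subscript
    return None


# Data nodes plus a hand-maintained index by city (values in the index are unused).
nodes = {
    ("person", 1, "name"): "Ann",
    ("person", 2, "name"): "Bob",
    ("person", 3, "name"): "Eve",
    ("city", "Berlin", 1): "",
    ("city", "Berlin", 3): "",
    ("city", "Oslo", 2): "",
}


def people_in(nodes, city):
    """The 'query': walk the city index in order and fetch each person's name."""
    names = []
    person_id = next_subscript(nodes, ("city", city))
    while person_id is not None:
        names.append(nodes[("person", person_id, "name")])
        person_id = next_subscript(nodes, ("city", city), person_id)
    return names
```

The developer, not a query planner, decides that the city index is used and in what order it is scanned. That is the freedom and the responsibility the text describes.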

The concept of the Unified Data Model is based on the principle "one set of data, many representation models".

Following InterSystems, which has already implemented object-oriented and relational (SQL) access on top of globals, you can implement your own data model and, as with the models that ship with Caché, apply the Unified Data Model principle: an approach to persistence in which the same data is worked with through different models, whichever is most convenient for the task at hand. For example, the key-value model can be used for fast inserts and reads, and the relational model for queries. To build a query language for your own NoSQL solution you can use Caché ObjectScript, which provides a set of simple operations on data stored in globals.
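The idea of one set of data seen through two models can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Caché implements its SQL or object layers: the functions `kv_put`, `kv_get` and `select`, and the `(table, row_id, column)` key layout, are hypothetical.

```python
store = {}  # the single copy of the data: (table, row_id, column) -> value


def kv_put(table, row_id, column, value):
    """Key-value view: write one value directly by its full key."""
    store[(table, row_id, column)] = value


def kv_get(table, row_id, column):
    """Key-value view: read one value directly by its full key."""
    return store.get((table, row_id, column))


def select(table, columns, where=lambda row: True):
    """Relational-style view: assemble rows from the same nodes and filter them."""
    rows = {}
    for (name, row_id, column), value in store.items():
        if name == table:
            rows.setdefault(row_id, {})[column] = value
    return [
        tuple(row.get(column) for column in columns)
        for _, row in sorted(rows.items())
        if where(row)
    ]


kv_put("Person", 1, "name", "Ann")
kv_put("Person", 1, "age", 31)
kv_put("Person", 2, "name", "Bob")
kv_put("Person", 2, "age", 25)

over_30 = select("Person", ["name"], where=lambda row: row.get("age", 0) > 30)
```

Writes go through the fast key-value path, while `select` reads the same nodes as rows; no data is copied between the two views.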

A separate, less obvious point of comparison between Caché and NoSQL is distribution and scaling. In a category as important for NoSQL as horizontal scaling and distributed storage through mechanisms such as sharding and partitioning, Caché has no ready-made, out-of-the-box options with those names. That is not the whole picture, however: Caché either does these things a little differently or, as with globals, provides reliable basic technologies on which such capabilities can be built effectively. Using ECP (Enterprise Cache Protocol), namespaces and Subscript Level Mapping (SLM), it is possible to implement efficient distributed data processing.
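The routing idea behind mapping by subscript can be sketched as follows. This is a hedged illustration only: in Caché such mappings are part of the namespace configuration rather than application code, and the range boundaries, database names and helper functions below are hypothetical. Plain dictionaries stand in for separate databases or servers.

```python
import bisect

# Hypothetical layout: ranges of the first subscript, each stored in its own database.
BOUNDARIES = [1000, 2000]             # exclusive upper bounds of the first two ranges
DATABASES = ["db_a", "db_b", "db_c"]  # db_c holds everything from 2000 upward


def database_for(first_subscript):
    """Pick the database whose subscript range contains `first_subscript`."""
    return DATABASES[bisect.bisect_right(BOUNDARIES, first_subscript)]


# One plain dict per database stands in for separate servers.
servers = {name: {} for name in DATABASES}


def set_node(subscripts, value):
    servers[database_for(subscripts[0])][subscripts] = value


def get_node(subscripts):
    return servers[database_for(subscripts[0])].get(subscripts)


set_node((42, "name"), "Ann")
set_node((1500, "name"), "Bob")
set_node((2000, "name"), "Eve")
```

The calling code addresses a node by its subscripts alone; which database holds it is decided by the mapping, so the data can be repartitioned without touching the callers.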

[Figure: partitioning and distributed storage using ECP and SLM]

Recognizing that NoSQL now attracts the attention of many developers, InterSystems has released a free DBMS called InterSystems Globals. The goal of the release is to introduce developers to the technology at the heart of Caché and to widen the circle of developers and architects who know how to use it.

Like many other NoSQL projects, Globals is free to use for development and distribution. Non-relational models can be implemented in Globals; as an example, the open source Globals Document Store (GDS) API project has been created. Globals can be used successfully in projects that require high speed and throughput: its speed is on the order of tens to hundreds of thousands of records per second.

InterSystems Globals provides simple APIs for .NET, Java and Node.js. In contrast to the Caché ObjectScript approach described above, operations on globals are here available from a programming language external to the DBMS. At the same time, the process of the application working with Globals (a JVM, for example) effectively becomes one of the DBMS processes.

In the case of Java, the technology lets you quickly implement your own stored data structures that are also natural for the language. For example, you can quickly implement an analog of HashMap whose data is stored in the DBMS. With this approach, as with Caché ObjectScript, the difference between a variable in memory and one on disk begins to disappear.
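The "stored HashMap" idea can be shown in Python as well. The sketch below does not use the Globals API: it substitutes SQLite from the standard library as the backing store and JSON for value encoding, and the class name `StoredMap` is hypothetical. What it demonstrates is only the pattern: an object that behaves like an ordinary map while every read and write goes to the database.

```python
import json
import sqlite3
from collections.abc import MutableMapping


class StoredMap(MutableMapping):
    """Dict-like object whose entries live in a database (string keys, JSON values)."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to keep data between runs.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def __getitem__(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __setitem__(self, key, value):
        with self._db:  # commits on success
            self._db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def __delitem__(self, key):
        with self._db:
            cursor = self._db.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        return (k for (k,) in self._db.execute("SELECT k FROM kv ORDER BY k"))

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


settings = StoredMap()
settings["theme"] = {"name": "dark", "contrast": 2}
settings["lang"] = "en"
```

Code that uses `settings` reads like code that uses a normal dictionary (`settings["lang"]`, `"theme" in settings`, `len(settings)`), which is the blurring of in-memory and stored variables that the text describes.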

For Node.js, access to Globals makes it possible to work directly with data types natural to JavaScript: arrays and JavaScript objects can be saved, read and modified without additional development overhead, which greatly simplifies data persistence in JavaScript. In addition, Globals combined with Node.js is fast: for comparison, in tests Globals is faster than Redis (one of the most widely used NoSQL projects, known among other things for its speed).
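One plausible way a nested object can map onto global-style nodes is to turn each path through the object into a subscript list. The Python sketch below shows that flattening step; it is an assumption made for illustration, since the text does not describe how the Node.js interface actually performs the mapping, and the function name `flatten` is hypothetical.

```python
def flatten(value, prefix=()):
    """Turn a nested dict/list into {subscript_tuple: leaf_value} nodes.

    Dict keys and list positions become subscripts. Empty dicts and lists
    produce no nodes in this simplified version.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    nodes = {}
    for key, child in items:
        nodes.update(flatten(child, prefix + (key,)))
    return nodes


document = {
    "name": "Ann",
    "tags": ["admin", "dev"],
    "address": {"city": "Oslo"},
}
nodes = flatten(document, ("person", 1))
```

After flattening, a single field such as `nodes[("person", 1, "address", "city")]` can be read or updated on its own, without loading or rewriting the whole document.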

InterSystems Globals is positioned as a NoSQL database, but it differs from the NoSQL mainstream in several respects: it is not restricted to a specific data model; unlike many other solutions, it supports locks and transactions; and it provides both efficient work with data in memory and data integrity on disk. At the moment, Globals has no support for distributed data. Globals is built on a stable core technology that is guaranteed to be developed and maintained.

Unlike Caché, Globals provides only the core for working with globals, without object and relational access. But during development there is always the option of switching to Caché without changing the application code, because the Globals API is a subset of the Caché eXTreme technology.

To sum up, Caché can now be thought of as a NoSQL database and more: a universal, stable platform for NoSQL projects, with support for object and relational models and with performance not inferior to traditional DBMSs. And although NoSQL is a term that became popular only recently, it fully matches the company's values, which have remained the same: just as it has for 30 years, the technology that underlies Caché lets you quickly create solutions that are specific to your project. Why use technology from another project if you can use your own?