The architecture of the Pandorama BigData infrastructure and protecting its data from failures

If the Google mantra sounds like “searching for all the information in the world with one click,” the mantra of the young Russian project Pandorama goes further: “without any click, we will find all the information you are interested in.”



The Pandorama application offers its users an “endless” personalized news feed based on their personal information preferences, without requiring the reader to work with the “tags”, “categories” or “likes” of friends. First you need to answer a couple of questions about a few funny pandas, and then you just need to ... read the proposed tape. Those news that you read will be automatically analyzed and processed by the system so that in the future there will be more and more such news in the feed, and less and less news.



Pandorama

Pandorama already unites more than 40 thousand users around the world, and this number is constantly growing. This article discusses the BigData infrastructure of this project, which operates in 24x7 mode, the mechanisms for ensuring its fault tolerance, and the protection of its data from failures, built using Veeam Backup & Replication Cloud Edition .



So how does the Pandorama web service work? Every day, his robot is constantly looking for new information, bypassing many pages on the network (about 35 thousand sources are analyzed daily). Each article found is processed using its own linguistic algorithms, after which it is automatically assigned one or more tags like “iPhone 5s” depending on the content. The end user receives a personalized article feed compiled based on his personal interests identified by the Pandorama system. The project of Pandoram was originally planned by the founders as international, aimed at the mass market, so English was chosen as the language of the news feed.



image

Figure 1. Registration window in Pandorama



Pandorama currently rents four dedicated physical servers with more than 30 virtual machines (VMs) deployed. To provide scalability and fault tolerance, the following set of technologies is applied:



  • Ethernet Networking (LACP)
  • Web Load Balancing (HAProxy)
  • Network Connection Balancing (DNS Round Robin and NLB)
  • Geographically distributed caching of pictures and static resources (CDN)
  • NoSQL, including sharding and mirroring (MongoDB)
  • SQL mirroring
  • VM replication and VM backup locally and to the cloud using Veeam Backup & Replication Cloud Edition

More on this will be described below.



Pandorama Infrastructure


Like any startup, Pandorama's budget is very limited, so the team is trying to invest money in everything as efficiently as possible, including infrastructure. Initially, several hosting options were considered. First of all, they thought about Amazon, but preliminary calculations showed that this option is too expensive. In general, in many cases, Amazon is a good starting point if the architecture is built on small, well-replicated modules. However, in the case of Pandorama this scheme did not work - the project infrastructure includes several “heavy” servers involved in linguistic analysis. Large amounts of memory and fast disks are important here, and renting such virtual machines (taking into account additional fault tolerance measures) was too expensive for Pandorama. Another hosting option is renting physical servers with installing their own VMs on them. This path turned out to be more suitable for the price.



At the moment, now at the physical level, Pandorama is the following infrastructure.



Рисунок 2. Схематичное изображение инфраструктуры Pandorama – физические сервера и подключение к сети

Figure 2. Schematic diagram of the Pandorama infrastructure — physical servers and network connectivity



The whole system is balanced in terms of load and has some redundancy in the amount of stored data for fault tolerance. For example, if one of the VMs fails, the data will not be lost, and the other VMs will take over the load. Somewhere this is achieved by replicating the VM, somewhere by replicating the data by the application inside the VM (for example, in the case of MongoDB). As you noted, the Pandorama infrastructure does not have a single storage (expensive), but all VMs are balancedly distributed across physical servers.



Each of the four hosts has 128 GB RAM and 2 Xeon E5-2670 processors. Given Hyper-Threading, we get 32 ​​vCPUs. A RAID-10 array of SATA disks for hosting VMs and data that do not require quick access. For VMs with active I / O, an array of SSDs. To extend the life of SSDs, Pandorama architects made sure that guest OS file systems work with TRIM through a hypervisor. Of course, you must also regularly check the status of the drives themselves.



Since each server has a 4 x 1Gb Ethernet NIC (in addition to KVM Over IP), it was decided to organize 2 networks inside the infrastructure. The external network through the infrastructure of the hosting provider is connected to the Internet via a gigabit channel, and the internal is isolated. At the same time, in each of the networks 2 x 1Gb are combined using Lacp into one logical connection. Separation of the internal and external networks made it possible to conveniently and simply eliminate the influence of internal service traffic on external "client" traffic. And LACP simultaneously improves both performance (by balancing TCP connections between interfaces) and fault tolerance by redundant channels.



Logically, the Pandorama infrastructure can be divided into 3 separate blocks.



Рисунок 3. Схематичное изображение инфраструктуры Pandorama по модулям

Figure 3. Schematic representation of the Pandorama infrastructure “by modules”



Content delivery core. The workload of this block depends on the number of sources that need to be circumvented (now about 35 thousand per day), and the number of articles and texts that need to be processed. This block should not suffer from user load on the site. Conversely, the Front End interacting with the user should not feel a delivery problem. These parts are separated. The third block, the binder, transmits and stores data.



Pandorama uses many different mechanisms for collecting and processing text and graphics, for example:



  • Crawling articles with filtering out irrelevant content (similar to getpocket.com or readability.com )
  • Automatic article tagging: this is about “cakes”, and this is about “cameras”
  • Finding Similar Articles or Double Images

About 40 services work on the content delivery, and for some of them methods of computer training are used (this is the topic of a separate article :). Delivery services are packaged in units that can operate relatively independently (it turns out a VM containing about 40 services in a specific configuration). Further, these units are replicated on different hosts, which ensures scalability and fault tolerance of the delivery.



A bit about the data core. Intermediate data that does not require storage is transmitted via the RabbitMQ bus. This tire is light and unpretentious. RabbitMQ has several failover features. You can make message queues transactional, i.e. each of them will be forcibly saved to disk. You can set up a cluster with queue mirroring. For Pandorama, these mechanisms turned out to be redundant, so just a cold copy was created, a replica of the VM from RabbitMQ on the neighboring host, which will start if the main VM stops working.



Another thing is the database. The main database is MongoDB. To increase productivity and reliability, sharding , mirror replicas and backups are used, which will be discussed below. One problem point is a poorly scalable SQL database. The fact is that in Pandorama there is a lot of code that has not yet been translated into work with MongoDB. Therefore, SQL is used in some cases, and its fault tolerance was achieved due to the hot reserve - mirroring.



And now about what users interact with - Front-end.



Рисунок 4. Пример персонифицированной ленты Pandorama.

Figure 4. An example of a personalized Pandorama tape.



Here a whole bunch of technologies are used to increase reliability and performance. There are 2 clusters of web servers:



  • The first cluster serves the user and generates a personal content feed for him. Failure protection and load balancing is done using HAProxy. At the same time, so that the HAProxy instance itself does not become a point of failure, several of them are deployed with DNS Round Robin configured. This is a good old tube technology that just works when the user selects a random IP address from the list. This allows you to distribute the load without a dedicated balancer, but what about a user session? This problem is resolved by HAProxy, which knows which of the cluster's web servers serves the current user.
  • The second cluster is designed to deliver static content, where there is no concept of a user session. This is where DNS Round Robin would have come in handy, however, unfortunately, the CDN used in Pandorama does not yet support this technology, so NLB is configured in Pandorama .

A few words about the CDN. Image upload traffic is many times greater than all other traffic with Pandorama. Pandorama uses CloudFlare as the CDN. With the exception of a few comments, this service is completely satisfied and allows you to save more than 85% of the traffic.



Pandorama Crash Data Protection


The issue of data protection and fault tolerance arose even at the stage of designing the service infrastructure. In terms of fault tolerance, the system is now configured in such a way that when one physical server fails, the remaining three will take the burden on themselves. In terms of data protection from failures, Veeam Backup & Replication Cloud Edition protects the virtual part of the infrastructure through backups and replication of key working virtual machines to neighboring hosts.



How is data protection in Pandorama adjusted?



Рисунок 5. Принципы резервирования данных в Pandorama

Figure 5. Principles of data backup in Pandorama



  1. If the application inside the VM allows you to do replication in the built-in way, then this functionality is used. For example, MongoDB and SQL are mirrored this way and the image cache in the web cluster is synchronized.
  2. If this is not possible, then the entire VM is replicated, for example, RabbitMQ VM - once every 4 hours.
  3. In addition to replication, there should be full and differential backups ... If the application allows you to backup using the built-in functionality, you can use it, as, for example, in the case of SQL or saving web server logs as separate files.
  4. Otherwise, a backup of the entire VM is made. For example, the Pandorama command comes from MongoDB in the absence of an acceptable backup solution.
  5. Once a week, VM backups are uploaded to the Amazon Glacier cloud. Moreover, deduplication and compression in Veeam Backup reduce the size of the backup from 140GB (the size of the original VM) to 10GB.
  6. Other backups are also sent to the cloud. It is important to note here that Veeam helps not only make backups in a virtual environment, but it is also used in Pandorama to upload other files to the cloud.

In fact, in terms of data protection, Pandorama successfully implemented the 3-2-1 backup rule : data copies are stored in at least three copies (source data, local replica, and one copy in the cloud), in two different physical environments (locally on disks and in the cloud), and one copy was taken out of office.



Recovery Testing


The relevance of testing recovery from backups was discussed earlier in this post . The Pandorama team is conducting a system restore test after various “crashes”.



The following scenarios are considered:



  • Disabling a single VM or physical server. The team emulated various events, for example, disconnecting the VM one after another or the whole host, and watched how this would affect the entire system. At the same time, a user load simulation was taking place. Testing revealed certain problems that were later fixed.
  • The entire data center burned down. Data protection in the event of such a development of events is a simple backup in Amazon Glacier. This is the worst case scenario, and it is acceptable for Pandorama that system recovery will take place within a couple of days. Why is that? Because protection with a lower RTO from such an accident costs more than a startup can afford, and this is not necessary.

Conclusion


The example of the Pandorama project proves once again: that storing a large amount of unstructured data (BigData) and protecting it, system fault tolerance, and solutions to provide simultaneous access of many users to the Internet service - everything can be configured relatively simply, functionally, and not expensively .



Useful links:




Авторы: Мария Левкина (Veeam) [ vMaria ], Константин Пичугов (Pandorama), Александр Ширманов (Veeam) [ sysmetic ]