How do you run many UI tests in parallel using Selenium Grid?

Hello everyone! I work at Avito, where I develop testing tools. Once we had accumulated a lot of UI tests, we ran into the problem of scaling Selenium servers, and now I'll tell you how we solved it.


So how do you run many UI tests in parallel using Selenium Grid? Unfortunately, you don't.
Selenium Grid cannot handle a large number of tasks in parallel.
Want to register a really large number of nodes? Well, try it.
Want speed? You won't get it: the more nodes are registered on the grid, the less stable each test becomes, and the result is restarts.
Want fault tolerance in case the Grid stops responding? No luck there either: you cannot start several replicas and put a load balancer in front of them.
Want to update the Grid without downtime and without failing the tests that are currently running? No, that is not what Selenium Grid is about.
Want to start Selenium instances on demand instead of keeping thousands of them, in different configurations, in memory? That won't work.
Want to know how to solve all these problems? Then I invite you to read on.
(I gave a talk with the same title at Heisenbug 2017 Moscow, so some readers may already be familiar with it. What follows is a more detailed written version of the story of this tool.)



A small digression about how the Selenium server works.


  • To start controlling a browser, you send a create session request to the Selenium server.
  • As a result, a browser opens on the node and you get back a sessionId token; by sending it with every subsequent request, you control the browser.
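
The two steps above can be sketched as the HTTP requests a client builds. This is a minimal illustration that only constructs the requests and does not send them. The paths follow the WebDriver wire protocol (POST /session to open a session, then /session/{sessionId}/... for commands); the server address, the helper names, and the sample sessionId are assumptions made for the example.

```python
import json

BASE_URL = "http://127.0.0.1:4444/wd/hub"  # assumed address of a Selenium server


def create_session_request(browser_name):
    """Build the request that asks the server to open a browser."""
    body = {"desiredCapabilities": {"browserName": browser_name}}
    return "POST", BASE_URL + "/session", json.dumps(body)


def command_request(session_id, method, command, payload=None):
    """Build a follow-up request; the sessionId travels in the URL path."""
    url = "{}/session/{}/{}".format(BASE_URL, session_id, command)
    body = json.dumps(payload) if payload is not None else None
    return method, url, body


if __name__ == "__main__":
    print(create_session_request("firefox"))
    # "abc123" stands in for the sessionId a real server would return
    print(command_request("abc123", "POST", "url", {"url": "https://example.com"}))
```

Every command after the first one carries the sessionId, which is why whoever sits between the client and the node must know which node owns which session.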

So why do you need Selenium Grid? It provides a single entry point for working with many Selenium servers of different configurations:


  • it lets you create a session on a free node that matches your filtering criteria, for example the browser version;
  • it stores information about which session is open on which node and proxies every request to the target node, so for the client there is no difference between working with a single node and working with the grid.
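
The two responsibilities listed above can be sketched as a toy in-memory hub. This is not Selenium Grid's code: the class, its method names, and the rule "every requested capability must be equal" are assumptions chosen to keep the illustration short. In a real grid the sessionId is generated by the node; here the hub invents one.

```python
import uuid


class MiniHub:
    """Toy in-memory hub: matches capabilities and remembers session owners."""

    def __init__(self):
        self.nodes = {}     # node address -> {"capabilities": dict, "busy": bool}
        self.sessions = {}  # sessionId -> node address

    def register(self, address, capabilities):
        self.nodes[address] = {"capabilities": capabilities, "busy": False}

    def create_session(self, desired):
        """Reserve the first free node whose capabilities include `desired`."""
        for address, node in self.nodes.items():
            caps = node["capabilities"]
            fits = all(caps.get(key) == value for key, value in desired.items())
            if fits and not node["busy"]:
                node["busy"] = True
                session_id = uuid.uuid4().hex  # stand-in for the node's sessionId
                self.sessions[session_id] = address
                return session_id
        return None  # no free node matches the request

    def route(self, session_id):
        """Return the node that must receive requests for this session."""
        return self.sessions[session_id]

    def close_session(self, session_id):
        address = self.sessions.pop(session_id)
        self.nodes[address]["busy"] = False
```

Note that both dictionaries live in the memory of one process, which is exactly the property the problems below grow out of.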


Great tool, right?



But when using it, we encountered a number of problems.


1. Unpredictable behavior.
In short, whatever wants to fail fails whenever it wants, and you cannot influence it in any way.


  • We very often saw tests that worked perfectly in a single thread but failed unpredictably when run in multiple threads through the grid.
  • From time to time, tests were simply not dispatched to some of the nodes even though those nodes were physically available, and a queue of tests built up on the grid. As a result, half of the release suite failed by timeout.

2. Lack of support for a large number of nodes
When you try to register many nodes (and we do want many nodes), registration succeeds, but testing the application in many threads still does not work, because most of the tests fail.



3. Scalability
Once you reach the limit of N nodes per Selenium Grid beyond which stability starts to suffer, the first thing that comes to mind is to take two, three, five, or even ten grids, register N nodes on each, put a load balancer in front of the whole thing, and run the tests in 10 * N threads. But no, Selenium Grid doesn't work like that, because all the information about nodes and sessions is stored in the memory of one particular grid instance and is not shared between instances. The next problem is closely related to this.



4. Fault tolerance
If you turn off the machine where the hub runs, all tests die immediately, because there are no backup hubs that subsequent requests could go to; again, everything is in memory. And this cannot be scaled at all (of course, you can always rewrite a couple of grid classes, but more on that later). The weak point is the Selenium Hub: when it goes down, the nodes become unreachable.



5. The inability to dynamically create nodes using a container orchestration system
If your testing needs many nodes with different browser configurations, another problem arises: this whole zoo takes up a lot of memory. Suppose you have 300 nodes with Google Chrome (150 GB RAM), 300 nodes with Firefox (150 GB RAM), and another 200 nodes of some Firefox Nightly with magic plug-ins (100 GB RAM). That is 400 GB of RAM permanently occupied, and on top of that you want to redistribute nodes efficiently throughout the day: say, fill all 400 GB with seven hundred Chromes while one suite is being tested, and flexibly replace them when tests with other needs appear in the queue.
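
The arithmetic behind this example can be written out as follows. The node counts and RAM totals are the figures from the paragraph above; the function names and the numbers of concurrently running sessions in the usage note are made up for illustration.

```python
# Node counts and RAM totals are the figures from the example above.
pools = {
    "chrome": {"nodes": 300, "ram_gb": 150},
    "firefox": {"nodes": 300, "ram_gb": 150},
    "firefox-nightly": {"nodes": 200, "ram_gb": 100},
}


def ram_per_node_gb(pool):
    """Average RAM one node of this pool occupies."""
    return pool["ram_gb"] / pool["nodes"]


def static_reservation_gb(pools):
    """RAM held all the time when every node is kept running."""
    return sum(pool["ram_gb"] for pool in pools.values())


def on_demand_usage_gb(pools, running):
    """RAM in use when only the nodes that are needed right now exist.

    `running` maps a pool name to the number of sessions open at the moment.
    """
    return sum(ram_per_node_gb(pools[name]) * count for name, count in running.items())


if __name__ == "__main__":
    print(static_reservation_gb(pools))
    # hypothetical load: 100 Chrome and 20 Firefox sessions open at once
    print(on_demand_usage_gb(pools, {"chrome": 100, "firefox": 20}))
```

With every node kept alive the cluster holds 400 GB no matter what is running; with nodes created per session, the hypothetical load in the usage line needs 60 GB.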


Docker is ideal for this, since it lets you quickly start a container with a fresh Selenium and kill it just as quickly once the test is done. And since we need a lot of Selenium instances, which will not all fit on one physical server, we need container orchestration on a cluster. There are several popular solutions on the market for this task; we use Kubernetes. You can hear why we chose Kubernetes here. Standard Selenium tooling cannot solve this problem.


6. It is impossible to update or restart the grid without downtime.
This is another consequence of storing sessions in memory. It is not a critical drawback, but it is still unpleasant.


All of the above is the situation we found ourselves in one day.


Known Solutions


Grid Router and its newer implementation, Go Grid Router, are a good solution, but unfortunately far from ideal. The main problem is that Grid Router is not a replacement for the Selenium Hub; it is one more proxy on top of it.


Hence the name Grid Router: it manages grids rather than nodes, and that is where the drawbacks come from.


  • A new session is attempted not on a grid that has free nodes but on a random one (you can control the distribution of the random choice with weights). If the session cannot be created on one grid, the request goes to the next one, and so on until the grids run out. As a result, creating a new session can take a long time.
  • If one of the Selenium hubs goes down, all information about its sessions is lost and its nodes become unreachable, because all interaction still goes through the hub and session data is stored in the hub.
  • It is quite difficult to add another hub to the system, because data about the hubs is stored in XML files and the files are re-read on an operating system signal. There are no transactions; all in all, it is bad.
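
The first drawback can be illustrated with a small sketch of "weighted random choice, then fall through to the next grid". This is not Grid Router's actual code: the data layout, the function names, and the `try_create` callback are hypothetical, and choosing the "next" grid by weighted sampling without replacement is an assumption of this sketch.

```python
import random


def pick_order(grids, rng):
    """Return the grids in a weighted random order (heavier ones tend to come first)."""
    remaining = list(grids)
    order = []
    while remaining:
        weights = [grid["weight"] for grid in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(chosen)
        order.append(chosen)
    return order


def create_session(grids, try_create, rng=random):
    """Try grids one by one until one of them opens a session.

    `try_create(grid)` is a hypothetical callback returning a sessionId or None.
    Returns (sessionId or None, number of grids tried).
    """
    attempts = 0
    for grid in pick_order(grids, rng):
        attempts += 1
        session_id = try_create(grid)
        if session_id is not None:
            return session_id, attempts
    return None, attempts


if __name__ == "__main__":
    grids = [{"name": "a", "weight": 5}, {"name": "b", "weight": 1}]
    # only grid "b" has a free node, yet "a" is usually asked first
    print(create_session(grids, lambda g: "s1" if g["name"] == "b" else None))
```

Because the choice ignores which grid actually has free nodes, each failed attempt adds a full round trip before the session is created.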

Selenoid is a tool for running tests in Docker containers. On each request to create a session, a fresh container is launched, and it is deleted when the session is closed. The tool is great, but it has downsides:


  • does not support any orchestration system;
  • still stores session information in memory, and as a result, has problems with scaling and fault tolerance.

When we faced all these problems, we decided to look at the experience of other companies. Yandex wrote on its Habrahabr blog that it could not register many nodes and work with them; they use Grid Router to solve this problem. Grid Router does not suit our tasks.


Alfa-Bank wrote that everything hangs for them if the grid is not used for some time, and our experience confirms this: the same thing happened to us regularly.
Of course, we also looked at the Selenium repository on GitHub, where we found several issues... Here is an example of the authors' attitude to what is going on:


Q: «selenium-grid version 3.0+ support hub high availability?»
A: «I would recommend having a separate server monitor the hub and then if/when the hub goes down it would know how to restore the hub.»

We realized that there was nothing to hope for and set out to solve our problems ourselves.


Research


We decided to start with the simple route: we deployed a number of Selenium instances in a Kubernetes cluster and put their IPs in a database. In the tests' setUp() we went directly to the database, took the IP that had gone unused the longest, and ran the test against it, without storing the sessionId anywhere and without locking nodes. Since there were fewer test workers than Selenium instances, no overflow was supposed to occur.
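
The experiment described above can be sketched roughly like this. It is a minimal illustration, not the code we ran: SQLite stands in for the database, and the table layout and function names are assumptions.

```python
import sqlite3
import time


def init_db(conn, ips):
    """Create the node table and register every Selenium IP as never used."""
    conn.execute("CREATE TABLE nodes (ip TEXT PRIMARY KEY, last_used REAL NOT NULL)")
    conn.executemany("INSERT INTO nodes (ip, last_used) VALUES (?, 0)", [(ip,) for ip in ips])
    conn.commit()


def take_node(conn, now=None):
    """Return the IP that has been idle the longest and mark it as just used."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT ip FROM nodes ORDER BY last_used ASC, ip ASC LIMIT 1"
    ).fetchone()
    if row is None:
        raise RuntimeError("no Selenium nodes registered")
    with conn:  # commit the update on success, roll back on error
        conn.execute("UPDATE nodes SET last_used = ? WHERE ip = ?", (now, row[0]))
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn, ["10.0.0.1", "10.0.0.2"])
    # a test's setUp() would call take_node() and point its WebDriver at that IP
    print(take_node(conn))
```

Nothing is locked and no sessionId is stored, so the scheme only holds while there are fewer concurrent workers than nodes, as noted above.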


This approach immediately proved viable.


We got:


  • predictable behavior;
  • fault tolerance at the database level;
  • scalability;
  • support for a large number of nodes;
  • upgrades without stopping the tests, because it is just code that lives in your repository and runs when the tests run.

But we ran into a number of problems:


  • no support for the Capabilities selection mechanism;
  • no convenient /grid/register mechanism;
  • no portability: the system no longer works as a service, depends on one programming language, and lives in the same repository as the tests.

The last problem is the most important one: if you build this into the code of a test framework, you automatically have to support it in every one of your test frameworks, in every repository, and in every language you use.


The most important outcome of this experiment was the experience gained. We convinced ourselves that a Selenium Grid can be implemented properly.


Resulting solution


First of all, we considered forking Selenium or sending a pull request. But after getting to know the project's code in more detail, we realized that it would be cheaper and more reliable to reinvent the wheel and write our own tool.


Let's list again what we want from the new tool:


  • predictability of behavior;
  • fault tolerance;
  • scalability;
  • portability;
  • support for a large number of nodes;
  • Capabilities support;
  • on-demand nodes in Kubernetes;
  • collecting metrics in StatsD;
  • a /grid/register mechanism;
  • upgrades without stopping the tests.

What we ended up with:


  • an application that solves all of the above problems;
  • a cross-platform application, tested on Linux and macOS;
  • written in Go;
  • stores data in MySQL.

As a result, we managed to solve all of these problems. The application is written in Go. The application itself is stateless: sessions are stored in MySQL, and it would not be hard to add support for another database if needed. On-demand container creation is implemented only for Kubernetes, but you are welcome to send a pull request that implements the container creation and removal methods for any other system. Go compiles for many platforms, but for us it was enough to check that it works on Linux and macOS; in theory there should be no problems on other systems.
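
Why a stateless application with sessions in a database removes the scaling and fault tolerance problems can be shown with a short sketch. It is written in Python for brevity and is not the tool's code: SQLite stands in for MySQL, and the class names, method names, and table layout are assumptions.

```python
import sqlite3


class SessionStore:
    """Shared session table; SQLite stands in for MySQL in this sketch."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(session_id TEXT PRIMARY KEY, node TEXT NOT NULL)"
        )

    def save(self, session_id, node):
        with self.conn:  # commit on success, roll back on error
            self.conn.execute("INSERT INTO sessions VALUES (?, ?)", (session_id, node))

    def find_node(self, session_id):
        row = self.conn.execute(
            "SELECT node FROM sessions WHERE session_id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else None


class GridReplica:
    """A replica keeps no session state of its own; it only talks to the store."""

    def __init__(self, store):
        self.store = store

    def on_session_created(self, session_id, node):
        self.store.save(session_id, node)

    def route(self, session_id):
        return self.store.find_node(session_id)


if __name__ == "__main__":
    store = SessionStore(sqlite3.connect(":memory:"))
    first, second = GridReplica(store), GridReplica(store)
    first.on_session_created("abc123", "10.0.0.7:5555")
    # a different replica can serve the next request for the same session
    print(second.route("abc123"))
```

Since any replica can route any session, replicas can sit behind a load balancer and be restarted one at a time without losing running tests.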


Now the main question: how many lines of test code did we have to rewrite when switching to this tool? Who thinks 10,000? 1,000? 100? Zero! Nothing had to be rewritten; the tool is fully compatible. You just deploy the application and point your tests at its address, and that is it. You don't have to do anything else.


As a result, we got the following scheme:



How do you use it? There are two modes:


  • Persistent: everything works as before. Start the Selenium server with the -role node parameter and specify the hub address; once the node is registered, you can use it:

java -jar selenium-server.jar -role node -hub http://127.0.0.1:4444/grid/register


  • On-demand: add the Docker images and the capabilities they provide to the grid config. Then run the grid and request a session; the node is created in the cluster automatically.

...
        "type": "kubernetes",
        "limit": 20,
        "node_list": [
          {
            "params": {
              "image":"myimage:latest",
              "port": "5555"
            },
            "capabilities_list": [
              {
                "browserName": "firefox",
                "browserVersion": 50
        ...
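
How such a config could be used to choose a container image for a request can be sketched as follows. The sketch is in Python for brevity and is not the tool's code. It uses only the fields visible in the fragment above; the second node entry, the matching rule, and the reading of "limit" as a cap on the number of running on-demand nodes are assumptions.

```python
# Only the fields shown in the fragment above; the second node entry is made up.
strategy = {
    "type": "kubernetes",
    "limit": 20,
    "node_list": [
        {
            "params": {"image": "myimage:latest", "port": "5555"},
            "capabilities_list": [{"browserName": "firefox", "browserVersion": 50}],
        },
        {
            "params": {"image": "otherimage:latest", "port": "5555"},  # hypothetical
            "capabilities_list": [{"browserName": "chrome", "browserVersion": 60}],
        },
    ],
}


def matches(offered, requested):
    """True if every requested capability has the same value in `offered`."""
    return all(offered.get(key) == value for key, value in requested.items())


def select_node_params(strategy, requested, running):
    """Pick container params for a session request, or None if nothing fits.

    `running` is the number of on-demand nodes already started; treating
    `limit` as a cap on that number is an assumption of this sketch.
    """
    if running >= strategy["limit"]:
        return None
    for node in strategy["node_list"]:
        if any(matches(caps, requested) for caps in node["capabilities_list"]):
            return node["params"]
    return None


if __name__ == "__main__":
    print(select_node_params(strategy, {"browserName": "firefox"}, running=0))
```

The returned params are what an orchestrator client would need to start the container: the image to run and the port the Selenium server listens on.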

Conclusion


We have been using this solution in production for quite some time; it works and requires no maintenance. Along the way, we were once again convinced that you should not be afraid to reinvent the wheel. Popular solutions are not always good ones, and it is always worth exploring other ways to solve a problem.