Unit testing with screenshots: breaking the sound barrier (talk transcript)

Layout regression testing with screenshots is fashionable these days; it won't surprise anyone. We had long wanted to introduce this kind of testing at our company. What kept holding us back were questions of maintainability and ease of use, but most of all the throughput of existing solutions. We wanted something easy to use and fast. Ready-made solutions didn't fit, so we set out to build our own.


Below we'll tell you what came out of it, which problems we solved, and how we made sure that screenshot testing has practically no effect on the total test run time. This post is a transcript of a talk delivered at HolyJS 2017 Moscow. You can watch the video here, and read the transcript and see the slides below.



Hello everyone, my name is Roman. I work at Avito. I'm involved in many things, including open source: I'm the author of several projects (CSSTree, basis.js, rempl), a maintainer of CSSO, and more.


Today I'll talk about unit testing with screenshots. This talk is a story about the search for engineering solutions. I won't give recipes for every occasion, but I will share a direction of thought: where to dig to make everything work well.


Reinventing the wheel is not always bad. Those who know me remember that I often try to build something new, even though plenty of ready-made solutions exist. What does this lead to? If you don't give up, you can find solutions, and not at all where you were looking for them.


Since the topic today is screenshots, I'll say right away that testing can be sped up not only by optimizing code. The problem isn't always in the code. By experimenting, you can find interesting moves and solutions.


When problems arise, modern front-end developers usually go straight to npm or StackOverflow and try to use a ready-made solution. But npm install can't always help. The spirit of adventure has faded in us: we rarely try to do something ourselves, to dig deep into a problem.



Let's fix that.


Unit Testing: Tools


Testing comes in different kinds: unit, functional, integration and so on. In this talk I'll focus on unit testing of components, or blocks, that we want to check for layout regressions.


We had wanted to tackle this topic for a long time but never got around to it. We needed a solution that was simple, cheap and fast.


What are the options?


  • ready-made services;
  • ready-made tools, such as Gemini;
  • writing your own.

Services don't suit us for a number of reasons: we don't want to depend on external services, we want everything to stay in-house.


What about ready-made tools? They exist, there are several of them, but they are usually focused on walking through URLs and "clicking" certain blocks. That didn't quite suit us: we wanted to test components and blocks in specific states by taking screenshots.


There is Gemini, a tool from Yandex. It's a good thing, but it looks like a spaceship: it's hard to get started and configure, and you have to write a lot of code. Perhaps that's not a problem in itself. The problem for me was that when I took a simple test from the readme and copied it a hundred times, I got this figure: checking 100 images of 282x200 takes about two minutes. That's too long.


So we started building our own. That's what today's talk is about. Let me jump ahead and show what we ended up with.



So, given an ordinary test that checks the markup of a React component, we add one line in which we take a screenshot and call the "magic" method toMatchSnapshotImage() . That is, one extra line in the test, and among other things we now check the component's state with a screenshot.
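To make this concrete, such a test might look roughly like the sketch below. The component name and the helper imports are assumptions; only the idea of "one extra line" and the toMatchSnapshotImage() matcher come from the talk.

```js
import React from 'react';
import { renderToStaticMarkup } from 'react-dom/server';
// hypothetical helpers: build the self-contained HTML page and request a screenshot
import { buildTestPage, screenshot } from './helpers/screenshot';
import Button from '../src/Button';

test('Button in disabled state', async () => {
  const html = renderToStaticMarkup(<Button disabled>Buy</Button>);

  // regular Jest snapshot of the markup
  expect(html).toMatchSnapshot();

  // the one extra line: a snapshot of the rendered pixels
  expect(await screenshot(buildTestPage(html))).toMatchSnapshotImage();
});
```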


In numbers: if two identical 800x600 screenshots are compared, the comparison in our solution takes about 0 ms. If the screenshots differ slightly and we need to count the differing pixels, it takes about 100 ms. Updating screenshots, that is, producing an image when we initialize the "database" of reference screenshots, takes about 25 ms per screenshot. Whether that is a lot or a little, we'll see later.


If we build our own solution that can take a screenshot of the current layout and compare it with a reference, what needs to be done? First, get the static markup of the component with all the styles and resources it needs, load it all into a browser, take a screenshot and compare it with the reference. Not that hard.



Markup generation


Let's start with markup generation. It is divided into several steps. First we generate the component's HTML. Then we determine its dependencies: which styles it uses, which images it needs, and so on. We then try to assemble all of this into a single HTML document with no references to local resources or files.


HTML generation


HTML generation depends heavily on the stack you are using. In our case it's React. We take the ready-made react-dom/server package, which lets us render a static string, exactly the HTML we need.



That is, we require react-dom/server, call renderToStaticMarkup(), and get the HTML.


CSS generation


Moving on: CSS generation. We already have the HTML, but it most likely still references many styles and other resources. All of that needs to be collected. What's the plan here? First, find the CSS files that are imported and used by the components. Then transform those CSS files so they contain no references to external resources, that is, find the resource URLs and inline them into the CSS itself. Then glue it all together.


The solution, again, depends on the stack. In our case we use Jest as the test runner, Babel to transform JavaScript, and CSS Modules to describe styles.


First, we need to find the CSS files.



With CSS Modules, CSS is imported into JavaScript as a regular module, via import or require().


Technically, we need to intercept all such calls and transform them in a way that preserves the paths that were requested.


To do this, we wrote a Babel plugin. Jest allows you to customize the JavaScript transformation (you may already be doing this if you use Jest). Via the transform setting, you add scripts that transform the files matching a pattern. In our case, we need the JavaScript files.
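The Jest side of this might look roughly like the following sketch (the file names are assumptions):

```js
// jest.config.js — route JavaScript/JSX files through a custom transformer
module.exports = {
  transform: {
    '^.+\\.jsx?$': '<rootDir>/test/jsTransform.js'
  }
};
```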



The script creates a transformer using babel-jest; to the usual settings we add our own plugin, which does what we need.
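A sketch of that transformer script, using the babel-jest API of that era; the plugin file name is an assumption:

```js
// test/jsTransform.js
const babelJest = require('babel-jest');

module.exports = babelJest.createTransformer({
  // ...the project's usual presets/plugins, plus our path-collecting plugin
  plugins: [require.resolve('./babel-plugin-collect-css-paths')]
});
```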



The plugin's job consists of two parts. First, it finds all import statements and reduces them to require(), so that CSS imports are easier to find afterwards. Then every require() is replaced with a special function:



This function initializes a global array for storing paths, adds new paths to that array, and returns the original export it replaced. The plugin is 52 lines of code. The solution could be simplified, but so far there has been no need.
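A minimal sketch of what that substituted helper could look like (the function name is an assumption; the real plugin is about 52 lines):

```js
// Substituted in place of require('./*.css') calls by the Babel plugin
function includeCssModule(path, originalExports) {
  // lazily create the global list of requested CSS paths
  global.includedCssModules = global.includedCssModules || [];

  if (global.includedCssModules.indexOf(path) === -1) {
    global.includedCssModules.push(path);
  }

  // hand back the original export (the CSS Modules class-name map) unchanged
  return originalExports;
}
```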


By the time the component's HTML markup is generated, the includedCssModules array contains all the paths that were requested via require(). All that's left is to turn those paths into the contents of the files.


CSS handling


At this stage we need to walk through all the CSS files, find the resource references in them and inline them. We also need to turn off any dynamics: if animations or other dynamic parts are used, the result may vary, since a screenshot can be taken at an unpredictable moment.


Inline resources


To inline the resources we wrote another plugin. (You could use a ready-made one, but in this case it turned out to be easier to write our own.)



What does it look like? Remember how we added the plugin to Jest's transform? The story is the same here, except we use a special plugin for CSS Modules (for babel-jest) that allows customizing CSS preprocessing: a script that transforms the CSS before it is used.


So, we add the path to our plugin to processCss and write the plugin itself. It uses the CSSTree parser. The point is not only that I'm its author ;) CSSTree is fast, detailed and allows, for example, finding paths and URLs without complicated RegExps. It is also error-tolerant: if there are parts of the CSS it doesn't understand, nothing breaks, those parts simply remain unparsed. But that rarely happens.



The plugin finds URLs in the CSS and replaces them with inlined resources.


What's going on here? In the first line we get an AST, that is, we parse the CSS string into a tree. Then we walk this tree, find nodes of type Url, take their value and use it as the path of the file to inline. At the end we simply call translate(), turning the transformed tree back into a string.
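A rough sketch of such a processCss plugin. The names follow the css-tree API of that time (translate(); in current versions it is generate()), and the exact shape of the Url node differs between versions, so treat this only as an illustration; inlineResource() is sketched in the next section.

```js
const csstree = require('css-tree');
const inlineResource = require('./inlineResource'); // see the sketch below

module.exports = function processCss(css, cssFilename) {
  const ast = csstree.parse(css);

  csstree.walk(ast, function(node) {
    if (node.type === 'Url') {
      const urlNode = node.value; // String or Raw node in that css-tree version
      const url = urlNode.value.replace(/^['"]|['"]$/g, ''); // strip optional quotes

      // replace the path with a quoted Data URI
      urlNode.value = JSON.stringify(inlineResource(url, cssFilename));
    }
  });

  return csstree.translate(ast);
};
```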



Inlining the resources themselves is not as complicated as it might seem:


  • take the URI and resolve it relative to the CSS file it is used in (since the paths are relative);
  • read the binary data from the file;
  • determine the MIME type from the extension;
  • generate a Data URI for the resource.

That's it! Resources are inlined. The functions described are 26 lines of code that do everything we need.
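A sketch of what such an inliner could look like (the MIME map here is a minimal assumption; the real implementation is about 26 lines):

```js
const fs = require('fs');
const path = require('path');

// a minimal extension-to-MIME map; the real list is longer
const mimeByExt = {
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.jpg': 'image/jpeg',
  '.svg': 'image/svg+xml',
  '.woff': 'font/woff'
};

function inlineResource(url, cssFilename) {
  // resolve the url relative to the CSS file it is referenced from
  const filename = path.resolve(path.dirname(cssFilename), url);

  // read the binary content and pick a MIME type by extension
  const content = fs.readFileSync(filename);
  const mime = mimeByExt[path.extname(filename)] || 'application/octet-stream';

  // produce a Data URI
  return 'data:' + mime + ';base64,' + content.toString('base64');
}

module.exports = inlineResource;
```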


Another benefit of writing your own solution is that you can extend it: later, for example, we added conversion of animated GIFs into static images. More on that below.


Getting rid of dynamic behavior


The next step is to get rid of dynamic behavior. How do we freeze animation, and where does it come from?


Dynamic behavior comes from:


  • CSS transitions;
  • CSS animations;
  • the caret in input fields, which blinks and can appear or disappear at any moment;
  • animated GIFs, and a few other things.

Let's try to "switch off" all of this so that we always get the same result.


CSS Transition


Zero out all transition-delay and transition-duration values.



This guarantees that every transition ends up in its final state.


CSS Animation


We do the same with CSS animations.



Here you can see this hack:



Note the value animation-delay: -0.0001s. Without it, animations in Safari do not reach their final state.


And one last thing: we have driven the animation to its final state, but unlike transitions, animations can repeat. So we freeze them by setting animation-play-state to paused. The animations are paused, that is, they stop playing.



Caret


The next issue is the caret in input fields. The problem is that it blinks: at one moment we see a vertical line, at another we don't. This can affect the resulting screenshot.


In recent months the caret-color property has appeared in browsers: first in Chrome, then in Firefox and Safari (Technology Preview). To "turn off" the caret we can make it transparent (set the value to transparent). The caret is then always invisible and does not affect the result.


For other browser versions we'll have to come up with something else, but only once we start using them for screenshots.
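Putting the pieces above together, the "freeze" stylesheet injected into the test page might look roughly like this (a sketch based on the talk; the universal selector and the use of !important are assumptions):

```css
*, *::before, *::after {
  /* transitions jump to their final state */
  transition-delay: 0s !important;
  transition-duration: 0s !important;

  /* animations jump to their final state; the tiny negative delay is the Safari hack */
  animation-delay: -0.0001s !important;
  animation-duration: 0s !important;
  animation-play-state: paused !important;

  /* hide the blinking text caret */
  caret-color: transparent !important;
}
```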



GIF


With GIFs the situation is a bit more complicated. The task is to leave one static frame out of an animated GIF. I tried to find a module for this, plug it in and forget about the problem. I found many libraries that resize images, change the palette, build a GIF from several images or, conversely, split an animated GIF into a set of images. But I did not find a package that makes an animated GIF static. I had to write it myself.


After two hours of searching for a library, I decided to see how complicated the GIF format really is. I read the Wikipedia article, opened the specification from 1989, and it turned out to be quite understandable.



A GIF consists of several blocks: at the beginning there is a signature, a descriptor with the image size, and a table of indexed colors. Then come the blocks themselves: Image Descriptor blocks, responsible for the graphics, and Extension blocks, which can store a palette, text, comments, copyright and so on. At the end of the file is the Trailer, a special block that says the GIF is over.


So we need to walk through these blocks and filter out (delete) all Image Descriptor blocks except the first one. Here is a link to the Gist with code that does this. I wrote and debugged it in a couple of hours; it has worked flawlessly so far, and I haven't found any problems.
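A rough sketch of the idea, not the gist from the talk: it walks the top-level GIF blocks and keeps only the first image, bailing out on anything it doesn't recognize.

```js
function deanimateGif(buf) {
  // 6-byte signature + 7-byte logical screen descriptor
  let offset = 13;
  const packed = buf[10];
  if (packed & 0x80) {
    offset += 3 * Math.pow(2, (packed & 0x07) + 1); // skip the global color table
  }

  const keep = [buf.slice(0, offset)];
  let imagesSeen = 0;

  const skipSubBlocks = () => {
    while (buf[offset] !== 0) offset += buf[offset] + 1;
    offset++; // block terminator
  };

  while (offset < buf.length && buf[offset] !== 0x3B) {
    const blockStart = offset;

    if (buf[offset] === 0x21) {          // Extension block
      offset += 2;                        // introducer + label
      skipSubBlocks();
      keep.push(buf.slice(blockStart, offset));
    } else if (buf[offset] === 0x2C) {   // Image Descriptor block
      offset += 9;                        // separator + position + size
      const localPacked = buf[offset];
      offset++;
      if (localPacked & 0x80) {
        offset += 3 * Math.pow(2, (localPacked & 0x07) + 1); // local color table
      }
      offset++;                           // LZW minimum code size
      skipSubBlocks();                    // image data
      if (imagesSeen === 0) keep.push(buf.slice(blockStart, offset));
      imagesSeen++;
    } else {
      break;                              // unknown block: stop parsing
    }
  }

  keep.push(Buffer.from([0x3B]));         // trailer
  return Buffer.concat(keep);
}
```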


Bottom line: GIFs are static, animations are off, all CSS paths are inlined. All that's left is to glue it together. What could be simpler, you'd think?


CSS concatenation


Let's look at how Jest works. It usually runs in parallel, spawning several workers that run the tests. Each test file runs in one of those workers, and each file gets a separate context that does not share data with other contexts. The problem is that our CSS transformation, where we had access to the source of the CSS files, happens outside the test's context, so we cannot reach that content from the test. We also cannot read the CSS from disk, because the CSS has already been transformed and lives inside JavaScript, in some other environment, context, worker.


How do we share CSS between tests? We made a little hack. Each worker writes to a temporary JSON file, where the key is the path to the CSS file and the value is the already transformed CSS. Each worker reads this file, takes what it needs from it and does the concatenation within the context of the test.



Here we read the temporary file, parse it as JSON and add the necessary content to it: the filename is the key, the transformed CSS is the value. Then we write the updated map back.



When we generate the CSS for a screenshot, we read this file, take includedCssModules (the array of CSS paths), get the contents of the required files and join() them.
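A sketch of both sides of this hack; the temporary file path and the function names are assumptions (and, as the talk admits, this is a deliberately simple, non-atomic trick):

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

const CSS_MAP_FILE = path.join(os.tmpdir(), 'jest-transformed-css.json');

// called from the CSS transform: remember the transformed CSS for this file
function saveTransformedCss(cssFilename, transformedCss) {
  let map = {};
  try {
    map = JSON.parse(fs.readFileSync(CSS_MAP_FILE, 'utf8'));
  } catch (e) {
    // first run: the file does not exist yet
  }

  map[cssFilename] = transformedCss;
  fs.writeFileSync(CSS_MAP_FILE, JSON.stringify(map));
}

// called from the test: concatenate the CSS of the modules this test required
function collectCssForTest() {
  const map = JSON.parse(fs.readFileSync(CSS_MAP_FILE, 'utf8'));

  return (global.includedCssModules || [])
    .map(filename => map[filename])
    .join('\n');
}
```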


All that remains is to put everything together.


Final assembly



We generate the final HTML. First come the styles that turn off the dynamics (animations). The second style tag contains all the concatenated CSS that we found. Each test gets its own set of these styles, because when we require() a component, it pulls in its dependencies, which end up on our list. As a result, only the CSS files that are actually used are included, not all the CSS in the project. The HTML we obtained earlier is the markup of the component itself.
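A sketch of the final document assembly. FREEZE_STYLES is assumed to hold the "freeze" stylesheet shown earlier, and collectCssForTest() is the helper from the previous sketch.

```js
function buildTestPage(componentHtml) {
  return `<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>${FREEZE_STYLES}</style>
<style>${collectCssForTest()}</style>
</head>
<body>${componentHtml}</body>
</html>`;
}
```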


So we have achieved our goal: we can generate the component's HTML in the desired state plus the CSS it needs.


So, all the markup is assembled and the animations are off; everything is ready for taking a screenshot. The solutions are not perfect, you could do better, but for that you'd need to dig deeper into Jest, Babel, CSS Modules and so on to get more elegant and stable solutions. On the whole, though, this suits us, and we can move on.


Screenshots


Today, taking screenshots in a browser is fairly simple. A few years ago it could be a difficult task requiring complicated setups. Today there are headless browsers, which run without a GUI and into which you can load arbitrary code and watch how it behaves, including taking screenshots.


Also, all modern browsers support WebDriver. If you use Selenium, for example, everything is relatively simple. There are libraries and helpers that simplify writing tests for such environments.


In our case we went for simple comparisons in a single browser. Since there was no need for cross-browser comparisons yet, we used Puppeteer, a library that launches headless Chrome and provides a fairly convenient interface for working with it. Here is the core code that takes a screenshot.



We require Puppeteer, the browser starts, and when we need to take a screenshot we call the screenshot() function with some HTML. The function creates a new page, inserts the passed HTML into it, takes a screenshot, closes the page and returns the screenshot to us. It works. Not difficult. But it turned out not to be so simple.
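A sketch of such a helper; the viewport size and launch options are assumptions:

```js
const puppeteer = require('puppeteer');

let browserPromise = null;

function getBrowser() {
  // launch the browser once and reuse it between screenshots
  if (!browserPromise) {
    browserPromise = puppeteer.launch();
  }
  return browserPromise;
}

async function screenshot(html) {
  const browser = await getBrowser();
  const page = await browser.newPage();

  try {
    await page.setViewport({ width: 800, height: 600 });
    await page.setContent(html);
    return await page.screenshot(); // a Buffer with PNG data
  } finally {
    await page.close();
  }
}
```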


The thing is, when we run the code locally, everything works fine: the reference image and the new one match, because we create the new image in the same browser version, on the same system where we made the reference. But when we started running all of this on CI, which is not a Mac or Windows but Linux, with its own version of Chrome, its own anti-aliasing rules, its own fonts and so on, the images came out different. We started getting different results.


What to do? There are several approaches. Some solutions try to overcome this difference with math: they compare not pixel by pixel but a pixel together with its neighbors, that is, a loose comparison with a certain tolerance. This is expensive and feels odd; I'd rather just compare pixel by pixel.


We went in a different direction: build an external microservice to which you can send a POST request with HTML code and get back the image, the screenshot we need.
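On the client side this boils down to one HTTP request. A sketch, where the service host and endpoint are assumptions; only the idea of "POST the HTML, get a PNG back" is from the talk:

```js
const http = require('http');

function screenshotFromService(html) {
  return new Promise((resolve, reject) => {
    const req = http.request({
      method: 'POST',
      host: 'screenshot-service.local', // hypothetical internal host
      path: '/screenshot',
      headers: { 'Content-Type': 'text/html' }
    }, res => {
      const chunks = [];
      res.on('data', chunk => chunks.push(chunk));
      res.on('end', () => resolve(Buffer.concat(chunks))); // PNG buffer
      res.on('error', reject);
    });

    req.on('error', reject);
    req.end(html);
  });
}
```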


What did we gain? The result no longer depends on the machine where the tests run; you can update or change the local browser version and it doesn't matter, because on the microservice side there is always the same browser giving the same result.


Also, no local setup is required for a new project. There is no need to launch browsers, configure Puppeteer and so on; we just make a POST request and get an image. Oddly enough, it even turns out faster, despite the network costs. We send a request to the service, which has caches and a warmed-up browser that returns an image very quickly. Plus PNGs are small and compress well, so the network traffic isn't large.


There are downsides too. The service can go down at any moment, so you have to monitor its health. Everyone knows a browser can eat a lot of memory even on simple pages. The service can suddenly stop coping; its resources are limited. And if the service crashes and cannot render images, our tests fail.


Accordingly, if everyone decides to run screenshot checks at the same time, it may turn out that they either wait a long time or the service stops responding under the load. The latter is more or less solved: we have an internal cloud where we can spin up several service instances and distribute the load. Still, the problem exists. What does the death of the service look like?



There is a service instance that runs with a limited amount of memory (in this case 1 GB); it can eat all the available memory and stop responding. In such cases only a restart helps.


The microservice solution has another side to it. Once we could take screenshots from code, the idea came up to teach the service to produce screenshots not only from code but also from a URL plus a selector. In that case the service opens the page and takes either a full-page screenshot or a screenshot of the block matched by the passed selector. This turned out to be convenient and useful for tasks that have nothing to do with testing. For example, we are now experimenting with inserting screenshots of pages into documentation, our knowledge base, and instructions for our services and parts of the site, using the service URL as the image source (<img>). When you open the documentation, you always see up-to-date screenshots, and nobody has to keep updating them. A very interesting solution that emerged by itself.


This approach, where a URL to the screenshot service can be used as an image and thus yields a screenshot of a page or block, is useful for other tasks beyond documentation. For example, you can build functional site graphs: each node of the graph can contain screenshots of pages or blocks that are refreshed with every release of the site.


Image Comparison


So, on to the actual screenshot testing. We have the code and the screenshots produced from that code; it remains to compare them.


Let me remind you that we use Jest to test components. That's important.
Jest has a killer feature: snapshot comparison.



We render objects or some data to markup and can call toMatchSnapshot() on that value. Here is how the method works:


  • it casts the checked value to a string, using various methods similar to JSON.stringify();
  • if the test is new, the result is stored in a local snapshot database for later comparison;
  • if the test is run again, the previous snapshot is taken from the database and the two strings are compared. If they are not equal, the test fails. If a test has been deleted, Jest notes that there is an obsolete snapshot in the database, so no garbage accumulates.

Using toMatchSnapshot() we can check whether the components' markup (HTML) has changed, and we don't need to write any code to compare, update or store the snapshots. Magic!


But back to image comparison. We have binary images, not a string representation. Jest has no built-in tools for comparing images yet. There is a GitHub issue on the topic; they are waiting for a pull request and may eventually do it themselves. But for now there is a plugin from American Express, jest-image-snapshot. It is a good starting point for comparing binary images right away. It looks like this:



We plug in this module and extend expect with a new method, toMatchImageSnapshot(), taken from jest-image-snapshot.



Now we can add screenshot comparison to a test. Getting a screenshot is an asynchronous operation, because we need to make a request to the server, wait for the response, and only then do we get the contents of the image.
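A sketch of the wiring. expect.extend() is jest-image-snapshot's documented setup; Button, buildTestPage() and screenshot() are the assumed helpers from the earlier sketches.

```js
import React from 'react';
import { renderToStaticMarkup } from 'react-dom/server';
import { toMatchImageSnapshot } from 'jest-image-snapshot';
import Button from '../src/Button';
import { buildTestPage, screenshot } from './helpers/screenshot';

expect.extend({ toMatchImageSnapshot });

test('Button looks the same as before', async () => {
  const html = buildTestPage(renderToStaticMarkup(<Button>Buy</Button>));

  // asynchronous: send the request, wait for the response, get the PNG buffer
  const image = await screenshot(html);

  expect(image).toMatchImageSnapshot();
});
```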


When the image arrives, we call toMatchImageSnapshot(). Roughly the same thing happens as with toMatchSnapshot(): if the test is new, the image is saved to a file as is; if not, we read the stored file, look at what arrived and compare them (more on that later) to see whether they are equal.


What problems does this plugin have? When we started using it, it was not fast enough. It has recently become faster, which is good. It also ignores Jest's modes. (Jest works in several modes: by default it simply appends new snapshots to your database on disk; there is an update mode in which snapshots are overwritten and obsolete ones are deleted; and there is a CI mode in which files are neither created, updated nor deleted, and only what exists is compared with what came in.) The jest-image-snapshot plugin does not take this into account and always writes files regardless of the current mode, which is bad for CI: CI can falsely pass a test for which there simply is no snapshot.


The plugin also doesn't handle the counters well and can't delete obsolete images: over time tests change, their descriptions change, and images that are no longer relevant remain in the code base as dead weight.


But the key issue is performance. When we compared 800x600 images, we got 1.5-2 seconds per image. How bad is that? We currently have over 300 images and expect the screenshot count to reach thousands in the near future, so the time will keep growing. And 300 screenshots is already five minutes.


We forked jest-image-snapshot and started fixing the problems, first of all image comparison. We started from this baseline: with about 300 screenshots, initialization (when we say "take screenshots of all components and simply save them as references") took 10-20 seconds, which is fast enough. At the same time a verification run took 4.5 minutes. So checking takes several times longer than taking the screenshots themselves. And that's the time with three Jest workers; in a single worker it comes out to about 12-15 minutes.


Why so slow? Every time jest-image-snapshot receives a new image, it compares the two pixel by pixel using blink-diff, a library that can compare two PNGs.


Let's work out what it takes to compare two 800x600 images. One such image is 480 thousand pixels; two images, twice as much. Each pixel is 4 bytes (RGB plus alpha). That's about 2 MB per image, so to answer whether the images are equal we need to allocate about 4 MB of memory. For 300 images that means allocating about 4 MB three hundred times, and then freeing it.


As you know, that's a blow to the garbage collector, which is not great. On top of that we have to walk the arrays and compare them element by element, so the pixel count matters too: half a million elements in each of the two arrays have to be traversed and compared.



Here is how the image comparison libraries perform. blink-diff, which we started with, spends about 1.5-2 seconds comparing 800x600 images. There is a faster implementation, the pixelmatch library; it works about three times faster. Incidentally, jest-image-snapshot has since switched to it (though after we forked). There is also looks-same from the Gemini team, which can do extra magic but uses the same pixelmatch under the hood, so the comparison speed is similar.


How can we make it faster? An important observation helped: on repeated test runs (which is exactly when comparison is needed), most of the images match. So we can start by comparing the files without decoding them. PNG compresses well, so the file size is small (for 800x600 it is usually a few tens of kilobytes, not 2 megabytes as in decoded form). And only if the files are not byte-equal do we compare pixel by pixel.


How to quickly compare two files? Of course, a hash!



We use Node.js's built-in crypto module, which can compute hashes. We write a small function that computes md5 or sha1 (it doesn't matter which), feed it the binary data and get the hash as hex, for example.
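A minimal sketch of that function:

```js
const crypto = require('crypto');

// hash a buffer's contents (md5 here; sha1 would work just as well)
function hash(buffer) {
  return crypto.createHash('md5').update(buffer).digest('hex');
}

// usage idea: compare two PNG files without decoding them
// hash(fs.readFileSync('expected.png')) === hash(fs.readFileSync('actual.png'))
```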


Let's look at the timings. For 4 KB images we get roughly 58 thousand comparison operations per second, which is not bad. For 137 KB it's already only about 2 thousand. These seem like pretty good numbers.


But this is the wrong approach. Computing a hash is overkill. We are essentially comparing two buffers by computing a hash of each. And what is a buffer? An array of numbers; to compare two of them you need to walk the arrays and check whether they are equal. To compute a hash you also have to walk both arrays, and additionally perform a lot of operations to calculate the hash itself. There is a simpler way:



Buffer instances have an equals() method (there is also compare(), which returns -1, 1 or 0 and is useful for sorting). The point is that we get two buffers by reading the files and call equals(). The implementation of this method is very fast: for a 4 KB image you can do about 3 million comparison operations per second, and for a 137 KB image about 148 thousand. That is 50-70 times faster than using a hash.
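A sketch of that check (file names are placeholders):

```js
const fs = require('fs');

const expected = fs.readFileSync('expected.png');
const actual = fs.readFileSync('actual.png');

if (expected.equals(actual)) {
  // byte-identical files: nothing more to do
} else {
  // fall back to a pixel-by-pixel comparison (next section)
}
```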


Pixel by pixel comparison


So, we have answered the question of whether the files are equal, and we know how to do it very quickly. If they are not equal, we need to compare pixel by pixel. There are libraries for this, but why not write the comparison ourselves? We had seen the libraries' numbers and wanted to see for ourselves how much faster it could be.



Notice that a lot of preparatory work happens here: to compare two PNGs you first have to decode them, because they store data in compressed form. The data is packed with DEFLATE (the same algorithm used by gzip), and before that clever per-line filtering is applied depending on the structure of the image being encoded.


So, the comparison function receives two buffers (actualImage and expectedImage), and the first thing it does is decode both using the fast-png library. Next it creates a Uint32Array over each of the resulting buffers. It may look like an extra copy of the memory is made, but it isn't: when a typed array is created over an existing buffer, no copy is made; it uses the same piece of memory as the given buffer but provides a different data access interface. In this case actual.data, for example, holds a byte array, and creating a Uint32Array over it gives four-byte addressing into the same memory. This lets us compare much faster, not one byte at a time but four at once. Those four bytes are exactly our pixels: we walk the array, check whether the pixels are equal, and count how many mismatches occurred.


To be able to calculate the percentage of mismatched pixels later, we also return the width and height of the decoded image (the ratio is count / (width * height)).
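A sketch of such a comparison, assuming fast-png decodes to 8-bit RGBA and both images have the same dimensions (800x600 in our case):

```js
const { decode } = require('fast-png');

function compareImages(actualImage, expectedImage) {
  const actual = decode(actualImage);
  const expected = decode(expectedImage);

  // same underlying memory, viewed 4 bytes (one RGBA pixel) at a time
  const actualPixels = new Uint32Array(
    actual.data.buffer, actual.data.byteOffset, actual.data.byteLength >> 2
  );
  const expectedPixels = new Uint32Array(
    expected.data.buffer, expected.data.byteOffset, expected.data.byteLength >> 2
  );

  let count = 0;
  for (let i = 0; i < actualPixels.length; i++) {
    if (actualPixels[i] !== expectedPixels[i]) {
      count++;
    }
  }

  return {
    count,
    width: actual.width,
    height: actual.height,
    diffRatio: count / (actual.width * actual.height)
  };
}
```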


Let's measure the comparison time for two images when the differing pixels have to be counted. For 800x600 it is about 100 ms, several times faster than the libraries we could have used for this.


Diff Image Generation


The next question: how do we see the difference with the naked eye? How does that work at all?


For this, an additional buffer is created to store the difference between the two images. All matching pixels are desaturated and lightened, so that a hint of the original image remains but doesn't get in the way of seeing the difference. For each mismatched pixel we write a red pixel into the diff buffer.



This function is similar to the previous one, with a few new lines. First, a new buffer is allocated with alloc(), the same size as the current image (the images are equal in length), and a Uint32Array is created over it. Two new branches appear in the loop. When the pixels are not equal, we simply write a red pixel: four bytes, where the highest byte is alpha, which we make fully opaque, then blue, then green, and red in the lowest byte; the result is a completely red pixel. Otherwise we desaturate and lighten the pixel. In the end we additionally return all three buffers.


How do we lighten the matching pixels?



We take a pixel's three components (red, green, blue), add them up and divide by the maximum possible sum (three times 255). That gives the gray brightness. Then we map this brightness into the upper quarter of the range: the image becomes lower-contrast and the differences stand out better. We take the resulting gray value (a shade of gray) and write it back into the pixel, keeping the alpha of the original pixel.
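Continuing the sketch above (actualPixels and expectedPixels are the Uint32Array views; the bit layout assumes a little-endian platform, where an RGBA byte sequence reads as 0xAABBGGRR):

```js
function buildDiff(actualPixels, expectedPixels) {
  const diff = Buffer.alloc(actualPixels.length * 4);
  const diffPixels = new Uint32Array(diff.buffer, diff.byteOffset, actualPixels.length);

  for (let i = 0; i < actualPixels.length; i++) {
    if (actualPixels[i] !== expectedPixels[i]) {
      diffPixels[i] = 0xFF0000FF; // alpha FF, blue 00, green 00, red FF: opaque red
    } else {
      diffPixels[i] = lighten(actualPixels[i]);
    }
  }

  return diff;
}

function lighten(pixel) {
  const r = pixel & 0xFF;
  const g = (pixel >> 8) & 0xFF;
  const b = (pixel >> 16) & 0xFF;
  const alpha = pixel & 0xFF000000; // keep the original transparency

  // average brightness, squeezed into the upper quarter of the range
  const gray = 192 + Math.round((r + g + b) / (3 * 255) * 63);

  return alpha | (gray << 16) | (gray << 8) | gray;
}
```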


So we end up with three images: the reference, the new screenshot and the difference between them. Now we need to compose them to see what it was, what it became, and how they differ.



We glue the three buffers (actual, expected and diff) with Buffer.concat(), a method that takes an array of buffers and returns a new merged buffer. Then we use the same fast-png library, but this time for PNG encoding. We pass it the combined buffer, the original width, and three times the height, because there will be three images stacked in a column.
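A sketch of that composition. fast-png's encode() takes an object with width, height and raw pixel data; the exact options may differ between versions of the library.

```js
const { encode } = require('fast-png');

function composeDiffImage(expected, diff, actual) {
  // expected.data and actual.data are the decoded RGBA byte arrays,
  // diff is the Buffer built in the previous sketch
  const combined = Buffer.concat([expected.data, diff, actual.data]);

  return encode({
    width: expected.width,
    height: expected.height * 3, // three images of the same size, stacked vertically
    data: combined,
    channels: 4
  });
}
```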



The output is the reference image on top, the new one at the bottom (where I added my hand), and the diff in the middle: washed-out colors, a gray picture. What doesn't match? The red hand.


What about the numbers? Encoding takes a bit longer than decoding, so it comes out to about 250 ms for two images (total time, that is, comparing plus generating the diff image). Acceptable, and faster than the other libraries.


A few thoughts on the implementation. The described solution is written in JS; it does PNG decoding and encoding, DEFLATE unpacking and packing and other operations, all of it work with buffers and bytes. Today there is WebAssembly and other technologies well suited to this that could speed such operations up. In theory it is quite possible to build a library that does the same thing, but faster.


Another observation: we generate the diff image so the developer can see the difference. But in fact we don't need the diff! Many tools that work with Git have built-in image comparison, and it is more capable than three images in a row. GitHub has this feature, for example, as do many Git clients. We use Bitbucket (Stash), where it looks like this:



With these tools you can see what changed in an image. So why generate a diff at all? We create it only when the developer explicitly asks for it, and that rarely happens.


Jest, by the way, has a special expand mode. In this mode, when two values don't match, we should be given the most complete information about the difference. By default Jest shows a minimum of information, enough to get a basic idea of what's wrong with the test. If that's not enough, you run it with the --expand flag and get more detail. Our fork does exactly that: if expand is not set, we simply report how many pixels did not match, and if it is, we additionally generate the diff image.


Even faster?


Can we go even faster? It would seem everything is already fine. But we still request a new screenshot every time, which means a new request to our service, or starting a browser and running the code there. Yet in most cases the same code produces the same screenshot. Why generate a new screenshot if the code the previous screenshot was generated from matches what we have now? In that case, in theory, there is no need to hit the screenshot service at all.


When testing with screenshots, the first thing we do is generate the component's HTML with all its styles and dependencies, that is, the code the screenshot will be taken from. After that we have:


  • the reference image (screenshot);
  • the code from which this reference image was generated;
  • the new code to test.

First of all we compare the old and new code. If they match, we don't need to ask the service for a screenshot; we use the reference image as if we had just received it from the screenshot service. Only if there is no image, or the code differs in some way, do we make a request to the service.


Every time we run the tests, regardless of whether the images (reference and new) match, we save the current code to a file (unless CI mode is on, of course). That is, we save the code the image was produced from, and this code is used on the next test run to decide whether to send a request to the screenshot service. As a result we take the load off the service by not making unnecessary requests.
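A sketch of that shortcut. The file naming and the process.env.CI check are assumptions (the real plugin respects Jest's own modes), and screenshotFromService() is the helper from the earlier sketch.

```js
const fs = require('fs');

async function getScreenshot(html, snapshotPath) {
  const codePath = snapshotPath + '.html'; // the code is stored next to the image

  const hasSameCode =
    fs.existsSync(snapshotPath) &&
    fs.existsSync(codePath) &&
    fs.readFileSync(codePath, 'utf8') === html;

  if (hasSameCode) {
    // same code as last time: reuse the reference image, no request needed
    return fs.readFileSync(snapshotPath);
  }

  const image = await screenshotFromService(html);

  if (!process.env.CI) {
    // remember the code this image was generated from, for the next run
    fs.writeFileSync(codePath, html);
  }

  return image;
}
```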


But the question is: where do we store this code? At first I had a crazy idea: in the image itself. Why not? I knew you can store arbitrary data in a PNG, and I wanted to see how easy it is. After all, we had conquered GIF, why not play with PNG?


We open the wiki and read about the PNG format, and look at how several libraries work with PNG. It turns out to be simpler than GIF: while GIF has sections and blocks of different lengths and formats, in PNG all chunks have the same shape and consist of four parts: the chunk length, the chunk type, the data itself and a checksum.


To add code (text) to a PNG, you add a new chunk to the file, and we can both write such a chunk and read it back. Here is the solution. How does it work? When we take a new screenshot, we take the current reference, read the chunk with the code from it and compare it with what we have just generated. If they match, we use the image as is.
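Reading such a chunk back is straightforward; a sketch of walking the chunks to find a tEXt entry is below (writing one additionally requires computing a CRC32 over the type and data, omitted here):

```js
// Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
function findTextChunk(png, keyword) {
  let offset = 8; // skip the 8-byte PNG signature

  while (offset < png.length) {
    const length = png.readUInt32BE(offset);
    const type = png.toString('ascii', offset + 4, offset + 8);
    const data = png.slice(offset + 8, offset + 8 + length);

    if (type === 'tEXt') {
      // tEXt data is: keyword, a zero byte, then the text itself
      const zero = data.indexOf(0);
      if (data.toString('latin1', 0, zero) === keyword) {
        return data.toString('latin1', zero + 1);
      }
    }

    offset += 12 + length; // length + type + data + CRC
  }

  return null;
}
```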


It worked, and it works really nicely. But there is a problem: the human factor. Since we store the code inside the image, it affects the file's bytes. At the same time the reference code and the new code can produce pixel-identical images (for example, when a code change doesn't affect the visual part). You end up with a strange situation where the file has changed (its bytes), but the comparison tools show no visual differences: all the pixels are the same. So you can store the code in the file itself (and I think it's a neat solution), but it doesn't work well from the point of view of human perception. As a result, we store the screenshot's code in a separate file next to the image.


When we added this signature check (the code a screenshot is generated from) to avoid unnecessary requests to the service, test time dropped significantly. We used to wait 45-50 seconds for all unit tests to pass; now the tests pass in 12-15 seconds locally. That is the time for a full check of roughly 300 fairly large images (800x600). A bonus was the reduced load on the screenshot service.


Imagine: before, when someone ran the unit tests, the service received about 300 requests within a few seconds. If several colleagues ran tests at the same moment, the service got rather sad (generating a screenshot is not a cheap operation). Now things are much better in terms of service load.


Storing images in Git


Another topic that few people think about, because it is a bit outside the front end, is storing images in Git. We have to store the reference images in Git itself, next to the tests, so there is something to compare against.


The problem is that Git, like many VCSs, is mainly designed for text files. It doesn't store every full version of a text file; instead it stores deltas (patches against the original file). For binary data this doesn't work: Git can't build a delta for binary files, so it stores every version of a binary file in full. This greatly inflates the repository history.


Git LFS, an extension for Git, helps solve this problem. It is quite mature and supported by most Git tooling. Here's how it works:



With the right settings, for a binary file Git itself stores a small text file, usually three lines: the Git LFS version, the file's hash and its size. The images themselves live in a special storage for binary data. When you pull the repository locally, the images are downloaded and saved to disk as real files. When you push, the image itself is uploaded to the LFS storage, and only its hash and size are recorded in Git. In other words, git push and pull work as usual and do all the magic.


CI, or beyond the frontend


Another unexpected moment: when we got all of this down to 12-15 seconds locally, I assumed everything on CI had become faster too. But no. On CI the time dropped, but not dramatically: the step went from 14 seconds to 4. That in itself is cool: in that time 700+ snapshots are checked, 300+ of them images. But if you look at the full time of the job, all the checks took 3.5 minutes. A reason for sadness, it would seem. One could wait for someone else to make it better, or find a good devops engineer to sort it out, but I thought: why not try it myself? So much had already been done, why not poke around in TeamCity and see what the problem is.


A lot of interesting things happen inside CI: the environment is prepared, git checkout runs, dependencies are installed, then unit tests, eslint, stylelint and so on are launched. All of that runs every time. A cold run took 3.5 minutes in our case. But if you dig into the settings, flip the right switches and configure the caches (and there are a lot of them), it turns out you can speed things up a great deal. With a number of settings tuned, the time dropped to 30 seconds or even less.


Results


As a result, all unit tests of the component library now run locally in 12-15 seconds. That time includes checking 300 fairly large screenshots (800x600). All checks on CI pass in 20-30 seconds. Time matters: when we push to the repository, we have to wait for all the checks to pass before the branch can be merged. If the checks take 3.5 minutes, we have to wait, and it slows us down. 20-30 seconds is quite acceptable.


Our Jest plugin ended up solving many problems, and everything works correctly: the counters are counted properly, the modes are respected, expand is supported, it is fast, and signatures (the code a screenshot is generated from) are supported. The plan is most likely to try to contribute this to Jest itself. Plan B, if that doesn't work out, is to either merge it into jest-image-snapshot or publish it as a standalone package, whichever turns out easier.


Everything else is "in-house solutions" that we haven't yet decided whether to publish. Once we have battle-tested them and are sure everything works correctly and stably, we'll think about open-sourcing them.


Summary


In the end we wrote several plugins for Babel, CSS Modules and Jest. We got a solution that is completely under our control: we know how it works, and we can tune and extend it however our tasks and requirements demand. And, most importantly, unit test time was not affected by adding screenshot verification.



Two resulting numbers: on the left, about 11 seconds without screenshots; on the right, with screenshots. The difference is within the margin of error. 328 images were checked: they are up to date and there are no differences.


The main bonus for us: while working on this task we realized that screenshots can be used for more than testing. There are interesting cases we will keep developing: use in documentation, in tooling and so on. But that's another story; if something comes of it, we'll talk about it.


What have we learned? How Jest works inside, how the GIF and PNG formats are structured and what can be done with such images. We got to know the Buffer API better, learned how TeamCity is configured, and how to make unit testing with screenshots fast enough.


My main point: unit testing with screenshots can be blazingly fast. You just have to think about it and look for options. And don't be afraid to build your own solutions; they can lead to unexpected results.


That's all. Thank you!