Graal: how to use the new JVM JIT compiler in real life

At JBreak 2018, the main Siberian Java conference, held in Novosibirsk, Christian Thalinger from Twitter shared his practical experience of using Graal. Our company (Peter-Service) sent our entire working group to the conference, and we all went to hear this talk together. That is understandable, given that Graal is still considered a bold and potentially dangerous experiment (although it is very likely to make it into JDK 10). It was very interesting to learn how this new product behaves in battle - and not just anywhere, but in development at that scale.

Christian Thalinger has been working on Java virtual machines for more than a dozen years, and JIT compilers are the core of his expertise. It was Christian who introduced Graal at Twitter and initiated its current (and, according to Chris, quite active) use in the company's production environment. According to Thalinger, this innovation saves the company a decent amount of money by reducing hardware costs.

In this interview with the JBreak organizers, Christian lucidly explains the basics - what Graal is and how to handle it. The talk in Novosibirsk, however, was more practice-oriented: its main goal was to show the audience how to start working with Graal simply and painlessly, and why it is worth trying.

To begin with, a little theory after all. What is a JIT (just-in-time) compiler? Running a Java program takes several steps: first the source code is compiled into instructions for the JVM - bytecode - and then this bytecode is run on the JVM. Here the JVM acts as an interpreter. The JIT compiler was created to speed up Java applications: it optimizes the running bytecode by translating it into low-level machine instructions while the program is executing.

HotSpot / OpenJDK uses two tiers of JIT compilation, both implemented in C++: C1 and C2 (also known as the client and server compilers). By default they work together: code is first optimized quickly but superficially by C1, and then the hottest methods are optimized further by C2.
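To see the JIT at work on your own machine, a small sketch like the one below may help. It is only an illustration, not something shown in the talk: the class and method names are made up, and the compiler name printed depends on the JVM you run it on. It calls a small method often enough to make it hot, and then asks the standard CompilationMXBean which JIT compiler is active and how much time it has spent compiling.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class: warms up one method and reports JIT statistics.
final class JitInfo {

    // A deliberately simple method that becomes "hot" when called many times.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long checksum = 0;
        // Enough calls for the JVM to consider the method worth compiling.
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }

        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // The bean is absent when the JVM has no compilation system.
            System.out.println("No JIT compiler available");
            return;
        }
        System.out.println("JIT compiler: " + jit.getName());
        if (jit.isCompilationTimeMonitoringSupported()) {
            System.out.println("Total compilation time, ms: " + jit.getTotalCompilationTime());
        }
        // Print the checksum so the loop above cannot be discarded as dead code.
        System.out.println("checksum = " + checksum);
    }
}
```

Running the same class with the HotSpot flag -XX:+PrintCompilation additionally prints each method as it is compiled, which makes the hand-over between the compilation tiers visible.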

In Java 9, JEP-243 introduced a mechanism for plugging a compiler written in Java into the JVM as a dynamic compiler - JVMCI (Java Virtual Machine Compiler Interface). This is the mechanism that supports Graal. It should be said that Graal was already available in Java 9 as part of JEP-295 - AOT (ahead-of-time) compilation in the JVM. However, even though the AOT compilation machinery uses Graal as its compiler, that JEP states that the initial integration of the Graal code into the JDK targets only the Linux / x64 platform.

Thus, to try Graal, you need a JDK with AOT and JVMCI. Moreover, if you need to run on macOS or Windows, you will have to wait for the release of Java 10 (the corresponding ticket, JDK-8172670, has its fix version set to 10).

Here Christian drew the audience's attention to the fact that the Graal version in current JDK distributions is, to put it mildly, outdated (about a year old, if not older). But this is where Java 9 modularity comes to our aid. Thanks to it, we can build the latest version from the Graal sources and plug it into the JVM using the --upgrade-module-path option. Since Graal development started long before the module system existed, it is built with a special tool, mx, which to some extent mirrors the Java module system. The tool runs on Python 2.7; all the links can be found in the Graal repository on GitHub.
That is, we first download and install mx, then download Graal and build it into a module with mx; that module then replaces the original one in the JDK.

At first glance these manipulations may seem complicated and time-consuming, but in reality it is not as scary as it looks. And in principle, being able to replace the Graal version without waiting for a JDK patch release, or even for a new JDK, seems more than convenient to me personally. At least, Christian showed how he built all of this himself, live, on machines in the cloud. True, an error occurred while building Truffle - some additional dependencies had to be installed on the machine. But Graal itself built correctly and was then used in that form (from which we conclude that Truffle can be ignored entirely here: the Graal compiler does not depend on it).

Next: in order for the JVM to start using Graal, you need to set 3 additional flags:

-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+EnableJVMCI
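To make sure the flags really reached the JVM (for example, when they are passed through a start script or a container entry point), something like the following sketch can be used. It assumes that all three options are switched on, that is, each one is written with a plus sign. The class name and the helper method are hypothetical; the only real API used is RuntimeMXBean.getInputArguments(). Note that the check only shows what was requested on the command line, not that Graal has actually compiled anything.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: checks whether the JVM was started with the Graal JIT flags.
final class GraalFlagCheck {

    // The three options needed to select Graal as the JIT compiler via JVMCI.
    static final List<String> REQUIRED = List.of(
            "-XX:+UnlockExperimentalVMOptions",
            "-XX:+EnableJVMCI",
            "-XX:+UseJVMCICompiler");

    // True only if every required option is present among the given JVM arguments.
    static boolean graalRequested(List<String> jvmArgs) {
        return jvmArgs.containsAll(REQUIRED);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the main method).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Graal JIT requested: " + graalRequested(jvmArgs));
    }
}
```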

Since Graal is essentially an ordinary Java application, it also has to be compiled and warmed up itself (so-called bootstrapping). In "on-demand" mode this happens in parallel with application startup, and in that case Graal's own code is optimized by C1.

There is also the option of explicitly running the initialization before the application starts, and in this scenario you can even tell Graal to optimize itself. However, this usually takes much longer and brings no significant benefit. Graal takes a little longer to initialize than C1 / C2 and uses spare CPU capacity more actively, because it has to compile more classes. But these differences are not that large; in practice they even out and get lost in the general noise of application initialization.
In addition, since Graal is written in Java, it uses the heap for its initialization (C1 / C2 also use memory, only they obtain it through malloc). Most of the memory is consumed at application startup. Both Graal and C1 / C2 use idle cores for compilation. Graal's memory consumption can be monitored by enabling GC logging (at the moment the heap Graal uses for initialization is not isolated from the application's main heap).
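Besides GC logs, heap usage can also be read from inside the process. The sketch below is an assumption about how one might do that, not the approach shown in the talk: it uses the standard MemoryMXBean to print a snapshot of the shared heap, so the figures include whatever the compiler has allocated there along with the application's own objects. The class and method names are invented for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints a one-line snapshot of current heap usage.
final class HeapSnapshot {

    private static final long MB = 1024L * 1024L;

    // Formats used and committed heap sizes in whole megabytes.
    static String describe(MemoryUsage heap) {
        return "heap used=" + heap.getUsed() / MB + " MB, committed="
                + heap.getCommitted() / MB + " MB";
    }

    // Reads the current heap usage of the running JVM.
    static MemoryUsage currentHeap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        System.out.println(describe(currentHeap()));
    }
}
```

Calling describe(currentHeap()) right after startup and again once the application has warmed up gives a rough picture of where the memory goes. For the GC log itself, Java 9 and later use the unified logging option -Xlog:gc.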

Well, we learned how to set it all up - it's time to understand why. What are the benefits of using Graal?

Christian answered this question with a practical example. He ran a couple of benchmarks from a project written in Scala: one was CPU-intensive, and the other worked more heavily with memory. On the CPU-bound benchmark Graal was noticeably slower, by about a second on average, because of its longer startup (the benchmark itself takes about 5 seconds to complete). But on the second benchmark Graal showed quite a good result - ~20 seconds against ~28 with C1 / C2. And that is despite the fact that, as Christian noted, Graal does not do as well with Scala code as it could (because of the dynamic structure of the bytecode Scala generates). So we can hope that for a pure Java application things will be even better.

On top of that, the GC logs showed that with Graal the application performs far fewer garbage collections (about half as many). This is due to a more effective escape analysis, which reduces the number of objects allocated on the heap.
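A rough way to observe this effect yourself is sketched below. This is an illustrative example rather than the benchmark from the talk, and all names in it are made up. The temporary Point object never leaves the method that creates it, which makes it a typical candidate for escape analysis: the compiler may replace the object with two local variables and skip the heap allocation altogether. Whether that actually happens depends on the JIT compiler and its settings, so the number of collections printed will differ from run to run and from JVM to JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo: allocation-heavy code that escape analysis can optimize.
final class EscapeDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Each Point is used only inside the loop body and never escapes this method.
    static long manhattanSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, n - i);
            sum += p.x + p.y; // always equals n
        }
        return sum;
    }

    // Sums collection counts over all collectors; -1 means "undefined" and is skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        long checksum = 0;
        for (int round = 0; round < 5_000; round++) {
            checksum += manhattanSum(10_000);
        }
        long after = totalGcCount();
        System.out.println("checksum = " + checksum);
        System.out.println("garbage collections during the run: " + (after - before));
    }
}
```

Running it once with the default C1 / C2 setup and once with Graal enabled, and comparing the collection counts, gives a first impression, though a proper comparison would need a real benchmark harness.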

Summing up my personal impressions, I would say the talk seemed quite comprehensive to me and did not carry any advertising message in the spirit of "everyone switch to Graal immediately". It is clear that there is no magic pill and that everything is always determined by the real application - Christian himself admits that the specific numbers, of course, depend on the specific benchmarks. Anyone who decides to try Graal will in any case have to work by trial and error, run things and (probably) find bugs (and, better still, fix them and submit pull requests to the Graal repo).

But overall, with the current trend towards microservices and stateless applications - and, as a result, towards a more active (and proper) use of the young generation - Graal looks very good.

So, if a project can be moved to Java 9 with little effort (or is written on it from scratch), I would definitely try Graal. I, for one, was even pleased that the talk focused specifically on Graal as a JIT compiler, because by and large that is the capacity in which an ordinary Java developer needs it (that is, without Truffle and the rest of GraalVM, which Oracle has recently assembled into a development framework and runtime for various languages on top of the JVM). It would be interesting to measure the memory cost and see how noticeable the difference between the standard C1 / C2 and Graal is. On the other hand, given that applications are allocated a fairly decent amount of memory nowadays, and that most of the compiler's memory is consumed at startup (and today that usually means the initialization and start of the container that in turn launches the application), these numbers are apparently not that significant in any case.

The slides from the talk can be downloaded here.

To tell the truth, the idea got me so interested that I plan to repeat all the steps Christian performed, but running Java benchmark suites directly (for example, DaCapo and SPECjvm2008 - I am not that experienced in Java benchmarking, so I would be grateful if someone suggested more suitable options in the comments or by private message). And, closer to what we do at work, I will try to put together a simple web application (for example, Spring Boot + Jetty + PostgreSQL), put it under load and compare the numbers. I promise to share the results with the community.