Side-Channel Attacks in Cryptography

Thanks to Edward Snowden, more and more people now know what the NSA is and what it does. The internal presentations that have been disclosed make it obvious that the NSA spends a great deal of effort not only on collecting traffic and planting the "right" programs in the networks of Internet providers and software giants, but also on analyzing cryptographic algorithms. A leaked 178-page document containing the national intelligence budget for 2013 shows that $11 billion was allocated to the Consolidated Cryptologic Program. What can be done with that kind of money? Surely it can be spent usefully. For example, on building a giant $2 billion computing center in Utah, right in the heart of Mormon country. The center contains 2,300 m² of server space and has its own 65-megawatt power plant and 60 thousand tons of refrigeration equipment to cool it all.

In 2012, officials hinted that the NSA had recently achieved breakthrough successes in cryptanalysis and in breaking complex systems. Is this why they needed a new data center? Cryptography guru Bruce Schneier commented on these statements and expressed the opinion that the NSA is unlikely to be able to crack any modern strong cipher, such as AES, in the near future. He then suggested that the NSA would focus its efforts not on "honest" breaking of the algorithms themselves, but on finding vulnerabilities in their implementations. Bruce identified several areas where success can be achieved:
  • an attack on the key generation procedure, by breaking the random number generators it uses
  • an attack on a weak link in the data transmission chain (for example, the channel itself is well protected, but a network switch is not)
  • an attack on ciphers with weak keys, which still survive in places due to system administrators' oversight (RSA with a 1024-bit key is a good candidate)
  • side-channel attacks

Let's try to figure out what side-channel attacks are.

A great many works and studies have been devoted to analyzing the security of various cryptosystems. Here "cryptosystem" is understood very broadly: it can be a particular encryption protocol, a hardware device, or a whole software and hardware complex with servers, subscribers, and so on. First of all, what is analyzed is the system's ability to withstand a particular kind of attacker (in the cryptographic literature usually named Eve or Mallory) who has access to a certain set of knowledge and tools. The attacker uses them to break into the system: to compute the key, read encrypted messages, or forge data or digital signatures. If a potential attacker cannot reach the system's secrets using plausible computing power in a reasonable time, the system is considered secure. It is considered good form in cryptography not to invent your own algorithm but to use proven, time-tested methods, since they have been carefully studied and their strength is usually backed by mathematics. The question arises: if my device uses an algorithm based, say, on some proven theorem, can I rest easy (at least until quantum computers arrive)? It turns out, no.

Traditional security-analysis models treat the object under study as a black box that performs some operation, for example encryption: plaintext goes in, ciphertext comes out. A key is stored inside this box (it may be set from outside, be constant, or be generated per session; it does not matter). The main thing is that the key is not accessible to the outside world, and Eve has to make do with the standard hacker's arsenal: intercepting data at the input and output, the ability to feed in arbitrary data in large quantities, exact knowledge of the algorithm itself and its parameters, and so on.

Electronic devices, and the programs running on them, interact with the outside world through electromagnetic radiation, power consumption, and all sorts of other measurable manifestations, and these can give an attacker a lot of useful information. Such manifestations are called side channels (the established term in the English literature is "side channel"). For a hacker this information can be worth its weight in gold, since it may spare him the unpleasant prospect of exhaustive search. By analogy with a door lock: why would a burglar fiddle with lock picks for hours if it is enough to unscrew the bolts on which the door hangs?

The origins


The first serious mention of the "benefits" of side channels can be found in the autobiography of Peter Wright, an MI5 officer, which describes an attempt in the 1960s to break the cipher of a rotor machine installed at the Egyptian embassy in London. Computing power was plainly lacking in those days, so the author suggested planting a microphone and recording the clicks the machine emitted while a technician set it up every morning. In this way it was possible to deduce the position of two of the three wheels, which in the end helped break the cipher.

Side channels began to be studied in earnest in 1995, after Paul Kocher published research in which he demonstrated a statistical relationship between the value of a secret key and the time a crypto device spends on the exponentiation operation. Since then it has been clear that implementing cryptography on a particular hardware device no longer looks so bulletproof. Cryptoprocessors occupy an important place in securing various systems. They are optimized for crypto operations, execute their own isolated code that does not interact with the hostile environment, and store the secret key in their own memory, which often has protection: when an unauthorized read is detected, the key material is destroyed. With the development of side-channel analysis, cryptoprocessors no longer seem so resistant, for at least two reasons:
  • they are widely available (chip-and-PIN bank cards, SIM cards, DRM chips in tablets and laptops, tokens - we encounter all of these every day)
  • mounting a side-channel attack requires neither gigantic computing power nor ultra-expensive equipment. For example, a digital oscilloscope with 100 MHz bandwidth costs around $2,000


Attack Classification


Side-channel attacks are usually classified in the literature according to several independent criteria.
1. By interference

  • passive attack: the attacker does not interfere with the operation of the device and merely observes, collecting information. The device under study works exactly as if no attack were taking place
  • active attack: the attacker interferes with the operation of the device, acting on its external or internal interfaces, or at the very least powers the device on and initiates its operation

2. By degree of intrusion

  • invasive attack: direct manipulation of the device's internals. This ranges from simple electrical measurements on conductors to quite hardcore methods, such as creating conductive structures by ion-beam irradiation, laser cutting of the die, or deliberately injecting faults to influence instruction execution
  • local non-invasive attack: the attackers measure everything that can be measured, but do not disturb the normal operation of the device. Usually they measure power consumption and running time, and also read signals on accessible conductors
  • remote attack: similar to a non-invasive attack, but with no physical access to the device, which greatly narrows the set of measurable parameters. On the other hand, the target of such an attack can be not only an isolated device but also a network service or simply a program

3. By the method of data analysis

  • simple analysis: a small number of measurements is taken and each measurement is analyzed separately, trying to identify the relationship between the instructions being executed and the information leaking out. "Separately" means no attempt is made to find correlations between different measurements, or to see how a change in the input data affects the observed data. The main problem with this method is separating the actual manifestations of the secret information from useless noise.
  • differential analysis: the attacker tries to identify the relationship between the input data and the observed data. This is achieved by taking a large number of measurements and then statistically analyzing the results. The analysis helps cancel out the influence of noise, separating the wheat from the chaff. In doing so, the attacker works from some simplified model of the device of interest.



Practice - a Timing Attack on RSA Encryption


Let us describe a timing attack on RSA encryption that is popular and well studied in the literature. The attack is based on precisely measuring the time spent on crypto operations. It is assumed that the hacker has the device itself and the equipment needed to feed data to its input and take timestamps, and also knows for certain that the device uses RSA.

Timing attacks are possible in principle because the attacked device spends different amounts of time on crypto operations, depending both on the input data (be it ciphertext or plaintext) and on the key. This difference is amplified by optimization: the processor "strives" to optimize as much as it can, and as a result some operations complete much faster than theory would predict.

As is well known, the RSA algorithm (and Diffie-Hellman too) is built on modular exponentiation. Let us use the following notation:

m – plaintext (from message)
c – ciphertext (from cipher)
{e, n} – public key (e from encryption – used when encrypting)
{d, n} – private key (d from decryption – used when decrypting)
w – key width in bits (from width)
d[i] – the i-th bit of the key

Then decryption looks like this:
m = c^d mod n
Here n is publicly available, since it is part of the public key, and c can be obtained by eavesdropping on the data channel. Our goal is to find d.
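To make the notation concrete, here is a toy example in Python with deliberately tiny, made-up numbers (real keys are thousands of bits long; the values below are the standard textbook illustration and have nothing to do with any real system):

import math

# Toy RSA with tiny parameters -- for illustration only, never for real use.
p, q = 61, 53
n = p * q                  # 3233, the public modulus
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # 2753, private exponent: d*e = 1 mod phi

m = 65                     # plaintext encoded as a number
c = pow(m, e, n)           # encryption: c = m^e mod n  -> 2790
assert pow(c, d, n) == m   # decryption: m = c^d mod n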

There are various algorithms for evaluating this formula, since computing it head-on is too expensive. Suppose our crypto device uses the simplest fast exponentiation algorithm, often called binary exponentiation or, in foreign sources, square-and-multiply or exponentiation by squaring. Here it is, rendered as runnable Python:
def square_and_multiply(c, d_bits, n):
    # Left-to-right binary exponentiation: computes c^d mod n,
    # where d_bits[0] is the most significant bit of the exponent d.
    s = 1
    r = 1
    for bit in d_bits:
        if bit == 1:
            r = (s * c) % n    # 1-bit: modular multiplication (slow branch)
        else:
            r = s              # 0-bit: plain assignment (fast branch)
        s = (r * r) % n        # squaring is done on every iteration
    return r
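A quick sanity check against the toy numbers above:

bits = [int(b) for b in bin(2753)[2:]]             # private exponent, MSB first
assert square_and_multiply(2790, bits, 3233) == 65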

Obviously, iterations for zero bits of the key take less time than for one bits, since in the first case the processor performs only an assignment, while in the second it also performs a modular multiplication. Let us show how to recover the entire key by measuring the execution time of the square_and_multiply function. There will be no exact formulas; we describe the general idea.
The method proposed by Kocher consists in computing the key bits iteratively: first bit 0, then bit 1, and so on, where each iteration has the following properties:
  • some bits have already been computed (or none, if this is the first iteration)
  • the remaining bits, including the bit currently under study, are unknown, but their values are uniformly distributed (if they are not, the random number generator used to produce the key is flawed and should be attacked instead)

At each iteration the hacker takes a large number of measurements, each aiming to obtain three values:
  1. the total execution time of the function
  2. the time the system spends processing the already-known key bits
  3. the time of one multiplication operation (s * c) mod n in our algorithm

Value 1 is the easiest to measure: just feed some ciphertext to the input. Values 2 and 3 are trickier, but still obtainable if you know the details of the device's physical implementation or "dissect" it a little. The time spent processing all the unknown bits that follow the bit under study is then:
  • T = T1 - T2, if the bit under study is 0
  • T = T1 - T2 - T3, if the bit under study is 1

Having taken many measurements for the bit under study and accumulated a whole heap of T values, we can compute the probabilities that this bit is 1 or 0, respectively, using conditional probability formulas.
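To give a feel for why a timing signal is recoverable at all, here is a minimal simulation; the per-operation timings and the noise level are invented, not measured from any real device. This toy only recovers the key's Hamming weight from the mean running time, while Kocher's full attack recovers individual bits via the statistics described above:

import random

# Invented timing model (arbitrary units): one squaring per bit,
# one extra multiplication per 1-bit, plus Gaussian noise.
T_SQR, T_MUL, NOISE = 5.0, 7.0, 4.0

def leak_time(d_bits):
    t = sum(T_SQR + (T_MUL if b else 0.0) for b in d_bits)
    return t + random.gauss(0, NOISE)

d_bits = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical secret key bits
samples = [leak_time(d_bits) for _ in range(20000)]
mean_t = sum(samples) / len(samples)

# Knowing w and the per-operation costs, the mean running time reveals
# how many 1-bits the key contains:
weight = round((mean_t - len(d_bits) * T_SQR) / T_MUL)
print(weight)    # -> 4, i.e. sum(d_bits)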

We have just examined the exponentiation algorithm that is simplest for a timing attack. Modern cryptosystems rarely use it, preferring more optimal methods: usually an algorithm based on the Chinese remainder theorem, Montgomery multiplication, or the Karatsuba algorithm. But even these "advanced" algorithms, applied in their pure form, are prone to timing attacks, as was proved by a successful attack on an OpenSSL server.

Attack on OpenSSL Server


Initially it was believed that timing attacks were the lot of hardware devices alone: a hacker takes, say, a smart card, wires it up with sensors and instruments, and then extracts the secret keys. But as David Brumley and Dan Boneh of Stanford University showed, software is also susceptible to timing attacks, in particular a web server accepting SSL connections with the "stock" OpenSSL library version 0.9.7. And this despite the fact that a typical server runs many parallel processes, and the network path to the server adds noise of its own. Nevertheless, three attack variants succeeded:
  1. A client attack on an Apache web server accessible over the local network. For good measure, the channel between client and server included several routers and switches
  2. An attack by one process on another, both running on the same machine
  3. An attack on a distributed key-storage system in which the private key is kept on an isolated virtual machine. Here the web server does not decrypt data itself but sends requests to the key machine

SSL/TLS is described in detail in RFCs 6101 (SSL v3.0), 2246 (TLS v1.0), 4346 (TLS v1.1), 5246 (TLS v1.2) and 6176 (prohibiting SSL v2.0), so we focus only on the part that is the target of the attack. A typical handshake includes the following steps:
  1. The client sends a ClientHello message, transmitting a 28-byte random number and a list of supported ciphers
  2. The server replies with a ServerHello message, similar to the client's
  3. The server sends a Certificate message containing its certificate
  4. The client extracts the server's public key from the certificate, encrypts a 48-byte random number with it, and sends the result to the server in a ClientKeyExchange message
  5. The server decrypts the client's random number with its private key, after which the shared master key is computed

Note that in step 4 the block is formatted according to the PKCS #1 standard (block type 2):
[ 0x00 ] [ 0x02 ] [ padding string ] [ 0x00 ] [ data ]
and only then encrypted.
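Here is a sketch of the formatting and of the check the server performs after decryption (a minimal illustration of PKCS #1 v1.5 block type 2, not a complete implementation; real code must also enforce a minimum padding length of 8 bytes):

import os

def pkcs1_v15_pad(data, k):
    # Build a type-2 block for a k-byte modulus:
    # 0x00 0x02 <random nonzero padding> 0x00 <data>
    pad = bytes(b % 255 + 1 for b in os.urandom(k - 3 - len(data)))
    return b"\x00\x02" + pad + b"\x00" + data

def looks_like_pkcs1_v15(block):
    # The very check whose visible outcome Bleichenbacher exploited.
    return block[:2] == b"\x00\x02" and b"\x00" in block[2:]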

An iteration of the attack consists in sending the server a crafted value g in place of the encrypted block. After decryption the server naturally discovers that the data is not formatted according to PKCS #1. Aborting the handshake immediately at this point would itself open a vulnerability (the attack exploiting it was shown by Daniel Bleichenbacher of Bell Laboratories), so modern SSL/TLS implementations "pretend" that there is no error and continue the handshake. As a result the client and the server compute different master keys, and the mismatch surfaces in the subsequent messages; eventually the client receives an Alert message and the connection is dropped, but that part is of little interest to us. What matters is that the time is measured from the sending of ClientKeyExchange until a response arrives from the server. Recall that N = p·q is the RSA modulus, and the private and public exponents are related by d·e ≡ 1 (mod (p-1)(q-1)). It turns out that after a series of measurements one can recover q bit by bit, and hence find the private key. Where do such miracles come from? An exact proof would require a whole pile of mathematical formulas, which is beyond the scope of this article; instead I will describe the two general principles that underlie the analysis.

To begin with, note that OpenSSL uses the binary algorithm described earlier for modular exponentiation, but with a number of optimizations. First, applying the Chinese Remainder Theorem, the problem m = c^d mod N is split into two subproblems, m1 = c1^d1 mod p and m2 = c2^d2 mod q (where c1 = c mod p, d1 = d mod (p-1), and similarly for q), and then m is obtained from m1 and m2 by a special recombination formula. Second, the modular multiplication used inside the algorithm is optimized by the Montgomery method. Its essence is to get away from reducing by the original modulus and instead reduce by a power of 2, which is much faster for the processor. Both factors are first converted into a special Montgomery form, then a fast multiplication is done, after which the result is converted back to the regular form. All would be fine, but in 2000 the German professor Werner Schindler discovered that the closer g is to a multiple of q, the more time the whole algorithm takes. Hence the first principle: by measuring the time of Montgomery multiplication, one can judge how close g is to q, 2q, 3q, and so on.
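Here is what the CRT split looks like in code: a sketch of the recombination, with no Montgomery arithmetic or blinding, using the toy numbers from before:

def rsa_decrypt_crt(c, p, q, d):
    # Two half-size exponentiations instead of one full-size one.
    dp, dq = d % (p - 1), d % (q - 1)
    m1 = pow(c % p, dp, p)
    m2 = pow(c % q, dq, q)
    h = (pow(q, -1, p) * (m1 - m2)) % p    # Garner's recombination
    return m2 + h * q

assert rsa_decrypt_crt(2790, 61, 53, 2753) == 65   # toy numbers from above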

Moving on. Plain multiplication (without modular reduction) is implemented in OpenSSL in two ways: the conventional one and the Karatsuba method. The complexity of the conventional method is O(n·m), where n and m are the sizes of the multiplicands. The Soviet mathematician A. A. Karatsuba was the first to invent a fast multiplication method, with complexity O(n^1.58). OpenSSL uses it when the multiplicands are the same size. Hence the second principle: if g is slightly less than a multiple of q, the fast method is used; if slightly more, the multiplication takes more time.
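For reference, here is a sketch of Karatsuba's trick on Python integers: three recursive multiplications of half-size numbers replace the four required by the schoolbook method (the 2**32 cutoff is an arbitrary choice for this sketch):

def karatsuba(x, y):
    if x < 2**32 or y < 2**32:
        return x * y                       # small enough: hardware multiply
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)    # split x = xh*2^n + xl
    yh, yl = y >> n, y & ((1 << n) - 1)
    a = karatsuba(xh, yh)
    b = karatsuba(xl, yl)
    mid = karatsuba(xh + xl, yh + yl) - a - b   # = xh*yl + xl*yh
    return (a << (2 * n)) + (mid << n) + b

assert karatsuba(123456789 ** 9, 987654321 ** 9) == 123456789 ** 9 * 987654321 ** 9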

The authors of this attack on OpenSSL demonstrated that about a million requests were enough to find a 1024-bit key, which took about two hours. But do not rush to panic: starting with version 0.9.7b, OpenSSL contains protection against timing attacks. The defense, known as blinding, consists in computing x = r^e · c mod N before the decryption operation itself, where r is a random number, e is the public exponent, and c is the ciphertext; x is decrypted instead of c, and the result is then corrected by the factor r. This randomizes the value actually being exponentiated and prevents key information from "leaking" into the side channel. The protection costs from 2% to 10% of performance.
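A sketch of blinding on the toy numbers (the real OpenSSL code reuses and updates the blinding factor between operations; here a fresh r per call keeps the idea visible):

import secrets
from math import gcd

def blinded_decrypt(c, d, e, N):
    r = 0
    while gcd(r, N) != 1:
        r = secrets.randbelow(N - 2) + 2
    x = (pow(r, e, N) * c) % N        # blind: x = r^e * c mod N
    y = pow(x, d, N)                  # y = x^d = r * c^d mod N
    return (y * pow(r, -1, N)) % N    # unblind: m = c^d mod N

assert blinded_decrypt(2790, 2753, 17, 3233) == 65   # toy numbers again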

Power Attacks


Let's move on to another class of side-channel attacks: power consumption attacks. These are realistically applicable only to hardware cryptographic modules, and the more isolated and narrow the module's function, the more successful the attack. Clearly, different instructions executed by the processor consume different amounts of energy, ultimately determined by the number of switching transistors. (From a physics course we all know that MOS transistors draw current at the moment of switching, while at rest their current is negligible, which cannot be said of TTL logic.) Therefore instructions, or groups of instructions, can be identified on the consumption trace. And the same Kocher showed that by analyzing the consumption traces of smart cards, one can extract the secret key.

[Figure: power consumption trace of DES encrypting one block - the initial permutation, the 16 rounds, and the final permutation are visible.]
[Figure: detailed trace of rounds 2 and 3.]
Power attacks are usually classified as simple and differential (SPA, Simple Power Analysis, and DPA, Differential Power Analysis). Traces like the ones above are analyzed in SPA attacks, where the main goal is to map parts of the trace to executed instructions or functions and somehow determine the values appearing in the device's registers. An SPA attack assumes detailed knowledge of the device's internal implementation. A DPA attack, by contrast, is based on statistical analysis of the results of multiple measurements. For example, the classic attack on DES smart cards described by Kocher looks like this (a toy statistical sketch follows the list):
  1. the power consumption of the last few rounds is recorded for 1,000 DES encryptions, and the encryption result is recorded for each of them
  2. the trace of each such encryption is divided into 100,000 equal segments, and each segment is assigned its power consumption
  3. the resulting 1,000 × 100,000 matrix is analyzed by statistical methods to find the key, which stayed unchanged throughout the measurement period. (For the exact algorithm the reader is referred to the original paper, since the format of a Habr article is poorly suited to multi-story formulas)
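To give a feel for the statistics, here is a toy difference-of-means analysis with an invented leakage model: one sample point of the trace consumes slightly more power when a secret bit is 1, and the leak is buried well below the noise, yet averaging thousands of traces makes it stand out. (In a real DPA the traces are partitioned by a selection function computed from the ciphertext and a key guess, not known in advance as here.)

import random
random.seed(1)

def trace(secret_bit, points=100, leak_at=42):
    # Gaussian noise everywhere, plus a tiny extra draw when the bit is 1.
    t = [random.gauss(10.0, 1.0) for _ in range(points)]
    if secret_bit:
        t[leak_at] += 0.3           # leakage far below the noise level
    return t

ones  = [trace(1) for _ in range(5000)]
zeros = [trace(0) for _ in range(5000)]

mean = lambda grp, i: sum(t[i] for t in grp) / len(grp)
diff = [abs(mean(ones, i) - mean(zeros, i)) for i in range(100)]
print(diff.index(max(diff)))        # -> 42: the leaking point stands out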

Countermeasures

We have examined several kinds of side-channel attacks. The world has realized that such attacks must be taken into account while a cryptosystem is still being designed. Work proceeds in several directions:
  • Hindering the reading of side-channel information. A rather limited measure, unless physical access to the device can be ruled out entirely. Otherwise it mostly just raises the cost of the data-collection equipment, and if that cost exceeds the value of the break itself, the goal is achieved
  • Injecting random noise into the leaking signals. A more effective measure, but it has its pitfalls. First, obtaining random data in a computer system is a problem in itself: where to get it, and in sufficient quantity. If the noise is not truly random, its patterns can be modeled and the noise separated from the useful signal. Second, even against "good" random noise a hacker can find an averaging algorithm, and the more data he collects, the greater his probability of success
  • Trying to make the system deterministic. For example, making all operations take the same time renders a timing attack useless (a constant-time comparison is sketched right after this list). This is also hard to achieve: processor and compiler optimizations enter the arena, along with several levels of cache memory with different access times, all of which plays into the cracker's hands
  • Designing cryptographic algorithms and protocols that are inherently resistant to side-channel leakage - the most reliable countermeasure. For example, computing a given formula with an algorithm whose execution time depends neither on the bit length of the input data, nor on the number of zeros or ones in it, nor on any other properties of the arguments
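As an illustration of the "same time regardless of data" idea from the list above, here is the classic constant-time comparison sketch (in pure Python the timing guarantees are only approximate; real implementations do this in C, and the standard library provides hmac.compare_digest for the purpose):

def ct_equal(a: bytes, b: bytes) -> bool:
    # Unlike a naive a == b, this does not exit at the first mismatch,
    # so its running time does not reveal the length of the matching prefix.
    if len(a) != len(b):
        return False
    acc = 0
    for x, y in zip(a, b):
        acc |= x ^ y                 # accumulate differences, no branching
    return acc == 0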

Side-channel attacks have been the subject of a great many scientific papers. And by no means should you think that only the older crypto algorithms - RSA, DES, Diffie-Hellman - are susceptible to them: there is already evidence of "leaks" from AES and from elliptic-curve algorithms. Covering all of this in one article is unrealistic; my goal is only to get readers interested enough to dig into this fascinating topic.

For those who want to go deeper