And how many drones give milk? Reverse engineering techniques in sound design

This article is an attempt to find out whether the experience of reverse engineering in sound design can be conveyed through the expressive means of the Russian language.

Once, while reading music forums, I came across a thread discussing the sound design of the movie Oblivion. People wanted to know how the drone sounds were made. A few weeks later there were still essentially no answers, and the official video did not cover the topic, so I decided to look for the answer myself using reverse engineering methods.

As a reference I chose the first scene with the drone (at the 12th minute of the film), which can be viewed on YouTube. After several hours of work, I got the following result:


The entire development cycle can be divided into the following stages:

  1. Search for any additional information on the topic
    • videos from the studios involved in the film's sound work
    • interviews with the creators
    • discussions on specialized forums (in case someone has already figured everything out and we are reinventing the wheel)
    • related technical articles
  2. Analysis of the original
    • visual analysis of waveforms, spectrograms, etc.
    • making a list of all sounds used in the scene
    • description of each sound in technical terms (timbre, spectrum, type of synthesis, layers, articulation)
    • compilation of a list of associations for each sound (objects, emotions)
    • grouping related sounds (to avoid repeated actions)
  3. Choosing the right tools
    • spectrum analyzer
    • audio editor
    • synthesizer
  4. Synthesis
    • synthesis itself
    • additional sound design

Analysis of the original

First of all, I used ffmpeg to cut a 30-second reference scene from the movie and save it as an audio file, which I imported into my main host for convenient A/B comparison along the way. Then, using SoX, I made large-format (2000x2000 pixels) spectrograms of each audio channel. Although I do most of my spectral work in Adobe Audition, which has its own spectral editor, SoX spectrograms quickly give an idea of the sound picture as a whole and of what fills each of the six channels of the 5.1 mix.

Spectrogram of the original 5.1 scene sound in Adobe Audition

Since the episode I selected is quite static, the main sounds sit in the center channel, which the spectrogram confirms. This greatly simplifies further work. Using ffmpeg, I exported the center channel and opened it in an audio editor.
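The two ffmpeg steps described above could be scripted roughly as follows. This is a minimal sketch, not the author's actual commands: the file names, the 00:12:00 start time, and the assumption that the audio track has a front-center channel labeled FC are all hypothetical. The functions only build argument lists and do not run anything.

```python
def cut_reference_cmd(src, dst, start="00:12:00", duration=30):
    """Build an ffmpeg command that saves a clip's audio as 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-ss", start,          # seek to the scene (assumed timestamp)
        "-t", str(duration),   # read only this many seconds
        "-i", src,
        "-vn",                 # drop the video stream
        "-c:a", "pcm_s16le",   # uncompressed audio, all channels kept
        dst,
    ]


def center_channel_cmd(src, dst):
    """Build an ffmpeg command that extracts the front-center (FC) channel."""
    return [
        "ffmpeg",
        "-i", src,
        "-af", "pan=mono|c0=FC",  # route only FC into a mono output
        dst,
    ]
```

To actually run a command, pass the list to subprocess.run(cmd, check=True), for example with cut_reference_cmd("movie.mkv", "scene.wav").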

Wave and spectral modes of displaying the sound of the central channel
As a rule, the waveform view helps with the analysis of simple sounds. The main information it gives is when sounds appear and what their amplitude and duration are. For complex scenes with background noise and multilayered elements, it is better to switch straight to the spectral view.

In a nutshell, the difference between the two modes is this. In the waveform mode, sound is shown in a two-dimensional XY space, where X is the time axis and Y is the amplitude of the wave in dB. The spectral mode shows sound in a three-dimensional XYZ space, where X is time, Y is frequency in Hz, and Z is the intensity (loudness) of the signal, which is encoded by color: the louder the sound, the brighter the color.
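To make the two views concrete, here is a small sketch in plain Python. It converts a sample to dB (the Y axis of the waveform view) and computes one column of a spectrogram (frequency on Y, level as "color") with a naive DFT. The function names are made up for illustration, and real editors use a much faster FFT and their own level calibration.

```python
import cmath
import math


def amplitude_db(sample, full_scale=1.0):
    """Level of one sample in dB relative to full scale."""
    return 20.0 * math.log10(max(abs(sample), 1e-12) / full_scale)


def spectrum_column(frame, sample_rate):
    """Return (frequency_hz, level_db) pairs: one spectrogram column."""
    n = len(frame)
    # A Hann window reduces leakage between neighbouring frequency bins.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    column = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        # Levels are relative (the window lowers them), not calibrated.
        column.append((k * sample_rate / n, amplitude_db(2.0 * abs(acc) / n)))
    return column


def loudest_frequency(frame, sample_rate):
    """Frequency of the brightest cell in the column."""
    return max(spectrum_column(frame, sample_rate), key=lambda p: p[1])[0]
```

A full spectrogram is just a sequence of such columns computed for consecutive (usually overlapping) frames.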

Let's analyze the first 6 seconds of the scene. This is how its spectrum looks:
After carefully listening to the scene and examining the spectrogram, the following sound elements can be distinguished:
We break them into logical groups:
We get the following list:

  1. Dog's bark
  2. Drumroll
  3. High frequency drone activation sound
  4. The sound of the servo mechanism
    a. High-frequency noise (in addition to 5a)
    b. High-frequency noise (in addition to 5b)
    c. Servo locking sound (addition to 7)
  5. Signal
    a. “Question”
    b. "Answer"
  6. Low frequency signal (addition to 7)
  7. Siren
  8. Sound of the running engine
  9. Background broadband noise (environmental sounds, wind, sand, etc.)

This is our sound map. Let me remind you that the map != the territory. In this case, it is my subjective view of the scene's sound content. Another person's map and groups may turn out differently, and there is nothing wrong with that. What matters is that all further actions and the final result depend on how plausibly and in how much detail we draw the map.

So, the map. The dog barking and the drum roll are not related to the drone, so let's move on to item 3.
Spectrogram of the drone activation sound

The spectrogram shows that this sound starts in the range from 5000 to 10000 Hz and then moves linearly into the range from 6000 to 12000 Hz. This means we can synthesize a static sound whose spectrum matches the start of the activation sound and then use automation to smoothly raise the pitch to its state at the end of the sound. The element itself is tonal: among the noise, individual harmonic lines are visible in the spectrum. We can assume that it started as a harmonically rich signal (for example, a sawtooth wave) that was processed with a band-pass filter (with a passband of 5000-10000 Hz). Let's try to repeat this process.
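A quick check of the numbers supports the pitch-automation idea: both band edges grow by the same factor (6000/5000 = 12000/10000 = 1.2), which is what happens when every harmonic is transposed together. The sketch below computes that shift in semitones, lists which harmonics of a sawtooth land in the band, and generates a naive gliding sawtooth. The 500 Hz fundamental used in the usage note is purely an assumption for illustration; the article does not state the actual fundamental.

```python
import math


def semitones(ratio):
    """Size of a frequency ratio in equal-tempered semitones."""
    return 12.0 * math.log2(ratio)


def harmonics_in_band(f0, low, high):
    """Harmonic numbers of a tone with fundamental f0 that fall in [low, high]."""
    first = math.ceil(low / f0)
    last = math.floor(high / f0)
    return list(range(first, last + 1))


def swept_sawtooth(f_start, f_end, seconds, sample_rate=48000):
    """Naive (not band-limited) sawtooth gliding linearly from f_start to f_end."""
    total = int(seconds * sample_rate)
    phase, out = 0.0, []
    for i in range(total):
        freq = f_start + (f_end - f_start) * i / max(total - 1, 1)
        phase = (phase + freq / sample_rate) % 1.0  # phase accumulator
        out.append(2.0 * phase - 1.0)               # ramp from -1 up to 1
    return out
```

For example, semitones(6000 / 5000) is about 3.16, and with a hypothetical 500 Hz fundamental, harmonics 10 to 20 fill the 5000-10000 Hz band; raising the fundamental to 600 Hz moves the same harmonics to 6000-12000 Hz.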


The U-HE Zebra synthesizer is known among musicians and sound designers not only for its annoying appearance but also for its very flexible modular architecture and a large number of unique effects that make it possible to create sounds of almost any complexity. The well-known sound designer Howard Scarr used Zebra to create sounds for “Inception”, “The Dark Knight”, “The Dark Knight Rises” and many other films.

Synthesizer U-HE Zebra. Drone activation sound preset

The logic of the preset in the screenshot above is simple. The oscillator OSC1 generates a sawtooth wave, to which two effects are applied: Wrap (to enrich the spectrum with additional harmonics and noise) and Bandworks (a band-pass filter that removes everything from the spectrum except the 5000-10000 Hz range). The pitch of OSC1 (Tune) changes over time under the control of the MSEG1 envelope. At the end of the chain, a low-pass filter (VCF1) cuts off the frequencies above 10,000 Hz that Bandworks could not handle and also slightly compresses the sound with resonance (Res) and saturation (Drive). The whole process can be represented as a chain of modules:
OSC1 -> Wrap -> Bandworks >>> MSEG1 >>> VCF1 -> Res -> Drive >>> Envelope 1

The last module in the list is the so-called ADSR envelope, which in our case controls how the overall volume changes over time.
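For readers unfamiliar with ADSR (attack, decay, sustain, release), here is a minimal sketch of such an envelope. It uses straight-line segments and assumes the note is held longer than attack plus decay; both are simplifications chosen for illustration, and Zebra's own envelopes offer more shaping options.

```python
def adsr(t, attack, decay, sustain, release, note_length):
    """Envelope level (0..1) at time t for a note held note_length seconds.

    attack, decay and release are durations in seconds; sustain is a level.
    Assumes note_length >= attack + decay and linear segments.
    """
    if t < 0.0:
        return 0.0
    if t < attack:                      # rise from silence to full level
        return t / attack
    if t < attack + decay:              # fall from full level to sustain
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < note_length:                 # hold while the note is on
        return sustain
    if t < note_length + release:       # fade out after the note is off
        return sustain * (1.0 - (t - note_length) / release)
    return 0.0
```

Multiplying each output sample by adsr(i / sample_rate, ...) applies the envelope to the volume.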

As a result of this operation we get:
Comparison of the spectrum of the source (A) and synthesized (B) activation sounds
Download MP3 example from Google Drive
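The band-pass stage of the chain can be approximated outside Zebra with a single biquad filter. The sketch below uses the widely published Audio EQ Cookbook band-pass formulas; it is a stand-in for illustration, not a model of Bandworks, and one biquad has much gentler slopes than the sharp band edges seen in the film's spectrogram. The 48000 Hz sample rate is an assumption.

```python
import math


def bandpass_coefficients(low_hz, high_hz, sample_rate):
    """Biquad band-pass coefficients (cookbook form, 0 dB peak gain)."""
    center = math.sqrt(low_hz * high_hz)   # geometric centre of the band
    q = center / (high_hz - low_hz)        # narrower band means higher Q
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)     # feed-forward, normalized by a0
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # feedback
    return b, a


def bandpass(samples, low_hz=5000.0, high_hz=10000.0, sample_rate=48000):
    """Filter a list of samples with the band-pass biquad (direct form I)."""
    b, a = bandpass_coefficients(low_hz, high_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Comparing rms() of a filtered tone inside the band with one far below it shows the effect: the first passes almost unchanged, the second is strongly attenuated.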

Synthesis of mechanisms

The synthesis of servomechanisms is a separate topic, and I will not cover it in detail in this article, since the original scene most likely used recorded samples for these elements. I will only say that the sound of any mechanism consists of three phases: switching on, running, and switching off. The running sound is a short looped fragment that repeats until the switch-off phase begins. When the looped fragment repeats more than 20 times per second, this repetition (carrier) frequency moves into the range that humans can hear. What we hear in that situation is called a drone.

Drones include, for example, the sounds of running fans, car and machine-tool engines, drills, electric shavers, buzzing insects, and so on. Drones (like any other sounds) can be tonal (the pitch can be determined) or atonal (the pitch is difficult or impossible to determine).

In the case of the flying drone in our scene, we are dealing with an engine that is running and accelerating, that is, an atonal drone whose carrier frequency gradually rises. In the screenshot with the groups, this sound is marked with the number 8, and it is synthesized on the same principle as the previous element. In the spectrogram, we pick a place where all the harmonics are clearly visible, write down their frequencies at that moment, and recreate them with one or more synthesizer oscillators. Then we automate the pitch change to simulate acceleration. Since the engine sound does not play a significant role in our scene, I did not work it out in full detail but quickly threw together a Zebra preset to demonstrate the idea itself:

Engine sound preset
Spectrogram of the synthesized engine sound
Download MP3 sample from Google Drive
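The relationship between loop length and perceived pitch described in this section is simple arithmetic: a fragment of N samples repeated back to back at a given sample rate has a repetition frequency of sample_rate / N. The sketch below shows that calculation and a toy "accelerating engine" that replays a loop at a gradually rising rate. The sample rate, loop contents, and function names are assumptions for illustration only.

```python
def loop_fundamental_hz(loop_samples, sample_rate=48000):
    """Repetition (carrier) frequency of a loop played back to back."""
    return sample_rate / loop_samples


def is_heard_as_tone(loop_samples, sample_rate=48000, threshold_hz=20.0):
    """True when the loop repeats faster than roughly 20 times per second."""
    return loop_fundamental_hz(loop_samples, sample_rate) > threshold_hz


def accelerate(loop, start_hz, end_hz, seconds, sample_rate=48000):
    """Replay one loop cycle with a repetition rate gliding start_hz -> end_hz."""
    total = int(seconds * sample_rate)
    phase, out = 0.0, []
    for i in range(total):
        rate = start_hz + (end_hz - start_hz) * i / max(total - 1, 1)
        # Read the loop at the current position within the cycle.
        out.append(loop[int(phase * len(loop)) % len(loop)])
        phase = (phase + rate / sample_rate) % 1.0
    return out
```

For instance, a 480-sample loop at 48000 Hz repeats 100 times per second and is heard as a drone, while a 4800-sample loop repeats only 10 times per second and is heard as separate events.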

Siren synthesis

Moving on. The siren's timbre stands out from the rest of the drone's sound palette primarily because it has character. It is unlike the other cold electronic beeps and buzzes. This is a sound that clearly promises trouble to whoever hears it, as if hinting that something has gone wrong (and that it will end in disaster).

Spectrogram of a siren from a film

This sound is rich in harmonics, and the harmonics show a slight vibration with an unsteady frequency, which is typical of sounds from living nature. The siren resembles the cry of a person or an animal, and its timbre is somewhere between the sounds A, Y, S, which supports the theory that it is a live sound. At first I thought that the sound engineers who worked on the film had probably read Philip K. Dick and might have decided to use the bleating of a sheep as the source of this sound, as a kind of Easter egg. But after looking for sheep samples, I came to the conclusion that their voices are too high, so I needed a larger animal with similar vocal characteristics. The first sample of a mooing cow turned out to be exactly what I was looking for.

After applying time stretch and distortion and slightly adjusting the pitch, we get the following:
Download an MP3 example from Google Drive
Add the low-frequency signal (6) and reverb:
Download an MP3 example from Google Drive
Compare the spectrum of the siren from the movie with the final version of our cow:


Signal synthesis

All three elements (5a, 5b, and 6) are intervals played by one or three similar instruments whose timbre shows traits characteristic of FM synthesis. The sound also resembles DTMF signals. All of this can be determined by ear, without analyzing spectrograms, and so can the intervals themselves: for 5a it is a tritone up, for 5b a fifth down, and for 6 a fourth up. After some experimenting with the FM oscillator in Zebra, we fairly quickly get a similar sound.
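The intervals named above translate directly into frequency ratios. Assuming twelve-tone equal temperament (an assumption, since the film's tuning is not stated), a tritone is 6 semitones, a fifth is 7, and a fourth is 5, and each semitone multiplies the frequency by 2^(1/12). The sketch below computes the second note of each signal; the 440 Hz starting pitch in the usage note is purely hypothetical.

```python
# Signed interval sizes in semitones for the three signal elements.
INTERVALS = {
    "5a": +6,   # tritone up
    "5b": -7,   # fifth down
    "6": +5,    # fourth up
}


def interval_ratio(semitones):
    """Frequency ratio of an interval in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def second_note_hz(first_hz, semitones):
    """Frequency of the second note given the first note and the interval."""
    return first_hz * interval_ratio(semitones)
```

With a first note of 440 Hz, second_note_hz(440.0, INTERVALS["5a"]) is about 622.25 Hz, the "5b" answer lands near 293.66 Hz, and element 6 near 587.33 Hz.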


Download MP3 example from Google Drive

The oscillator OSC1 generates a sine wave that sets the pitch. OSC2 and the FM oscillator FMO1 are dissonant with each other and with OSC1 (that is, their frequencies are not multiples of the OSC1 frequency), which produces this tense sound, something like a siren or a police horn.
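Why non-multiple frequencies sound tense can be seen from the textbook two-operator FM model, in which the partials appear at the carrier frequency plus and minus whole multiples of the modulator frequency. The sketch below is that generic model, not a reconstruction of the Zebra preset; the frequencies in the usage note are arbitrary examples.

```python
import math


def fm_sample(t, carrier_hz, modulator_hz, index):
    """One sample of simple two-operator FM at time t (seconds)."""
    return math.sin(2 * math.pi * carrier_hz * t
                    + index * math.sin(2 * math.pi * modulator_hz * t))


def sideband_frequencies(carrier_hz, modulator_hz, order=3):
    """Partials at carrier +/- k * modulator; negative ones fold back."""
    freqs = {abs(carrier_hz + k * modulator_hz)
             for k in range(-order, order + 1)}
    return sorted(freqs)


def is_harmonic(freqs, base_hz, tolerance=1e-6):
    """True when every partial is a whole multiple of base_hz."""
    return all(abs(f / base_hz - round(f / base_hz)) < tolerance
               for f in freqs)
```

With a 400 Hz carrier and a 400 Hz modulator, all partials are multiples of 400 Hz and the tone sounds clean; with a 565 Hz modulator the partials (165, 400, 730, 965, 1530 Hz for order 2) no longer form a harmonic series, which is heard as roughness.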


Background noise is probably the component of the sound picture that most people underestimate the most. The same is true of backgrounds in every other area of life, though. In its day, the emergence of landscape as a genre was a revolution in painting: paintings without a main character blew the minds of ordinary viewers. Today everyone knows what the Mona Lisa looks like, but not everyone can remember what is shown behind her, whether she is sitting by an open window or perhaps standing in an open field. Nevertheless, if the background is removed completely, it is immediately noticeable. It is the same with the sound background. Without it, the scene loses its realism, atmosphere, and meaning. Sound events happen ... "nowhere." So, to bring the sound of our scene to life, I picked an emotionally suitable background for it.

The final version with all the whistles and hums:

Download MP3 example from Google Drive


When I conceived this article, I planned to describe the process of creating all the sounds that are in the final version. But judging by how much padding I have poured in here, probably only a few readers have made it to this paragraph. Hello! Thank you for reading to the end.


FM synthesis (Wikipedia)
DTMF (Wikipedia)
ADSR envelope (Wikipedia)
Cow sample
Environment sample
Servomechanism samples from the final version [1], [2]
Howard Scarr
U-HE Zebra


Does Habr need a built-in player for audio files?

  • 80.0% Yes, that would be convenient 681
  • 3.4% No, I like to open everything in new browser tabs. 29
  • 2.5% And I like to download files to disk and open them with my favorite player! 21
  • 7.3% You should have just uploaded everything to YouTube and not bothered ... 62
  • 6.8% My name is Andrey, and I am deeply offended by the title of this post. 58