John Carmack: A week-long vacation spent programming

Original author: John Carmack
What follows is a translation of a post that John Carmack published on Facebook last week and that has gained some popularity.

After a long break, I finally took another vacation devoted to programming. For a whole week I was able to work quietly in hermit mode, away from the usual pressures of work. My wife has been generously offering me such a vacation for several years now, but vacations and I are fundamentally a poor match.

As a change of scenery from my current work at Oculus, I wanted to write several neural network implementations from scratch in C++, and I planned to do it using a strictly OpenBSD system. One of my friends remarked that this is a rather random set of technologies, but in the end everything worked out well.

Although I have never had to use OpenBSD in my work, I have always liked the idea of it: a relatively minimalistic and self-contained system with a cohesive vision and an emphasis on quality and craftsmanship. Linux is capable of a lot, but cohesion is not its strong point.

I am not an ardent fan of Unix. I get along with it well enough, but I am most comfortable working in Visual Studio on Windows. I thought that a week of full immersion in old-school Unix-style work would be interesting, even if it meant working more slowly. It was a kind of adventure in the spirit of retrocomputing: fvwm and vi became my best friends for a while. Note: not vim, but the actual vi from BSD.

In the end, I did not get to explore the system as deeply as I wanted: I spent 95% of my time doing only basic things in vi / make / gdb. I appreciated the high quality of the man pages, because I tried to do everything within the system itself without resorting to Internet searches. It was fun to come across references to things more than 30 years old, such as Tektronix terminals.

I was a little surprised that C++ support was not up to par. G++ did not support C++11, and LLVM C++ did not work well with gdb. Gdb crashed on me repeatedly, which I suspect was due to problems with C++. I am aware that newer versions are available through ports, but I decided to stick with the base system.

Looking back, I think I should have just gone "full retro" and written everything in ANSI C. Like many older programmers, I have plenty of days when I think, "Maybe C++ isn't as good as people make it out to be after all...". There is still a lot that I like about it, but writing small projects in plain C is no hardship for me.
Perhaps on my next vacation I will try to use nothing but emacs. It is another major layer of programmer culture that I have never properly gotten to know.

I have a good general understanding of how most machine learning algorithms work, and I have written linear classifiers and decision trees, but for some reason I always avoided neural networks. Deep down, I suspect that the fashionable hype around machine learning brought out the skeptic in me, and I still have a somewhat reflexive bias against the approach of "let's throw everything at the neural network and let it sort it out."

Sticking with the retro theme, I printed out a few old papers by Yann LeCun and intended to do all my work offline, as if I really were in a mountain cabin. In the end, though, I watched many of the lectures from the Stanford CS231N course on YouTube, and they turned out to be really useful. I rarely watch video lectures, because it is usually hard for me to justify spending that much time on them, but on vacation you can afford it.

I don't think I have any worthwhile thoughts about neural networks to share, but it was an extremely productive week for me that helped turn theoretical "book" knowledge into real experience.

I followed my usual approach: first get a result quickly by writing rough, hacked-together code, then write a new implementation from scratch based on the lessons learned. That way both implementations work, and if necessary I can cross-check them against each other.

At first, I got backprop wrong a couple of times; the turning point was comparing it against numerical differentiation! I found it interesting that a neural network still trains even when various parts of it are not entirely correct: as long as the sign comes out right most of the time, training often still makes progress.
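The comparison with numerical differentiation mentioned above is commonly called a gradient check. The following is a minimal sketch of the idea, not the code from the post: the toy model (a single sigmoid neuron with squared error, a fixed input and a target of 1) and all function names are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Central-difference estimate of d(loss)/d(w[i]) for every parameter.
std::vector<double> numericalGradient(
    const std::function<double(const std::vector<double>&)>& loss,
    std::vector<double> w, double eps = 1e-5) {
    std::vector<double> grad(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        const double saved = w[i];
        w[i] = saved + eps;
        const double plus = loss(w);
        w[i] = saved - eps;
        const double minus = loss(w);
        w[i] = saved;  // restore before moving to the next parameter
        grad[i] = (plus - minus) / (2.0 * eps);
    }
    return grad;
}

// Largest relative difference between two gradients, element by element.
double maxRelativeError(const std::vector<double>& analytic,
                        const std::vector<double>& numeric) {
    assert(analytic.size() == numeric.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < analytic.size(); ++i) {
        const double denom =
            std::max(1e-12, std::fabs(analytic[i]) + std::fabs(numeric[i]));
        worst = std::max(worst, std::fabs(analytic[i] - numeric[i]) / denom);
    }
    return worst;
}

// Hypothetical toy model: w = {w0, w1, bias}, input x = {0.5, -1.5}, target 1.
const double kX0 = 0.5, kX1 = -1.5, kTarget = 1.0;

double toyLoss(const std::vector<double>& w) {
    const double z = w[0] * kX0 + w[1] * kX1 + w[2];
    const double y = 1.0 / (1.0 + std::exp(-z));  // sigmoid
    return 0.5 * (y - kTarget) * (y - kTarget);
}

// Hand-derived gradient of toyLoss via the chain rule (what backprop computes).
std::vector<double> toyAnalyticGradient(const std::vector<double>& w) {
    const double z = w[0] * kX0 + w[1] * kX1 + w[2];
    const double y = 1.0 / (1.0 + std::exp(-z));
    const double dLdz = (y - kTarget) * y * (1.0 - y);
    return {dLdz * kX0, dLdz * kX1, dLdz};
}
```

Calling `maxRelativeError(toyAnalyticGradient(w), numericalGradient(toyLoss, w))` should give a very small value when the hand-derived gradient is right. A sign error in one component pushes the result to 1, which is what makes this kind of check a quick way to catch backprop mistakes.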

I was pleased with the resulting code for my multilayer neural network; it is in a form that I can simply reuse in further experiments. Yes, for anything serious I will have to use an existing library, but there are many situations where it is convenient to have just a couple of .cpp and .h files at hand in which you wrote every line yourself. My convolutional neural network code only reached the "works, but with a bunch of hacks" stage; I could have spent another day or two on it to write a clean and flexible implementation.

I found it interesting that when I tested my initial neural network on MNIST (a database of handwritten digits) before adding any convolutions, I got much better results than the values reported for non-convolutional NNs in LeCun's 1998 comparison: about 2% error on the test set with a single hidden layer of 100 nodes, versus 3% for the wider and deeper networks of the time. I think the difference comes down to modern best practices: ReLU, softmax, and improved initialization.
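For readers who have not seen these three pieces in code, here is a minimal sketch of each. It is not the code from the post. In particular, the post does not say which initialization scheme was used; the He-style Gaussian initialization below, along with all function names, is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// ReLU: pass positive activations through unchanged, zero out the rest.
void reluInPlace(std::vector<float>& v) {
    for (float& x : v) x = std::max(0.0f, x);
}

// Softmax: turn raw output scores into probabilities that sum to 1.
// Subtracting the maximum first keeps exp() from overflowing.
std::vector<float> softmax(const std::vector<float>& logits) {
    assert(!logits.empty());
    const float maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Assumed scheme (He-style): zero-mean Gaussian weights with standard
// deviation sqrt(2 / fanIn), a common choice for layers followed by ReLU.
std::vector<float> heInit(std::size_t fanIn, std::size_t fanOut,
                          unsigned seed) {
    assert(fanIn > 0);
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(
        0.0f, std::sqrt(2.0f / static_cast<float>(fanIn)));
    std::vector<float> weights(fanIn * fanOut);
    for (float& w : weights) w = dist(rng);
    return weights;
}
```

For a network like the one described, `heInit(784, 100, seed)` would produce the weights between the 28x28 MNIST input pixels and a 100-node hidden layer, `reluInPlace` would be applied to the hidden activations, and `softmax` to the 10 output scores.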

This is one of the most impressive properties of neural networks: they are so simple that breakthrough advances can often be expressed in just a few lines of code. There seem to be some similarities here with ray tracing in computer graphics, where you can quickly implement a physically based light transport ray tracer and produce state-of-the-art images, provided you have the necessary data and enough patience to wait for the results.

I came to understand the principles of overfitting / generalization / regularization much better by exploring several training parameters. On the last night of my vacation, I left the architecture alone and just played with hyperparameters. As it turned out, staying focused while "it's training" is much harder than in the classic case of "it's compiling".

Now I will be keeping my eyes open for a suitable opportunity to apply my new skills in my work at Oculus!

I'm scared to even think about what my mailbox and workspace have turned into during my absence. I'll find out tomorrow.