Tail recursion optimization in .NET and Nemerle

Recently chiaroscuro wrote a provocative article with a yellow headline, "Why Cycles Should Die . " Its discussion took about 180 comments, but the article itself went negative and did not get on the main page, despite the fact that it contained a sound idea about using recursion instead of loops.

In this article I will supplement his post and give examples of implementation in one of the best .NET languages ​​- Nemerle, and I will also make a hollywood statement about the advantage of .NET over Java.

Firstly, I want to note one plus of using recursion instead of loops, which I missed chiaroscuro - decomposition. Considering a program in the popular imperative language (Java, C #, Pascal, C), we are dealing with a stream of commands. A means of decomposing this thread, say to reuse code or test it, is to break it down into procedures that used to be called routines. The name “subprogram” contains their essence - we take the main flow of commands, cut out a piece, wrap it in some workaround and the procedure is ready, but it is the same stream, only smaller. We have come to the conclusion that in the language some means serve the calculation - imperative instructions - those that make up the stream of calculations, while others serve this stream - the procedures. In the case of FY, recursion performs the same tasks as loops in imperative languages, and also serves as a means of decomposition, that is, a program, written in FY itself falls apart into separate logical parts. It’s worth noting right away that there is a problem of littering the global namespace; but in many FYs it is elegantly solved - a function within its body can declare a function that is visible only to it.

The next thing you should pay attention to in the article is comments, in many of them the hara-people insisted as one “recursion is a beautiful, elegant, but resource-eating evil”. In the general case, this is true, but there is a class of recursive algorithms that are comparable in terms of resource consumption with the rotation of this recursion into a loop - such a recursion is called tail recursion. It is very simple to recognize it: if the function received the value of the function, did nothing with it and returned it as its value, then we are dealing with tail recursion, in the case of recursion, of course. It is easy to understand why this happens - if there is no action with the value, you can free the stack from local variables before calling the second function, in addition, you can substitute your own return address instead of yourself, as your return address.

These were additions to that article, and now the main remark: to completely abandon cycles is stupid and, at present, it makes sense only if this is academic programming whose purpose is to study compiler optimization; when programming is daily bread, then you need to use the tools that are most appropriate for the task at the moment. For the future, I will not teach haskell until it is taught to automatically parallelize to many processors. I will choose a language that provides me with a choice of a tool with which I can quickly solve my problem. That is, the language must be hybrid and support both a functional and an object-oriented paradigm. That language is Nemerle.

An attentive reader may ask where does Java go. The fact is that the Java virtual machine, unlike .NET, does not support tail recursion optimization. By the way, for many it may turn out to be news that it is in .NET, since the most popular programming language for it - C # - does not support it. After I found out that tail recursion optimization is supported in .NET, I got an idea to test it, which was the reason for this post.

The algorithm for finding the last 5 digits of the nth Fibonacci number will become a experimental rabbit, and it will be tested in Nemerle, C # and Perfect C #, let's call it a language that is fully compatible with C # but has the property of tail recursion optimization. To get the assembly compiled by Perfect C #, I compiled the code in C #, decompiled it with ildasm, added il the tail instruction before the function call, and compiled il code with ilasm. Below are the codes of programs in these languages:
And now the most interesting part is the test results. To warm up, I decided to count the last 5 digits of the 40,000th Fibonacci number: Nemerle managed for 0.58 milliseconds , PerfectC # for 2.389 milliseconds , and C # for 0.782 milliseconds . After that, I decided to test the algorithm on the 100,000th Fibonacci number and got the following results: Nemerle - 1.421 milliseconds , PerfectC # - 3.244 milliseconds , C # fell with a StackOverflowException.

The results are such that Nemerle takes the first place, and the second share PerfectC # and C #, since the second one falls in deep recursion, but where we can’t get ahead of PerfectC #.

UPD The
reference implementation of the last 5 digits of 40,000 Fibanacci numbers is 0.565 milliseconds and 100,000 Fibanacci numbers is 1.407 milliseconds.