Functional programming with PHP generators

Original author: Chris Opperwall
  • Transfer


The generators are cool. They facilitate the writing of iterators by defining functions instead of creating entire classes that implement Iterator . Also, generators help create lazy lists and endless streams. The main difference between the generator function and the ordinary function is that the ordinary one can return only once (after that its execution stops), and the generator function during the execution is able to return several values. In between the returns, the generator’s performance is paused until the next start. Therefore, generators can be used to create lists with lazily generated values, that is, each element in the list is calculated only at the time of demand.


A striking example of the difference between early and lazy generation is the range function, which takes parameters start and end then returns a sequence of integer values ​​that starts with start and ends one element before end . In the case of a regular function, you have to create a new list, add all the elements to it, and then return the list. With this approach, it range consumes a amount of memory proportional to the size of the range. And depending on your environment, it range(1, 10000000) can clean up all the memory allocated by the PHP process. And this happens because of the early creation of the entire list of elements, even before returning to the caller.


Using the generator function, you can create lazy range using a constant amount of memory. This is achieved using a while loop that passes the value of the initial parameter and then increments the value of the initial argument. When the cycle ends, the function reaches the end of its body and returns a value, thereby ending the generator. That is, it is enough for the generator function to only track what value of the sequence should be returned, and not to store all the values ​​in memory. If you continue the example, you can create an infinite range generator function that takes only the initial argument. In this case, we will have a while-loop with a predicate always equal to true, so the loop never ends. This allows the calling function to decide how many values ​​to read from the generator. And if the infinite generator is called 100 times, then it will generate only 100 values. If it is no longer called before the calling function is completed, the generator will pause and eventually the garbage collector will clean it. In other words, if you will be using foreach If the generator generates an infinite stream of values, the generator will iterate indefinitely. The PHP documentation describes the operation of the generators and the example with the function well range .


And here is a more practical illustration: the program needs to get 200,000 objects from the external API, extract some subset of data from each document, and then put each modified object into storage. If you do everything right away, you will first have to create a list of 200,000 objects, then remove elements from the list and then modify each element before entering it into the repository. If you use tools like array_filter and array_map , then at each of the listed operations an intermediate list is created. By combining generators, you can create a pipeline, each stage of which is calculated in a lazy way. For example, you can use a generator that lazily and individually takes objects from the API. Such a generator can be used by another generator, which only creates objects that pass a certain test, and this second generator will be used by a third generator, which creates a modified version of the object received from the API.


PHP Generators


Since generator functions return objects that implement Iterator them, they can be used in the same place as iterators, as is the case with foreach . Unfortunately, generators cannot be thrust wherever an array can be used. This is true, for example, for array_map , array_reduce and array_filter . It's a pity, because I prefer these functions instead of more imperative cycles for and foreach . We at iFixit use several alternatives based on Iterator instead of arrays providing the same functionality as array_map , array_filter with array_reduce .


Note: all of these examples require PHP 7.1 functionality. In previous versions of the language, the code will not work.


Map


<?php

declare(strict_types = 1);

/**
 * Like array_map() but over an iterable, and it returns a new iterable with
 * mapping instead of a mapped array. The callable should take two arguments:
 * a value to map and its key in the stream.
 */
function iterator_map(callable $cb, iterable $itr): iterable {
   foreach ($itr as $key => $value) {
      yield $cb($value, $key);
   }
}

Map converts, or "maps" one list of values ​​to another, applying a transform function to each value of the input list and adding the result to the output list. It is based iterator_map on the same idea, only the function as the second argument contains the result of calling the converter function for each element of the iterable list


<?php
// assuming range_generator is a range function that returns a
// generator and not a list.
$bigList = range_generator(1, 100000000);
$bigListPlusOne = iterator_map(function($num) {
   return $num + 1;
}, $bigList);
// At this point no work has been done.
// $bigListPlusOne is a generator that will lazily produce
// all numbers from 2 to 100000000
// (100000000 because the end of the range is exclusive)

Filter


<?php

declare(strict_types = 1);

/**
 * Like array_filter() but over an iterable, and it returns a new iterable with
 * filtering instead of a filtered array. The callable, if non-null, should
 * take arguments like iterator_map(). When the callable is null, null values
 * will be filtered out (NOT falsey values, just x === null).
 */
function iterator_filter(?callable $cb, iterable $itr): iterable {
   $cb = $cb ?: function($x) {
      return $x !== null;
   };
   foreach ($itr as $key => $value) {
      $keep = $cb($value, $key);
      if ($keep) {
         yield $value;
      }
   }
}

Filter takes a list and creates a new one, which includes those elements from the input list, when passed to a filter function or predicate, a "truthy" value is generated. It is based iterator_filter on the same idea, only a function produces only those values ​​that generate "true" values ​​when passing to a predicate. If the value does not generate a "true" value, the generator does not produce this element and proceeds to the next one from the input iterable until some element generates a "true" value from the predicate. Then the filter returns the value and pauses until it is called again.


Reduce


<?php

declare(strict_types = 1);

/**
 * Like array_reduce() but over an iterable, and it returns a single value as
 * the result of calling $cb over the contents of iterable $itr.
 *
 * If $initial is not null, $initial is set to the value of the first element
 * of the iterable, and $cb is called with the first element as the carry value
 * and the second element of the array as the current value.
 */
function iterator_reduce(callable $cb, iterable $itr, $initial = null) {
   if (is_null($initial)) {
      $initial = $itr->current();
      $itr->next();
   }
   $carry = $initial;
   while ($itr->valid()) {
      $carry = $cb($carry, $itr->current());
      $itr->next();
   }
   return $carry;
}

Reduce takes a list and an optional initial value, and returns a value that is the result of applying a reducer function to each element. The convolutional function has a parameter carry containing the result of calling this function in relation to the previous element, and there is also a parameter current representing the current element in an iterated sequence. In some languages, this operation is called folding.


iterator_reduce slightly different from the previous two functions, since it does not have a return type declaration iterable . The function can create single values ​​- numbers, Boolean or lowercase. All this is useful when you need to get a list or generator and extract grouped or aggregated values ​​from it, such as the sum of the properties price in the list of objects Product .


Putting it all together


Now let’s put everything together in one small example. We will retrieve data from the online storage service (let's call it Storeify). The task of the program: to extract all orders of the previous day and calculate the total daily income from the sale.


In our hypothetical world, there can be from 100 to 1,000,000,000 orders per day, so we can’t just get them all from the API without having to set up a huge server bill that can store orders in memory at the same time. Let's create a generator to lazily retrieve orders from the Storeify API as needed.


With the help of map , filter and reduce we will divide the problem into tasks, so that it is easier to understand the program and accompany it. Since we only need items or products in each order, we use iterator_map to return items, as well as a function flatten to turn the list of item lists into a single list. After that, we will choose iterator_filter to filter out those positions that are not products that we analyze. Next, take the filtered product stream and using iterator_reduce we will turn their fields price and quantity total revenue for this product from the previous day.


<?php

declare(strict_types = 1);

/**
 * Example code for consuming data from a Store API (called Storeify) and chaining generators
 * together to filter, map, and eventually reduce all Order data into a
 * daily revenue total.
 */

function getOrdersFromLastDay(): iterable {
  $limit = 20;
  $requestParams = [
    'lastUpdatedAfter' => new DateTime('yesterday'),
    'limit' => $limit,
    'offset' => 0
  ];
  $orders = [];

  do {
    // Grab a batch of orders from Storeify and yield from that list.
    $orders = Storeify::getOrders($requestParams);
    yield from $orders;

    $requestParams['offset'] += $limit;
  } while (!empty($orders));
}

/**
 * Consumes a generator that produces lists of products
 * and produces a new generator that yields a flat list
 * of products.
 */
function flatten(iterable $itr): iterable {
  foreach ($itr as $products) {
    yield from $products;
  }
}

/**
 * Assume orders have a Shipping Country and a list of Products
 * [
 *    'ShipCountry' => 'US',
 *    'Products' => [
 *       [ 'name' => 'Pro Tech Toolkit', 'price' => 59.95, 'quantity' => 2],
 *       [ 'name' => 'iPhone 6 Battery', 'price' => 24.99, 'quantity' => 1]
 *    ]
 * ]
 */
$orders = getOrdersFromLastDay();

// Flatten list of orders into list of all products sold.
$allProducts = flatten(iterator_map(function($order) {
  return $order['Products'];
}, $orders));

// Only include 'Pro Tech Toolkit' purchases.
$toolkitProducts = iterator_filter(function($product) {
  return $product['name'] === 'Pro Tech Toolkit';
}, $allProducts);

// Up until this point, no work has actually been done.
// $toolkitProducts is a generator can be passed around to other functions
// as a lazy stream of pro tech toolkit products.

// Once iterator_reduce is called, it begins winding its way through
// the composed generators and actually pulling down resources from the Store API
// and mapping and filtering them.
$dailyToolkitRevenue = iterator_reduce(function($total, $toolkit) {
  return $total + ($toolkit['price'] * $toolkit['quantity']);
}, $toolkitProducts, 0);

Conclusion


Perhaps the use of generators in functional programming can be called doping. Unfortunately, PHP is not an ideal language, but, fortunately, we have all the tools to combine generators with functional concepts like map , filter and reduce .


Update (2018–03–18)


A commenter from the r / programming branch mentioned a library iter written by a person who implemented generators in PHP. This library implements all the examples from this article and much more, so I highly recommend that you feel it if you plan to use generators in your code base.