FizzBuzz implementation on FPGA

Original author: Ken Shirriff
  • Transfer
Recently, I became interested in FPGA programming and decided that it would be interesting to implement the Fizzbuzz game algorithm on FPGAs . FPGA (Field Programmable Gate Array ) is an interesting chip. It is programmed to execute arbitrary digital logic. You can construct a complex circuit without laying physical channels between individual gates and triggers. A microcircuit can turn into anything, from a logic analyzer to a microprocessor and a video generator.

FizzBuzz Test - write a program that gives numbers from 1 to 100, where multiples of three are replaced by the word "Fizz", multiples of five by the word "Buzz", and multiples of fifteen by "FizzBuzz". Since such a program is implemented in several lines of code, it is often asked at interviews to weed out those who do not know how to program at all.


Mojo FPGA board connected to serial-to-USB port. The big chip on the board is Spartan 6 FPGA.
Implementing FizzBuzz in digital logic rather than code is pretty pointless, but it seemed like a good example for me to learn. 1 For this project, I used a simple Mojo V3 development board FPGA for beginners. It has the Xilinx Spartan 6 family FPGA installed. This is one of the smallest FPGAs, but it has 9,000 logic cells and 11,000 triggers - so the baby is capable of much.

FPGA serial output


What does it mean to implement FizzBuzz on FPGA? This means that the contacts of the general-purpose input / output interface (GPIO) from the board can be connected to anything, so the FizzBuzz output will appear in different forms: LEDs, seven-segment indicators, LCD, VGA monitor. I decided that the output of text to the terminal via a serial channel is most consistent with the spirit of the "standard" FizzBuzz program. So the first thing to do is to implement serial output to FPGA.

The basic idea of ​​serial communication is to transmit bits one at a time. RS-232 serial port - A simple protocol for serial data transfer, invented in 1960 for connecting teletypes and modems. The diagram below shows how the “F” character (binary 01000110) is sequentially transmitted over the channel. First, a start bit (zero) is sent to indicate the start of a character. 2 Then eight bits of the character are transmitted in reverse order. At the end, a stop bit is sent. Between the characters, the bar remains in standby mode (one) until the next character is transmitted. At a transmission speed of 9600 baud, the transmission of each bit takes 1/9600 seconds. With eight data bits without a parity bit and one stop bit, this protocol is known as 8N1. There are many protocols for serial data transfer, but 9600 8N1 is very common.


Serial Transmission of the F Symbol at 9600 Baud / 8N1
The first step in implementing such a serial transmission is to create 1/9600 second intervals for each bit. This interval is measured by counting 5208 clock pulses per Mojo. 3 I made a 13-bit counter for periodically counting from 0 to 5207. A simple state machine keeps track of which bit is output in each interval. It starts with the start bit, changes its state as it processes eight data bits and a stop bit. The state is stored in a four-bit register (in general, in FPGA you have to deal with a large number of clock pulses, counters and state machines).

To create an interval and state registers in FPGA, I wrote code in the Verilog hardware description language. I will not explain Verilog in detail - I hope you will understand how it works. In the following code, the first lines specify a 13-bit register counter and a four -bit counter register state . The counter is incremented to 5207, then reset to 0, while state increasing by one to process the next output bit. (Note that <= this is an assignment operator, not a comparison 4 ). The line always @(posedge clk) indicates that the treatment unit is triggered on the positive edge of the clock frequency.

reg [12:0] counter;
reg [3:0] state;

always @(posedge clk) begin
  if (counter < 5207) begin
     counter <= counter + 1;
  end else begin
    counter <= 0;
    state <= state + 1;
  end
end

Although this sounds like regular program code, it works in a completely different way. In a common programming language, operations are usually performed sequentially, since the program is executed line by line. For example, the processor checks the value counter . Then adds one to counter . But Verilog does not have a processor or an executable program. Instead, the code creates hardware to perform operations. For example, to increase counter , one addition scheme is created, and to increase state , another scheme is created. Additional logic is created for comparison with 5207. Unlike a serial processor, FPGA does everything in parallel. For example, FPGA simultaneously compares 5207 on each clock cycle, increasing and resetting counter , as well as increasing state . Because of this, in highly parallel tasks, the FPGA can be much faster than the processor.

The next part of the code outputs the corresponding bit for each state. As before, it looks like regular program code, but generates hardware circuits, rather than operations that are performed sequentially. In this case, the code creates a logic gate (essentially a multiplexer) to select the correct value out .

case (state)
  IDLE: out = MARK; // high
  START: out = SPACE; // low
  BIT0: out = char1[0];
  BIT1: out = char1[1];
  ...
  BIT6: out = char1[6];
  STOP: out = MARK;
  default: out = MARK;
endcase

There is still some code to set constants, initialize counters, start and stop the transmission of each character, but the code above gives a general idea of ​​how Verilog works. All code for serial data output see here .

FizzBuzz Algorithm


The next step is to determine what specifically to send over the serial link. How to translate numbers from 1 to 100 in ASCII characters? This is trivial on a microprocessor, but difficult in digital logic. The problem is that converting a binary number to decimal requires division by 10 and 100, and division is very inconvenient to implement in valves. I decided to use a
binary decimal counter (BCD), which stores each of the three digits separately. This made the counter a bit more complicated, since nodes are introduced into the counter circuit to ensure that the counting process is corrected when overflowing in one direction or another from the nine. But then simplified the issuance of numbers.

I wrote a BCD module ( source code ) for a three-digit counter. He has three four-bit counters digit2 ,, digit1 and digit0 . The flag increment signals an increase in the counter. Usually only increases digit0 . But if it reaches nine, it turns to 0, but increases digit1 . If it digit1 also becomes 9, then it turns to 0, and increases digit2 . Thus, there is a countdown from 000 to 999.

if (increment) begin
  if (digit0 != 9) begin
    // Regular increment digit 0
    digit0 <= digit0 + 1;
  end else begin
    // Carry from digit 0
    digit0 <= 0;
    if (digit1 != 9) begin
      // Regular increment digit 1
      digit1 <= digit1 + 1;
    end else begin
      // Carry from digit 1
      digit1 <= 0;
      digit2 <= digit2 + 1;
    end
  end
end

As before, keep in mind that this only looks like a regular program, but in reality the code turns into a bunch of logic gates that generate new values digit2 , digit1 and digit0 at each clock cycle. The system does not follow the instructions in order, so performance is not limited by the number of instructions, but only by the delay of signals propagating through the valves.

The next task is to check the divisibility by 3 and 5. As a simple division, division with the remainder is also easy to implement on a microprocessor, but difficult in digital logic. There is no built-in division operation, so you have to create a whole bunch of valves. Although the IDE can synthesize them for division with the remainder, but it is tasteless. Instead, I just made counters for the remainder of dividing by 3 and by 5. For example, mod3 will simply output 0, 1, 2, 0, 1, 2 ... 5

The last part of FizzBuzz is code to display each line character by character. In the program, we can simply call the sequential output procedure for each character. But in FPGA, you need to keep track of which character is sent to the output using another state machine. Note that when converting each digit to ASCII, binary 11 is concatenated using a slightly strange syntax 2'b11 . The code given here is a bit simplified. The complete code contains checks for zeros at the beginning, so "001" is output as "1".

state <= state + 1; // Different state from serial
if (mod3 == 0 && mod5 != 0) begin
  // Fizz
  case (state)
   1: char <= "F";
   2: char <= "i";
   3: char <= "z";
   4: char <= "z";
   5: char <= "\r";
   6: begin
     char <= "\n";
     state <= NEXT; // Done with output line
     end
  endcase
end else if (mod3 != 0 && mod5 == 0) begin
  ... Buzz case omitted ...
end else if (mod3 == 0 && mod5 == 0) begin      
 ... Fizzbuzz case omitted ...
end else begin 
  // No divisors; output the digits of the number.
  case (state)
    1: char <= {2'b11, digit2[3:0]};
    2: char <= {2'b11, digit1[3:0]};
    3: char <= {2'b11, digit0[3:0]};
    4: char <= "\r";
    5: begin
     char <= "\n";
     state <= NEXT;
    end
  endcase
end

If you put everything together, then the final FizzBuzz scheme consists of numerous state machines and counters. The main state machine controls the code above, moving through the characters of the string. For each symbol, the state machine activates the serial output module and waits for the output of the symbol. Inside the module, the state machine passes through each bit of the symbol. It waits until the baud meter counts the bit width. When the symbol transfer is completed, the serial output module sends a signal to the main state machine. Then he goes to the next character in the string. When the line is completed, the main state machine increments the BCD counter (which counts from 1 to 100) - and starts the output of the next line.

In programming languages, it is easy to perform sequential operations, loops, subroutine calls, and so on. But in FPGA you need to explicitly track what is happening with the help of state machines and counters. In exchange, FPGA provides powerful concurrency and control.

Running FizzBuzz on FPGA


To compile Verilog code, I used the Xilinx ISE tool (see below). In this development environment, you can write code, simulate it, and synthesize logic circuits for loading into FPGAs. The ISE tool is pretty straightforward and explained in the Mojo tutorials . The synthesis process is much slower than compilation: for my FizzBuzz program, it took about 45 seconds.


By writing Verilog code in the Xilinx ISE tool, you can program the FPGA functionality.
Once my code worked in Simulator 7 , I downloaded it to the FPGA board via a USB cable. I connected the FPGA output pin to the USB-to-serial 6 adapter and used a terminal emulator ( screen ) to display serial output on the computer. I clicked the reset button on the Mojo board - and (after a little extra debugging) FizzBuzz output appeared (below).


First page of FizzBuzz FPGA output in a terminal emulator.
The image below shows the raw serial data output from FPGA (yellow). This is the end result of the FizzBuzz circuitry on the FPGA board - a pulse train. The oscilloscope also shows decrypted ASCII characters (green). In the image - FizzBuzz output for 2, 3 and 4 (CR and LF are carriage return and line feed characters).


Serial data output signal (yellow) at the beginning of FizzBuzz output. Decoding ASCII - green

What happens inside the FPGA?


You may ask how the Verilog circuit description is converted to digital logic and how FPGA implements this logic. The ISE synthesis tool takes a description of Verilog - and generates circuits suitable for implementation inside the FPGA. First, from the Verilog code, he synthesizes a “netlist”, defining logic and connections. Then it translates the lists into FPGA primitives in accordance with the capabilities of a particular chip (in our case, Spartan 6). At the end, an optimization procedure is started to minimize the signal path.


FizzBuzz circuit diagram

The figure above shows the FizzBuzz chain in the form in which the synthesis tool generates it. As you can see, Verilog code turns into a big ball of schemes. Each block is a trigger, logic element, multiplexer, or other element. The blocks consist of counters, state registers, and FizzBuzz logic. Although it seems that there are a lot of logic elements, less than 2% of the chip power is actually used. Close-up (below) shows the trigger diagram (with the inscription "fdre") 8 and the search table (with the inscription "lut5") from the BCD counter. What is convenient in Verilog is that you design a circuit at a high level - and it turns into a low level circuit. This is called RTL ( Register Transfer Layer). ) and allows you to apply registers and high-level operations on them, without worrying about low-level hardware implementation. For example, you can simply write count + 1 - and get a binary adder circuit.


Detailed trigger and lookup diagram
FPGA uses an interesting technique for logical equations. Instead of linking individual gates, a lookup table (LUT) is used, with which it is convenient to implement arbitrary logic. Each table has six input lines, so it can implement any combinatorial logic with six input signals. It turns out 64 different combinations of the input signal and a 64-bit truth table. Having saved this table as a 64-bit image, the LUT can implement any logical function.

For example, part of the logic for an output pin is equivalent to the logic circuit below. It is implemented by storing a 64-bit value FFFFA8FFFFA8A8A8 in the lookup table. In the Spartan 6 chip, this LUT resides in 64 bits of static RAM, which loads during FPGA initialization. Since there are 5720 separate tables in the microcircuit, it can be programmed to implement many arbitrary schemes.


Logic elements implemented by one lookup table in FPGA
The last part of the task is the matrix switch, which randomly connects the circuits. In Spartan 6, several LUTs, triggers, and multiplexers are assembled into configurable logic blocks (CLBs) 9 . These CLBs are connected by matrix switches , as shown below. Each matrix switch block is programmed to connect different buses, which allows you to lay the FPGA contacts as you like. An important part of the FPGA synthesis process is block positioning to minimize wiring distance. This is necessary both to minimize the propagation delay of the signal, and to preserve the reserve of interconnect paths.


The Spartan 6 FPGA matrix switcher allows arbitrary connections between CLBs. From the user manual

Should I try FPGA?


Personally, at first I looked with concern at the FPGA, because they seemed scary and strange. Although there is a learning curve, FPGAs are not as complex as I thought. If you are interested in new programming paradigms, then FPGA will certainly show things from a different angle. What you take for granted, such as sequentially performing operations, will come to the fore. You can experiment with high concurrency. And FPGAs will help you better understand how digital circuits work.

However, I would not recommend experimenting with FPGAs if you are at least a little unfamiliar with connecting LEDs / switches and do not understand the basic digital logic: valves, triggers, and state machines. But if you're comfortable with Arduino, then FPGA is the smart next step.

Probably, for most applications, the microcontroller is better than FPGA and is easier to program. If you do not need high data rates or concurrency, then FPGA will probably be redundant. In my case, the microcontroller barely coped with a three-megabit Ethernet gateway , so for the next project I look at the FPGA.

Is Mojo a good board to start with?


The Mojo FPGA development board is sold by Adafruit and Sparkfun , so I thought it was a good hacker choice. The board is designed for beginners in FPGA programming, and it copes well with this role. The creators of Mojo have written a large collection of Verilog tutorials . 10 . If the manual is written for a specific board, then problems with the board and tools can be solved much faster. Mojo is programmed with a standard USB cable - it’s more convenient than special JTAG adapters.


Mojo FPGA with Spartan-6 FPGA Chip

Although Mojo has many I / O pins, no peripherals are sold with it except eight LEDs. It would be more pleasant to experiment with the board if there are buttons, seven-segment displays, VGA output, sensors and so on. (They are not difficult to connect to Mojo, but it would be convenient to have it included). In addition, some development boards have external RAM, while Mojo does not. This is a problem for applications like a logic analyzer that requires a lot of memory. 11 (Although you can extend Mojo with an IO pad or RAM pad ).

A Good Book To Get Started With Mojo - FPGA Programming . It also describes much cheaper motherboards Papilo one and Elbert 2 . If you want to see other options, here is a list of FPGA development boards .

Conclusion


The FizzBuzz task is impractical to implement on FPGA, but it was interesting, I learned a lot about FPGA programming. But of course, I won’t get a job if they ask FizzBuzz an interview! My code is on GitHub , but keep in mind that I'm still new to this business.

Notes and links


1. Implementing a microprocessor on an FPGA is a trivial task. For example, with the Spartan 6 chip, you can press a couple of buttons in the IDE wizard - and it will generate a circuit for the MicroBlaze processor . Therefore, a smart person would write FizzBuzz code in several lines of C, and then run it on the processor inside the FPGA. But for me it is too easy. 2. The start bit is necessary, otherwise the receiver will not be able to determine the beginning of the character if the first bit of the character is 1.

3. Since Mojo has a clock frequency of 50 MHz, for 9600 baud each bit will take 50,000,000/9600, or about 5208 clock cycles. This is not a very high speed, so for the experiment I ran the code at a speed of 10 million bauds (counting up to five clock cycles for each bit) - and the circuit survived (the USB-to-serial interface only supports speeds up to 230400 bauds, so I checked the result on the oscilloscope). 4. In Verilog, it <= is a non-blocking assignment operator, and it = is a blocking assignment operator. Non-blocking assignments occur in parallel and are usually used for sequential logic (clock trigger). Block assignments are used for combinatorial logic (no clocks). This is a bit confusing, see details. here . 5. I used not a binary, but a binary-decimal counter, so the remainder of dividing by 5 is determined almost trivially, looking at the last digit. But the remainder of dividing by 3 is not trivial, so I had to leave the option with a counter. 6. It was not possible to direct serial input directly to a computer because it does not have a serial port. Instead, I used the USB-to-serial adapter , FTDI Friend by Adafruit. This adapter can also receive a 3.3-volt signal, unlike the uncomfortable +/− 15 volts in the original RS-232.

7. Debugging FPGA is completely different from debugging a program. Since FPGA is essentially a black box, you first need to check everything in the simulator, otherwise you will end up in the “FPGA-hell” with blinking LEDs, trying to understand what is happening. In the process of debugging, to simulate the circuit, you need to write a “testbench” - Verilog code, which gives different input data at different times ( example ). Then run the simulator (below) and check that the output is correct.


Xilinx FPGA

If something goes wrong, the simulator allows you to walk through the internal signals and identify the problem. After full testing in the simulator when running on a real FPGA, my code has only the most trivial problems. The main one was that I assigned a serial output to the wrong contact, so there was no output. 8. Spartan 6 FPGA supports several types of triggers. FDRE is a D-trigger with a synchronous reset / set input and clock operation.

9. Spartan 6 FPGA (CLB) configurable logic blocks are quite complex. They contain LUT modules, 8 triggers, wide multiplexers, logic, distributed RAM, and shift registers. Rigidly soldered components of these blocks slightly reduce flexibility, but greatly simplify the design of matrix switches. CLBs are described in detail in the CLB User Guide . Spartan 6 FPGAs have other types of blocks, including clock generation blocks and DSP blocks capable of quickly multiplying 18 × 18 bits. 10. An alternative to Verilog is VHDL , which is also supported by the development environment. Mojo also supports Lucid , a simpler FPGA language developed by the Mojo team. Mojo explain the language, and there’s another book on the subject. However, I decided it was better to learn a standard language than Lucid. 11. Mojo has no external RAM, but 576 kilobits of internal memory. Although this is a tiny amount compared to boards that have megabytes of external DRAM.