ARMs for the little ones: linking, part 2; interrupts and hello world!



I found the time to wrap up the series with one more article, where I will sum up a small milestone. In fact, only now have we reached the point where programming usually begins:
  • a look at a “complex” GNU ld linker script;
  • learning to use interrupts;
  • finally getting to hello world!


Previous articles in the series:


Examples of code from the article: https://github.com/farcaller/arm-demos
Last time we looked at which sections a compiled application may contain and what they typically hold. In particular, we dealt with .data and .bss. As a reminder, .data stores global (static) variables whose values are set at compile time. This section must be copied from flash to RAM. .bss stores global variables whose initial value is zero, and it must be zeroed.

Normally this work is done by routines from crt0.a (Wikipedia suggests the name means C RunTime 0, where 0 stands for the very beginning of the application's life). Today we will write our own crt0 equivalent for our toy platforms.

Disclaimer. In GNU ld, many things can be done in several ways, using different syntax variants and linker flags. All the methods described below are my own invention, written under the influence of the linker scripts from LPCXpresso. If you know a more effective way to handle any of the situations described, write to me!

In-memory data initialization


Take a look at the file 04-helloworld/platform/protoboard/layout.ld. Overall there are no significant changes from the previous version: a few constants, a description of memory, and the sections. Let's take the .data section as an example:
.data : ALIGN(4)
{
    _data = .;

    *(SORT_BY_ALIGNMENT(.data*))
    . = ALIGN(4);

    _edata = .;
} > ram AT>rom = 0xff


The output section .data is created with 4-byte alignment. For example, if the location counter (the “cursor”, .) points to 0x00000101 before this section, .data starts at 0x00000104. The section lives in RAM (> ram) but is loaded from flash (AT>rom).
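The 0x101 to 0x104 rounding is the usual power-of-two “align up” trick. Here is a minimal sketch of that arithmetic. The helper name align_up is hypothetical, and it assumes the alignment is a power of two, as it is here.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two), which is
 * what ALIGN(align) does to the location counter. Adding align - 1 pushes
 * any unaligned address past the next boundary; the mask then clears the
 * low bits. Already-aligned addresses are left unchanged. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}
```

For example, align_up(0x101, 4) gives 0x104, while align_up(0x104, 4) stays at 0x104.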

The =0xff construct sets the fill pattern. If gaps appear in the output section, they are filled with this byte. 0xff was chosen because erased flash reads as all ones. Programming 0xff is therefore a no-op, unlike programming 0x00, for example.

Next, the current location counter is saved in _data. Since the section is in RAM, _data points to its very beginning, in this case 0x10000000.

Then all input sections whose names start with .data are gathered from all input files into this section and sorted by alignment (SORT_BY_ALIGNMENT places the largest alignment first). The sorting matters a lot. Consider an example:
uint16_t static_int = 0xab;
uint8_t  static_int2 = 0xab;
uint32_t static_int3 = 0xab;
uint8_t  static_int4 = 0xab;


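Here four variables are defined that all go into .data. Each one lands in its own .data.<name> input section, which is what GCC's -fdata-sections produces. What ends up in the final file?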
.data           0x0000000010000000        0xc load address 0x00000000000007b0
                0x0000000010000000                _data = .
 *(.data*)
 .data.static_int2
                0x0000000010000000        0x1 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x0000000010000000                static_int2
 *fill*         0x0000000010000001        0x3 ff
 .data.static_int3
                0x0000000010000004        0x4 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x0000000010000004                static_int3
 .data.static_int4
                0x0000000010000008        0x1 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x0000000010000008                static_int4
 *fill*         0x0000000010000009        0x1 ff
 .data.static_int
                0x000000001000000a        0x2 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x000000001000000a                static_int
                0x000000001000000c                . = ALIGN (0x4)
                0x000000001000000c                _edata = .

Notice the *fill* bytes that pad each variable out to its alignment boundary. Because of the bad ordering, we lost 4 bytes for nothing. Let's repeat the experiment, this time with SORT_BY_ALIGNMENT:
.data           0x0000000010000000        0x8 load address 0x00000000000007b0
                0x0000000010000000                _data = .
 *(SORT(.data*))
 .data.static_int3
                0x0000000010000000        0x4 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x0000000010000000                static_int3
 .data.static_int
                0x0000000010000004        0x2 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x0000000010000004                static_int
 .data.static_int2
                0x0000000010000006        0x1 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x0000000010000006                static_int2
 .data.static_int4
                0x0000000010000007        0x1 build/d0f0154f60ed1a9c2083183e7c731846451d2bdb_helloworld.o
                0x0000000010000007                static_int4
                0x0000000010000008                . = ALIGN (0x4)
                0x0000000010000008                _edata = .

The variables are now neatly sorted, and we saved a third of the section (4 of 12 bytes)!
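To double-check the arithmetic of the two maps, here is a small host-side model of how the input sections get packed. It is a sketch, not ld's actual algorithm. The (size, alignment) pairs are taken from the maps above. Ties keep their input order here, though ld's tie-breaking may differ. The names input_section_t, layout_size and sort_by_alignment are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t size;   /* bytes */
    uint32_t align;  /* power of two */
} input_section_t;

/* Place sections one after another, padding each one up to its alignment
 * (the *fill* bytes), then round the end up to 4 like ". = ALIGN(4)". */
static uint32_t layout_size(const input_section_t *s, size_t n)
{
    uint32_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        pos = (pos + s[i].align - 1u) & ~(s[i].align - 1u);
        pos += s[i].size;
    }
    return (pos + 3u) & ~3u;
}

/* Stable insertion sort, largest alignment first (mimics SORT_BY_ALIGNMENT). */
static void sort_by_alignment(input_section_t *s, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        input_section_t key = s[i];
        size_t j = i;
        while (j > 0 && s[j - 1].align < key.align) {
            s[j] = s[j - 1];
            j--;
        }
        s[j] = key;
    }
}
```

With the unsorted order from the first map (static_int2, static_int3, static_int4, static_int), layout_size reports 12 bytes. After sort_by_alignment it reports 8, matching the second map.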

Back to the location counter, which now points just past the end of all the .data input. The . = ALIGN(4) statement rounds it up to a word boundary in case the input sections do not end on one. The final value is stored in _edata.

Besides the RAM addresses, we need to know where the section is stored in flash. For this purpose, the symbol _data_load = LOADADDR(.data) is declared at the beginning of the script. LOADADDR is a function that returns the load address of a section. There are some other interesting functions besides it: ADDR returns the “virtual” (run-time) address, and SIZEOF returns the section size in bytes.

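Now look at the code that initializes .data, in 04-hello-world/platform/common/platform.c:
extern uint32_t _data_load, _data, _edata;  // symbols defined in layout.ld

uint32_t *load_addr = &_data_load;          // source: load address in flash

for (uint32_t *mem_addr = &_data; mem_addr < &_edata;) {
    *mem_addr++ = *load_addr++;              // destination: run address in RAM
}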

The loop copies the values word by word from load_addr (flash) to mem_addr (RAM).

This initialization is usually done as early as possible, ideally as one of the very first tasks. There is a good reason for that: until it runs, reading global variables from C returns “garbage”. In our case, the initialization happens after the call to platform_init. That function does not depend on the data in .data/.bss, and running it first lets the following code run faster, which in the end gives a performance boost. The downside is a separate function, platform_init_post, where a global variable is set to the system bus frequency. platform_init cannot store that value itself, because the .data/.bss initialization that follows would overwrite it.
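The article shows only the .data copy. A crt0-style startup also has to zero .bss. Here is a minimal sketch of both steps as plain functions so they can be exercised on a host. copy_data and zero_bss are hypothetical names. The _bss/_ebss symbols in the comment are assumed names that layout.ld would have to define.

```c
#include <stdint.h>

/* Copy initial values of .data from their load address (flash) to their
 * run address (RAM). The linker script aligns both ends to 4 bytes, so
 * copying whole 32-bit words is safe. */
static void copy_data(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero .bss word by word; it must also be 4-byte aligned at both ends. */
static void zero_bss(uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = 0u;
    }
}

/* On the target, the reset path would call them roughly like this:
 *   extern uint32_t _data_load, _data, _edata;  // from layout.ld
 *   extern uint32_t _bss, _ebss;                // hypothetical symbol names
 *   copy_data(&_data, &_edata, &_data_load);
 *   zero_bss(&_bss, &_ebss);
 */
```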

The last section, /DISCARD/, is special: it is the linker's /dev/null. All input sections assigned to it are simply thrown away. Recall that a section not mentioned explicitly is automatically placed into a suitable memory region. It is listed here mostly for clarity, since for ARMv6-M (Cortex-M0) those input sections are guaranteed to be empty.

About different kinds of interrupts


Note the slightly modified first section, .text, which now also receives two new input sections: .isr_vector and .isr_vector_nvic. Both are wrapped in KEEP, which prevents the linker from “optimizing” them away as unused. .isr_vector contains the common Cortex-M interrupt table, which you can find in platform/common/isr.c:

__attribute__ ((weak)) void isr_nmi();
__attribute__ ((weak)) void isr_hardfault();
__attribute__ ((weak)) void isr_svcall();
__attribute__ ((weak)) void isr_pendsv();
__attribute__ ((weak)) void isr_systick();

__attribute__ ((section(".isr_vector")))
void (* const isr_vector_table[])(void) = {
    &_stack_base,
    main,             // Reset
    isr_nmi,          // NMI
    isr_hardfault,    // Hard Fault
    0,                // CM3 Memory Management Fault
    0,                // CM3 Bus Fault
    0,                // CM3 Usage Fault
    &_boot_checksum,  // NXP Checksum code
    0,                // Reserved
    0,                // Reserved
    0,                // Reserved
    isr_svcall,       // SVCall
    0,                // Reserved for debug
    0,                // Reserved
    isr_pendsv,       // PendSV
    isr_systick,      // SysTick
};


As you can see, we have moved away from declaring the table in an assembly file and now describe it in C. We also introduced separate interrupt handlers instead of the single common hang handler. By default, all of these handlers run an infinite loop. A couple of times while writing the examples for this article, I slipped a debug LED into isr_hardfault. Because they are declared weak, they can be overridden in any other file. For example, timer.c has its own isr_systick, and that is the one that ends up in the final image.

The rest of the table lives in a similar array, isr_vector_table_nvic. It is kept separate because it depends on the specific chip, but the idea is the same.
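The article gives every handler its own weak default. A common alternative in vendor startup code is to alias all handlers to one weak default. Here is a sketch of that variant using GCC's weak and alias attributes. It assumes an ELF toolchain, since aliases do not work on every host. default_handler and example_handlers are hypothetical names.

```c
/* Fallback for every exception nobody overrides: park the CPU here so a
 * debugger can show where it stopped. */
void default_handler(void)
{
    for (;;) {
    }
}

/* Weak aliases: each name resolves to default_handler unless another object
 * file provides a strong definition with the same name (as timer.c does for
 * isr_systick in the article). */
void isr_nmi(void)       __attribute__((weak, alias("default_handler")));
void isr_hardfault(void) __attribute__((weak, alias("default_handler")));
void isr_svcall(void)    __attribute__((weak, alias("default_handler")));
void isr_pendsv(void)    __attribute__((weak, alias("default_handler")));
void isr_systick(void)   __attribute__((weak, alias("default_handler")));

typedef void (*isr_t)(void);

/* Partial, illustrative table: the real one also holds the initial stack
 * pointer, the reset vector and the reserved slots. */
const isr_t example_handlers[] = {
    isr_nmi, isr_hardfault, isr_svcall, isr_pendsv, isr_systick,
};
```

The trade-off is that every unexpected exception ends up in the same loop. Separate handlers, as in the article, tell you which fault fired.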

And about interrupts


Let's talk a little more about interrupts. In general, an interrupt is a call to a handler in response to some event that is external to the code running at that moment. A nice feature of Cortex-M is that the processor saves and restores registers itself: r0-r3, r12, lr, pc and xPSR are pushed on entry. As a result, interrupt handlers can be written as ordinary C functions. Nested interrupts are handled automatically as well.

The NVIC (Nested Vectored Interrupt Controller) handles interrupts from the peripherals outside the ARM core. It lets you assign different priorities to interrupts, enable and disable them centrally, or trigger an interrupt from software.
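On Cortex-M0, enabling an IRQ and setting its priority come down to simple bit arithmetic on NVIC registers, roughly what CMSIS's NVIC_EnableIRQ and NVIC_SetPriority do. Here is a host-testable sketch of just that arithmetic. It assumes 2 implemented priority bits (PRIO_BITS, the usual value on Cortex-M0). The helper names are hypothetical.

```c
#include <stdint.h>

#define PRIO_BITS 2u  /* assumption: priority bits implemented by the core */

/* Mask to write to ISER (enable) or ICER (disable). Cortex-M0 has at most
 * 32 external IRQs, so one 32-bit register covers all of them. */
static uint32_t nvic_irq_mask(unsigned irqn)
{
    return 1u << (irqn & 31u);
}

/* Index of the IPR word holding this IRQ's priority byte (4 IRQs per word). */
static unsigned nvic_ipr_index(unsigned irqn)
{
    return irqn >> 2;
}

/* New value of that IPR word with the IRQ's priority replaced. Only the top
 * PRIO_BITS of each byte are implemented, and ARMv6-M allows only word
 * access to these registers, hence the read-modify-write of a whole word. */
static uint32_t nvic_ipr_with_priority(uint32_t ipr_word, unsigned irqn, uint32_t priority)
{
    unsigned shift = (irqn & 3u) * 8u;                      /* byte lane */
    uint32_t byte = (priority << (8u - PRIO_BITS)) & 0xFFu;
    return (ipr_word & ~(0xFFu << shift)) | (byte << shift);
}
```

For IRQ 21, for instance, the enable bit is bit 21 of ISER, and the priority sits in byte lane 1 of IPR word 5.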

Let's look at the new systick-based timer implementation:
static volatile uint32_t systick_10ms_ticks = 0;

void platform_delay(uint32_t msec)
{
    uint32_t tenms = msec / 10;
    uint32_t dest_time = systick_10ms_ticks + tenms;
    while(systick_10ms_ticks < dest_time) {
        __WFI();
    }
}

// override isr_systick from isr.c
void isr_systick(void)
{
    ++systick_10ms_ticks;
}

The wait loop puts the processor to sleep until the next interrupt (WFI, wait for interrupt), and it repeats until the tick counter reaches the target value. Meanwhile, every 10 ms SysTick counts down to zero and raises an interrupt, in which isr_systick increments the counter by 1. Note that systick_10ms_ticks is declared volatile. This tells the compiler that the variable can (and will) change outside the current flow of control, so it must be re-read from memory every time, because the interrupt handler is what changes it.
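Two details of platform_delay are worth a closer look. msec / 10 truncates, so delays under 10 ms do not wait at all. Comparing against start + ticks also breaks when the counter wraps, which happens after about 497 days at 10 ms per tick. Here is a sketch of a wraparound-safe check and a rounding-up conversion. The helper names are hypothetical, and rounding up is a design choice, not the article's behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Milliseconds to 10 ms ticks, rounding up so 5 ms still waits one tick.
 * Written without msec + 9 so it cannot overflow near UINT32_MAX. */
static uint32_t ms_to_ticks(uint32_t msec)
{
    return msec / 10u + (msec % 10u != 0u);
}

/* Deadline-style check as in the article: true when the wait is over.
 * Fails when start + ticks wraps around 2^32. */
static bool naive_done(uint32_t now, uint32_t start, uint32_t ticks)
{
    uint32_t dest = start + ticks;
    return !(now < dest);
}

/* Wraparound-safe: unsigned subtraction is modulo 2^32, so now - start is
 * the number of elapsed ticks even if the counter wrapped in between. */
static bool ticks_elapsed(uint32_t now, uint32_t start, uint32_t ticks)
{
    return (uint32_t)(now - start) >= ticks;
}

/* On the target the loop would become (sketch):
 *   uint32_t start = systick_10ms_ticks, ticks = ms_to_ticks(msec);
 *   while (!ticks_elapsed(systick_10ms_ticks, start, ticks))
 *       __WFI();
 */
```

With start = 0xFFFFFFFE and 7 ticks, the deadline wraps to 5, so naive_done reports “done” immediately. ticks_elapsed waits until the counter reaches 5.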

libgcc


In this code we use division for the first time. What could be complicated about that? Well, Cortex-M0 has no hardware divide instruction :-). The compiler knows this, so instead of a divide instruction it emits a call to __aeabi_uidiv, which divides the numbers in software. This function and a few similar ones are implemented in the compiler support library, libgcc.a. Unfortunately, our linker knows nothing about it, and we run into an unpleasant error:
build/5a3e7023bbfde5552a4ea7cc57c4520e0e458a53_timer.o: In function `platform_delay':
timer.c:(.text.platform_delay+0x4): undefined reference to `__aeabi_uidiv'

The correct solution is to link with gcc instead of calling the linker directly, because gcc knows which libraries to pull in. gcc may overdo it somewhat, though. We tell it through -nostartfiles that we have our own startup code, and through -ffreestanding that our application is standalone and does not depend on any OS.
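For the curious, here is roughly what a software division routine has to do. This is a textbook restoring (shift-and-subtract) division, not libgcc's actual __aeabi_uidiv, which is hand-written assembly. The name soft_udiv is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Restoring division: produce one quotient bit per iteration, from the top.
 * Precondition: divisor != 0. rem is 64-bit so the shift cannot overflow
 * even when divisor is close to UINT32_MAX. */
static uint32_t soft_udiv(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint64_t rem = 0;

    for (int bit = 31; bit >= 0; --bit) {
        rem = (rem << 1) | ((dividend >> bit) & 1u);  /* bring down next bit */
        if (rem >= divisor) {                         /* divisor fits: subtract */
            rem -= divisor;
            quotient |= 1u << bit;                    /* and set this bit */
        }
    }
    if (remainder != NULL) {
        *remainder = (uint32_t)rem;
    }
    return quotient;
}
```

The loop always runs 32 iterations, which shows why division on Cortex-M0 costs far more than a single instruction.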

Finally, hello Habr!


This version is a milestone of sorts: it has a UART driver, which means we can see our code at work through more than a blinking LED. But first, the driver:
platform/protoboard/uart.c
extern uint32_t platform_clock;

void platform_uart_setup(uint32_t baud_rate)
{
    NVIC_DisableIRQ(UART_IRQn);
First, we disable the UART interrupt in the NVIC, in case it was enabled.
    LPC_SYSCON->SYSAHBCLKCTRL |= (1<<16);

    LPC_IOCON->PIO1_6 &= ~0x07;
    LPC_IOCON->PIO1_6 |= 0x01;

    LPC_IOCON->PIO1_7 &= ~0x07;
    LPC_IOCON->PIO1_7 |= 0x01;
Next, we enable the clock for the block that configures the pins (IOCON) and switch the pins to UART TXD/RXD mode. This code cost me a lot of blood while I was trying to understand why the UART did not work after a reboot. Be careful: sometimes obvious things are turned off by default!
    LPC_SYSCON->SYSAHBCLKCTRL |= (1<<12);
    LPC_SYSCON->UARTCLKDIV = 0x1;
Now we can turn on the UART itself and, at the same time, set its input clock divider.
    LPC_UART->LCR = 0x83;

    uint32_t Fdiv = platform_clock     // system clock frequency
            / LPC_SYSCON->SYSAHBCLKDIV // divided by the peripheral clock divider
            / LPC_SYSCON->UARTCLKDIV   // by the UART's own divider
            / 16                       // by 16, per the spec
            / baud_rate;               // and, finally, by the baud rate

    LPC_UART->DLM = Fdiv / 256;
    LPC_UART->DLL = Fdiv % 256;

    LPC_UART->FDR = 0x00 | (1 << 4) | 0;  // MULVAL = 1, DIVADDVAL = 0: fractional divider has no effect

    LPC_UART->LCR = 0x03;
Besides selecting the classic 8N1 mode, LCR = 0x83 sets the DLAB bit. That bit opens access to the divisor latch registers, which set the bit rate. We compute the divisor, write it into DLM/DLL, and then clear DLAB with LCR = 0x03. For the curious, the formula is in section 13.5.15 of the manual. That section also describes an additional fractional divider for an even more accurate baud rate. In my tests, 9580 worked quite well :-)
    LPC_UART->FCR = 0x07;

    volatile uint32_t unused = LPC_UART->LSR;

    while(( LPC_UART->LSR & (0x20|0x40)) != (0x20|0x40) )
        ;
    while( LPC_UART->LSR & 0x01 ) {
        unused = LPC_UART->RBR;
    }
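We enable and reset the FIFOs, then make sure no stray data is buried in the registers. Reading LSR clears any pending line status. We then wait until the transmitter is idle (THRE and TEMT, 0x20|0x40) and drain the receive buffer while data is available (RDR, 0x01).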
    // NVIC_EnableIRQ(UART_IRQn);
    // LPC_UART->IER = 0b101;
}
This is where we would enable the receive interrupts (but we don't). The example has no UART interrupt handler, so we do not need interrupts either.
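The divisor formula above truncates, so the resulting baud rate is slightly off. Here is a sketch of that arithmetic, with the fractional divider left disabled as in the driver. uart_pclk stands for platform_clock / SYSAHBCLKDIV / UARTCLKDIV. The 12 MHz value in the example is an assumed clock, not one taken from the article, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Divisor latch value: UART_PCLK / 16 / baud, truncated like Fdiv above. */
static uint32_t uart_divisor(uint32_t uart_pclk, uint32_t baud)
{
    return uart_pclk / 16u / baud;
}

/* Baud rate actually produced by a given divisor. */
static uint32_t uart_actual_baud(uint32_t uart_pclk, uint32_t divisor)
{
    return uart_pclk / 16u / divisor;
}

/* Split the 16-bit divisor into DLM (high byte) and DLL (low byte). */
static void uart_split_divisor(uint32_t divisor, uint8_t *dlm, uint8_t *dll)
{
    *dlm = (uint8_t)(divisor / 256u);
    *dll = (uint8_t)(divisor % 256u);
}
```

With a 12 MHz UART clock and 9600 baud, the divisor is 78 (78.125 truncated). That gives about 9615 baud, roughly 0.16% off, which a UART tolerates easily.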

For the LPC1768 the code is very similar, so I will not walk through it. I will only note that there all peripherals are powered on at reset, which simplifies things.

An important point: mbed brings three UARTs out to its pins, with several pin options for each. Since USB communication would take significantly more code, you will have to connect an FTDI cable to a UART. In the example, these are pins P13/P14.
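The listing above only sets the UART up. To actually print hello world, you also need a transmit routine. Here is a minimal polling sketch: wait for THRE (LSR bit 0x20), then write the byte to THR. uart_regs_t is a hypothetical two-register stand-in so the code runs on a host. On the chip you would use LPC_UART->LSR and LPC_UART->THR from the CMSIS header, which sit at different offsets.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two UART registers used here. */
typedef struct {
    volatile uint32_t THR;  /* transmit holding register (write) */
    volatile uint32_t LSR;  /* line status register (read) */
} uart_regs_t;

#define LSR_THRE 0x20u  /* transmit holding register empty */

/* Block until the transmitter can take a byte, then send it. */
static void uart_putc(uart_regs_t *uart, char c)
{
    while ((uart->LSR & LSR_THRE) == 0u) {
    }
    uart->THR = (uint8_t)c;
}

/* Send a string, turning "\n" into "\r\n" for serial terminals. */
static void uart_puts(uart_regs_t *uart, const char *s)
{
    while (*s != '\0') {
        if (*s == '\n') {
            uart_putc(uart, '\r');
        }
        uart_putc(uart, *s++);
    }
}
```

Polling keeps the driver tiny and needs no interrupt handler, which fits a setup function that leaves UART interrupts disabled.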

To summarize


We have sorted out the linker, and we now have a ready skeleton that you can extend with new drivers. Or you can even take CMSIS and the manufacturer's demos. Just read the code carefully: the LPCXpresso examples contain typos of varying degrees of sadness.

I have plenty of ideas for further articles but not enough time: too many interesting things are still waiting to be programmed! Nevertheless, I will try to return to the “microcosm” of embedded after the “macrocosm” of office days.

P.S. As always, many thanks to pfactum for proofreading the text.

This work is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported. The program text for the examples is available under the Unlicense (unless otherwise expressly indicated in the file headers). This work was written solely for educational purposes and is in no way affiliated with the current or previous employers of the author.