Optimizing away null loops (or rather, doing so SOMETIMES but not always)


Is there some build-time option (set when gcc itself is compiled) or run-time option that controls whether "empty loops" are optimized away to nothing?  Or did something change to explicitly detect delay loops?

I was looking at some code recently that contained one of the usual delay loops:

		for(int i=0; i<900000; i++){}  //Run a few cycles doing nothing

and with the same optimization level (-Os), some versions of gcc produce code that implements the empty loop (arm-8-2019-q3, from ARM), and some versions (7-2017-q4-major, from Arduino) just remove it (as I'm used to on AVR, etc.)

 

Last Edited: Fri. Oct 22, 2021 - 01:08 AM

In particular, when the following code is compiled with arm-gcc version 5.4, 6, 8, 9, or 10 at -Os, -O2, or -O3, the loop in delay() is optimized away, but NOT the for loop in main()??

void delay() {
  for (int i=0; i < 9000000; i++) {}
}

int main() {
  while(1) {
    for(int i=0; i<9000000; i++){}  //Run a few cycles doing nothing
  }
}

arm gcc 7 optimizes away both loops.

 

from gcc 10:

/Downloads/gcc-arm-10/bin/arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -g -Os -Wall -Wextra loop.c -c; arm-objdump -S loop.o

loop.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <delay>:
void delay() {
  for (int i=0; i < 9000000; i++) {}
}
   0:   4770            bx      lr

Disassembly of section .text.startup:

00000000 <main>:

int main() {
   0:   4b02            ldr     r3, [pc, #8]    ; (c <main+0xc>)
  while(1) {
    for(int i=0; i<9000000; i++){}  //Run a few cycles doing nothing
   2:   3b01            subs    r3, #1
   4:   2b00            cmp     r3, #0
   6:   d1fc            bne.n   2 <main+0x2>
   8:   e7fa            b.n     0 <main>
   a:   46c0            nop                     ; (mov r8, r8)
   c:   00895440        .word   0x00895440

 

(I'm not happy about the extra "cmp" instruction, either.  The subs will already have set the flags.  With -mcpu=cortex-m4 it does better.)

 

void    usleep( uint32_t TimeUS ) __attribute__((used, optimize("O3")));

void    usleep( uint32_t TimeUS )
{
    ...
    for ( uint32_t cc = 0; cc < SysClksUS; ++cc )
    {
        // suppress optimizing out this otherwise useless loop
        asm volatile ("" : : "rm" ( cc ) );
    }
}

For example, I use this to keep such loops from being removed by GCC (it has worked from GCC 7 up to and including GCC 10).
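
A self-contained sketch of the same idea, for anyone who wants to try it on a host compiler. The function names are hypothetical, not taken from the code above, and the second variant (a volatile counter) is a commonly used alternative rather than what the poster uses. Both return the number of iterations executed so the result can be checked.

```c
#include <stdint.h>

/* Hypothetical helper: spins n times. The empty asm statement takes the
   counter as an input operand and is marked volatile, so GCC must emit
   it on every iteration and cannot discard the loop. GCC/Clang only. */
uint32_t spin_asm_barrier(uint32_t n)
{
    uint32_t i;
    for (i = 0; i < n; ++i) {
        __asm__ volatile ("" : : "r" (i));
    }
    return i;   /* number of iterations actually executed */
}

/* Hypothetical alternative in standard C: a volatile counter forces a
   real read and write of i on every iteration, at the cost of extra
   memory traffic compared with the asm version. */
uint32_t spin_volatile(uint32_t n)
{
    volatile uint32_t i;
    for (i = 0; i < n; ++i) {
        /* intentionally empty */
    }
    return i;
}
```

Neither variant gives a calibrated delay; the time per iteration still depends on the core, the flash wait states, and the code the compiler generates.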


I would recommend using the SysTick timer for delays; otherwise there is always a chance of the delay loop being optimized away, or of CPU caching giving unreliable timing.  SysTick is a peripheral built into the CPU core and is present on ARM Cortex-M CPUs.  Here is an example:

 

void DelayMs(uint32_t mS)
{
    uint32_t i;                                                             /* unsigned, to match mS */

    SysTick->LOAD = 4000 - 1;                                               /* reload with number of clocks per millisecond (4000 implies a 4 MHz core clock) */
    SysTick->VAL = 0;                                                       /* clear current value register */
    SysTick->CTRL = SysTick_CTRL_ENABLE_Msk | SysTick_CTRL_CLKSOURCE_Msk;   /* enable the timer, clocked from the core clock */
    for(i = 0; i < mS; i++)
    {
        while((SysTick->CTRL & SysTick_CTRL_COUNTFLAG_Msk) == 0);           /* wait until COUNTFLAG (bit 16) is set; reading CTRL clears it */
    }
    SysTick->CTRL = 0;                                                      /* stop the timer (ENABLE = 0) */
}
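
The constant 4000 above only gives 1 ms if the core runs at 4 MHz, which is an inference from the "clocks per millisecond" comment rather than something stated in the post. A minimal sketch of computing the reload value from the clock frequency follows; the helper name and its 0-on-failure convention are hypothetical, and it is plain arithmetic so it can be checked on a host.

```c
#include <stdint.h>

/* The SysTick reload register is 24 bits wide. */
#define SYSTICK_MAX_RELOAD 0x00FFFFFFu

/* Hypothetical helper: reload value for a period of period_ms at a core
   clock of core_hz. Returns 0 if the period needs fewer than 2 ticks or
   does not fit in the 24-bit reload register. */
uint32_t systick_reload(uint32_t core_hz, uint32_t period_ms)
{
    /* 64-bit intermediate so large periods cannot overflow */
    uint64_t ticks = (uint64_t)(core_hz / 1000u) * period_ms;

    if (ticks < 2u || ticks - 1u > SYSTICK_MAX_RELOAD) {
        return 0u;
    }
    /* The counter runs from N-1 down to 0, which is N ticks per period */
    return (uint32_t)(ticks - 1u);
}
```

With these assumptions, systick_reload(4000000, 1) yields the 3999 used above, and a 48 MHz clock with a 1000 ms period is rejected because it exceeds 24 bits.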

 

John Malaugh

Last Edited: Fri. Oct 22, 2021 - 04:42 PM

Yes, I already know how to implement accurate delay loops by either forcing the compiler not to eliminate the loops, or by using assorted timers.

In this case, I'm trying to understand the compiler behavior...


"I'm trying to understand the compiler behavior..."

No way ;) Well, I assume that useless code without side effects is optimized out. Nowadays this is usually the case.


Apparently it's a bug that slipped in in v8.

It "might" get fixed in 10 and 11...

https://gcc.gnu.org/bugzilla/sho...

 


Thanks for reporting back.