Function specific optimization not working


I'm trying to set the optimization level of a single function using the attribute as below:

 

__attribute__((optimize("O3"))) void test(void)
{
    /* function code */
}

 

I also did this in the function declaration. The function follows the global optimization setting instead and ignores the function-specific attribute. In other words, if I set the global level to debug, this routine is compiled at that level too. Any clue as to why?

Last Edited: Sat. Feb 5, 2022 - 01:09 AM

Nafai wrote:
This function follows the global optimization setting instead and ignores this function specific optimization

How do you determine that?

 

EDIT

 

Optimize attribute documentation: https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Function-Attributes.html

 

which notes that you can also do it with #pragma: https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas

 

Have you also tried the #pragma approach?
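For reference, the pragma form from that page looks roughly like the sketch below. The function name demo_checksum and its body are placeholders invented for illustration; only the three #pragma GCC lines come from the GCC documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Save the current (global) options, then request O3 for what follows. */
#pragma GCC push_options
#pragma GCC optimize ("O3")

/* Hypothetical function: its name and body are placeholders only. */
uint32_t demo_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Restore the global options for the rest of the file. */
#pragma GCC pop_options
```

Only functions defined between push_options and pop_options are affected; anything they call that is defined elsewhere keeps its own optimization level.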

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Fri. Feb 4, 2022 - 06:39 PM

Nafai wrote:
In other words if I set the global to debug, this routine follows that level of optimization.

 

How did you determine which level of optimization it follows?

 

If you figured that out by inspecting/comparing the generated code for different levels of optimization, then why haven't you provided a sample in your question?

Dessine-moi un mouton

Last Edited: Fri. Feb 4, 2022 - 06:52 PM

I determine the optimization level from this code, which must put out pulses of a specific width on a port. When globally optimized at O3, a pulse is about 100 ns, but it stretches to 1.88 us when I switch to debug, even with the attribute set to O3.

 

I have not tried the pragma; I'll give that a shot now.

 

for(x=0;x<64; x++)
    {
    shifter=matrixdata[x];
    for(y=0;y<16;y++)
        {
        gpio_set_pin_level(MATRIX_SIN,shifter&0x0001); //write out lsbit
        shifter>>=1; //shift data to next switch value            
        for(z=0;z<9; z++) //delay for the required 20ns setup time (make it 30 for safety)
            asm("nop");
        gpio_set_pin_level(MATRIX_SCLK,true);     //pulse sclk high            
        for(z=0;z<7; z++) //delay for the required 40ns hold time (make it 60 for safety)
            asm("nop");
        gpio_set_pin_level(MATRIX_SIN,false); //set data low to minimize radiated energy
        for(z=0;z<6; z++) //this makes the pulse high time 100ns which is the shortest pulse allowed
            asm("nop");
        gpio_set_pin_level(MATRIX_SCLK,false);     //set sclk back low again
        for(z=0;z<438; z++) //this makes the time low 5ms total which seems to be pretty low noise
            asm("nop");
        }
    }


Nafai wrote:
code which must put out specific size pulses on to a port

Sounds like  a job for assembler - rather than trying to massage the compiler ...

 

Please post the compiler's generated assembler for each case.

 

Please see Tip #1 in my signature, below, for how to properly post source code:

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

awneil wrote:
Sounds like  a job for assembler

Sounds more like a job for  __builtin_avr_delay_cycles() defined as:

 

 extern void __builtin_avr_delay_cycles(unsigned long);

https://gcc.gnu.org/onlinedocs/gcc/AVR-Built-in-Functions.html

 

Note: I assume OP is using avr-gcc. (A reasonable assumption I believe)
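A minimal sketch of how that builtin might be wrapped, assuming avr-gcc and a 20 MHz clock. DEMO_F_CPU_HZ, NS_TO_CYCLES, DELAY_NS and demo_spin_cycles are names made up for this example. Per the GCC documentation the builtin's argument must be a compile-time constant, which is why the wrapper is a macro rather than a function.

```c
#include <stdint.h>

/* Assumed CPU clock for this sketch; replace with the real value. */
#define DEMO_F_CPU_HZ 20000000UL

/* Round a delay in nanoseconds up to whole CPU cycles. */
#define NS_TO_CYCLES(ns) \
    ((((DEMO_F_CPU_HZ / 1000000UL) * (unsigned long)(ns)) + 999UL) / 1000UL)

/* Compile-time sanity check: at 20 MHz one cycle is 50 ns. */
_Static_assert(NS_TO_CYCLES(100) == 2UL, "100 ns should be 2 cycles at 20 MHz");

#if defined(__AVR__)
/* avr-gcc only: the argument must be a compile-time constant. */
#define DELAY_NS(ns) __builtin_avr_delay_cycles(NS_TO_CYCLES(ns))
#else
/* Fallback for other targets: one nop per requested cycle plus loop
 * overhead, so the real delay is longer and varies with optimization. */
static inline void demo_spin_cycles(unsigned long cycles)
{
    while (cycles--) {
        __asm__ volatile ("nop");
    }
}
#define DELAY_NS(ns) demo_spin_cycles(NS_TO_CYCLES(ns))
#endif
```

With this, DELAY_NS(100) requests 2 cycles at the assumed 20 MHz. The fallback branch is only there so the sketch builds on non-AVR compilers; it is not cycle-accurate.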

Last Edited: Fri. Feb 4, 2022 - 08:50 PM

I looked into doing it all in assembly, and yes, it would be better, but looking at the disassembly, that code seems so all over the place.  Too much to post.  With the port output included there is a lot going on.

 

Pragma is no improvement.

 

Does it seem there is no way to get the __attribute__ optimize to work?  I mean it should!  The documentation indicates it should, and with O3 the pulses are what they should be, so that certainly seems like the best solution. 

 

Does this work for other users?  I'm using Atmel Studio 7 and a SAMS70 chip.

 

In the end it doesn't matter as I'll be compiling the whole thing at O3, but I was looking at that so I could continue to debug with that section still working at the O3 level.

 

 


N.Winterbottom wrote:

awneil wrote:

Sounds like  a job for assembler

 

Sounds more like a job for  __builtin_avr_delay_cycles() defined as:

 

 extern void __builtin_avr_delay_cycles(unsigned long);

 

 

I originally tried something similar for the SAMS70 chip I'm using under Atmel Studio 7.  It's called delay_us(us) and unbelievably it too is sensitive to global optimization level.  It is MUCH longer (like 10X or more) when used under different optimizations.

 


How fast is your clock?

At 20 MHz, a cycle is 50 ns.

 

As a workaround to the original complaint,

you might put the function in a separate file.

 

Also, the code looks simple enough that manual

compilation into assembly ought not be too difficult.

Moderation in all things. -- ancient proverb


Nafai wrote:
SAMS70

So why posting in the AVR section?


skeeve wrote:

How fast is your clock?

At 20 MHz, a cycle is 50 ns.

 

As a workaround to the original complaint,

you might put the function in a separate file.

 

Also, the code looks simple enough that manual

compilation into assembly ought not be too difficult.

 

300 MHz. The port routines are very slow because they expand to many lines of assembly when optimization is off. They are faster with O3, but still not great.
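One plausible explanation, offered as an assumption rather than something confirmed in this thread: the optimize attribute only changes how the marked function itself is compiled. If gpio_set_pin_level() is an ordinary inline helper compiled at the global debug level, GCC generally will not inline it into the O3 function, so every pin write is still a call into unoptimized code. The sketch below shows the idea with a helper forced inline via always_inline. The names demo_port_out, demo_captured, demo_pin_write and demo_shift_out16 are invented, and the "port" is a plain variable that mimics a shift register so the code can run on a PC; a real version would write the chip's port registers instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SIN_MASK  0x1u  /* hypothetical data pin  */
#define DEMO_SCLK_MASK 0x2u  /* hypothetical clock pin */

/* Stand-ins for hardware: a fake output register and a fake 16-bit
 * shift register that samples the data pin on each rising clock edge. */
static volatile uint32_t demo_port_out;
static volatile uint16_t demo_captured;

/* always_inline asks GCC to inline this helper even when the file's
 * global level is -O0, so no call is left in the caller. */
static inline __attribute__((always_inline))
void demo_pin_write(uint32_t mask, bool level)
{
    uint32_t old = demo_port_out;
    uint32_t now = level ? (old | mask) : (old & ~mask);

    /* Rising edge on the clock: shift in the data bit, LSB first. */
    if ((mask & DEMO_SCLK_MASK) && !(old & DEMO_SCLK_MASK) && level) {
        demo_captured = (uint16_t)((demo_captured >> 1) |
                                   ((old & DEMO_SIN_MASK) ? 0x8000u : 0u));
    }
    demo_port_out = now;
}

/* Only this function is requested at O3; the delays are left out here. */
__attribute__((optimize("O3")))
void demo_shift_out16(uint16_t word)
{
    for (int bit = 0; bit < 16; bit++) {
        demo_pin_write(DEMO_SIN_MASK, (word & 0x0001u) != 0); /* LSB out */
        word >>= 1;
        demo_pin_write(DEMO_SCLK_MASK, true);   /* clock high */
        demo_pin_write(DEMO_SIN_MASK, false);   /* data low   */
        demo_pin_write(DEMO_SCLK_MASK, false);  /* clock low  */
    }
}
```

Comparing the disassembly of the function with and without always_inline on the helper would show whether calls remain in the debug build, which would confirm or rule out this guess.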

 

I may still go that route, although I'm studying assembly from a document and not from the disassembly, as I just don't get that code; it can be very scattered and cryptic.  As I learn more, I'll give it a go.

 

But as I said, for now I'll just use global O3, which works, and ignore the issue while I'm debugging.  I have a lot of other code to build.

 

I am disappointed though that the attribute route just doesn't work, especially since it should.  It was the simplest of all solutions.

 

Ken

 

 

 


awneil wrote:
So why posting in the AVR section?

 

I felt the attribute was generic enough to post here, since it is a compiler directive and not really related to the micro. Perhaps it is not working because of the micro I'm using; I don't know.


Nafai wrote:
This function follows the global optimization setting instead and ignores this function specific optimization

So what was the global optimisation?

 

Could be that the differences in -O3 only really show up on a wider scale ... ?

 

https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html


N.Winterbottom wrote:
Note: I assume OP is using avr-gcc. (A reasonable assumption I believe)

How wrong was I ?

 

Nafai wrote:
SAMS70 ... 300 MHz

In that case you're on a fool's errand; especially so if that processor has an instruction cache.

 

You should use hardware instead. I recommend either:

 

  1. SPI interface fed with a byte sequence that in effect generates a bit stream.
  2. Employ a hardware timer and spin wait until the pre-calculated count is found.
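A minimal sketch of option 2, assuming a free-running 32-bit up-counter. The names demo_timer_read_fn, demo_spin_ticks, demo_fake_ticks and demo_fake_timer_read are made up for this example, and the fake timer exists only so the sketch can run on a PC; on real hardware the read function would return a timer or cycle-counter register.

```c
#include <stdint.h>

/* Function that returns the current value of a free-running 32-bit
 * up-counter (hypothetical; real code would read a hardware register). */
typedef uint32_t (*demo_timer_read_fn)(void);

/* Busy-wait until at least `ticks` counter ticks have elapsed. */
static inline void demo_spin_ticks(demo_timer_read_fn read_ticks,
                                   uint32_t ticks)
{
    const uint32_t start = read_ticks();

    /* Unsigned subtraction gives the elapsed count even if the counter
     * wraps past zero during the wait. */
    while ((uint32_t)(read_ticks() - start) < ticks) {
        /* spin */
    }
}

/* Host-side stand-in: every read advances the fake counter by one. */
static uint32_t demo_fake_ticks;

static uint32_t demo_fake_timer_read(void)
{
    return demo_fake_ticks++;
}
```

Because the wait is measured against the counter rather than counted in instructions, the delay no longer depends on the optimization level, apart from the granularity of the polling loop.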

 


Three of the delay loops could be unrolled by hand.

The other, 438 = 19*23 + 1 ,

could use help from some macros.

 

Edit: That said, cycle-counting delay loops often do not work well.
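The hand-unrolling could be sketched with a few repetition macros, as below. All macro names (REPn, NOP_STMT, DELAY_n_NOPS) are invented for this example; the counts 9, 7, 6 and 438 are taken from the loop posted earlier in the thread.

```c
/* Repeat a statement a fixed number of times at compile time. */
#define REP2(x)   x x
#define REP4(x)   REP2(x) REP2(x)
#define REP8(x)   REP4(x) REP4(x)
#define REP16(x)  REP8(x) REP8(x)
#define REP6(x)   REP4(x) REP2(x)
#define REP7(x)   REP6(x) x
#define REP9(x)   REP8(x) x
#define REP19(x)  REP16(x) REP2(x) x
#define REP23(x)  REP16(x) REP7(x)

/* One nop instruction as a complete statement. */
#define NOP_STMT  __asm__ volatile ("nop");

/* The three short delays from the posted loop. */
#define DELAY_9_NOPS()   do { REP9(NOP_STMT) } while (0)
#define DELAY_7_NOPS()   do { REP7(NOP_STMT) } while (0)
#define DELAY_6_NOPS()   do { REP6(NOP_STMT) } while (0)

/* The long delay: 438 = 19 * 23 + 1. */
#define DELAY_438_NOPS() do { REP19(REP23(NOP_STMT)) NOP_STMT } while (0)
```

This takes the loop counter out of the picture, so the nop count no longer depends on how the compiler handles the loop. It is still cycle counting, though, so the caveat above applies.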

Moderation in all things. -- ancient proverb

Last Edited: Fri. Feb 4, 2022 - 10:39 PM