Slow PIO interrupt handling


Hi,

 

I'm trying to read a full port when one of its bits goes high. It works, but two things bother me:

 - interrupt handling seems slow

 - sometimes it's even slower.

 

I use a SAM4E8E at 120 MHz.

The IRQ has the highest priority: 0.

It's compiled with the -O3 flag.

 

Here is the interrupt handler:

void PIOD_Handler(void) {
	uint32_t portd;

	portd = PIOD->PIO_PDSR;          /* snapshot the whole port as early as possible */
	pio_get_interrupt_status(PIOD);  /* reading PIO_ISR clears the pending flags */
	pio_set_pin_high(PIO_PC22_IDX);  /* scope marker: handler running */

	// Some code

	pio_set_pin_low(PIO_PC22_IDX);   /* scope marker: handler done */
}

On the next screenshots, the blue trace is the pin on port D that triggers the interrupt.

The green one is PC22, which shows the execution time of the interrupt handler.

 

Here we can see what happens most of the time: the pin on port D goes high, and the interrupt handler then runs.

 

screen1

 

The first thing that bothers me: why does it take nearly 1 µs between the pin change and the PC22 change?

The asm code is quite small:

void PIOD_Handler(void) {
00404DA8   push	{r3, r4, r5, r6, r7, lr}		 
	portd = PIN_PP_CLK_PIO->PIO_PDSR;
00404DAA   ldr	r0, [pc, #304]		 
	pio_get_interrupt_status(PIOD);
00404DAC   ldr	r3, [pc, #304]		 
	portd = PIN_PP_CLK_PIO->PIO_PDSR;
00404DAE   ldr	r5, [r0, #60]		 
	pio_set_pin_high(PIO_PC22_IDX);
00404DB0   ldr	r6, [pc, #304]		 
	pio_get_interrupt_status(PIOD);
00404DB2   blx	r3		 
	pio_set_pin_high(PIO_PC22_IDX);
00404DB4   movs	r0, #86		 
00404DB6   blx	r6	

 

This second screenshot shows what happens sometimes (sometimes meaning rarely, but it does happen):

 

screen2

 

The interrupt takes 6 µs, which is way too much for my application. My µC handles other things like USB CDC, but given that the USB IRQ priority is 5 and the PIO IRQ priority is 0, USB shouldn't be a problem.

Any idea what could cause such a long delay?


Are you sure you're not running at 24 MHz? Your measured execution time for the interrupt doesn't match the expected execution time of your asm code; it seems to be roughly 5x slower than 120 MHz would give. I just estimated all that, but count your asm cycles and check the execution time to verify the clock speed... I sometimes bit-bang a port to verify. Also, is your peripheral clock running at the expected speed? And is this the only interrupt enabled while the trouble occurs? A variable latency of 1 to 6 µs makes it look like another interrupt is eating cycles.

Last Edited: Fri. Jan 8, 2016 - 10:51 PM

I read your other post about -O3 and -O1 optimization. With gnu/gcc I have had -O3 cause issues in the past in Windows apps, so I don't use it; I use -O2 or the default. I realize this is not Windows, but it may be worth mentioning since you might be using gnu/gcc.


Two functions are called. What is in those functions?


Thanks for your comments. I changed the beginning of the interrupt function a little to see the timing more clearly. Here is the ASM code:

void PIOD_Handler(void) {
00404000   push	{r4, r5, r6, lr}		 
	portd = PIN_PP_CLK_PIO->PIO_PDSR;
00404002   ldr	r0, [pc, #316]		 
	PIOC->PIO_SODR = 1 << 22;
00404004   ldr	r2, [pc, #316]

So it should take 3 instructions to drive PC22 high: 3 × (1/120 MHz) = 25 ns. My scope shows 462 ns...

 

I also tried the -O2 compilation option, without any change.

 

And while testing the CPU speed I did see something strange. I wrote a simple main:

int main (void)
{
	SystemInit();
	board_init();

	while(1) {
		PIOC->PIO_SODR = 1 << 22;
		PIOC->PIO_CODR = 1 << 22;
	}
}

the loop takes 3 ASM instructions:

004066FC   str	r3, [r2, #48]		 
004066FE   str	r3, [r2, #52]		 
00406700   b	#-8

And the frequency on the PC22 pin is 24 MHz. I expected 120 MHz / 3 = 40 MHz.

 

Additional information (even though these constants are not used in the SystemInit() function), here is what I have in conf_clock.h:

#define CONFIG_SYSCLK_SOURCE        SYSCLK_SRC_PLLACK

#define CONFIG_SYSCLK_PRES          SYSCLK_PRES_2

#define CONFIG_PLL0_SOURCE          PLL_SRC_MAINCK_XTAL
#define CONFIG_PLL0_MUL             20
#define CONFIG_PLL0_DIV             1

#define CONFIG_USBCLK_SOURCE        USBCLK_SRC_PLL0
#define CONFIG_USBCLK_DIV           5

 


I estimate 5 machine cycles for those 3 instructions; some bus operations add an extra cycle. At 5 cycles per iteration: 24 MHz × 5 = 120 MHz.

 

There are quite a few instructions that take more than one cycle. Read the ARM documentation (it's a hefty read) to get the lowdown. Also, depending on the address, a read or write can incur an extra cycle.


Ok, thanks. Indeed, STR takes 2 cycles. That shows my clock setup is fine: SysClk is at 120 MHz. Then why does it take so long to enter my PIO handler?


I haven't read the SAM4E datasheet, but see if there is an input delay due to filtering. Also look at the speed of the peripheral clock.

Edit: I did just have a quick look at the datasheet, page 702... There's filtering available, so that might be an explanation.

Last Edited: Mon. Jan 11, 2016 - 10:02 PM

For the filtering: I indeed had only glitch filtering activated, but I already tried removing it, and that doesn't help. Glitch filtering only removes glitches shorter than half a peripheral clock period anyway, so it shouldn't add a long delay at all.

 

I don't think I can choose the speed of the peripheral clock; I can only enable it or not.


Make the simplest code that exhibits the problem and send it to Atmel. Make sure no other interrupt sources are enabled. The problem could be the cache configuration.


I wrote a very small project, and there the delay is 200 ns. The time to raise the output pin inside the interrupt handler is 4 CPU cycles (33 ns). That means two things:

 - something in my full project slows interrupt handling to 460 ns (and sometimes to 6 µs!). No idea what it could be.

 - even in the small project, interrupt entry takes 160 ns (19 CPU cycles). Isn't that strange?

PS: for information, glitch filtering adds a 4-5 ns delay.


asterix86 wrote:
- even in the small project, interrupt entry takes 160 ns (19 CPU cycles). Isn't that strange?

19 CPU cycles sounds about right for a Cortex-M4 with 5 flash wait states (CPU running at 120 MHz).

Interrupt latency is 12 cycles; adding 5 cycles for flash wait states puts you at 17.

From ARM https://community.arm.com/docs/DOC-2607

 

The interrupt latency listed in table 1 makes a number of simple assumptions:

  • The memory system has zero wait state (and with resources not being used by other bus masters)
  • The system level design of the chip does not add delay in the interrupt signal connections between the interrupt sources and the processor
  • The interrupt service is not blocked by another currently running exception/interrupt service
  • For Cortex-M4, with FPU enabled, the lazy stacking feature is enabled (this is the default)
  • The current executing instruction is not doing an unaligned transfer/bitband transfer (which can take 1 extra transfer cycle)

I was about to summarize my progress on this issue, and I indeed found that page, which validates the small project's behavior.

 

I also found what causes the occasional 6 µs delay. In fact the delay happens periodically: 5 or 6 times at exactly 100 ms intervals, then again 9 seconds later for another 5 or 6 times every 100 ms, and so on. The cause is the USB stack. Even without sending or receiving anything over USB, this happens very periodically. Removing the udc_start() call from my project eliminated the 6 µs delay. Of course that's not a solution, as I need USB in my project. So there must be something in the USB CDC implementation that blocks IRQ handling. There are a lot of cpu_irq_save()/cpu_irq_restore() calls, which probably cause it.

 

That doesn't seem to be what makes my full project react in 500 ns instead of 200 ns, though.