ATSAME70 is too slow

Hi there,

We are working on an audio project in which we are moving some firmware from an STM32F407 (ARM Cortex-M4) to an ATSAME70 (ARM Cortex-M7). Although the ATSAME70 runs at 300 MHz and the STM32F407 at only 168 MHz, the ATSAME70 is clearly slower in execution speed. The MCU clocks have been checked and are (seemingly?) correct.

Here are some details:

  • Audio is captured by an external audio codec (SGTL5000) and transferred to the MCU via I2S as 16-bit stereo at a 44.1 kHz sampling rate. On the MCU, the DMA collects 32 stereo samples before it calls an ISR, where the captured data is processed in real time.
  • In the ISR a reverberation algorithm is called. The entire source code uses only internal memory and is compiled at the "O3" optimisation level on both MCUs.
  • The ISR is called every 0.726 ms (= 32/44100 s), which is the time budget for processing each block. All times were measured with an oscilloscope by setting a GPIO pin at the start of the ISR and resetting it at the end.
  • I made sure the ATSAME70 actually runs at 300 MHz by programming Timer0 to output a 1 kHz signal, which comes out correct.

Now, on the STM32F407 the algorithm needs about 0.22 ms to execute, while on the ATSAME70 it needs 0.40 ms, which is far too long. What am I missing here?
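As a rough cross-check, the figures in this post can be turned into a per-block budget and a cycle count. Assuming both cores really run at their nominal clocks, 0.22 ms at 168 MHz is about 37,000 core cycles, while 0.40 ms at 300 MHz is 120,000 cycles, i.e. roughly 3.2 times as many cycles for the same work. A ratio like that points at stalls (memory wait states, cache misses, floating-point emulation) rather than at raw clock speed. A small C sketch of the arithmetic; the function names are made up for illustration:

```c
#include <stdio.h>

/* Time budget per DMA block in milliseconds: frames per block / sample rate. */
static double block_budget_ms(int frames_per_block, double sample_rate_hz)
{
    return 1000.0 * frames_per_block / sample_rate_hz;
}

/* Core clock cycles spent in an ISR of the given measured duration,
 * rounded to the nearest cycle. Only meaningful if core_clock_hz is the
 * clock the CPU really runs at. */
static long isr_cycles(double isr_ms, double core_clock_hz)
{
    return (long)(isr_ms * 1e-3 * core_clock_hz + 0.5);
}

/* Share of the per-block budget used by the ISR, in percent. */
static double isr_load_percent(double isr_ms, double budget_ms)
{
    return 100.0 * isr_ms / budget_ms;
}

/* Prints the comparison for the two boards described above. */
static void print_comparison(void)
{
    double budget = block_budget_ms(32, 44100.0);
    printf("budget per block: %.3f ms\n", budget);
    printf("STM32F407 @ 168 MHz: %ld cycles, %.0f%% load\n",
           isr_cycles(0.22, 168e6), isr_load_percent(0.22, budget));
    printf("ATSAME70  @ 300 MHz: %ld cycles, %.0f%% load\n",
           isr_cycles(0.40, 300e6), isr_load_percent(0.40, budget));
}
```

With these inputs the ISR uses about 30 % of the budget on the F407 and about 55 % on the E70.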

Please give me any clue you can think of, and thanks a lot in advance!
Michael

muragavino wrote:

In the ISR a reverberation algorithm is called. The entire source code uses only intern memory and is optimized at "O3" optimisation level on both MCUs.

Does the algorithm use floating-point math? If so, does the project use the hardware FPU or software emulation? Check the -mfpu=<value> flag for both the compiler and the linker.
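One way to see which floating-point configuration a build actually gets is to test the compiler's predefined macros. The sketch below assumes GCC: `__SOFTFP__` is defined for `-mfloat-abi=soft`, and the ACLE macro `__ARM_FP` is a bitmask of the precisions the FPU handles in hardware. Typical flags are `-mcpu=cortex-m4 -mfpu=fpv4-sp-d16` on the STM32F407 and `-mcpu=cortex-m7 -mfpu=fpv5-d16` (or `fpv5-sp-d16`) on a Cortex-M7, each with `-mfloat-abi=hard` or `softfp`; treat these as a starting point to compare against what each IDE actually passes. `fp_config` is a hypothetical helper name.

```c
/* Refuse to build an ARM image in which every float operation would be
 * emulated in software. */
#if defined(__arm__) && defined(__SOFTFP__)
#error "Soft-float build: check -mfpu and -mfloat-abi for both compiler and linker"
#endif

/* Describes the hardware floating-point support the compiler was told about. */
static const char *fp_config(void)
{
#if defined(__ARM_FP)
    /* ACLE bitmask: 0x4 = single precision, 0x8 = double precision in hardware. */
    if (__ARM_FP & 0x8)
        return "hardware FPU: single and double precision";
    if (__ARM_FP & 0x4)
        return "hardware FPU: single precision only";
    return "hardware FPU: no single/double precision support reported";
#else
    return "no __ARM_FP: soft-float ARM build or non-ARM host";
#endif
}
```

Logging `fp_config()` once at startup on each board, or just keeping the `#error` guard, makes a silently soft-float build obvious.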

Does the project take advantage of the high-speed Tightly Coupled Memory, specifically ITCM for code execution? Or is the code run from internal flash memory, possibly with less than ideal wait states?
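With GCC, a hot function is usually moved into ITCM by giving it its own section, having the linker script place that section in ITCM (with its load image in flash), and letting the startup code copy it over before use. A minimal sketch, assuming a hypothetical `.itcm_text` output section that your linker script and startup code would have to provide (the name does not come from any particular SDK) and that TCM has been enabled on the part; `mix_to_mono` is only a stand-in for the real hot path. On a host build the macro expands to nothing so the function can be tested.

```c
#include <stdint.h>

#if defined(__arm__)
/* Hypothetical section name: the linker script must place ".itcm_text" in ITCM
 * (load address in flash) and the startup code must copy it there before use. */
#define ITCM_FUNC __attribute__((section(".itcm_text"), noinline))
#else
/* Host build: there is no TCM, so leave the function in normal .text. */
#define ITCM_FUNC
#endif

/* Stand-in for a hot-path routine: average each stereo frame down to mono. */
ITCM_FUNC void mix_to_mono(const int16_t *stereo, int16_t *mono, int frames)
{
    for (int i = 0; i < frames; i++) {
        int32_t sum = (int32_t)stereo[2 * i] + stereo[2 * i + 1];
        mono[i] = (int16_t)(sum / 2);
    }
}
```

The same pattern, with a data section mapped to DTCM, can be used for delay lines and other buffers the ISR touches on every sample.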

Last Edited: Tue. Feb 20, 2018 - 10:14 PM

Not much information here:

Are you using ASF 3 or 4? Not using ASF.

How about posting a small program that is just the reverberation algorithm with some test data?
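A stand-alone kernel along those lines might look like this: a single feedback comb filter (a common reverb building block) run over one 32-frame block with an impulse as test data. It is a hypothetical stand-in, not the original poster's algorithm; the delay length and feedback gain are arbitrary. Timing the same kernel on both boards separates core and memory effects from the I2S/DMA side. Every constant and variable in the inner loop is `float`, so nothing is silently promoted to double.

```c
#include <stddef.h>

#define FRAMES_PER_BLOCK 32 /* matches the 32-frame DMA blocks in the first post */
#define COMB_DELAY 8        /* hypothetical delay length, in samples */

/* Feedback comb filter state: a circular delay line plus a feedback gain. */
typedef struct {
    float line[COMB_DELAY]; /* circular delay line */
    size_t pos;             /* current read/write index */
    float feedback;         /* gain applied to the recirculated signal */
} comb_t;

static void comb_init(comb_t *c, float feedback)
{
    for (size_t i = 0; i < COMB_DELAY; i++)
        c->line[i] = 0.0f;
    c->pos = 0;
    c->feedback = feedback;
}

/* Processes one block in place: y[n] = x[n - D] + g * y[n - D]. */
static void comb_process(comb_t *c, float *block, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float out = c->line[c->pos];                    /* sample written D steps ago */
        c->line[c->pos] = block[i] + c->feedback * out; /* input plus recirculation */
        block[i] = out;
        c->pos = (c->pos + 1) % COMB_DELAY;
    }
}
```

Fed an impulse, the output shows echoes at samples 8, 16 and 24 with amplitudes 1, 0.5 and 0.25 for a feedback of 0.5.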

If everything is executed out of flash and you are not using TCM and/or the I&D caches ... then yeah ... the E70 can be slower ... just have a look at the block diagram in the datasheet ...

Normal flash accesses go through AXIM -> AXI Bridge -> Bus Matrix -> Flash ... more stages to fetch code ... generally speaking, avoid flash accesses.

The Bus Matrix only runs at 150 MHz ... not 300 MHz ... don't forget that.

Then, concerning data transfer ... there are ways to influence the performance of the XDMAC, e.g. memory burst size, same XDMAC port, ...

I already noticed such behaviour when I ported some code from a different processor to the E70 ... but if you are using the I&D caches and/or TCM ... yup ... then stuff gets really fast.
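A minimal sketch of turning the caches on with the CMSIS-Core functions `SCB_EnableICache()` and `SCB_EnableDCache()`. Once the D-cache is on, DMA buffers in cacheable RAM need maintenance: invalidate the receive buffer before the CPU reads what the DMA wrote, and clean the transmit buffer before the DMA reads it, using buffers aligned to the 32-byte Cortex-M7 cache line. The device header name `same70.h`, the ISR body `audio_block_ready` and `process_block` are assumptions for illustration; on a host build the CMSIS calls are replaced by counting stubs so the sketch can be exercised.

```c
#include <stdint.h>

#define FRAMES_PER_BLOCK 32
#define CACHE_LINE 32 /* Cortex-M7 D-cache line size in bytes */

#if defined(__arm__)
#include "same70.h" /* hypothetical device header; must pull in CMSIS core_cm7.h */
#else
/* Host-only stubs with the CMSIS names; they just record that they ran. */
static int icache_enabled, dcache_enabled, invalidate_calls, clean_calls;
static void SCB_EnableICache(void) { icache_enabled = 1; }
static void SCB_EnableDCache(void) { dcache_enabled = 1; }
static void SCB_InvalidateDCache_by_Addr(uint32_t *addr, int32_t dsize)
{ (void)addr; (void)dsize; invalidate_calls++; }
static void SCB_CleanDCache_by_Addr(uint32_t *addr, int32_t dsize)
{ (void)addr; (void)dsize; clean_calls++; }
#endif

/* 32 stereo 16-bit frames = 128 bytes = 4 cache lines; the alignment keeps
 * cache maintenance from touching neighbouring variables. */
static int16_t rx_block[FRAMES_PER_BLOCK * 2] __attribute__((aligned(CACHE_LINE)));
static int16_t tx_block[FRAMES_PER_BLOCK * 2] __attribute__((aligned(CACHE_LINE)));

/* Stand-in for the real reverb: copies input to output. */
static void process_block(const int16_t *in, int16_t *out, int frames)
{
    for (int i = 0; i < frames * 2; i++)
        out[i] = in[i];
}

/* Call once at startup, before the DMA is started. */
void caches_init(void)
{
    SCB_EnableICache();
    SCB_EnableDCache();
}

/* Hypothetical body of the DMA-complete ISR. */
void audio_block_ready(void)
{
    /* Discard stale cached lines so the CPU sees what the DMA just wrote. */
    SCB_InvalidateDCache_by_Addr((uint32_t *)rx_block, (int32_t)sizeof rx_block);
    process_block(rx_block, tx_block, FRAMES_PER_BLOCK);
    /* Write the CPU's results back to RAM before the DMA reads them. */
    SCB_CleanDCache_by_Addr((uint32_t *)tx_block, (int32_t)sizeof tx_block);
}
```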

If you use floating point at all ...

Maybe you are using double-precision floating-point math (64-bit double) on the SAME70.

The Cortex-M4 has only single precision (32-bit), which is faster.

On the M7, single precision is possible too, with the correct compiler and linker settings (which I don't remember right now).
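A frequent culprit is an unsuffixed constant. A small C sketch of the difference, using C11 `_Generic` to report the type the compiler assigns to each expression under default GCC settings: `x * 0.7` is computed in double, while `x * 0.7f` stays in float. GCC's `-Wdouble-promotion` warning flags implicit promotions like the first one. The macro and function names are made up for illustration.

```c
/* C11 _Generic: 1 if the expression has type float, 0 otherwise.
 * The expression itself is not evaluated. */
#define IS_FLOAT_EXPR(e) _Generic((e), float: 1, default: 0)

/* 0.7 is a double constant: x is promoted to double, multiplied in double
 * precision (software-emulated on an FPU without double support), then
 * narrowed back to float. */
static float scale_double_const(float x) { return x * 0.7; }

/* 0.7f is a float constant: the whole expression stays in single precision. */
static float scale_float_const(float x) { return x * 0.7f; }
```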

  • I made sure the ATSAME70 runs actually at 300 MHz by programming Timer0 to output a 1 kHz signal, which is correct.

Surely, with the usual Atmel clocking-system complexity, just because the timer output implies 300 MHz doesn't mean that the CPU is also running at 300 MHz...
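To make that concrete: on the SAM E70 the processor clock and the master clock that feeds the peripherals come out of separate dividers in the PMC. The sketch below is a simplified model under that assumption (HCLK = source / PRES, MCK = HCLK / MDIV; verify against the PMC chapter of the datasheet). Two settings give the same 150 MHz MCK, and therefore the same timer output, while the CPU runs at 300 MHz in one and at 150 MHz in the other. Type and field names are illustrative.

```c
#include <stdint.h>

/* Simplified clock path (assumption, see the datasheet's PMC chapter). */
typedef struct {
    uint32_t source_hz; /* selected source, e.g. the PLLA output */
    uint32_t pres;      /* processor clock prescaler */
    uint32_t mdiv;      /* master clock divider */
} clock_cfg_t;

/* Processor (CPU) clock. */
static uint32_t hclk_hz(const clock_cfg_t *c) { return c->source_hz / c->pres; }

/* Master clock that the peripherals, including the timers, are derived from. */
static uint32_t mck_hz(const clock_cfg_t *c) { return hclk_hz(c) / c->mdiv; }
```

Because both configurations produce the same MCK, a correct 1 kHz timer output cannot tell them apart; counting CPU cycles against a known time interval can.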

But I also suspect issues with either the cache or the floating-point settings. If you're using single-precision floating point, there's a "-fsingle-precision-constant" gcc option that you need to prevent expressions involving compile-time constants from being evaluated as doubles; it's easy for me to imagine that being set somewhere in an STM IDE, but not in the Atmel IDE...