SAM S70 SPI works under the debugger, fails in real-time execution


I need help from the group. I've been working on this problem for over a week now, and the solution eludes me.

SAMS70Q21 processor, 300/150 MHz, with an FRAM SPI part (Cypress FM25L16B) on SPI0. A memory test on the FRAM passes for all 2048 byte locations while executing under the debugger in Atmel Studio 7 (AS7). But if I run the code in real time, the memory test fails immediately. It also fails under the debugger if I step over a function that reads or writes the data buffer to or from the SPI part: after that the chip select line stops working, but if I step back through the code under the debugger, it starts to work again. The memory test is a do ... while() loop, so I can break at the bottom of the loop and begin single-stepping again. The other thing to note is that this code was imported from an AVR32 project where it works all the time. So it is not my code failing, especially since it works while stepping under the debugger. There must be something specific to the SAM S70 part that's causing this, but what?
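The post does not include the memory test itself. As a rough picture of the kind of do ... while() write/read-back test described above, here is a host-runnable sketch. `fram_write_byte()`, `fram_read_byte()` and the RAM array standing in for the FM25L16B are hypothetical; they are not Jerry's code or an ASF API.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAM_SIZE 2048u   /* FM25L16B: 16 Kbit = 2048 bytes */

/* Hypothetical single-byte accessors. On the target these would sit on top of
 * the SPI driver; here a RAM array stands in for the FRAM so the sketch runs. */
static uint8_t fake_fram[FRAM_SIZE];
void    fram_write_byte(uint16_t addr, uint8_t v) { fake_fram[addr] = v; }
uint8_t fram_read_byte(uint16_t addr)             { return fake_fram[addr]; }

/* Pattern that varies with the address, so a part that always returns the
 * same byte (or nothing at all) fails the test. */
static uint8_t pattern(uint16_t addr)
{
    return (uint8_t)(addr ^ (addr >> 8) ^ 0xA5u);
}

/* Write every location, then read every location back.
 * Returns the number of locations that did not match. */
size_t fram_memtest(void)
{
    size_t errors = 0;
    uint16_t addr = 0;

    do {
        fram_write_byte(addr, pattern(addr));
    } while (++addr < FRAM_SIZE);

    addr = 0;
    do {
        if (fram_read_byte(addr) != pattern(addr)) {
            errors++;
        }
    } while (++addr < FRAM_SIZE);

    return errors;
}
```

Writing the whole part before reading any of it back means a write that silently lands at the wrong address shows up as a mismatch somewhere else, instead of being read straight back from the same (wrong) place.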

 

Does anyone have any thoughts or a possible solution?

Thanks for your help.

Jerry

This topic has a solution.
Last Edited: Tue. Apr 17, 2018 - 08:08 PM

The one difference I can see here is the timing to the FRAM. Is single-stepping slowing the pin activity down enough that the FRAM can keep up?

Obviously the SPI clock speed is the same, so I would look at the time between setting up a transaction and the SPI handling it.

jeff

Last Edited: Mon. Apr 16, 2018 - 11:45 AM

a) If you are using the caches, you have to make sure the content is actually written into the transmit buffer / read from the receive buffer:

a1) use the cache clean and invalidate functions (clean before sending; invalidate after receiving, but before reading the RAM content)

a2) make proper use of the isb() and dsb() instructions when you write / read data from these buffers
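For a1) and a2), the order of operations around a DMA transfer can be sketched as below. This is an illustration under assumptions: the cache and barrier calls are logging stand-ins so the sketch runs on a host. On the target they would be the CMSIS `SCB_CleanDCache_by_Addr()`, `SCB_InvalidateDCache_by_Addr()` and `__DSB()` (recent CMSIS versions of these cache functions already issue DSB/ISB internally), and `dma_start()` / `dma_wait()` are hypothetical XDMAC helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The Cortex-M7 D-cache line is 32 bytes. Buffers shared with a DMA master
 * should start on a line boundary and cover whole lines, so cleaning or
 * invalidating them never touches unrelated variables. */
#define CACHE_LINE 32u
#define BUF_LEN    64u   /* multiple of CACHE_LINE */

_Alignas(CACHE_LINE) uint8_t tx_buf[BUF_LEN];
_Alignas(CACHE_LINE) uint8_t rx_buf[BUF_LEN];

/* Host stand-ins that only record the call order. On the target, replace
 * them with the CMSIS cache functions and __DSB(); dma_start()/dma_wait()
 * are hypothetical helpers for the XDMAC. */
char trace[128];
static void step(const char *s)                   { strcat(trace, s); strcat(trace, " "); }
static void cache_clean(void *p, uint32_t n)      { (void)p; (void)n; step("clean"); }
static void cache_invalidate(void *p, uint32_t n) { (void)p; (void)n; step("invalidate"); }
static void dsb(void)                             { step("dsb"); }
static void dma_start(void)                       { step("dma_start"); }
static void dma_wait(void)                        { step("dma_wait"); }

void dma_transfer(const uint8_t *src, uint8_t *dst, uint32_t n)
{
    assert(n <= BUF_LEN);
    memcpy(tx_buf, src, n);             /* CPU writes may still sit in the D-cache */
    cache_clean(tx_buf, BUF_LEN);       /* a1: push dirty TX lines out to SRAM     */
    cache_invalidate(rx_buf, BUF_LEN);  /* no dirty RX line may be evicted later   */
    dsb();                              /* a2: maintenance completes before DMA    */
    dma_start();
    dma_wait();
    cache_invalidate(rx_buf, BUF_LEN);  /* a1: discard stale RX copies             */
    dsb();
    memcpy(dst, rx_buf, n);             /* CPU now reads what the DMA wrote        */
}
```

Invalidating the receive buffer before the transfer as well as after it is a common precaution: it keeps a dirty cache line from being evicted on top of data the DMA has already written to SRAM.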

 

If you don't want to perform cache clean and invalidate operations, you can still:

a3) set up a non-cached MPU region and place your communication buffers there (note: you should create an explicit section for them in the linker script)

a4) make proper use of the isb() and dsb() instructions when you write / read data from these buffers

 

b) If you are using the XDMAC, you have to place the isb() and dsb() instructions properly to make sure the data is actually in the RAM you want to read / write.

 

c) If you are using buffers to store the send and receive data (but are accessing the SPI registers directly for send / receive), you should still use the dsb() and isb() instructions.

 

Why do I mention dsb and isb? See ARM DDI 0403E.b (ARMv7-M Architecture Reference Manual), section A3.7 "Memory access order", page A3-94.

 

dsb() and isb() also help against compiler optimization effects (e.g. a transmission being started before the data has actually been written into the internal RAM).

 

 

A little hint concerning the chip:

- the ARM core and the XDMAC are two independent bus masters that access the internal RAM (slave)

  - it can happen that the ARM core has written something into the internal RAM, but the XDMAC has not yet been informed of the change -> the XDMAC sends wrong data

  - it can happen that the XDMAC has written something into the internal RAM, but the ARM core has not yet been informed of the change -> the ARM core "sees" wrong data

- memory accesses can be reordered

Last Edited: Mon. Apr 16, 2018 - 02:05 PM

Hi Jeff,

Thanks for the reply. The FRAM part is very fast, both in access time and in its interface: it is capable of running at 20 MHz, and I've tried 5 MHz as well as 15 MHz with no change in operation or symptoms. I've also added extra setup and release times for the chip select logic on the SPI port, with no change to the problem. So I don't believe this is the cause. I'll keep looking for the solution and will post the answer if I find it.
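For reference, the clock divider and chip-select setup delay mentioned above map to fields in the SPI chip-select register. The helpers below convert a maximum SPCK frequency and a setup time into register counts, using the SAM E70/S70 datasheet formulas as we understand them (SPCK = f_periph / SCBR, setup delay = DLYBS / f_periph). The function names are hypothetical; in ASF the resulting values would be passed to something like `spi_set_baudrate_div()` and `spi_set_transfer_delay()`, which we believe exist in the SAM SPI driver but should be checked against your ASF version.

```c
#include <stdint.h>

/* Datasheet formulas assumed here (verify against your datasheet):
 *   SPCK    = f_periph / SCBR      (SCBR = 1..255; 0 is not allowed)
 *   t_DLYBS = DLYBS / f_periph     (delay from chip select valid to first SPCK edge)
 */

/* Smallest SCBR whose SPCK does not exceed max_hz. Returns 0 if no valid
 * divider exists (the caller must treat 0 as an error). */
uint32_t spi_scbr_for(uint32_t f_periph_hz, uint32_t max_hz)
{
    if (max_hz == 0u) {
        return 0u;
    }
    uint32_t scbr = (uint32_t)(((uint64_t)f_periph_hz + max_hz - 1u) / max_hz); /* round up */
    return (scbr >= 1u && scbr <= 255u) ? scbr : 0u;
}

/* DLYBS count giving at least delay_ns of setup time, clamped to the 8-bit field. */
uint32_t spi_dlybs_for(uint32_t f_periph_hz, uint32_t delay_ns)
{
    uint64_t cycles = ((uint64_t)f_periph_hz * delay_ns + 999999999u) / 1000000000u;
    return cycles > 255u ? 255u : (uint32_t)cycles;
}
```

At a 150 MHz peripheral clock, asking for at most 20 MHz gives SCBR = 8, i.e. an actual SPCK of 18.75 MHz, and a 100 ns setup time needs DLYBS = 15.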

 

Jerry


E70tryhard, thanks for your reply. First of all, I'm not using either the I or D cache at this point, nor am I using the XDMAC; it is straight polled I/O of the data. Some additional info: I'm using ASF 3.38.0 for the SPI and SPI_master code. I have certainly found problems with the Atmel libraries in the past, so with this in mind I have looked through the SPI code more times than I can count, and I do not find any case where Atmel included the ISB or DSB functions within the SPI routines. Maybe this is a mistake on their part?

I did run into the problem you describe, where instructions can be executed out of order, when I interfaced my 5" TFT display to the EBI bus. With the debug build the display showed the graphics properly, but with optimized code nothing appeared on the display at all. I added the DSB/ISB functions and the display now always works. So yes, I'm familiar with this problem. But I have not gone down this road yet with the SPI code, mostly because the absence of DSB or ISB calls in the Atmel library suggests it is not a problem there. That may well be a mistake on my part. Maybe the library code is too generalized, covering a lot of CPUs, and the need to add ISB/DSB calls is specific to this CPU. I could believe this.

With regard to your step a3), are you referring to setting up "Memory Regions", or is this something specific that must be done in the linker script? Do I need to change the SPI register interface to "Strongly Ordered"? I'm a little fuzzy in this area right now.

 

I will review your suggestions and appreciate your thoughts.

Jerry


Ah OK, you already know about the issue with dsb and isb.

 

My point with a3 is: if you are using the I and D caches, you have to worry even more about data actually being written to / read from the correct memory. Therefore it can be easier to group all your communication variables and buffers into an explicit section by adding the section attribute (so that the compiler knows where to place the variables). Consequently, you have to create the corresponding section in the linker script. For the Cortex-M7 you then also have to set up the MPU and add that memory region to it, with the correct attributes so that the region is not cached.

 

Concerning ASF: well... no idea, I never actually used it, just sometimes to look up some small details.

 

 

Did you already try changing the optimizer to -O0, i.e. no optimization? Then you can at least rule out side effects from the compiler.

Last Edited: Tue. Apr 17, 2018 - 04:59 AM

That's another interesting thing about this problem. The ONLY way the code works properly is while executing under the debugger, single-stepping the code. Running it in real time, whether with the debug build or at any optimization level, fails.

 

It sounds like I'm going to have to set up some memory regions. Boy, I was trying to avoid that; it's an area I just don't have any experience with.

This reply has been marked as the solution. 

OK, found the problem and have a solution!

The problem turned out to be the use of the "spi_write_single()" function in my code. Writing to the FRAM requires a 3-byte sequence to prepare for the actual data write: <write cmd>, <addr high byte>, <addr low byte>. If you do this using "spi_write_single()", you will overrun the Tx buffer in the SPI channel. The SPI port can have a write operation underway with another byte pending in the holding register, so using 3 "spi_write_single()" operations one right after the other will fail on the 3rd write, because the SPI Tx is busy. It took a logic analyzer to see this happening.
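The overrun can be reproduced with a toy model of a transmitter that has one holding register (TDR) feeding one shift register. This is a host-side simulation, not the SAM S70 register interface, and it assumes that writing TDR while it is still full overwrites the pending byte. `put_unchecked()` stands for the "spi_write_single()" behavior described above, and `put_checked()` for a write that polls the TX-empty flag (TDRE) first; both names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy transmitter: one holding register (TDR) feeding one shift register.
 * Assumption: writing TDR while it is still full overwrites the pending byte. */
typedef struct {
    bool    shifting;   /* shift register is clocking a byte out    */
    bool    tdr_full;   /* TDR holds a byte, so TDRE would read 0    */
    uint8_t tdr;
    uint8_t wire[16];   /* bytes that actually appeared on MOSI      */
    size_t  sent;       /* total bytes shifted out                   */
} spi_model_t;

/* The hardware moves TDR into the shifter as soon as the shifter is idle. */
static void load_shifter(spi_model_t *s)
{
    if (!s->shifting && s->tdr_full) {
        if (s->sent < sizeof s->wire) {
            s->wire[s->sent] = s->tdr;
        }
        s->sent++;
        s->tdr_full = false;
        s->shifting = true;
    }
}

/* One byte time passes: the shifter finishes and may pick up TDR. */
static void byte_time(spi_model_t *s)
{
    s->shifting = false;
    load_shifter(s);
}

/* Write without checking TDRE (the spi_write_single() case in the thread). */
void put_unchecked(spi_model_t *s, uint8_t b)
{
    s->tdr = b;          /* silently replaces a byte still waiting in TDR */
    s->tdr_full = true;
    load_shifter(s);
}

/* Poll TDRE before writing (the spi_write_packet() case in the thread). */
void put_checked(spi_model_t *s, uint8_t b)
{
    while (s->tdr_full) {
        byte_time(s);    /* busy-wait until the holding register is empty */
    }
    put_unchecked(s, b);
}

/* Let the transmitter run until everything queued has gone out. */
void drain(spi_model_t *s)
{
    while (s->tdr_full || s->shifting) {
        byte_time(s);
    }
}
```

Sending WRITE (0x02) and address 0x0123 with three unchecked writes puts only 0x02, 0x23 on the wire in this model: the first byte is already shifting, the second waits in TDR, and the third overwrites it, so the FRAM never receives the address high byte.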

 

The solution is to use the "spi_write_packet()" function instead, with a 3-byte buffer that holds the FRAM write command, address high and address low bytes. This works because "spi_write_packet()" always checks that the SPI Tx buffer is empty before sending another byte. The "spi_write_single()" function just sends bytes and does NOT check whether the Tx is busy. So use "spi_write_single()" with caution, because it will bite you on multi-byte write operations that are sequential by software design. If you think you need "spi_write_single()" to send more than 2 bytes in rapid succession, use the "spi_write()" function instead; it checks the Tx buffer condition as needed to ensure a Tx overrun does not occur. Please note that all of this applies to anyone using the SPI functions in the ASF libraries. I happen to be using ASF 3.38.0 currently, but I'm sure it applies to the other versions as well.
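A sketch of the fix as described: send the WRITE opcode and both address bytes as one 3-byte packet. The opcodes (WREN 0x06, WRITE 0x02), the 11-bit address, and the separate WREN cycle before each write come from our reading of the FM25L16B datasheet, not from the thread, so check them against the datasheet. `fram_bus_t` and its hooks are hypothetical; on the target they would wrap ASF's `spi_select_device()`, `spi_write_packet()` and `spi_deselect_device()`, which we believe are the SAM spi_master calls.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAM_OP_WREN   0x06u   /* set write-enable latch             */
#define FRAM_OP_WRITE  0x02u   /* write memory                       */
#define FRAM_SIZE      2048u   /* 16 Kbit part, 11-bit byte address  */

/* Hypothetical transport hooks; write_packet must wait for TX-empty
 * before each byte, as the thread says spi_write_packet() does. */
typedef struct {
    void (*select)(void);
    void (*deselect)(void);
    void (*write_packet)(const uint8_t *data, size_t len);
} fram_bus_t;

/* WRITE opcode plus address high/low, built as a single 3-byte packet. */
void fram_write_header(uint16_t addr, uint8_t hdr[3])
{
    hdr[0] = FRAM_OP_WRITE;
    hdr[1] = (uint8_t)((addr >> 8) & 0x07u);  /* only 3 address bits used here */
    hdr[2] = (uint8_t)(addr & 0xFFu);
}

/* Returns 0 on success, -1 if the range does not fit in the part. */
int fram_write(const fram_bus_t *bus, uint16_t addr, const uint8_t *data, size_t len)
{
    static const uint8_t wren = FRAM_OP_WREN;
    uint8_t hdr[3];

    if ((size_t)addr + len > FRAM_SIZE) {
        return -1;
    }

    /* WREN goes in its own chip-select cycle; the latch is cleared
     * again when a write ends, so it is needed before every write. */
    bus->select();
    bus->write_packet(&wren, 1);
    bus->deselect();

    fram_write_header(addr, hdr);
    bus->select();
    bus->write_packet(hdr, sizeof hdr);  /* header sent with TX-empty checks */
    bus->write_packet(data, len);        /* then the payload, same CS cycle  */
    bus->deselect();
    return 0;
}
```

Keeping the header in a buffer means every byte goes through the same TX-empty check as the payload, which is exactly what the three back-to-back single-byte writes were missing.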

 

With regard to the ARM Cortex-M7 memory barrier issue: this solution does not require any memory barrier operations. I can now run the program in real time with the debug build, and under all optimization settings as well, so no memory barriers are needed at this time.

 

Thanks to all that responded to my problem.

Jerry