SAME70 USB + CACHE

#1

Hi,

I have a problem using USB on the SAME70 Xplained board.

If I compile the SAME70_CDC_EXAMPLE as it stands, the USB connects.

If I enable the caches using BOARD_ENABLE_CACHE, the PC reports that the USB device has malfunctioned.

Any ideas?

Thanks

#2

If DMA is used, the cache lines in question need to be invalidated. Don't ask me how you do this on an E70.

#3

I'm not sure whether DMA is being used, as I am using the Atmel-supplied ASF component to handle the USB.

I would have expected the cache invalidation etc. to be handled in the library.

I have the same problem in my application, but as it also happens in an unmodified example, I don't think it is caused by anything I have done.

However, there may be something I can do about it if I can work out exactly what.

Thanks

#4

Hi there,

I have the same problem: CONF_BOARD_ENABLE_CACHE is needed to make, e.g., the SD/MMC driver work, but it stops the USB (the MSC service in my case) from working properly. And of course I need both together.

I filed a bug report about it (http://asf.atmel.com/bugzilla/sh...) and was told the problem is "in the backlog of ASF list", whatever that means.

Has anybody found out how to solve the issue?

Thanks!

#5

I am not an expert on the D-cache, but I think using DMA and the D-cache together is quite tricky and needs well-written drivers. I made the same observation: the ASF USB MSC service does not work with the D-cache enabled.

As a workaround, you can enable just the I-cache by calling SCB_EnableICache() in your code. This alone gives a significant performance boost, and it worked for me with USB MSC.

The SD card driver also worked for me with all caches disabled; I cannot see why it would need CONF_BOARD_ENABLE_CACHE to work properly.

Apart from this: a header file was missing a #define for the E70 to use high-speed USB. This was in AS7 build 790...

#6

Hi adjsw,

Thanks for your reply. I tried your workaround, but unfortunately it doesn't do the trick for me.

Have you used the SD card driver over SPI or over MCI?

I use MCI, which definitely needs both the I- and D-cache to be enabled.

Also, I have no idea how to set up the SPI driver on the SAME70 Xplained board, since the SD card slot pins are not connected to any SPI port. Or is there something I am missing?

In any case, thank you very much!

#7

Hi,

I was using MCI both on the DVK and on a custom design, with no problems without caches. I also cannot see why the caches would be necessary; they are basically just a performance boost to hide main-memory latency.

Are you also on a SAME70 platform?

#8

Hi!

Regarding the SAME70 and the SAME70 Xplained board: Atmel looked into the bug I filed and set up a temporary solution that allows the SD card (MCI) to run without cache (exactly what was not possible before).

The details can be found here: http://asf.atmel.com/bugzilla/sh...

Thanks to everybody for the help!

Regarding the remark "I was using MCI both with DVK and also with a custom design. No problem without Caches.": I guess there is further development under way to implement the (SAME70?) cache in ASF, and the problem arose only recently. As stated in the bug report, Atmel is working on enabling the cache for USB as well. Since I can live without the cache, this fix is fine for me.

Last Edited: Thu. Sep 8, 2016 - 12:37 PM
#9

Hi,

We are still affected by this issue and wanted to ask whether there has been any progress on it. I even tried the USB stack from the Atmel Software Package 1.5 instead of the one in Atmel Studio, but the issue persists: I can't enumerate a USB device when operating as a host.

Marko

#10

Yes, this bug (http://asf.atmel.com/bugzilla/show_bug.cgi?id=3747) still seems to be valid... sad.

I also tried the USB Host Vendor example adapted for the SAME70-XPLD board, and it didn't work when CONF_BOARD_ENABLE_CACHE is defined.

In \src\ASF\sam\drivers\usbhs\usbhs_host.c the example uses CONF_BOARD_ENABLE_CACHE_AT_INIT (note the different name), but defining it doesn't help: both RX and TX are affected.

For example, on TX I tried one or both of the following (immediately after preparing the TX buffer, main_vendor_buf_out), but it didn't work:

_dcache_flush(main_vendor_buf_out, len /* tried even with full length 1024 */ );
_dcache_flush((void *)USBHS_RAM_ADDR, UHD_PIPE_MAX_TRANS); // 0xA0100000,0x8000=32KB

The same on RX, with no luck using either of these:

_dcache_invalidate(main_vendor_buf_in, in_payload_trans /* tried even with full length 1024, but the same */ );
_dcache_invalidate((void *)USBHS_RAM_ADDR, UHD_PIPE_MAX_TRANS); // 0xA0100000,0x8000=32KB

The first workaround was to flush the whole D-cache: SCB_CleanDCache() before each USB TX, and SCB_CleanInvalidateDCache() (clean + invalidate) after each USB RX.

But this defeats the purpose of the D-cache (cleaning the entire cache when I just want to send 49 bytes...).

The second workaround was to place the RX/TX buffers in a non-cacheable area:

//! Output buffer for vendor class test
#if defined MPU_HAS_NOCACHE_REGION
COMPILER_SECTION(".ram_usbnocache")
#else
COMPILER_ALIGNED(32)
#endif // MPU_HAS_NOCACHE_REGION
static uint16_t main_vendor_buf_out[MAIN_VENDOR_LOOPBACK_SIZE];

The ram_usbnocache area is defined in src\ASF\sam\utils\linker_scripts\same70\same70q21\gcc\flash.ld:

MEMORY
{
  rom (rx)  : ORIGIN = 0x00400000, LENGTH = 0x00200000
  ram (rwx) : ORIGIN = 0x20400000, LENGTH = 0x0005f000
  /* For ram_usbnocache, see MPU_HAS_NOCACHE_REGION and NOCACHE_SRAM_REGION_SIZE */
  ram_usbnocache (rwx) : ORIGIN = 0x2045f000, LENGTH = 0x00001000
}


/* Section Definitions */
SECTIONS
{
    .text :
    {
        ...
    } > rom

    /* heap section */
    .heap (NOLOAD):
    {
        ...
    } > ram

    /* For ram_usbnocache, see MPU_HAS_NOCACHE_REGION and NOCACHE_SRAM_REGION_SIZE */
    .ram_usbnocache (NOLOAD) :
    {
        . = ALIGN(32);
        *(.ram_usbnocache);
    } > ram_usbnocache 

    . = ALIGN(4);
    _end = . ;
    _ram_end_ = ORIGIN(ram) + LENGTH(ram) -1 ;
}

Make sure conf_board.h contains:

#define CONF_BOARD_ENABLE_CACHE
#define MPU_HAS_NOCACHE_REGION
#define CONF_BOARD_CONFIG_MPU_AT_INIT

Also add this to src\ASF\sam\drivers\mpu\mpu.h (it is used in src\ASF\sam\boards\same70_xplained\init.c):

#define INNER_OUTER_NORMAL_NOCACHE_TYPE(x)  ((0x01 << MPU_RASR_TEX_Pos) | (DISABLE << MPU_RASR_C_Pos) | (DISABLE << MPU_RASR_B_Pos) | (x << MPU_RASR_S_Pos))

And in src\ASF\sam\boards\same70_xplained\init.c, replace CONF_BOARD_ENABLE_CACHE_AT_INIT with CONF_BOARD_ENABLE_CACHE.

With these changes the D-cache is enabled, but USB uses a non-cacheable buffer and works reliably!

I tried all of the above with the (not so) old ASF 3.38 example. I also tried the "USB Host Vendor" example from the new ASF4 (aka Atmel START: http://start.atmel.com), but there is no D-cache handling there at all (searching for SCB_Invalidate or SCB_Clean finds nothing except their definitions in \CMSIS\Include\core_cm7.h). This complete avoidance of the cache, as well as the silence on the Bugzilla entry mentioned above, could indicate some known but not yet public hardware trouble in the SAME70 (at least revision A)?

At least I couldn't find anything in the errata related to USB/DMA/D-cache... I ordered a few free samples of E70 revision B; if they arrive, our HW engineer will solder one onto the evaluation board, but I doubt it will change anything.

Daniel

#11

Try this (just copied it from my init.c):

#ifdef CONF_BOARD_ENABLE_CACHE
	/* Briefly enable both caches, then disable them again. */
	SCB_EnableICache();
	SCB_EnableDCache();
	/* We use multiple DMA masters, so the data cache is kept disabled to avoid
	 * constant cleaning (before) and invalidation (after) of DMA transfers.
	 * See Atmel AT17417: Usage of XDMAC on SAM S/SAM E/SAM V [APPLICATION NOTE], p. 29.
	 *
	 * However, if the cache is not enabled momentarily, the XDMAC does nothing
	 * at all with SPI, so it is enabled briefly and then disabled again. */
	SCB_DisableDCache();
	SCB_DisableICache();
#endif

For whatever reason, I found the system far more stable from a DMA perspective when I enable and then disable the cache, rather than just leaving it disabled or doing nothing at all.

I have a working setup with a SAME70Q21 where the DMA memory is located in an external SDRAM chip: SPI (in slave mode) clocks data into that memory through the XDMAC, and the SSC controller (also in slave mode) clocks data out of that pool in parallel, also using the XDMAC. The throughput is 2.5 Mbit/s in real time.

The only case where it fails is when I do excessive ioport access at the same time; then the SPI DMA transfer starts skipping bits. Not sure yet why this is happening.

Regards,

Andrus.

#12

Has there been any update on this issue with rev. B silicon?

#13

Yes, I believe I have this kind of problem. I have found that if I do a memset on external RAM while the cache is enabled and USB is active, I get an exception error pretty fast. I don't see it when my USB stack is not running, or when I do the memsets to internal RAM rather than external. I wrote a safeMemset that does the sets in a for loop, and that worked much better. I got it to work very reliably when I did __ISB(); __DSB(); __DMB(); after every move. I don't know for sure whether my scenario is just similar, nor whether it is a silicon bug in the logic that drives external RAM accesses or something related to the cache. I can test external memory fine for hours if I just do byte-by-byte accesses.

Eric

#14

Hi DaPa1,

Thanks for this reply. I have exactly the same problem, so I'm keen to try your workaround. However, my compiler does not recognise:

COMPILER_SECTION(".ram_usbnocache")

What is COMPILER_SECTION? I don't seem to have it in compiler.h... Is it equivalent to COMPILER_PRAGMA?

(Atmel Studio, ASF 3.38, standard Arm/GNU compiler)

Thanks

Last Edited: Fri. May 18, 2018 - 10:10 AM
#15

People,

I have had similar problems using an E70 part on our custom board. I found crashes if the USB stack was active while doing memsets, and turning off caching seemed to help. I've found workarounds that get the USB working: I hard-coded the memsets rather than calling memset directly, which helped a lot. Also, liberal use of cache flushing at various points seems to have gotten me out of trouble. I've had to make a few changes to some of the Atmel driver code along the way.

I have just noticed two of our recently made boards crash not long after calling SCB_EnableICache(); they lock up in an exception vector well before the USB stack starts. I'm just going to not use those two boards, but I'd feel better knowing what is going wrong. Apart from these strange problems I have to beat back now and then, I have got much of the processor working fine, including many IO pins, USB, SPI, I2C, USARTs, external memory, bootloader, sleep mode, etc. My net throughput over USB COM ports is probably less than a tenth of the theoretical maximum, but that could be on the Windows driver side. I use the Studio 6.2 IDE and drivers.

Eric

#16

Hi everybody,

I use ASF 3.39 in the composite MSD + CDC configuration on a custom board. I ran into the same problem with the cache enabled and therefore disabled it. I reach about 18 MByte/s for the MSD. For the CDC I don't know where the limit is, because I never tried more than 115200 baud, which is enough for my application. Even without the cache the throughput is quite OK as far as I can tell. However, it would be nice to use the features of the E70 if possible. During the summer I don't have much time and try to avoid big software changes. I guess I will try once more in the winter, and I still hope that ASF gets an update to fix the problem.

Best Regards

Markus

#17

Markus, I've spent a while trying to maximize the CDC COM port USB output on the E70; I pretty much took out the UART pacing (I don't fully get why they had that in the example code). Using a connection to Tera Term, I can sustain outbound traffic of over 200,000 characters per second. That is vastly slower than the theoretical 480 Mbit/s of high-speed USB. I suspect most of the slowdown is on the Microsoft driver side, which I cannot address. Luckily it is fast enough for our application.

A few of our new boards seemed to crash at different places during our main() initialization. I noticed that all the Atmel example projects call SCB_EnableICache() very early, but I got my randomly crashing boards to work fine simply by moving that call until after all my initialization was complete. I recommend that anyone with projects using a lot of the V71 capabilities turn on the instruction cache only AFTER all the various sub-engines have been set up. I noticed the last V71 manual update was in 2016; I fear the pilot for that chip long since jumped out of the airplane with a parachute. Perhaps only people working on million-unit designs would get support at this stage?

best wishes, Eric Krieg    erickrieg at gmail

#18

laid wrote:

(Quote of the init.c snippet posted above, which briefly enables the I- and D-caches and then disables them again.)

Man, I don't know how to thank you.

You're a lifesaver.

You have no idea how much your comment helped me.

Thanks a lot <3

#19

Hi muragavino,

Were you able to fix the issue? I'm facing the same thing in 2019. The link you provided has already expired.

#20

Unfortunately not. I just saw that in the latest ASF3 version the cache seems to be enabled, and I did a short test on the Xplained evaluation board, where it works. I didn't integrate that into my project because too many dependencies also changed in the latest ASF3 version compared to the one I use.

Best Regards

Markus

#21

I had issues with the SAME70 XDMAC as well, just copying from memory to memory. It did not work; the DMA transfer did not even start until I added the "enable and disable" sequence you described. What I don't understand is why this is needed to start the DMA transfer. If I hadn't found this with the help of my best friend Google, I don't think I would ever have got it working.

#22
Total votes: 0

Following up on my post from a year ago: at least for a USB device with 2 CDC + 1 MSD, with the MSD mapped via the HSMCI interface to a µSD card, the cache now works. Just make sure you have the latest xdmac.h/xdmac.c and the latest CMSIS files. Additionally, you need to set two defines:

#define CONF_BOARD_ENABLE_CACHE
#define CONF_BOARD_ENABLE_CACHE_AT_INIT

Best Regards

Markus

#23

Dear Markus, from your post I see you have a working USB device with the CDC + MSC classes, with MSC mapped to a microSD card over the MCI interface (4-bit SDIO?). Is that right? Do you use FatFs?

I have yet to reconfigure the Atmel USB composite CDC + MSC example to use microSD instead of the RAM-based memory storage.

Would it be possible to get sample code (configuration and initialization snippets) from you for reconfiguring that example to use a FatFs µSD card? That would be a dream come true... So much time could be saved!

Sincerely

Vlad.

#24

Hi Vlad,

I just sent you a private message to arrange the project exchange. This is necessary because the project is too big for an attachment here.

Best Regards

Markus

#25

Apart from USB, I found that enabling the cache also prevents the GMAC and CAN example code from running correctly.

As mentioned in one of the earlier posts, I tried enabling only the I-cache and not the D-cache, and this seems to solve the problem.

But it would be good to have it working with the D-cache enabled as well, to maximize system performance.

I'm using revision A silicon and ASF 3.38, which is the "latest" for this chip; I'm not sure whether the problems have been solved in revision B chips and later ASF versions.

#26

Any example code can (sooner or later, inevitably) break because of the inherent nature of cache memory. I see that most developers here are gambling with code, trying to "fix a problem with the cache", while there is no problem with the cache per se, just incorrect use of it (even if it comes from an Atmel example). Using the cache is a truly complex feature because of the coherency it requires with RAM. Automating this in the library for every use case would be extremely complicated (if it is even possible); coherency can only be ensured in the user code.

To understand this, please read Atmel app note AT12874, section 5.5.4 "Cache coherency".

Here is a quote from it:

Enabling cache may cause breakdown when:

• Memory locations are updated by other agents in the system

• Memory updates made by the application code must be made visible to other agents in the system

For example, in a system with a DMA that reads memory locations held in the data cache of a processor, a breakdown of coherency occurs when the processor has written new data in the data cache, but the DMA reads the old data held in memory.
In situations where a breakdown in coherency occurs, the software must manage the caches by using cache maintenance operations.

The Clean, Invalidate, and Clean-and-Invalidate operations can address these issues.
Taking the DCache as an example, these operations are realized in several functions such as:

static inline void SCB_InvalidateDCache(void);
static inline void SCB_CleanDCache(void);
static inline void SCB_CleanInvalidateDCache(void);

Personally, I prefer to speed up the application by optimizing the synchronization of all threads, interrupts and DMA, rather than analyzing the implications of every event for cache coherency.

Note: for those who think you can't get your app up to speed without the cache, I confirm that Markus's code running composite USB with 2 CDCs + MSC class, which he graciously shared, does indeed run fine with the cache enabled. Nevertheless, I will still disable the cache, as the project is too large for me to keep track of every case that affects cache coherency... too prone to bugs, too much labor, not worth it...

#27

Why did it post the same thing a second time? I did something wrong...

Last Edited: Fri. Jul 26, 2019 - 06:04 PM
#28

Surely cache coherency is only an issue when you are using DMA. There can't be so many instances of that as to make it hard to control?

#29

Regarding "There can’t be too many instances of this that would make it hard to control?":

Most of the peripheral interfaces use DMA. In my app, for example, DMA write/read is used in all ADC conversions (not in a single group, with individual ISRs) and in multiple instances of 4 interfaces in two-way communication. That means DMA-write coherency control and DMA-read coherency control separately, each case needing custom snippets (see chapter 4, "Using Cache Maintenance APIs to Handle Cache Coherency", in "Managing Cache Coherency on Cortex-M7 Based MCUs").

Add to that a state-event machine (something similar to an RTOS) setting up callbacks... I don't see it as trivial...

There is a somewhat easier method (I think; maybe I am wrong): disabling the cache only for MPU-protected memory regions as a shortcut... but this means allocating buffers in MPU regions, which also needs testing and debugging, so I'll pass for now...

#30

Bird-man, the documents you referred to give a clear picture of the caching process and APIs; ensuring coherency when multiple peripherals use DMA is clearly quite complex, and I will try this (or disabling the cache only in MPU-protected memory regions) at some later point in my design cycle.

Apart from DMA transfers, what "other agents" could cause a coherency issue in the D-cache? Maybe different processes running under an RTOS and sharing the same data?

Also, what conditions (if any) could cause a similar problem in the I-cache? So far I have left it enabled and the code appears stable, but I'd like to know what pitfalls to watch out for.
 

#31

The I-cache should not be of any concern to a developer, because it only holds the instruction stream, which the Cortex core takes care of over the internal bus. But the D-cache can get out of sync with RAM whenever a DMA read or DMA write occurs during data exchange with peripherals.

And there are lots of such cases: we mainly choose an MCU rather than a CPU such as a Cortex-A because our app is supposed to control many peripherals... otherwise we'd use a CPU and build our app on top of an OS (Linux, for example), and the OS would take care of cache synchronization and many other pitfalls we would not need to bother with...

With an MCU there are many possible DMA-read and DMA-write incoherence cases we have to deal with if we use DMA together with the cache...

You can study and incorporate the snippets shown in the document I linked above, or (I am fairly sure this is the simplest approach with regard to D-cache safety) allocate your peripheral data buffers in MPU-protected regions. But you'd have to watch those areas in the Memory window when debugging, to make sure you configured them properly... this is also explained in the document above.

Unfortunately, I could not find a description of the programming APIs for the MPU, interrupts and DMA from a general point of view in the ASF4 programming manual... that document is structured only from the point of view of individual peripherals, which does not give a complete picture of how to use those facilities...

For example, the general picture of using the interrupt-related APIs in the ASF4 library is not completely clear to me, even though examples for particular peripherals are given... and the same goes for the MPU...

#32

I have to correct myself regarding my complaint above about the lack of Atmel documentation on the APIs for the MPU, interrupts, cache, etc...

Atmel is right to focus on the peripheral-only view of these facilities, because that is their implementation. The general view is well described in ARM's online documentation; see the Arm Cortex-M7 Devices Generic User Guide.

Chapter 4 covers all those topics well.

Here is the link to the cache maintenance operations and APIs:

https://developer.arm.com/docs/dui0646/latest/cortex-m7-peripherals/cache-maintenance-operations

Enjoy

#33

Thanks, that's a relief to know. I will keep the I-cache enabled and try out protected memory areas for peripheral transfers when the time is right.