Another potential XDMAC cache coherency issue


Hi,

 

It seems I'm up against yet another cache coherency issue on the SAME70, this time with a (relatively) simple UART RX transfer operation.

 

To summarise the setup and method:

  • The SAME70 is receiving UART data of an unknown length.
  • While the total length of the UART data is unknown, it's guaranteed to be longer than 128 bytes (this is important for below).
  • Rather than just use an interrupt-based method to process each byte, I'm using an XDMAC RX transfer for the first 128 bytes of the UART message. Using the End of Block interrupt, I disable the XDMAC RX channel once the 128th byte is received, and enable the UART's RXRDY interrupt.
  • The remaining bytes of the UART message (bytes 129 through n) are handled one by one in the UART0_Handler() function.
  • When the end of the UART message is identified (the message contains a set of End of Sequence characters), the RXRDY interrupt is disabled, and the XDMAC RX channel is re-enabled to process the first 128 bytes of the next UART message.

 

The above process works perfectly for the very first UART message received. However, for each subsequent UART message received, the XDMAC receive buffer (rx_buffer[128]) contains the data from the first UART message. I.e. the XDMAC RX receive buffer isn't being updated. As such, I assumed this was a cache coherency issue, but I've followed all the required cache maintenance operations, per the various Atmel and Cortex M7 documents. I've included some trimmed down, simplified code snippets to hopefully illustrate the above.

 

XDMAC Defines:

#define UART0_RX_DMA_BLOCK_LEN          1
#define UART0_RX_DMA_MICROBLOCK_LEN     128
#define UART0_RX_DMA_XDMAC_CH           4
#define UART0_BUFFER_SIZE               128

XDMAC RX receive buffer (note that it's aligned as required):

__attribute__ ((aligned (32))) uint8_t rx_buffer[UART0_BUFFER_SIZE] = {0xFF};
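On the alignment point: the Cortex-M7 data cache works in 32-byte lines, and the by-address maintenance calls operate on whole lines. If a DMA buffer shares a line with other variables, invalidating it can discard pending writes to those neighbours. A minimal sketch of a start-up check follows; `is_cache_line_safe`, `check_dma_buffers` and `demo_rx_buffer` are hypothetical names, not part of CMSIS or ASF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 D-cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper: true when the buffer starts on a cache-line
 * boundary and spans a whole number of cache lines. */
bool is_cache_line_safe(const void *addr, size_t len)
{
    return ((uintptr_t)addr % DCACHE_LINE_SIZE == 0u) &&
           (len % DCACHE_LINE_SIZE == 0u);
}

/* Same declaration style as the post: 128 bytes, 32-byte aligned. */
__attribute__((aligned(32))) uint8_t demo_rx_buffer[128];

/* Call once at start-up to catch a mis-sized or mis-aligned buffer early. */
void check_dma_buffers(void)
{
    assert(is_cache_line_safe(demo_rx_buffer, sizeof demo_rx_buffer));
}
```

A 128-byte buffer aligned to 32 bytes passes; the same buffer offset by one byte, or sized at 100 bytes, would not.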

Miscellaneous class variables:

//UART0 Copy Buffer
uint8_t xdmacCopyBuffer[UART0_BUFFER_SIZE] = {0xFF};

//UART0 Interrupt Buffer
uint8_t interruptBuffer[UART0_BUFFER_SIZE] = {0xFF};

//Interrupt Buffer Index
uint8_t interruptBufferIndex = 0x00;

The xdmacCopyBuffer is used to process the received XDMAC data outside of the XDMAC RX buffer. I introduced it (as opposed to processing the data from the XDMAC rx_buffer[] directly) in the hope that it would address this potential cache coherency issue, but it hasn't. The idea is that I copy the XDMAC RX data into the xdmacCopyBuffer and then immediately invalidate and clean the rx_buffer[] cache region.

 

The interruptBuffer is a secondary buffer to which each byte of UART data received in UART0_Handler() is appended. Once all data is received, the complete message is formed from a combination of the xdmacCopyBuffer and interruptBuffer for processing.

 

The UART0 interrupt handler:

void UART0_Handler(void)
{
    //Check UART0 status
    if (UART0->UART_SR & UART_SR_RXRDY) {
        
        //Append the received byte to interruptBuffer
        interruptBuffer[interruptBufferIndex] = UART0->UART_RHR;
        
        //Increment interruptBufferIndex
        interruptBufferIndex++;
        
        //Perform a check to see whether the End of Sequence characters have been received
        //If yes, interruptBufferIndex is reset to 0x00, and the processData function is called
    }
}
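The end-of-sequence check elided from the handler above could look something like the sketch below. The terminator is an assumption: the post does not say what the End of Sequence characters are, so `"\r\n"` stands in. The names `eos_push_byte`, `eos_reset`, `tail_buffer` and `tail_index` are hypothetical. The sketch also guards the index, since an unchecked index into a 128-byte buffer would run past the end on a long message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAIL_BUFFER_SIZE 128u

/* Assumed terminator: the thread does not name the real sequence. */
static const uint8_t EOS[] = { '\r', '\n' };
#define EOS_LEN (sizeof EOS)

static uint8_t tail_buffer[TAIL_BUFFER_SIZE];
static size_t  tail_index = 0;

/* Store one received byte and return true once the stored data ends
 * with EOS. Bytes arriving after the buffer is full are dropped. */
bool eos_push_byte(uint8_t byte)
{
    assert(tail_index <= TAIL_BUFFER_SIZE); /* index never passes the end */

    if (tail_index >= TAIL_BUFFER_SIZE) {
        return false;                       /* full: drop, report no match */
    }
    tail_buffer[tail_index++] = byte;

    if (tail_index < EOS_LEN) {
        return false;                       /* too short to hold EOS yet */
    }
    return memcmp(&tail_buffer[tail_index - EOS_LEN], EOS, EOS_LEN) == 0;
}

/* Reset the index ready for the next message. */
void eos_reset(void)
{
    tail_index = 0;
}
```

In the handler, the byte read from UART_RHR would be passed to `eos_push_byte()`, and a `true` result would trigger the reset and the call to processData(). A full buffer without a terminator would need its own handling, which is not shown here.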

The XDMAC interrupt handler:

void XDMAC_Handler(void)
{
    //Check the XDMAC status
    if (XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CIS & XDMAC_CIS_BIS) {
        
        //Disable the XDMAC channel
        XDMAC->XDMAC_GD = (XDMAC_GD_DI0 << UART0_RX_DMA_XDMAC_CH);
        
        //Enable the UART0 NVIC interrupt (NOTE: The RXRDY interrupt is already configured in the setup function, all that's required is to enable the NVIC interrupt)
        NVIC_EnableIRQ(UART0_IRQn);
        
        //Invalidate the D-Cache before accessing the rx_buffer data
        SCB_InvalidateDCache_by_Addr((uint32_t)*rx_buffer, UART0_BUFFER_SIZE);
        
        //Copy the XDMAC rx_buffer data to xdmacCopyBuffer
        memcpy(xdmacCopyBuffer, &rx_buffer[0], UART0_BUFFER_SIZE);
    }
}

The setup function:

void setup(void)
{
    //Setup the UART0 interface
    const sam_uart_opt_t uart0Options = {
        .ul_mck = sysclk_get_cpu_hz(),
        .ul_baudrate = CONF_UART0_BAUDRATE * 2,
        .ul_mode = UART_MR_PAR_NO
    };
    
    //Configure the UART0 peripheral
    pio_set_peripheral(PIOA, PIO_PERIPH_A, PIO_PA9A_URXD0 | PIO_PA10A_UTXD0);
    
    //Enable UART0 clock
    pmc_enable_periph_clk(ID_UART0);
    
    //Enable XDMAC clock
    pmc_enable_periph_clk(ID_XDMAC);
    
    //Initialise UART0
    uart_init(UART0, &uart0Options);
    
    //Enable UART0 RXRDY interrupt
    uart_enable_interrupt(UART0, UART_IER_RXRDY);
    
    //Setup the UART0 RXRDY NVIC interrupt, but don't enable it yet
    NVIC_ClearPendingIRQ(UART0_IRQn);
    NVIC_SetPendingIRQ(UART0_IRQn);
    NVIC_SetPriority(UART0_IRQn, 4);
    
    //Set XDMAC microblock
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CUBC = UART0_RX_DMA_MICROBLOCK_LEN;
    
    //Set the XDMAC source address to UART0 Receive Holding Register
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CSA = (uint32_t)&UART0->UART_RHR;
    
    //Set XDMAC destination address to rx_buffer
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CDA = (uint32_t)&rx_buffer[0];
    
    //Set the XDMAC configuration register
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CC =
        XDMAC_CC_TYPE_PER_TRAN |
        XDMAC_CC_MBSIZE_SINGLE |
        XDMAC_CC_DSYNC_PER2MEM |
        XDMAC_CC_SWREQ_HWR_CONNECTED |
        XDMAC_CC_MEMSET_NORMAL_MODE |
        XDMAC_CC_CSIZE_CHK_1 |
        XDMAC_CC_DWIDTH_BYTE |
        XDMAC_CC_SIF_AHB_IF1 |
        XDMAC_CC_DIF_AHB_IF0 |
        XDMAC_CC_SAM_FIXED_AM |
        XDMAC_CC_DAM_INCREMENTED_AM |
        XDMAC_CC_PERID(XDMAC_CHANNEL_HWID_UART0_RX);

    //Clear the block control and stride registers
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CBC = 0x00;
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CDS_MSP = 0x00;
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CSUS = 0x00;
    XDMAC->XDMAC_CHID[UART0_RX_DMA_XDMAC_CH].XDMAC_CDUS = 0x00;

    //Disable and clear the XDMAC NVIC interrupt
    NVIC_DisableIRQ(XDMAC_IRQn);
    NVIC_ClearPendingIRQ(XDMAC_IRQn);

    //Clean the D-Cache
    SCB_CleanDCache_by_Addr((uint32_t)*rx_buffer, UART0_BUFFER_SIZE);

    //Enable XDMAC channel interrupt
    xdmac_enable_interrupt(XDMAC, UART0_RX_DMA_XDMAC_CH);

    //Enable XDMAC channel End of Block interrupt
    xdmac_channel_enable_interrupt(XDMAC, UART0_RX_DMA_XDMAC_CH, XDMAC_CIE_BIE);

    //Prioritise and enable XDMAC NVIC interrupt
    NVIC_SetPriority(XDMAC_IRQn, 6);
    NVIC_EnableIRQ(XDMAC_IRQn);

    //Enable the XDMAC channel
    XDMAC->XDMAC_GE = (XDMAC_GE_EN0 << UART0_RX_DMA_XDMAC_CH);
}

The function which processes the complete UART message:

static void processData(void)
{
    //Disable the UART0 RXRDY interrupt
    NVIC_DisableIRQ(UART0_IRQn);
    
    //Clean D-Cache
    SCB_CleanDCache_by_Addr((uint32_t)*rx_buffer, UART0_BUFFER_SIZE);
    
    //Enable the XDMAC channel
    XDMAC->XDMAC_GE = (XDMAC_GE_EN0 << UART0_RX_DMA_XDMAC_CH);
    
    //At this point the processing of the received UART data takes place
    
    //At the end of processing, the copy and interrupt buffers are reset to 0x00
    memset(xdmacCopyBuffer, 0x00, UART0_BUFFER_SIZE);
    memset(interruptBuffer, 0x00, UART0_BUFFER_SIZE);
}

Does anything obvious jump out as to why the rx_buffer[] wouldn't be updated after receiving the first UART message? I've scoped the UART traces and can see updated data being sent as expected from the remote device, so I'm strongly leaning towards a cache coherency issue, as the data received into the UART0 RHR register should be correct.

 

Thanks!

This topic has a solution.
Last Edited: Fri. Jul 30, 2021 - 11:13 PM

Does the problem go away if you disable the data cache?

 

What happens if you fill rx_buffer[128] with 0xff (followed by SCB_CleanInvalidateDCache_by_Addr(), if the cache is still enabled) just after copying rx_buffer to xdmacCopyBuffer? Do you still see the 0xff values the next time around?

 

Steve

Maverick Embedded Technologies Ltd. Home of Maven and wAVR.

Maven: WiFi ARM Cortex-M Debugger/Programmer

wAVR: WiFi AVR ISP/PDI/uPDI Programmer

https://www.maverick-embedded.co...


scdoubleu wrote:

Does the problem go away if you disable the data cache?

 

What happens if you fill rx_buffer[128] with 0xff (followed by SCB_CleanInvalidateDCache_by_Addr(), if the cache is still enabled) just after copying rx_buffer to xdmacCopyBuffer? Do you still see the 0xff values the next time around?

 

Steve

 

Steve, thank you for your input, it's greatly appreciated.

 

I'm unable to disable the cache unfortunately, as it's used throughout the rest of the application. I could move this particular function to a non-cached memory region but I'd very much rather fix the underlying issue.

 

I've just tried what you've suggested; I memset rx_buffer with 0xFF immediately after copying the data to xdmacCopyBuffer, followed by the SCB_CleanInvalidateDCache_by_Addr(). The subsequent rx_buffer[] is then filled with 0xFF...

 

This is strange, as the XDMAC End of Block interrupt is still generated, which suggests that the UART_RHR data is being received and written somewhere, but it's not being written to the rx_buffer[].

 

The only thing I can think of at this point is that there's some step I'm missing when disabling and re-enabling the XDMAC channel. I have also tried the XDMAC channel suspend/resume approach with a manual FIFO flush, as this more closely matches (at least from my understanding) what I'm actually doing. However, after resuming the XDMAC channel, the End of Block interrupt isn't generated.

 

To this end, I've copied all the XDMAC configuration code from the setup() function to the XDMAC re-enable section in processData(), just in case that all needs to be set again prior to enabling the XDMAC channel in the Global Channel Enable (XDMAC_GE) register. The datasheet has the following to say regarding disabling an XDMAC channel:

 

Quote:

A disable channel request occurs when a write operation is performed in the XDMAC_GD register. If the channel is source peripheral synchronized (bit XDMAC_CCx.TYPE is set and bit XDMAC_CCx.DSYNC is cleared), then pending bytes (bytes located in the FIFO) are written to memory and bit XDMAC_CISx.DIS is set. If the channel is not source peripheral synchronized, the current channel transaction (read or write) is terminated and XDMAC_CISx.DIS is set. XDMAC_GS.STx is cleared by hardware when the current transfer is completed. The channel is no longer active and can be reused.

Last Edited: Fri. Jul 30, 2021 - 10:04 AM

Are you re-initialising XDMAC_CDA to rx_buffer before each transfer?

 

Steve



At the moment I've copied the register setting code from setup() to processData() immediately before re-enabling the channel, which includes reinitialising XDMAC_CDA to rx_buffer. I've tried with and without reinitialising CDA (as well as the other XDMAC configuration registers) without any impact on the rx_buffer data. I've just tried this again with the additional 0xFF memset() and using or omitting the CDA re-initialising also has no effect.

 

Just another bit of context: I have also tried using a simple view0 Linked List descriptor, thinking that it might change or affect the issue. It didn't, so I've removed the Linked List functionality, as it's not actually needed in this instance.

Last Edited: Fri. Jul 30, 2021 - 10:16 AM
This reply has been marked as the solution. 

I've fixed it :)

 

I changed all the cache maintenance function calls from this:

 

SCB_InvalidateDCache_by_Addr((uint32_t)*rx_buffer, UART0_BUFFER_SIZE);

to this:

 

SCB_InvalidateDCache_by_Addr((uint32_t)rx_buffer, UART0_BUFFER_SIZE);

Now the rx_buffer[] is updated with the data observed on the UART lines as expected.
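For anyone wondering why the one-character change matters: `*rx_buffer` dereferences the array, so the original call passed the value of the first byte (converted to an integer) as the address. The maintenance operation was then applied to whatever sits at that low address instead of to the buffer. A small host-side sketch of the difference follows; `demo_buffer`, `buggy_arg`, `fixed_arg` and `check_args` are hypothetical names, and `uintptr_t` replaces `uint32_t` so that it also builds on a 64-bit host.

```c
#include <assert.h>
#include <stdint.h>

/* Same declaration style as the post; {0xFF} sets only element 0. */
__attribute__((aligned(32))) uint8_t demo_buffer[128] = {0xFF};

/* What the original call passed: the VALUE stored in demo_buffer[0]. */
uintptr_t buggy_arg(void)
{
    return (uintptr_t)*demo_buffer;
}

/* What the corrected call passes: the ADDRESS of demo_buffer[0]. */
uintptr_t fixed_arg(void)
{
    return (uintptr_t)demo_buffer;
}

/* Self-check: the buggy argument is just the byte value 0xFF, while the
 * fixed argument is a 32-byte-aligned address, so they cannot coincide. */
void check_args(void)
{
    assert(buggy_arg() == 0xFFu);
    assert(fixed_arg() % 32u == 0u);
}
```

With the buffer's first element initialised to 0xFF, the original cast always evaluates to 0xFF before the first transfer, regardless of where the buffer actually lives.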

 

Just expanding on this fix in case someone else comes across this issue when trying to use an XDMAC RX channel in this manner (i.e. starting, stopping, re-starting). The below points are the minimum that need to be observed in order for a start, stop, re-start sequence to work:

  1. Invalidate the rx_buffer cache immediately after the XDMAC RX transfer is complete. This must be performed prior to accessing the rx_buffer data, otherwise you'll experience cache coherency issues (i.e. the data retrieved by the CPU from cache doesn't equal the data written by the peripheral into memory).
  2. Clean the rx_buffer cache immediately prior to re-starting/re-enabling the XDMAC RX channel.
  3. The only XDMAC configuration you need to do prior to re-enabling the XDMAC RX channel is resetting the CDA register to the rx_buffer (thanks to scdoubleu for the suggestion). If you don't reset the CDA register the rx_buffer data won't be updated. All the other XDMAC register configuration (as shown in my setup() function above) can be omitted.
Last Edited: Fri. Jul 30, 2021 - 11:33 PM

Thanks for the report! But is cleaning the rx_buffer cache really needed before restarting DMA reception?


hs2 wrote:

Thanks for the report! But is cleaning the rx_buffer cache really needed before restarting DMA reception?

 

Very good point! I've just done some testing again, this time omitting the cache clean operation and the XDMAC RX works without issue, so it looks like you really only have to invalidate the cache and reset the CDA register.
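Putting the thread's conclusion together, the restart path reduces to: invalidate after the transfer, then reset the destination address and re-enable. The sketch below is a host-side model, not driver code. `fake_channel_t`, `fake_dma_receive`, `restart_channel` and `cache_invalidate` are hypothetical stand-ins for the XDMAC channel registers and SCB_InvalidateDCache_by_Addr(). It assumes the destination address register advances as bytes are written, which the incrementing address mode and the behaviour reported above suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_LEN 128u

/* Model of the two pieces of channel state that matter for a restart. */
typedef struct {
    uint8_t *cda;      /* stand-in for XDMAC_CDA (destination address) */
    int      enabled;  /* stand-in for the channel enable/status bit   */
} fake_channel_t;

/* The RX buffer followed by guard space that exposes overruns. */
__attribute__((aligned(32))) static uint8_t memory[2 * RX_LEN];
static uint8_t *const rx_buffer = memory;

static int invalidate_calls = 0;

/* Stand-in for SCB_InvalidateDCache_by_Addr(); it only counts calls. */
void cache_invalidate(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    invalidate_calls++;
}

/* Model one 128-byte block: write at CDA, advance it, then disable the
 * channel and invalidate, as the End of Block handler does. */
void fake_dma_receive(fake_channel_t *ch, const uint8_t *src)
{
    if (!ch->enabled ||
        (size_t)(ch->cda - memory) + RX_LEN > sizeof memory) {
        return;                          /* idle, or no room left in model */
    }
    memcpy(ch->cda, src, RX_LEN);
    ch->cda += RX_LEN;                   /* assumed: CDA advances */
    ch->enabled = 0;
    cache_invalidate(rx_buffer, RX_LEN); /* before the CPU reads rx_buffer */
}

/* The restart the thread settled on: reset CDA, then re-enable. */
void restart_channel(fake_channel_t *ch)
{
    assert(!ch->enabled);                /* only restart a stopped channel */
    ch->cda = rx_buffer;
    ch->enabled = 1;
}
```

In this model, re-enabling the channel without calling `restart_channel()` leaves `rx_buffer` holding the first message while the second lands in the guard space, which matches the stale-buffer symptom described in the first post.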

 

Thanks!