Firmware: Add an additional layer of DMA-assisted sample buffering#1601
Firmware: Add an additional layer of DMA-assisted sample buffering#1601martinling wants to merge 11 commits intogreatscottgadgets:mainfrom
Conversation
d863307 to
8c54161
Compare
be432d4 to
912d993
Compare
|
On my Linux test host, this improves maximum shortfall-free RX sample rate from 19.2 Msps to 21.8 Msps and improves TX from 20.1 Msps to 21.8 Msps. |
mossmann
left a comment
There was a problem hiding this comment.
I haven't spotted the bug, but I'm seeing short TX with hackrf_transfer -n truncated by a few milliseconds. Looks like the transmissions are about 9000 samples short.
|
Transmission duration does not change linearly with the number of samples. Testing with 32768 samples: 8.88 ms (expected 16.38 ms) |
|
I believe this is due to the current behaviour of When that function is used, the host will send 32KB of zeroes after the end of the provided TX data, before making the control request to switch transceiver mode back to idle. The theory behind that approach was that, since there's only 32KB of space to store samples, once you've sent that many zeroes and the device has accepted them, the earlier samples you care about must have been transmitted. With this PR, we double the buffer space, so the host would now need to send 64KB of zeroes to achieve the same. If you change Unfortunately, we hardcoded that figure rather than adding a vendor request to query it from the device. My recollection is that we discussed that decision verbally at the time, but since I'd already made some attempts to increase the buffer size without success, we thought that 32KB figure wasn't ever going to change and it wasn't worth adding this query. Our options are:
I'll implement the latter. |
|
Ensuring that all data in the USB buffer is moved to the sample buffer before stopping TX has fixed the non-linearity you were seeing, but I'm now seeing timing coming up short by what looks like a consistent 4.096ms, i.e. 8KB of samples or one DMA transfer. I'll see about tracking down the remaining bug. |
ffc5ad4 to
33bf9ad
Compare
|
Some experimentation with looking at ramped signals on a scope shows that the transmission includes the first 32KB preloaded into the sample buffer; is missing the next 8KB, which is the first to be transferred by DMA; then all subsequent samples are transmitted correctly. This is true from the memcpy version onwards, regardless of whether |
33bf9ad to
b1b2e25
Compare
|
Fixed that - still some issues with small sample counts. |
b1b2e25 to
10a9836
Compare
6e596a1 to
f78ad7f
Compare
f78ad7f to
10a7582
Compare
|
I've rebased this on Everything seems to be working as expected, so this is ready for review again. The forward/back compatibility logic is: Host: if API version < 1.10, assume buffer is 32KB. If API version >= 1.10, ask for buffer size. The rest of the flush logic is unchanged (send enough zeroes, then assume all samples were transmitted). Firmware: if the host didn’t ask for our buffer size, NAK any request to stop TX until all samples are transmitted. This should result in everything working optimally, except in the case of old host software with new firmware, in which case the device takes a little longer to leave TX. |
|
With this change I'm seeing RX hang when adjusting the sample rate mid-stream, for example with the flowgraph in #1629 (comment) |
483ad5f to
bba9a13
Compare
8bb59b7 to
8f22c35
Compare
|
This has been rebased on |
mossmann
left a comment
There was a problem hiding this comment.
I'm still able to induce hangs with a sample rate slider. I've seen this in both TX and RX, but so far I've only induced RX failure with a PortaPack installed. The TX failure also seems easier to reproduce with a PortaPack.
I tried merging #1689, hoping it would avoid PortaPack concurrency problems, but it didn't help.
|
Bummer. OK, I'll see if I can reproduce with a Portapack. In the meantime, putting this one back to draft for now. |
11b7d98 to
13d53ea
Compare
|
As discussed yesterday, I was able to reproduce the RX hang with a PortaPack attached, and examine the situation with a debugger. It turned out the hang was caused by a race condition in the following code: volatile uint32_t dma_pending;
// ...
void transceiver_start_dma(...) {
// ...
gpdma_channel_enable(DMA_CHANNEL);
dma_pending = size;
}
void dma_isr(void)
{
gpdma_channel_disable(DMA_CHANNEL);
GPDMA_INTTCCLEAR = (1 << DMA_CHANNEL);
m0_state.m4_count += dma_pending;
dma_pending = 0;
}Because The PortaPack made the bug easier to reproduce, because the USB ISR would update the baseband LPF cutoff on the LCD display. Once I realised this, I was able to reproduce the race more quickly by rapidly alternating between sample rates that caused different LPF settings to be chosen. This race has been fixed, and the branch rewritten with a cleaner implementation and history. I'm no longer able to reproduce the RX hang and am reopening this for review. |
|
I'm no longer able to reproduce the hang in RX, but I can still reproduce it quite easily in TX. |
|
OK, the TX hang must be a different bug then. I'll investigate. |
5908711 to
3da9326
Compare
|
I was able to reproduce the TX hang, and worked out that it was due to mistakes in the calculations of I've fixed that, and rebased the branch to resolve new conflicts with the merge of #1701. |
3da9326 to
dd937a0
Compare
|
Rebased on It occurs to me we could also apply this improvement to sweep mode. |
dd937a0 to
92ce978
Compare
This PR attempts to implement the buffer scheme described in #1363 (comment).
The M0 SGPIO code continues to use the existing 32KB sample buffer, which has special placement.
An additional 32KB USB buffer is placed in
ram_local1, above the.textsection.The GPDMA controller is used to transfer samples between the sample buffer and USB buffer.
The initial commits in this PR set up the new buffer scheme, whilst using
memcpyrunning synchronously on the M4 core as a placeholder for the DMA operations. This works correctly up to around 8Msps.The final commit switches to the DMA implementation.