Conversation
Recent commits 7d0621e ("fast-get: fix partition leak for multi-thread usage") and 3c757dd ("fast-get: fix crash by freeing buffer before removing partition") introduced a regression when using SRC in DP mode in userspace on PTL and running two aplay -r 44100 back to back. The reason is that SRC isn't calling mod_fast_put() explicitly and is instead relying on automatic clean up, which then happens in a different thread. In fact those two commits weren't needed AFAICS, since fast_put() shouldn't be called directly from userspace, instead only mod_fast_put() should be called, which is a syscall, and therefore fast_put() then will have access to all the memory. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
There was a problem hiding this comment.
Pull request overview
Fixes a userspace regression in Zephyr’s fast-get cleanup path where memory-domain partition removal can occur from a different thread than the one that originally fast-get’d the buffer (observed with SRC in DP mode and back-to-back aplay -r 44100).
Changes:
- Change
fast_put()userspace partition removal to use theentry->threadmemory domain instead ofk_current_get(). - Update debug logging to reference
entry->threadinstead of the current thread.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| struct k_mem_domain *domain = entry->thread->mem_domain_info.mem_domain; | ||
|
|
||
| LOG_DBG("removing partition %p size %#zx from thread %p", | ||
| (void *)part.start, part.size, k_current_get()); | ||
| (void *)part.start, part.size, entry->thread); | ||
| int err = k_mem_domain_remove_partition(domain, &part); |
There was a problem hiding this comment.
fast_get() can grant additional access by adding the partition to k_current_get() when it differs from entry->thread (see the additional-access path). With this change, fast_put() always removes the partition from entry->thread’s domain, so a thread that received additional access will never remove its own partition (k_mem_domain_remove_partition() will likely fail and the partition will leak). Consider removing from the calling thread’s domain when it contains the partition, and only falling back to entry->thread (or track/remove partitions per granted thread) so both multi-thread use and cross-thread cleanup work.
Recent commits 7d0621e ("fast-get: fix partition leak for multi-thread usage") and 3c757dd ("fast-get: fix crash by freeing buffer before removing partition") introduced a regression when using SRC in DP mode in userspace on PTL and running two aplay -r 44100
back to back. The reason is that SRC isn't calling mod_fast_put() explicitly and is instead relying on automatic clean up, which then happens in a different thread.
In fact those two commits weren't needed AFAICS, since fast_put() shouldn't be called directly from userspace, instead only mod_fast_put() should be called, which is a syscall, and therefore fast_put() then will have access to all the memory.