Skip to content

Exception dump update#10654

Open
jsarha wants to merge 6 commits intothesofproject:mainfrom
jsarha:exception_dump_update
Open

Exception dump update#10654
jsarha wants to merge 6 commits intothesofproject:mainfrom
jsarha:exception_dump_update

Conversation

@jsarha
Copy link
Contributor

@jsarha jsarha commented Mar 25, 2026

The first commit is mandatory to make the debug_stream exception hooks working after the latest changes to recently merged zephyrproject-rtos/zephyr#103449 .

The second one gives an option to limit the number CPUs supported by the debug_stream. As the number of supported cores shrink, the available circular buffer size increases. This means increased bandwidth. This is particularly useful, when debugging a problem on core 0.

The three following connect assert_print() to debug_stream.

The last one adds the new options, CONFIG_SOF_DEBUG_STREAM_SLOT_FORCE_MAX_CPUS commented out, and CONFIG_SOF_DEBUG_STREAM_TEXT_MSG_ASSERT_PRINT enabled, to the debug_stream_overlay.conf.

There were some changes to the Zephyr side exception dumpping sopport
in the review phase of the PR. This commint aligns to them.

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Copilot AI review requested due to automatic review settings March 25, 2026 22:32
@jsarha jsarha force-pushed the exception_dump_update branch from 0f53d1e to 6bcdd13 Compare March 25, 2026 22:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the SOF debug_stream text-message path to keep exception dump hooks working after upstream Zephyr changes, adds configurability for how many CPU sections the debug-stream slot is partitioned into, and optionally routes assert_print() output through debug_stream.

Changes:

  • Add ds_vamsg() (va_list-based helper) and refactor ds_msg() to use it; update exception dump hook registration API.
  • Add CONFIG_SOF_DEBUG_STREAM_SLOT_FORCE_MAX_CPUS to limit per-core slot partitioning and grow per-core circular buffer size.
  • Add optional assert_print() forwarding to debug_stream and enable related settings in the debug overlay.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/include/user/debug_stream_text_msg.h Adds ds_vamsg() declaration and includes <stdarg.h> for va_list.
src/debug/debug_stream/debug_stream_text_msg.c Refactors text message sending; updates exception dump hook API; adds assert_print() routing to debug_stream.
src/debug/debug_stream/debug_stream_slot.c Uses configurable max CPU sections; adds guard for cores beyond configured sections; recomputes header/section sizing.
src/debug/debug_stream/Kconfig Introduces SOF_DEBUG_STREAM_SLOT_FORCE_MAX_CPUS and SOF_DEBUG_STREAM_TEXT_MSG_ASSERT_PRINT options.
app/debug_stream_overlay.conf Documents/enables new debug_stream options for the overlay configuration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lyakh
lyakh previously requested changes Mar 26, 2026
Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's clarify the reports_sent_cpu puzzle

circular buffer size so much that it limit debugging. With
this option its possible to use fewer sections. The downside
is that the cpus above the number of sections can not send
any debug stream messages.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done at run-time? If yes, you could use a CPU-mask here instead. Where static arrays are allocated, allocate CONFIG_CORE_COUNT of them, but at run-time count bits in the mask and divide the window in as many sections?

Copy link
Contributor Author

@jsarha jsarha Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is possible, but the amount of refactoring needed is IMO not at all proportional to the benefit. The original design principle was to initialize everything at boot time, and then be a totally passive and light weight system to pass debug information from DSP to the host. Nor is this feature enabled by default, there is not even debug slots available for it unless some other features are turned off, which makes runtime configuration quite useless. One anyway needs to rebuild the FW to use this feature.

Run time reconfiguration would require almost complete rewrite of both the FW side implementation and the host side decoding and invention of new APIs on how to change the configuration at runtime.

The CPU mask style stuff would be slightly more feasible, but I do not see any significant benefit over this low hanging fruit. In 90% of cases one wants to debug core 0 incidents, and the remaining 10% fall to three first cores (ATM our topologies do not utilize more than 3 cores). So most of the time setting this value to 1 or 3 is enough. And if one wants to test some multicore case with max debug_stream bandwidth he/she can make a custom topology for the case to fit the event to as low number of cores as possible.

/* Do not print assert after exception has been dumped */
if (reports_sent_cpu[arch_proc_id()] > 0)
return;
/* For unknown reason the above does not work, and there is assert prints after dump */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't read very encouraging :-) Can we find out?

Copy link
Contributor Author

@jsarha jsarha Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already spent several hour trying to find out. I intent to go back back and try something new when I have fresh ideas, but things I have tried so far are extra locks, volatile declarations, and uncached addresses (which should not be needed as all cores are separated). So for the moment I am out of ideas.

If you wish, I can drop the assert support all together, but I really need to get back to user space IPC stuff.

edit:
Now also tried atomic_val_t and atomic_set() and atomic_get(), but still the value of reports_sent remains zero, even after I see the exception report being dumped. I am welcome to any new suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it. PR updated.

Jyri Sarha added 5 commits March 26, 2026 18:33
…struct

Replace separate per-CPU arrays (ds_buf[], reports_sent_cpu[],
ds_pos[]) with a single struct ds_cpu_state table aligned to
CONFIG_DCACHE_LINE_SIZE to avoid false sharing between cores.

Also remove the reports_sent reset in the flush path, so that once an
exception dump has been sent, the sent-flag stays set and suppresses
further output.

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
The CONFIG_SOF_DEBUG_STREAM_SLOT_FORCE_MAX_CPUS option limit the
number of debug stream slot section to lower number than the actual
number of cores.  In some situations a high number of cpu sections
shrinks the circular buffer size so much that it limit debugging. With
this option its possible to use fewer sections. The downside is that
the cpus above the number of sections can not send any debug stream
messages.

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Add ds_vamsg(const char *format, va_list ap) to the debug stream
text message API and implementation.

Refactor ds_msg() to forward to ds_vamsg() so formatting and record
emission are handled in one shared path.

This enables callers that already operate on va_list to emit debug
stream text messages without rebuilding variadic arguments, and keeps the
message construction logic centralized.

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Add optional support to route assert output through debug stream text
messages.

Introduce Kconfig option SOF_DEBUG_STREAM_TEXT_MSG_ASSERT_PRINT
(depends on EXCEPTION_DUMP_HOOK and ASSERT/ASSERT_VERBOSE).

Implement assert_print(const char *fmt, ...) to emit via ds_vamsg(),
and mirror to vprintk() when CONFIG_EXCEPTION_DUMP_HOOK_ONLY is not
set.

Avoid duplicate post-fault prints by skipping assert output after an
exception report has already been sent on the current CPU.

Export assert_print for use by assert paths.

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
There is two new options, CONFIG_SOF_DEBUG_STREAM_SLOT_FORCE_MAX_CPUS
and CONFIG_SOF_DEBUG_STREAM_TEXT_MSG_ASSERT_PRINT.

CONFIG_SOF_DEBUG_STREAM_SLOT_FORCE_MAX_CPUS limits the amount of
debug_stream support to a lower number of cores than what is
available. As the number of supported cores shrink, the available
circular buffer size increases. This means increased bandwidth. This
is particularly useful, when debugging a problem on core 0. By defaut
this option is commented out.

CONFIG_SOF_DEBUG_STREAM_TEXT_MSG_ASSERT_PRINT adds optional support to
route assert output through debug stream text messages. This option
obeys CONFIG_EXCEPTION_DUMP_HOOK_ONLY. If it is selected the assert
print is sent only to debug stream. Without it the assert prints are
printed with vprintk too,

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
@jsarha jsarha force-pushed the exception_dump_update branch from 6bcdd13 to ae17802 Compare March 26, 2026 16:53
@lyakh lyakh dismissed their stale review March 27, 2026 07:37

comment addressed

} __packed buf;
int reports_sent;
size_t pos;
} __aligned(CONFIG_DCACHE_LINE_SIZE) ds_cpu[CONFIG_MP_MAX_NUM_CPUS];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is placed into .bss so it isn't cached, so no cache-line alignment should be needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants