Skip to content

Conversation

@minggangw
Copy link
Member

@minggangw minggangw commented Jan 15, 2026

Investigated and resolved an intermittent crash (SIGABRT / "pure virtual method called") occurring in the underlying FastDDS layer.

Crash log

✔ setContentFilter (2003ms)
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
Error: Process completed with exit code 134.

Callstack

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007cfa4744527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007cfa474288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007cfa478a5ff5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#6  0x00007cfa478bb0da in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#7  0x00007cfa478a5a55 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:58
#8  0x00007cfa478bbfc3 in __cxxabiv1::__cxa_pure_virtual () at ../../../../src/libstdc++-v3/libsupc++/pure.cc:50
#9  0x00007cf9f689e916 in eprosima::fastdds::dds::detail::DataReaderHistory::get_first_untaken_info (this=0x2d825968, info=...) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp:364
#10 0x00007cf9f688d9fb in eprosima::fastdds::dds::DataReaderImpl::get_first_untaken_info (this=<optimized out>, info=info@entry=0x7cf9e57f8d80) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/DataReaderImpl.cpp:793
#11 0x00007cf9f688a2dd in eprosima::fastdds::dds::DataReader::get_first_untaken_info (this=<optimized out>, info=info@entry=0x7cf9e57f8d80) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/DataReader.cpp:278
#12 0x00007cfa44644c46 in rmw_fastrtps_shared_cpp::__rmw_wait (identifier=<optimized out>, subscriptions=<optimized out>, guard_conditions=<optimized out>, services=0x7cf9dc000bc0, clients=0x7cf9dc000ba8, events=0x7cf9dc000bd8, wait_set=0x7cf9dc000f60, wait_timeout=0x7cf9e57f8ea0)
    at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_wait.cpp:235
#13 0x00007cfa446ac27f in rmw_wait (subscriptions=<optimized out>, guard_conditions=<optimized out>, services=<optimized out>, clients=<optimized out>, events=<optimized out>, wait_set=<optimized out>, wait_timeout=0x7cf9e57f8ea0)
    at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_wait.cpp:33
#14 0x00007cfa44adf742 in rcl_wait (wait_set=0x7cf9e57f9890, timeout=<optimized out>) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rcl/rcl/src/rcl/wait.c:670
#15 0x00007cfa44b54625 in rclnodejs::Executor::WaitForReadyCallbacks(rcl_wait_set_s*, int) () from /home/minggang/proj/rclnodejs/build/Release/rclnodejs.node
#16 0x00007cfa44b561aa in rclnodejs::Executor::Run(void*) () from /home/minggang/proj/rclnodejs/build/Release/rclnodejs.node
#17 0x00007cfa4749caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#18 0x00007cfa47529c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Root Cause Analysis:
The crash was caused by a race condition between the main JavaScript thread and the background worker thread:

  1. The node creates a background thread when node.spin() is called to run the rcl executor and wait for messages.
  2. Calling subscription.setContentFilter() from the main thread invokes rcl_subscription_set_content_filter, which modifies the underlying DDS DataReader.
  3. If the background thread is simultaneously accessing that same DataReader (e.g., in rcl_wait or checking for new data), it leads to memory corruption or accessing an object in an invalid state, resulting in the "pure virtual method called" termination.

Fix Implementation:
Updated tests, examples, and documentation to enforce a thread-safe pattern. The node must now explicitly stop the background executor before modifying the content filter, and restart it afterwards:

node.stop(); // Stop background thread
subscription.setContentFilter(filter); // Safe modification
node.spin(); // Restart background thread

Fix: #1368

Copilot AI review requested due to automatic review settings January 15, 2026 05:18
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a race condition that was causing crashes when setting content filters on subscriptions while the node is spinning. The fix involves stopping the subscriber node before modifying the content filter, then restarting it afterwards.

Changes:

  • Modified C++ error handling to capture error strings before calling cleanup operations
  • Updated test cases to stop spinning before calling setContentFilter() to prevent race conditions
  • Added stop() calls at the end of async test promises to clean up properly

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/rcl_subscription_bindings.cpp Fixed error handling by capturing error string before cleanup operations that may overwrite it
test/test-subscription-content-filter.js Added stop/spin pattern around setContentFilter() calls and cleanup stop() calls in async promises

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 286 to 290
std::string error_string;
if (ret != RCL_RET_OK) {
error_string = rcl_get_error_string().str;
rcl_reset_error();
}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable error_string is declared without initialization and is only assigned when ret != RCL_RET_OK. If fini_ret != RCL_RET_OK but ret == RCL_RET_OK (line 300-305), the code will reuse this uninitialized error_string. Consider initializing it to an empty string or using a separate variable for the fini_ret error.

Copilot uses AI. Check for mistakes.

const p2 = new Promise((resolve) =>
setTimeout(() => {
this.subscriberNode.stop();
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is stopping the node inside an async promise timeout without restarting it before the test ends. Unlike lines 299-301 and 372-374, this creates an inconsistent pattern where the node remains stopped. This could cause issues if the test framework expects proper cleanup. Consider whether the node should be left spinning or if this is intentional cleanup.

Copilot uses AI. Check for mistakes.

const p2 = new Promise((resolve) =>
setTimeout(() => {
this.subscriberNode.stop();
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous issue, this test stops the node inside an async promise timeout without restarting it before the test ends. This creates an inconsistent pattern with the stop/spin/stop pattern used around setContentFilter() calls (lines 299-301 and 372-374). Consider whether the node should be left spinning or if this is intentional cleanup.

Copilot uses AI. Check for mistakes.
@coveralls
Copy link

coveralls commented Jan 15, 2026

Coverage Status

coverage: 85.565% (+0.02%) from 85.544%
when pulling 1de4f4b on minggangw:fix-1368-1
into 94a1728 on RobotWebTools:develop.

@minggangw minggangw merged commit 7767051 into RobotWebTools:develop Jan 15, 2026
25 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix subscription content-filtering flakiness

2 participants