Add error percentage to cpu and ram monitor #548

Open
sangteak601 wants to merge 6 commits into ros:ros2 from sangteak601:add_configuration

@sangteak601

At the moment, it is not possible to configure cpu_monitor or ram_monitor to publish an ERROR status. An ERROR state would let us take action before the program exhibits unexpected behaviour.

Also, cpu_monitor was updated to compare the threshold against the average CPU usage rather than the usage of each individual core. This aligns with how ram_monitor works and makes more sense, since in most cases users care more about average usage than per-core usage.
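To make the difference between the two comparison modes concrete, here is a minimal sketch in plain Python. The helper name `usage_for_threshold` and the sample readings are hypothetical and not taken from the PR; only the `use_average` switch name comes from the review summary further down.

```python
from statistics import mean


def usage_for_threshold(per_core_percent, use_average):
    """Return the single value that is compared against the thresholds.

    Hypothetical helper for illustration only, not the PR's code.
    """
    if not per_core_percent:
        raise ValueError('need at least one per-core reading')
    # Average over all cores, or the busiest single core.
    if use_average:
        return mean(per_core_percent)
    return max(per_core_percent)


readings = [20.0, 30.0, 96.0, 14.0]  # made-up per-core usage in percent
print(usage_for_threshold(readings, use_average=True))   # 40.0
print(usage_for_threshold(readings, use_average=False))  # 96.0
```

With a warning threshold of 90, these readings would stay OK in average mode but trip the threshold in max-core mode, which is the behavioural change being discussed.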

@mergify mergify Bot added the ros2 PR tackling a ROS2 branch label Dec 17, 2025
@sangteak601
Author

@ct2034
It seems I can't add a reviewer. Would you be able to take a look at this?

Comment thread diagnostic_common_diagnostics/diagnostic_common_diagnostics/cpu_monitor.py Outdated
Comment thread diagnostic_common_diagnostics/diagnostic_common_diagnostics/cpu_monitor.py Outdated
Comment thread diagnostic_common_diagnostics/README.md
Comment thread diagnostic_common_diagnostics/diagnostic_common_diagnostics/cpu_monitor.py Outdated
@ct2034 ct2034 added the needs more work Someone has worked on this but more work is needed label Apr 20, 2026
@ct2034 ct2034 added the enhancement This tackles a new feature of the code (and not a bug) label May 21, 2026
@ct2034
Collaborator

ct2034 commented May 21, 2026

@sangteak601 are you planning to work on this?

@sangteak601
Author

> @sangteak601 are you planning to work on this?

I believe all comments are addressed now. Can you take another look?


Copilot AI left a comment


Pull request overview

Adds error_percentage thresholds to cpu_monitor and ram_monitor so they can publish ERROR status, and adds a use_average switch to cpu_monitor to compare the threshold against the average CPU usage (consistent with the RAM monitor) instead of per-core usage. Documentation and the CPU system test are updated accordingly.

Changes:

  • CpuTask gains required error_percentage and use_average constructor args; status logic now selects max-core or average CPU, then evaluates OK / WARN / ERROR.
  • RamTask gains a required error_percentage constructor arg and emits ERROR when RAM average exceeds it.
  • README and the CPU system test updated; new ROS parameters (error_percentage default 95, use_average default False) declared in both monitors' main().
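The OK / WARN / ERROR evaluation summarized above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the code from the PR: the function name `classify` is hypothetical, the strict `>` comparison follows the README wording quoted in the review, checking the ERROR threshold first is an assumption, and the threshold values in the usage lines are examples only.

```python
# Numeric levels match diagnostic_msgs/DiagnosticStatus (OK=0, WARN=1, ERROR=2).
OK, WARN, ERROR = 0, 1, 2


def classify(usage, warning_percentage, error_percentage):
    """Map a usage value in percent to a status level.

    Hypothetical helper; assumes error_percentage >= warning_percentage.
    """
    # Check the more severe threshold first so ERROR wins over WARN.
    if usage > error_percentage:
        return ERROR
    if usage > warning_percentage:
        return WARN
    return OK


print(classify(50.0, warning_percentage=90, error_percentage=95))  # 0
print(classify(92.5, warning_percentage=90, error_percentage=95))  # 1
print(classify(99.0, warning_percentage=90, error_percentage=95))  # 2
```

Because the comparison is strict, a usage exactly equal to a threshold stays at the lower level in this sketch.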

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| diagnostic_common_diagnostics/diagnostic_common_diagnostics/cpu_monitor.py | Adds ERROR threshold and use_average switch, removes constructor defaults, declares new parameters. |
| diagnostic_common_diagnostics/diagnostic_common_diagnostics/ram_monitor.py | Adds ERROR threshold to RAM monitoring and declares the new parameter. |
| diagnostic_common_diagnostics/test/systemtest/test_cpu_monitor.py | Updates CpuTask constructor calls to match new required args; updates warn message assertion. |
| diagnostic_common_diagnostics/README.md | Documents new use_average and error_percentage args for CPU/RAM monitors. |
| diagnostic_common_diagnostics/diagnostic_common_diagnostics/param_decl.yaml | Empty file; no content change visible. |

* Name of the node is "cpu_monitor_" + hostname.
* Uses the following args:
* use_average: If true, the average CPU usage over all cores will be used to determine the status. If false, the maximum CPU usage among all cores will be used.
* warning_percentage: If the CPU usage is > warning_percentage, a WARN status will be publised.
 class CpuTask(DiagnosticTask):

-    def __init__(self, warning_percentage=90, window=1):
+    def __init__(self, warning_percentage, error_percentage, window, use_average):
         DiagnosticTask.__init__(self, 'CPU Information')

         self._warning_percentage = int(warning_percentage)
+        self._error_percentage = int(error_percentage)
Comment on lines 82 to +99
@@ -96,7 +96,7 @@ def test_updater(self):
         node = Node('cpu_monitor_test')
         updater = Updater(node)
         updater.setHardwareID('test_id')
-        updater.add(CpuTask())
+        updater.add(CpuTask(warning_percentage=95, error_percentage=100, window=1, use_average=False))
Comment on lines +16 to +18
* use_average: If true, the average CPU usage over all cores will be used to determine the status. If false, the maximum CPU usage among all cores will be used.
* warning_percentage: If the CPU usage is > warning_percentage, a WARN status will be publised.
* error_percentage: If the CPU usage is > error_percentage, an ERROR status will be published.
Collaborator

@ct2034 ct2034 left a comment


I agree with all of the Copilot comments.


Labels

enhancement: This tackles a new feature of the code (and not a bug)
needs more work: Someone has worked on this but more work is needed
ros2: PR tackling a ROS2 branch


3 participants