Add Nutanix Integration #22086

Merged
NouemanKHAL merged 202 commits into master from noueman/add-nutanix-integration
Mar 4, 2026

@NouemanKHAL NouemanKHAL commented Dec 10, 2025

What does this PR do?

Adds a Nutanix integration for monitoring Prism Central v4 infrastructure via the Datadog Agent.

Features

  • Infrastructure Metrics: Clusters, hosts, and VMs with stats (CPU, memory, storage, network)
  • Activity Monitoring: Events, audits, alerts, and tasks as Datadog events with an ntnx_type tag
  • Category Tags: Automatic tagging from Nutanix categories (with ntnx_ prefix option)
  • Resource Filtering: Regex-based filtering for clusters, hosts, and VMs
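
As a rough illustration of the last two features, the sketch below shows one way category tagging and regex-based filtering could work. It is not the integration's code: `category_tags`, `filter_by_name`, and the `include`/`exclude` parameters are hypothetical names chosen for this example; the actual options are defined in nutanix/assets/configuration/spec.yaml.

```python
import re
from typing import Dict, Iterable, List


def category_tags(categories: Dict[str, str], prefix: bool = True) -> List[str]:
    """Turn category key/value pairs into key:value tags (hypothetical helper)."""
    lead = "ntnx_" if prefix else ""
    return sorted(f"{lead}{key}:{value}" for key, value in categories.items())


def filter_by_name(resources: Iterable[dict], include: List[str], exclude: List[str]) -> List[dict]:
    """Keep resources whose name matches an include pattern and no exclude pattern."""
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]
    kept = []
    for resource in resources:
        name = resource.get("name", "")
        # An empty include list means "include everything".
        if include_res and not any(r.search(name) for r in include_res):
            continue
        if any(r.search(name) for r in exclude_res):
            continue
        kept.append(resource)
    return kept
```

For example, `filter_by_name(vms, ["^prod-"], ["db"])` would keep a VM named `prod-web-1` and drop `prod-db-1` and `dev-web-1`.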

Technical Details

  • API: Nutanix Prism Central v4.0/v4.2 REST APIs
  • Authentication: Basic auth
  • Rate Limiting: Respects API rate limits with exponential backoff
  • Testing: Unit tests + Docker integration tests + AWS environment support
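
The rate-limiting behavior can be pictured with a minimal sketch, assuming HTTP 429 signals a rate limit and the delay doubles after each attempt. `request_with_backoff`, `send`, `max_retries`, and `base_delay` are hypothetical names for illustration, not the check's actual API.

```python
import time
from typing import Callable


def request_with_backoff(
    send: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-429 status or retries run out.

    `send` is a hypothetical zero-argument callable returning an HTTP status code.
    """
    attempt = 0
    while True:
        status = send()
        # Only rate-limit responses are retried; anything else is returned as-is.
        if status != 429 or attempt >= max_retries:
            return status
        sleep(base_delay * (2 ** attempt))  # base_delay, then 2x, 4x, ...
        attempt += 1
```

Passing `sleep` as a parameter keeps the function testable without real waiting.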

Motivation

https://datadoghq.atlassian.net/browse/AI-5917

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Comment thread nutanix/datadog_checks/nutanix/__about__.py Outdated
]
check = NutanixCheck('nutanix', {}, [mock_instance])
dd_run_check(check)
aggregator.assert_metric("nutanix.host.count", at_least=1)
Contributor

can this test be more specific?

]

# VM "NTNX-10-0-0-165-PCVM-1767014640" has numCoresPerSocket=1
aggregator.assert_metric("nutanix.vm.cpu.cores_per_socket", value=1, tags=expected_tags)
Contributor

these tests could be combined and just assert metrics in one test


raise ConnectionError("Connection failed")

mocker.patch('requests.Session.get', side_effect=mock_exception)
Contributor

can we use the new way of mocking the http wrapper here?

Member Author

Not sure about that, we can handle it after the migration of existing integrations.

Contributor

Can we do it now (a future PR) so there's one less thing to migrate later?

)

# Test POST
check._make_request_with_retry("http://test.com", method='post', json={'key': 'value'})
Contributor

nit- we shouldn't be testing private/helper methods directly
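
One way to read this suggestion is sketched below: exercise the request helper through the public entry point and assert on what the check reports. `TinyCheck` is a hypothetical stand-in written for this example, not the real `NutanixCheck`, and the `/hosts` path is made up.

```python
from unittest import mock


class TinyCheck:
    """Hypothetical stand-in for a check; not the real NutanixCheck."""

    def __init__(self, http):
        self.http = http
        self.metrics = {}

    def _make_request_with_retry(self, url, method="get", **kwargs):
        # Private helper: dispatches to http.get / http.post / ...
        return getattr(self.http, method)(url, **kwargs)

    def check(self):
        # Public entry point: the only method the test calls.
        response = self._make_request_with_retry("http://test.com/hosts")
        self.metrics["host.count"] = len(response["data"])


def test_check_reports_host_count():
    http = mock.Mock()
    http.get.return_value = {"data": [{"name": "a"}, {"name": "b"}]}
    check = TinyCheck(http)
    check.check()
    # Assert on observable output and on the outgoing call, not on the helper.
    assert check.metrics["host.count"] == 2
    http.get.assert_called_once_with("http://test.com/hosts")
```

The helper is still covered, but the test would survive a rename or refactor of `_make_request_with_retry`.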

sarah-witt previously approved these changes Mar 4, 2026
dkirov-dd previously approved these changes Mar 4, 2026
@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed stale reviews from sarah-witt and dkirov-dd March 4, 2026 18:05

Review from sarah-witt is dismissed. Related teams and files:

  • agent-integrations
    • nutanix/assets/configuration/spec.yaml
    • nutanix/datadog_checks/nutanix/check.py
    • nutanix/datadog_checks/nutanix/config_models/defaults.py
    • nutanix/datadog_checks/nutanix/config_models/instance.py
    • nutanix/datadog_checks/nutanix/data/conf.yaml.example
    • nutanix/datadog_checks/nutanix/infrastructure_monitor.py
    • nutanix/tests/test_resource_filters.py
    • nutanix/tests/test_vms.py

Review from dkirov-dd is dismissed. Related teams and files:

  • agent-integrations
    • nutanix/assets/configuration/spec.yaml
    • nutanix/datadog_checks/nutanix/check.py
    • nutanix/datadog_checks/nutanix/config_models/defaults.py
    • nutanix/datadog_checks/nutanix/config_models/instance.py
    • nutanix/datadog_checks/nutanix/data/conf.yaml.example
    • nutanix/datadog_checks/nutanix/infrastructure_monitor.py
    • nutanix/tests/test_resource_filters.py
    • nutanix/tests/test_vms.py
sarah-witt previously approved these changes Mar 4, 2026
@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed sarah-witt’s stale review March 4, 2026 18:27

Review from sarah-witt is dismissed. Related teams and files:

  • agent-integrations
    • nutanix/datadog_checks/nutanix/infrastructure_monitor.py
    • nutanix/tests/test_vms.py
@NouemanKHAL
Member Author

Will address the docs review and the tests/assets review in a separate PR!
@sarah-witt @drichards-87

@NouemanKHAL NouemanKHAL merged commit ae8e33e into master Mar 4, 2026
315 of 322 checks passed
@NouemanKHAL NouemanKHAL deleted the noueman/add-nutanix-integration branch March 4, 2026 19:11
@datadog-agent-integrations-bot
Contributor

The backport to 7.77.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-7.77.x 7.77.x
# Navigate to the new working tree
cd .worktrees/backport-7.77.x
# Create a new branch
git switch --create backport-22086-to-7.77.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ae8e33e6c688ef49aacd17ae461c0e9b19d3b04d
# Push it to GitHub
git push --set-upstream origin backport-22086-to-7.77.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-7.77.x

Then, create a pull request where the base branch is 7.77.x and the compare/head branch is backport-22086-to-7.77.x.

NouemanKHAL added a commit that referenced this pull request Mar 4, 2026
* initial scaffolding

* working nutanix.health.up metric

* working nutanix.cluster.count metric

* work in progress

* work in progress: assert tags in unit tests

* tests cleanup

* cleanup

* health check

* collecting basic cluster metrics

* collecting cluster stats and basic node metrics

* remove upgrade status tag and lint

* fix basic auth + add integration tests

* refactor and cleanup

* collecting node stats metrics

* lint

* add cluster namespace to cluster metrics

* rename metrics, remove unit suffixes

* use host instead of nodes as much as possible

* collecting basic vm metrics

* fix  query param typo

* collecting vm stats metric

* add missing  argument required for VmStats and passing integration tests

* add metadata.csv

* fix typo in test name

* update integration tests to stop checking for values

* add nutanix overview dashboard

* update manifest description and classifier tags

* update manifest metric to check for

* little cleanup

* remove unused dependency from pyproject

* set default min_collection_interval to 120s

* update dashboard with more units and improvements

* update dashboard description

* report host metrics and vm metrics with their corresponding hostname

* report external host tags for hosts and vms

* switch to list all vm stats endpoint for better rate limit - update metadata.csv with new metrics

* add ntnx_type:host and ntnx_type:vm as tags

* add cluster_name and host_name tags to all hosts and vm metrics + fix integration tests

* improve metrics descriptions in metadata.csv

* update dashboard

* add compact legend to all cluster/host/vm widgets for better ux

* fix stats sampling interval to match the min_collection_interval

* add support for pagination

* add page_limit parameter for pagination size limit

* update fixtures and tests for the new paginated requests

* rename paginated methods to start with list instead of get

* add support for retry logic to handle PC rate limiting

* add process signatures

* update nutanix process signatures

* fix error deleting page and limit params

* fix manifest.json extra comma in process_signatures

* collect events

* add bash script to record fixtures

* Fix log message for error collecting vm metrics

* refactor pagination method and improve logging

* ddev validate ci --sync

* update dashboard and add new nutanix logos

* add debug logs for HTTP requests and payloads

* add support for port in pc_ip

* swap nutanix.vm.hypervisor.memory_usage_ppm with nutanix.vm.memory.usage_ppm for more accurate VM memory usage

* improve logging: reduce HTTP logging noise to only rate limits and error responses

* fix validate dashboards

* bump python version to 3.13 and min base check version

* fix typo in min base check

* fix Mock() has no len error in test_retry.py

* wip

* add collect_events property

* change remaining references to nutanix.vm.hypervisor.memory_usage_ppm to nutanix.vm.memory.usage_ppm to fix VM memory usage widgets

* add support for tasks collection, update fixtures

* add ntnx_type tag to events and tasks

* small cleanup

* dashboard: change all bytes in binary to bytes in decimal

* cleanup and small refactor

* make events and tasks match implementation, fix handling of start_time, improve tests and small refactor

* split check.py into modules, fix integration tests

* improve error messages for non 2xx http error responses

* add missing dd licence headers to some files

* rename health_check_score metric

* improve metric names batch 1

* improve cluster and host metric names

* improve vm metric names

* split unit tests into multiple files

* add support for audits collection

* cleanup and improving tests setup

* improve duplication logic tests in events, audits, and tasks

* add support for alerts collection

* use alerts v4.2 API that supports filtering by creationTime

* sync all API calls to use the same time window (start, end)

* add extra filtering to avoid events/tasks/audits/alerts duplicates

* fallback to alerts v4.0 API if v4.2 is not available

* fix self.last__x_collection_time fields to be the max timestamp: fixes duplicates

* persist information about v4.2 API in the persistence cache

* wip: host and vm stats not working?

* improve vm stats collection by cluster, improve info logs and debug logs

* improve type hints and method comments

* add support for capacity metrics

* add nutanix tag to all entities

* report node status metric

* ddev validate models and config

* add collect_tasks and collect_audits properties for nutanix

* add filter properties for alerts

* add filter by severity and type for alerts

* add filter events by type

* add filter tasks by status

* add resource filters support for infra resources and activity resources

* improve resource_filters

* cleanup

* fetch and cache categories

* attach categories as tags with option to add ntnx_ prefix

* improve categories collection/attachment, improve tests, update all fixtures

* improve categories collection and testing

* add owner to manifest.json

* remove duplicate self.last_audit_collection_time assignment

* fix alert messages parameter rendering

* add more tests

* reduce info logs, improve info log summary, and change rest of logs to debug

* improve audits timestamp tracking, improve logging, code cleanup

* improve resource_filters logging, log error messages

* fix integration tests + add support for fake docker server testing

* fix nutanix wheel version

* reset teleport change

* ddev validate ci --sync

* fix licence headers

* fix more licence headers

* fix one more licence header

* ddev validate labeler --sync

* Fix labeler config

* reduce audits.json size

* reduce audits.json to 50KB

* reduce alerts.json to 20 items

* replace bash script for recording fixtures with python implementation

* update resource_filters description

* add starting check info log and add comment about sampling interval

* improve categories tests around default behavior, remove duplicate record_fixtures.py

* update resource_filters to match default desired category behavior

* add collect_subtasks properties, use persistent cache to track last collected items correctly before filtering

* Apply suggestions from code review

Co-authored-by: dkirov-dd <166512750+dkirov-dd@users.noreply.github.com>

* cleanup rate limit retry implementation

* add nutanix.api.rate_limited metric for visibility

* fix port configuration log message test

* fix rate limit tests and update with the new metric

* improve retry_limit implementation

* update README.md

* add missing prefix_category_tags property

* add tests for prefix_category_tags property

* address david review: always add ntnx_is_agent_vm tag

* document categories in the README

* add example in the README for collecting categories

* fix dashboard validation

* fix readme validate

* address review: raise ConfigurationError when its not set

* fix ntnx_is_agent_vm tag in tests

* add missing spec.yaml properties

* update powerConsumptionInstantWatt name for consistency

* set creates_events to true in manifest.json

* fix retry_on_rate_limit behavior on non 429 responses

* fix changelog file name

* remove unnecessary file

* fix retry_on_rate_limit behavior and add type hints

* address sarah review: add log message when skipping a resource

* address sarah review: add test all metrics + fix missing metric in metadata.csv

* update fixtures with new prism_central url and new VM in OFF state

* report nutanix.vm.status metric

* address review: by default only collect VMs with powerState ON

* collect only ON VM even if other VM resource_filters are set that are not powerState

* fix imports and paths for record_fixtures.py

* add note about duplicate hostname issue

* sync config with the new VM collection comment

* update README to explain that a single agent can monitor a prism central environment

* fix license headers

* refactor activity_monitor by reducing code duplication and extracting a shared method

* refactor infrastructure_monitor

* refactor resource_filters

* refactor check.py / simplifying

* add support for rendering audit messages

* add support for rendering nutanix event messages

* remove all X_id tags

* add caches for activity entities

* update test fixtures and adjust tests to work with the new data

* add support for displaying affected alerts in tasks

* refactor: reduce code duplication in activity_monitor

* remove debugging code

* improve readability of activity_monitor.py code related to filtering

* code cleanup + reduce code duplication

* split collect_cluster_metrics into isolated collection phases to enable partial data collection on failures

* improve error isolation for host processing to allow non-blocking errors when single host fails

* replace log.exception with log.error for more user friendly log message for known errors

* make resource_filters property always required and remove its default values

* switch super() call to python3 style

* address some code smells

* update page_limit default value from 50 to 100 to reduce API calls

* Revert accidental version modification for dev

Co-authored-by: Sarah Witt <sarah.witt@datadoghq.com>

* add support for batch_vm_collection by default to avoid rate limits

* fix batch collection mode to process all vms regardless of the batch mode

* improve testing vms filtering in both vm collection modes

---------

Co-authored-by: dkirov-dd <166512750+dkirov-dd@users.noreply.github.com>
Co-authored-by: Sarah Witt <sarah.witt@datadoghq.com>
(cherry picked from commit ae8e33e)
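
Several of the commits above deal with pagination (`page_limit`) and with avoiding duplicate events by tracking the maximum timestamp seen. A minimal sketch of both ideas follows, under stated assumptions: `list_all`, `new_items`, and `fetch_page` are hypothetical names, and timestamps are assumed to be ISO-8601 strings in one format and time zone, so that string comparison matches chronological order.

```python
from typing import Callable, List, Optional, Tuple


def list_all(fetch_page: Callable[[int, int], List[dict]], page_limit: int = 100) -> List[dict]:
    """Collect every item by requesting pages until a short page comes back."""
    items: List[dict] = []
    page = 0
    while True:
        batch = fetch_page(page, page_limit)
        items.extend(batch)
        if len(batch) < page_limit:
            return items
        page += 1


def new_items(items: List[dict], last_seen: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Drop items at or before last_seen and return the new high-water mark."""
    fresh = [i for i in items if last_seen is None or i["creationTime"] > last_seen]
    # Use the newest timestamp actually seen, not the time of the request.
    newest = max((i["creationTime"] for i in fresh), default=last_seen)
    return fresh, newest
```

Storing the returned high-water mark between runs (for instance in a persistent cache) is what would keep a restart from re-emitting old items.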
github-actions Bot pushed a commit that referenced this pull request Mar 4, 2026
ae8e33e
github-actions Bot pushed a commit to ConnectionMaster/integrations-core that referenced this pull request Mar 5, 2026
* initial scaffolding

* working nutanix.health.up metric

* working nutanix.cluster.count metric

* work in progress

* work in progress: assert tags in unit tests

* tests cleanup

* cleanup

* health check

* collecting basic cluster metrics

* collecting cluster stats and basic node metrics

* remove upgrade status tag and lint

* fix basic auth + add integration tests

* refactor and cleanup

* collecting node stats metrics

* lint

* add cluster namespace to cluster metrics

* rename metrics, remove unit suffixes

* use host instead of nodes as much as possible

* collecting basic vm metrics

* fix  query param typo

* collecting vm stats metric

* add missing  argument required for VmStats and passing integration tests

* add metadata.csv

* fix typo in test name

* update integration tests to stop checking for values

* add nutanix overview dashboard

* update manifest description and classifier tags

* update manifest metric to check for

* little cleanup

* remove unused dependency from pyproject

* set default min_collection_interval to 120s

* update dashboard with more units and improvements

* update dashboard description

* report host metrics and vm metrics with their correspondig hostname

* report external host tags for hosts and vms

* switch to list all vm stats endpoint for better rate limit - update metdata.csv with new metrics

* add ntnx_type:host and ntnx_type:vm as tags

* add cluster_name and host_name tags to all hosts and vm metrics + fix integration tests

* improve metrics descriptions in metadata.csv

* update dashboard

* add compact legend to all cluster/host/vm widgets for better ux

* fix stats sampling interval to match the min_collection_interval

* add support for pagination

* add page_limit parameter for pagination size limit

* update fixtures and tests for the new paginated requests

* rename paginated methods to start with list instead of get

* add support for retry logic to handle PC rate limiting

* add process signatures

* update nutanix process signatures

* fix error deleting page and limit params

* fix manifest.json extra comma in process_signatures

* collect events

* add bash script to record fixtures

* Fix log message for error collecting vm metrics

* refactor pagination method and improve logging

* ddev validate ci --sync

* update dashboard and add new nutanix logos

* add debug logs for HTTP requests and payloads

* add support for port in pc_ip

* swap nutanix.vm.hypervisor.memory_usage_ppm with nutanix.vm.memory.usage_ppm for more accurate VM memory usage

* improve logging: reduce HTTP logging noise to only rate limits and error responses

* fix validate dashboards

* bump python version to 3.13 and min base check version

* fix typo in min base check

* fix Mock() has no len error in test_retry.py

* wip

* add collect_events property

* change remaining references to nutanix.vm.hypervisor.memory_usage_ppm to nutanix.vm.memory.usage_ppm to fix VM memory usage widgets

* add support for tasks collection, update fixtures

* add ntnx_type tag to events and tasks

* small cleanup

* dashboard: change all bytes in binary to bytes in decimal

* cleanup and small refactor

* make events and tasks match implementation, fix handling of start_time, improve tests and small refactor

* split check.py into modules, fix integration tests

* improve error messages for non 2xx http error responses

* add missing dd licence headers to some files

* rename health_check_score metric

* improve metric names batch 1

* improve cluster and host metric names

* improve vm metric names

* split unit tests into multiple files

* add support for audits collection

* cleanup and improving tests setup

* improve duplication logic tests in events,audits and tasks

* add support for alerts collection

* use alerts v4.2 API that supports filtering by creationTime

* sync all API calls to use the same time window (start, end)

* add extra filtering to avoid events/tasks/audits/alerts duplicates

* fallback to alerts v4.0 API if v4.2 is not available

* fix self.last__x_collection_time fields to be the max timestamp: fixes duplicates

* persist information about v4.2 API in the persistence cache

* wip: host and vm stats not working?

* improve vm stats collection by cluster, improve info logs and debug logs

* improve type hints and method comments

* add support for capacity metrics

* add nutanix tag to all entities

* report node status metric

* ddev validate models and config

* add collect_tasks and collect_audits properties for nutanix

* add filter propreties for alerts

* add filter by severity and type for alerts

* add filter events by type

* add filter tasks by status

* add resource filters support for infra resources and activity resources

* improve resource_filters

* cleanup

* fetch and cache categories

* attach categories as tags with option to add ntnx_ suffix

* improve categories collection/attachment, improve tests, update all fixtures

* improve categories collection and testing

* add owner to manifest.json

* remove duplicate self.last_audit_collection_time assignment

* fix alert messages parameter rendering

* add more tests

* reduce info logs, improve info log summary, and change rest of logs to debug

* improve audits timpestamp tracking, improve logging, code cleanup

* improve resource_filters logging, log error messages

* fix integration tests + add support for fake docker server testing

* fix nutanix wheel version

* reset teleport change

* ddev validate ci --sync

* fix licence headers

* fix more licence headers

* fix one more licence header

* ddev validate labeler --sync

* Fix labeler config

* reduce audits.json size

* reduce audits.json to 50KB

* reduce alerts.json to 20 items

* replace bash script for recording fixtures with python implementation

* update resource_filters description

* add starting check info log and add comment about sampling interval

* improve categories tests around default behavior, remove duplicate record_fixtures.py

* udpate resource_filters to match default desired category behavior

* add collect_subtasks properties, use persistent cache to track last collected items correctly before filtering

* Apply suggestions from code review

Co-authored-by: dkirov-dd <166512750+dkirov-dd@users.noreply.github.com>

* cleanup rate limit retry implementation

* add nutanix.api.rate_limited metric for visibility

* fix port configuration log message test

* fix rate limit tests and update with the new metric

* improve retry_limit implementation

* update README.md

* add missing prefix_category_tags property

* add tests for prefix_category_tags property

* address david review: always add ntnx_is_agent_vm tag

* document categories in the README

* add example in the README for collecting categories

* fix dashboard validation

* fix readme validate

* address review: raise ConfigurationError when it's not set

* fix ntnx_is_agent_vm tag in tests

* add missing spec.yaml properties

* update powerConsumptionInstantWatt name for consistency

* set creates_events to true in manifest.json

* fix retry_on_rate_limit behavior on non 429 responses

* fix changelog file name

* remove unnecessary file

* fix retry_on_rate_limit behavior and add type hints

* address sarah review: add log message when skipping a resource

* address sarah review: add test all metrics + fix missing metric in metadata.csv

* update fixtures with new prism_central url and new VM in OFF state

* report nutanix.vm.status metric

* address review: by default only collect VMs with powerState ON

* collect only ON VMs even when VM resource_filters other than powerState are set

* fix imports and paths for record_fixtures.py

* add note about duplicate hostname issue

* sync config with the new VM collection comment

* update README to explain that a single agent can monitor a prism central environment

* fix license headers

* refactor activity_monitor by reducing code duplication and extracting a shared method

* refactor infrastructure_monitor

* refactor resource_filters

* refactor check.py / simplifying

* add support for rendering audit messages

* add support for rendering nutanix event messages

* remove all X_id tags

* add caches for activity entities

* update test fixtures and adjust tests to work with the new data

* add support for displaying affected alerts in tasks

* refactor: reduce code duplication in activity_monitor

* remove debugging code

* improve readability of activity_monitor.py code related to filtering

* code cleanup + reduce code duplication

* split collect_cluster_metrics into isolated collection phases to enable partial data collection on failures

* improve error isolation for host processing to allow non-blocking errors when single host fails

* replace log.exception with log.error for more user friendly log message for known errors

* make resource_filters property always required and remove its default values

* switch super() call to python3 style

* address some code smells

* update page_limit default value from 50 to 100 to reduce API calls

* Revert accidental version modification for dev

Co-authored-by: Sarah Witt <sarah.witt@datadoghq.com>

* add support for batch_vm_collection by default to avoid rate limits

* fix batch collection mode to process all vms regardless of the batch mode

* improve testing vms filtering in both vm collection modes

---------

Co-authored-by: dkirov-dd <166512750+dkirov-dd@users.noreply.github.com>
Co-authored-by: Sarah Witt <sarah.witt@datadoghq.com> ae8e33e
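
Several of the commits above concern rate-limit handling ("fix retry_on_rate_limit behavior on non 429 responses", "cleanup rate limit retry implementation"). The following is a minimal sketch of that idea, not the integration's actual code: the `Response` class, the `call_with_rate_limit_retry` function, and its parameters are hypothetical names chosen for illustration. It assumes that only HTTP 429 responses should be retried, with exponential backoff, and that every other response is returned to the caller unchanged.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response; only the fields the sketch needs."""

    status_code: int
    headers: dict = field(default_factory=dict)


def call_with_rate_limit_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying only on HTTP 429 with exponential backoff."""
    attempt = 0
    while True:
        response = send()
        # Any non-429 response (success or error) is returned immediately,
        # and so is a 429 once the retry budget is exhausted.
        if response.status_code != 429 or attempt >= max_retries:
            return response
        # Use a numeric Retry-After header when present (an assumption about
        # the server), otherwise double the delay on each attempt.
        retry_after = str(response.headers.get("Retry-After", ""))
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * (2 ** attempt)
        sleep(delay)
        attempt += 1


if __name__ == "__main__":
    # Two rate-limited replies followed by a success; delays are recorded
    # instead of slept so the demo finishes instantly.
    replies = iter([Response(429), Response(429), Response(200)])
    delays = []
    result = call_with_rate_limit_retry(lambda: next(replies), sleep=delays.append)
    print(result.status_code, delays)
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting, which is one way a test such as the `test_retry.py` mentioned in the commit list could exercise this kind of logic.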
NouemanKHAL added a commit that referenced this pull request Mar 5, 2026
* Add Nutanix Integration (#22086)
Co-authored-by: dkirov-dd <166512750+dkirov-dd@users.noreply.github.com>
Co-authored-by: Sarah Witt <sarah.witt@datadoghq.com>
@NouemanKHAL NouemanKHAL mentioned this pull request Mar 5, 2026
3 tasks
lukepatrick pushed a commit to lukepatrick/integrations-core that referenced this pull request Mar 17, 2026
Signed-off-by: lukepatrick <lukephilips@gmail.com>
7 participants