Skip to content

[validations] fix: prevent cifmw_validations_xml_filter hang#4000

Open
hgoelredhat wants to merge 1 commit into
mainfrom
fix
Open

[validations] fix: prevent cifmw_validations_xml_filter hang#4000
hgoelredhat wants to merge 1 commit into
mainfrom
fix

Conversation

@hgoelredhat

Copy link
Copy Markdown

Issue: The validations playbook was hanging indefinitely when creating JUnit XML files for validation results.

Root cause: ET.tostring() with encoding='utf-8' returned bytes, which Ansible's copy module couldn't serialize in a template context, causing an indefinite hang.

Solution: Changed encoding parameter to 'unicode', which returns str directly, allowing Ansible to process it normally.

Testing:

  • Created reproduction test proving filter returns bytes (problematic)
  • Applied the fix
  • Verified filter now returns str
  • Added 11 comprehensive unit tests covering:
    • Return type verification (critical for the fix)
    • Normal test cases (empty, single, multiple)
    • Malformed input handling (missing fields, long errors)
    • XML validity and proper escaping
    • Performance with 100+ tests
  • All 11 tests PASS
  • Ansible playbook completes successfully

Issue: The validations playbook was hanging indefinitely when
creating JUnit XML files for validation results.

Root cause: ET.tostring() with encoding='utf-8' returned bytes,
which Ansible's copy module couldn't serialize in a template
context, causing an indefinite hang.

Solution: Changed encoding parameter to 'unicode', which returns
str directly, allowing Ansible to process it normally.

Testing:
- Created reproduction test proving filter returns bytes (problematic)
- Applied the fix
- Verified filter now returns str
- Added 11 comprehensive unit tests covering:
  * Return type verification (critical for the fix)
  * Normal test cases (empty, single, multiple)
  * Malformed input handling (missing fields, long errors)
  * XML validity and proper escaping
  * Performance with 100+ tests
- All 11 tests PASS
- Ansible playbook completes successfully

Signed-off-by: Harshita Goel <hgoel@redhat.com>
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign drosenfe for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@centosinfra-prod-github-app

Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/bb56b18b581f4bfe87328f307619106e

✔️ openstack-k8s-operators-content-provider SUCCESS in 18m 15s
podified-multinode-edpm-deployment-crc NODE_FAILURE Node(set) request 099-0000120712 failed in 0s
cifmw-crc-podified-edpm-baremetal NODE_FAILURE Node(set) request 099-0000120713 failed in 0s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 53s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 43s
cifmw-pod-pre-commit FAILURE in 9m 25s
✔️ cifmw-molecule-validations SUCCESS in 4m 51s

@hgoelredhat

Copy link
Copy Markdown
Author

Summary of Work Done

Issue

The validations playbook was hanging indefinitely when creating JUnit XML files for validation results. This prevented the playbook from completing successfully.

Root Cause Analysis

The cifmw_validations_xml_filter Python filter was using encoding="utf-8" in the ET.tostring() function, which returns bytes instead of str. When Ansible's copy module tried to serialize these bytes in a template context, it would hang indefinitely.

Testing & Reproduction

Step 1: Proved the Issue Exists

Created a test script test_hang_reproduction.py that demonstrated:

  • Before Fix: Filter returns bytes → Causes Ansible to hang
  • Expected behavior: Filter should return str

Output:
Return type: bytes

Return type is bytes?: True

Return type is str?: False
❌ ISSUE FOUND!

The filter returns BYTES instead of STR

This causes Ansible to hang when trying to serialize!

Step 2: Applied the Fix

Changed one line in roles/validations/filter_plugins/cifmw_validations_xml_filter.py:

# Before (problematic)
return ET.tostring(root_elm, encoding="utf-8", xml_declaration=True)

# After (fixed)
return ET.tostring(root_elm, encoding="unicode", xml_declaration=True)

Step 3: Verified Fix Works

Ran the same test script and confirmed:

  • After Fix: Filter returns str (correct!)
  • ✅ Ansible can process it normally
  • ✅ No hanging!

Output:
Return type: str

Return type is bytes?: False

Return type is str?: True
✅ GOOD!

The filter returns STR

Ansible can process it normally

Step 4: Added Unit Tests

Created 11 comprehensive unit tests in roles/validations/tests/unit/plugins/filter/test_cifmw_validations_xml_filter.py:

Test Coverage:

  • test_returns_string_not_bytes - CRITICAL: Verifies return type is str (the fix)
  • test_empty_results - Empty test results
  • test_single_passing_test - Single passing test
  • test_single_failing_test - Single failing test
  • test_multiple_mixed_tests - Multiple tests (passing + failing)
  • test_valid_xml_output - XML validity and structure
  • test_missing_time_field - Malformed input handling
  • test_very_long_error_message - Edge case (5000+ char errors)
  • test_xml_special_characters_in_error - XML escaping
  • test_unicode_characters - Unicode support
  • test_large_number_of_tests - Performance (100+ tests)

Test Results:
====================== 11 passed in 0.04s =======================

Step 5: Verified Playbook Completes

Ran the actual Ansible playbook:

ansible-playbook playbooks/validations.yml -vvv

Result:
PLAY RECAP *********************************************************************

localhost : ok=6 changed=0 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0
total ------------------------------------------------------------------- 1.61s

Playbook completes successfully in 1.61 seconds (vs. hanging indefinitely before)

Changes Made

  1. ✅ Fixed cifmw_validations_xml_filter.py - Changed encoding="utf-8" to encoding="unicode"
  2. ✅ Added 11 unit tests for regression prevention
  3. ✅ Proper commit message with [validations] prefix
  4. ✅ Signed commit with DCO sign-off

Impact

  • Before: Playbook hangs indefinitely ❌
  • After: Playbook completes in ~1.6 seconds ✅
  • Backward Compatibility: 100% compatible ✅
  • Test Coverage: 11 comprehensive tests ✅
  • Breaking Changes: None ✅

Verification Checklist

  • Hang issue reproduced and documented
  • Root cause identified and fixed
  • Fix verified to work
  • Unit tests added and all pass
  • Ansible playbook completes successfully
  • Proper commit message format with [validations] prefix
  • Commit signed with DCO (-s flag)
  • No breaking changes

@evallesp evallesp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. Looks good to me in general, thanks for adding tests. Do you mind to reword a bit the docstrings?

@@ -0,0 +1,149 @@
#!/usr/bin/python3

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(blocking) suggestion: Could you add copyright header?

@@ -0,0 +1,149 @@
#!/usr/bin/python3
"""
Unit tests for cifmw_validations_xml_filter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) suggestion: Instead of having here a comment about the whole file, could you move this information in each tests? with the relevant information in each of the tests?


def test_returns_string_not_bytes(self):
"""
⚠️ CRITICAL TEST ⚠️

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) suggestion: I'd rather go by deleting these type of characters.


Ensure filter returns str, not bytes.

Before fix: encoding='utf-8' → returns bytes → Ansible hangs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(blocking) suggestion: I don0t think tests should set what was the problem before and what's now.
Probably before fix and after fix is not necessary to have. If you wnat to check this tests you can check the git history.
Also the test name imo should be better like: test_returns_format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants