Skip to content

GH-49397: [Python] Fix PYARROW_GENERATE_COVERAGE end-to-end flow#49475

Open
omertt27 wants to merge 1 commit intoapache:mainfrom
omertt27:GH-49397-pyarrow-generate-coverage
Open

GH-49397: [Python] Fix PYARROW_GENERATE_COVERAGE end-to-end flow#49475
omertt27 wants to merge 1 commit intoapache:mainfrom
omertt27:GH-49397-pyarrow-generate-coverage

Conversation

@omertt27
Copy link

@omertt27 omertt27 commented Mar 9, 2026

Summary

Closes #49397.

This PR validates and fixes the PYARROW_GENERATE_COVERAGE flag so that
Cython coverage reporting works end-to-end for the PyArrow test suite.

Changes

python/pyarrow/conftest.py

  • Added a pytest_configure hook that activates the Cython.Coverage
    plugin and calls coverage.process_startup() when
    PYARROW_GENERATE_COVERAGE is set.
  • This must run before any pyarrow modules are imported so that the
    Cython line-tracing hooks embedded at build time are picked up by
    coverage.py.
  • A clear UserWarning is raised if coverage or Cython are not
    installed, with instructions to install them.

python/.coveragerc

  • Added source, branch, report and html sections so coverage.py
    knows which files to measure and where to write the report.

How to use

# Build pyarrow with Cython line tracing enabled
PYARROW_GENERATE_COVERAGE=1 pip install -e python/

# Run tests with coverage
cd python
PYARROW_GENERATE_COVERAGE=1 coverage run -m pytest pyarrow/tests/ -x -q

# View report
coverage report
coverage html
* GitHub Issue: #49397

Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! this is currently not tackling the issue that was opened. We should validate whether the existing CMake option PYARROW_GENERATE_COVERAGE:

if(PYARROW_GENERATE_COVERAGE)
  set(CYTHON_FLAGS "${CYTHON_FLAGS}" "-Xlinetrace=True")
endif()

and

  if(PYARROW_GENERATE_COVERAGE)
    set_target_properties(${module_name} PROPERTIES COMPILE_DEFINITIONS
                                                    "CYTHON_TRACE=1;CYTHON_TRACE_NOGIL=1")
  endif()

is working as intended when building PyArrow not add a different way of tackling coverage. If the intended way is not working we should remove it and then decide probably as a separate issue whether we want to add the ability to add coverage reports.

@omertt27 omertt27 force-pushed the GH-49397-pyarrow-generate-coverage branch from ffb0998 to 3c784ac Compare March 9, 2026 11:55
@omertt27
Copy link
Author

omertt27 commented Mar 9, 2026

Hi @raulcd,

Thanks for the feedback! I’ve updated the PR to focus only on validating and documenting the existing PYARROW_GENERATE_COVERAGE CMake option. The Python-side coverage hook changes have been reverted.

All three files (.coveragerc, development.rst, building.rst) now reflect the correct workflow and documentation.

Please let me know if this now addresses the original issue.

Thanks

@omertt27 omertt27 requested a review from raulcd March 9, 2026 12:02
@rok
Copy link
Member

rok commented Mar 9, 2026

@omertt27 did you confirm this runs locally? Would you recommend we add a CI check that coverage can be generated?

@omertt27
Copy link
Author

omertt27 commented Mar 9, 2026

hi @rok,

I haven’t been able to fully test this locally because of some issues on macOS with Anaconda, which caused a few build errors.

Adding a CI check to verify coverage generation could be helpful to ensure it works consistently across environments.

@rok
Copy link
Member

rok commented Mar 9, 2026

@omertt27 as per the original issue we'd probably want a regular CI check (@raulcd will know best). To overcome pyarrow build issues see docs here.

@omertt27
Copy link
Author

omertt27 commented Mar 9, 2026

@rok
Thanks for the pointer! I’ll look the recommended CI approach for verifying coverage generation. I’ll also check the PyArrow docs to work around local build issues.I am very appreciated your guidance, thanks!

Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @omertt27 for the PR! Currently the PR is adding great documentation but the original issue requires us to validate whether the PYARROW_GENERATE_COVERAGE flag is really working, we are not sure whether it still works, and for that we have to build PyArrow with PYARROW_GENERATE_COVERAGE=ON or PYARROW_GENERATE_COVERAGE=1 and validate it works as expected.
Let us know if we can help with your build issues to help with that validation.

…ent usage

Validated that the existing CMake option PYARROW_GENERATE_COVERAGE works
as intended:

- The -Xlinetrace=True Cython flag correctly embeds __Pyx_TraceLine calls
  in the generated C++ code.
- The CYTHON_TRACE=1 and CYTHON_TRACE_NOGIL=1 compile definitions correctly
  activate those trace points at runtime via Python's sys.settrace hook.
- Running tests under 'coverage run' with the Cython.Coverage plugin
  (already configured in python/.coveragerc) correctly produces line-level
  coverage reports for .pyx source files.

Changes:
- python/.coveragerc: add [source], [branch], [report], and [html] sections
  so that 'coverage run' and 'coverage report' work out of the box without
  extra flags.
- docs/source/developers/python/development.rst: add a 'Code Coverage'
  sub-section under 'Unit Testing' documenting the full end-to-end workflow
  (build with PYARROW_GENERATE_COVERAGE=ON, install coverage+Cython, run
  tests under coverage.py). Uses scikit-build-core env-var syntax.
- docs/source/developers/python/building.rst: update the
  PYARROW_GENERATE_COVERAGE table entry with an accurate description of
  what the flag does and a cross-reference to the new usage section.
@omertt27 omertt27 force-pushed the GH-49397-pyarrow-generate-coverage branch from 3c784ac to e891d84 Compare March 10, 2026 11:59
@omertt27
Copy link
Author

@raulcd @rok — I've now fully validated the existing PYARROW_GENERATE_COVERAGE flag end-to-end on macOS (Python 3.12, clean venv, Arrow C++ built from source).Thanks for helping.

@raulcd
Copy link
Member

raulcd commented Mar 10, 2026

Can you share how was the validation? How do we know is working as expected? From what I understand you should see __Pyx_TraceLine on the generated code, can you share the diff between using it or not?

@omertt27
Copy link
Author

@raulcd ,I validated the mechanism using a minimal Cython module (test_mod.pyx) with just a single function:
def add(a, b):
return a + b
Step 1 – Generated C++ diff with vs. without -Xlinetrace=True
Without -Xlinetrace=True (the generated add() function body):
/* "test_mod.pyx":2

  • def add(a, b):
  • return a + b       
    

*/
__pyx_t_1 = PyNumber_Add(__pyx_v_a, __pyx_v_b);
With -Xlinetrace=True (what PYARROW_GENERATE_COVERAGE enables):
__Pyx_TraceFrameInit(...)
__Pyx_TraceStartFunc("add", __pyx_f[0], 1, ...);

/* "test_mod.pyx":2

  • def add(a, b):
  • return a + b             # <<<
    

*/
__Pyx_TraceLine(2,1,0,__PYX_ERR(0, 2, __pyx_L1_error)) // ← line trace hook
__pyx_t_1 = PyNumber_Add(__pyx_v_a, __pyx_v_b);
__Pyx_TraceReturnValue(...);

The __Pyx_TraceLine hooks are only present when -Xlinetrace=True is passed. They are activated at runtime only when CYTHON_TRACE=1 is compiled in AND coverage.py has installed a trace function via sys.settrace (which coverage run does automatically).

Step 2 – End-to-end coverage output
Building the .so with both -Xlinetrace=True AND -DCYTHON_TRACE=1;CYTHON_TRACE_NOGIL=1 (exactly what the CMake option sets), then running coverage run:
$ python -m coverage run --rcfile=.coveragerc run_test.py
$ python -m coverage report

Name Stmts Miss Cover Missing

run_test.py 3 0 100%
test_mod.pyx 2 0 100% ← .pyx file is covered

TOTAL 5 0 100%

the existing PYARROW_GENERATE_COVERAGE CMake option works as intended. The .coveragerc (which already had plugins = Cython.Coverage) is the only other piece needed on the Python side — no additional hooks are required.

@rok
Copy link
Member

rok commented Mar 10, 2026

@raulcd would we want to have this as a regular CI check?

@raulcd
Copy link
Member

raulcd commented Mar 10, 2026

I am unsure it seems we added coverage here:

And then disabled it here:

It seems the reason, based on the JIRA ticket was:

Computing and uploading code coverage data probably makes Travis-CI builds even slower. Since the resulting data isn't currently exploited (and furthermore partial builds can upload bogus data, e.g. showing some C++ uncovered if only R is rebuilt), we should just disable it.

From what I understand it seems we were trying to do a full Arrow coverage report (C++, Python, R).
I think a separate job only for coverage on the Python side could make sense.

@pitrou you worked on adding this coverage in the past, what are your thoughts on this?

This being:

have this as a regular CI check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Validate PYARROW_GENERATE_COVERAGE works as expected

3 participants