Add segger export for scverse-compatible output by EliHei2 · Pull Request #70 · dpeerlab/segger

EliHei2 · 2026-06-24T18:19:00Z

segger export writes a segger segmentation to plain files from which a SpatialData object can be assembled. Name which elements to write: anndata (the cell-by-gene table), transcripts (the assigned transcripts), or boundaries (one polygon per cell). With no element named it writes anndata and boundaries; pass transcripts to also write the per-transcript assignment.

The column names follow SOPA's conventions. anndata.h5ad and cell_boundaries.parquet share cell_id, the instance key SOPA uses to join a table to its shapes. transcripts.parquet keeps the segger assignment as segger_cell_id alongside row_index, a sibling column in the spirit of SOPA's sopa_prior, so it merges onto an existing transcripts dataframe by row_index without overwriting the vendor cell_id; its values match the cell_id in the other two files.
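As a rough illustration of the merge described above, the sketch below joins a segger assignment onto a vendor transcripts table by `row_index` using pandas. The in-memory frames stand in for the real files, and the `feature_name` column and all values are made up for the example; only `row_index`, `cell_id`, and `segger_cell_id` come from the description. It assumes unassigned transcripts are simply absent from the segger table, which is not stated here.

```python
import pandas as pd

# Stand-in for an existing vendor transcripts dataframe (values are invented).
vendor = pd.DataFrame({
    "row_index": [0, 1, 2, 3],
    "feature_name": ["CD3E", "MS4A1", "CD3E", "EPCAM"],  # hypothetical column
    "cell_id": ["v1", "v1", "v2", "v3"],                 # vendor assignment
})

# Stand-in for transcripts.parquet: row_index plus the segger assignment.
segger = pd.DataFrame({
    "row_index": [0, 1, 3],
    "segger_cell_id": ["s10", "s10", "s11"],
})

# A left join keeps every vendor row; the vendor cell_id column is not touched
# because the segger assignment lives in its own sibling column.
merged = vendor.merge(segger, on="row_index", how="left", validate="one_to_one")
print(merged)
```

Rows without a match end up with a null `segger_cell_id`, so the vendor and segger assignments can be compared side by side.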

Boundaries are traced with --method (delaunay, a pruned Delaunay outline, or convex_hull) and Chaikin-smoothed unless --no-smooth-masks. The exported transcripts are selected with --include-all-transcripts, --min-similarity, and --min-transcripts, which default to the per-gene similarity threshold.
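For readers unfamiliar with Chaikin smoothing, here is a minimal sketch of the textbook corner-cutting scheme on a closed ring. It is not taken from the segger code; the function name `chaikin_closed` and the iteration count are assumptions for illustration only.

```python
import numpy as np

def chaikin_closed(points, iterations=2):
    """Chaikin corner cutting for a closed ring (no repeated end point)."""
    pts = np.asarray(points, dtype=float)
    for _ in range(iterations):
        nxt = np.roll(pts, -1, axis=0)   # successor of each vertex, wrapping around
        q = 0.75 * pts + 0.25 * nxt      # point a quarter of the way along each edge
        r = 0.25 * pts + 0.75 * nxt      # point three quarters of the way along each edge
        out = np.empty((2 * len(pts), 2))
        out[0::2] = q
        out[1::2] = r
        pts = out
    return pts

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
smoothed = chaikin_closed(square, iterations=1)
print(smoothed)
```

Each pass doubles the vertex count and cuts every corner, so the outline shrinks slightly toward the interior while becoming rounder.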

No new dependencies. Closes #67.

Tobiaspk · 2026-06-24T19:03:14Z

Looks good, thanks for putting this together so quickly.

I'd try to reuse anndata_from_transcripts as much as possible. It's very similar to build_anndata, and we can still add other features like region and area in build_anndata.
We currently default to Delaunay boundaries but anndata uses convex hulls. Should we maybe include area only when boundary is exported too and use that area directly?


EliHei2 · 2026-06-25T15:37:08Z

Thanks! Both done in the latest push.
build_anndata now builds on anndata_from_transcripts and just adds region, the spatialdata_attrs link, n_transcripts, and area on top. To keep segger export usable without a GPU I moved the cupyx/cuml/phenograph_rapids imports in data/utils/anndata.py into setup_anndata, so anndata_from_transcripts imports fine on CPU. The training path is unchanged.
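The deferred-import pattern described here can be sketched roughly as follows. This is a generic illustration, not the code in data/utils/anndata.py: the module name `hypothetical_gpu_backend` and both function names are placeholders chosen so the example runs on any machine.

```python
import importlib

def setup_gpu_step():
    """Import the GPU-only package on first use instead of at module import."""
    try:
        # Placeholder name; stands in for packages such as a GPU array library.
        backend = importlib.import_module("hypothetical_gpu_backend")
    except ImportError as exc:
        raise RuntimeError("GPU extras are required for this step") from exc
    return backend

def count_transcripts(assignments):
    """CPU-only helper that stays importable without the GPU stack."""
    counts = {}
    for cell_id in assignments:
        counts[cell_id] = counts.get(cell_id, 0) + 1
    return counts

# The CPU path works even though the GPU package is not installed.
print(count_transcripts(["s10", "s11", "s10"]))
```

Only callers that actually reach the GPU step pay the import cost or see the error, which is what lets a CPU-only export path share the module.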

Good call on the area too. I dropped the convex-hull area and now write obs["area"] only when boundaries is exported, taking it straight from the polygon areas so it stays consistent with the shapes.
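To make the consistency point concrete, the sketch below computes each cell's area from its boundary polygon and stores it in an `obs`-like table. It uses the shoelace formula on invented polygons and a plain pandas DataFrame in place of a real AnnData `obs`; the helper name `shoelace_area` is an assumption, and the actual export may well take the areas from its geometry library instead.

```python
import pandas as pd

def shoelace_area(ring):
    """Unsigned area of a simple polygon given as (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]   # wrap around to close the ring
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

# Invented boundaries keyed by cell_id, the key shared with the table.
boundaries = {
    "s10": [(0, 0), (4, 0), (4, 3), (0, 3)],  # 4 x 3 rectangle
    "s11": [(0, 0), (2, 0), (0, 2)],          # right triangle
}

# Stand-in for adata.obs, indexed by cell_id.
obs = pd.DataFrame(index=pd.Index(["s10", "s11"], name="cell_id"))
obs["area"] = pd.Series({cid: shoelace_area(r) for cid, r in boundaries.items()})
print(obs)
```

Because the area is derived from the same polygon that is written out, the table and the shapes cannot disagree about a cell's size.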

EliHei2 requested a review from Tobiaspk June 24, 2026 18:19

EliHei2 force-pushed the feat/export-sopa-mvp branch from 5d42532 to e3075b3 June 25, 2026 15:29

EliHei2 wants to merge 1 commit into main from feat/export-sopa-mvp
