Add segger export for scverse-compatible output #70

Open
EliHei2 wants to merge 1 commit into main from feat/export-sopa-mvp

EliHei2 (Collaborator) commented Jun 24, 2026

`segger export` writes a segger segmentation to plain files from which a `SpatialData` object can be assembled. Name which elements to write: `anndata` (the cell-by-gene table), `transcripts` (the assigned transcripts), or `boundaries` (one polygon per cell). With no element named, it writes `anndata` and `boundaries`; pass `transcripts` to also write the per-transcript assignment.

The column names follow SOPA's conventions. `anndata.h5ad` and `cell_boundaries.parquet` share `cell_id`, the instance key SOPA uses to join a table to its shapes. `transcripts.parquet` keeps the segger assignment as `segger_cell_id` alongside `row_index`. `segger_cell_id` is a sibling column in the spirit of SOPA's `sopa_prior`, so the file merges onto an existing transcripts dataframe by `row_index` without overwriting the vendor `cell_id`; the `segger_cell_id` values match the `cell_id` in the other two files.
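The `row_index` join described above can be sketched without any dependencies. This is an illustration only: the rows, the `feature_name` column, and the helper `attach_segger_ids` are hypothetical, and only `row_index`, `cell_id`, and `segger_cell_id` come from the description. It also assumes that transcripts missing from the export should simply get no assignment.

```python
# Hypothetical stand-ins for a vendor transcripts table and for the rows
# of the exported transcripts.parquet. Only the names row_index, cell_id
# and segger_cell_id are taken from the PR description.
vendor_rows = [
    {"row_index": 0, "feature_name": "GeneA", "cell_id": "v1"},
    {"row_index": 1, "feature_name": "GeneB", "cell_id": "v1"},
    {"row_index": 2, "feature_name": "GeneA", "cell_id": "v2"},
]
segger_rows = [
    {"row_index": 0, "segger_cell_id": "s7"},
    {"row_index": 2, "segger_cell_id": "s9"},
]


def attach_segger_ids(vendor, segger):
    """Left-join segger_cell_id onto the vendor rows by row_index."""
    lookup = {row["row_index"]: row["segger_cell_id"] for row in segger}
    # Rows without a segger assignment get None (an assumption of this
    # sketch). The vendor cell_id column is copied through untouched.
    return [
        {**row, "segger_cell_id": lookup.get(row["row_index"])}
        for row in vendor
    ]


merged = attach_segger_ids(vendor_rows, segger_rows)
```

With pandas, the same idea would be a left merge of the two frames on `row_index`; because the segger assignment lives in its own column, the vendor `cell_id` survives the merge.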

Boundaries are traced with `--method` (`delaunay`, a pruned Delaunay outline, or `convex_hull`) and Chaikin-smoothed unless `--no-smooth-masks` is passed. The exported transcripts are selected with `--include-all-transcripts`, `--min-similarity`, and `--min-transcripts`; by default, the per-gene similarity threshold applies.
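For readers unfamiliar with Chaikin smoothing, here is a minimal pure-Python sketch of the corner-cutting step on a closed ring. It is not the code in this PR: the function name `chaikin_smooth`, the `iterations` parameter, and the convention that the ring is passed without a repeated closing point are all assumptions made for illustration.

```python
def chaikin_smooth(points, iterations=2):
    """Chaikin corner cutting on a closed ring of (x, y) tuples.

    The ring is given without repeating the first point at the end.
    """
    ring = list(points)
    for _ in range(iterations):
        smoothed = []
        n = len(ring)
        for i in range(n):
            (x0, y0), (x1, y1) = ring[i], ring[(i + 1) % n]
            # Replace each edge by the points at 1/4 and 3/4 along it,
            # which cuts off the corner at every original vertex.
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        ring = smoothed
    return ring


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
once = chaikin_smooth(square, iterations=1)
```

Each pass doubles the number of vertices and slightly shrinks the polygon, so a smoothed outline has a smaller area than the raw one.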

No new dependencies. Closes #67.

EliHei2 requested a review from Tobiaspk on June 24, 2026 18:19

Tobiaspk (Collaborator) commented

Looks good, thanks for putting this together so quickly.

  • I'd try to reuse `anndata_from_transcripts` as much as possible. It's very similar to `build_anndata`, and we can still add other features like region and area in `build_anndata`.
  • We currently default to Delaunay boundaries, but anndata uses convex hulls. Should we maybe include area only when boundaries are exported too, and use that area directly?

EliHei2 force-pushed the feat/export-sopa-mvp branch from 5d42532 to e3075b3 on June 25, 2026 15:29

EliHei2 (Collaborator, Author) commented Jun 25, 2026

Thanks! Both done in the latest push.
`build_anndata` now builds on `anndata_from_transcripts` and just adds `region`, the `spatialdata_attrs` link, `n_transcripts`, and `area` on top. To keep `segger export` usable without a GPU, I moved the `cupyx`/`cuml`/`phenograph_rapids` imports in `data/utils/anndata.py` into `setup_anndata`, so `anndata_from_transcripts` imports fine on CPU. The training path is unchanged.
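The import move described above follows a common deferred-import pattern. The sketch below shows the pattern in isolation and is not the code in `data/utils/anndata.py`: `cpu_helper`, `gpu_setup`, and the `gpu_module` parameter are hypothetical names, and the helper bodies are placeholders.

```python
import importlib


def cpu_helper(counts):
    """Placeholder for a CPU-only helper.

    Nothing GPU-related is imported at module level, so importing this
    module works on a machine without GPU packages installed.
    """
    return {gene: sum(values) for gene, values in counts.items()}


def gpu_setup(counts, gpu_module="cupyx"):
    """Placeholder for the GPU path.

    The heavy import is deferred until the function is actually called.
    """
    try:
        gpu = importlib.import_module(gpu_module)
    except ImportError as exc:
        raise RuntimeError(f"{gpu_module} is required for this step") from exc
    return gpu, cpu_helper(counts)


# Works on CPU-only machines because gpu_setup is never called here.
totals = cpu_helper({"GeneA": [1, 2], "GeneB": [3]})
```

The trade-off is that a missing GPU package surfaces at call time rather than at import time, which is why the sketch converts the `ImportError` into a message naming the missing module.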

Good call on the area too. I dropped the convex-hull area and now write `obs["area"]` only when `boundaries` is exported, taking it straight from the polygon areas so it stays consistent with the shapes.
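A small sketch of that rule, assuming polygons are plain lists of `(x, y)` vertices: the area column is computed from the exported polygons with the shoelace formula and is omitted when no boundaries are written. The names `polygon_area` and `build_obs` and the example cells are hypothetical, and a plain dict stands in for the real `obs` table.

```python
def polygon_area(ring):
    """Shoelace area of a simple polygon given as (x, y) vertices."""
    n = len(ring)
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def build_obs(cell_ids, boundaries=None):
    """Build a minimal obs-like dict, adding area only with boundaries."""
    obs = {"cell_id": list(cell_ids)}
    if boundaries is not None:
        # Area comes straight from the exported polygons, so the table
        # and the shapes cannot disagree.
        obs["area"] = [polygon_area(boundaries[cid]) for cid in cell_ids]
    return obs


boundaries = {
    "cell_1": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    "cell_2": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)],
}
with_area = build_obs(["cell_1", "cell_2"], boundaries)
without_area = build_obs(["cell_1", "cell_2"])
```

If the boundaries are held in a GeoDataFrame, the geometry column's `area` attribute gives the same per-polygon values.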

Development

Successfully merging this pull request may close these issues.

Save segger segmentation as a SOPA-compatible SpatialData object

2 participants