Bug Fix for parallel expert encoder for SALM automodel by tango4j · Pull Request #15814 · NVIDIA-NeMo/NeMo

tango4j · 2026-06-18T21:48:53Z

Important

The Update branch button must only be pressed in very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Lets the ParallelExpertEncoder (PE) bundle load from a HuggingFace/NGC model card
(not just a local .nemo), relaxes multispeaker-ASR RTTM parsing so 9-column
RTTM files are accepted, and hardens the new RTTM parser with explicit validation
and focused unit coverage.

Collection: ASR (with a small touch in SpeechLM2)

Changelog

nemo/collections/asr/modules/parallel_expert_encoder.py
- ParallelExpertEncoderPT.load_from_nemo now follows the standard NeMo
  Model checkpoint-resolution convention: a local .nemo path is restored
  via ModelPT.restore_from, otherwise the argument is treated as a pretrained
  model id (HuggingFace Hub {repo}/{name} or NGC alias) and resolved via
  Model.from_pretrained (which downloads/caches the .nemo and honours the
  HuggingFace cache + HF_HUB_OFFLINE, so a prefetched cache works on offline
  cluster nodes).
- Renamed the parameter nemo_path to model_path_or_name to reflect that it
  accepts either a local path or a model id; updated the docstring.
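The resolution rule described above can be sketched without NeMo installed. This is only an illustration of the dispatch, not the code in the PR: `resolve_model_source` is a hypothetical helper, the model id is made up, and the exact test used to recognise a local bundle (here: the path ends in `.nemo` and exists on disk) is an assumption.

```python
import os
import tempfile


def resolve_model_source(model_path_or_name: str) -> str:
    """Return "restore_from" for a local .nemo file, else "from_pretrained".

    Hypothetical helper: it mirrors the described dispatch only and does
    not load any model.
    """
    if not isinstance(model_path_or_name, str) or not model_path_or_name.strip():
        raise ValueError("model_path_or_name must be a non-empty string")
    if model_path_or_name.endswith(".nemo") and os.path.isfile(model_path_or_name):
        # Local bundle: the real code would restore it via ModelPT.restore_from.
        return "restore_from"
    # Anything else is treated as a HuggingFace Hub or NGC model id,
    # which the real code would hand to Model.from_pretrained.
    return "from_pretrained"


with tempfile.TemporaryDirectory() as tmp:
    local_bundle = os.path.join(tmp, "pe_encoder.nemo")
    open(local_bundle, "wb").close()  # empty placeholder file, not a real bundle
    local_choice = resolve_model_source(local_bundle)

hub_choice = resolve_model_source("some-org/some-pe-encoder")  # made-up model id
print(local_choice, hub_choice)
```

Under this assumption, a path that ends in `.nemo` but does not exist falls through to the pretrained-id branch; whether the real method behaves that way is not stated in the description.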
nemo/collections/speechlm2/parts/pretrained.py
- setup_parallel_expert_encoder no longer requires model.pe_encoder_path to
  end in .nemo; it accepts any non-empty string (local .nemo path or
  pretrained model id), with a clearer error message.
- The strict "is this a PE .nemo bundle?" pre-check (is_pe_nemo) now runs
  only for an actual local .nemo file. For HuggingFace/NGC ids the bundle is
  resolved and validated downstream by ParallelExpertEncoderPT.load_from_nemo
  (from_pretrained -> restore_from, which checks the bundle target class).
nemo/collections/asr/parts/utils/asr_multispeaker_utils.py
- Added read_rttm_supervisions_lenient(): a faithful copy of
  lhotse.SupervisionSet.from_rttm (same 0-based field indices: recording_id=1,
  channel=2, start=3, duration=4, speaker=7; skips zero-duration segments) that
  relaxes the strict len(parts) == 10 check to len(parts) >= 8.
- Replaced the parser's assert validation with an explicit ValueError so
  malformed RTTM lines are rejected even when Python runs with -O / -OO.
- speaker_to_target now calls read_rttm_supervisions_lenient instead of
  SupervisionSet.from_rttm when a cut carries an rttm_filepath.
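The parsing rule in this group can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the function added by the PR: `Segment` is a stand-in dataclass for lhotse's supervision objects, `parse_rttm_lines_lenient` is a hypothetical name, and skipping blank lines is assumed from the test list rather than confirmed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Plain stand-in for a lhotse supervision segment (assumption for this sketch)."""

    recording_id: str
    channel: int
    start: float
    duration: float
    speaker: str


def parse_rttm_lines_lenient(lines: Iterable[str], source: str = "<memory>") -> List[Segment]:
    """Parse RTTM lines that have at least 8 whitespace-separated fields."""
    segments = []
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # blank line (assumed to be skipped)
        if len(parts) < 8:
            # Explicit exception instead of assert, so the check survives python -O.
            raise ValueError(f"Invalid RTTM line {line_no} in {source}: {line!r}")
        start, duration = float(parts[3]), float(parts[4])
        if duration == 0:
            continue  # zero-duration segments are dropped
        segments.append(Segment(parts[1], int(parts[2]), start, duration, parts[7]))
    return segments


nine_col = "SPEAKER rec1 1 0.50 1.25 <NA> <NA> spk0 <NA>"
ten_col = nine_col + " <NA>"
zero_dur = "SPEAKER rec1 1 2.00 0.00 <NA> <NA> spk1 <NA>"

from_nine = parse_rttm_lines_lenient([nine_col, "", zero_dur])
from_ten = parse_rttm_lines_lenient([ten_col])
print(from_nine == from_ten)
```

Only fields 1, 2, 3, 4 and 7 are read, so the 9-column and 10-column forms of the same line yield the same segment.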
tests/collections/asr/utils/test_asr_multispeaker_utils.py
- Added unit coverage for 9-column RTTM parsing, 10-column RTTM parsing,
  malformed short RTTM lines, blank/zero-duration line handling, multiple RTTM
  file input, and selected helper utilities.
nemo/collections/asr/parts/utils/sot_speaker_alignment.py
- Keeps speaker-activity collation robust when per-example targets have fewer
  or more speaker columns than the configured target speaker count.
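One way to make collation tolerate a mismatched speaker count is to pad or truncate each target to the configured width before stacking. The sketch below uses nested lists in place of tensors; `fit_speaker_columns` is a hypothetical helper, the frames-by-speakers layout is an assumption, and keeping the first columns when truncating is also an assumption, since the description does not say which policy sot_speaker_alignment.py applies.

```python
from typing import List


def fit_speaker_columns(activity: List[List[float]], num_speakers: int) -> List[List[float]]:
    """Pad with zeros or truncate each frame row to exactly num_speakers columns."""
    fitted = []
    for row in activity:
        row = list(row[:num_speakers])  # drop any extra speaker columns
        row.extend([0.0] * (num_speakers - len(row)))  # pad missing speakers as inactive
        fitted.append(row)
    return fitted


two_speakers = [[1.0, 0.0], [1.0, 1.0]]          # 2 frames, 2 speakers
five_speakers = [[0.0, 1.0, 0.0, 0.0, 1.0]]      # 1 frame, 5 speakers

padded = fit_speaker_columns(two_speakers, 4)
capped = fit_speaker_columns(five_speakers, 4)
print(padded, capped)
```

After this step every example has the same number of speaker columns, so stacking a batch no longer depends on how many speakers each example carries.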
tests/collections/asr/utils/test_sot_speaker_alignment.py
- Covers SOT speaker-token parsing and speaker-activity alignment behavior.

Why

- PE encoder loading: PE bundles are distributed as HuggingFace model cards
  (e.g. nvidia/... / taejinp/...). The previous
  setup_parallel_expert_encoder hard-required a local .nemo path and
  rejected model ids with
  "model.pe_encoder_path must point to a ParallelExpertEncoderPT .nemo bundle."
  Aligning with the standard restore_from / from_pretrained convention lets a
  recipe set model.pe_encoder_path to either a local file or a model card, and
  works offline once the card is prefetched into the HuggingFace cache.
- RTTM reading: The nemoSOT multispeaker data ships RTTMs with 9 columns
  (the trailing Signal Lookahead Time, specified as always <NA> and never
  read, is dropped). lhotse's from_rttm asserts exactly 10 fields, so a single
  dataloader worker raised AssertionError: Invalid RTTM line ... and crashed
  multispeaker-ASR training. The parser only uses columns 1,2,3,4,7, so accepting
  >= 8 fields is sufficient and produces identical output for valid 10-field
  RTTMs.
- Parser validation: RTTM files are external user/data-pipeline input, so
  validation must not rely on assert, which can be stripped in optimized Python
  mode.
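The point about assert can be checked directly with the built-in compile(), whose optimize argument mimics the interpreter's -O level. This is a small demonstration, not code from the PR; `run_check` and the two snippet constants are names made up for the example.

```python
def run_check(source: str, optimize: int) -> str:
    """Compile and run a snippet at the given optimization level; report the outcome."""
    code = compile(source, "<rttm-check>", "exec", optimize=optimize)
    try:
        exec(code, {})
    except (AssertionError, ValueError) as err:
        return type(err).__name__
    return "no error"


# A line with too few fields, validated two different ways.
ASSERT_CHECK = (
    "parts = ['SPEAKER', 'rec1']\n"
    "assert len(parts) >= 8, 'Invalid RTTM line'\n"
)
RAISE_CHECK = (
    "parts = ['SPEAKER', 'rec1']\n"
    "if len(parts) < 8:\n"
    "    raise ValueError('Invalid RTTM line')\n"
)

results = {
    "assert, normal": run_check(ASSERT_CHECK, 0),  # optimize=0 keeps asserts
    "assert, -O": run_check(ASSERT_CHECK, 1),      # optimize=1 strips asserts
    "raise, normal": run_check(RAISE_CHECK, 0),
    "raise, -O": run_check(RAISE_CHECK, 1),
}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

The assert-based check silently accepts the malformed line once asserts are stripped, while the explicit raise rejects it at both levels.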

Usage

# 1) Load a ParallelExpertEncoder from a HuggingFace model card or a local .nemo
from nemo.collections.asr.modules.parallel_expert_encoder import ParallelExpertEncoderPT

# HuggingFace Hub id (downloaded/cached; works offline once prefetched):
enc = ParallelExpertEncoderPT.load_from_nemo("nvidia/phPEE-canary-enc-1b-v2-sortformer-v2.1")

# ...or a local bundle:
enc = ParallelExpertEncoderPT.load_from_nemo("/path/to/pe_encoder.nemo")

# In a SpeechLM2 recipe this is driven by:
#   model.pe_encoder_path: nvidia/phPEE-canary-enc-1b-v2-sortformer-v2.1
# or:
#   model.pe_encoder_path: /path/to/pe_encoder.nemo

# 2) Read RTTMs tolerantly (accepts 9- or 10-column lines)
from nemo.collections.asr.parts.utils.asr_multispeaker_utils import read_rttm_supervisions_lenient

sup = read_rttm_supervisions_lenient("/path/to/utt.rttm")

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added
to the PR. To re-run CI remove and add the label again. To run CI on an untrusted
fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Both changes are backward compatible: a local .nemo pe_encoder_path behaves
exactly as before, and 10-field RTTMs parse identically to from_rttm.
The RTTM parser now has explicit tests for the 9-column data case and for the
malformed-input path requested during review.
Related to # (issue)

Signed-off-by: Taejin Park <tango4j@gmail.com>

copy-pr-bot · 2026-06-18T21:48:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Taejin Park <tango4j@gmail.com>

tango4j · 2026-06-18T22:16:27Z

/ok to test 4029bf9

tango4j · 2026-06-19T18:16:38Z

/ok to test f4c04a2

github-actions · 2026-06-19T19:34:46Z

[🤖]: Hi @tango4j 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

tango4j · 2026-06-19T19:44:28Z

@pzelasko I still need to add a fix for the slow data loader, so please do not merge this yet. I will notify you when I have finished fixing it. Thanks.

Signed-off-by: Taejin Park <tango4j@gmail.com>

tango4j · 2026-06-20T02:40:29Z

/ok to test e27fc16

tango4j · 2026-06-20T02:43:58Z

@pzelasko There was a critical bug in the "speaker_targets" tensor collate function: it crashed whenever an example had 5 or 6 speakers while the maximum speaker count was set to 4. This affected about 0.001% of cases (a few dozen utterances) in the training datasets. The faulty 5+ speaker datapoints have been removed from the cluster, and the code itself was changed so that it no longer crashes even if this happens. I also added unit tests for this.

Now this PR is ready for review and merge.

tango4j · 2026-06-20T14:48:38Z

/ok to test e27fc16

Signed-off-by: Taejin Park <tango4j@gmail.com>

tango4j · 2026-06-21T16:13:31Z

/ok to test f7ae8c8

tango4j added 18 commits June 9, 2026 16:47

Adding a draft version of PEE framework for SALM

a20367b

Signed-off-by: Taejin Park <tango4j@gmail.com>

Removing wrong inits

9b637ba

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed no_rttms_to_one

35d2a4a

Signed-off-by: Taejin Park <tango4j@gmail.com>

Resolving lint issues

edced12

Signed-off-by: Taejin Park <tango4j@gmail.com>

CodeQL and lint errors

694f348

Signed-off-by: Taejin Park <tango4j@gmail.com>

reverted unintended changes to main status

f4b2c74

Signed-off-by: Taejin Park <tango4j@gmail.com>

Fixed wrong test results

2d285fb

Signed-off-by: Taejin Park <tango4j@gmail.com>

refactored and created salm_automodel_pee tests

f870a4a

Signed-off-by: Taejin Park <tango4j@gmail.com>

Merge branch 'main' into pe_encs_pr1

33e6bfd

Fixed the unused import

c68d9ea

Signed-off-by: Taejin Park <tango4j@gmail.com>

Reflected reviews. Implemented encoder chunking with GT spk_targets.

6a85f16

Signed-off-by: Taejin Park <tango4j@gmail.com>

Merge branch 'main' into pe_encs_pr1

d07f0fc

Fixed isort and black issues

4767088

Signed-off-by: Taejin Park <tango4j@gmail.com>

Fixed codeQL issues

5495d3a

Signed-off-by: Taejin Park <tango4j@gmail.com>

Reflected the second review comments

839b364

Signed-off-by: Taejin Park <tango4j@gmail.com>

Merge branch 'NVIDIA-NeMo:main' into pe_encs_pr1

ccab35b

Adding HF model card loading feature for PEE

424ec29

Signed-off-by: Taejin Park <tango4j@gmail.com>

Fix overly strict RTTM reading for Multispeaker ASR dataloading

1ea312a

Signed-off-by: Taejin Park <tango4j@gmail.com>

github-actions Bot added the ASR label Jun 18, 2026

tango4j requested a review from pzelasko June 18, 2026 21:49

tango4j marked this pull request as ready for review June 18, 2026 21:49

fixed isort and black errors

4029bf9

Signed-off-by: Taejin Park <tango4j@gmail.com>

copy-pr-bot Bot temporarily deployed to public June 18, 2026 22:17 Inactive

copy-pr-bot Bot temporarily deployed to test June 18, 2026 22:18 Inactive

copy-pr-bot Bot temporarily deployed to public June 18, 2026 22:20 Inactive

copy-pr-bot Bot temporarily deployed to public June 18, 2026 22:21 Inactive

copy-pr-bot Bot temporarily deployed to public June 18, 2026 22:45 Inactive

copy-pr-bot Bot temporarily deployed to public June 19, 2026 18:41 Inactive

copy-pr-bot Bot temporarily deployed to test June 19, 2026 18:42 Inactive

copy-pr-bot Bot temporarily deployed to public June 19, 2026 18:45 Inactive

copy-pr-bot Bot temporarily deployed to public June 19, 2026 19:09 Inactive

tango4j added 3 commits June 19, 2026 13:52

Merge branch 'NVIDIA-NeMo:main' into pe_encs_pr1

36e645c

Fixed speaker capping dim and diar model freezing

090cced

Signed-off-by: Taejin Park <tango4j@gmail.com>

Linting error fixed

e27fc16

Signed-off-by: Taejin Park <tango4j@gmail.com>

copy-pr-bot Bot temporarily deployed to public June 20, 2026 02:41 Inactive

copy-pr-bot Bot temporarily deployed to test June 20, 2026 02:42 Inactive

copy-pr-bot Bot temporarily deployed to public June 20, 2026 02:44 Inactive

copy-pr-bot Bot temporarily deployed to public June 20, 2026 02:45 Inactive

copy-pr-bot Bot temporarily deployed to public June 20, 2026 03:09 Inactive

tango4j added 2 commits June 21, 2026 08:50

Added HF model card support for PEE

7542213

Signed-off-by: Taejin Park <tango4j@gmail.com>

Fixing isort and black

f7ae8c8

Signed-off-by: Taejin Park <tango4j@gmail.com>

copy-pr-bot Bot temporarily deployed to public June 21, 2026 16:14 Inactive

copy-pr-bot Bot deployed to test June 21, 2026 16:15 Active

copy-pr-bot Bot temporarily deployed to public June 21, 2026 16:17 Inactive

copy-pr-bot Bot temporarily deployed to public June 21, 2026 16:18 Inactive

copy-pr-bot Bot temporarily deployed to public June 21, 2026 16:42 Inactive

Bug Fix for parallel expert encoder for SALM automodel #15814
tango4j wants to merge 25 commits into NVIDIA-NeMo:main from tango4j:pe_encs_pr1
