feat: support natom padding in deepmd/npy/mixed format #932

Merged
wanghan-iapcm merged 2 commits into deepmodeling:devel from wanghan-iapcm:feat-natom-padding
Feb 14, 2026

@wanghan-iapcm wanghan-iapcm commented Feb 13, 2026

Summary by CodeRabbit

  • New Features

    • Optional atom-count padding for DeepMD mixed-format: systems can be padded to a specified multiple using virtual atoms (type -1); virtual atoms are stripped on load. Systems are now grouped by padded atom counts, which can reduce subdirectories and reorganize storage.
    • Exposed option to enable padding when creating mixed-format datasets.
  • Tests

    • Added tests covering padding, type-mapping, and per-atom data (fparam/aparam) preservation.
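The padded atom count described above can be sketched as a round-up to the next multiple. This is a minimal illustration, not the PR's code: the helper name `padded_natoms`, the round-up rule, and the example atom counts are all assumptions made for the sketch.

```python
def padded_natoms(natoms: int, pad_multiple: int) -> int:
    """Round natoms up to the nearest multiple of pad_multiple (assumed rule)."""
    if pad_multiple <= 0:
        raise ValueError("pad_multiple must be positive")
    # Ceiling division without floats: -(-a // b) == ceil(a / b)
    return -(-natoms // pad_multiple) * pad_multiple


# Hypothetical atom counts for five systems.
atom_counts = [3, 5, 6, 8, 9]
padded = [padded_natoms(n, 4) for n in atom_counts]

print(padded)
# Fewer distinct sizes means fewer groups to store.
print(len(set(atom_counts)), "distinct sizes ->", len(set(padded)))
```

With a multiple of 4, the five distinct sizes collapse to three padded sizes, which is the sense in which padding can reduce the number of subdirectories.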

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Feb 13, 2026

codecov bot commented Feb 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.37%. Comparing base (4ae8b2c) to head (281ec6a).
⚠️ Report is 1 commit behind head on devel.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #932      +/-   ##
==========================================
+ Coverage   86.27%   86.37%   +0.09%     
==========================================
  Files          86       86              
  Lines        8032     8086      +54     
==========================================
+ Hits         6930     6984      +54     
  Misses       1102     1102              


@dosubot dosubot bot added deepmd DeePMD-kit format enhancement New feature or request labels Feb 13, 2026

codspeed-hq bot commented Feb 13, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks
⏩ 2 skipped benchmarks [1]


Comparing wanghan-iapcm:feat-natom-padding (281ec6a) with devel (4ae8b2c)

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.


coderabbitai bot commented Feb 13, 2026

Walkthrough

Adds padding support for DeepMD mixed-format: systems can be padded with virtual atoms (type -1) to a target atom count, grouped by padded size, then virtual atoms are stripped when assembling per-group data. Exposes atom_numb_pad through the plugin and adds tests for padding and per-atom data preservation.
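The padding step in the walkthrough can be sketched roughly as follows. This is an illustration under stated assumptions, not the `_pad_to` helper itself: the function name `pad_system`, its signature, the plain-list data layout, and the choice of zero coordinates for virtual atoms are all hypothetical. Only the type -1 sentinel comes from the description above.

```python
VIRTUAL_TYPE = -1  # sentinel for virtual atoms, as stated in the walkthrough


def pad_system(atom_types, coords, target_natoms):
    """Append virtual atoms until the system has target_natoms atoms.

    atom_types: list of int, one entry per atom.
    coords: list of frames, each frame a list of [x, y, z] per atom.
    Virtual atoms get zero coordinates here; that choice is an assumption.
    """
    n_extra = target_natoms - len(atom_types)
    if n_extra < 0:
        raise ValueError("target_natoms is smaller than the system size")
    padded_types = list(atom_types) + [VIRTUAL_TYPE] * n_extra
    padded_coords = [
        list(frame) + [[0.0, 0.0, 0.0] for _ in range(n_extra)]
        for frame in coords
    ]
    return padded_types, padded_coords


# A three-atom system with one frame, padded to four atoms.
types = [0, 1, 1]
coords = [[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]]
padded_types, padded_coords = pad_system(types, coords, 4)
print(padded_types)
print(len(padded_coords[0]))
```

Because the sentinel is a type index that no real atom uses, a loader can later recognize and drop the padded rows without extra bookkeeping.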

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Padding/Grouping Implementation: `dpdata/deepmd/mixed.py` | Added _pad_to and _strip_virtual_atoms helpers; remapping now preserves virtual-atom sentinel (-1); to_system_data extracts homogeneous groups and strips virtual atoms; mix_system gains atom_numb_pad to optionally pad systems to a multiple and group by padded count; uses LabeledSystem.DTYPES for padding/stripping. |
| Plugin Interface: `dpdata/plugins/deepmd.py` | DeePMDMixedFormat.mix_system signature updated to accept atom_numb_pad=None and forwards it to dpdata.deepmd.mixed.mix_system; docstring updated to document padding behavior. |
| Tests: `tests/test_deepmd_mixed.py` | Added multiple tests covering padding round-trips, grouping by padded counts, custom type_map compatibility, and preservation of per-atom data (fparam/aparam) including checks for virtual-atom rows and coordinates/forces. |

Sequence Diagram

sequenceDiagram
    participant User as User/API
    participant MixSystem as mix_system()
    participant PadTo as _pad_to()
    participant Grouping as Grouping Logic
    participant StripVirtual as _strip_virtual_atoms()
    participant Storage as Mixed Format Output

    User->>MixSystem: call mix_system(..., atom_numb_pad)
    MixSystem->>PadTo: pad each system to target natoms
    PadTo-->>MixSystem: padded system (virtual atoms type -1)
    MixSystem->>Grouping: group systems by padded count
    Grouping->>StripVirtual: for each group, strip virtual atoms
    StripVirtual-->>Grouping: per-group stripped data
    Grouping->>Storage: write grouped mixed-format subfolders by padded count
    Storage-->>User: on-disk mixed-format with grouped subdirectories
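The grouping step in the diagram can be sketched as a dictionary keyed by padded atom count. This is only an illustration: `group_by_padded_count`, the system names, and the atom counts are hypothetical, and rounding up to the next multiple is an assumed rule.

```python
from collections import defaultdict


def group_by_padded_count(systems, pad_multiple):
    """Map each padded atom count to the names of the systems that share it.

    systems: dict of system name -> real atom count (hypothetical input).
    """
    groups = defaultdict(list)
    for name, natoms in systems.items():
        # Round up to the next multiple of pad_multiple (assumed rule).
        padded = -(-natoms // pad_multiple) * pad_multiple
        groups[padded].append(name)
    return dict(groups)


systems = {"water": 3, "methane": 5, "ethanol": 9, "ammonia": 4}
groups = group_by_padded_count(systems, 4)
for padded, names in sorted(groups.items()):
    print(padded, names)
```

Each key would then correspond to one group of frames written together, so systems with 3 and 4 atoms end up in the same group.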

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

dpdata

Suggested reviewers

  • iProzd
  • njzjz

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.30% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: support natom padding in deepmd/npy/mixed format' accurately reflects the main change: adding natom padding support to the mixed format handler, which is the primary observable behavior change across all modified files. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into devel |

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
tests/test_deepmd_mixed.py (1)

826-842: Duplicate DataType registration across test classes.

Both TestMixedSystemWithFparamAparam (lines 487–489) and this class register the same fparam/aparam data types. If register_data_type isn't idempotent (or appends duplicates), this could cause subtle side effects when both test classes run in the same process.

Consider extracting the registration to a module-level fixture or setUpModule to avoid repeated registration.

#!/bin/bash
# Check whether register_data_type is idempotent by locating its Python definition
ast-grep --lang python --pattern $'def register_data_type($$$ARGS):\n    $$$BODY'

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dpdata/deepmd/mixed.py`:
- Around line 92-146: In _strip_virtual_atoms, remove the unused local variable
`reserved` (it is never referenced) and update the function docstring return
section to match the actual returns (the function returns three items:
atom_types, coords, and extra_data/stripped) — replace the current four-return
description that lists `real_mask` with a corrected three-value description and
use consistent names (`atom_types`, `coords`, `extra_data` or `stripped`) to
match the actual return tuple and the callers that unpack three values.
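A stripping helper whose docstring matches its three return values could look like the sketch below. The signature, the plain-list data layout, and the variable names are assumptions for illustration. The actual `_strip_virtual_atoms` in `dpdata/deepmd/mixed.py` may differ.

```python
VIRTUAL_TYPE = -1  # sentinel type used for padded (virtual) atoms


def strip_virtual_atoms(atom_types, coords, extra_data):
    """Drop virtual atoms from per-atom data.

    Parameters
    ----------
    atom_types : list of int
        One type index per atom; virtual atoms have type -1.
    coords : list
        Frames, each a list of [x, y, z] per atom.
    extra_data : dict
        Name -> frames of per-atom rows (for example aparam).

    Returns
    -------
    atom_types : list of int
        Types of the real atoms only.
    coords : list
        Coordinates of the real atoms only.
    extra_data : dict
        Per-atom data with virtual-atom rows removed.
    """
    keep = [i for i, t in enumerate(atom_types) if t != VIRTUAL_TYPE]
    real_types = [atom_types[i] for i in keep]
    real_coords = [[frame[i] for i in keep] for frame in coords]
    stripped = {
        name: [[frame[i] for i in keep] for frame in frames]
        for name, frames in extra_data.items()
    }
    return real_types, real_coords, stripped
```

Callers can then unpack exactly three values, for example `atom_types, coords, extra_data = strip_virtual_atoms(...)`, which is what the corrected docstring promises.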
🧹 Nitpick comments (1)
tests/test_deepmd_mixed.py (1)

797-899: Consider setting a random seed for reproducibility.

The fparam and aparam arrays are generated with np.random.random (lines 838–846) without a fixed seed. While unlikely to cause issues in practice, pinning np.random.seed(...) at the top of setUp makes failures easier to reproduce.
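One way to pin the random data is sketched below. It uses the standard-library `random.Random` as a stand-in so the example has no dependencies. In the actual tests the equivalent would be calling `np.random.seed(...)` at the top of setUp. The class name, seed value, and array shapes are hypothetical.

```python
import random
import unittest


class SeededParamsTest(unittest.TestCase):
    def setUp(self):
        # A fixed seed makes every run generate the same test data.
        rng = random.Random(20260213)
        self.fparam = [[rng.random() for _ in range(2)] for _ in range(3)]

    def test_fparam_shape(self):
        self.assertEqual(len(self.fparam), 3)
        self.assertTrue(all(len(row) == 2 for row in self.fparam))
```

A private generator instance also avoids changing global random state that other tests in the same process might rely on.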

@njzjz njzjz linked an issue Feb 13, 2026 that may be closed by this pull request
@wanghan-iapcm wanghan-iapcm requested a review from njzjz February 14, 2026 01:17
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 14, 2026
@wanghan-iapcm wanghan-iapcm merged commit 8cfafc8 into deepmodeling:devel Feb 14, 2026
12 checks passed
@wanghan-iapcm wanghan-iapcm deleted the feat-natom-padding branch February 14, 2026 10:48
Development

Successfully merging this pull request may close these issues.

Auto Padding for deepmd/npy in dpdata

2 participants