feat: add --precision fp16 to optimize, build, and export commands by DingmaomaoBJTU · Pull Request #872 · microsoft/winml-cli

DingmaomaoBJTU · 2026-06-11T03:04:44Z

Summary

Add a --precision fp16 flag to the winml optimize, winml build, and winml export commands for FP32→FP16 model conversion.

Fixes #867

Design

FP16 lives as a command-layer utility function (optim/fp16.py), not in the optimizer pipe registry. All three commands share the same convert_to_fp16() entry point:

graph TD
    subgraph "CLI Commands"
        subgraph optimize
            O1[Optimizer.optimize] --> O2[convert_to_fp16]
        end
        subgraph build
            B1[Export] --> B2[Optimize] --> B3[FP16] --> B4[Quantize] --> B5[Compile]
        end
        subgraph export
            E1[torch.onnx.export] --> E2[convert_to_fp16]
        end
    end

    O2 --> FP16
    B3 --> FP16
    E2 --> FP16

    FP16["optim/fp16.py<br/><code>convert_to_fp16(model, keep_io_types, op_block_list)</code>"]

    style FP16 fill:#e1f5fe,stroke:#0288d1
    style B3 fill:#e1f5fe,stroke:#0288d1

Key decisions

op_block_list=None preserves ORT defaults — ORT's DEFAULT_OP_BLOCK_LIST contains 24 ops known to be numerically unsafe in FP16 (TopK, CumSum, NonMaxSuppression, etc.). Passing [] would bypass this safety net.
--fp16-keep-io-types defaults to True — model I/O stays FP32 by inserting Cast nodes at boundaries. This ensures compatibility with inference runtimes that feed float32 tensors.
Node count logged before in-place mutation — ORT's converter mutates the model in-place, so we capture len(model.graph.node) before calling it.
Already-FP16 skip — if all floating-point initializers are already FLOAT16, conversion is skipped with a log message.
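The last two decisions can be sketched as follows. This is a minimal illustration, not the code in optim/fp16.py: `is_already_fp16`, `convert_with_logging`, and the `convert` callback are hypothetical names, the integer codes mirror `onnx.TensorProto` data types so the sketch runs without onnx installed, and treating a model with no floating-point initializers as "not yet converted" is an assumption.

```python
from types import SimpleNamespace

# Integer codes mirror onnx.TensorProto data types
# (FLOAT=1, FLOAT16=10, DOUBLE=11, BFLOAT16=16).
FLOAT, FLOAT16, DOUBLE, BFLOAT16 = 1, 10, 11, 16
FLOATING_POINT_TYPES = {FLOAT, FLOAT16, DOUBLE, BFLOAT16}


def is_already_fp16(initializers) -> bool:
    """Return True when every floating-point initializer is FLOAT16."""
    float_inits = [t for t in initializers if t.data_type in FLOATING_POINT_TYPES]
    # Assumption: a model with no floating-point initializers is not skipped.
    return bool(float_inits) and all(t.data_type == FLOAT16 for t in float_inits)


def convert_with_logging(model, convert):
    """Hypothetical wrapper: skip converted models, else log node counts."""
    if is_already_fp16(model.graph.initializer):
        return model, "skipped: already FP16"
    # Capture the count first because the converter mutates the model in place.
    nodes_before = len(model.graph.node)
    convert(model)
    return model, f"{nodes_before} -> {len(model.graph.node)} nodes"


def fake_convert(model):
    """Stand-in for the real converter: adds two Cast nodes in place."""
    model.graph.node.extend(["Cast", "Cast"])


# Stand-in for an onnx.ModelProto with one FP32 initializer and two nodes.
demo_model = SimpleNamespace(
    graph=SimpleNamespace(
        initializer=[SimpleNamespace(data_type=FLOAT)],
        node=["MatMul", "Add"],
    )
)
_, demo_message = convert_with_logging(demo_model, fake_convert)
print(demo_message)  # 2 -> 4 nodes
```

Reading `len(model.graph.node)` after the call would report the post-conversion count for both values, which is why the sketch stores it beforehand.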

CLI Flags

| Command | Flags |
| --- | --- |
| `optimize` | `--precision fp16`, `--fp16-keep-io-types` / `--no-fp16-keep-io-types`, `--fp16-op-block-list` |
| `build` | `--precision fp16` |
| `export` | `--precision fp16` |
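To make the flag semantics concrete, here is a rough sketch of the `optimize` flags using the standard library's argparse. The real command may be built on a different CLI framework, so `build_parser` and `parse_op_block_list` are illustrative names only. The point it shows is that an omitted `--fp16-op-block-list` stays `None` (so the ORT default block list applies) rather than becoming an empty list.

```python
import argparse


def parse_op_block_list(value: str) -> list:
    """Split a comma-separated op list, dropping empty entries."""
    return [op.strip() for op in value.split(",") if op.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the optimize flags in the table above."""
    parser = argparse.ArgumentParser(prog="optimize-sketch")
    parser.add_argument("--precision", choices=["fp16"], default=None)
    # BooleanOptionalAction (Python 3.9+) also registers --no-fp16-keep-io-types.
    parser.add_argument(
        "--fp16-keep-io-types",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    # default=None means "flag not given", which is distinct from an empty list.
    parser.add_argument(
        "--fp16-op-block-list",
        type=parse_op_block_list,
        default=None,
    )
    return parser


defaults = build_parser().parse_args(["--precision", "fp16"])
custom = build_parser().parse_args(
    [
        "--precision", "fp16",
        "--no-fp16-keep-io-types",
        "--fp16-op-block-list", "LayerNorm,Softmax",
    ]
)
print(defaults.fp16_keep_io_types, defaults.fp16_op_block_list)
print(custom.fp16_keep_io_types, custom.fp16_op_block_list)
```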

Sample Usage & Output

Basic FP16 optimization

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16

Input: model.onnx
Output: model_fp16.onnx
Loading model...
Running optimizer...
Converting to FP16...
Saving optimized model...
Success! Model optimized: model_fp16.onnx
Nodes: 2 -> 4 (-100.0% reduction)

Node count increases because Cast nodes are inserted at I/O boundaries when --fp16-keep-io-types is enabled (the default), which is why the summary reports a negative reduction.
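The negative percentage follows from the usual reduction formula. The sketch below assumes the summary line is computed as `(before - after) / before`, which is consistent with the sample output shown here but is not confirmed against the CLI source; `reduction_percent` and `format_summary` are hypothetical names.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of nodes removed; negative when the graph grew."""
    if before == 0:
        return 0.0  # avoid dividing by zero for an empty graph
    return (before - after) / before * 100.0


def format_summary(before: int, after: int) -> str:
    """Render a summary line in the same shape as the sample output."""
    return f"Nodes: {before} -> {after} ({reduction_percent(before, after):.1f}% reduction)"


print(format_summary(2, 4))  # Nodes: 2 -> 4 (-100.0% reduction)
print(format_summary(2, 2))  # Nodes: 2 -> 2 (0.0% reduction)
```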

Verbose output with logging

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16 -v

Input: model.onnx
Output: model_fp16.onnx
Loading model...
Running optimizer...
[INFO winml.modelkit.optim.optimizer] Running shape inference (pre-stage)...
[INFO winml.modelkit.optim.optimizer] ✔ Shape inference (pre-stage) completed in 4.25s
[INFO winml.modelkit.optim.optimizer] Starting optimization pipeline (4 pipes)...
[INFO winml.modelkit.optim.optimizer] ⚙ Executing ort_graph...
[INFO winml.modelkit.optim.optimizer] ✔ ort_graph completed in 0.03s
[INFO winml.modelkit.optim.optimizer] ⏩ Skipping rewrite (no capabilities enabled)
[INFO winml.modelkit.optim.optimizer] ⏩ Skipping ort_fusion (no capabilities enabled)
[INFO winml.modelkit.optim.optimizer] ⏩ Skipping surgery (no capabilities enabled)
[INFO winml.modelkit.optim.optimizer] Running shape inference...
[INFO winml.modelkit.optim.optimizer] ✔ Shape inference completed in 0.00s
Converting to FP16...
[INFO winml.modelkit.optim.fp16] Converting model to FP16...
[INFO winml.modelkit.optim.fp16]   Keeping I/O types as FP32
[INFO winml.modelkit.optim.fp16] FP16 conversion complete: 2 -> 4 nodes
Saving optimized model...
Success! Model optimized: model_fp16.onnx
Nodes: 2 -> 4 (-100.0% reduction)

FP16 without preserving I/O types

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16 --no-fp16-keep-io-types

Input: model.onnx
Output: model_fp16.onnx
Loading model...
Running optimizer...
Converting to FP16...
Saving optimized model...
Success! Model optimized: model_fp16.onnx
Nodes: 2 -> 2 (0.0% reduction)

No Cast nodes inserted — model I/O uses FP16 directly. Node count stays the same.

Mixed precision (block specific ops)

$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16 \
    --fp16-op-block-list LayerNorm,Softmax

Build pipeline with FP16

$ winml build microsoft/resnet-50 --precision fp16

Export with FP16

$ winml export microsoft/resnet-50 --precision fp16

Files Changed

| File | Change |
| --- | --- |
| `src/winml/modelkit/optim/fp16.py` | NEW: `convert_to_fp16()` utility |
| `src/winml/modelkit/commands/optimize.py` | Added `--precision fp16` + fine-control flags |
| `src/winml/modelkit/commands/build.py` | Added `--precision fp16` stage |
| `src/winml/modelkit/commands/export.py` | Added `--precision fp16` post-export |
| `src/winml/modelkit/utils/cli.py` | Shared `precision_option()` decorator |
| `tests/unit/optim/test_fp16.py` | NEW: 7 unit tests |

+
+
+if TYPE_CHECKING:
+    import onnx


timenick

Three findings on PR #872.

🤖 Generated with GitHub Copilot CLI

timenick · 2026-06-11T04:33:32Z

+        current_path = _run_fp16_stage(
+            model_path=current_path,
+            stage_timings=stage_timings,
+        )


Pipeline order is inconsistent between the pytorch and ONNX build paths.

In _build_pytorch_pipeline (here) FP16 runs between Export and Optimize, but elsewhere FP16 runs after Optimize:

_build_onnx_pipeline (line ~1547): Optimize → FP16 → Quantize

Standalone winml optimize --precision fp16 (optimize.py:435): converts to FP16 after optimizer.optimize(...) returns

The PR description's mermaid diagram: Export → Optimize → FP16 → Quantize → Compile

Practical impact: winml build -m hf-model --precision fp16 will hand an FP16 graph to the optimizer, while winml build -c cfg.json -m model.onnx --precision fp16 runs optimize on FP32 first. Two different graphs out for the same logical request, depending on input format.

Suggest moving the FP16 stage to run after _run_optimize_stage so both pipelines match the documented order.
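One way to keep the two paths from drifting is to derive both from a single ordered tuple. This is only a sketch of the idea, not the structure of build.py: `STAGE_ORDER` and `plan_stages` are hypothetical, the order is taken from the diagram in the PR description, and the assumption that the ONNX path differs only by skipping Export is mine.

```python
from typing import Optional

# Order taken from the diagram in the PR description.
STAGE_ORDER = ("export", "optimize", "fp16", "quantize", "compile")


def plan_stages(input_format: str, precision: Optional[str] = None) -> list:
    """Return the stages to run, always in STAGE_ORDER."""
    skipped = set()
    if input_format == "onnx":
        skipped.add("export")  # assumption: ONNX input needs no export step
    if precision != "fp16":
        skipped.add("fp16")
    return [stage for stage in STAGE_ORDER if stage not in skipped]


pytorch_plan = plan_stages("pytorch", precision="fp16")
onnx_plan = plan_stages("onnx", precision="fp16")
print(pytorch_plan)
print(onnx_plan)
```

Because both plans are filtered views of the same tuple, FP16 cannot end up before Optimize on one path and after it on the other.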

🤖 Generated with GitHub Copilot CLI

timenick · 2026-06-11T04:33:32Z

+        elapsed = time.monotonic() - t0
+        sl.set_done(elapsed)
+        sl.detail("[dim]I/O types preserved as FP32[/dim]")
+        sl.artifact(str(model_path), 0)


sl.artifact(str(model_path), 0) hardcodes the size to 0. Every other stage in this file passes _safe_size(...) (see lines 1121, 1249, 1322, 1421), so the FP16 stage will render as 0 B in the stage summary. Should be:

sl.artifact(str(model_path), _safe_size(model_path))
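For readers without the file open, a size helper of this kind typically looks like the sketch below. The actual `_safe_size` in build.py is not shown in this PR, so `safe_size` here is a hypothetical stand-in that assumes the helper returns 0 when the file cannot be read.

```python
import os


def safe_size(path) -> int:
    """Return the file size in bytes, or 0 if the path cannot be stat'ed."""
    try:
        return os.path.getsize(path)
    except OSError:
        # Missing or unreadable artifact: report 0 instead of raising.
        return 0


print(safe_size("does-not-exist.onnx"))  # 0
```

With a helper like this, a hardcoded 0 is indistinguishable from a missing artifact in the stage summary, which is one more reason to pass the real size.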

🤖 Generated with GitHub Copilot CLI

timenick · 2026-06-11T04:33:32Z

+import onnx
+from onnx import TensorProto, numpy_helper
+
+from winml.modelkit.optim.fp16 import convert_to_fp16


Per CLAUDE.md, test code must use absolute imports at the package level, not reach into internal submodules for non-_-prefixed symbols. This line imports from the deep submodule winml.modelkit.optim.fp16, but convert_to_fp16 is a new public function used by three commands (build, optimize, export).

Suggest exporting it from src/winml/modelkit/optim/__init__.py (add to both the imports and __all__) and changing this import to:

from winml.modelkit.optim import convert_to_fp16

🤖 Generated with GitHub Copilot CLI

+from __future__ import annotations
+
+import numpy as np
+import onnx


Add FP16 precision conversion support across all model pipeline commands:

- Create optim/fp16.py with convert_to_fp16() utility (wraps ORT float16)
- optimize: --precision fp16 with --fp16-keep-io-types and --fp16-op-block-list
- build: --precision fp16 stage between optimize and quantize
- export: --precision fp16 as post-export conversion
- Add shared precision_option() CLI decorator in utils/cli.py

Design: FP16 is a precision transformation (not a graph optimization), so it lives as a command-layer utility rather than an optimizer pipe. All three commands share the same convert_to_fp16() function.

Fixes #867

DingmaomaoBJTU requested a review from a team as a code owner June 11, 2026 03:04

github-advanced-security AI found potential problems Jun 11, 2026

Comment thread tests/unit/optim/pipes/test_pipe_fp16.py Fixed

DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 8f5a1d2 to 9e7d8fd Compare June 11, 2026 04:15

DingmaomaoBJTU changed the title ~~feat: add --enable-fp16-conversion to winml optimize~~ feat: add --precision fp16 to optimize, build, and export commands Jun 11, 2026

github-advanced-security AI found potential problems Jun 11, 2026

Comment thread tests/unit/optim/test_fp16.py Fixed

DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 9e7d8fd to 7d7a0ae Compare June 11, 2026 04:22

github-advanced-security AI found potential problems Jun 11, 2026

Comment thread src/winml/modelkit/optim/fp16.py

DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 7d7a0ae to 328b5ab Compare June 11, 2026 04:32

timenick reviewed Jun 11, 2026

DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 328b5ab to b859627 Compare June 11, 2026 05:26

github-advanced-security AI found potential problems Jun 11, 2026

Comment thread tests/unit/optim/test_fp16.py

DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from b859627 to 837330d Compare June 11, 2026 06:57

DingmaomaoBJTU force-pushed the dingmaomaobjtu/feat-fp16-conversion branch from 837330d to fede96c Compare June 11, 2026 07:43

DingmaomaoBJTU wants to merge 1 commit into main from dingmaomaobjtu/feat-fp16-conversion

3 participants
