feat: add --precision fp16 to optimize, build, and export commands #872
DingmaomaoBJTU wants to merge 1 commit into
8f5a1d2 to 9e7d8fd
9e7d8fd to 7d7a0ae
```python
if TYPE_CHECKING:
    import onnx
```
7d7a0ae to 328b5ab
```python
current_path = _run_fp16_stage(
    model_path=current_path,
    stage_timings=stage_timings,
)
```
Pipeline order is inconsistent between the pytorch and ONNX build paths.
In `_build_pytorch_pipeline` (here) FP16 runs between Export and Optimize, but elsewhere FP16 runs after Optimize:

- `_build_onnx_pipeline` (line ~1547): Optimize → FP16 → Quantize
- Standalone `winml optimize --precision fp16` (optimize.py:435): converts to FP16 after `optimizer.optimize(...)` returns
- The PR description's mermaid diagram: Export → Optimize → FP16 → Quantize → Compile
Practical impact: `winml build -m hf-model --precision fp16` hands an FP16 graph to the optimizer, while `winml build -c cfg.json -m model.onnx --precision fp16` runs the optimizer on the FP32 graph first. The same logical request therefore produces two different graphs depending on the input format.
Suggest moving the FP16 stage to run after _run_optimize_stage so both pipelines match the documented order.
🤖 Generated with GitHub Copilot CLI
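One way to keep the two build paths from drifting apart is to define the stage order once and have both pipelines filter it, instead of hand-ordering calls in each function. A minimal sketch under that assumption; `BUILD_STAGE_ORDER` and `plan_stages` are hypothetical names, not code from this PR, and the assumption that Optimize always runs is ours:

```python
from __future__ import annotations

# Hypothetical canonical order, taken from the PR description's diagram:
# Export -> Optimize -> FP16 -> Quantize -> Compile.
BUILD_STAGE_ORDER: tuple[str, ...] = ("export", "optimize", "fp16", "quantize", "compile")


def plan_stages(
    *, from_pytorch: bool, fp16: bool, quantize: bool, compile_model: bool
) -> list[str]:
    """Return the stages to run, always in BUILD_STAGE_ORDER.

    The PyTorch path adds an export stage; the ONNX path starts from an
    existing .onnx file. Both derive their plan from the same tuple, so FP16
    cannot run before Optimize in one path and after it in the other.
    """
    enabled = {
        "export": from_pytorch,
        "optimize": True,  # assumption: optimize always runs in `winml build`
        "fp16": fp16,
        "quantize": quantize,
        "compile": compile_model,
    }
    return [stage for stage in BUILD_STAGE_ORDER if enabled[stage]]
```

With this shape, the PyTorch and ONNX plans differ only by the leading `export` entry, which is exactly the invariant the comment asks for.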
```python
elapsed = time.monotonic() - t0
sl.set_done(elapsed)
sl.detail("[dim]I/O types preserved as FP32[/dim]")
sl.artifact(str(model_path), 0)
```
sl.artifact(str(model_path), 0) hardcodes the size to 0. Every other stage in this file passes _safe_size(...) (see lines 1121, 1249, 1322, 1421), so the FP16 stage will render as 0 B in the stage summary. Should be:
```python
sl.artifact(str(model_path), _safe_size(model_path))
```

🤖 Generated with GitHub Copilot CLI
```python
import onnx
from onnx import TensorProto, numpy_helper
```
```python
from winml.modelkit.optim.fp16 import convert_to_fp16
```
Per CLAUDE.md, test code must use absolute imports at the package level rather than reaching into internal submodules for symbols that are not `_`-prefixed. This line imports from the deep submodule `winml.modelkit.optim.fp16`, but `convert_to_fp16` is a new public function used by three commands (`build`, `optimize`, `export`).
Suggest exporting it from src/winml/modelkit/optim/__init__.py (add to both the imports and __all__) and changing this import to:
```python
from winml.modelkit.optim import convert_to_fp16
```

🤖 Generated with GitHub Copilot CLI
328b5ab to b859627
```python
from __future__ import annotations

import numpy as np
import onnx
```
b859627 to 837330d
Add FP16 precision conversion support across all model pipeline commands:

- Create optim/fp16.py with convert_to_fp16() utility (wraps ORT float16)
- optimize: --precision fp16 with --fp16-keep-io-types and --fp16-op-block-list
- build: --precision fp16 stage between optimize and quantize
- export: --precision fp16 as post-export conversion
- Add shared precision_option() CLI decorator in utils/cli.py

Design: FP16 is a precision transformation (not a graph optimization), so it lives as a command-layer utility rather than an optimizer pipe. All three commands share the same convert_to_fp16() function.

Fixes #867
837330d to fede96c
Summary
Add `--precision fp16` flag to `winml optimize`, `winml build`, and `winml export` commands for FP32→FP16 model conversion.

Fixes #867
Design
FP16 lives as a command-layer utility function (`optim/fp16.py`), not in the optimizer pipe registry. All three commands share the same `convert_to_fp16()` entry point:

```mermaid
graph TD
    subgraph "CLI Commands"
        subgraph optimize
            O1[Optimizer.optimize] --> O2[convert_to_fp16]
        end
        subgraph build
            B1[Export] --> B2[Optimize] --> B3[FP16] --> B4[Quantize] --> B5[Compile]
        end
        subgraph export
            E1[torch.onnx.export] --> E2[convert_to_fp16]
        end
    end
    O2 --> FP16
    B3 --> FP16
    E2 --> FP16
    FP16["optim/fp16.py<br/><code>convert_to_fp16(model, keep_io_types, op_block_list)</code>"]
    style FP16 fill:#e1f5fe,stroke:#0288d1
    style B3 fill:#e1f5fe,stroke:#0288d1
```

Key decisions
- `op_block_list=None` preserves ORT defaults: ORT's `DEFAULT_OP_BLOCK_LIST` contains 24 ops known to be numerically unsafe in FP16 (TopK, CumSum, NonMaxSuppression, etc.). Passing `[]` would bypass this safety net.
- `--fp16-keep-io-types` defaults to True: model I/O stays FP32 by inserting Cast nodes at the boundaries. This ensures compatibility with inference runtimes that feed float32 tensors.
- … `len(model.graph.node)` before calling it.

CLI Flags
| Command | Flags |
|---|---|
| `optimize` | `--precision fp16`, `--fp16-keep-io-types` / `--no-fp16-keep-io-types`, `--fp16-op-block-list` |
| `build` | `--precision fp16` |
| `export` | `--precision fp16` |

Sample Usage & Output
Basic FP16 optimization
Verbose output with logging
FP16 without preserving I/O types
Mixed precision (block specific ops)
```shell
$ winml optimize -m model.onnx -o model_fp16.onnx --precision fp16 \
    --fp16-op-block-list LayerNormalization,Softmax
```

Build pipeline with FP16
Export with FP16
```shell
$ winml export microsoft/resnet-50 --precision fp16
```

Files Changed
| File | Change |
|---|---|
| `src/winml/modelkit/optim/fp16.py` | `convert_to_fp16()` utility |
| `src/winml/modelkit/commands/optimize.py` | `--precision fp16` + fine-control flags |
| `src/winml/modelkit/commands/build.py` | `--precision fp16` stage |
| `src/winml/modelkit/commands/export.py` | `--precision fp16` post-export |
| `src/winml/modelkit/utils/cli.py` | `precision_option()` decorator |
| `tests/unit/optim/test_fp16.py` | |
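The shared `precision_option()` decorator in `utils/cli.py` is listed but not shown. A minimal sketch of how such a decorator and the optimize-only flags could be wired, assuming the CLI is built on Click. The `fp32` choice and default, the `_split_ops` callback, and the toy `optimize` command (which only echoes its arguments) are hypothetical. The callback keeps an absent `--fp16-op-block-list` as `None`, so the converter still falls back to ORT's default block list:

```python
from __future__ import annotations

import click


def precision_option(func):
    """Hypothetical shared decorator adding --precision (assumed choices: fp32, fp16)."""
    return click.option(
        "--precision",
        type=click.Choice(["fp32", "fp16"]),
        default="fp32",
        show_default=True,
        help="Numeric precision of the output model.",
    )(func)


def _split_ops(ctx, param, value: str | None) -> list[str] | None:
    # Keep None when the flag is absent (or empty) so ORT's
    # DEFAULT_OP_BLOCK_LIST still applies; never turn it into [].
    if value is None:
        return None
    ops = [op.strip() for op in value.split(",") if op.strip()]
    return ops or None


@click.command()
@click.option("-m", "--model", "model_path", required=True)
@precision_option
@click.option("--fp16-keep-io-types/--no-fp16-keep-io-types", default=True, show_default=True)
@click.option("--fp16-op-block-list", default=None, callback=_split_ops)
def optimize(model_path, precision, fp16_keep_io_types, fp16_op_block_list):
    """Toy command: echoes what would be passed on to convert_to_fp16()."""
    click.echo(f"{model_path} {precision} {fp16_keep_io_types} {fp16_op_block_list}")
```

Treating an empty value as `None` is a design choice in this sketch: it prevents `--fp16-op-block-list ""` from silently passing `[]`. In ORT's float16 module an explicit `op_block_list` replaces `DEFAULT_OP_BLOCK_LIST` rather than extending it, so unless the command merges the two lists (not shown in this PR page), a user-supplied list also unblocks the default ops.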