
Conversation

@tkocmathla (Author):

Fixes #4210.

@sjarus requested a review from @zjgarvey (December 9, 2025, 17:47)
@zjgarvey (Collaborator) left a comment:

Nice! Thanks for finding this. It does seem strange that we would be checking for an explicit args.temp_dir for the external data.

I have a few comments. Most importantly, I think we should find a way to test without needing to generate a 2GB random tensor.

mlir_file = run_path / f"{model_name}-explicit_temp_implicit_data.torch.mlir"
onnx.save(onnx_model, model_file)
temp_dir = run_path / "temp"
temp_dir.mkdir(exist_ok=True)
@zjgarvey (Collaborator):

May be good to use tempfile.TemporaryDirectory in case there is an exception thrown before the temp files are cleaned up by the importer.
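
For illustration, a minimal sketch of that setup (reusing the model_name and onnx_model names from the test above; the exact wiring is illustrative):

import tempfile
from pathlib import Path

import onnx

# The context manager deletes the directory even if the importer or an
# assertion raises, so stray temp files don't leak between test runs.
with tempfile.TemporaryDirectory() as td:
    temp_dir = Path(td)
    model_file = temp_dir / f"{model_name}.onnx"  # model_name from the test
    onnx.save(onnx_model, model_file)             # onnx_model from the test
    # ... invoke the importer with temp_dir as its --temp-dir ...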

Comment on lines +94 to +97
byte_size = numpy.dtype(dtype).itemsize
tensor_size = onnx.checker.MAXIMUM_PROTOBUF // byte_size + 1
large_tensor = numpy.random.rand(tensor_size).astype(dtype)
assert large_tensor.nbytes > onnx.checker.MAXIMUM_PROTOBUF
@zjgarvey (Collaborator):

It does seem rather unfortunate that we need to make the CI create a 2GB tensor just to test this.

How long does this test take to run? If it takes more than a minute, I think we should refactor the load_onnx_model code so we can test the file-based shape inference directly.

@zjgarvey (Collaborator):

Maybe pulling out the file-based shape inference into a utility is just a good idea in general. https://github.com/llvm/torch-mlir/pull/4375/files#r2608194510

@zjgarvey (Collaborator):

Alternatively, we could try to mirror what is currently being done in IREE:

https://github.com/iree-org/iree/blob/d0dd8893758dcf40558a57916b869ba0babc0a95/compiler/bindings/python/iree/compiler/tools/import_onnx/__main__.py#L86

Ignore the params (this is primarily the reason the importer code has diverged in IREE, but there are some improvements which haven't found their way back to torch-mlir yet).

@tkocmathla (Author):

I agree this isn't a great unit test, and it takes too long to run for us to reasonably merge it. The point of it was to test the exact bug in this issue, though, not file-based shape inference in general. I'm definitely happy to hoist that code into a separate function, but I don't follow how that makes this bug easier to test. For this bug, I need a model that fails in-memory shape inference but passes file-based shape inference.

One other way to do this is to monkey-patch the onnx.checker.MAXIMUM_PROTOBUF constant in the test to a small value with unittest.mock.patch. What do you think of that?
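
Something like this minimal sketch (assuming a pytest-style test; run_import is a hypothetical stand-in for invoking the importer):

from unittest import mock

import numpy
import onnx.checker

def test_external_data_shape_inference(tmp_path):
    # Shrink the protobuf limit so a tiny tensor takes the same
    # file-based shape inference path as a real >2GB model. This works
    # as long as the importer reads onnx.checker.MAXIMUM_PROTOBUF at
    # call time rather than copying it at import time.
    with mock.patch.object(onnx.checker, "MAXIMUM_PROTOBUF", 1024):
        large_tensor = numpy.random.rand(1024).astype(numpy.float32)
        assert large_tensor.nbytes > onnx.checker.MAXIMUM_PROTOBUF
        run_import(tmp_path, large_tensor)  # hypothetical test helper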

@zjgarvey (Collaborator):

That makes sense. We do want to test the tool e2e to catch these issues.

A monkeypatch would be great if it works. Thanks for working with me on the test size.

# Load the temp file and the external data.
inferred_model = onnx.load(temp_inferred_file, load_external_data=False)
- data_dir = Path(input_dir if args.temp_dir is None else args.data_dir)
+ data_dir = Path(input_dir if args.data_dir is None else args.data_dir)
@zjgarvey (Collaborator):

I think this needs to be moved up above the condition on line 127 (if raw_model_modified), since the external data may end up being organized differently and this gets saved to the temp_dir.

E.g.,

data_dir = Path(args.data_dir or input_dir)
if raw_model_modified:
    ...
    data_dir = Path(temp_dir)
...

@tkocmathla (Author):

Great catch! I'll update this, and add a unit test for the raw_model_modified case.

Comment on lines 130 to 156
@zjgarvey (Collaborator):

Let's hoist this into a util function so we can test in isolation without needing to generate a 2GB tensor.

E.g.,

def _file_based_shape_infer(
    model_path: Path,
    data_dir: Path,
    temp_dir: Path,
) -> onnx.ModelProto:
    """docstring"""
    ...
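
For reference, a rough sketch of what the body could look like, built on onnx.shape_inference.infer_shapes_path (the details of what load_onnx_model currently does in this block are assumed here, not copied):

import onnx
import onnx.external_data_helper
import onnx.shape_inference
from pathlib import Path

def _file_based_shape_infer(
    model_path: Path,
    data_dir: Path,
    temp_dir: Path,
) -> onnx.ModelProto:
    """Infer shapes through files to sidestep the 2GB protobuf limit."""
    inferred_path = temp_dir / "inferred.onnx"
    # infer_shapes_path reads and writes the model on disk, so the
    # in-memory protobuf size limit never applies.
    onnx.shape_inference.infer_shapes_path(str(model_path), str(inferred_path))
    # Load the inferred graph without pulling in the external data...
    model = onnx.load(str(inferred_path), load_external_data=False)
    # ...then resolve external-data references against data_dir.
    onnx.external_data_helper.load_external_data_for_model(model, str(data_dir))
    return model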

@zjgarvey (Collaborator):

Re: the other comment. It may be good to do this still, but I agree that the monkeypatch is still a better way to test the actual issue you are addressing. Feel free to disregard this if you don't think it's worth the time now.

@tkocmathla requested a review from @zjgarvey (December 15, 2025, 15:14)
Linked issue #4210: ONNX importer fails when --temp-dir is set and --data-dir is not