|
48 | 48 | "source": [ |
49 | 49 | "import torch\n", |
50 | 50 | "\n", |
51 | | - "class Add(torch.nn.Module):\n", |
| 51 | + "class AddSigmoid(torch.nn.Module):\n", |
| 52 | + " def __init__(self):\n", |
| 53 | + " super().__init__()\n", |
| 54 | + " self.sigmoid = torch.nn.Sigmoid()\n", |
| 55 | + "\n", |
52 | 56 | " def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n", |
53 | | - " return x + y\n", |
| 57 | + " return self.sigmoid(x + y)\n", |
54 | 58 | "\n", |
55 | 59 | "example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n", |
56 | 60 | "\n", |
57 | | - "model = Add()\n", |
| 61 | + "model = AddSigmoid()\n", |
58 | 62 | "model = model.eval()\n", |
59 | 63 | "exported_program = torch.export.export(model, example_inputs)\n", |
60 | 64 | "graph_module = exported_program.graph_module\n", |
|
84 | 88 | "source": [ |
85 | 89 | "from executorch.backends.arm.vgf import VgfCompileSpec\n", |
86 | 90 | "\n", |
87 | | - "# Create a compilation spec describing the floating point target.\n", |
88 | | - "compile_spec = VgfCompileSpec(\"TOSA-1.0+FP\")\n", |
| 91 | + "# Create a compilation spec describing the target\n", |
| 92 | + "compile_spec = VgfCompileSpec()\n", |
89 | 93 | "\n", |
90 | 94 | "_ = graph_module.print_readable()\n", |
91 | 95 | "\n", |
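From here, the usual ExecuTorch flow is to partition the exported program with the compile spec and lower it to the backend. A hedged sketch of that step, assuming the VGF backend exposes a `VgfPartitioner` alongside `VgfCompileSpec` (the exact symbol, and the notebook's later cells, may differ):

```python
from executorch.exir import to_edge_transform_and_lower

# Assumed import; check executorch.backends.arm.vgf for the exact name.
from executorch.backends.arm.vgf import VgfPartitioner

# Partition the graph for VGF and lower it to an ExecuTorch program.
executorch_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[VgfPartitioner(compile_spec)],
).to_executorch()
```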
|
99 | 103 | "source": [ |
100 | 104 | "To lower the graph_module for INT targets using the VGF backend, we apply the arm_quantizer. \n", |
101 | 105 | "\n", |
102 | | - "Quantization can be performed in various ways and tailored to different subgraphs; the sequence shown here represents the recommended workflow for VGF. \n", |
| 106 | + "Quantization can be performed in various ways and tailored to different subgraphs; it is even possible to opt out of quantization for selected layers and have them run in floating-point.\n", |
103 | 107 | "\n", |
104 | 108 | "This step also requires calibrating the module with representative inputs. \n", |
105 | 109 | "\n", |
|
120 | 124 | "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", |
121 | 125 | "\n", |
122 | 126 | "# Create a compilation spec describing the target for configuring the quantizer\n", |
123 | | - "compile_spec = VgfCompileSpec(\"TOSA-1.0+INT\")\n", |
| 127 | + "compile_spec = VgfCompileSpec()\n", |
124 | 128 | "\n", |
125 | 129 | "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n", |
126 | 130 | "quantizer = VgfQuantizer(compile_spec)\n", |
127 | 131 | "operator_config = get_symmetric_quantization_config(is_per_channel=False)\n", |
| 132 | + "\n", |
| 133 | + "# Set global (default) quantization config for the layers in the models.\n", |
| 134 | + "# Can also be set to `None` to let layers run in FP as default.\n", |
128 | 135 | "quantizer.set_global(operator_config)\n", |
129 | 136 | "\n", |
| 137 | + "# Skip quantizing all sigmoid ops (only one for this model); let it run in FP.\n", |
| 138 | + "# This step is optional; selecting which layers to include/exclude for\n", |
| 139 | + "# quantization is part of optimizing the model's performance.\n", |
| 140 | + "quantizer.set_module_type(torch.nn.Sigmoid, None)\n", |
| 141 | + "\n", |
130 | 142 | "# Post training quantization\n", |
131 | 143 | "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n", |
132 | 144 | "quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input\n", |
|
142 | 154 | "cell_type": "markdown", |
143 | 155 | "metadata": {}, |
144 | 156 | "source": [ |
145 | | - "# In the example below, we will make use of the quantized graph module.\n", |
| 157 | + "# In the example below, we will make use of the (partially) quantized graph module.\n", |
146 | 158 | "\n", |
147 | 159 | "The lowering in the VGFBackend happens in five steps:\n", |
148 | 160 | "\n", |
|