diff --git a/main/acle.md b/main/acle.md
index 1823e1de..0b9439d2 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -243,8 +243,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added reference to the *Cortex-M Security Extension (CMSE)*
   specifications in [Cortex-M Security Extension
   (CMSE)](#cortex-m-security-extension-cmse).
-* Added specification for [NEON-SVE Bridge](#neon-sve-bridge) and
-  [NEON-SVE Bridge macros](#neon-sve-bridge-macro).
+* Added specification for [Neon-SVE Bridge](#neon-sve-bridge) and
+  [Neon-SVE Bridge macros](#neon-sve-bridge-macro).
 * Added feature detection macro for the memcpy family of memory
   operations (MOPS) at [memcpy family of memory operations
   standarization instructions -
@@ -435,7 +435,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Changed name mangling of function types to include SME attributes.
 * Changed `__ARM_NEON_SVE_BRIDGE` to refer to the availability of the
   [`arm_neon_sve_bridge.h`](#arm_neon_sve_bridge.h) header file, rather
-  than the [NEON-SVE bridge](#neon-sve-bridge) intrinsics.
+  than the [Neon-SVE bridge](#neon-sve-bridge) intrinsics.
 * Removed extraneous `const` from SVE2.1 store intrinsics.
 * Added [`__arm_agnostic`](#arm_agnostic) keyword attribute.
 * Refined function versioning scope and signature rules to use the default
@@ -488,6 +488,22 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added [**Alpha**](#current-status-and-anticipated-changes) support
   for Brain 16-bit floating-point vector multiplication intrinsics.
 * Redesigned atomic store with hints intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for SVE2.3 (FEAT_SVE2p3), SME2.3 (FEAT_SME2p3), FEAT_F16F32DOT
+  dot product intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for FEAT_F16F32MM, FEAT_F16MM and FEAT_SVE_B16MM mmla intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for SVE2.3 (FEAT_SVE2p3) and SME2.3 lookup table intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for SVE2.3 (FEAT_SVE2p3) and SME2.3 pairwise operation intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for SVE2.3 (FEAT_SVE2p3) and SME2.3 conversion intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for SVE2.3 (FEAT_SVE2p3) and SME2.3 absolute difference accumulation
+  intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+  support for SVE2.3 (FEAT_SVE2p3) and SME2.3 shift right narrow intrinsics.
 
 ### References
 
@@ -1110,7 +1126,7 @@ Including `<arm_sve.h>` also includes the following header files:
 ### `<arm_neon_sve_bridge.h>`
 
 `<arm_neon_sve_bridge.h>` defines intrinsics for moving data between
-Neon and SVE vector types; see [NEON-SVE Bridge](#neon-sve-bridge)
+Neon and SVE vector types; see [Neon-SVE Bridge](#neon-sve-bridge)
 for details. Before including the header, you should test the
 `__ARM_NEON_SVE_BRIDGE` macro.
 :
@@ -2004,7 +2020,11 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero.
  are available and if the associated [ACLE features]
 (#sme-language-extensions-and-intrinsics) are supported.
 
-#### NEON-SVE Bridge macro
+`__ARM_FEATURE_SVE2p3` is defined to 1 if the FEAT_SVE2p3 instructions
+ are available and if the associated [ACLE features]
+(#sme-language-extensions-and-intrinsics) are supported.
+
+#### Neon-SVE Bridge macro
 
 `__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [`<arm_neon_sve_bridge.h>`](#arm_neon_sve_bridge.h)
 header file is available.
@@ -2027,6 +2047,7 @@ of SME has an associated preprocessor macro, given in the table below:
 | FEAT_SME2   | __ARM_FEATURE_SME2         |
 | FEAT_SME2p1 | __ARM_FEATURE_SME2p1       |
 | FEAT_SME2p2 | __ARM_FEATURE_SME2p2       |
+| FEAT_SME2p3 | __ARM_FEATURE_SME2p3       |
 
 Each macro is defined if there is hardware support for the associated
 architecture feature and if all of the [ACLE
@@ -2163,6 +2184,20 @@ See [Half-precision brain
 floating-point](#half-precision-brain-floating-point) for details
 of half-precision brain floating-point types.
 
+#### Brain 16-bit floating-point matrix multiplication support
+
+This section is in
+[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
+extended in the future.
+
+`__ARM_FEATURE_SVE_B16MM` is defined to `1` if there is hardware
+support for the SVE BF16 matrix multiply extension and if the
+associated ACLE intrinsics are available.
+
+See [Half-precision brain
+floating-point](#half-precision-brain-floating-point) for details
+of half-precision brain floating-point types.
+
 ### Cryptographic extensions
 
 #### “Crypto” extension
@@ -2381,6 +2416,18 @@ this implies:
 
   * `__ARM_NEON == 1`
 
+#### Half-precision to single-precision dot product extension
+
+This section is in
+[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
+extended in the future.
+
+`__ARM_FEATURE_F16F32DOT` is defined if the half-precision dot product
+accumulating to single-precision instructions are supported and the vector
+intrinsics are available. Note that this implies:
+
+  * `__ARM_NEON == 1`
+
 #### Complex number intrinsics
 
 `__ARM_FEATURE_COMPLEX` is defined if the complex addition and complex
@@ -2413,11 +2460,11 @@ This section is in
 extended in the future.
 
 `__ARM_FEATURE_F8F16MM` is defined to `1` if there is hardware support
-for the NEON and SVE modal 8-bit floating-point matrix multiply-accumulate to half-precision (FEAT_F8F16MM)
+for the Neon and SVE modal 8-bit floating-point matrix multiply-accumulate to half-precision (FEAT_F8F16MM)
 instructions and if the associated ACLE intrinsics are available.
 
 `__ARM_FEATURE_F8F32MM` is defined to `1` if there is hardware support
-for the NEON and SVE modal 8-bit floating-point matrix multiply-accumulate to single-precision (FEAT_F8F32MM)
+for the Neon and SVE modal 8-bit floating-point matrix multiply-accumulate to single-precision (FEAT_F8F32MM)
 instructions and if the associated ACLE intrinsics are available.
 
 ##### Multiplication of 16-bit floating-point matrices
@@ -2426,6 +2473,21 @@ instructions and if the associated ACLE intrinsics are available.
 for the SVE 16-bit floating-point to 32-bit floating-point matrix multiply and add
 (FEAT_SVE_F16F32MM) instructions and if the associated ACLE intrinsics are available.
 
+##### Multiplication of 16-bit floating-point matrices (AdvSIMD)
+
+This section is in
+[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
+extended in the future.
+
+`__ARM_FEATURE_F16F32MM` is defined if the Neon half-precision matrix multiply
+accumulating to single-precision instruction is supported. Note that this implies:
+
+  * `__ARM_NEON == 1`
+
+`__ARM_FEATURE_F16MM` is defined if the Neon non-widening half-precision matrix multiply instruction instruction is supported. Note that this implies:
+
+  * `__ARM_NEON == 1`
+
 ##### Multiplication of 32-bit floating-point matrices
 
 `__ARM_FEATURE_SVE_MATMUL_FP32` is defined to `1` if there is hardware support
@@ -2663,6 +2725,9 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_DIRECTED_ROUNDING`](#directed-rounding)                                                                                                 | Directed Rounding                                                                                  | 1           |
 | [`__ARM_FEATURE_DOTPROD`](#availability-of-dot-product-intrinsics)                                                                                      | Dot product extension (ARM v8.2-A)                                                                 | 1           |
 | [`__ARM_FEATURE_DSP`](#dsp-instructions)                                                                                                                | DSP instructions (Arm v5E) (32-bit-only)                                                           | 1           |
+| [`__ARM_FEATURE_F16F32DOT`](#half-precision-to-single-precision-dot-product-extension)                                                                  | Half-precision to single-precision dot product extension (FEAT_F16F32DOT)                          | 1           |
+| [`__ARM_FEATURE_F16F32MM`](#multiplication-of-16-bit-floating-point-matrices-advsimd)                                                                   | Half-precision to single-precision matrix multiply accumulating extension (FEAT_F16F32MM).         | 1           |
+| [`__ARM_FEATURE_F16MM`](#multiplication-of-16-bit-floating-point-matrices-advsimd)                                                                      | Half-precision matrix multiply accumulating extension (FEAT_F16MM)                                 | 1           |
 | [`__ARM_FEATURE_FAMINMAX`](#floating-point-absolute-minimum-and-maximum-extension)                                                                      | Floating-point absolute minimum and maximum extension                                              | 1           |
 | [`__ARM_FEATURE_FMA`](#fused-multiply-accumulate-fma)                                                                                                   | Floating-point fused multiply-accumulate                                                           | 1           |
 | [`__ARM_FEATURE_FP16_FML`](#fp16-fml-extension)                                                                                                         | FP16 FML extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A)                                     | 1           |
@@ -2715,6 +2780,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SSVE_FP8FMA`](#modal-8-bit-floating-point-extensions)                                                                                   | Modal 8-bit floating-point extensions                                                              | 1           |
 | [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve)                                                                                                   | Scalable Vector Extension (FEAT_SVE)                                                               | 1           |
 | [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support)                                                                         | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16)                              | 1           |
+| [`__ARM_FEATURE_SVE_B16MM`](#brain-16-bit-floating-point-matrix-multiplication-support)                                                                 | SVE brain 16-bit floating-point matrix multiply extension (FEAT_SVE_B16MM)                         | 1           |
 | [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support)                                                                                        | SVE support for the 16-bit brain floating-point extension (FEAT_BF16)                              | 1           |
 | [`__ARM_FEATURE_SVE_BFSCALE`](#brain-16-bit-floating-point-vector-multiplication-support)                                                               | SVE support for the 16-bit brain floating-point vector multiplication extension (FEAT_SVE_BFSCALE) | 1           |
 | [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve)                                                                                              | The number of bits in an SVE vector, when known in advance                                         | 256         |
@@ -2738,6 +2804,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SVE2_SM4`](#sm4-extension)                                                                                                              | SVE2 support for the SM4 cryptographic extension (FEAT_SVE_SM4)                                     | 1           |
 | [`__ARM_FEATURE_SVE2p1`](#sve2)                                                                                                                         | SVE version 2.1 (FEAT_SVE2p1)
 | [`__ARM_FEATURE_SVE2p2`](#sve2)                                                                                                                         | SVE version 2.2 (FEAT_SVE2p2)
+| [`__ARM_FEATURE_SVE2p3`](#sve2)                                                                                                                         | SVE version 2.3 (FEAT_SVE2p3)
 | [`__ARM_FEATURE_SYSREG128`](#bit-system-registers)                                                                                                      | Support for 128-bit system registers (FEAT_SYSREG128)                                              | 1           |
 | [`__ARM_FEATURE_UNALIGNED`](#unaligned-access-supported-in-hardware)                                                                                    | Hardware support for unaligned access                                                              | 1           |
 | [`__ARM_FP`](#hardware-floating-point)                                                                                                                  | Hardware floating-point                                                                            | 1           |
@@ -6235,7 +6302,7 @@ correspond to the new mode and returns the resulting value. No side effects, suc
 as changing processor state, occur.
 
 Individual FP8 intrinsics are described in their respective
-Advanced SIMD (NEON), SVE, and SME sections.
+Advanced SIMD (Neon), SVE, and SME sections.
 
 ## Support enumerations
 
@@ -9600,6 +9667,15 @@ BFloat16 floating-point adjust exponent vectors.
 
 ### SVE2 floating-point matrix multiply-accumulate instructions.
 
+#### BFMMLA, FMMLA (non-widening)
+
+16-bit floating-point matrix multiply-accumulate.
+```c
+  // Only if __ARM_FEATURE_SVE_B16MM
+  // Variant also available for _f16 if (__ARM_FEATURE_SVE2p2 && __ARM_FEATURE_F16MM).
+  svbfloat16_t svmmla[_bf16](svbfloat16_t zda, svbfloat16_t zn, svbfloat16_t zm);
+```
+
 #### FMMLA (widening, FP8 to FP16)
 
 Modal 8-bit floating-point matrix multiply-accumulate to half-precision.
@@ -9945,7 +10021,7 @@ Lookup table read with 2-bit indices.
   // Variant is  also available for: _u8
   svint8_t svluti2_lane[_s8](svint8_t table, svuint8_t indices, uint64_t imm_idx);
 
-  // Variant are also available for: _u16, _f16 and _bf16
+  // Variants are also available for: _u16, _f16 and _bf16
   svint16_t svluti2_lane[_s16]( svint16_t table, svuint8_t indices, uint64_t imm_idx);
 ```
 
@@ -9956,11 +10032,27 @@ Lookup table read with 4-bit indices.
   // Variant is also available for: _u8
   svint8_t svluti4_lane[_s8](svint8_t table, svuint8_t indices, uint64_t imm_idx);
 
-  // Variant are also available for: _u16, _f16, _bf16
+  // Variants are also available for: _u16, _f16, _bf16
   svint16_t svluti4_lane[_s16](svint16_t table, svuint8_t indices, uint64_t imm_idx);
   svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
 ```
 
+### SVE2.3 lookup table
+
+The intrinsics in this section are defined by the header file
+[`<arm_sve.h>`](#arm_sve.h) when `__ARM_FEATURE_SVE2p3` is defined to 1.
+
+#### LUTI6
+
+Lookup table read with 6-bit indices (8-bit).
+
+Use of this intrinsic if `svcntb() * 8 < 256` results in undefined behaviour.
+
+```c
+  // Variant is  also available for: _u8 _mf8
+  svint8_t svluti6[_s8_x2](svint8x2_t table, svuint8_t indices);
+```
+
 ### SVE2 Multi-vector AES and 128-bit polynomial multiply long instructions
 
 The specification for SVE2 Multi-vector AES and 128-bit polynomial multiply long instructions is in
@@ -13287,7 +13379,8 @@ Set scalar to count from predicate-as-counter. ``vl`` is expected to be 2 or 4.
 Multi-vector dot-product (2-way)
 
 ``` c
-  // Variants are also available for _s32_s16 and _u32_u16
+  // Variants are also available for _s32_s16, _u32_u16
+  // and also for _s16_s8 and _u16_u8 if (__ARM_FEATURE_SVE2p3 || __ARM_FEATURE_SME2p3).
   svfloat32_t svdot[_f32_f16](svfloat32_t zda, svfloat16_t zn,
                               svfloat16_t zm);
   svfloat32_t svdot[_n_f32_f16](svfloat32_t zda, svfloat16_t zn,
@@ -13299,7 +13392,8 @@ Multi-vector dot-product (2-way)
 Multi-vector dot-product (2-way)
 
 ``` c
-  // Variants are also available for _s32_s16 and _u32_u16
+  // Variants are also available for _s32_s16, _u32_u16
+  // and also for _s16_s8 and _u16_u8  if (__ARM_FEATURE_SVE2p3 || __ARM_FEATURE_SME2p3).
   svfloat32_t svdot_lane[_f32_f16](svfloat32_t zda, svfloat16_t zn,
                                    svfloat16_t zm, uint64_t imm_idx);
   ```
@@ -13554,6 +13648,7 @@ Multi-vector saturating rounding shift right narrow and interleave
 
 ``` c
   // Variants are also available for _u16[_u32_x2]
+  // and also _s8[_s16_x2] and _u8[_u16_x2] if __ARM_FEATURE_SVE2p3 || __ARM_FEATURE_SME2p3.
   svint16_t svqrshrn[_n]_s16[_s32_x2](svint32x2_t zn, uint64_t imm);
   ```
 
@@ -13562,6 +13657,7 @@ Multi-vector saturating rounding shift right narrow and interleave
 Multi-vector saturating rounding shift right unsigned narrow and interleave
 
 ``` c
+  // Variant for _u8[_s16_x2] is available if __ARM_FEATURE_SVE2p3 || __ARM_FEATURE_SME2p3.
   svuint16_t svqrshrun[_n]_u16[_s32_x2](svint32x2_t zn, uint64_t imm);
   ```
 
@@ -13863,6 +13959,108 @@ Scalar index of first/last true predicate element (predicated).
 
   ```
 
+### SVE2.3 and SME2.3 instruction intrinsics
+
+The specification for SVE2.3 and SME2.3 are in
+[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
+extended in the future.
+
+The functions in this section are defined by either the header file
+ [`<arm_sve.h>`](#arm_sve.h) or [`<arm_sme.h>`](#arm_sme.h)
+when `__ARM_FEATURE_SVE2p3` or `__ARM_FEATURE_SME2p3` is defined, respectively.
+
+#### ADDQP
+
+Add pairwise within quadword vector segments.
+
+``` c
+  // Variants are also available for _s16, _s32, _s64, _u8, _u16, _u32 and _u64.
+  svint8_t svaddqp[_s8](svint8_t zn, svint8_t zm);
+  ```
+
+#### ADDSUBP
+
+Add subtract pairwise.
+
+``` c
+  // Variants are also available for _s16, _s32, _s64, _u8, _u16, _u32 and _u64.
+  svint8_t svaddsubp[_s8](svint8_t zn, svint8_t zm);
+  ```
+
+#### LUTI6
+
+Lookup table read with 6-bit indices (16-bit).
+
+Use of this intrinsic if `svcntb() * 8 < 512` results in undefined behaviour.
+
+``` c
+  // Variants are also available for _u16_x2 and _f16_x2.
+  svint16_t svluti6_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
+  ```
+
+#### FCVTZSN, FCVTZUN
+
+Floating-point narrowing convert to interleaved integer, rounding toward zero.
+
+``` c
+  // Variants are also available for
+  //               _s16[_f32_x2], _s32[_f64_x2],
+  // _u8[_f16_x2], _u16[_f32_x2], _u32[_f64_x2].
+  svint8_t svcvtzn_s8[_f16_x2](svfloat16x2_t zn);
+```
+
+#### SABAL, UABAL
+
+Two-way absolute difference sum and accumulate long.
+
+``` c
+  // Variants are also available for
+  //       _s32, _s64,
+  // _u16, _u32, _u64.
+  svint16_t svabal[_s16](svint16_t zda, svint8_t zn, svint8_t zm);
+  svint16_t svabal[_n_s16](svint16_t zda, svint8_t zn, int8_t zm);
+```
+
+#### SCVTF, SCVTFLT, UCVTF, UCVTFLT
+
+Integer convert to floating-point (top and bottom).
+
+``` c
+  // Variants are also available for
+  //            _f32[_s16], _f64[_s32],
+  // _f16[_u8], _f32[_u16], _f64[_u32].
+  svfloat16_t svcvtt_f16[_s8](svint8_t zn);
+  svfloat16_t svcvtb_f16[_s8](svint8_t zn);
+```
+
+#### SQSHRN, UQSHRN
+
+Multi-vector saturating shift right narrow and interleave.
+
+``` c
+  // Variants are also available for _s8[_s16_x2], _u16[_u32_x2] and _u8[_u16_x2].
+  svint16_t svqshrn[_n]_s16[_s32_x2](svint32x2_t zn, uint64_t imm);
+  ```
+
+#### SQSHRUN
+
+Signed saturating shift right narrow by immediate to interleaved unsigned integer.
+
+``` c
+  // Variant for _u8[_s16_x2] is also available.
+  svuint16_t svqshrun[_n]_u16[_s32_x2](svint32x2_t zn, uint64_t imm);
+  ```
+
+#### SUBP
+
+Subtract pairwise.
+
+``` c
+  // Variants are also available for _s16, _s32, _s64, _u8, _u16, _u32 and _u64.
+  svint8_t svsubp[_s8]_m (svbool_t pg, svint8_t zdn, svint8_t zm);
+  svint8_t svsubp[_s8]_x (svbool_t pg, svint8_t zdn, svint8_t zm);
+  ```
+
 ### SME2 maximum and minimum absolute value
 
 The intrinsics in this section are defined by the header file
@@ -14587,6 +14785,40 @@ non-overloaded names to indicate which vector argument is a vector register pair
     __arm_streaming __arm_inout("za");
 ```
 
+### SME2.3 lookup table
+
+The intrinsics in this section are defined by the header file
+[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p3` is defined to 1.
+
+#### LUTI6
+
+Lookup table read with 6-bit indices (16-bit).
+
+Use of this intrinsic if `svcntb() * 8 < 512` results in undefined behaviour.
+
+```c
+  // Variants are also available for:
+  //   _u16_x2_u8_x2, _f16_x2_u8_x2, _bf16_x2_u8_x2
+  //   _s16_x2_u8_x3, _u16_x2_u8_x3, _f16_x2_u8_x3, _bf16_x2_u8_x3
+  svint16x4_t svluti6_lane_s16_x4[_s16_x2_u8_x2](svint16x2_t table,
+                                                 svuint8x2_t indices,
+                                                 uint64_t imm_idx) __arm_streaming;
+```
+
+Lookup table read with 6-bit indices (four registers, 8-bit).
+
+``` c
+  // Variants are also available for: _u8 and _mf8.
+  svint8x4_t svluti6_zt_s8_x4(uint64_t zt0, svuint8x3_t zn) __arm_streaming __arm_in("zt0");
+```
+
+Lookup table read with 6-bit indices (table, single, 8-bit).
+
+``` c
+  // Variants are also available for: _u8 and _mf8.
+  svint8_t svluti6_zt_s8(uint64_t zt0, svuint8_t zn) __arm_streaming __arm_in("zt0");
+```
+
 # M-profile Vector Extension (MVE) intrinsics
 
 The M-profile Vector Extension (MVE) [[MVE-spec]](#MVE-spec) instructions provide packed Single
@@ -14915,9 +15147,9 @@ Similarly to C's `memset`, this intrinsic returns the `tagged_address` pointer.
 
 # Architectural Extension Bridges
 
-## NEON-SVE Bridge
+## Neon-SVE Bridge
 
-The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
+The NEON_SVE Bridge adds intrinsics that allow conversions between Neon and
 SVE vectors.
 
 The [`<arm_neon_sve_bridge.h>`](#arm_neon_sve_bridge.h) header should be
@@ -14939,7 +15171,7 @@ SVE vector.  Using `svld1` to load elements would instead put the
 first memory element in lane 0 of the returned SVE vector.
 
 When `svundef` is passed as the `vec` parameter, compilers are able
-to reuse the SVE register overlapping the NEON input without generating
+to reuse the SVE register overlapping the Neon input without generating
 additional instructions.
 
 | **Instances**                                                            |
@@ -14961,7 +15193,7 @@ additional instructions.
 ### `svget_neonq`
 
 These intrinsics get the first 128 bit subvector of SVE vector `vec` as a
-NEON vector.
+Neon vector.
 
 | **Instances**                                       |
 |-----------------------------------------------------|
@@ -14982,7 +15214,7 @@ NEON vector.
 ### `svdup_neonq`
 
 These intrinsics return an SVE vector with all SVE subvectors containing the
-duplicated NEON vector `vec`.
+duplicated Neon vector `vec`.
 
 | **Instances**                                       |
 |-----------------------------------------------------|
diff --git a/neon_intrinsics/advsimd.md b/neon_intrinsics/advsimd.md
index 630ca2c6..302b3168 100644
--- a/neon_intrinsics/advsimd.md
+++ b/neon_intrinsics/advsimd.md
@@ -12,7 +12,7 @@ toc: true
 ---
 
 <!--
-SPDX-FileCopyrightText: Copyright 2014-2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
+SPDX-FileCopyrightText: Copyright 2014-2026 Arm Limited and/or its affiliates <open-source-office@arm.com>
 SPDX-FileCopyrightText: Copyright 2021 Matt P. Dziubinski <matdzb@gmail.com>
 CC-BY-SA-4.0 AND Apache-Patent-License
 See LICENSE.md file for details
@@ -169,6 +169,11 @@ for more information about Arm’s trademarks.
   floating point conversion intrinsics from "Half Precision to 32-bit"
   and "Half Precision to 64-bit".
 
+### Changes for next release
+
+* Added support for FEAT_F16F32DOT
+* Added support for FEAT_F16F32MM and FEAT_F16MM
+
 <!---
 **** Do not remove! ****
 The line following this comment is necessary to generate custom geometry settings
@@ -6213,3 +6218,38 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``.
 |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|-------------------------------|-------------------|---------------------------|
 | <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f16_mf8_fpm" target="_blank">vmmlaq_f16_mf8_fpm</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; fpm_t fpm)</code> | `r -> Vd.8H`<br>`a -> Vn.16B`<br>`b -> Vm.16B` | `FMMLA Vd.8H, Vn.16B, Vm.16B` | `Vd.8H -> result` | `A64`                     |
 | <code>float32x4_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f32_mf8_fpm" target="_blank">vmmlaq_f32_mf8_fpm</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x4_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; fpm_t fpm)</code> | `r -> Vd.4S`<br>`a -> Vn.16B`<br>`b -> Vm.16B` | `FMMLA Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64`                     |
+
+## Half-precision dot product to single-precision instructions from Armv9.7-A. Requires the +f16f32dot architecture extension.
+
+### Vector arithmetic
+
+#### Dot product
+
+| Intrinsic                                                                                                                                                                                                                                                                                                                                        | Argument preparation                                             | AArch64 Instruction            | Result            | Supported architectures   |
+|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|--------------------------------|-------------------|---------------------------|
+| <code>float32x2_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vdot_f32_f16" target="_blank">vdot_f32_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x2_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x4_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x4_t b)</code>                                                           | `r -> Vd.2S`<br>`a -> Vn.4H`<br>`b -> Vm.4H`                     | `FDOT Vd.2S,Vn.4H,Vm.4H`       | `Vd.2S -> result` | `A64`                     |
+| <code>float32x4_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vdotq_f32_f16" target="_blank">vdotq_f32_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x4_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t b)</code>                                                         | `r -> Vd.4S`<br>`a -> Vn.8H`<br>`b -> Vm.8H`                     | `FDOT Vd.4S,Vn.8H,Vm.8H`       | `Vd.4S -> result` | `A64`                     |
+| <code>float32x2_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vdot_lane_f32_f16" target="_blank">vdot_lane_f32_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x2_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x4_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x4_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int lane)</code>     | `r -> Vd.2S`<br>`a -> Vn.4H`<br>`b -> Vm.4H`<br>`0 <= lane <= 1` | `FDOT Vd.2S,Vn.4H,Vm.2H[lane]` | `Vd.2S -> result` | `A64`                     |
+| <code>float32x4_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vdotq_laneq_f32_f16" target="_blank">vdotq_laneq_f32_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x4_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int lane)</code> | `r -> Vd.4S`<br>`a -> Vn.8H`<br>`b -> Vm.8H`<br>`0 <= lane <= 3` | `FDOT Vd.4S,Vn.8H,Vm.2H[lane]` | `Vd.4S -> result` | `A64`                     |
+| <code>float32x2_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vdot_laneq_f32_f16" target="_blank">vdot_laneq_f32_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x2_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x4_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int lane)</code>   | `r -> Vd.2S`<br>`a -> Vn.4H`<br>`b -> Vm.8H`<br>`0 <= lane <= 3` | `FDOT Vd.2S,Vn.4H,Vm.2H[lane]` | `Vd.2S -> result` | `A64`                     |
+| <code>float32x4_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vdotq_lane_f32_f16" target="_blank">vdotq_lane_f32_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x4_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x4_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int lane)</code>   | `r -> Vd.4S`<br>`a -> Vn.8H`<br>`b -> Vm.4H`<br>`0 <= lane <= 1` | `FDOT Vd.4S,Vn.8H,Vm.2H[lane]` | `Vd.4S -> result` | `A64`                     |
+
+## Half-precision matrix multiply accumulating to single-precision instruction from Armv9.7-A. Requires the +f16f32mm architecture extension.
+
+### Vector arithmetic
+
+#### Matrix multiply
+
+| Intrinsic                                                                                                                                                                                                                                                                                  | Argument preparation                         | AArch64 Instruction         | Result            | Supported architectures   |
+|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|-----------------------------|-------------------|---------------------------|
+| <code>float32x4_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f32_f16" target="_blank">vmmlaq_f32_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x4_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t b)</code> | `r -> Vd.4S`<br>`a -> Vn.8H`<br>`b -> Vm.8H` | `FMMLA Vd.4S, Vn.8H, Vm.8H` | `Vd.4S -> result` | `A64`                     |
+
+## Non-widening half-precision matrix multiply instruction. Requires the +f16mm architecture extension.
+
+### Vector arithmetic
+
+#### Matrix multiply
+
+| Intrinsic                                                                                                                                                                                                                                                                                  | Argument preparation                         | AArch64 Instruction         | Result            | Supported architectures   |
+|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|-----------------------------|-------------------|---------------------------|
+| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f16_f16" target="_blank">vmmlaq_f16_f16</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t b)</code> | `r -> Vd.8H`<br>`a -> Vn.8H`<br>`b -> Vm.8H` | `FMMLA Vd.8H, Vn.8H, Vm.8H` | `Vd.8H -> result` | `A64`                     |
diff --git a/neon_intrinsics/advsimd.template.md b/neon_intrinsics/advsimd.template.md
index 336378db..c76774ab 100644
--- a/neon_intrinsics/advsimd.template.md
+++ b/neon_intrinsics/advsimd.template.md
@@ -12,7 +12,7 @@ toc: true
 ---
 
 <!--
-SPDX-FileCopyrightText: Copyright 2014-2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
+SPDX-FileCopyrightText: Copyright 2014-2026 Arm Limited and/or its affiliates <open-source-office@arm.com>
 SPDX-FileCopyrightText: Copyright 2021 Matt P. Dziubinski <matdzb@gmail.com>
 CC-BY-SA-4.0 AND Apache-Patent-License
 See LICENSE.md file for details
@@ -169,6 +169,11 @@ for more information about Arm’s trademarks.
   floating point conversion intrinsics from "Half Precision to 32-bit"
   and "Half Precision to 64-bit".
 
+### Changes for next release
+
+* Added support for FEAT_F16F32DOT
+* Added support for FEAT_F16F32MM and FEAT_F16MM
+
 <!---
 **** Do not remove! ****
 The line following this comment is necessary to generate custom geometry settings
diff --git a/tools/intrinsic_db/advsimd.csv b/tools/intrinsic_db/advsimd.csv
index 5922fcad..6de70af8 100644
--- a/tools/intrinsic_db/advsimd.csv
+++ b/tools/intrinsic_db/advsimd.csv
@@ -1,4 +1,4 @@
-<COMMENT>	SPDX-FileCopyrightText: Copyright 2014-2025 Arm Limited <open-source-office@arm.com>
+<COMMENT>	SPDX-FileCopyrightText: Copyright 2014-2026 Arm Limited <open-source-office@arm.com>
 <COMMENT>	SPDX-FileCopyrightText: Copyright 2021 Matt P. Dziubinski <matdzb@gmail.com>
 <COMMENT>	SPDX-License-Identifier: Apache-2.0
 <COMMENT>	
@@ -4834,3 +4834,18 @@ float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8
 <SECTION>	Matrix multiplication intrinsics from Armv9.6-A
 float16x8_t vmmlaq_f16_mf8_fpm(float16x8_t r, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm)	r -> Vd.8H;a -> Vn.16B;b -> Vm.16B	FMMLA Vd.8H, Vn.16B, Vm.16B	Vd.8H -> result	A64
 float32x4_t vmmlaq_f32_mf8_fpm(float32x4_t r, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm)	r -> Vd.4S;a -> Vn.16B;b -> Vm.16B	FMMLA Vd.4S, Vn.16B, Vm.16B	Vd.4S -> result	A64
+
+<SECTION>	Half-precision dot product to single-precision instructions from Armv9.7-A. Requires the +f16f32dot architecture extension.
+float32x2_t vdot_f32_f16(float32x2_t r, float16x4_t a, float16x4_t b)	r -> Vd.2S;a -> Vn.4H;b -> Vm.4H	FDOT Vd.2S,Vn.4H,Vm.4H	Vd.2S -> result	A64
+float32x4_t vdotq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b)	r -> Vd.4S;a -> Vn.8H;b -> Vm.8H	FDOT Vd.4S,Vn.8H,Vm.8H	Vd.4S -> result	A64
+
+float32x2_t vdot_lane_f32_f16(float32x2_t r, float16x4_t a, float16x4_t b, __builtin_constant_p(lane))	r -> Vd.2S;a -> Vn.4H;b -> Vm.4H;0 <= lane <= 1	FDOT Vd.2S,Vn.4H,Vm.2H[lane]	Vd.2S -> result	A64
+float32x4_t vdotq_laneq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b, __builtin_constant_p(lane))	r -> Vd.4S;a -> Vn.8H;b -> Vm.8H;0 <= lane <= 3	FDOT Vd.4S,Vn.8H,Vm.2H[lane]	Vd.4S -> result	A64
+float32x2_t vdot_laneq_f32_f16(float32x2_t r, float16x4_t a, float16x8_t b, __builtin_constant_p(lane))	r -> Vd.2S;a -> Vn.4H;b -> Vm.8H;0 <= lane <= 3	FDOT Vd.2S,Vn.4H,Vm.2H[lane]	Vd.2S -> result	A64
+float32x4_t vdotq_lane_f32_f16(float32x4_t r, float16x8_t a, float16x4_t b, __builtin_constant_p(lane))	r -> Vd.4S;a -> Vn.8H;b -> Vm.4H;0 <= lane <= 1	FDOT Vd.4S,Vn.8H,Vm.2H[lane]	Vd.4S -> result	A64
+
+<SECTION>	Half-precision matrix multiply accumulating to single-precision instruction from Armv9.7-A. Requires the +f16f32mm architecture extension.
+float32x4_t vmmlaq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b)	r -> Vd.4S;a -> Vn.8H;b -> Vm.8H	FMMLA Vd.4S, Vn.8H, Vm.8H	Vd.4S -> result	A64
+
+<SECTION>	Non-widening half-precision matrix multiply instruction. Requires the +f16mm architecture extension.
+float16x8_t vmmlaq_f16_f16(float16x8_t r, float16x8_t a, float16x8_t b)	r -> Vd.8H;a -> Vn.8H;b -> Vm.8H	FMMLA Vd.8H, Vn.8H, Vm.8H	Vd.8H -> result	A64
diff --git a/tools/intrinsic_db/advsimd_classification.csv b/tools/intrinsic_db/advsimd_classification.csv
index 1b0f0da8..1c3de29c 100644
--- a/tools/intrinsic_db/advsimd_classification.csv
+++ b/tools/intrinsic_db/advsimd_classification.csv
@@ -1,4 +1,4 @@
-<COMMENT>	SPDX-FileCopyrightText: Copyright 2021, 2024 Arm Limited <open-source-office@arm.com>
+<COMMENT>	SPDX-FileCopyrightText: Copyright 2021-2026 Arm Limited <open-source-office@arm.com>
 <COMMENT>	SPDX-License-Identifier: Apache-2.0
 <COMMENT>	
 <COMMENT>	Licensed under the Apache License, Version 2.0 (the "License");
@@ -4719,3 +4719,11 @@ vmlallttq_lane_f32_mf8_fpm	Vector arithmetic|Multiply|Multiply-accumulate and wi
 vmlallttq_laneq_f32_mf8_fpm	Vector arithmetic|Multiply|Multiply-accumulate and widen
 vmmlaq_f16_mf8_fpm	Vector arithmetic|Matrix multiply
 vmmlaq_f32_mf8_fpm	Vector arithmetic|Matrix multiply
+vdot_f32_f16	Vector arithmetic|Dot product
+vdotq_f32_f16	Vector arithmetic|Dot product
+vdot_lane_f32_f16	Vector arithmetic|Dot product
+vdotq_laneq_f32_f16	Vector arithmetic|Dot product
+vdot_laneq_f32_f16	Vector arithmetic|Dot product
+vdotq_lane_f32_f16	Vector arithmetic|Dot product
+vmmlaq_f32_f16	Vector arithmetic|Matrix multiply
+vmmlaq_f16_f16	Vector arithmetic|Matrix multiply