ArmDeveloperEcosystem · pareenaverma · Jul 2, 2026 · Jul 1, 2026 · Jul 1, 2026 · Jul 1, 2026
diff --git a/content/learning-paths/cross-platform/_example-learning-path/_index.md b/content/learning-paths/cross-platform/_example-learning-path/_index.md
@@ -18,7 +18,7 @@ prerequisites:
 
 author: Zach Lasiuk
 
-generate_summary_faq: true
+generate_summary_faq: false
 rerun_summary: false
 rerun_faqs: false
 

diff --git a/content/learning-paths/servers-and-cloud-computing/benchmark-nlp/_index.md b/content/learning-paths/servers-and-cloud-computing/benchmark-nlp/_index.md
@@ -14,9 +14,55 @@ learning_objectives:
 prerequisites:
     - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server.
 
+# START generated_summary_faq
+generated_summary_faq:
+  template_version: summary-faq-v3
+  generated_at: '2026-06-30T21:36:51Z'
+  generator: ai
+  ai_assisted: true
+  ai_review_required: true
+  model: gpt-5
+  prompt_template: summary-faq-v3
+  source_hash: a4cf1d9161b3a32e29694415762eda419752e1c3144662d5e131b6553f0a58e3
+  summary_generated_at: '2026-06-30T21:36:51Z'
+  summary_source_hash: a4cf1d9161b3a32e29694415762eda419752e1c3144662d5e131b6553f0a58e3
+  faq_generated_at: '2026-06-30T21:36:51Z'
+  faq_source_hash: a4cf1d9161b3a32e29694415762eda419752e1c3144662d5e131b6553f0a58e3
+  summary: >-
+    You'll deploy Hugging Face Sentiment Analysis models
+    with PyTorch on Arm servers and measure how they perform. Starting from a working Ubuntu
+    22.04 Arm environment, you'll run three NLP models with the Sentiment Analysis
+    pipeline, record baseline results, and enable BFloat16 fast math kernels to assess the
+    impact on inference performance. By
+    the end, you'll compare before-and-after measurements to confirm the effect of BFloat16
+    on this workload.
+  faqs:
+  - question: Which environment do the instructions assume?
+    answer: >-
+      The instructions target an Arm server running Ubuntu 22.04 LTS. They've been tested on
+      AWS Graviton3 (c7g) instances.
+  - question: What system resources should I provision before running the steps?
+    answer: >-
+      Use an Arm server instance with at least four CPU cores and 8 GB of RAM. This capacity supports
+      running the three sentiment analysis models and collecting measurements.
+  - question: Which framework and model source will I use in this Learning Path?
+    answer: >-
+      You'll use PyTorch to run NLP Sentiment Analysis models sourced from Hugging Face.
+  - question: How should I measure the performance uplift from BFloat16 fast math kernels?
+    answer: >-
+      First, run the models with the Sentiment Analysis pipeline to collect a baseline.
+      Then enable BFloat16 fast math kernels on supported Arm Neoverse-based AWS Graviton3 processors,
+      rerun the same workload, and compare the measurements.
+  - question: Which models will I evaluate and what should I have at the end?
+    answer: >-
+      You'll evaluate three NLP models with the Sentiment Analysis pipeline. By the end, you
+      should have deployed the models on your Arm server and recorded baseline and BFloat16-enabled
+      performance results for comparison.
+# END generated_summary_faq
+
 author: Pareena Verma
 
-generate_summary_faq: true
+generate_summary_faq: false
 rerun_summary: false
 rerun_faqs: false
 

diff --git a/content/learning-paths/servers-and-cloud-computing/bitmap_scan_sve2/_index.md b/content/learning-paths/servers-and-cloud-computing/bitmap_scan_sve2/_index.md
@@ -13,12 +13,59 @@ learning_objectives:
   - Measure performance improvements on Graviton4 instances
 
 prerequisites:
-- An [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from an appropriate
-  cloud service provider.
+- An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from an appropriate cloud service provider.
+
+# START generated_summary_faq
+generated_summary_faq:
+  template_version: summary-faq-v3
+  generated_at: '2026-06-30T21:37:19Z'
+  generator: ai
+  ai_assisted: true
+  ai_review_required: true
+  model: gpt-5
+  prompt_template: summary-faq-v3
+  source_hash: 1701b37580fe5d012a5e6fd322307742656a748dfb766fd48914011167386e95
+  summary_generated_at: '2026-06-30T21:37:19Z'
+  summary_source_hash: 1701b37580fe5d012a5e6fd322307742656a748dfb766fd48914011167386e95
+  faq_generated_at: '2026-06-30T21:37:19Z'
+  faq_source_hash: 1701b37580fe5d012a5e6fd322307742656a748dfb766fd48914011167386e95
+  summary: >-
+    You'll implement and benchmark bitmap scanning for database-style
+    workloads on Arm Neoverse V2–based servers, such as AWS Graviton4. First, you'll build a compact
+    bit vector in C and add baseline and optimized scalar scanning routines. Then, you'll implement Neon
+    and SVE vectorized versions that process data in wider chunks. A benchmarking harness measures
+    each approach so that you can compare the scalar, Neon, and SVE implementations on an Arm-based
+    Linux instance. By the end, you'll run a single C program that exercises all variants and
+    produces timing results for side-by-side evaluation.
+  faqs:
+  - question: Where should I place the code as I follow the steps?
+    answer: >-
+      Use a single source file named `bitvector_scan_benchmark.c`. Add the bit vector type, helper
+      functions, scalar scan routines, Neon and SVE implementations, and the benchmarking code
+      to this file as each step directs.
+  - question: What must the bitmap data structure contain before I can add the scan functions?
+    answer: >-
+      The data structure must include a byte array that holds the bits, the physical size in bytes, and the logical
+      size in bits. You also add helpers to the same file to generate and analyze test bitmaps.
+  - question: In what order should I implement and test the scanning approaches?
+    answer: >-
+      Start with the per-bit scalar baseline, then the optimized scalar version, followed by the
+      Neon implementation, and finally SVE. After each addition, run the benchmark to compare
+      against the previous versions.
+  - question: What result should I expect from the benchmarking step?
+    answer: >-
+      The framework measures elapsed time for each scan function over a chosen number of iterations
+      and tracks how many set-bit positions were found. Use the same input bitmap and iteration
+      count when comparing implementations.
+  - question: How can I exercise different workload characteristics when benchmarking?
+    answer: >-
+      Use the provided bitmap generation helpers to create datasets with varying densities. Sparse
+      and dense bitmaps highlight different behaviors across the scalar, Neon, and SVE implementations.
+# END generated_summary_faq
 
 author: Pareena Verma
 
-generate_summary_faq: true
+generate_summary_faq: false
 rerun_summary: false
 rerun_faqs: false
 

diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-demo/_index.md b/content/learning-paths/servers-and-cloud-computing/bolt-demo/_index.md
@@ -20,9 +20,59 @@ prerequisites:
     - GCC version 13.3 or later to compile the example program ([GCC](/install-guides/gcc/) )
     - A system with with sufficient hardware performance counters to use the [TopDown](/install-guides/topdown-tool) methodology. This typically requires running on bare metal rather than a virtualized environment.
 
+# START generated_summary_faq
+generated_summary_faq:
+  template_version: summary-faq-v3
+  generated_at: '2026-06-30T21:37:54Z'
+  generator: ai
+  ai_assisted: true
+  ai_review_required: true
+  model: gpt-5
+  prompt_template: summary-faq-v3
+  source_hash: 8654b656131bf1d529e11d85f874f7a81c01f7207340d2a606b6fd2d80bfad04
+  summary_generated_at: '2026-06-30T21:37:54Z'
+  summary_source_hash: 8654b656131bf1d529e11d85f874f7a81c01f7207340d2a606b6fd2d80bfad04
+  faq_generated_at: '2026-06-30T21:37:54Z'
+  faq_source_hash: 8654b656131bf1d529e11d85f874f7a81c01f7207340d2a606b6fd2d80bfad04
+  summary: >-
+    You'll apply LLVM BOLT post-link optimization to AArch64 binaries
+    using profile-guided code layout. The example uses a deliberately inefficient BubbleSort workload
+    to make instruction locality issues visible. You'll install a suitable BOLT
+    release, set up a working directory, and gather profiles with BRBE, SPE, instrumentation,
+    or PMU sampling. Using a small set of Arm TopDown indicators, you'll judge
+    whether a program is front-end bound and a good candidate for BOLT. You'll then run BOLT with
+    collected profiles to reorganize code layout and evaluate the impact using performance
+    metrics and profiling data to confirm improvements in instruction delivery and locality.
+  faqs:
+  - question: Which BOLT version should I use if my package manager installs an older release?
+    answer: >-
+      Use LLVM BOLT 22.1.0 or later. If your distribution provides an older version, install a
+      prebuilt LLVM release instead (for example, LLVM 22.1.5) that provides the required features.
+  - question: Where do the example's build and profiling outputs go?
+    answer: >-
+      You'll find outputs in three directories: `out` for binaries, `prof` for profile data,
+      and `heatmap` for visualization artifacts. Keeping these separate makes it easier to rerun
+      steps and compare results.
+  - question: How do I know if my program is a good candidate for BOLT?
+    answer: >-
+      Check a small set of Arm TopDown indicators related to instruction delivery and code locality.
+      Programs that appear front-end bound, with inefficient instruction fetch and poor locality,
+      are strong candidates for code layout optimization with BOLT.
+  - question: What should I use if my kernel doesn't meet the BRBE or SPE requirements?
+    answer: >-
+      If your kernel doesn't meet the BRBE version requirement, use SPE if the kernel meets the SPE
+      version requirement. If neither is available, you can use instrumentation or PMU
+      event sampling to collect profiles.
+  - question: What result should I expect after running BOLT with profiles?
+    answer: >-
+      You should be able to evaluate changes using performance metrics and profiling data. Look
+      for improvements in instruction delivery indicators and evidence of better code locality
+      in the optimized binary.
+# END generated_summary_faq
+
 author: Paschalis Mpeis
 
-generate_summary_faq: true
+generate_summary_faq: false
 rerun_summary: false
 rerun_faqs: false
 

diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md
@@ -16,9 +16,59 @@ learning_objectives:
 prerequisites:
   - An Arm-based Linux system with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed
 
+# START generated_summary_faq
+generated_summary_faq:
+  template_version: summary-faq-v3
+  generated_at: '2026-06-30T21:38:23Z'
+  generator: ai
+  ai_assisted: true
+  ai_review_required: true
+  model: gpt-5
+  prompt_template: summary-faq-v3
+  source_hash: 84a8b96fe7df302e0a2a6e4645bbb6170b45a3e0b55e0ea3682ec47663d34819
+  summary_generated_at: '2026-06-30T21:38:23Z'
+  summary_source_hash: 84a8b96fe7df302e0a2a6e4645bbb6170b45a3e0b55e0ea3682ec47663d34819
+  faq_generated_at: '2026-06-30T21:38:23Z'
+  faq_source_hash: 84a8b96fe7df302e0a2a6e4645bbb6170b45a3e0b55e0ea3682ec47663d34819
+  summary: >-
+    You'll use BOLT with Linux Perf profiles to optimize an Arm application
+    and its shared libraries. First, you'll instrument a MySQL server build to generate workload-specific
+    profiles, create separate traces for read-heavy and write-heavy runs, and merge them to broaden
+    code layout guidance. Then, you'll rebuild OpenSSL to make `libssl.so` and
+    `libcrypto.so` suitable for BOLT, collect profiles, and apply optimizations independently
+    from the main binary. Finally, you'll compare results across baseline, isolated, and merged
+    scenarios using a consistent Sysbench configuration to assess the
+    impact of application- and library-level optimizations on throughput and latency.
+  faqs:
+  - question: What output should I expect after running an instrumented workload with BOLT?
+    answer: >-
+      BOLT produces a profile file in `.fdata` format, such as `profile-writeonly.fdata`. These files
+      are later used to optimize the binary and can be merged to improve coverage.
+  - question: Should I reuse the BOLT-instrumented mysqld binary for additional workloads or create
+      a new one?
+    answer: >-
+      Either approach works. The steps allow reusing the previously instrumented binary or generating
+      a new instrumented variant as long as you produce a new `.fdata` profile for each workload.
+  - question: Which shared libraries are targeted for optimization, and what if the system copies
+      are stripped?
+    answer: >-
+      This Learning Path optimizes `libssl.so` and `libcrypto.so`. If the system libraries are stripped, rebuild
+      OpenSSL from source with relocations enabled so BOLT can instrument and optimize them.
+  - question: Do I need to rebuild the application to benefit from optimized shared libraries?
+    answer: >-
+      No. The shared libraries are optimized independently of the application binary. The Learning Path
+      rebuilds OpenSSL to provide symbol information and then integrates the optimized libraries
+      with the application.
+  - question: What test configuration is used to compare baseline and BOLT-optimized results?
+    answer: >-
+      Sysbench runs with `--time=0 --events=10000`, which removes the time limit and stops after
+      exactly 10,000 events in total. Use this consistent configuration to compare baseline,
+      application-only, and merged-with-library optimization scenarios.
+# END generated_summary_faq
+
 author: Gayathri Narayana Yegna Narayanan
 
-generate_summary_faq: true
+generate_summary_faq: false
 rerun_summary: false
 rerun_faqs: false
 

diff --git a/content/learning-paths/servers-and-cloud-computing/bolt/_index.md b/content/learning-paths/servers-and-cloud-computing/bolt/_index.md
@@ -1,5 +1,5 @@
 ---
-title: Learn how to optimize an application with BOLT
+title: Optimize an application with BOLT
 description: Learn how to build, profile, and optimize Arm executables using BOLT post-link binary optimization to improve application performance through code layout improvements.
 
 minutes_to_complete: 30
@@ -15,9 +15,55 @@ prerequisites:
     - An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. The Linux kernel should be version 5.15 or later. Earlier kernel versions can be used, but some Linux Perf features may be limited or not available. For [SPE](./bolt-spe) the version should be 6.14 or later.
     - (Optional) A second, more powerful Linux system to build the software executable and run BOLT.
 
+# START generated_summary_faq
+generated_summary_faq:
+  template_version: summary-faq-v3
+  generated_at: '2026-06-30T21:38:54Z'
+  generator: ai
+  ai_assisted: true
+  ai_review_required: true
+  model: gpt-5
+  prompt_template: summary-faq-v3
+  source_hash: 2e9ac8a3c73b7d3d59fe6ba20fb6d61fc2b7e5e9320aaadc20af0a8bbb3ff959
+  summary_generated_at: '2026-06-30T21:38:54Z'
+  summary_source_hash: 2e9ac8a3c73b7d3d59fe6ba20fb6d61fc2b7e5e9320aaadc20af0a8bbb3ff959
+  faq_generated_at: '2026-06-30T21:38:54Z'
+  faq_source_hash: 2e9ac8a3c73b7d3d59fe6ba20fb6d61fc2b7e5e9320aaadc20af0a8bbb3ff959
+  summary: >-
+    You'll use BOLT to post-link optimize an Arm Linux executable
+    based on real execution profiles. First, you'll prepare a target system for profiling and, optionally,
+    a separate build/BOLT system, then choose a profiling method (Perf samples, ETM, or SPE) to
+    collect runtime behavior into a `perf.data` file. You'll convert the profile for BOLT and run BOLT to reorder the code layout and emit a new optimized executable. Finally, you'll compare the resulting binary against
+    the original to measure the improvement.
+  faqs:
+  - question: How should I choose between Perf samples, ETM, and SPE for profiling?
+    answer: >-
+      Each method has its own section. Perf samples provide general sampling data,
+      while ETM and SPE record richer branch information. Choose the method that your system
+      supports and that provides the level of profiling detail you need.
+  - question: Can I profile on one Arm Linux system and run BOLT on another?
+    answer: >-
+      Yes. The target system runs the application and collects the profile, and a separate Linux
+      system can build the application and run BOLT. Transfer the executable and the collected
+      profile files between systems as needed.
+  - question: What file should exist after recording with Perf before converting for BOLT?
+    answer: >-
+      Expect a `perf.data` file. Perf prints sample counts or data size when recording completes,
+      which indicates that profiling output was captured and is ready for conversion.
+  - question: What version of Perf do I need for the SPE workflow?
+    answer: >-
+      Use Linux Perf version 6.14 or later for SPE to capture the required branch stack information.
+      Verify the version before recording so the profile contains all needed fields.
+  - question: How do I check results after BOLT creates the optimized executable?
+    answer: >-
+      Run the same workload with the original and optimized executables and compare the
+      results, such as execution time. The optimized executable should perform better
+      than the original.
+# END generated_summary_faq
+
 author: Jonathan Davies
 
-generate_summary_faq: true
+generate_summary_faq: false
 rerun_summary: false
 rerun_faqs: false
 

diff --git a/content/learning-paths/servers-and-cloud-computing/buildkite-gcp/_index.md b/content/learning-paths/servers-and-cloud-computing/buildkite-gcp/_index.md
@@ -19,9 +19,57 @@ prerequisites:
   - Familiarity with [Docker](https://docs.docker.com/get-started/) and container concepts
   - A [GitHub account](https://github.com/join) to host your application repository
 
+# START generated_summary_faq
+generated_summary_faq:
+  template_version: summary-faq-v3
+  generated_at: '2026-06-30T21:39:35Z'
+  generator: ai
+  ai_assisted: true
+  ai_review_required: true
+  model: gpt-5
+  prompt_template: summary-faq-v3
+  source_hash: 4bda206717eef380430009f859826d9bcf820442d13492cd3c22a114561e2917
+  summary_generated_at: '2026-06-30T21:39:35Z'
+  summary_source_hash: 4bda206717eef380430009f859826d9bcf820442d13492cd3c22a114561e2917
+  faq_generated_at: '2026-06-30T21:39:35Z'
+  faq_source_hash: 4bda206717eef380430009f859826d9bcf820442d13492cd3c22a114561e2917
+  summary: >-
+    You'll provision an Arm-based Google Cloud C4A virtual machine powered by Google
+    Axion and running Ubuntu or SUSE, install Docker, Docker Buildx, and the Buildkite agent,
+    connect the agent to a Buildkite queue, and confirm that the agent is online. Next, you'll create
+    a small Flask application and Dockerfile in a GitHub repository, then configure a Buildkite
+    pipeline that uses Buildx to build a multi-architecture container image and push it to
+    Docker Hub. By the end, you'll have a published image and a running Flask service that
+    confirms the build.
+  faqs:
+  - question: Which Google Cloud instance type and OS should I use for the VM?
+    answer: >-
+      Use a Google Axion C4A Arm VM, specifically `c4a-standard-4` with 4 vCPUs and
+      16 GB of memory. You can select either Ubuntu or SUSE Linux Enterprise Server as the OS.
+  - question: Where do I create the Buildkite agent token, and when do I use it?
+    answer: >-
+      Create an agent token in your Buildkite organization after signing in (GitHub sign-in is
+      supported). You use this token during the agent installation and configuration on the C4A
+      VM.
+  - question: How do I confirm the Buildkite agent is connected and assigned to the right queue?
+    answer: >-
+      After configuring the agent and queue, check the Agents page in Buildkite; the agent should
+      appear online with the expected queue. If it doesn't, check the agent configuration and
+      queue name, then repeat the verification step.
+  - question: What files should my GitHub repository contain for the example application?
+    answer: >-
+      Add a Dockerfile and a Python file named `app.py`. The provided Dockerfile uses `python:3.12-slim`,
+      installs Flask, exposes port 5000, and runs the app.
+  - question: What result should I expect after the pipeline runs successfully?
+    answer: >-
+      A multi-architecture Docker image for Arm and x86 is built with Docker Buildx and pushed
+      to Docker Hub. You then start the containerized Flask application and verify that it runs
+      as the final validation step.
+# END generated_summary_faq
+
 author: Jason Andrews
 
-generate_summary_faq: true
+generate_summary_faq: false
 rerun_summary: false
 rerun_faqs: false