@@ -0,0 +1,99 @@
---
title: AI Plank Tutor
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## What you will build

In this Learning Path, you will build a simple on-device AI fitness tutor for Android.

The app watches a learner hold a plank, compares their body position with a stored instructor reference, asks a local LLM for one short correction, and speaks the correction using Android text-to-speech.

This project is based on the [AI Yoga Tutor](https://developer.arm.com/community/arm-community-blogs/b/ai-blog/posts/ai-yoga-tutor) demo. This Learning Path keeps the same core pipeline but narrows the app to one static pose, so you can focus on how the Android camera, pose detector, local LLM, and speech output fit together.

![AI Plank Tutor final UI alt-text#center](screenshot.jpg "Figure 1: AI Plank Tutor showing the instructor plank image, live camera view, score, and spoken correction caption.")

The finished app has two main visual areas:

- An instructor plank image on the left.
- A live front-camera preview on the right.

The app overlays a pose score and a short caption that matches the spoken coaching feedback.

This Learning Path starts from a "shell" project in which the MediaPipe and camera integration is mostly set up. To learn how to build that setup from an empty project, try another Learning Path such as [Build a Hands-Free Selfie Android Application with MediaPipe](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/build-android-selfie-app-using-mediapipe-multimodality/).

## App pipeline

The app uses a small pipeline of on-device components:

```text
reference image and camera view
-> CameraX live frames
-> Pose landmarks
-> joint-angle scoring
-> compact text prompt
-> Arm AI Chat + LLM
-> Text-To-Speech
```

Each stage passes structured data to a subsequent stage. The LLM does not receive camera frames or images. It receives a short text prompt describing the largest joint-angle differences between the learner and the reference plank pose.

This keeps the LLM prompt small, reduces latency, and makes the behavior easier to tune.
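
As a rough illustration of the "compact text prompt" stage, the sketch below turns a list of joint-angle differences into one short prompt that mentions only the largest deviations. The `JointAngleDiff` type, the `buildCorrectionPrompt()` function, and the prompt wording are assumptions made for this example and are not taken from the starter project.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Hypothetical data shape: one joint's angle for the learner and the reference, in degrees.
data class JointAngleDiff(val joint: String, val learnerDeg: Double, val referenceDeg: Double) {
    val deltaDeg: Double get() = learnerDeg - referenceDeg
}

// Builds a short text prompt that mentions only the largest angle differences.
// The wording of the prompt is illustrative, not the prompt used by the app.
fun buildCorrectionPrompt(diffs: List<JointAngleDiff>, maxJoints: Int = 2): String {
    val worst = diffs.sortedByDescending { abs(it.deltaDeg) }.take(maxJoints)
    val details = worst.joinToString("; ") { d ->
        val direction = if (d.deltaDeg > 0) "more open" else "more closed"
        "${d.joint} is ${abs(d.deltaDeg).roundToInt()} degrees $direction than the reference"
    }
    return "The learner is holding a plank. $details. Give one short correction."
}
```

Limiting the prompt to a couple of joints is one way to keep it short; the number of joints and the phrasing are tuning choices.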

## Clone the starter project

Clone the Learning Path code examples repository:

```console
git clone https://gitlab.arm.com/learning-code-examples/code-examples.git
```

The starter app for this Learning Path is in:

```text
code-examples/learning-paths/mobile-graphics-and-gaming/ai-plank-tutor/android
```

{{% notice Note %}}
The starter project contains the app structure, layout, image asset, MediaPipe pose model, and several Kotlin shell files. You will fill in the missing code over the next pages.
{{% /notice %}}

## Open the project in Android Studio

1. Start Android Studio.
2. Select **Open**.
3. Open `code-examples/learning-paths/mobile-graphics-and-gaming/ai-plank-tutor/android`.
4. Wait for Gradle sync to finish.

If Android Studio prompts you to trust the project, accept the prompt.

The starter app is intentionally incomplete, but it should sync successfully before you add code.

## Inspect the provided files

Start by looking at the files that are already provided for you.

Open `app/build.gradle` and confirm that the Android, CameraX, lifecycle, and MediaPipe dependencies are already present.
Arm's AI Chat dependency is not included yet. You will add it later, when you implement local LLM inference.

Open `app/src/main/AndroidManifest.xml` and confirm that the app requests camera access:

```xml
<uses-permission android:name="android.permission.CAMERA" />
```

Open `app/src/main/res/layout/activity_main.xml` and review the main UI. The layout already contains:

- An `ImageView` for the instructor plank image.
- A `PreviewView` for the live camera.
- A score label.
- A caption label for spoken feedback.

Open `app/src/main/res/drawable/plank.jpg` and review the instructor reference image.

The Kotlin source code is under `app/src/main/java/com/arm/demo/AIPlankTutor`. In that directory, open `data/PlankPoseData.kt` and note the hard-coded plank reference data. This file contains the instructor's reference landmarks and the angle weights used by the scoring step. The data was generated offline from the reference plank image, so the app does not need to process the reference image at runtime.
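
To give a feel for how reference angles and weights could feed a score, here is a minimal sketch. It is not the scoring code from this Learning Path: the `ReferenceAngle` type, the `weightedPoseScore()` function, the 45-degree tolerance, and the 0 to 100 scale are all assumptions chosen for illustration.

```kotlin
import kotlin.math.abs
import kotlin.math.min
import kotlin.math.roundToInt

// Hypothetical reference entry: a target angle and an importance weight for one joint.
data class ReferenceAngle(val joint: String, val targetDeg: Double, val weight: Double)

// Returns a 0..100 score. Each joint contributes an error between 0 and 1,
// scaled by its weight. A joint missing from the learner data counts as a full error.
fun weightedPoseScore(
    reference: List<ReferenceAngle>,
    learnerDeg: Map<String, Double>,
    toleranceDeg: Double = 45.0 // assumed: deviations at or beyond this count as fully wrong
): Int {
    val totalWeight = reference.sumOf { it.weight }
    if (totalWeight <= 0.0) return 0
    val weightedError = reference.sumOf { ref ->
        val measured = learnerDeg[ref.joint]
        val error = if (measured == null) 1.0
                    else min(abs(measured - ref.targetDeg) / toleranceDeg, 1.0)
        ref.weight * error
    }
    return (100.0 * (1.0 - weightedError / totalWeight)).roundToInt()
}
```

Weights let joints that matter most for a plank, such as the hips, influence the score more than joints that matter less.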

The app's entry point is `MainActivity.kt`, which you will work on in the next section.
@@ -0,0 +1,240 @@
---
title: Get pose landmarks from camera
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Objective

In this section, you will connect the Android camera to MediaPipe Pose Landmarker.

You will:

- Bind a CameraX preview to the app UI.
- Add an `ImageAnalysis` use case for live camera frames.
- Configure MediaPipe Pose Landmarker in live-stream mode.
- Convert each CameraX `ImageProxy` into a MediaPipe `MPImage`.
- Send the first detected pose landmark list to `MainViewModel`.

At the end of this section, the app opens the front camera and passes live pose landmarks into the app. The score will still be incomplete until you add pose scoring in the next section.

## Configure CameraX

Open `ui/MainActivity.kt`.

The starter project already requests camera permission and calls `setUpCamera()` from `onCreate()`. Replace the TODO in `setUpCamera()` with the following code:

```kotlin
private fun setUpCamera() {
    val cameraProviderFuture = ProcessCameraProvider.getInstance(this)
    cameraProviderFuture.addListener(
        {
            cameraProvider = cameraProviderFuture.get()
            bindCameraUseCases()
        }, Dispatchers.Main.asExecutor()
    )
}
```

`ProcessCameraProvider` owns the camera use cases for the activity. When the provider is ready, the app stores it and calls `bindCameraUseCases()`.

## Bind preview and image analysis

The app needs two CameraX use cases:

- `Preview`, which displays the camera feed in the `PreviewView`.
- `ImageAnalysis`, which receives frames for pose detection.

In `MainActivity.kt`, replace the TODO at the end of `bindCameraUseCases()` with this code:

```kotlin
val preview = Preview.Builder()
    .setResolutionSelector(resolutionSelector)
    .setTargetRotation(targetRotation)
    .build()

val imageAnalyzer = ImageAnalysis.Builder()
    .setResolutionSelector(resolutionSelector)
    .setTargetRotation(targetRotation)
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
    .build()
    .also {
        it.setAnalyzer(
            Dispatchers.Default.limitedParallelism(1).asExecutor()
        ) { image -> detectPose(image) }
    }

cameraProvider.unbindAll()

try {
    cameraProvider.bindToLifecycle(
        this, cameraSelector, preview, imageAnalyzer
    )

    preview.surfaceProvider = cameraPreview.surfaceProvider
} catch (exc: Exception) {
    Log.e(TAG, "Use case binding failed", exc)
}
```

The analyzer uses `STRATEGY_KEEP_ONLY_LATEST` because pose detection should work on the most recent frame. If the device is busy, old frames are dropped instead of queued.
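
The keep-only-latest idea can be pictured as a single-slot mailbox in which a new frame overwrites any frame that has not been picked up yet. The sketch below is a plain Kotlin model of that behavior, not CameraX code; the `LatestOnlySlot` class is hypothetical.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.atomic.AtomicReference

// Hypothetical single-slot mailbox: a new item replaces any item not yet taken.
class LatestOnlySlot<T : Any> {
    private val slot = AtomicReference<T?>(null)
    private val dropped = AtomicInteger(0)

    // Number of items that were overwritten before a consumer took them.
    val droppedCount: Int get() = dropped.get()

    // Producer side: store the newest item and count the replaced one as dropped.
    fun offer(item: T) {
        if (slot.getAndSet(item) != null) dropped.incrementAndGet()
    }

    // Consumer side: take the newest item, or null if nothing is waiting.
    fun take(): T? = slot.getAndSet(null)
}
```

If three frames arrive before the consumer runs, `take()` returns only the third and the first two are counted as dropped, so the consumer always works on the freshest data.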

The output image format is `RGBA_8888`, which makes the frame data easy to copy into a `Bitmap` before passing it to MediaPipe.
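
One detail to be aware of when copying RGBA frames: an image plane reports a row stride, and on some devices that stride may be larger than `width * 4` because each row is padded. The sketch below shows, in plain Kotlin and under that assumption, how padded rows could be packed into a tight buffer before they are copied into a `Bitmap`. The `packRgbaRows()` helper is hypothetical and is not part of the starter project.

```kotlin
// Copies a possibly padded RGBA buffer into a tightly packed one (width * 4 bytes per row).
// rowStride is the number of bytes between the start of consecutive rows in src.
fun packRgbaRows(src: ByteArray, width: Int, height: Int, rowStride: Int): ByteArray {
    val rowBytes = width * 4
    require(width > 0 && height > 0) { "width and height must be positive" }
    require(rowStride >= rowBytes) { "rowStride must be at least width * 4" }
    require(src.size >= rowStride * (height - 1) + rowBytes) { "source buffer is too small" }

    // No padding: the data is already tightly packed.
    if (rowStride == rowBytes) return src.copyOf(rowBytes * height)

    val packed = ByteArray(rowBytes * height)
    for (row in 0 until height) {
        val start = row * rowStride
        src.copyInto(
            destination = packed,
            destinationOffset = row * rowBytes,
            startIndex = start,
            endIndex = start + rowBytes
        )
    }
    return packed
}
```

If the row stride on your device equals `width * 4`, this step is unnecessary and the direct copy is sufficient.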

## Send frames to the landmarker

In `bindCameraUseCases()`, you configured `ImageAnalysis` to call `detectPose()` for every analyzed frame. Now replace the TODO in `detectPose()` with this code:

```kotlin
private fun detectPose(imageProxy: ImageProxy) {
    if (!this::poseLandmarkerHelper.isInitialized || poseLandmarkerHelper.isClosed) {
        imageProxy.close()
        return
    }

    if (imageAnalysisEnabled) {
        poseLandmarkerHelper.detectLiveStream(
            imageProxy = imageProxy,
            isFrontCamera = true
        )
    } else {
        imageProxy.close()
    }
}
```

This code checks that the MediaPipe helper is ready before using it. If the helper is not ready, it closes the `ImageProxy` immediately.

{{% notice Note %}}
Every `ImageProxy` from CameraX must be closed. In this app, `PoseLandmarkerHelper.detectLiveStream()` closes the image after copying its pixels. If the frame is skipped, `detectPose()` closes it directly.
{{% /notice %}}
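
The "close on every path" rule can be modeled without any Android classes. In the sketch below, `FakeFrame` is a hypothetical stand-in for a camera frame and `handleFrame()` is an illustrative helper; the example uses `java.io.Closeable` so that Kotlin's `use` is available, in the same way it is for an `ImageProxy`.

```kotlin
import java.io.Closeable

// Hypothetical stand-in for a camera frame that records whether it was closed.
class FakeFrame(val id: Int) : Closeable {
    var closed = false
        private set

    override fun close() {
        closed = true
    }
}

// Processes a frame if the detector is ready and closes the frame on every path.
// Returns true if the frame was processed, false if it was skipped.
fun handleFrame(frame: FakeFrame, detectorReady: Boolean, process: (FakeFrame) -> Unit): Boolean {
    if (!detectorReady) {
        frame.close() // Skipped frame: release it immediately.
        return false
    }
    frame.use { process(it) } // Closed after processing, even if process() throws.
    return true
}
```

Using `use` instead of a manual `close()` call means the frame is still released when the processing code throws an exception.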

Now replace the TODO in `onResults()` with this code:

```kotlin
override fun onResults(landmarks: List<NormalizedLandmark>?) {
mainViewModel.handleUserPose(landmarks)
}
```

`onResults()` is a callback from `PoseLandmarkerHelper`. It passes the live landmarks on to the ViewModel. In the next section, you will convert those landmarks into joint angles and a pose score.
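
As a preview of what a joint angle means here, the following sketch computes the angle at a middle landmark from three 2D points, for example shoulder, elbow, and wrist. It is a generic geometric formula and may differ from the scoring code you add later; `Point2` and `jointAngleDegrees()` are names invented for this example.

```kotlin
import kotlin.math.acos
import kotlin.math.sqrt

// Hypothetical 2D point, for example the x and y of a normalized landmark.
data class Point2(val x: Double, val y: Double)

// Returns the angle at vertex b, in degrees (0..180), between the rays b->a and b->c.
fun jointAngleDegrees(a: Point2, b: Point2, c: Point2): Double {
    val v1x = a.x - b.x
    val v1y = a.y - b.y
    val v2x = c.x - b.x
    val v2y = c.y - b.y
    val norm = sqrt(v1x * v1x + v1y * v1y) * sqrt(v2x * v2x + v2y * v2y)
    if (norm == 0.0) return 0.0 // Degenerate case: two of the points coincide.
    // Clamp to guard against rounding slightly outside [-1, 1] before acos.
    val cosine = ((v1x * v2x + v1y * v2y) / norm).coerceIn(-1.0, 1.0)
    return Math.toDegrees(acos(cosine))
}
```

A fully straight limb gives an angle close to 180 degrees, and a right-angle bend gives about 90 degrees.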

## Configure MediaPipe Pose Landmarker

What happens between `detectPose()` and `onResults()`? Open `ui/landmarker/PoseLandmarkerHelper.kt`.

The starter file already contains the MediaPipe imports, model path, confidence values, and the `LandmarkerListener` interface for the callbacks to `MainActivity`.

Replace the TODO in `setupPoseLandmarker()` with this code:

```kotlin
fun setupPoseLandmarker(context: Context) {
    try {
        val baseOptions = BaseOptions.builder()
            .setDelegate(Delegate.GPU)
            .setModelAssetPath(MODEL_PATH)
            .build()

        val options = PoseLandmarker.PoseLandmarkerOptions.builder()
            .setBaseOptions(baseOptions)
            .setNumPoses(1)
            .setMinPoseDetectionConfidence(MIN_POSE_DETECTION_CONFIDENCE)
            .setMinTrackingConfidence(MIN_POSE_TRACKING_CONFIDENCE)
            .setMinPosePresenceConfidence(MIN_POSE_PRESENCE_CONFIDENCE)
            .setRunningMode(RunningMode.LIVE_STREAM)
            .setResultListener(this::returnLiveStreamResult)
            .setErrorListener(this::returnLiveStreamError)
            .build()

        poseLandmarker = PoseLandmarker.createFromOptions(context, options)
    } catch (exception: IllegalStateException) {
        listener.onError("Pose Landmarker failed to initialize. See logs for details.")
        Log.e(TAG, "MediaPipe failed to load the pose landmarker", exception)
    } catch (exception: RuntimeException) {
        listener.onError("Pose Landmarker failed to initialize. See logs for details.")
        Log.e(TAG, "MediaPipe failed to create the pose landmarker", exception)
    }
}
```

The app uses `RunningMode.LIVE_STREAM` because frames arrive continuously from the camera. In this mode, MediaPipe returns results through callbacks instead of returning them directly from the detection call.
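
The callback pattern can be sketched in plain Kotlin. The hypothetical `AsyncDetector` below is not the MediaPipe API: it only models a detection call that returns immediately and delivers each result later through a listener. It also rejects timestamps that do not increase, since live-stream processing expects frame timestamps to increase monotonically.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical model of a live-stream detector. Results arrive on a worker thread
// through onResult instead of being returned from detectAsync().
class AsyncDetector(private val onResult: (result: String, timestampMs: Long) -> Unit) {
    private val worker = Executors.newSingleThreadExecutor()
    private var lastTimestampMs = Long.MIN_VALUE

    // Returns immediately. Assumes calls come from a single thread.
    fun detectAsync(frame: String, timestampMs: Long) {
        require(timestampMs > lastTimestampMs) { "timestamps must increase" }
        lastTimestampMs = timestampMs
        worker.execute { onResult("pose for $frame", timestampMs) }
    }

    // Stops accepting work and waits briefly for queued results to be delivered.
    fun close() {
        worker.shutdown()
        worker.awaitTermination(1, TimeUnit.SECONDS)
    }
}
```

Because the result arrives on another thread, the listener is the only place where the caller learns the outcome, which is why the helper in this app forwards results through its own listener interface.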

The app also requests one pose with `setNumPoses(1)`. This keeps the example focused on a single learner.

## Convert camera frames to MPImage

`MainActivity` passes each frame as a CameraX `ImageProxy`, but MediaPipe expects an `MPImage`.

Replace the TODO in `detectLiveStream()` with this code, which converts between the two and starts detection on `poseLandmarker`:

```kotlin
fun detectLiveStream(
    imageProxy: ImageProxy,
    isFrontCamera: Boolean
) {
    val frameTime = SystemClock.uptimeMillis()
    val imageWidth = imageProxy.width
    val imageHeight = imageProxy.height
    val rotationDegrees = imageProxy.imageInfo.rotationDegrees

    val bitmapBuffer = Bitmap.createBitmap(
        imageWidth,
        imageHeight,
        Bitmap.Config.ARGB_8888
    )

    imageProxy.use { image ->
        bitmapBuffer.copyPixelsFromBuffer(image.planes[0].buffer)
    }

    val matrix = Matrix().apply {
        postRotate(rotationDegrees.toFloat())
        if (isFrontCamera) {
            postScale(-1f, 1f, imageWidth.toFloat(), imageHeight.toFloat())
        }
    }

    val rotatedBitmap = Bitmap.createBitmap(
        bitmapBuffer,
        0,
        0,
        bitmapBuffer.width,
        bitmapBuffer.height,
        matrix,
        true
    )

    val mpImage = BitmapImageBuilder(rotatedBitmap).build()
    poseLandmarker?.detectAsync(mpImage, frameTime)
}
```

The call to `imageProxy.use { ... }` closes the frame after its pixels are copied.

The matrix rotates the image using the camera frame metadata. For the front camera, it also mirrors the image so the detected pose matches the preview the learner sees.
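
To see what rotation and mirroring do to a point, the sketch below applies the same two operations to normalized coordinates in the range 0 to 1 instead of to a bitmap. It assumes clockwise rotation in steps of 90 degrees; `NormPoint`, `rotateClockwise()`, and `mirrorHorizontally()` are hypothetical helpers and are not part of the app.

```kotlin
// Hypothetical normalized point: x and y are fractions of the image width and height.
data class NormPoint(val x: Float, val y: Float)

// Rotates a normalized point clockwise by a multiple of 90 degrees,
// with y growing downward as in image coordinates.
fun rotateClockwise(p: NormPoint, degrees: Int): NormPoint =
    when (((degrees % 360) + 360) % 360) {
        0 -> p
        90 -> NormPoint(1f - p.y, p.x)
        180 -> NormPoint(1f - p.x, 1f - p.y)
        270 -> NormPoint(p.y, 1f - p.x)
        else -> throw IllegalArgumentException("degrees must be a multiple of 90")
    }

// Mirrors a normalized point left to right, as a selfie-style preview does.
fun mirrorHorizontally(p: NormPoint): NormPoint = NormPoint(1f - p.x, p.y)
```

For example, rotating the top-left corner `(0, 0)` clockwise by 90 degrees moves it to the top-right corner `(1, 0)`, and mirroring then moves it back to the left edge.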

## Return the first detected pose

Finally, replace the TODO in `returnLiveStreamResult()` with this code:

```kotlin
private fun returnLiveStreamResult(
    result: PoseLandmarkerResult,
    @Suppress("UNUSED_PARAMETER") input: MPImage
) {
    listener.onResults(result.landmarks().firstOrNull())
}
```

MediaPipe returns a list of detected poses. This app asks for one pose, so it forwards only the first landmark list.
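
Because `firstOrNull()` returns `null` when no pose is detected, downstream code has to handle a missing pose. The sketch below shows one defensive option in plain Kotlin: accept the first pose only if it has the expected number of landmarks. The `firstCompletePose()` helper is hypothetical, and the default of 33 reflects the number of landmarks the MediaPipe pose model outputs per pose.

```kotlin
// Returns the first pose if it has the expected number of landmarks, otherwise null.
// Generic over the landmark type so the sketch does not depend on MediaPipe classes.
fun <T> firstCompletePose(poses: List<List<T>>, expectedSize: Int = 33): List<T>? =
    poses.firstOrNull()?.takeIf { it.size == expectedSize }
```

A caller can then treat `null` as "no usable pose in this frame" and skip scoring for that frame.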

## Run the app

Build and run the app on your Android device.

When prompted, allow camera access. You should see the front camera preview on the right side of the app.

The score will not update yet. At this point, the app is collecting pose landmarks and passing them to `MainViewModel`; the scoring logic is added in the next section.