
STAC-24549: Handle PARTIAL snapshots in ES restore and add describe command #25

Merged
viliakov merged 1 commit into main from STAC-24549 on Apr 3, 2026

Conversation

@viliakov (Contributor) commented Apr 2, 2026

Summary

  • Fix JSON unmarshalling of the snapshot failures field (see the struct sketch after this list).
  • Warn users before restoring PARTIAL snapshots with an explicit confirmation prompt showing failed/total shard counts.
  • Add an --allow-partial flag, required alongside --yes, for non-interactive PARTIAL restore.
  • Pass partial: true to the ES restore API for PARTIAL snapshots to avoid "index wasn't fully snapshotted - cannot restore" errors.
  • Add an elasticsearch describe command to show full snapshot details as pretty JSON.
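
As context for the first bullet, here is a minimal sketch of what the unmarshalling fix plausibly looks like, assuming the field names visible in the describe output further down; the package name, type names, and surrounding response struct are illustrative and may differ from this repo's actual code:

```go
package restore

// SnapshotFailure mirrors one entry of the "failures" array in the ES
// snapshot response (see the describe output below). The field was
// previously declared as []string, so responses containing failure
// objects failed to unmarshal.
type SnapshotFailure struct {
	Index     string `json:"index"`
	IndexUUID string `json:"index_uuid"`
	ShardID   int    `json:"shard_id"`
	Reason    string `json:"reason"`
	NodeID    string `json:"node_id"`
	Status    string `json:"status"`
}

// Snapshot is a trimmed-down sketch of the response struct using the new type.
type Snapshot struct {
	Snapshot string            `json:"snapshot"`
	State    string            `json:"state"`
	Failures []SnapshotFailure `json:"failures"` // was []string
}
```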

STAC-24549: Handle PARTIAL snapshots in ES restore and add describe command

- Fix JSON unmarshalling of snapshot failures (was []string, now []SnapshotFailure)
- Warn users before restoring PARTIAL snapshots with explicit confirmation
- Add --allow-partial flag for non-interactive PARTIAL restore (required with --yes)
- Pass partial=true to ES restore API for PARTIAL snapshots to avoid "wasn't fully snapshotted" errors (see the sketch after this list)
- Add elasticsearch describe command to show snapshot details as pretty JSON
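
A hedged sketch of the restore trigger with partial=true, using only the standard library against the port-forwarded endpoint seen in the transcripts below; the function name, package layout, and error handling are illustrative rather than this repo's actual code:

```go
package restore

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// triggerRestore posts to the Elasticsearch restore API. For PARTIAL
// snapshots, "partial": true tells ES to restore the successfully
// snapshotted shards instead of rejecting the request with
// "index ... wasn't fully snapshotted - cannot restore".
func triggerRestore(baseURL, repo, snapshot string, partial bool) error {
	url := fmt.Sprintf("%s/_snapshot/%s/%s/_restore", baseURL, repo, snapshot)
	body := strings.NewReader(fmt.Sprintf(`{"partial": %t}`, partial))
	resp, err := http.Post(url, "application/json", body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("restore request failed: %s: %s", resp.Status, msg)
	}
	return nil
}
```
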
@viliakov (Contributor, Author) commented Apr 2, 2026

❯ go run main.go elasticsearch restore --namespace nightly-restore --latest
Setting up port-forward to suse-observability-elasticsearch-master-headless:9200 in namespace nightly-restore...
✅ Port-forward established on localhost:54995
Fetching latest snapshot from repository 'sts-backup'...
✅ Latest snapshot found: sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq

⚠️ Warning: WARNING: Snapshot 'sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq' is in PARTIAL state!
⚠️ Warning:   51 shard(s) failed out of 267 total (216 successful)
⚠️ Warning: Restoring this snapshot will result in incomplete data for the failed shards.

Do you want to continue? (yes/no): yes

⚠️ Warning: WARNING: Restoring from snapshot will DELETE all existing STS indices!
⚠️ Warning: This operation cannot be undone.

Snapshot to restore: sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq
Snapshot state: PARTIAL
Namespace: nightly-restore

Do you want to continue? (yes/no): yes

Scaling down deployments (selector: observability.suse.com/scalable-during-es-restore=true)...
✅ Scaled down 3 deployment(s):
  - suse-observability-e2es (replicas: 0 -> 0)
  - suse-observability-receiver-base (replicas: 0 -> 0)
  - suse-observability-receiver-logs (replicas: 0 -> 0)
✅ Scaled down 0 statefulsets(s):
Waiting for pods to terminate...
✅ All pods have terminated

Fetching current Elasticsearch indices...
Found 90 STS index(es) to delete
Rolling over datastream 'sts_k8s_logs'...
✅ Datastream rolled over successfully
Deleting 90 index(es)...
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008019
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008073
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008075
  Deleting index: sts_topology_events-2026.04.01
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008071
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008063
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008065
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008109
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008061
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008107
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008025
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008105
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008021
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008069
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008023
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008103
  Deleting index: sts_topology_events-2026.03.31
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008029
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008101
  Deleting index: sts_topology_events-2026.03.30
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008067
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008027
  Deleting index: sts_topology_events-2026.03.03
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008083
  Deleting index: .ds-sts_k8s_logs-2026.03.26-007997
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008085
  Deleting index: sts_topology_events-2026.03.06
  Deleting index: .ds-sts_k8s_logs-2026.03.26-007999
  Deleting index: sts_topology_events-2026.03.07
  Deleting index: sts_topology_events-2026.03.04
  Deleting index: sts_topology_events-2026.03.05
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008081
  Deleting index: .ds-sts_k8s_logs-2026.04.01-008111
  Deleting index: sts_topology_events-2026.03.08
  Deleting index: sts_topology_events-2026.03.09
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008077
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008079
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008099
  Deleting index: sts_topology_events-2026.03.13
  Deleting index: sts_topology_events-2026.03.14
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008097
  Deleting index: sts_topology_events-2026.03.11
  Deleting index: sts_topology_events-2026.03.12
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008095
  Deleting index: sts_topology_events-2026.03.17
  Deleting index: sts_topology_events-2026.03.18
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008093
  Deleting index: sts_topology_events-2026.03.15
  Deleting index: sts_topology_events-2026.03.16
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008091
  Deleting index: sts_topology_events-2026.03.19
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008043
  Deleting index: .ds-sts_k8s_logs-2026.04.02-008114
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008041
  Deleting index: .ds-sts_k8s_logs-2026.04.01-008113
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008049
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008001
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008045
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008003
  Deleting index: sts_topology_events-2026.03.10
  Deleting index: .ds-sts_k8s_logs-2026.03.30-008087
  Deleting index: .ds-sts_k8s_logs-2026.03.31-008089
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008047
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008005
  Deleting index: sts_topology_events-2026.03.24
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008007
  Deleting index: sts_topology_events-2026.03.25
  Deleting index: sts_topology_events-2026.03.22
  Deleting index: sts_topology_events-2026.03.23
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008009
  Deleting index: sts_topology_events-2026.03.28
  Deleting index: sts_topology_events-2026.03.29
  Deleting index: sts_topology_events-2026.03.26
  Deleting index: sts_topology_events-2026.03.27
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008053
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008055
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008031
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008051
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008011
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008035
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008033
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008013
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008015
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008039
  Deleting index: sts_topology_events-2026.03.20
  Deleting index: sts_topology_events-2026.03.21
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008057
  Deleting index: .ds-sts_k8s_logs-2026.03.28-008037
  Deleting index: .ds-sts_k8s_logs-2026.03.27-008017
  Deleting index: .ds-sts_k8s_logs-2026.03.29-008059
✅ All indices deleted successfully

Triggering restore for snapshot: sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq
✅ Restore triggered successfully
Checking restore status for snapshot: sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq
✅ Restore completed successfully

Finalizing restore...
Scaling up deployments from annotations (selector: observability.suse.com/scalable-during-es-restore=true)...
✅ Scaled up 3 deployment(s) successfully:
  - suse-observability-e2es (replicas: 0 -> 0)
  - suse-observability-receiver-base (replicas: 0 -> 0)
  - suse-observability-receiver-logs (replicas: 0 -> 0)
✅ Scaled up 0 statefulset(s) successfully:
✅ Finalization completed successfully
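
The scale-down and scale-up steps above select workloads by label and restore replica counts during finalization. A hypothetical client-go sketch of the scale-down half, with the selector taken from the log; the annotation key, function name, and package are invented for illustration:

```go
package restore

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleDownSelected scales matching deployments to zero, recording the
// original replica count in an annotation so finalization can scale back up.
func scaleDownSelected(ctx context.Context, client kubernetes.Interface, ns string) error {
	const selector = "observability.suse.com/scalable-during-es-restore=true"
	const annotation = "observability.suse.com/original-replicas" // invented key
	deps, err := client.AppsV1().Deployments(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for i := range deps.Items {
		d := &deps.Items[i]
		replicas := int32(1)
		if d.Spec.Replicas != nil {
			replicas = *d.Spec.Replicas
		}
		if d.Annotations == nil {
			d.Annotations = map[string]string{}
		}
		d.Annotations[annotation] = fmt.Sprint(replicas)
		zero := int32(0)
		d.Spec.Replicas = &zero
		if _, err := client.AppsV1().Deployments(ns).Update(ctx, d, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```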

@viliakov (Contributor, Author) commented Apr 2, 2026

❯ go run main.go elasticsearch restore --namespace nightly-restore --latest --yes
Setting up port-forward to suse-observability-elasticsearch-master-headless:9200 in namespace nightly-restore...
✅ Port-forward established on localhost:55124
Fetching latest snapshot from repository 'sts-backup'...
✅ Latest snapshot found: sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq
❌ Error: snapshot 'sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq' is PARTIAL with 51 shard failure(s); use --allow-partial together with --yes to restore a partial snapshot non-interactively
exit status 1
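
Taken together, the interactive run above and this non-interactive failure imply gating roughly like the following; a minimal sketch assuming the Snapshot struct from the summary sketch, with failure counting via len(Failures) as an assumption (the CLI may read shards.failed instead):

```go
package restore

import (
	"fmt"
	"strings"
)

// confirmPartialRestore enforces the PARTIAL-snapshot rules: non-interactive
// runs (--yes) also need --allow-partial, while interactive runs get a
// warning plus a yes/no prompt.
func confirmPartialRestore(snap Snapshot, yes, allowPartial bool) error {
	if snap.State != "PARTIAL" {
		return nil
	}
	if yes {
		if !allowPartial {
			return fmt.Errorf("snapshot '%s' is PARTIAL with %d shard failure(s); "+
				"use --allow-partial together with --yes to restore a partial snapshot non-interactively",
				snap.Snapshot, len(snap.Failures))
		}
		return nil
	}
	fmt.Printf("⚠️ Warning: Snapshot '%s' is in PARTIAL state!\n", snap.Snapshot)
	fmt.Print("Do you want to continue? (yes/no): ")
	var answer string
	fmt.Scanln(&answer)
	if !strings.EqualFold(strings.TrimSpace(answer), "yes") {
		return fmt.Errorf("restore aborted by user")
	}
	return nil
}
```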

@viliakov (Contributor, Author) commented Apr 2, 2026

❯ go run main.go elasticsearch describe --namespace nightly-restore -s sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq
Setting up port-forward to suse-observability-elasticsearch-master-headless:9200 in namespace nightly-restore...
✅ Port-forward established on localhost:55283
Fetching snapshot 'sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq' from repository 'sts-backup'...
{
  "snapshot": "sts-backup-20260401-0300-ggheroepr2aj24ebp15qpq",
  "uuid": "CE8LQ3_HTo6dmi0iiZC6LQ",
  "repository": "sts-backup",
  "state": "PARTIAL",
  "start_time": "2026-04-01T02:59:59.864Z",
  "start_time_in_millis": 1775012399864,
  "end_time": "2026-04-01T03:04:48.697Z",
  "end_time_in_millis": 1775012688697,
  "duration_in_millis": 288833,
  "indices": [
    ".ds-sts_k8s_logs-2026.03.28-008041",
    ".ds-sts_k8s_logs-2026.03.28-008025",
    ".ds-sts_k8s_logs-2026.03.31-008097",
  ...
  ],
  "failures": [
    {
      "index": ".ds-sts_k8s_logs-2026.03.30-008069",
      "index_uuid": "u82xI1lgToOVOL-UGO00RQ",
      "shard_id": 2,
      "reason": "IOException[Unable to upload object [nightly/elasticsearch/indices/uVVwxR7ySKinTlIgnz8axg/2/__-OY_MWpfR2OH1XM8fLLa1w] using a single upload]; nested: SdkClientException[Unable to execute HTTP request: Connect to suse-observability-s3proxy:9000 [suse-observability-s3proxy/10.0.240.76] failed: Connection refused (SDK Attempt Count: 1)]; nested: HttpHostConnectException[Connect to suse-observability-s3proxy:9000 [suse-observability-s3proxy/10.0.240.76] failed: Connection refused]; nested: ConnectException[Connection refused]",
      "node_id": "J4QengskRpGJfaB8uwWymw",
      "status": "INTERNAL_SERVER_ERROR"
    },
    {
      "index": ".ds-sts_k8s_logs-2026.03.31-008089",
      "index_uuid": "sn2Wfve3T3yNLQZFWWbRPA",
      "shard_id": 2,
      "reason": "IOException[Unable to upload object [nightly/elasticsearch/indices/x_sMK1BmR9i4T6aE2WsdUg/2/__e3jCZU4-StKTu_N-mGev3A] using a single upload]; nested: SdkClientException[Unable to execute HTTP request: Connect to suse-observability-s3proxy:9000 [suse-observability-s3proxy/10.0.240.76] failed: Connection refused (SDK Attempt Count: 1)]; nested: HttpHostConnectException[Connect to suse-observability-s3proxy:9000 [suse-observability-s3proxy/10.0.240.76] failed: Connection refused]; nested: ConnectException[Connection refused]",
      "node_id": "J4QengskRpGJfaB8uwWymw",
      "status": "INTERNAL_SERVER_ERROR"
    },
 ...
  ],
  "shards": {
    "total": 267,
    "failed": 51,
    "successful": 216
  }
}
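
The describe output is plausibly just the snapshot API response re-indented; a minimal sketch, assuming the command pretty-prints the raw JSON with two-space indentation via the standard library's json.Indent:

```go
package restore

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// prettyPrintSnapshot re-indents raw snapshot JSON with two-space
// indentation, matching the output above.
func prettyPrintSnapshot(raw []byte) error {
	var buf bytes.Buffer
	if err := json.Indent(&buf, raw, "", "  "); err != nil {
		return err
	}
	fmt.Println(buf.String())
	return nil
}
```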

viliakov merged commit ad15fa1 into main on Apr 3, 2026
5 checks passed
viliakov deleted the STAC-24549 branch on April 3, 2026 08:45