Skip to content

Conversation

@ameukam
Copy link
Member

@ameukam ameukam commented Dec 11, 2025

Retake of #17520

In addition:

  • ensure bucket is empty before deletion
  • use exponential back-off to apply public read policy
  • ensure bucket public access policy is disabled

xmudrii and others added 4 commits December 11, 2025 10:49
Signed-off-by: Arnaud Meukam <[email protected]>
Ensure all the objects of a bucket are deleted before it's deleted.
AWS S3 requires the bucket is empty before deletion

Signed-off-by: Arnaud Meukam <[email protected]>
5 attempts are done to ensure the public read policy is applied

Signed-off-by: Arnaud Meukam <[email protected]>
@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 11, 2025
@ameukam
Copy link
Member Author

ameukam commented Dec 11, 2025

/test pull-kops-aws-distro-debian13

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 11, 2025
@ameukam
Copy link
Member Author

ameukam commented Dec 11, 2025

/test pull-kops-aws-distro-debian13

@ameukam
Copy link
Member Author

ameukam commented Dec 11, 2025

/test pull-kops-aws-distro-al2023
/test pull-kops-aws-distro-ubuntu2404
/test pull-kops-e2e-cni-cilium

@ameukam ameukam marked this pull request as ready for review December 11, 2025 15:11
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2025
@hakman
Copy link
Member

hakman commented Dec 11, 2025

@ameukam This needs to be tested with something that enables discovery store. Do we have such job?

@ameukam
Copy link
Member Author

ameukam commented Dec 11, 2025

@ameukam This needs to be tested with something that enables discovery store. Do we have such job?

I don't think we have something like that as a presubmit.

@ameukam
Copy link
Member Author

ameukam commented Dec 11, 2025

/test pull-kops-e2e-cni-cilium

@hakman
Copy link
Member

hakman commented Dec 11, 2025

Please ignore the failing tests, let's get the review going first.

@ameukam
Copy link
Member Author

ameukam commented Dec 11, 2025

cc @justinsb @rifelpet

Copy link
Member

@hakman hakman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main part looks great, thank you both @ameukam & @xmudrii for this.
There are still some Q, though nothing huge.

Comment on lines +156 to +159
// Empty the bucket first
paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
Bucket: aws.String(bucketName),
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good idea, but I would expect that kOps removes all the files from the bucket, otherwise it's a bug.
This approach may hide such issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kOps force delete the S3 buckets at shutdown whether the bucket is empty or not. This is just an extra step to guarantee the bucket is properly deleted.

Potential issues will probably not of the emptiness of the bucket.

Happy to drop this if you think it's unnecessary

Comment on lines +241 to +253
var err error
backoff := time.Second
for i := 0; i < 5; i++ {
_, err = c.s3Client.PutBucketPolicy(ctx, &s3.PutBucketPolicyInput{
Bucket: aws.String(bucketName),
Policy: aws.String(policy),
})
if err == nil {
return nil
}
klog.Infof("Failed to set public read policy on bucket %s, retrying in %v. Error: %v", bucketName, backoff, err)
time.Sleep(backoff)
backoff *= 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if exponential backoff is the right approach here:

  • it is not a recurring task
  • it does not happen concurrently
  • we don't expect rate limiting.
    I would vote for keeping it simple with some fixed interval and number of retries, making the behaviour more predictable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I'll set a time sleep of 10 seconds

Comment on lines 263 to 266
BlockPublicAcls: aws.Bool(false),
IgnorePublicAcls: aws.Bool(false),
BlockPublicPolicy: aws.Bool(false),
RestrictPublicBuckets: aws.Bool(false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the ACLs to 'true`?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure of the ACL requirements of kOps for the service discovery so I went with a simple configuration. We can try what you are suggesting.

return ""
}
d.createBucket = true
discovery = "s3://" + bucketName
Copy link
Member

@hakman hakman Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting this to be used somewhere, by being passed to kOps when creating the cluster.
Maybe it's me, but I don't see it used in any of the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the test will need a discovery bucket by default* and we just want to make sure we don't depends on a static reference of an AWS resource we (sig-k8s-infra) currently don't have control on.

*based on: https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/kops/17809/pull-kops-aws-distro-al2023/1999113652837289984/build-log.txt

I1211 13:57:27.886490   13618 s3.go:129] Bucket k8s-infra-kops-state-ae62-20251211135010 created successfully
I1211 13:57:28.292397   13618 s3.go:129] Bucket k8s-infra-kops-discovery-ae62-20251211135727 created successfully
I1211 13:57:28.648562   13618 s3.go:146] Public read policy set on bucket k8s-infra-kops-discovery-ae62-20251211135727
I1211 14:34:05.635169   13618 s3.go:222] Bucket k8s-infra-kops-state-ae62-20251211135010 deleted
I1211 14:34:05.920863   13618 s3.go:222] Bucket k8s-infra-kops-discovery-ae62-20251211135727 deleted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kOps requires --discovery-store to be passed to tests needing it (see kubernetes/test-infra#35648 as reference). In the past, due to the static nature of the bucket, it was passed as part of the test definition. As it is dynamically created now, kubetest2-kops needs to pass it to kOps somehow (either as flag or new env var, similar to the state store).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's preferable we pass it as a env var in prowjob definition to avoid hard coding it. Setup a dedicated bucket is cheap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but not sure I understand what you mean. We cannot pass it from prow anymore, as it's dynamically generated.

Ensure public access policies can be disabled when the bucket is public

Signed-off-by: Arnaud Meukam <[email protected]>
@ameukam ameukam force-pushed the test-aws-discovery-store branch from a332cc7 to aa03eb6 Compare December 12, 2025 17:16
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hakman. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants