-
Notifications
You must be signed in to change notification settings - Fork 0
Switch production EKS to t3 instances for compatibility #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Change instance types from t4g to t3 family - Addresses compatibility issues with certain workloads - Maintains same instance sizes (large, xlarge)
WalkthroughEKS node group instance types updated from ARM-based t4g (t4g.large, t4g.xlarge) to t3 (t3.large, t3.xlarge) in the production Terraform configuration. Instance sizing and allocation remain unchanged; only the instance family is modified. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
prod/eks/terragrunt.hcl(1 hunks)
🧰 Additional context used
🔍 Remote MCP Exa, Linear, Notion
Perfect! I have gathered comprehensive context. Let me compile my findings:
Summary of Relevant Context for PR #11 Review
⚠️ CRITICAL CONFLICT IDENTIFIED
Company Infrastructure Policy Violation: Your pull request directly contradicts the organization's approved AWS Instance Type Standards documented in Notion. The policy states:
Graviton (ARM64) First Policy: "All production workloads MUST use AWS Graviton processors (ARM64 architecture)."
The policy explicitly lists t3 and t2 as prohibited for production use:
"The following x86 instance families are NOT approved for production use: t3, t2 (use t4g instead)"
Your PR change: Converting from t4g (ARM64-based Graviton) → t3 (x86-based Intel Xeon) directly violates this "Graviton first" mandate.
Technical Background on Instance Types
t4g (Current):
- Powers AWS Graviton2 ARM-based processors, delivering up to 40% better price performance over T3 instances
- ARM64 architecture
t3 (Proposed):
- Intel Xeon Platinum processors, low-cost burstable general-purpose instances designed for applications with moderate CPU usage
- x86 architecture
Container Architecture Compatibility Concern
The policy notes that "all container images are built for ARM64 in our CI/CD pipeline." In multi-architecture Kubernetes environments, deploying x86-based workloads on ARM-only clusters or vice versa requires special consideration, and compatibility issues may arise if container images are not multi-arch builds.
Exception Process Required
If there is a legitimate technical reason for this migration, the policy requires:
- Submit request to cloud-infrastructure team
- Provide technical justification for x86 requirement
- Include migration timeline back to ARM64
- Obtain VP-level approval
No exception approval is evident in the current PR.
Recommendation: Before approving this PR, clarify whether:
- An exception has been granted (and if so, document it)
- The workload compatibility issue requires x86 specifically
- Container images support multi-arch builds if both architectures will run,,
| # Node group configuration - switched to t3 for compatibility | ||
| arm_instance_types = ["t3.large", "t3.xlarge"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the actual file and its context
cat -n prod/eks/terragrunt.hcl | head -40Repository: coderabbitai/rabbits-playground
Length of output: 1164
🏁 Script executed:
# Search for any exception approvals or justification comments in the repo
rg -i "exception|approval|t3.*approved|graviton.*first|instance.*policy" -A 3 -B 3Repository: coderabbitai/rabbits-playground
Length of output: 57
🏁 Script executed:
# Look for container image build configuration
fd -e Dockerfile -e "*.yml" -e "*.yaml" | head -20Repository: coderabbitai/rabbits-playground
Length of output: 57
🏁 Script executed:
# Search for any documentation about instance type policy or requirements
rg -i "instance.*type.*policy|graviton.*first|aws.*standard|t3.*prohibited|t4g.*required" -g '*.md' -g '*.txt' -g '*.rst'Repository: coderabbitai/rabbits-playground
Length of output: 57
🏁 Script executed:
# Check git history for justification of this change
git log --oneline -20 --all -- prod/eks/terragrunt.hcl 2>/dev/null || echo "Git history not available"Repository: coderabbitai/rabbits-playground
Length of output: 161
🏁 Script executed:
# Search for multi-arch or platform-specific image build configuration
rg -i "docker buildx|platform.*linux|arm64|amd64|multi.*arch" -g 'Dockerfile*' -g '*.yml' -g '*.yaml' -g '*.sh' | head -30Repository: coderabbitai/rabbits-playground
Length of output: 223
🏁 Script executed:
# Check git commit details for the t3 migration
git show bfe9a34 --stat 2>/dev/null || echo "Cannot show full commit"Repository: coderabbitai/rabbits-playground
Length of output: 2103
🏁 Script executed:
# Search for any README or documentation about instance type requirements
fd -e README.md -e '*.md' -e '*.txt' | xargs grep -l "instance\|t3\|t4g\|compatibility" 2>/dev/null || echo "No docs found"Repository: coderabbitai/rabbits-playground
Length of output: 88
🏁 Script executed:
# Check if there are any other node group configurations for comparison
rg "instance_types|arm_instance" -B 3 -A 3Repository: coderabbitai/rabbits-playground
Length of output: 1534
🏁 Script executed:
# Look for any comments or variables that might explain the compatibility issue
rg "compatibility|t3|t4g" --context=5Repository: coderabbitai/rabbits-playground
Length of output: 4258
Critical: Architecture mismatch between AMI type and instance type will cause deployment failures.
The change introduces a serious compatibility issue that will prevent the EKS node group from launching successfully:
-
AMI and Instance Type Incompatibility: The EKS module is configured with
ami_type = "AL2_ARM_64"(Amazon Linux 2 ARM 64-bit), but prod/eks now uses t3 instances which are x86-based Intel Xeon. AL2_ARM_64 is incompatible with x86 architecture—workloads will fail to launch. The variable must either use ARM instances (t4g) matching the AMI type, or the AMI must be changed toAL2_x86_64. -
Misleading Variable Name: The variable
arm_instance_typescontains x86 instances (t3 family), not ARM-based instances. This contradicts the variable's documented purpose. If x86 instances are intentional, rename toinstance_typesand update the corresponding variables (e.g.,arm_desired_size→desired_size). Otherwise, revert to t4g (ARM-based) instances. -
Incomplete Justification: The commit message states "Addresses compatibility issues with certain workloads" but provides no specifics. What workloads? Why does t3 solve it? This vague explanation makes it impossible to assess whether the change is appropriate or if there are better solutions.
-
Outdated Documentation: The README still states prod should use "t4g.large/xlarge instances," but this change uses t3. Update documentation to reflect the actual configuration and justify the deviation.
🤖 Prompt for AI Agents
In prod/eks/terragrunt.hcl around lines 25-26, the variable arm_instance_types
was set to x86 t3 instances which conflicts with the EKS module configured as
ami_type = "AL2_ARM_64"; fix by either (A) reverting to ARM instance types
(e.g., "t4g.large","t4g.xlarge") to match AL2_ARM_64 and keep variable name, or
(B) change the AMI to AL2_x86_64 and rename the variable from arm_instance_types
to instance_types (and rename related variables like arm_desired_size →
desired_size) to reflect x86 usage; also update README and any module/variable
docs to match the chosen architecture and amend the commit message to explain
which workloads required the change and why this instance/AMI choice resolves
that need.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.