diff --git a/README.md b/README.md index 351a160a62..10c8adf7bf 100644 --- a/README.md +++ b/README.md @@ -121,6 +121,7 @@ Join our discord community via [this invite link](https://discord.gg/bxgXW8jJGh) | [disable\_runner\_autoupdate](#input\_disable\_runner\_autoupdate) | Disable the auto update of the github runner agent. Be aware there is a grace period of 30 days, see also the [GitHub article](https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/) | `bool` | `false` | no | | [enable\_ami\_housekeeper](#input\_enable\_ami\_housekeeper) | Option to disable the lambda to clean up old AMIs. | `bool` | `false` | no | | [enable\_cloudwatch\_agent](#input\_enable\_cloudwatch\_agent) | Enables the cloudwatch agent on the ec2 runner instances. The runner uses a default config that can be overridden via `cloudwatch_config`. | `bool` | `true` | no | +| [enable\_dynamic\_labels](#input\_enable\_dynamic\_labels) | Experimental! May be removed or changed without triggering a major release. Enable dynamic EC2 configs based on workflow job labels. When enabled, jobs can request specific configs via a `ghr-ec2-<key>:<value>` label (e.g., `ghr-ec2-instance-type:t3.large`). | `bool` | `false` | no | | [enable\_ephemeral\_runners](#input\_enable\_ephemeral\_runners) | Enable ephemeral runners, runners will only be used once. | `bool` | `false` | no | | [enable\_jit\_config](#input\_enable\_jit\_config) | Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is available. In case you are upgrading from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI. | `bool` | `null` | no | | [enable\_job\_queued\_check](#input\_enable\_job\_queued\_check) | Only scale if the job event received by the scale up lambda is in the queued state. 
By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | `bool` | `null` | no | diff --git a/docs/configuration.md b/docs/configuration.md index 8ec7e4caef..5a5246e706 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -328,6 +328,212 @@ Below is an example of the log messages created. } ``` +### Dynamic Labels + +This feature is at an early stage and is therefore disabled by default. To enable dynamic labels, set `enable_dynamic_labels = true`. + +Dynamic labels allow workflow authors to pass arbitrary metadata and EC2 instance overrides directly from the `runs-on` labels in their GitHub Actions workflows. All labels prefixed with `ghr-` are treated as dynamic labels. A deterministic hash of all `ghr-` prefixed labels is computed and used for runner matching, ensuring that each unique combination of dynamic labels routes to the correct runner configuration. + +Dynamic labels serve two purposes: + +1. **Custom identity / restriction labels (`ghr-<key>:<value>`)**: Any `ghr-` prefixed label that is *not* `ghr-ec2-` acts as a custom identity label. These can represent a unique job ID, a team name, a cost center, an environment tag, or any arbitrary restriction. They do not affect EC2 configuration but are included in the label hash, guaranteeing unique runner matching per combination. +2. **EC2 override labels (`ghr-ec2-<key>:<value>`)**: Labels prefixed with `ghr-ec2-` are parsed by the scale-up lambda to dynamically configure the EC2 fleet request, including instance type, vCPU/memory requirements, GPU/accelerator specs, EBS volumes, placement, and networking. This eliminates the need to create separate runner configurations for each hardware combination. + +#### How it works + +When `enable_dynamic_labels` is enabled, the webhook and scale-up lambdas inspect the `runs-on` labels of incoming `workflow_job` events. 
Labels starting with `ghr-ec2-` are parsed into an EC2 override configuration that is applied to the [CreateFleet](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateFleet.html) API call. All other `ghr-` prefixed labels are carried through as custom identity labels. A deterministic hash of **all** `ghr-` prefixed labels (both custom and EC2) is used to ensure consistent and unique runner matching. + +#### Configuration + +```hcl +module "runners" { + source = "github-aws-runners/github-runners/aws" + + ... + enable_dynamic_labels = true + ... +} +``` + +#### Custom identity labels + +Any label matching `ghr-<key>:<value>` (where `<key>` does **not** start with `ec2-`) is a custom identity label. These labels have no effect on EC2 instance configuration but are included in the runner matching hash. Use them to: + +- Assign a **unique job identity** so each workflow run targets a dedicated runner (e.g., `ghr-job-id:abc123`). +- Apply **team or cost-center restrictions** (e.g., `ghr-team:platform`, `ghr-cost-center:eng-42`). +- Tag runners with **environment or deployment context** (e.g., `ghr-env:staging`, `ghr-region:us-west-2`). +- Enforce **any custom constraint** that differentiates one runner request from another. + +```yaml +jobs: + deploy: + runs-on: + - self-hosted + - linux + - ghr-team:platform + - ghr-env:staging + - ghr-job-id:${{ github.run_id }} +``` + +In the example above, the three `ghr-` labels produce a unique hash, ensuring this job is matched to a runner created specifically for this combination. No EC2 overrides are applied; the runner uses the default fleet configuration. + +#### EC2 override labels + +Labels using the format `ghr-ec2-<key>:<value>` override EC2 fleet configuration. Values with multiple items use comma-separated lists. 
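As a rough illustration of the `ghr-ec2-` label format described above, the following TypeScript sketch splits a list of `runs-on` labels into override keys and values. It is not the module's `parseEc2OverrideConfig`: the function name `splitEc2Labels`, the plain-object return shape, and the choice to skip malformed labels silently are all assumptions made for this example.

```typescript
// Hypothetical sketch only; the real parser in the scale-up lambda may differ.
const EC2_PREFIX = 'ghr-ec2-';

type OverrideValue = string | string[];

// Collects `ghr-ec2-<key>:<value>` labels into a key/value object.
function splitEc2Labels(labels: string[] = []): Record<string, OverrideValue> {
  const overrides: Record<string, OverrideValue> = {};
  for (const label of labels) {
    // Only labels that start with the EC2 prefix are treated as overrides.
    if (label.indexOf(EC2_PREFIX) !== 0) continue;
    const rest = label.slice(EC2_PREFIX.length);
    const sep = rest.indexOf(':');
    // Assumption: labels with an empty key or no ':' separator are ignored.
    if (sep <= 0) continue;
    const key = rest.slice(0, sep);
    const raw = rest.slice(sep + 1);
    // Values with multiple items arrive as comma-separated lists.
    overrides[key] = raw.indexOf(',') >= 0 ? raw.split(',') : raw;
  }
  return overrides;
}

console.log(
  splitEc2Labels([
    'self-hosted',
    'ghr-team:platform',
    'ghr-ec2-instance-type:c5.xlarge',
    'ghr-ec2-cpu-manufacturers:intel,amd',
  ]),
);
```

A real parser would additionally map each key onto the matching CreateFleet field and convert numeric values, as the tables that follow suggest; this sketch leaves everything as strings.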
+ +##### Basic Fleet Overrides + +| Label | Description | Example value | +| -------------------------------------------- | ------------------------------------ | ------------------- | +| `ghr-ec2-instance-type:` | Set specific instance type | `c5.xlarge` | +| `ghr-ec2-max-price:` | Set maximum spot price | `0.10` | +| `ghr-ec2-subnet-id:` | Set subnet ID | `subnet-abc123` | +| `ghr-ec2-availability-zone:` | Set availability zone | `us-east-1a` | +| `ghr-ec2-availability-zone-id:` | Set availability zone ID | `use1-az1` | +| `ghr-ec2-weighted-capacity:` | Set weighted capacity | `2` | +| `ghr-ec2-priority:` | Set launch priority | `1` | +| `ghr-ec2-image-id:` | Override AMI ID | `ami-0abcdef123` | + +##### Instance Requirements — vCPU & Memory + +| Label | Description | Example value | +| -------------------------------------------- | ------------------------------------ | ------------------- | +| `ghr-ec2-vcpu-count-min:` | Minimum vCPU count | `4` | +| `ghr-ec2-vcpu-count-max:` | Maximum vCPU count | `16` | +| `ghr-ec2-memory-mib-min:` | Minimum memory in MiB | `16384` | +| `ghr-ec2-memory-mib-max:` | Maximum memory in MiB | `65536` | +| `ghr-ec2-memory-gib-per-vcpu-min:` | Min memory per vCPU ratio (GiB) | `2` | +| `ghr-ec2-memory-gib-per-vcpu-max:` | Max memory per vCPU ratio (GiB) | `8` | + +##### Instance Requirements — CPU & Performance + +| Label | Description | Example value | +| -------------------------------------------- | ----------------------------------------------------------------- | -------------------------- | +| `ghr-ec2-cpu-manufacturers:` | CPU manufacturers (comma-separated) | `intel,amd` | +| `ghr-ec2-instance-generations:` | Instance generations (comma-separated) | `current` | +| `ghr-ec2-excluded-instance-types:` | Exclude instance types (comma-separated) | `t2.micro,t3.nano` | +| `ghr-ec2-allowed-instance-types:` | Allow only specific instance types (comma-separated) | `c5.xlarge,c5.2xlarge` | +| `ghr-ec2-burstable-performance:` | 
Burstable performance (`included`, `excluded`, `required`) | `excluded` | +| `ghr-ec2-bare-metal:` | Bare metal (`included`, `excluded`, `required`) | `excluded` | + +##### Instance Requirements — Accelerators / GPU + +| Label | Description | Example value | +| ------------------------------------------------- | ------------------------------------------------------------------------ | -------------------------------- | +| `ghr-ec2-accelerator-types:` | Accelerator types (comma-separated: `gpu`, `fpga`, `inference`) | `gpu` | +| `ghr-ec2-accelerator-count-min:` | Minimum accelerator count | `1` | +| `ghr-ec2-accelerator-count-max:` | Maximum accelerator count | `4` | +| `ghr-ec2-accelerator-manufacturers:` | Accelerator manufacturers (comma-separated) | `nvidia` | +| `ghr-ec2-accelerator-names:` | Specific accelerator names (comma-separated) | `t4,v100` | +| `ghr-ec2-accelerator-memory-mib-min:` | Min accelerator total memory in MiB | `8192` | +| `ghr-ec2-accelerator-memory-mib-max:` | Max accelerator total memory in MiB | `32768` | + +##### Instance Requirements — Network & Storage + +| Label | Description | Example value | +| -------------------------------------------------- | ----------------------------------------------------------------- | ------------------- | +| `ghr-ec2-network-interface-count-min:` | Min network interfaces | `1` | +| `ghr-ec2-network-interface-count-max:` | Max network interfaces | `4` | +| `ghr-ec2-network-bandwidth-gbps-min:` | Min network bandwidth in Gbps | `10` | +| `ghr-ec2-network-bandwidth-gbps-max:` | Max network bandwidth in Gbps | `25` | +| `ghr-ec2-local-storage:` | Local storage (`included`, `excluded`, `required`) | `required` | +| `ghr-ec2-local-storage-types:` | Local storage types (comma-separated: `hdd`, `ssd`) | `ssd` | +| `ghr-ec2-total-local-storage-gb-min:` | Min total local storage in GB | `100` | +| `ghr-ec2-total-local-storage-gb-max:` | Max total local storage in GB | `500` | +| 
`ghr-ec2-baseline-ebs-bandwidth-mbps-min:` | Min baseline EBS bandwidth in Mbps | `1000` | +| `ghr-ec2-baseline-ebs-bandwidth-mbps-max:` | Max baseline EBS bandwidth in Mbps | `5000` | + +##### Placement + +| Label | Description | Example value | +| ------------------------------------------------------ | ---------------------------------------------------------- | --------------------- | +| `ghr-ec2-placement-group:` | Placement group name | `my-cluster-group` | +| `ghr-ec2-placement-tenancy:` | Tenancy (`default`, `dedicated`, `host`) | `dedicated` | +| `ghr-ec2-placement-host-id:` | Dedicated host ID | `h-abc123` | +| `ghr-ec2-placement-affinity:` | Affinity (`default`, `host`) | `host` | +| `ghr-ec2-placement-partition-number:` | Partition number | `1` | +| `ghr-ec2-placement-availability-zone:` | Placement availability zone | `us-east-1a` | +| `ghr-ec2-placement-spread-domain:` | Spread domain | `my-domain` | +| `ghr-ec2-placement-host-resource-group-arn:` | Host resource group ARN | `arn:aws:...` | + +##### Block Device Mappings (EBS) + +| Label | Description | Example value | +| ------------------------------------------------ | -------------------------------------------------------------- | -------------- | +| `ghr-ec2-ebs-volume-size:` | EBS volume size in GB | `100` | +| `ghr-ec2-ebs-volume-type:` | EBS volume type (`gp2`, `gp3`, `io1`, `io2`, `st1`, `sc1`) | `gp3` | +| `ghr-ec2-ebs-iops:` | EBS IOPS | `3000` | +| `ghr-ec2-ebs-throughput:` | EBS throughput in MB/s (gp3 only) | `125` | +| `ghr-ec2-ebs-encrypted:` | EBS encryption (`true`, `false`) | `true` | +| `ghr-ec2-ebs-kms-key-id:` | KMS key ID for encryption | `key-abc123` | +| `ghr-ec2-ebs-delete-on-termination:` | Delete on termination (`true`, `false`) | `true` | +| `ghr-ec2-ebs-snapshot-id:` | Snapshot ID for EBS volume | `snap-abc123` | +| `ghr-ec2-block-device-virtual-name:` | Virtual device name (ephemeral storage) | `ephemeral0` | +| `ghr-ec2-block-device-no-device:` | Suppresses device 
mapping | `true` | + +##### Pricing & Advanced + +| Label | Description | Example value | +| ----------------------------------------------------------------------------- | ------------------------------------------------------------------ | -------------- | +| `ghr-ec2-spot-max-price-percentage-over-lowest-price:` | Spot max price as % over lowest price | `20` | +| `ghr-ec2-on-demand-max-price-percentage-over-lowest-price:` | On-demand max price as % over lowest price | `10` | +| `ghr-ec2-max-spot-price-as-percentage-of-optimal-on-demand-price:` | Max spot price as % of optimal on-demand | `50` | +| `ghr-ec2-require-hibernate-support:` | Require hibernate support (`true`, `false`) | `true` | +| `ghr-ec2-require-encryption-in-transit:` | Require encryption in-transit (`true`, `false`) | `true` | +| `ghr-ec2-baseline-performance-factors-cpu-reference-families:` | CPU baseline performance reference families (comma-separated) | `c5,m5` | + +#### Examples + +Custom identity labels only — unique runner per job run: + +```yaml +jobs: + deploy: + runs-on: + - self-hosted + - linux + - ghr-job-id:${{ github.run_id }} +``` + +Specific instance type with a larger EBS volume: + +```yaml +jobs: + build: + runs-on: + - self-hosted + - linux + - ghr-ec2-instance-type:c5.2xlarge + - ghr-ec2-ebs-volume-size:200 + - ghr-ec2-ebs-volume-type:gp3 +``` + +Attribute-based instance selection with Intel CPUs only: + +```yaml +jobs: + test: + runs-on: + - self-hosted + - linux + - ghr-ec2-vcpu-count-min:2 + - ghr-ec2-vcpu-count-max:8 + - ghr-ec2-memory-mib-min:8192 + - ghr-ec2-cpu-manufacturers:intel + - ghr-ec2-burstable-performance:excluded +``` + +#### Considerations + +- This feature requires `enable_dynamic_labels = true` in your Terraform configuration. +- When using `ghr-ec2-instance-type`, the fleet request uses a direct instance type override. 
When using `ghr-ec2-vcpu-count-*`, `ghr-ec2-memory-mib-*`, or other instance requirement labels, the fleet request uses [attribute-based instance type selection](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet-attribute-based-instance-type-selection.html). +- Labels are parsed at the scale-up lambda level — they do not change after the instance is launched. +- A deterministic hash of all `ghr-` prefixed labels (both custom identity and EC2 override) is used for runner matching. Different label combinations produce different hashes, ensuring each unique set of requirements gets its own runner. +- Custom `ghr-` labels (non-`ec2`) are free-form — you can use any key/value pair. They are not validated by the module. +- Multiple EBS labels apply to the same (first) block device mapping. If you need more complex block device configurations, use a custom AMI or launch template instead. +- This feature is compatible with both org-level and repo-level runners, spot and on-demand instances, and ephemeral and non-ephemeral runners. +- Be mindful of the security implications: enabling this feature allows workflow authors to influence EC2 instance configuration via `ghr-ec2-` labels. Ensure your IAM policies and subnet configurations provide appropriate guardrails. + ### EventBridge This module can be deployed in `EventBridge` mode. The `EventBridge` mode will publish an event to an eventbus. Within the eventbus, there is a target rule set, sending events to the dispatch lambda. The `EventBridge` mode is enabled by default. 
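The Dynamic Labels documentation in this diff says a deterministic hash of all `ghr-` prefixed labels drives runner matching, but the hunks shown here do not include the hashing code. The sketch below shows one way such a hash could be computed. The helper name `dynamicLabelHash`, the sort step, the newline join, and the use of SHA-256 are assumptions for illustration, not the module's implementation.

```typescript
import { createHash } from 'node:crypto';

const DYNAMIC_PREFIX = 'ghr-';

// Hypothetical helper: hashes only the `ghr-` prefixed labels.
function dynamicLabelHash(labels: string[] = []): string {
  const dynamic = labels
    // Keep custom identity labels and EC2 override labels alike.
    .filter((label) => label.indexOf(DYNAMIC_PREFIX) === 0)
    // Assumption: sorting makes the hash independent of label order.
    .sort();
  return createHash('sha256').update(dynamic.join('\n')).digest('hex');
}

const first = dynamicLabelHash(['self-hosted', 'ghr-team:platform', 'ghr-ec2-instance-type:c5.xlarge']);
const second = dynamicLabelHash(['ghr-ec2-instance-type:c5.xlarge', 'linux', 'ghr-team:platform']);
console.log(first === second);
```

Under these assumptions, two jobs that request the same set of `ghr-` labels in a different order, or with different non-dynamic labels, map to the same hash, while any change to a `ghr-` label yields a different one.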
diff --git a/lambdas/functions/control-plane/src/aws/runners.d.ts b/lambdas/functions/control-plane/src/aws/runners.d.ts index 7e9bf0fbba..ab7604f7ff 100644 --- a/lambdas/functions/control-plane/src/aws/runners.d.ts +++ b/lambdas/functions/control-plane/src/aws/runners.d.ts @@ -1,4 +1,11 @@ -import { DefaultTargetCapacityType, SpotAllocationStrategy } from '@aws-sdk/client-ec2'; +import { + DefaultTargetCapacityType, + InstanceRequirementsRequest, + SpotAllocationStrategy, + _InstanceType, + Placement, + FleetBlockDeviceMappingRequest, +} from '@aws-sdk/client-ec2'; export type RunnerType = 'Org' | 'Repo'; @@ -29,6 +36,20 @@ export interface ListRunnerFilters { statuses?: string[]; } +export interface Ec2OverrideConfig { + InstanceType?: _InstanceType; + MaxPrice?: string; + SubnetId?: string; + AvailabilityZone?: string; + WeightedCapacity?: number; + Priority?: number; + Placement?: Placement; + BlockDeviceMappings?: FleetBlockDeviceMappingRequest[]; + InstanceRequirements?: InstanceRequirementsRequest; + ImageId?: string; + AvailabilityZoneId?: string; +} + export interface RunnerInputParameters { environment: string; runnerType: RunnerType; @@ -41,6 +62,7 @@ export interface RunnerInputParameters { maxSpotPrice?: string; instanceAllocationStrategy: SpotAllocationStrategy; }; + ec2OverrideConfig?: Ec2OverrideConfig; numberOfRunners: number; amiIdSsmParameterName?: string; tracingEnabled?: boolean; diff --git a/lambdas/functions/control-plane/src/aws/runners.test.ts b/lambdas/functions/control-plane/src/aws/runners.test.ts index 63f1412dd0..911af2d70d 100644 --- a/lambdas/functions/control-plane/src/aws/runners.test.ts +++ b/lambdas/functions/control-plane/src/aws/runners.test.ts @@ -318,6 +318,7 @@ describe('create runner', () => { allocationStrategy: SpotAllocationStrategy.CAPACITY_OPTIMIZED, capacityType: 'spot', type: 'Org', + scaleErrors: ['UnfulfillableCapacity', 'MaxSpotInstanceCountExceeded'], }; const defaultExpectedFleetRequestValues: 
ExpectedFleetRequestValues = { @@ -425,6 +426,215 @@ describe('create runner', () => { }), }); }); + + it('overrides SubnetId when specified in ec2OverrideConfig', async () => { + await createRunner({ + ...createRunnerConfig(defaultRunnerConfig), + ec2OverrideConfig: { + SubnetId: 'subnet-override', + }, + }); + + expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { + LaunchTemplateConfigs: [ + { + LaunchTemplateSpecification: { + LaunchTemplateName: 'lt-1', + Version: '$Default', + }, + Overrides: [ + { + InstanceType: 'm5.large', + SubnetId: 'subnet-override', + }, + { + InstanceType: 'c5.large', + SubnetId: 'subnet-override', + }, + ], + }, + ], + SpotOptions: { + AllocationStrategy: SpotAllocationStrategy.CAPACITY_OPTIMIZED, + }, + TagSpecifications: expect.any(Array), + TargetCapacitySpecification: { + DefaultTargetCapacityType: 'spot', + TotalTargetCapacity: 1, + }, + Type: 'instant', + }); + }); + + it('overrides InstanceType when specified in ec2OverrideConfig', async () => { + await createRunner({ + ...createRunnerConfig(defaultRunnerConfig), + ec2OverrideConfig: { + InstanceType: 't3.xlarge', + }, + }); + + expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { + LaunchTemplateConfigs: [ + { + LaunchTemplateSpecification: { + LaunchTemplateName: 'lt-1', + Version: '$Default', + }, + Overrides: [ + { + InstanceType: 't3.xlarge', + SubnetId: 'subnet-123', + }, + { + InstanceType: 't3.xlarge', + SubnetId: 'subnet-456', + }, + ], + }, + ], + SpotOptions: { + AllocationStrategy: SpotAllocationStrategy.CAPACITY_OPTIMIZED, + }, + TagSpecifications: expect.any(Array), + TargetCapacitySpecification: { + DefaultTargetCapacityType: 'spot', + TotalTargetCapacity: 1, + }, + Type: 'instant', + }); + }); + + it('overrides ImageId when specified in ec2OverrideConfig', async () => { + await createRunner({ + ...createRunnerConfig(defaultRunnerConfig), + ec2OverrideConfig: { + ImageId: 'ami-override-123', + }, + }); + + 
expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { + LaunchTemplateConfigs: [ + { + LaunchTemplateSpecification: { + LaunchTemplateName: 'lt-1', + Version: '$Default', + }, + Overrides: [ + { + InstanceType: 'm5.large', + SubnetId: 'subnet-123', + ImageId: 'ami-override-123', + }, + { + InstanceType: 'c5.large', + SubnetId: 'subnet-123', + ImageId: 'ami-override-123', + }, + { + InstanceType: 'm5.large', + SubnetId: 'subnet-456', + ImageId: 'ami-override-123', + }, + { + InstanceType: 'c5.large', + SubnetId: 'subnet-456', + ImageId: 'ami-override-123', + }, + ], + }, + ], + SpotOptions: { + AllocationStrategy: SpotAllocationStrategy.CAPACITY_OPTIMIZED, + }, + TagSpecifications: expect.any(Array), + TargetCapacitySpecification: { + DefaultTargetCapacityType: 'spot', + TotalTargetCapacity: 1, + }, + Type: 'instant', + }); + }); + + it('overrides all three fields (SubnetId, InstanceType, ImageId) when specified in ec2OverrideConfig', async () => { + await createRunner({ + ...createRunnerConfig(defaultRunnerConfig), + ec2OverrideConfig: { + SubnetId: 'subnet-custom', + InstanceType: 'c5.2xlarge', + ImageId: 'ami-custom-456', + }, + }); + + expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { + LaunchTemplateConfigs: [ + { + LaunchTemplateSpecification: { + LaunchTemplateName: 'lt-1', + Version: '$Default', + }, + Overrides: [ + { + InstanceType: 'c5.2xlarge', + SubnetId: 'subnet-custom', + ImageId: 'ami-custom-456', + }, + ], + }, + ], + SpotOptions: { + AllocationStrategy: SpotAllocationStrategy.CAPACITY_OPTIMIZED, + }, + TagSpecifications: expect.any(Array), + TargetCapacitySpecification: { + DefaultTargetCapacityType: 'spot', + TotalTargetCapacity: 1, + }, + Type: 'instant', + }); + }); + + it('spreads additional ec2OverrideConfig properties to Overrides', async () => { + await createRunner({ + ...createRunnerConfig(defaultRunnerConfig), + ec2OverrideConfig: { + SubnetId: 'subnet-override', + InstanceType: 't3.medium', + MaxPrice: 
'0.05', + Priority: 1.5, + WeightedCapacity: 2.0, + }, + }); + + expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { + LaunchTemplateConfigs: [ + { + LaunchTemplateSpecification: { + LaunchTemplateName: 'lt-1', + Version: '$Default', + }, + Overrides: [ + { + InstanceType: 't3.medium', + SubnetId: 'subnet-override', + MaxPrice: '0.05', + Priority: 1.5, + WeightedCapacity: 2.0, + }, + ], + }, + ], + SpotOptions: { + AllocationStrategy: SpotAllocationStrategy.CAPACITY_OPTIMIZED, + }, + TagSpecifications: expect.any(Array), + TargetCapacitySpecification: { + DefaultTargetCapacityType: 'spot', + TotalTargetCapacity: 1, + }, + Type: 'instant', + }); + }); }); describe('create runner with errors', () => { @@ -546,6 +756,7 @@ describe('create runner with errors fail over to OnDemand', () => { capacityType: 'spot', type: 'Repo', onDemandFailoverOnError: ['InsufficientInstanceCapacity'], + scaleErrors: ['UnfulfillableCapacity', 'MaxSpotInstanceCountExceeded'], }; const defaultExpectedFleetRequestValues: ExpectedFleetRequestValues = { type: 'Repo', diff --git a/lambdas/functions/control-plane/src/aws/runners.ts b/lambdas/functions/control-plane/src/aws/runners.ts index 7f7f5750bf..b7825a0314 100644 --- a/lambdas/functions/control-plane/src/aws/runners.ts +++ b/lambdas/functions/control-plane/src/aws/runners.ts @@ -125,14 +125,22 @@ function generateFleetOverrides( subnetIds: string[], instancesTypes: string[], amiId?: string, + ec2OverrideConfig?: Runners.Ec2OverrideConfig, ): FleetLaunchTemplateOverridesRequest[] { const result: FleetLaunchTemplateOverridesRequest[] = []; - subnetIds.forEach((s) => { - instancesTypes.forEach((i) => { + + // Use override values if available, otherwise use parameter arrays + const subnetsToUse = ec2OverrideConfig?.SubnetId ? [ec2OverrideConfig.SubnetId] : subnetIds; + const instanceTypesToUse = ec2OverrideConfig?.InstanceType ? 
[ec2OverrideConfig.InstanceType] : instancesTypes; + const amiIdToUse = ec2OverrideConfig?.ImageId ?? amiId; + + subnetsToUse.forEach((s) => { + instanceTypesToUse.forEach((i) => { const item: FleetLaunchTemplateOverridesRequest = { SubnetId: s, InstanceType: i as _InstanceType, - ImageId: amiId, + ImageId: amiIdToUse, + ...ec2OverrideConfig, }; result.push(item); }); @@ -265,6 +273,7 @@ async function createInstances( runnerParameters.subnets, runnerParameters.ec2instanceCriteria.instanceTypes, amiIdOverride, + runnerParameters.ec2OverrideConfig, ), }, ], diff --git a/lambdas/functions/control-plane/src/scale-runners/ScaleError.test.ts b/lambdas/functions/control-plane/src/scale-runners/ScaleError.test.ts index 0a7478c12f..8490a80447 100644 --- a/lambdas/functions/control-plane/src/scale-runners/ScaleError.test.ts +++ b/lambdas/functions/control-plane/src/scale-runners/ScaleError.test.ts @@ -23,10 +23,42 @@ describe('ScaleError', () => { describe('toBatchItemFailures', () => { const mockMessages: ActionRequestMessageSQS[] = [ - { messageId: 'msg-1', id: 1, eventType: 'workflow_job' }, - { messageId: 'msg-2', id: 2, eventType: 'workflow_job' }, - { messageId: 'msg-3', id: 3, eventType: 'workflow_job' }, - { messageId: 'msg-4', id: 4, eventType: 'workflow_job' }, + { + messageId: 'msg-1', + id: 1, + eventType: 'workflow_job', + repositoryName: 'repo', + repositoryOwner: 'owner', + installationId: 123, + repoOwnerType: 'Organization', + }, + { + messageId: 'msg-2', + id: 2, + eventType: 'workflow_job', + repositoryName: 'repo', + repositoryOwner: 'owner', + installationId: 123, + repoOwnerType: 'Organization', + }, + { + messageId: 'msg-3', + id: 3, + eventType: 'workflow_job', + repositoryName: 'repo', + repositoryOwner: 'owner', + installationId: 123, + repoOwnerType: 'Organization', + }, + { + messageId: 'msg-4', + id: 4, + eventType: 'workflow_job', + repositoryName: 'repo', + repositoryOwner: 'owner', + installationId: 123, + repoOwnerType: 'Organization', + }, 
]; it.each([ diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts b/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts index 458d89763e..edfbff5e4b 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts @@ -571,6 +571,372 @@ describe('scaleUp with GHES', () => { 10000, ); }); + + describe('Dynamic EC2 Configuration', () => { + beforeEach(() => { + process.env.ENABLE_ORGANIZATION_RUNNERS = 'true'; + process.env.ENABLE_DYNAMIC_LABELS = 'true'; + process.env.ENABLE_EPHEMERAL_RUNNERS = 'true'; + process.env.ENABLE_JOB_QUEUED_CHECK = 'false'; + process.env.RUNNER_LABELS = 'base-label'; + process.env.INSTANCE_TYPES = 't3.medium,t3.large'; + process.env.RUNNER_NAME_PREFIX = 'unit-test'; + expectedRunnerParams = { ...EXPECTED_RUNNER_PARAMS }; + mockSSMClient.reset(); + }); + + it('appends EC2 labels to existing runner labels when EC2 labels are present', async () => { + const testDataWithEc2Labels = [ + { + ...TEST_DATA_SINGLE, + labels: ['ghr-ec2-instance-type:c5.2xlarge', 'ghr-ec2-custom:value'], + messageId: 'test-1', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithEc2Labels); + + // Verify createRunner was called with EC2 instance type in override config + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2instanceCriteria: expect.objectContaining({ + instanceTypes: ['t3.medium', 't3.large'], + }), + ec2OverrideConfig: expect.objectContaining({ + InstanceType: 'c5.2xlarge', + }), + }), + ); + }); + + it('uses default instance types when no instance type EC2 label is provided', async () => { + const testDataWithEc2Labels = [ + { + ...TEST_DATA_SINGLE, + labels: ['ghr-ec2-custom:value'], + messageId: 'test-3', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithEc2Labels); + + // Should use the default INSTANCE_TYPES from environment + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + 
ec2instanceCriteria: expect.objectContaining({ + instanceTypes: ['t3.medium', 't3.large'], + }), + }), + ); + }); + + it('handles messages with no labels gracefully', async () => { + const testDataWithNoLabels = [ + { + ...TEST_DATA_SINGLE, + labels: undefined, + messageId: 'test-5', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithNoLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2instanceCriteria: expect.objectContaining({ + instanceTypes: ['t3.medium', 't3.large'], + }), + }), + ); + }); + + it('handles empty labels array', async () => { + const testDataWithEmptyLabels = [ + { + ...TEST_DATA_SINGLE, + labels: [], + messageId: 'test-6', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithEmptyLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2instanceCriteria: expect.objectContaining({ + instanceTypes: ['t3.medium', 't3.large'], + }), + }), + ); + }); + + it('does not process EC2 labels when ENABLE_DYNAMIC_LABELS is disabled', async () => { + process.env.ENABLE_DYNAMIC_LABELS = 'false'; + + const testDataWithEc2Labels = [ + { + ...TEST_DATA_SINGLE, + labels: ['ghr-ec2-instance-type:c5.4xlarge'], + messageId: 'test-7', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithEc2Labels); + + // Should ignore EC2 labels and use default instance types + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2instanceCriteria: expect.objectContaining({ + instanceTypes: ['t3.medium', 't3.large'], + }), + }), + ); + }); + + it('handles multiple EC2 labels correctly', async () => { + const testDataWithMultipleEc2Labels = [ + { + ...TEST_DATA_SINGLE, + labels: ['regular-label', 'ghr-ec2-instance-type:r5.2xlarge', 'ghr-ec2-ami:custom-ami', 'ghr-ec2-disk:200'], + messageId: 'test-8', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithMultipleEc2Labels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2instanceCriteria: expect.objectContaining({ + instanceTypes: 
['t3.medium', 't3.large'], + }), + ec2OverrideConfig: expect.objectContaining({ + InstanceType: 'r5.2xlarge', + }), + }), + ); + }); + + it('includes ec2OverrideConfig with VCpuCount requirements when specified', async () => { + const testDataWithVCpuLabels = [ + { + ...TEST_DATA_SINGLE, + labels: ['self-hosted', 'ghr-ec2-vcpu-count-min:4', 'ghr-ec2-vcpu-count-max:16'], + messageId: 'test-9', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithVCpuLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + InstanceRequirements: expect.objectContaining({ + VCpuCount: { + Min: 4, + Max: 16, + }, + }), + }), + }), + ); + }); + + it('includes ec2OverrideConfig with MemoryMiB requirements when specified', async () => { + const testDataWithMemoryLabels = [ + { + ...TEST_DATA_SINGLE, + labels: ['self-hosted', 'ghr-ec2-memory-mib-min:8192', 'ghr-ec2-memory-mib-max:32768'], + messageId: 'test-10', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithMemoryLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + InstanceRequirements: expect.objectContaining({ + MemoryMiB: { + Min: 8192, + Max: 32768, + }, + }), + }), + }), + ); + }); + + it('includes ec2OverrideConfig with CPU manufacturers when specified', async () => { + const testDataWithCpuLabels = [ + { + ...TEST_DATA_SINGLE, + labels: ['self-hosted', 'ghr-ec2-cpu-manufacturers:intel,amd'], + messageId: 'test-11', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithCpuLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + InstanceRequirements: expect.objectContaining({ + CpuManufacturers: ['intel', 'amd'], + }), + }), + }), + ); + }); + + it('includes ec2OverrideConfig with instance generations when specified', async () => { + const testDataWithGenerationLabels = [ + { + ...TEST_DATA_SINGLE, + labels: 
['self-hosted', 'ghr-ec2-instance-generations:current'], + messageId: 'test-12', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithGenerationLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + InstanceRequirements: expect.objectContaining({ + InstanceGenerations: ['current'], + }), + }), + }), + ); + }); + + it('includes ec2OverrideConfig with accelerator requirements when specified', async () => { + const testDataWithAcceleratorLabels = [ + { + ...TEST_DATA_SINGLE, + labels: ['self-hosted', 'ghr-ec2-accelerator-count-min:1', 'ghr-ec2-accelerator-types:gpu'], + messageId: 'test-13', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithAcceleratorLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + InstanceRequirements: expect.objectContaining({ + AcceleratorCount: { + Min: 1, + }, + AcceleratorTypes: ['gpu'], + }), + }), + }), + ); + }); + + it('includes ec2OverrideConfig with max price when specified', async () => { + const testDataWithMaxPrice = [ + { + ...TEST_DATA_SINGLE, + labels: ['self-hosted', 'ghr-ec2-max-price:0.50'], + messageId: 'test-14', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithMaxPrice); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + MaxPrice: '0.50', + }), + }), + ); + }); + + it('includes ec2OverrideConfig with priority and weighted capacity when specified', async () => { + const testDataWithPriorityWeight = [ + { + ...TEST_DATA_SINGLE, + labels: ['self-hosted', 'ghr-ec2-priority:1', 'ghr-ec2-weighted-capacity:2'], + messageId: 'test-15', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithPriorityWeight); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + Priority: 1, + WeightedCapacity: 2, + }), + }), + ); + }); + + it('includes 
ec2OverrideConfig with combined requirements', async () => { + const testDataWithCombinedLabels = [ + { + ...TEST_DATA_SINGLE, + labels: [ + 'self-hosted', + 'linux', + 'ghr-ec2-vcpu-count-min:8', + 'ghr-ec2-memory-mib-min:16384', + 'ghr-ec2-cpu-manufacturers:intel', + 'ghr-ec2-instance-generations:current', + 'ghr-ec2-max-price:1.00', + ], + messageId: 'test-16', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithCombinedLabels); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2OverrideConfig: expect.objectContaining({ + InstanceRequirements: expect.objectContaining({ + VCpuCount: { Min: 8 }, + MemoryMiB: { Min: 16384 }, + CpuManufacturers: ['intel'], + InstanceGenerations: ['current'], + }), + MaxPrice: '1.00', + }), + }), + ); + }); + + it('includes both instance type and ec2OverrideConfig when both specified', async () => { + const testDataWithBoth = [ + { + ...TEST_DATA_SINGLE, + labels: ['self-hosted', 'ghr-ec2-instance-type:c5.xlarge', 'ghr-ec2-vcpu-count-min:4'], + messageId: 'test-18', + }, + ]; + + await scaleUpModule.scaleUp(testDataWithBoth); + + expect(createRunner).toBeCalledWith( + expect.objectContaining({ + ec2instanceCriteria: expect.objectContaining({ + instanceTypes: ['t3.medium', 't3.large'], + }), + ec2OverrideConfig: expect.objectContaining({ + InstanceType: 'c5.xlarge', + InstanceRequirements: expect.objectContaining({ + VCpuCount: { Min: 4 }, + }), + }), + }), + ); + }); + }); + describe('on repo level', () => { beforeEach(() => { process.env.ENABLE_ORGANIZATION_RUNNERS = 'false'; @@ -2018,6 +2384,705 @@ describe('Retry mechanism tests', () => { }); }); +describe('parseEc2OverrideConfig', () => { + describe('Basic Fleet Overrides', () => { + it('should parse instance-type label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-instance-type:c5.xlarge']); + expect(result?.InstanceType).toBe('c5.xlarge'); + }); + + it('should parse subnet-id label', () => { + const result = 
scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-subnet-id:subnet-123456']); + expect(result?.SubnetId).toBe('subnet-123456'); + }); + + it('should parse availability-zone label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-availability-zone:us-east-1a']); + expect(result?.AvailabilityZone).toBe('us-east-1a'); + }); + + it('should parse availability-zone-id label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-availability-zone-id:use1-az1']); + expect(result?.AvailabilityZoneId).toBe('use1-az1'); + }); + + it('should parse max-price label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-max-price:0.50']); + expect(result?.MaxPrice).toBe('0.50'); + }); + + it('should parse priority label as number', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-priority:1']); + expect(result?.Priority).toBe(1); + }); + + it('should parse weighted-capacity label as number', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-weighted-capacity:2']); + expect(result?.WeightedCapacity).toBe(2); + }); + + it('should parse image-id label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-image-id:ami-12345678']); + expect(result?.ImageId).toBe('ami-12345678'); + }); + + it('should parse multiple basic fleet overrides', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-instance-type:r5.2xlarge', + 'ghr-ec2-max-price:1.00', + 'ghr-ec2-priority:2', + ]); + expect(result?.InstanceType).toBe('r5.2xlarge'); + expect(result?.MaxPrice).toBe('1.00'); + expect(result?.Priority).toBe(2); + }); + }); + + describe('Placement', () => { + it('should parse placement-group label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-placement-group:my-placement-group']); + expect(result?.Placement?.GroupName).toBe('my-placement-group'); + }); + + it('should parse placement-tenancy label', () => { + const result = 
scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-placement-tenancy:dedicated']); + expect(result?.Placement?.Tenancy).toBe('dedicated'); + }); + + it('should parse placement-host-id label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-placement-host-id:h-1234567890abcdef']); + expect(result?.Placement?.HostId).toBe('h-1234567890abcdef'); + }); + + it('should parse placement-affinity label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-placement-affinity:host']); + expect(result?.Placement?.Affinity).toBe('host'); + }); + + it('should parse placement-partition-number label as number', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-placement-partition-number:3']); + expect(result?.Placement?.PartitionNumber).toBe(3); + }); + + it('should parse placement-availability-zone label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-placement-availability-zone:us-west-2b']); + expect(result?.Placement?.AvailabilityZone).toBe('us-west-2b'); + }); + + it('should parse placement-spread-domain label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-placement-spread-domain:my-spread-domain']); + expect(result?.Placement?.SpreadDomain).toBe('my-spread-domain'); + }); + + it('should parse placement-host-resource-group-arn label', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-placement-host-resource-group-arn:arn:aws:ec2:us-east-1:123456789012:host-resource-group/hrg-1234', + ]); + expect(result?.Placement?.HostResourceGroupArn).toBe( + 'arn:aws:ec2:us-east-1:123456789012:host-resource-group/hrg-1234', + ); + }); + + it('should parse multiple placement labels', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-placement-group:group-1', + 'ghr-ec2-placement-tenancy:dedicated', + 'ghr-ec2-placement-availability-zone:us-east-1b', + ]); + expect(result?.Placement?.GroupName).toBe('group-1'); + 
expect(result?.Placement?.Tenancy).toBe('dedicated'); + expect(result?.Placement?.AvailabilityZone).toBe('us-east-1b'); + }); + }); + + describe('Block Device Mappings', () => { + it('should parse ebs-volume-size label as number', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-volume-size:100']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.VolumeSize).toBe(100); + }); + + it('should parse ebs-volume-type label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-volume-type:gp3']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.VolumeType).toBe('gp3'); + }); + + it('should parse ebs-iops label as number', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-iops:3000']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Iops).toBe(3000); + }); + + it('should parse ebs-throughput label as number', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-throughput:250']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Throughput).toBe(250); + }); + + it('should parse ebs-encrypted label as boolean true', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-encrypted:true']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Encrypted).toBe(true); + }); + + it('should parse ebs-encrypted label as boolean false', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-encrypted:false']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Encrypted).toBe(false); + }); + + it('should parse ebs-kms-key-id label', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-ebs-kms-key-id:arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012', + ]); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.KmsKeyId).toBe( + 'arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012', + ); + }); + + it('should parse ebs-delete-on-termination label as boolean true', () => { + const result = 
scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-delete-on-termination:true']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.DeleteOnTermination).toBe(true); + }); + + it('should parse ebs-delete-on-termination label as boolean false', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-delete-on-termination:false']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.DeleteOnTermination).toBe(false); + }); + + it('should parse ebs-snapshot-id label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-snapshot-id:snap-1234567890abcdef']); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.SnapshotId).toBe('snap-1234567890abcdef'); + }); + + it('should parse block-device-virtual-name label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-block-device-virtual-name:ephemeral0']); + expect(result?.BlockDeviceMappings?.[0]?.VirtualName).toBe('ephemeral0'); + }); + + it('should parse block-device-no-device label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-block-device-no-device:true']); + expect(result?.BlockDeviceMappings?.[0]?.NoDevice).toBe('true'); + }); + + it('should parse multiple block device mapping labels', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-ebs-volume-size:200', + 'ghr-ec2-ebs-volume-type:gp3', + 'ghr-ec2-ebs-iops:5000', + 'ghr-ec2-ebs-encrypted:true', + ]); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.VolumeSize).toBe(200); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.VolumeType).toBe('gp3'); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Iops).toBe(5000); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Encrypted).toBe(true); + }); + + it('should initialize BlockDeviceMappings when not present', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-ebs-volume-size:50']); + expect(result?.BlockDeviceMappings).toBeDefined(); + // 
expect(result?.BlockDeviceMappings?.[0]?.DeviceName).toBe('/dev/sda1'); + }); + }); + + describe('Instance Requirements - vCPU and Memory', () => { + it('should parse vcpu-count-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-vcpu-count-min:4']); + expect(result?.InstanceRequirements?.VCpuCount?.Min).toBe(4); + }); + + it('should parse vcpu-count-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-vcpu-count-max:16']); + expect(result?.InstanceRequirements?.VCpuCount?.Max).toBe(16); + }); + + it('should parse both vcpu-count-min and vcpu-count-max labels', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-vcpu-count-min:2', 'ghr-ec2-vcpu-count-max:8']); + expect(result?.InstanceRequirements?.VCpuCount?.Min).toBe(2); + expect(result?.InstanceRequirements?.VCpuCount?.Max).toBe(8); + }); + + it('should parse memory-mib-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-memory-mib-min:8192']); + expect(result?.InstanceRequirements?.MemoryMiB?.Min).toBe(8192); + }); + + it('should parse memory-mib-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-memory-mib-max:32768']); + expect(result?.InstanceRequirements?.MemoryMiB?.Max).toBe(32768); + }); + + it('should parse both memory-mib-min and memory-mib-max labels', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-memory-mib-min:16384', + 'ghr-ec2-memory-mib-max:65536', + ]); + expect(result?.InstanceRequirements?.MemoryMiB?.Min).toBe(16384); + expect(result?.InstanceRequirements?.MemoryMiB?.Max).toBe(65536); + }); + + it('should parse memory-gib-per-vcpu-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-memory-gib-per-vcpu-min:2']); + expect(result?.InstanceRequirements?.MemoryGiBPerVCpu?.Min).toBe(2); + }); + + it('should parse memory-gib-per-vcpu-max label', () => { + const result = 
scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-memory-gib-per-vcpu-max:8']); + expect(result?.InstanceRequirements?.MemoryGiBPerVCpu?.Max).toBe(8); + }); + + it('should parse combined vCPU and memory requirements', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-vcpu-count-min:8', + 'ghr-ec2-vcpu-count-max:32', + 'ghr-ec2-memory-mib-min:32768', + 'ghr-ec2-memory-mib-max:131072', + ]); + expect(result?.InstanceRequirements?.VCpuCount?.Min).toBe(8); + expect(result?.InstanceRequirements?.VCpuCount?.Max).toBe(32); + expect(result?.InstanceRequirements?.MemoryMiB?.Min).toBe(32768); + expect(result?.InstanceRequirements?.MemoryMiB?.Max).toBe(131072); + }); + }); + + describe('Instance Requirements - CPU and Performance', () => { + it('should parse cpu-manufacturers as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-cpu-manufacturers:intel']); + expect(result?.InstanceRequirements?.CpuManufacturers).toEqual(['intel']); + }); + + it('should parse cpu-manufacturers as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-cpu-manufacturers:intel,amd']); + expect(result?.InstanceRequirements?.CpuManufacturers).toEqual(['intel', 'amd']); + }); + + it('should parse instance-generations as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-instance-generations:current']); + expect(result?.InstanceRequirements?.InstanceGenerations).toEqual(['current']); + }); + + it('should parse instance-generations as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-instance-generations:current,previous']); + expect(result?.InstanceRequirements?.InstanceGenerations).toEqual(['current', 'previous']); + }); + + it('should parse excluded-instance-types as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-excluded-instance-types:t2.micro']); + 
expect(result?.InstanceRequirements?.ExcludedInstanceTypes).toEqual(['t2.micro']); + }); + + it('should parse excluded-instance-types as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-excluded-instance-types:t2.micro,t2.small']); + expect(result?.InstanceRequirements?.ExcludedInstanceTypes).toEqual(['t2.micro', 't2.small']); + }); + + it('should parse allowed-instance-types as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-allowed-instance-types:c5.xlarge']); + expect(result?.InstanceRequirements?.AllowedInstanceTypes).toEqual(['c5.xlarge']); + }); + + it('should parse allowed-instance-types as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-allowed-instance-types:c5.xlarge,c5.2xlarge']); + expect(result?.InstanceRequirements?.AllowedInstanceTypes).toEqual(['c5.xlarge', 'c5.2xlarge']); + }); + + it('should parse burstable-performance label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-burstable-performance:included']); + expect(result?.InstanceRequirements?.BurstablePerformance).toBe('included'); + }); + + it('should parse bare-metal label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-bare-metal:excluded']); + expect(result?.InstanceRequirements?.BareMetal).toBe('excluded'); + }); + }); + + describe('Instance Requirements - Accelerators', () => { + it('should parse accelerator-count-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-count-min:1']); + expect(result?.InstanceRequirements?.AcceleratorCount?.Min).toBe(1); + }); + + it('should parse accelerator-count-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-count-max:4']); + expect(result?.InstanceRequirements?.AcceleratorCount?.Max).toBe(4); + }); + + it('should parse both accelerator-count-min and accelerator-count-max', () => { + 
const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-accelerator-count-min:1', + 'ghr-ec2-accelerator-count-max:2', + ]); + expect(result?.InstanceRequirements?.AcceleratorCount?.Min).toBe(1); + expect(result?.InstanceRequirements?.AcceleratorCount?.Max).toBe(2); + }); + + it('should parse accelerator-types as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-types:gpu']); + expect(result?.InstanceRequirements?.AcceleratorTypes).toEqual(['gpu']); + }); + + it('should parse accelerator-types as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-types:gpu,fpga']); + expect(result?.InstanceRequirements?.AcceleratorTypes).toEqual(['gpu', 'fpga']); + }); + + it('should parse accelerator-manufacturers as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-manufacturers:nvidia']); + expect(result?.InstanceRequirements?.AcceleratorManufacturers).toEqual(['nvidia']); + }); + + it('should parse accelerator-manufacturers as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-manufacturers:nvidia,amd']); + expect(result?.InstanceRequirements?.AcceleratorManufacturers).toEqual(['nvidia', 'amd']); + }); + + it('should parse accelerator-names as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-names:a100']); + expect(result?.InstanceRequirements?.AcceleratorNames).toEqual(['a100']); + }); + + it('should parse accelerator-names as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-names:a100,v100']); + expect(result?.InstanceRequirements?.AcceleratorNames).toEqual(['a100', 'v100']); + }); + + it('should parse accelerator-total-memory-mib-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-total-memory-mib-min:8192']); 
+ expect(result?.InstanceRequirements?.AcceleratorTotalMemoryMiB?.Min).toBe(8192); + }); + + it('should parse accelerator-total-memory-mib-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-accelerator-total-memory-mib-max:40960']); + expect(result?.InstanceRequirements?.AcceleratorTotalMemoryMiB?.Max).toBe(40960); + }); + + it('should parse combined accelerator requirements', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-accelerator-count-min:1', + 'ghr-ec2-accelerator-count-max:2', + 'ghr-ec2-accelerator-types:gpu', + 'ghr-ec2-accelerator-manufacturers:nvidia', + ]); + expect(result?.InstanceRequirements?.AcceleratorCount?.Min).toBe(1); + expect(result?.InstanceRequirements?.AcceleratorCount?.Max).toBe(2); + expect(result?.InstanceRequirements?.AcceleratorTypes).toEqual(['gpu']); + expect(result?.InstanceRequirements?.AcceleratorManufacturers).toEqual(['nvidia']); + }); + }); + + describe('Instance Requirements - Network and Storage', () => { + it('should parse network-interface-count-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-network-interface-count-min:2']); + expect(result?.InstanceRequirements?.NetworkInterfaceCount?.Min).toBe(2); + }); + + it('should parse network-interface-count-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-network-interface-count-max:4']); + expect(result?.InstanceRequirements?.NetworkInterfaceCount?.Max).toBe(4); + }); + + it('should parse network-bandwidth-gbps-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-network-bandwidth-gbps-min:5']); + expect(result?.InstanceRequirements?.NetworkBandwidthGbps?.Min).toBe(5); + }); + + it('should parse network-bandwidth-gbps-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-network-bandwidth-gbps-max:25']); + expect(result?.InstanceRequirements?.NetworkBandwidthGbps?.Max).toBe(25); + }); + + 
it('should parse local-storage label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-local-storage:included']); + expect(result?.InstanceRequirements?.LocalStorage).toBe('included'); + }); + + it('should parse local-storage-types as single value', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-local-storage-types:ssd']); + expect(result?.InstanceRequirements?.LocalStorageTypes).toEqual(['ssd']); + }); + + it('should parse local-storage-types as comma-separated list', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-local-storage-types:hdd,ssd']); + expect(result?.InstanceRequirements?.LocalStorageTypes).toEqual(['hdd', 'ssd']); + }); + + it('should parse total-local-storage-gb-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-total-local-storage-gb-min:100']); + expect(result?.InstanceRequirements?.TotalLocalStorageGB?.Min).toBe(100); + }); + + it('should parse total-local-storage-gb-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-total-local-storage-gb-max:1000']); + expect(result?.InstanceRequirements?.TotalLocalStorageGB?.Max).toBe(1000); + }); + + it('should parse baseline-ebs-bandwidth-mbps-min label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-baseline-ebs-bandwidth-mbps-min:500']); + expect(result?.InstanceRequirements?.BaselineEbsBandwidthMbps?.Min).toBe(500); + }); + + it('should parse baseline-ebs-bandwidth-mbps-max label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-baseline-ebs-bandwidth-mbps-max:2000']); + expect(result?.InstanceRequirements?.BaselineEbsBandwidthMbps?.Max).toBe(2000); + }); + }); + + describe('Instance Requirements - Pricing and Other', () => { + it('should parse spot-max-price-percentage-over-lowest-price label', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-spot-max-price-percentage-over-lowest-price:50']); + 
expect(result?.InstanceRequirements?.SpotMaxPricePercentageOverLowestPrice).toBe(50); + }); + + it('should parse on-demand-max-price-percentage-over-lowest-price label', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-on-demand-max-price-percentage-over-lowest-price:75', + ]); + expect(result?.InstanceRequirements?.OnDemandMaxPricePercentageOverLowestPrice).toBe(75); + }); + + it('should parse max-spot-price-as-percentage-of-optimal-on-demand-price label', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-max-spot-price-as-percentage-of-optimal-on-demand-price:60', + ]); + expect(result?.InstanceRequirements?.MaxSpotPriceAsPercentageOfOptimalOnDemandPrice).toBe(60); + }); + + it('should parse require-hibernate-support label as boolean true', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-require-hibernate-support:true']); + expect(result?.InstanceRequirements?.RequireHibernateSupport).toBe(true); + }); + + it('should parse require-hibernate-support label as boolean false', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-require-hibernate-support:false']); + expect(result?.InstanceRequirements?.RequireHibernateSupport).toBe(false); + }); + + it('should parse require-encryption-in-transit label as boolean true', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-require-encryption-in-transit:true']); + expect(result?.InstanceRequirements?.RequireEncryptionInTransit).toBe(true); + }); + + it('should parse require-encryption-in-transit label as boolean false', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-require-encryption-in-transit:false']); + expect(result?.InstanceRequirements?.RequireEncryptionInTransit).toBe(false); + }); + + it('should parse baseline-performance-factors-cpu-reference-families label', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 
'ghr-ec2-baseline-performance-factors-cpu-reference-families:intel', + ]); + expect(result?.InstanceRequirements?.BaselinePerformanceFactors?.Cpu?.References?.[0]?.InstanceFamily).toBe( + 'intel', + ); + }); + it('should parse baseline-performance-factors-cpu-reference-families list label', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-baseline-performance-factors-cpu-reference-families:intel,amd', + ]); + expect(result?.InstanceRequirements?.BaselinePerformanceFactors?.Cpu?.References?.[0]?.InstanceFamily).toBe( + 'intel', + ); + expect(result?.InstanceRequirements?.BaselinePerformanceFactors?.Cpu?.References?.[1]?.InstanceFamily).toBe( + 'amd', + ); + }); + }); + + describe('Edge Cases', () => { + it('should return undefined when empty array is provided', () => { + const result = scaleUpModule.parseEc2OverrideConfig([]); + expect(result).toBeUndefined(); + }); + + it('should return undefined when no ghr-ec2 labels are provided', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['self-hosted', 'linux', 'x64']); + expect(result).toBeUndefined(); + }); + + it('should ignore non-ghr-ec2 labels and only parse ghr-ec2 labels', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'self-hosted', + 'ghr-ec2-instance-type:m5.large', + 'linux', + 'ghr-ec2-max-price:0.30', + ]); + expect(result?.InstanceType).toBe('m5.large'); + expect(result?.MaxPrice).toBe('0.30'); + }); + + it('should handle labels with colons in values (ARNs)', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-ebs-kms-key-id:arn:aws:kms:us-east-1:123456789012:key/abc-def-ghi', + ]); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.KmsKeyId).toBe( + 'arn:aws:kms:us-east-1:123456789012:key/abc-def-ghi', + ); + }); + + it('should handle labels with colons in placement ARNs', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 
'ghr-ec2-placement-host-resource-group-arn:arn:aws:ec2:us-west-2:123456789012:host-resource-group/hrg-abc123', + ]); + expect(result?.Placement?.HostResourceGroupArn).toBe( + 'arn:aws:ec2:us-west-2:123456789012:host-resource-group/hrg-abc123', + ); + }); + + it('should handle labels without values gracefully', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-instance-type:', 'ghr-ec2-max-price:0.50']); + expect(result?.InstanceType).toBeUndefined(); + expect(result?.MaxPrice).toBe('0.50'); + }); + + it('should handle malformed labels (no colon) gracefully', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-instance-type-m5-large', 'ghr-ec2-max-price:0.50']); + expect(result?.MaxPrice).toBe('0.50'); + expect(result?.InstanceType).toBeUndefined(); + }); + + it('should handle numeric strings correctly for number fields', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-priority:5', + 'ghr-ec2-weighted-capacity:10', + 'ghr-ec2-vcpu-count-min:4', + ]); + expect(result?.Priority).toBe(5); + expect(result?.WeightedCapacity).toBe(10); + expect(result?.InstanceRequirements?.VCpuCount?.Min).toBe(4); + }); + + it('should handle boolean strings correctly for boolean fields', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-ebs-encrypted:true', + 'ghr-ec2-ebs-delete-on-termination:false', + 'ghr-ec2-require-hibernate-support:true', + ]); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Encrypted).toBe(true); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.DeleteOnTermination).toBe(false); + expect(result?.InstanceRequirements?.RequireHibernateSupport).toBe(true); + }); + + it('should handle floating point numbers in max-price', () => { + const result = scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-max-price:0.12345']); + expect(result?.MaxPrice).toBe('0.12345'); + }); + + it('should handle whitespace in comma-separated lists', () => { + const result = 
scaleUpModule.parseEc2OverrideConfig(['ghr-ec2-cpu-manufacturers: intel , amd ']); + expect(result?.InstanceRequirements?.CpuManufacturers).toEqual([' intel ', ' amd ']); + }); + + it('should return config with all parsed labels', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-instance-type:c5.xlarge', + 'ghr-ec2-vcpu-count-min:4', + 'ghr-ec2-memory-mib-min:8192', + 'ghr-ec2-placement-tenancy:dedicated', + 'ghr-ec2-ebs-volume-size:100', + ]); + expect(result?.InstanceType).toBe('c5.xlarge'); + expect(result?.InstanceRequirements?.VCpuCount?.Min).toBe(4); + expect(result?.InstanceRequirements?.MemoryMiB?.Min).toBe(8192); + expect(result?.Placement?.Tenancy).toBe('dedicated'); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.VolumeSize).toBe(100); + }); + }); + + describe('Complex Scenarios', () => { + it('should handle comprehensive EC2 configuration with all categories', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + // Basic Fleet + 'ghr-ec2-instance-type:r5.2xlarge', + 'ghr-ec2-max-price:0.75', + 'ghr-ec2-priority:1', + // Placement + 'ghr-ec2-placement-group:my-group', + 'ghr-ec2-placement-tenancy:dedicated', + // Block Device + 'ghr-ec2-ebs-volume-size:200', + 'ghr-ec2-ebs-volume-type:gp3', + 'ghr-ec2-ebs-encrypted:true', + // Instance Requirements + 'ghr-ec2-vcpu-count-min:8', + 'ghr-ec2-vcpu-count-max:32', + 'ghr-ec2-memory-mib-min:32768', + 'ghr-ec2-cpu-manufacturers:intel,amd', + 'ghr-ec2-instance-generations:current', + ]); + + expect(result?.InstanceType).toBe('r5.2xlarge'); + expect(result?.MaxPrice).toBe('0.75'); + expect(result?.Priority).toBe(1); + expect(result?.Placement?.GroupName).toBe('my-group'); + expect(result?.Placement?.Tenancy).toBe('dedicated'); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.VolumeSize).toBe(200); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.VolumeType).toBe('gp3'); + expect(result?.BlockDeviceMappings?.[0]?.Ebs?.Encrypted).toBe(true); + 
expect(result?.InstanceRequirements?.VCpuCount?.Min).toBe(8); + expect(result?.InstanceRequirements?.VCpuCount?.Max).toBe(32); + expect(result?.InstanceRequirements?.MemoryMiB?.Min).toBe(32768); + expect(result?.InstanceRequirements?.CpuManufacturers).toEqual(['intel', 'amd']); + expect(result?.InstanceRequirements?.InstanceGenerations).toEqual(['current']); + }); + + it('should handle GPU instance configuration', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-accelerator-count-min:1', + 'ghr-ec2-accelerator-count-max:4', + 'ghr-ec2-accelerator-types:gpu', + 'ghr-ec2-accelerator-manufacturers:nvidia', + 'ghr-ec2-accelerator-names:a100,v100', + 'ghr-ec2-accelerator-total-memory-mib-min:16384', + ]); + + expect(result?.InstanceRequirements?.AcceleratorCount?.Min).toBe(1); + expect(result?.InstanceRequirements?.AcceleratorCount?.Max).toBe(4); + expect(result?.InstanceRequirements?.AcceleratorTypes).toEqual(['gpu']); + expect(result?.InstanceRequirements?.AcceleratorManufacturers).toEqual(['nvidia']); + expect(result?.InstanceRequirements?.AcceleratorNames).toEqual(['a100', 'v100']); + expect(result?.InstanceRequirements?.AcceleratorTotalMemoryMiB?.Min).toBe(16384); + }); + + it('should handle network-optimized instance configuration', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-network-interface-count-min:2', + 'ghr-ec2-network-interface-count-max:8', + 'ghr-ec2-network-bandwidth-gbps-min:10', + 'ghr-ec2-network-bandwidth-gbps-max:100', + 'ghr-ec2-baseline-ebs-bandwidth-mbps-min:1000', + ]); + + expect(result?.InstanceRequirements?.NetworkInterfaceCount?.Min).toBe(2); + expect(result?.InstanceRequirements?.NetworkInterfaceCount?.Max).toBe(8); + expect(result?.InstanceRequirements?.NetworkBandwidthGbps?.Min).toBe(10); + expect(result?.InstanceRequirements?.NetworkBandwidthGbps?.Max).toBe(100); + expect(result?.InstanceRequirements?.BaselineEbsBandwidthMbps?.Min).toBe(1000); + }); + + it('should handle 
storage-optimized instance configuration', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-local-storage:included', + 'ghr-ec2-local-storage-types:ssd', + 'ghr-ec2-total-local-storage-gb-min:500', + 'ghr-ec2-total-local-storage-gb-max:2000', + ]); + + expect(result?.InstanceRequirements?.LocalStorage).toBe('included'); + expect(result?.InstanceRequirements?.LocalStorageTypes).toEqual(['ssd']); + expect(result?.InstanceRequirements?.TotalLocalStorageGB?.Min).toBe(500); + expect(result?.InstanceRequirements?.TotalLocalStorageGB?.Max).toBe(2000); + }); + + it('should handle spot instance configuration with pricing', () => { + const result = scaleUpModule.parseEc2OverrideConfig([ + 'ghr-ec2-max-price:0.50', + 'ghr-ec2-spot-max-price-percentage-over-lowest-price:100', + 'ghr-ec2-on-demand-max-price-percentage-over-lowest-price:150', + ]); + + expect(result?.MaxPrice).toBe('0.50'); + expect(result?.InstanceRequirements?.SpotMaxPricePercentageOverLowestPrice).toBe(100); + expect(result?.InstanceRequirements?.OnDemandMaxPricePercentageOverLowestPrice).toBe(150); + }); + }); +}); + function defaultOctokitMockImpl() { mockOctokit.actions.getJobForWorkflowRun.mockImplementation(() => ({ data: { diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-up.ts b/lambdas/functions/control-plane/src/scale-runners/scale-up.ts index 759be95089..793d23fee2 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-up.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-up.ts @@ -5,9 +5,39 @@ import yn from 'yn'; import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from '../github/auth'; import { createRunner, listEC2Runners, tag, terminateRunner } from './../aws/runners'; -import { RunnerInputParameters } from './../aws/runners.d'; +import { Ec2OverrideConfig, RunnerInputParameters } from './../aws/runners.d'; import { metricGitHubAppRateLimit } from '../github/rate-limit'; import { 
publishRetryMessage } from './job-retry'; +import { + _InstanceType, + Tenancy, + VolumeType, + CpuManufacturer, + InstanceGeneration, + BurstablePerformance, + BareMetal, + AcceleratorType, + AcceleratorManufacturer, + AcceleratorName, + LocalStorage, + LocalStorageType, + Placement, + BaselinePerformanceFactorsRequest, + FleetEbsBlockDeviceRequest, + CpuPerformanceFactorRequest, + PerformanceFactorReferenceRequest, + FleetBlockDeviceMappingRequest, + InstanceRequirementsRequest, + VCpuCountRangeRequest, + MemoryMiBRequest, + MemoryGiBPerVCpuRequest, + AcceleratorCountRequest, + AcceleratorTotalMemoryMiBRequest, + NetworkInterfaceCountRequest, + NetworkBandwidthGbpsRequest, + TotalLocalStorageGBRequest, + BaselineEbsBandwidthMbpsRequest, +} from '@aws-sdk/client-ec2'; const logger = createChildLogger('scale-up'); @@ -30,6 +60,7 @@ export interface ActionRequestMessage { installationId: number; repoOwnerType: string; retryCounter?: number; + labels?: string[]; } export interface ActionRequestMessageSQS extends ActionRequestMessage { @@ -60,6 +91,7 @@ interface CreateEC2RunnerConfig { subnets: string[]; launchTemplateName: string; ec2instanceCriteria: RunnerInputParameters['ec2instanceCriteria']; + ec2OverrideConfig?: RunnerInputParameters['ec2OverrideConfig']; numberOfRunners?: number; amiIdSsmParameterName?: string; tracingEnabled?: boolean; @@ -291,7 +323,7 @@ export async function scaleUp(payloads: ActionRequestMessageSQS[]): Promise(); const rejectedMessageIds = new Set(); for (const payload of payloads) { - const { eventType, messageId, repositoryName, repositoryOwner } = payload; + const { eventType, messageId, repositoryName, repositoryOwner, labels } = payload; if (ephemeralEnabled && eventType !== 'workflow_job') { logger.warn( 'Event is not supported in combination with ephemeral runners. 
Please ensure you have enabled workflow_job events.', @@ -362,7 +396,19 @@ export async function scaleUp(payloads: ActionRequestMessageSQS[]): Promise l.startsWith('ghr-'))?.slice('ghr-'.length); + + if (dynamicLabels) { + const dynamicLabelsHash = labelsHash(labels); + key = `${key}/${dynamicLabelsHash}`; + } + } let entry = validMessages.get(key); @@ -376,6 +422,7 @@ export async function scaleUp(payloads: ActionRequestMessageSQS[]): Promise 0 && dynamicLabelsEnabled) { + logger.debug('Dynamic EC2 config enabled, processing labels', { labels: messages[0].labels }); + + const dynamicEC2Labels = messages[0].labels?.map((l) => l.trim()).filter((l) => l.startsWith('ghr-ec2-')) ?? []; + const allDynamicLabels = messages[0].labels?.map((l) => l.trim()).filter((l) => l.startsWith('ghr-')) ?? []; + + if (allDynamicLabels.length > 0) { + runnerLabels = runnerLabels ? `${runnerLabels},${allDynamicLabels.join(',')}` : allDynamicLabels.join(','); + + logger.debug('Updated runner labels', { runnerLabels }); + + if (dynamicEC2Labels.length > 0) { + ec2OverrideConfig = parseEc2OverrideConfig(dynamicEC2Labels); + if (ec2OverrideConfig) { + logger.debug('EC2 override config parsed from labels', { + ec2OverrideConfig, + }); + } + } + } else { + logger.debug('No dynamic labels found on message'); + } + } + for (const message of messages) { const messageLogger = logger.createChild({ persistentKeys: { @@ -409,6 +482,7 @@ export async function scaleUp(payloads: ActionRequestMessageSQS[]): Promise - Set specific instance type (e.g., c5.xlarge) + * - ghr-ec2-max-price: - Set maximum spot price + * - ghr-ec2-subnet-id: - Set subnet ID + * - ghr-ec2-availability-zone: - Set availability zone + * - ghr-ec2-availability-zone-id: - Set availability zone ID + * - ghr-ec2-weighted-capacity: - Set weighted capacity + * - ghr-ec2-priority: - Set launch priority + * - ghr-ec2-image-id: - Override AMI ID + * + * Instance Requirements (vCPU & Memory): + * - ghr-ec2-vcpu-count-min: - Set minimum 
vCPU count + * - ghr-ec2-vcpu-count-max: - Set maximum vCPU count + * - ghr-ec2-memory-mib-min: - Set minimum memory in MiB + * - ghr-ec2-memory-mib-max: - Set maximum memory in MiB + * - ghr-ec2-memory-gib-per-vcpu-min: - Set min memory per vCPU ratio + * - ghr-ec2-memory-gib-per-vcpu-max: - Set max memory per vCPU ratio + * + * Instance Requirements (CPU & Performance): + * - ghr-ec2-cpu-manufacturers: - CPU manufacturers (comma-separated: intel,amd,amazon-web-services) + * - ghr-ec2-instance-generations: - Instance generations (comma-separated: current,previous) + * - ghr-ec2-excluded-instance-types: - Exclude instance types (comma-separated) + * - ghr-ec2-allowed-instance-types: - Allow only specific instance types (comma-separated) + * - ghr-ec2-burstable-performance: - Burstable performance (included,excluded,required) + * - ghr-ec2-bare-metal: - Bare metal (included,excluded,required) + * + * Instance Requirements (Accelerators/GPU): + * - ghr-ec2-accelerator-types: - Accelerator types (comma-separated: gpu,fpga,inference) + * - ghr-ec2-accelerator-count-min: - Set minimum accelerator count + * - ghr-ec2-accelerator-count-max: - Set maximum accelerator count + * - ghr-ec2-accelerator-manufacturers: - Accelerator manufacturers (comma-separated: nvidia,amd,amazon-web-services,xilinx) + * - ghr-ec2-accelerator-names: - Specific accelerator names (comma-separated) + * - ghr-ec2-accelerator-total-memory-mib-min: - Min accelerator total memory in MiB + * - ghr-ec2-accelerator-total-memory-mib-max: - Max accelerator total memory in MiB + * + * Instance Requirements (Network & Storage): + * - ghr-ec2-network-interface-count-min: - Min network interfaces + * - ghr-ec2-network-interface-count-max: - Max network interfaces + * - ghr-ec2-network-bandwidth-gbps-min: - Min network bandwidth in Gbps + * - ghr-ec2-network-bandwidth-gbps-max: - Max network bandwidth in Gbps + * - ghr-ec2-local-storage: - Local storage (included,excluded,required) + * - ghr-ec2-local-storage-types: -
Local storage types (comma-separated: hdd,ssd) + * - ghr-ec2-total-local-storage-gb-min: - Min total local storage in GB + * - ghr-ec2-total-local-storage-gb-max: - Max total local storage in GB + * - ghr-ec2-baseline-ebs-bandwidth-mbps-min: - Min baseline EBS bandwidth in Mbps + * - ghr-ec2-baseline-ebs-bandwidth-mbps-max: - Max baseline EBS bandwidth in Mbps + * + * Placement: + * - ghr-ec2-placement-group: - Placement group name + * - ghr-ec2-placement-tenancy: - Tenancy (default,dedicated,host) + * - ghr-ec2-placement-host-id: - Dedicated host ID + * - ghr-ec2-placement-affinity: - Affinity (default,host) + * - ghr-ec2-placement-partition-number: - Partition number + * - ghr-ec2-placement-availability-zone: - Placement availability zone + * - ghr-ec2-placement-spread-domain: - Spread domain + * - ghr-ec2-placement-host-resource-group-arn: - Host resource group ARN + * + * Block Device Mappings: + * - ghr-ec2-ebs-volume-size: - EBS volume size in GB + * - ghr-ec2-ebs-volume-type: - EBS volume type (gp2,gp3,io1,io2,st1,sc1) + * - ghr-ec2-ebs-iops: - EBS IOPS + * - ghr-ec2-ebs-throughput: - EBS throughput in MB/s (gp3 only) + * - ghr-ec2-ebs-encrypted: - EBS encryption (true,false) + * - ghr-ec2-ebs-kms-key-id: - KMS key ID for encryption + * - ghr-ec2-ebs-delete-on-termination: - Delete on termination (true,false) + * - ghr-ec2-ebs-snapshot-id: - Snapshot ID for EBS volume + * - ghr-ec2-block-device-virtual-name: - Virtual device name (ephemeral storage) + * - ghr-ec2-block-device-no-device: - Suppresses device mapping + * + * Pricing & Advanced: + * - ghr-ec2-spot-max-price-percentage-over-lowest-price: - Spot max price as % over lowest price + * - ghr-ec2-on-demand-max-price-percentage-over-lowest-price: - On-demand max price as % over lowest price + * - ghr-ec2-max-spot-price-as-percentage-of-optimal-on-demand-price: - Max spot price as % of optimal on-demand + * - ghr-ec2-require-hibernate-support: - Require hibernate support (true,false) + * - 
ghr-ec2-require-encryption-in-transit: - Require encryption in-transit (true,false) + * - ghr-ec2-baseline-performance-factors-cpu-reference-families: - CPU baseline performance reference families (comma-separated) + * + * Example: + * runs-on: [self-hosted, linux, ghr-ec2-vcpu-count-min:4, ghr-ec2-memory-mib-min:16384, ghr-ec2-accelerator-types:gpu] + * + * @param labels - Array of GitHub workflow job labels + * @returns EC2 override configuration object or undefined if no valid config found + */ +export function parseEc2OverrideConfig(labels: string[]): Ec2OverrideConfig | undefined { + const ec2Labels = labels.filter((l) => l.startsWith('ghr-ec2-')); + const config: Ec2OverrideConfig = {}; + + for (const label of ec2Labels) { + const [key, ...valueParts] = label.replace('ghr-ec2-', '').split(':'); + const value = valueParts.join(':'); + + if (!value) continue; + + // Basic Fleet Overrides + if (key === 'instance-type') { + config.InstanceType = value as _InstanceType; + } else if (key === 'subnet-id') { + config.SubnetId = value; + } else if (key === 'availability-zone') { + config.AvailabilityZone = value; + } else if (key === 'availability-zone-id') { + config.AvailabilityZoneId = value; + } else if (key === 'max-price') { + config.MaxPrice = value; + } else if (key === 'priority') { + config.Priority = parseFloat(value); + } else if (key === 'weighted-capacity') { + config.WeightedCapacity = parseFloat(value); + } else if (key === 'image-id') { + config.ImageId = value; + } + + // Placement + else if (key.startsWith('placement-')) { + config.Placement = config.Placement || ({} as Placement); + const placementKey = key.replace('placement-', ''); + if (placementKey === 'group') { + config.Placement.GroupName = value; + } else if (placementKey === 'tenancy') { + config.Placement.Tenancy = value as Tenancy; + } else if (placementKey === 'host-id') { + config.Placement.HostId = value; + } else if (placementKey === 'affinity') { + config.Placement.Affinity = value; 
+ } else if (placementKey === 'partition-number') { + config.Placement.PartitionNumber = parseInt(value, 10); + } else if (placementKey === 'availability-zone') { + config.Placement.AvailabilityZone = value; + } else if (placementKey === 'spread-domain') { + config.Placement.SpreadDomain = value; + } else if (placementKey === 'host-resource-group-arn') { + config.Placement.HostResourceGroupArn = value; + } + } + + // Block Device Mappings (EBS) + else if (key.startsWith('ebs-')) { + config.BlockDeviceMappings = config.BlockDeviceMappings || ([{}] as FleetBlockDeviceMappingRequest[]); + const ebsKey = key.replace('ebs-', ''); + const ebs = + config.BlockDeviceMappings[0].Ebs || (config.BlockDeviceMappings[0].Ebs = {} as FleetEbsBlockDeviceRequest); + + if (ebsKey === 'volume-size') { + ebs.VolumeSize = parseInt(value, 10); + } else if (ebsKey === 'volume-type') { + ebs.VolumeType = value as VolumeType; + } else if (ebsKey === 'iops') { + ebs.Iops = parseInt(value, 10); + } else if (ebsKey === 'throughput') { + ebs.Throughput = parseInt(value, 10); + } else if (ebsKey === 'encrypted') { + ebs.Encrypted = value.toLowerCase() === 'true'; + } else if (ebsKey === 'kms-key-id') { + ebs.KmsKeyId = value; + } else if (ebsKey === 'delete-on-termination') { + ebs.DeleteOnTermination = value.toLowerCase() === 'true'; + } else if (ebsKey === 'snapshot-id') { + ebs.SnapshotId = value; + } + } + + // Block Device Mappings (Non-EBS) + else if (key === 'block-device-virtual-name') { + config.BlockDeviceMappings = config.BlockDeviceMappings || ([{}] as FleetBlockDeviceMappingRequest[]); + config.BlockDeviceMappings[0].VirtualName = value; + } else if (key === 'block-device-no-device') { + config.BlockDeviceMappings = config.BlockDeviceMappings || ([{}] as FleetBlockDeviceMappingRequest[]); + config.BlockDeviceMappings[0].NoDevice = value; + } + + // Instance Requirements - vCPU & Memory + else if (key.startsWith('vcpu-count-')) { + config.InstanceRequirements = 
config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.VCpuCount = config.InstanceRequirements.VCpuCount || ({} as VCpuCountRangeRequest); + const subKey = key.replace('vcpu-count-', ''); + config.InstanceRequirements.VCpuCount![subKey === 'min' ? 'Min' : 'Max'] = parseInt(value, 10); + } else if (key.startsWith('memory-mib-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.MemoryMiB = config.InstanceRequirements.MemoryMiB || ({} as MemoryMiBRequest); + const subKey = key.replace('memory-mib-', ''); + config.InstanceRequirements.MemoryMiB![subKey === 'min' ? 'Min' : 'Max'] = parseInt(value, 10); + } else if (key.startsWith('memory-gib-per-vcpu-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.MemoryGiBPerVCpu = + config.InstanceRequirements.MemoryGiBPerVCpu || ({} as MemoryGiBPerVCpuRequest); + const subKey = key.replace('memory-gib-per-vcpu-', ''); + config.InstanceRequirements.MemoryGiBPerVCpu![subKey === 'min' ? 
'Min' : 'Max'] = parseFloat(value); + } + + // Instance Requirements - CPU & Performance + else if (key === 'cpu-manufacturers') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.CpuManufacturers = value.split(',') as CpuManufacturer[]; + } else if (key === 'instance-generations') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.InstanceGenerations = value.split(',') as InstanceGeneration[]; + } else if (key === 'excluded-instance-types') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.ExcludedInstanceTypes = value.split(','); + } else if (key === 'allowed-instance-types') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.AllowedInstanceTypes = value.split(','); + } else if (key === 'burstable-performance') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.BurstablePerformance = value as BurstablePerformance; + } else if (key === 'bare-metal') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.BareMetal = value as BareMetal; + } + + // Instance Requirements - Accelerators + else if (key.startsWith('accelerator-count-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.AcceleratorCount = + config.InstanceRequirements.AcceleratorCount || ({} as AcceleratorCountRequest); + const subKey = key.replace('accelerator-count-', ''); + config.InstanceRequirements.AcceleratorCount![subKey === 'min' ? 
'Min' : 'Max'] = parseInt(value, 10); + } else if (key === 'accelerator-types') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.AcceleratorTypes = value.split(',') as AcceleratorType[]; + } else if (key === 'accelerator-manufacturers') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.AcceleratorManufacturers = value.split(',') as AcceleratorManufacturer[]; + } else if (key === 'accelerator-names') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.AcceleratorNames = value.split(',') as AcceleratorName[]; + } else if (key.startsWith('accelerator-total-memory-mib-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.AcceleratorTotalMemoryMiB = + config.InstanceRequirements.AcceleratorTotalMemoryMiB || ({} as AcceleratorTotalMemoryMiBRequest); + const subKey = key.replace('accelerator-total-memory-mib-', ''); + config.InstanceRequirements.AcceleratorTotalMemoryMiB![subKey === 'min' ? 'Min' : 'Max'] = parseInt(value, 10); + } + + // Instance Requirements - Network + else if (key.startsWith('network-interface-count-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.NetworkInterfaceCount = + config.InstanceRequirements.NetworkInterfaceCount || ({} as NetworkInterfaceCountRequest); + const subKey = key.replace('network-interface-count-', ''); + config.InstanceRequirements.NetworkInterfaceCount![subKey === 'min' ? 
'Min' : 'Max'] = parseInt(value, 10); + } else if (key.startsWith('network-bandwidth-gbps-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.NetworkBandwidthGbps = + config.InstanceRequirements.NetworkBandwidthGbps || ({} as NetworkBandwidthGbpsRequest); + const subKey = key.replace('network-bandwidth-gbps-', ''); + config.InstanceRequirements.NetworkBandwidthGbps![subKey === 'min' ? 'Min' : 'Max'] = parseFloat(value); + } + + // Instance Requirements - Storage + else if (key === 'local-storage') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.LocalStorage = value as LocalStorage; + } else if (key === 'local-storage-types') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.LocalStorageTypes = value.split(',') as LocalStorageType[]; + } else if (key.startsWith('total-local-storage-gb-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.TotalLocalStorageGB = + config.InstanceRequirements.TotalLocalStorageGB || ({} as TotalLocalStorageGBRequest); + const subKey = key.replace('total-local-storage-gb-', ''); + config.InstanceRequirements.TotalLocalStorageGB![subKey === 'min' ? 'Min' : 'Max'] = parseFloat(value); + } else if (key.startsWith('baseline-ebs-bandwidth-mbps-')) { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.BaselineEbsBandwidthMbps = + config.InstanceRequirements.BaselineEbsBandwidthMbps || ({} as BaselineEbsBandwidthMbpsRequest); + const subKey = key.replace('baseline-ebs-bandwidth-mbps-', ''); + config.InstanceRequirements.BaselineEbsBandwidthMbps![subKey === 'min' ? 
'Min' : 'Max'] = parseInt(value, 10); + } + + // Instance Requirements - Pricing & Other + else if (key === 'spot-max-price-percentage-over-lowest-price') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.SpotMaxPricePercentageOverLowestPrice = parseInt(value, 10); + } else if (key === 'on-demand-max-price-percentage-over-lowest-price') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.OnDemandMaxPricePercentageOverLowestPrice = parseInt(value, 10); + } else if (key === 'max-spot-price-as-percentage-of-optimal-on-demand-price') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.MaxSpotPriceAsPercentageOfOptimalOnDemandPrice = parseInt(value, 10); + } else if (key === 'require-hibernate-support') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.RequireHibernateSupport = value.toLowerCase() === 'true'; + } else if (key === 'require-encryption-in-transit') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.RequireEncryptionInTransit = value.toLowerCase() === 'true'; + } else if (key === 'baseline-performance-factors-cpu-reference-families') { + config.InstanceRequirements = config.InstanceRequirements || ({} as InstanceRequirementsRequest); + config.InstanceRequirements.BaselinePerformanceFactors = + config.InstanceRequirements.BaselinePerformanceFactors || ({} as BaselinePerformanceFactorsRequest); + config.InstanceRequirements.BaselinePerformanceFactors.Cpu = + config.InstanceRequirements.BaselinePerformanceFactors.Cpu || ({} as CpuPerformanceFactorRequest); + config.InstanceRequirements.BaselinePerformanceFactors.Cpu.References = value + .split(',') + 
.map((family) => ({ InstanceFamily: family })) as PerformanceFactorReferenceRequest[]; + } + } + + return Object.keys(config).length > 0 ? config : undefined; +} + +function labelsHash(labels: string[]): string { + const prefix = 'ghr-'; + + const input = labels + .filter((l) => l.startsWith(prefix)) + .sort() // ensure deterministic hash + .join('|'); + + let hash = 0; + for (let i = 0; i < input.length; i++) { + hash = (hash << 5) - hash + input.charCodeAt(i); + hash |= 0; // force 32-bit integer + } + + return Math.abs(hash).toString(36); +} diff --git a/lambdas/functions/webhook/src/ConfigLoader.ts b/lambdas/functions/webhook/src/ConfigLoader.ts index e77a92b16e..df7b159495 100644 --- a/lambdas/functions/webhook/src/ConfigLoader.ts +++ b/lambdas/functions/webhook/src/ConfigLoader.ts @@ -130,9 +130,11 @@ export class ConfigWebhook extends MatcherAwareConfig { repositoryAllowList: string[] = []; webhookSecret: string = ''; workflowJobEventSecondaryQueue: string = ''; + enableDynamicLabels: boolean = false; async loadConfig(): Promise { this.loadEnvVar(process.env.REPOSITORY_ALLOW_LIST, 'repositoryAllowList', []); + this.loadEnvVar(process.env.ENABLE_DYNAMIC_LABELS, 'enableDynamicLabels', false); await Promise.all([ this.loadMatcherConfig(process.env.PARAMETER_RUNNER_MATCHER_CONFIG_PATH), @@ -162,9 +164,11 @@ export class ConfigWebhookEventBridge extends BaseConfig { export class ConfigDispatcher extends MatcherAwareConfig { repositoryAllowList: string[] = []; workflowJobEventSecondaryQueue: string = ''; // Deprecated + enableDynamicLabels: boolean = false; async loadConfig(): Promise { this.loadEnvVar(process.env.REPOSITORY_ALLOW_LIST, 'repositoryAllowList', []); + this.loadEnvVar(process.env.ENABLE_DYNAMIC_LABELS, 'enableDynamicLabels', false); await this.loadMatcherConfig(process.env.PARAMETER_RUNNER_MATCHER_CONFIG_PATH); validateRunnerMatcherConfig(this); diff --git a/lambdas/functions/webhook/src/modules.d.ts b/lambdas/functions/webhook/src/modules.d.ts index 
76a72660c0..9d73cf5815 100644 --- a/lambdas/functions/webhook/src/modules.d.ts +++ b/lambdas/functions/webhook/src/modules.d.ts @@ -5,6 +5,7 @@ declare namespace NodeJS { PARAMETER_GITHUB_APP_WEBHOOK_SECRET: string; PARAMETER_RUNNER_MATCHER_CONFIG_PATH: string; REPOSITORY_ALLOW_LIST: string; + ENABLE_DYNAMIC_LABELS: string; RUNNER_LABELS: string; ACCEPT_EVENTS: string; } diff --git a/lambdas/functions/webhook/src/runners/dispatch.test.ts b/lambdas/functions/webhook/src/runners/dispatch.test.ts index e8eff9be4c..79f3f5b870 100644 --- a/lambdas/functions/webhook/src/runners/dispatch.test.ts +++ b/lambdas/functions/webhook/src/runners/dispatch.test.ts @@ -103,6 +103,7 @@ describe('Dispatcher', () => { installationId: 0, queueId: runnerConfig[0].id, repoOwnerType: 'Organization', + labels: ['self-hosted', 'Test'], }); }); @@ -150,6 +151,7 @@ describe('Dispatcher', () => { installationId: 0, queueId: 'match', repoOwnerType: 'Organization', + labels: ['self-hosted', 'match'], }); }); @@ -181,49 +183,49 @@ describe('Dispatcher', () => { it('should accept job with an exact match and identical labels.', () => { const workflowLabels = ['self-hosted', 'linux', 'x64', 'ubuntu-latest']; const runnerLabels = [['self-hosted', 'linux', 'x64', 'ubuntu-latest']]; - expect(canRunJob(workflowLabels, runnerLabels, true)).toBe(true); + expect(canRunJob(workflowLabels, runnerLabels, true, false)).toBe(true); }); it('should accept job with an exact match and identical labels, ignoring cases.', () => { const workflowLabels = ['self-Hosted', 'Linux', 'X64', 'ubuntu-Latest']; const runnerLabels = [['self-hosted', 'linux', 'x64', 'ubuntu-latest']]; - expect(canRunJob(workflowLabels, runnerLabels, true)).toBe(true); + expect(canRunJob(workflowLabels, runnerLabels, true, false)).toBe(true); }); it('should accept job with an exact match and runner supports requested capabilities.', () => { const workflowLabels = ['self-hosted', 'linux', 'x64']; const runnerLabels = [['self-hosted', 'linux', 
'x64', 'ubuntu-latest']]; - expect(canRunJob(workflowLabels, runnerLabels, true)).toBe(true); + expect(canRunJob(workflowLabels, runnerLabels, true, false)).toBe(true); }); it('should NOT accept job with an exact match and runner not matching requested capabilities.', () => { const workflowLabels = ['self-hosted', 'linux', 'x64', 'ubuntu-latest']; const runnerLabels = [['self-hosted', 'linux', 'x64']]; - expect(canRunJob(workflowLabels, runnerLabels, true)).toBe(false); + expect(canRunJob(workflowLabels, runnerLabels, true, false)).toBe(false); }); it('should accept job with for a non exact match. Any label that matches will accept the job.', () => { const workflowLabels = ['self-hosted', 'linux', 'x64', 'ubuntu-latest', 'gpu']; const runnerLabels = [['gpu']]; - expect(canRunJob(workflowLabels, runnerLabels, false)).toBe(true); + expect(canRunJob(workflowLabels, runnerLabels, false, false)).toBe(true); }); it('should NOT accept job with for an exact match. Not all requested capabilities are supported.', () => { const workflowLabels = ['self-hosted', 'linux', 'x64', 'ubuntu-latest', 'gpu']; const runnerLabels = [['gpu']]; - expect(canRunJob(workflowLabels, runnerLabels, true)).toBe(false); + expect(canRunJob(workflowLabels, runnerLabels, true, false)).toBe(false); }); - it('should not accept jobs not providing labels if exact match is.', () => { - const workflowLabels: string[] = []; + it('should filter out ghr- and ghr-run- labels when enableDynamicLabels is true.', () => { + const workflowLabels = ['self-hosted', 'linux', 'x64', 'ghr-ec2-instance-type:t3.large', 'ghr-run-id:12345']; const runnerLabels = [['self-hosted', 'linux', 'x64']]; - expect(canRunJob(workflowLabels, runnerLabels, true)).toBe(false); + expect(canRunJob(workflowLabels, runnerLabels, true, true)).toBe(true); }); - it('should accept jobs not providing labels and exact match is set to false.', () => { - const workflowLabels: string[] = []; + it('should NOT filter out ghr- and ghr-run- labels when 
enableDynamicLabels is false.', () => { + const workflowLabels = ['self-hosted', 'linux', 'x64', 'ghr-ec2-instance-type:t3.large']; const runnerLabels = [['self-hosted', 'linux', 'x64']]; - expect(canRunJob(workflowLabels, runnerLabels, false)).toBe(true); + expect(canRunJob(workflowLabels, runnerLabels, true, false)).toBe(false); }); }); }); diff --git a/lambdas/functions/webhook/src/runners/dispatch.ts b/lambdas/functions/webhook/src/runners/dispatch.ts index fe81e63a26..50654e68cc 100644 --- a/lambdas/functions/webhook/src/runners/dispatch.ts +++ b/lambdas/functions/webhook/src/runners/dispatch.ts @@ -15,7 +15,7 @@ export async function dispatch( ): Promise { validateRepoInAllowList(event, config); - return await handleWorkflowJob(event, eventType, config.matcherConfig!); + return await handleWorkflowJob(event, eventType, config.matcherConfig!, config.enableDynamicLabels); } function validateRepoInAllowList(event: WorkflowJobEvent, config: ConfigDispatcher) { @@ -29,6 +29,7 @@ async function handleWorkflowJob( body: WorkflowJobEvent, githubEvent: string, matcherConfig: Array, + enableDynamicLabels: boolean, ): Promise { if (body.action !== 'queued') { return { @@ -47,7 +48,14 @@ async function handleWorkflowJob( return a.matcherConfig.exactMatch === b.matcherConfig.exactMatch ? 0 : a.matcherConfig.exactMatch ? -1 : 1; }); for (const queue of matcherConfig) { - if (canRunJob(body.workflow_job.labels, queue.matcherConfig.labelMatchers, queue.matcherConfig.exactMatch)) { + if ( + canRunJob( + body.workflow_job.labels, + queue.matcherConfig.labelMatchers, + queue.matcherConfig.exactMatch, + enableDynamicLabels, + ) + ) { await sendActionRequest({ id: body.workflow_job.id, repositoryName: body.repository.name, @@ -56,6 +64,7 @@ async function handleWorkflowJob( installationId: body.installation?.id ?? 
0, queueId: queue.id, repoOwnerType: body.repository.owner.type, + labels: body.workflow_job.labels, }); logger.info( `Successfully dispatched job for ${body.repository.full_name} to the queue ${queue.id} - ` + @@ -80,14 +89,20 @@ export function canRunJob( workflowJobLabels: string[], runnerLabelsMatchers: string[][], workflowLabelCheckAll: boolean, + enableDynamicLabels: boolean, ): boolean { + // Exclude all ghr- prefixed dynamic labels (including ghr-ec2- and ghr-run-) from matching, but only when dynamic labels are enabled + const filteredLabels = enableDynamicLabels + ? workflowJobLabels.filter((label) => !label.startsWith('ghr-')) + : workflowJobLabels; + runnerLabelsMatchers = runnerLabelsMatchers.map((runnerLabel) => { return runnerLabel.map((label) => label.toLowerCase()); }); const matchLabels = workflowLabelCheckAll - ? runnerLabelsMatchers.some((rl) => workflowJobLabels.every((wl) => rl.includes(wl.toLowerCase()))) - : runnerLabelsMatchers.some((rl) => workflowJobLabels.some((wl) => rl.includes(wl.toLowerCase()))); - const match = workflowJobLabels.length === 0 ? !matchLabels : matchLabels; + ? runnerLabelsMatchers.some((rl) => filteredLabels.every((wl) => rl.includes(wl.toLowerCase()))) + : runnerLabelsMatchers.some((rl) => filteredLabels.some((wl) => rl.includes(wl.toLowerCase()))); + const match = filteredLabels.length === 0 ? !matchLabels : matchLabels; logger.debug( `Received workflow job event with labels: '${JSON.stringify(workflowJobLabels)}'. 
The event does ${ diff --git a/lambdas/functions/webhook/src/sqs/index.ts b/lambdas/functions/webhook/src/sqs/index.ts index a028d7dcc4..ecf31f1cfd 100644 --- a/lambdas/functions/webhook/src/sqs/index.ts +++ b/lambdas/functions/webhook/src/sqs/index.ts @@ -12,6 +12,7 @@ export interface ActionRequestMessage { installationId: number; queueId: string; repoOwnerType: string; + labels?: string[]; } export interface MatcherConfig { diff --git a/main.tf b/main.tf index a9a79c87a3..63b5725d06 100644 --- a/main.tf +++ b/main.tf @@ -137,6 +137,7 @@ module "webhook" { logging_retention_in_days = var.logging_retention_in_days logging_kms_key_id = var.logging_kms_key_id log_class = var.log_class + enable_dynamic_labels = var.enable_dynamic_labels role_path = var.role_path role_permissions_boundary = var.role_permissions_boundary @@ -185,8 +186,9 @@ module "runners" { github_app_parameters = local.github_app_parameters enable_organization_runners = var.enable_organization_runners enable_ephemeral_runners = var.enable_ephemeral_runners - enable_jit_config = var.enable_jit_config + enable_dynamic_labels = var.enable_dynamic_labels enable_job_queued_check = var.enable_job_queued_check + enable_jit_config = var.enable_jit_config enable_on_demand_failover_for_errors = var.enable_runner_on_demand_failover_for_errors scale_errors = var.scale_errors disable_runner_autoupdate = var.disable_runner_autoupdate diff --git a/modules/multi-runner/README.md b/modules/multi-runner/README.md index 7a050cdeee..0488db417f 100644 --- a/modules/multi-runner/README.md +++ b/modules/multi-runner/README.md @@ -127,6 +127,7 @@ module "multi-runner" { | [aws\_region](#input\_aws\_region) | AWS region. | `string` | n/a | yes | | [cloudwatch\_config](#input\_cloudwatch\_config) | (optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. 
| `string` | `null` | no | | [enable\_ami\_housekeeper](#input\_enable\_ami\_housekeeper) | Option to disable the lambda to clean up old AMIs. | `bool` | `false` | no | +| [enable\_dynamic\_labels](#input\_enable\_dynamic\_labels) | Experimental! Can be removed or changed without triggering a major release. Enable dynamic labels with the 'ghr-' prefix. When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' labels to add unique labels to runners dynamically. | `bool` | `false` | no | | [enable\_managed\_runner\_security\_group](#input\_enable\_managed\_runner\_security\_group) | Enabling the default managed security group creation. Unmanaged security groups can be specified via `runner_additional_security_group_ids`. | `bool` | `true` | no | | [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling. | object({ enable = optional(bool, true) accept_events = optional(list(string), []) }) | `{}` | no | | [ghes\_ssl\_verify](#input\_ghes\_ssl\_verify) | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | `bool` | `true` | no | @@ -151,7 +152,7 @@ module "multi-runner" { | [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no | | [matcher\_config\_parameter\_store\_tier](#input\_matcher\_config\_parameter\_store\_tier) | The tier of the parameter store for the matcher configuration. Valid values are `Standard`, and `Advanced`. 
| `string` | `"Standard"` | no | | [metrics](#input\_metrics) | Configuration for metrics created by the module, by default metrics are disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise. | object({ enable = optional(bool, false) namespace = optional(string, "GitHub Runners") metric = optional(object({ enable_github_app_rate_limit = optional(bool, true) enable_job_retry = optional(bool, true) enable_spot_termination_warning = optional(bool, true) }), {}) }) | `{}` | no | -| [multi\_runner\_config](#input\_multi\_runner\_config) | multi\_runner\_config = { runner\_config: { runner\_os: "The EC2 Operating System type to use for action runner instances (linux,windows)." runner\_architecture: "The platform architecture of the runner instance\_type." runner\_metadata\_options: "(Optional) Metadata options for the ec2 runner instances." ami: "(Optional) AMI configuration for the action runner instances. This object allows you to specify all AMI-related settings in one place." create\_service\_linked\_role\_spot: (Optional) create the serviced linked role for spot instances that is required by the scale-up lambda. credit\_specification: "(Optional) The credit specification of the runner instance\_type. Can be unset, `standard` or `unlimited`. delay\_webhook\_event: "The number of seconds the event accepted by the webhook is invisible on the queue before the scale up lambda will receive the event." disable\_runner\_autoupdate: "Disable the auto update of the github runner agent. Be aware there is a grace period of 30 days, see also the [GitHub article](https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/)" ebs\_optimized: "The EC2 EBS optimized configuration." enable\_ephemeral\_runners: "Enable ephemeral runners, runners will only be used once." 
enable\_job\_queued\_check: "Enables JIT configuration for creating runners instead of registration token based registraton. JIT configuration will only be applied for ephemeral runners. By default JIT configuration is enabled for ephemeral runners an can be disabled via this override. When running on GHES without support for JIT configuration this variable should be set to true for ephemeral runners." enable\_on\_demand\_failover\_for\_errors: "Enable on-demand failover. For example to fall back to on demand when no spot capacity is available the variable can be set to `InsufficientInstanceCapacity`. When not defined the default behavior is to retry later." scale\_errors: "List of aws error codes that should trigger retry during scale up. This list will replace the default errors defined in the variable `defaultScaleErrors` in https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/lambdas/functions/control-plane/src/aws/runners.ts" enable\_organization\_runners: "Register runners to organization, instead of repo level" enable\_runner\_binaries\_syncer: "Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-build AMI." enable\_ssm\_on\_runners: "Enable to allow access the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances." enable\_userdata: "Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI." instance\_allocation\_strategy: "The allocation strategy for spot instances. AWS recommends to use `capacity-optimized` however the AWS default is `lowest-price`." instance\_max\_spot\_price: "Max price price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet." instance\_target\_capacity\_type: "Default lifecycle used for runner instances, can be either `spot` or `on-demand`." instance\_types: "List of instance types for the action runner. 
Defaults are based on runner\_os (al2023 for linux and Windows Server Core for win)." job\_queue\_retention\_in\_seconds: "The number of seconds the job is held in the queue before it is purged" minimum\_running\_time\_in\_minutes: "The time an ec2 action runner should be running at minimum before terminated if not busy." pool\_runner\_owner: "The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported." runner\_additional\_security\_group\_ids: "List of additional security groups IDs to apply to the runner. If added outside the multi\_runner\_config block, the additional security group(s) will be applied to all runner configs. If added inside the multi\_runner\_config, the additional security group(s) will be applied to the individual runner." runner\_as\_root: "Run the action runner under the root user. Variable `runner_run_as` will be ignored." runner\_boot\_time\_in\_minutes: "The minimum time for an EC2 runner to boot and register as a runner." runner\_disable\_default\_labels: "Disable default labels for the runners (os, architecture and `self-hosted`). If enabled, the runner will only have the extra labels provided in `runner_extra_labels`. In case you on own start script is used, this configuration parameter needs to be parsed via SSM." runner\_extra\_labels: "Extra (custom) labels for the runners (GitHub). Separate each label by a comma. Labels checks on the webhook can be enforced by setting `multi_runner_config.matcherConfig.exactMatch`. GitHub read-only labels should not be provided." runner\_group\_name: "Name of the runner group." runner\_name\_prefix: "Prefix for the GitHub runner name." runner\_run\_as: "Run the GitHub actions agent as user." runners\_maximum\_count: "The maximum number of runners that will be created. Setting the variable to `-1` desiables the maximum check." scale\_down\_schedule\_expression: "Scheduler expression to check every x for scale down." 
scale\_up\_reserved\_concurrent\_executions: "Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations." userdata\_template: "Alternative user-data template, replacing the default template. By providing your own user\_data you have to take care of installing all required software, including the action runner. Variables userdata\_pre/post\_install are ignored." enable\_jit\_config "Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is available. In case you are upgrading from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI." enable\_runner\_detailed\_monitoring: "Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details." enable\_cloudwatch\_agent: "Enabling the cloudwatch agent on the ec2 runner instances, the runner contains default config. Configuration can be overridden via `cloudwatch_config`." cloudwatch\_config: "(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details." 
userdata\_pre\_install: "Script to be ran before the GitHub Actions runner is installed on the EC2 instances" userdata\_post\_install: "Script to be ran after the GitHub Actions runner is installed on the EC2 instances" runner\_hook\_job\_started: "Script to be ran in the runner environment at the beginning of every job" runner\_hook\_job\_completed: "Script to be ran in the runner environment at the end of every job" runner\_ec2\_tags: "Map of tags that will be added to the launch template instance tag specifications." runner\_iam\_role\_managed\_policy\_arns: "Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role" vpc\_id: "The VPC for security groups of the action runners. If not set uses the value of `var.vpc_id`." subnet\_ids: "List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`. If not set, uses the value of `var.subnet_ids`." idle\_config: "List of time period that can be defined as cron expression to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle." runner\_log\_files: "(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details." block\_device\_mappings: "The EC2 instance block device configuration. Takes the following keys: `device_name`, `delete_on_termination`, `volume_type`, `volume_size`, `encrypted`, `iops`, `throughput`, `kms_key_id`, `snapshot_id`." job\_retry: "Experimental! Can be removed / changed without trigger a major release. Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. 
If not the message will be published again on the scale-up (build queue). Using this feature can impact the rate limit of the GitHub app." pool\_config: "The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for week days to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use `schedule_expression_timezone` to override the schedule time zone (defaults to UTC)." } matcherConfig: { labelMatchers: "The list of list of labels supported by the runner configuration. `[[self-hosted, linux, x64, example]]`" exactMatch: "If set to true all labels in the workflow job must match the GitHub labels (os, architecture and `self-hosted`). When false if __any__ workflow label matches it will trigger the webhook." priority: "If set it defines the priority of the matcher, the matcher with the lowest priority will be evaluated first. Default is 999, allowed values 0-999." } redrive\_build\_queue: "Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting `enabled` to false. 2. Enable by setting `enabled` to `true`, `maxReceiveCount` to a number of max retries." 
} | map(object({ runner_config = object({ runner_os = string runner_architecture = string runner_metadata_options = optional(map(any), { instance_metadata_tags = "enabled" http_endpoint = "enabled" http_tokens = "required" http_put_response_hop_limit = 1 }) ami = optional(object({ filter = optional(map(list(string)), { state = ["available"] }) owners = optional(list(string), ["amazon"]) id_ssm_parameter_arn = optional(string, null) kms_key_arn = optional(string, null) }), null) create_service_linked_role_spot = optional(bool, false) credit_specification = optional(string, null) delay_webhook_event = optional(number, 30) disable_runner_autoupdate = optional(bool, false) ebs_optimized = optional(bool, false) enable_ephemeral_runners = optional(bool, false) enable_job_queued_check = optional(bool, null) enable_on_demand_failover_for_errors = optional(list(string), []) scale_errors = optional(list(string), [ "UnfulfillableCapacity", "MaxSpotInstanceCountExceeded", "TargetCapacityLimitExceededException", "RequestLimitExceeded", "ResourceLimitExceeded", "MaxSpotInstanceCountExceeded", "MaxSpotFleetRequestCountExceeded", "InsufficientInstanceCapacity", "InsufficientCapacityOnHost", ]) enable_organization_runners = optional(bool, false) enable_runner_binaries_syncer = optional(bool, true) enable_ssm_on_runners = optional(bool, false) enable_userdata = optional(bool, true) instance_allocation_strategy = optional(string, "lowest-price") instance_max_spot_price = optional(string, null) instance_target_capacity_type = optional(string, "spot") instance_types = list(string) job_queue_retention_in_seconds = optional(number, 86400) minimum_running_time_in_minutes = optional(number, null) pool_runner_owner = optional(string, null) runner_as_root = optional(bool, false) runner_boot_time_in_minutes = optional(number, 5) runner_disable_default_labels = optional(bool, false) runner_extra_labels = optional(list(string), []) runner_group_name = optional(string, "Default") 
runner_name_prefix = optional(string, "") runner_run_as = optional(string, "ec2-user") runners_maximum_count = number runner_additional_security_group_ids = optional(list(string), []) scale_down_schedule_expression = optional(string, "cron(*/5 * * * ? *)") scale_up_reserved_concurrent_executions = optional(number, 1) userdata_template = optional(string, null) userdata_content = optional(string, null) enable_jit_config = optional(bool, null) enable_runner_detailed_monitoring = optional(bool, false) enable_cloudwatch_agent = optional(bool, true) cloudwatch_config = optional(string, null) userdata_pre_install = optional(string, "") userdata_post_install = optional(string, "") runner_hook_job_started = optional(string, "") runner_hook_job_completed = optional(string, "") runner_ec2_tags = optional(map(string), {}) runner_iam_role_managed_policy_arns = optional(list(string), []) vpc_id = optional(string, null) subnet_ids = optional(list(string), null) idle_config = optional(list(object({ cron = string timeZone = string idleCount = number evictionStrategy = optional(string, "oldest_first") })), []) cpu_options = optional(object({ core_count = number threads_per_core = number }), null) placement = optional(object({ affinity = optional(string) availability_zone = optional(string) group_id = optional(string) group_name = optional(string) host_id = optional(string) host_resource_group_arn = optional(string) spread_domain = optional(string) tenancy = optional(string) partition_number = optional(number) }), null) runner_log_files = optional(list(object({ log_group_name = string prefix_log_group = bool file_path = string log_stream_name = string log_class = optional(string, "STANDARD") })), null) block_device_mappings = optional(list(object({ delete_on_termination = optional(bool, true) device_name = optional(string, "/dev/xvda") encrypted = optional(bool, true) iops = optional(number) kms_key_id = optional(string) snapshot_id = optional(string) throughput = optional(number) 
volume_size = number volume_type = optional(string, "gp3") })), [{ volume_size = 30 }]) pool_config = optional(list(object({ schedule_expression = string schedule_expression_timezone = optional(string) size = number })), []) job_retry = optional(object({ enable = optional(bool, false) delay_in_seconds = optional(number, 300) delay_backoff = optional(number, 2) lambda_memory_size = optional(number, 256) lambda_timeout = optional(number, 30) max_attempts = optional(number, 1) }), {}) }) matcherConfig = object({ labelMatchers = list(list(string)) exactMatch = optional(bool, false) priority = optional(number, 999) }) redrive_build_queue = optional(object({ enabled = bool maxReceiveCount = number }), { enabled = false maxReceiveCount = null }) })) | n/a | yes | +| [multi\_runner\_config](#input\_multi\_runner\_config) | multi\_runner\_config = { runner\_config: { runner\_os: "The EC2 Operating System type to use for action runner instances (linux,windows)." runner\_architecture: "The platform architecture of the runner instance\_type." runner\_metadata\_options: "(Optional) Metadata options for the ec2 runner instances." ami: "(Optional) AMI configuration for the action runner instances. This object allows you to specify all AMI-related settings in one place." create\_service\_linked\_role\_spot: (Optional) create the serviced linked role for spot instances that is required by the scale-up lambda. credit\_specification: "(Optional) The credit specification of the runner instance\_type. Can be unset, `standard` or `unlimited`. delay\_webhook\_event: "The number of seconds the event accepted by the webhook is invisible on the queue before the scale up lambda will receive the event." disable\_runner\_autoupdate: "Disable the auto update of the github runner agent. 
Be aware there is a grace period of 30 days, see also the [GitHub article](https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/)" ebs\_optimized: "The EC2 EBS optimized configuration." enable\_ephemeral\_runners: "Enable ephemeral runners, runners will only be used once." enable\_dynamic\_labels: "Experimental! Can be removed / changed without trigger a major release. Enable dynamic labels with 'ghr-' prefix. When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' to add unique labels dynamically to runners." enable\_job\_queued\_check: "Enables JIT configuration for creating runners instead of registration token based registraton. JIT configuration will only be applied for ephemeral runners. By default JIT configuration is enabled for ephemeral runners an can be disabled via this override. When running on GHES without support for JIT configuration this variable should be set to true for ephemeral runners." enable\_on\_demand\_failover\_for\_errors: "Enable on-demand failover. For example to fall back to on demand when no spot capacity is available the variable can be set to `InsufficientInstanceCapacity`. When not defined the default behavior is to retry later." scale\_errors: "List of aws error codes that should trigger retry during scale up. This list will replace the default errors defined in the variable `defaultScaleErrors` in https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/lambdas/functions/control-plane/src/aws/runners.ts" enable\_organization\_runners: "Register runners to organization, instead of repo level" enable\_runner\_binaries\_syncer: "Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-build AMI." enable\_ssm\_on\_runners: "Enable to allow access the runner instances for debugging purposes via SSM.
Note that this adds additional permissions to the runner instances." enable\_userdata: "Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI." instance\_allocation\_strategy: "The allocation strategy for spot instances. AWS recommends to use `capacity-optimized` however the AWS default is `lowest-price`." instance\_max\_spot\_price: "Max price price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet." instance\_target\_capacity\_type: "Default lifecycle used for runner instances, can be either `spot` or `on-demand`." instance\_types: "List of instance types for the action runner. Defaults are based on runner\_os (al2023 for linux and Windows Server Core for win)." job\_queue\_retention\_in\_seconds: "The number of seconds the job is held in the queue before it is purged" minimum\_running\_time\_in\_minutes: "The time an ec2 action runner should be running at minimum before terminated if not busy." pool\_runner\_owner: "The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported." runner\_additional\_security\_group\_ids: "List of additional security groups IDs to apply to the runner. If added outside the multi\_runner\_config block, the additional security group(s) will be applied to all runner configs. If added inside the multi\_runner\_config, the additional security group(s) will be applied to the individual runner." runner\_as\_root: "Run the action runner under the root user. Variable `runner_run_as` will be ignored." runner\_boot\_time\_in\_minutes: "The minimum time for an EC2 runner to boot and register as a runner." runner\_disable\_default\_labels: "Disable default labels for the runners (os, architecture and `self-hosted`). If enabled, the runner will only have the extra labels provided in `runner_extra_labels`. 
In case you on own start script is used, this configuration parameter needs to be parsed via SSM." runner\_extra\_labels: "Extra (custom) labels for the runners (GitHub). Separate each label by a comma. Labels checks on the webhook can be enforced by setting `multi_runner_config.matcherConfig.exactMatch`. GitHub read-only labels should not be provided." runner\_group\_name: "Name of the runner group." runner\_name\_prefix: "Prefix for the GitHub runner name." runner\_run\_as: "Run the GitHub actions agent as user." runners\_maximum\_count: "The maximum number of runners that will be created. Setting the variable to `-1` desiables the maximum check." scale\_down\_schedule\_expression: "Scheduler expression to check every x for scale down." scale\_up\_reserved\_concurrent\_executions: "Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations." userdata\_template: "Alternative user-data template, replacing the default template. By providing your own user\_data you have to take care of installing all required software, including the action runner. Variables userdata\_pre/post\_install are ignored." enable\_jit\_config "Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is available. In case you are upgrading from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI." enable\_runner\_detailed\_monitoring: "Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details." enable\_cloudwatch\_agent: "Enabling the cloudwatch agent on the ec2 runner instances, the runner contains default config. 
Configuration can be overridden via `cloudwatch_config`." cloudwatch\_config: "(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details." userdata\_pre\_install: "Script to be ran before the GitHub Actions runner is installed on the EC2 instances" userdata\_post\_install: "Script to be ran after the GitHub Actions runner is installed on the EC2 instances" runner\_hook\_job\_started: "Script to be ran in the runner environment at the beginning of every job" runner\_hook\_job\_completed: "Script to be ran in the runner environment at the end of every job" runner\_ec2\_tags: "Map of tags that will be added to the launch template instance tag specifications." runner\_iam\_role\_managed\_policy\_arns: "Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role" vpc\_id: "The VPC for security groups of the action runners. If not set uses the value of `var.vpc_id`." subnet\_ids: "List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`. If not set, uses the value of `var.subnet_ids`." idle\_config: "List of time period that can be defined as cron expression to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle." runner\_log\_files: "(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details." block\_device\_mappings: "The EC2 instance block device configuration. Takes the following keys: `device_name`, `delete_on_termination`, `volume_type`, `volume_size`, `encrypted`, `iops`, `throughput`, `kms_key_id`, `snapshot_id`." job\_retry: "Experimental! 
Can be removed / changed without trigger a major release. Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. If not the message will be published again on the scale-up (build queue). Using this feature can impact the rate limit of the GitHub app." pool\_config: "The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for week days to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use `schedule_expression_timezone` to override the schedule time zone (defaults to UTC)." } matcherConfig: { labelMatchers: "The list of list of labels supported by the runner configuration. `[[self-hosted, linux, x64, example]]`" exactMatch: "If set to true all labels in the workflow job must match the GitHub labels (os, architecture and `self-hosted`). When false if __any__ workflow label matches it will trigger the webhook." priority: "If set it defines the priority of the matcher, the matcher with the lowest priority will be evaluated first. Default is 999, allowed values 0-999." } redrive\_build\_queue: "Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting `enabled` to false. 2. Enable by setting `enabled` to `true`, `maxReceiveCount` to a number of max retries." 
} | map(object({ runner_config = object({ runner_os = string runner_architecture = string runner_metadata_options = optional(map(any), { instance_metadata_tags = "enabled" http_endpoint = "enabled" http_tokens = "required" http_put_response_hop_limit = 1 }) ami = optional(object({ filter = optional(map(list(string)), { state = ["available"] }) owners = optional(list(string), ["amazon"]) id_ssm_parameter_arn = optional(string, null) kms_key_arn = optional(string, null) }), null) create_service_linked_role_spot = optional(bool, false) credit_specification = optional(string, null) delay_webhook_event = optional(number, 30) disable_runner_autoupdate = optional(bool, false) ebs_optimized = optional(bool, false) enable_ephemeral_runners = optional(bool, false) enable_job_queued_check = optional(bool, null) enable_on_demand_failover_for_errors = optional(list(string), []) scale_errors = optional(list(string), [ "UnfulfillableCapacity", "MaxSpotInstanceCountExceeded", "TargetCapacityLimitExceededException", "RequestLimitExceeded", "ResourceLimitExceeded", "MaxSpotInstanceCountExceeded", "MaxSpotFleetRequestCountExceeded", "InsufficientInstanceCapacity", "InsufficientCapacityOnHost", ]) enable_organization_runners = optional(bool, false) enable_runner_binaries_syncer = optional(bool, true) enable_ssm_on_runners = optional(bool, false) enable_userdata = optional(bool, true) instance_allocation_strategy = optional(string, "lowest-price") instance_max_spot_price = optional(string, null) instance_target_capacity_type = optional(string, "spot") instance_types = list(string) job_queue_retention_in_seconds = optional(number, 86400) minimum_running_time_in_minutes = optional(number, null) pool_runner_owner = optional(string, null) runner_as_root = optional(bool, false) runner_boot_time_in_minutes = optional(number, 5) runner_disable_default_labels = optional(bool, false) runner_extra_labels = optional(list(string), []) runner_group_name = optional(string, "Default") 
runner_name_prefix = optional(string, "") runner_run_as = optional(string, "ec2-user") runners_maximum_count = number runner_additional_security_group_ids = optional(list(string), []) scale_down_schedule_expression = optional(string, "cron(*/5 * * * ? *)") scale_up_reserved_concurrent_executions = optional(number, 1) userdata_template = optional(string, null) userdata_content = optional(string, null) enable_jit_config = optional(bool, null) enable_runner_detailed_monitoring = optional(bool, false) enable_cloudwatch_agent = optional(bool, true) cloudwatch_config = optional(string, null) userdata_pre_install = optional(string, "") userdata_post_install = optional(string, "") runner_hook_job_started = optional(string, "") runner_hook_job_completed = optional(string, "") runner_ec2_tags = optional(map(string), {}) runner_iam_role_managed_policy_arns = optional(list(string), []) vpc_id = optional(string, null) subnet_ids = optional(list(string), null) idle_config = optional(list(object({ cron = string timeZone = string idleCount = number evictionStrategy = optional(string, "oldest_first") })), []) cpu_options = optional(object({ core_count = number threads_per_core = number }), null) placement = optional(object({ affinity = optional(string) availability_zone = optional(string) group_id = optional(string) group_name = optional(string) host_id = optional(string) host_resource_group_arn = optional(string) spread_domain = optional(string) tenancy = optional(string) partition_number = optional(number) }), null) runner_log_files = optional(list(object({ log_group_name = string prefix_log_group = bool file_path = string log_stream_name = string log_class = optional(string, "STANDARD") })), null) block_device_mappings = optional(list(object({ delete_on_termination = optional(bool, true) device_name = optional(string, "/dev/xvda") encrypted = optional(bool, true) iops = optional(number) kms_key_id = optional(string) snapshot_id = optional(string) throughput = optional(number) 
volume_size = number volume_type = optional(string, "gp3") })), [{ volume_size = 30 }]) pool_config = optional(list(object({ schedule_expression = string schedule_expression_timezone = optional(string) size = number })), []) job_retry = optional(object({ enable = optional(bool, false) delay_in_seconds = optional(number, 300) delay_backoff = optional(number, 2) lambda_memory_size = optional(number, 256) lambda_timeout = optional(number, 30) max_attempts = optional(number, 1) }), {}) }) matcherConfig = object({ labelMatchers = list(list(string)) exactMatch = optional(bool, false) priority = optional(number, 999) }) redrive_build_queue = optional(object({ enabled = bool maxReceiveCount = number }), { enabled = false maxReceiveCount = null }) })) | n/a | yes | | [parameter\_store\_tags](#input\_parameter\_store\_tags) | Map of tags that will be added to all the SSM Parameter Store parameters created by the Lambda function. | `map(string)` | `{}` | no | | [pool\_lambda\_reserved\_concurrent\_executions](#input\_pool\_lambda\_reserved\_concurrent\_executions) | Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | `number` | `1` | no | | [pool\_lambda\_timeout](#input\_pool\_lambda\_timeout) | Time out for the pool lambda in seconds. 
| `number` | `60` | no | diff --git a/modules/multi-runner/runners.tf b/modules/multi-runner/runners.tf index 59b6307aa0..32251eb1bc 100644 --- a/modules/multi-runner/runners.tf +++ b/modules/multi-runner/runners.tf @@ -35,6 +35,7 @@ module "runners" { scale_errors = each.value.runner_config.scale_errors enable_organization_runners = each.value.runner_config.enable_organization_runners enable_ephemeral_runners = each.value.runner_config.enable_ephemeral_runners + enable_dynamic_labels = var.enable_dynamic_labels enable_jit_config = each.value.runner_config.enable_jit_config enable_job_queued_check = each.value.runner_config.enable_job_queued_check disable_runner_autoupdate = each.value.runner_config.disable_runner_autoupdate diff --git a/modules/multi-runner/variables.tf b/modules/multi-runner/variables.tf index 613cf8b2ce..2a2e9bef4c 100644 --- a/modules/multi-runner/variables.tf +++ b/modules/multi-runner/variables.tf @@ -207,7 +207,8 @@ variable "multi_runner_config" { disable_runner_autoupdate: "Disable the auto update of the github runner agent. Be aware there is a grace period of 30 days, see also the [GitHub article](https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/)" ebs_optimized: "The EC2 EBS optimized configuration." enable_ephemeral_runners: "Enable ephemeral runners, runners will only be used once." - enable_job_queued_check: "Enables JIT configuration for creating runners instead of registration token based registraton. JIT configuration will only be applied for ephemeral runners. By default JIT configuration is enabled for ephemeral runners an can be disabled via this override. When running on GHES without support for JIT configuration this variable should be set to true for ephemeral runners." + enable_dynamic_labels: "Experimental! Can be removed / changed without trigger a major release. Enable dynamic labels with 'ghr-' prefix. 
When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' to add unique labels dynamically to runners." + enable_job_queued_check: "Enables JIT configuration for creating runners instead of registration token based registraton. JIT configuration will only be applied for ephemeral runners. By default JIT configuration is enabled for ephemeral runners an can be disabled via this override. When running on GHES without support for JIT configuration this variable should be set to true for ephemeral runners." enable_on_demand_failover_for_errors: "Enable on-demand failover. For example to fall back to on demand when no spot capacity is available the variable can be set to `InsufficientInstanceCapacity`. When not defined the default behavior is to retry later." scale_errors: "List of aws error codes that should trigger retry during scale up. This list will replace the default errors defined in the variable `defaultScaleErrors` in https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/lambdas/functions/control-plane/src/aws/runners.ts" enable_organization_runners: "Register runners to organization, instead of repo level" @@ -770,3 +771,9 @@ variable "parameter_store_tags" { type = map(string) default = {} } + +variable "enable_dynamic_labels" { + description = "Experimental! Can be removed / changed without trigger a major release. Enable dynamic labels with 'ghr-' prefix. When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' to add unique labels dynamically to runners." 
+ type = bool + default = false +} diff --git a/modules/multi-runner/webhook.tf b/modules/multi-runner/webhook.tf index 900040c609..df1fd63a50 100644 --- a/modules/multi-runner/webhook.tf +++ b/modules/multi-runner/webhook.tf @@ -39,5 +39,7 @@ module "webhook" { lambda_security_group_ids = var.lambda_security_group_ids aws_partition = var.aws_partition + enable_dynamic_labels = var.enable_dynamic_labels + log_level = var.log_level } diff --git a/modules/runners/README.md b/modules/runners/README.md index 6a27276624..1f3495be07 100644 --- a/modules/runners/README.md +++ b/modules/runners/README.md @@ -149,6 +149,7 @@ yarn run dist | [ebs\_optimized](#input\_ebs\_optimized) | The EC2 EBS optimized configuration. | `bool` | `false` | no | | [egress\_rules](#input\_egress\_rules) | List of egress rules for the GitHub runner instances. | list(object({ cidr_blocks = list(string) ipv6_cidr_blocks = list(string) prefix_list_ids = list(string) from_port = number protocol = string security_groups = list(string) self = bool to_port = number description = string })) | [ { "cidr_blocks": [ "0.0.0.0/0" ], "description": null, "from_port": 0, "ipv6_cidr_blocks": [ "::/0" ], "prefix_list_ids": null, "protocol": "-1", "security_groups": null, "self": null, "to_port": 0 }] | no | | [enable\_cloudwatch\_agent](#input\_enable\_cloudwatch\_agent) | Enabling the cloudwatch agent on the ec2 runner instances, the runner contains default config. Configuration can be overridden via `cloudwatch_config`. | `bool` | `true` | no | +| [enable\_dynamic\_labels](#input\_enable\_dynamic\_labels) | Experimental! Can be removed / changed without trigger a major release. Enable dynamic labels with 'ghr-' prefix. When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' to add unique labels dynamically to runners. 
| `bool` | `false` | no | | [enable\_ephemeral\_runners](#input\_enable\_ephemeral\_runners) | Enable ephemeral runners, runners will only be used once. | `bool` | `false` | no | | [enable\_jit\_config](#input\_enable\_jit\_config) | Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is available. In case you are upgrading from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI. | `bool` | `null` | no | | [enable\_job\_queued\_check](#input\_enable\_job\_queued\_check) | Only scale if the job event received by the scale up lambda is is in the state queued. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | `bool` | `null` | no | diff --git a/modules/runners/scale-up.tf b/modules/runners/scale-up.tf index c5503f6394..d360520965 100644 --- a/modules/runners/scale-up.tf +++ b/modules/runners/scale-up.tf @@ -28,6 +28,7 @@ resource "aws_lambda_function" "scale_up" { AMI_ID_SSM_PARAMETER_NAME = local.ami_id_ssm_parameter_name DISABLE_RUNNER_AUTOUPDATE = var.disable_runner_autoupdate ENABLE_EPHEMERAL_RUNNERS = var.enable_ephemeral_runners + ENABLE_DYNAMIC_LABELS = var.enable_dynamic_labels ENABLE_JIT_CONFIG = var.enable_jit_config ENABLE_JOB_QUEUED_CHECK = local.enable_job_queued_check ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = var.metrics.enable && var.metrics.metric.enable_github_app_rate_limit diff --git a/modules/runners/variables.tf b/modules/runners/variables.tf index e2a33280b9..44d8b73bc8 100644 --- a/modules/runners/variables.tf +++ b/modules/runners/variables.tf @@ -532,6 +532,12 @@ variable "enable_ephemeral_runners" { default = false } +variable "enable_dynamic_labels" { + description = "Experimental! Can be removed / changed without trigger a major release. 
Enable dynamic labels with 'ghr-' prefix. When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' to add unique labels dynamically to runners." + type = bool + default = false +} + variable "enable_job_queued_check" { description = "Only scale if the job event received by the scale up lambda is is in the state queued. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior." type = bool diff --git a/modules/webhook/README.md b/modules/webhook/README.md index 4c496d8775..5619f4e70a 100644 --- a/modules/webhook/README.md +++ b/modules/webhook/README.md @@ -67,6 +67,7 @@ yarn run dist | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| | [aws\_partition](#input\_aws\_partition) | (optional) partition for the base arn if not 'aws' | `string` | `"aws"` | no | +| [enable\_dynamic\_labels](#input\_enable\_dynamic\_labels) | Experimental! Can be removed / changed without trigger a major release. Enable dynamic labels with 'ghr-' prefix. When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' to add unique labels dynamically to runners. | `bool` | `false` | no | | [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling. `enable`: Enable the EventBridge feature. `accept_events`: List can be used to only allow specific events to be putted on the EventBridge. By default all events, empty list will be be interpreted as all events. | object({ enable = optional(bool, false) accept_events = optional(list(string), null) }) | n/a | yes | | [github\_app\_parameters](#input\_github\_app\_parameters) | Parameter Store for GitHub App Parameters. 
| object({ webhook_secret = map(string) }) | n/a | yes | | [kms\_key\_arn](#input\_kms\_key\_arn) | Optional CMK Key ARN to be used for Parameter Store. | `string` | `null` | no | diff --git a/modules/webhook/direct/README.md b/modules/webhook/direct/README.md index 55ca0473da..984e2fa88a 100644 --- a/modules/webhook/direct/README.md +++ b/modules/webhook/direct/README.md @@ -40,7 +40,7 @@ No modules. | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| -| [config](#input\_config) | Configuration object for all variables. | object({ prefix = string archive = optional(object({ enable = optional(bool, true) retention_days = optional(number, 7) }), {}) tags = optional(map(string), {}) lambda_subnet_ids = optional(list(string), []) lambda_security_group_ids = optional(list(string), []) sqs_job_queues_arns = list(string) lambda_zip = optional(string, null) lambda_memory_size = optional(number, 256) lambda_timeout = optional(number, 10) role_permissions_boundary = optional(string, null) role_path = optional(string, null) logging_retention_in_days = optional(number, 180) logging_kms_key_id = optional(string, null) log_class = optional(string, "STANDARD") lambda_s3_bucket = optional(string, null) lambda_s3_key = optional(string, null) lambda_s3_object_version = optional(string, null) lambda_apigateway_access_log_settings = optional(object({ destination_arn = string format = string }), null) repository_white_list = optional(list(string), []) kms_key_arn = optional(string, null) log_level = optional(string, "info") lambda_runtime = optional(string, "nodejs24.x") aws_partition = optional(string, "aws") lambda_architecture = optional(string, "arm64") github_app_parameters = object({ webhook_secret = map(string) }) tracing_config = optional(object({ mode = optional(string, null) capture_http_requests = optional(bool, false) capture_error = optional(bool, false) }), {}) lambda_tags = optional(map(string), {}) api_gw_source_arn = 
string ssm_parameter_runner_matcher_config = list(object({ name = string arn = string version = string })) }) | n/a | yes | +| [config](#input\_config) | Configuration object for all variables. | object({ prefix = string archive = optional(object({ enable = optional(bool, true) retention_days = optional(number, 7) }), {}) tags = optional(map(string), {}) lambda_subnet_ids = optional(list(string), []) lambda_security_group_ids = optional(list(string), []) sqs_job_queues_arns = list(string) lambda_zip = optional(string, null) lambda_memory_size = optional(number, 256) lambda_timeout = optional(number, 10) role_permissions_boundary = optional(string, null) role_path = optional(string, null) logging_retention_in_days = optional(number, 180) logging_kms_key_id = optional(string, null) log_class = optional(string, "STANDARD") lambda_s3_bucket = optional(string, null) lambda_s3_key = optional(string, null) lambda_s3_object_version = optional(string, null) lambda_apigateway_access_log_settings = optional(object({ destination_arn = string format = string }), null) repository_white_list = optional(list(string), []) kms_key_arn = optional(string, null) log_level = optional(string, "info") lambda_runtime = optional(string, "nodejs24.x") aws_partition = optional(string, "aws") lambda_architecture = optional(string, "arm64") github_app_parameters = object({ webhook_secret = map(string) }) tracing_config = optional(object({ mode = optional(string, null) capture_http_requests = optional(bool, false) capture_error = optional(bool, false) }), {}) lambda_tags = optional(map(string), {}) api_gw_source_arn = string ssm_parameter_runner_matcher_config = list(object({ name = string arn = string version = string })) enable_dynamic_labels = optional(bool, false) }) | n/a | yes | ## Outputs diff --git a/modules/webhook/direct/variables.tf b/modules/webhook/direct/variables.tf index 4c4088eb1b..2283f37eb7 100644 --- a/modules/webhook/direct/variables.tf +++ 
b/modules/webhook/direct/variables.tf @@ -47,5 +47,6 @@ variable "config" { arn = string version = string })) + enable_dynamic_labels = optional(bool, false) }) } diff --git a/modules/webhook/direct/webhook.tf b/modules/webhook/direct/webhook.tf index 912829019a..91d41e4765 100644 --- a/modules/webhook/direct/webhook.tf +++ b/modules/webhook/direct/webhook.tf @@ -19,6 +19,7 @@ resource "aws_lambda_function" "webhook" { environment { variables = { for k, v in { + ENABLE_DYNAMIC_LABELS = var.config.enable_dynamic_labels LOG_LEVEL = var.config.log_level POWERTOOLS_LOGGER_LOG_EVENT = var.config.log_level == "debug" ? "true" : "false" POWERTOOLS_TRACE_ENABLED = var.config.tracing_config.mode != null ? true : false diff --git a/modules/webhook/eventbridge/README.md b/modules/webhook/eventbridge/README.md index fa6fa9b7f3..4929bfb935 100644 --- a/modules/webhook/eventbridge/README.md +++ b/modules/webhook/eventbridge/README.md @@ -54,7 +54,7 @@ No modules. | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| -| [config](#input\_config) | Configuration object for all variables. 
| object({ prefix = string archive = optional(object({ enable = optional(bool, true) retention_days = optional(number, 7) }), {}) tags = optional(map(string), {}) lambda_subnet_ids = optional(list(string), []) lambda_security_group_ids = optional(list(string), []) sqs_job_queues_arns = list(string) lambda_zip = optional(string, null) lambda_memory_size = optional(number, 256) lambda_timeout = optional(number, 10) role_permissions_boundary = optional(string, null) role_path = optional(string, null) logging_retention_in_days = optional(number, 180) logging_kms_key_id = optional(string, null) log_class = optional(string, "STANDARD") lambda_s3_bucket = optional(string, null) lambda_s3_key = optional(string, null) lambda_s3_object_version = optional(string, null) lambda_apigateway_access_log_settings = optional(object({ destination_arn = string format = string }), null) repository_white_list = optional(list(string), []) kms_key_arn = optional(string, null) log_level = optional(string, "info") lambda_runtime = optional(string, "nodejs24.x") aws_partition = optional(string, "aws") lambda_architecture = optional(string, "arm64") github_app_parameters = object({ webhook_secret = map(string) }) tracing_config = optional(object({ mode = optional(string, null) capture_http_requests = optional(bool, false) capture_error = optional(bool, false) }), {}) lambda_tags = optional(map(string), {}) api_gw_source_arn = string ssm_parameter_runner_matcher_config = list(object({ name = string arn = string version = string })) accept_events = optional(list(string), null) }) | n/a | yes | +| [config](#input\_config) | Configuration object for all variables. 
| object({ prefix = string archive = optional(object({ enable = optional(bool, true) retention_days = optional(number, 7) }), {}) tags = optional(map(string), {}) lambda_subnet_ids = optional(list(string), []) lambda_security_group_ids = optional(list(string), []) sqs_job_queues_arns = list(string) lambda_zip = optional(string, null) lambda_memory_size = optional(number, 256) lambda_timeout = optional(number, 10) role_permissions_boundary = optional(string, null) role_path = optional(string, null) logging_retention_in_days = optional(number, 180) logging_kms_key_id = optional(string, null) log_class = optional(string, "STANDARD") lambda_s3_bucket = optional(string, null) lambda_s3_key = optional(string, null) lambda_s3_object_version = optional(string, null) lambda_apigateway_access_log_settings = optional(object({ destination_arn = string format = string }), null) repository_white_list = optional(list(string), []) kms_key_arn = optional(string, null) log_level = optional(string, "info") lambda_runtime = optional(string, "nodejs24.x") aws_partition = optional(string, "aws") lambda_architecture = optional(string, "arm64") github_app_parameters = object({ webhook_secret = map(string) }) tracing_config = optional(object({ mode = optional(string, null) capture_http_requests = optional(bool, false) capture_error = optional(bool, false) }), {}) lambda_tags = optional(map(string), {}) api_gw_source_arn = string ssm_parameter_runner_matcher_config = list(object({ name = string arn = string version = string })) accept_events = optional(list(string), null) enable_dynamic_labels = optional(bool, false) }) | n/a | yes | ## Outputs diff --git a/modules/webhook/eventbridge/dispatcher.tf b/modules/webhook/eventbridge/dispatcher.tf index f199e129e9..98fbb893ee 100644 --- a/modules/webhook/eventbridge/dispatcher.tf +++ b/modules/webhook/eventbridge/dispatcher.tf @@ -37,6 +37,7 @@ resource "aws_lambda_function" "dispatcher" { environment { variables = { for k, v in { + 
ENABLE_DYNAMIC_LABELS = var.config.enable_dynamic_labels LOG_LEVEL = var.config.log_level POWERTOOLS_LOGGER_LOG_EVENT = var.config.log_level == "debug" ? "true" : "false" POWERTOOLS_SERVICE_NAME = "${var.config.prefix}-dispatcher" diff --git a/modules/webhook/eventbridge/variables.tf b/modules/webhook/eventbridge/variables.tf index 907523d67d..9f9ab7ba56 100644 --- a/modules/webhook/eventbridge/variables.tf +++ b/modules/webhook/eventbridge/variables.tf @@ -47,6 +47,7 @@ variable "config" { arn = string version = string })) - accept_events = optional(list(string), null) + accept_events = optional(list(string), null) + enable_dynamic_labels = optional(bool, false) }) } diff --git a/modules/webhook/variables.tf b/modules/webhook/variables.tf index a7b8f8173e..bf50ceeb41 100644 --- a/modules/webhook/variables.tf +++ b/modules/webhook/variables.tf @@ -225,3 +225,10 @@ EOF accept_events = optional(list(string), null) }) } + +variable "enable_dynamic_labels" { + description = "Experimental! Can be removed / changed without trigger a major release. Enable dynamic labels with 'ghr-' prefix. When enabled, jobs can use 'ghr-ec2-:' labels to dynamically configure EC2 instances (e.g., 'ghr-ec2-instance-type:t3.large') and 'ghr-run-' to add unique labels dynamically to runners." 
+ type = bool + default = false +} + diff --git a/modules/webhook/webhook.tf b/modules/webhook/webhook.tf index 0516a98c21..6c8fe88c97 100644 --- a/modules/webhook/webhook.tf +++ b/modules/webhook/webhook.tf @@ -86,6 +86,7 @@ module "direct" { version = p.version } ] + enable_dynamic_labels = var.enable_dynamic_labels } } @@ -128,7 +129,8 @@ module "eventbridge" { version = p.version } ] - accept_events = var.eventbridge.accept_events + accept_events = var.eventbridge.accept_events + enable_dynamic_labels = var.enable_dynamic_labels } } diff --git a/variables.tf b/variables.tf index d739e916fb..987a488f2b 100644 --- a/variables.tf +++ b/variables.tf @@ -673,6 +673,12 @@ variable "enable_ephemeral_runners" { default = false } +variable "enable_dynamic_labels" { + description = "Experimental! Can be removed / changed without trigger a major release. Enable dynamic EC2 configs based on workflow job labels. When enabled, jobs can request specific configs via the 'gh-ec2-:' label (e.g., 'gh-ec2-instance-type:t3.large')." + type = bool + default = false +} + variable "enable_job_queued_check" { description = "Only scale if the job event received by the scale up lambda is in the queued state. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior." type = bool
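For orientation, here is a minimal sketch of how a caller might opt in to the new flag. The variable name `enable_dynamic_labels` and its default of `false` come from the diff above; the module source and the omitted inputs are assumptions for illustration only. Note that in the multi-runner module the diff wires the flag from the top-level `var.enable_dynamic_labels`, not from each entry's `runner_config`, so it would apply to every runner configuration and to the webhook.

```hcl
# Sketch only: the module source and the omitted inputs are assumptions,
# not part of this change.
module "runners" {
  source = "github-aws-runners/github-runner/aws"

  # Other required inputs (GitHub App settings, VPC, subnets, and so on)
  # are intentionally left out of this fragment.

  # Experimental flag introduced by this change. Defaults to false.
  # Per the diff, it reaches the lambdas as the ENABLE_DYNAMIC_LABELS
  # environment variable.
  enable_dynamic_labels = true
}
```

Because the variable descriptions mark the feature as experimental and subject to change without a major release, it may be safer to pin the module version when enabling it.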