Propagate PERF_ENV_VARS and overrides to SlurmExecutor container_env#15167
Propagate PERF_ENV_VARS and overrides to SlurmExecutor container_env#15167sudostock wants to merge 1 commit intoNVIDIA-NeMo:llmb-nemo-r2.5.0from
Conversation
| template_vars={"pre_cmds": " ; ".join(custom_bash_cmds)}, | ||
| ) | ||
|
|
||
| all_container_env = list(set(PERF_ENV_VARS) | set(container_env)) |
There was a problem hiding this comment.
| all_container_env = list(set(PERF_ENV_VARS) | set(container_env)) | |
| # In case the same key is present in both container_env and in PERF_ENV_VARS the final list will | |
| # have multiple entries for the same env var key. To solve this, environments need to be merged with a method like below. | |
| # e.g., PERF_ENV_VARS = ["PATH=/usr/bin", "DEBUG=1"], container_env = ["PATH=/home/bin", "USER=alice"] and there would be multiple PATH=xxx entries in the all_container_env | |
| def merge_env_vars(base_env, override_env): | |
| """Merge environment variables, with override_env taking precedence.""" | |
| env_dict = {} | |
| # Parse base environment (defaults) | |
| for var in base_env: | |
| if '=' in var: | |
| key, value = var.split('=', 1) | |
| env_dict[key] = value | |
| # Override with PERF_ENV_VARS | |
| for var in override_env: | |
| if '=' in var: | |
| key, value = var.split('=', 1) | |
| env_dict[key] = value | |
| return [f"{k}={v}" for k, v in env_dict.items()] | |
| # container_env = defaults, PERF_ENV_VARS = overrides/additions | |
| all_container_env = list(set(container_env) | set(PERF_ENV_VARS)) |
also, if container_env is original set of environment variables and additional variables are provided via PERF_ENV_VARS, then the order should be swapped to ensure PERF_ENV_VARS can change original values.
There was a problem hiding this comment.
I think there's a slight misunderstanding: container_env is a list not a dict. The --container-env flag it sets on the slurm side only takes a list of variable names that the container should override from the current env if they are already set in the container.
all_container_env = list(set(PERF_ENV_VARS) | set(container_env)) this handles the dedup of key names. set(dict) == set(dict.keys()) Then we do the intersection of both sets which gives a set as the final data type. There are no duplicates.
|
Closing this PR as "won't merge". |
Expose the container_env parameter of SlurmExecutor, explicitly set container_env for any environment variable being set. If you're setting performance variables for your run you expect them to be set in the container.
The pyxis/enroot defaults are to set all variables in the user environment in the container environment, except when that variable already exists in the container. Then it won't overwrite it unless '--container-env=VAR' is set.