Skip to content

Propagate PERF_ENV_VARS and overrides to SlurmExecutor container_env#15167

Closed
sudostock wants to merge 1 commit intoNVIDIA-NeMo:llmb-nemo-r2.5.0from
sudostock:add-container-env-llmb-2.5.0
Closed

Propagate PERF_ENV_VARS and overrides to SlurmExecutor container_env#15167
sudostock wants to merge 1 commit intoNVIDIA-NeMo:llmb-nemo-r2.5.0from
sudostock:add-container-env-llmb-2.5.0

Conversation

@sudostock
Copy link
Copy Markdown
Contributor

Expose the container_env parameter of SlurmExecutor, explicitly set container_env for any environment variable being set. If you're setting performance variables for your run you expect them to be set in the container.

The pyxis/enroot defaults are to set all variables in the user environment in the container environment, except when that variable already exists in the container. Then it won't overwrite it unless '--container-env=VAR' is set.

Copy link
Copy Markdown
Collaborator

@bdubauski bdubauski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, found a possible problem - see suggestion below.

template_vars={"pre_cmds": " ; ".join(custom_bash_cmds)},
)

all_container_env = list(set(PERF_ENV_VARS) | set(container_env))
Copy link
Copy Markdown
Collaborator

@bdubauski bdubauski Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
all_container_env = list(set(PERF_ENV_VARS) | set(container_env))
# In case the same key is present in both container_env and in PERF_ENV_VARS the final list will
# have multiple entries for the same env var key. To solve this, environments need to be merged with a method like below.
# e.g., PERF_ENV_VARS = ["PATH=/usr/bin", "DEBUG=1"], container_env = ["PATH=/home/bin", "USER=alice"] and there would be multiple PATH=xxx entries in the all_container_env
def merge_env_vars(base_env, override_env):
"""Merge environment variables, with override_env taking precedence."""
env_dict = {}
# Parse base environment (defaults)
for var in base_env:
if '=' in var:
key, value = var.split('=', 1)
env_dict[key] = value
# Override with PERF_ENV_VARS
for var in override_env:
if '=' in var:
key, value = var.split('=', 1)
env_dict[key] = value
return [f"{k}={v}" for k, v in env_dict.items()]
# container_env = defaults, PERF_ENV_VARS = overrides/additions
all_container_env = list(set(container_env) | set(PERF_ENV_VARS))

also, if container_env is original set of environment variables and additional variables are provided via PERF_ENV_VARS, then the order should be swapped to ensure PERF_ENV_VARS can change original values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a slight misunderstanding: container_env is a list not a dict. The --container-env flag it sets on the slurm side only takes a list of variable names that the container should override from the current env if they are already set in the container.

all_container_env = list(set(PERF_ENV_VARS) | set(container_env)) this handles the dedup of key names. set(dict) == set(dict.keys()) Then we do the intersection of both sets which gives a set as the final data type. There are no duplicates.

@bdubauski bdubauski self-requested a review December 10, 2025 03:05
@pzelasko
Copy link
Copy Markdown
Collaborator

Closing this PR as "won't merge".
The following collections have been moved to separate repos in https://github.com/NVIDIA-NeMo organization: avlm, llm, multimodal, multimodal-autoregressive, vlm, speechlm, diffusion.
If you still wish to proceed with this contribution, please re-open it in the relevant repo.

@pzelasko pzelasko closed this Mar 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants