Open
Labels
bug, callback: model checkpoint, checkpointing, profiler, ver: 2.5.x
Description
Bug description
When the AdvancedProfiler is used to create and persist a profile of a training run, a ModelCheckpoint cannot monitor a metric whose name contains a slash, e.g. "val/JaccardIndex". The training run fails with a "No such file or directory" error.
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
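import lightning as L
from lightning.pytorch.callbacks import ModelCheckpoint
from lightning.pytorch.profilers import AdvancedProfiler

# profiler_dir (a pathlib.Path) and training_config are defined elsewhere in the user's code.
profiler_dir.mkdir(parents=True, exist_ok=True)
# dump_stats=True makes the profiler write one .prof file per profiled action.
profiler = AdvancedProfiler(dirpath=profiler_dir, filename="perf_logs", dump_stats=True)
checkpoint_callback = ModelCheckpoint(
    filename="epoch={epoch}-step={step}-val_iou={val/JaccardIndex:.2f}",
    auto_insert_metric_name=False,
    verbose=True,
    monitor="val/JaccardIndex",  # the slash in the metric name triggers the error
    mode="max",
    save_last="link",
    save_top_k=training_config.save_top_k,
)
L.Trainer(
    callbacks=[checkpoint_callback],
    profiler=profiler,
    ...
)
Error messages and logs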
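It seems that the action_name the profiler uses for the ModelCheckpoint callback contains the callback's configuration, including the monitored metric: action_name = "[Callback]ModelCheckpoint{'monitor': 'val/JaccardIndex', 'mode': 'max', 'every_n"+73
The slash in 'val/JaccardIndex' then appears to be treated as a path separator when the .prof file is written: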
FileNotFoundError: [Errno 2] No such file or directory: ".../profiler/fit-perf_logs-[Callback]ModelCheckpoint{'monitor': 'val/JaccardIndex', 'mode': 'max', 'every_n_train_steps': 0,'every_n_epochs': 1, 'train_time_interval': None}.setup.prof"
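The failure mode can be reproduced without Lightning. The following is a minimal sketch, assuming a POSIX filesystem; the helper try_write and the shortened filenames are made up for illustration and are not what the profiler actually generates. It shows that a filename containing a slash is resolved as a path into a subdirectory that does not exist:

```python
import tempfile
from pathlib import Path


def try_write(dirpath: Path, filename: str) -> str:
    """Hypothetical helper: try to create `filename` inside `dirpath`."""
    target = dirpath / filename
    try:
        target.write_bytes(b"")
        return "ok"
    except FileNotFoundError:
        # Raised when a parent directory of `target` does not exist.
        return "FileNotFoundError"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Stand-in names, loosely modelled on the filename in the traceback.
    plain = try_write(root, "fit-perf_logs-[Callback]ModelCheckpoint{monitor=val_JaccardIndex}.setup.prof")
    slashed = try_write(root, "fit-perf_logs-[Callback]ModelCheckpoint{monitor=val/JaccardIndex}.setup.prof")

print(plain, slashed)
```

Everything before the slash is taken as a directory name, and since that directory was never created, the write fails in the same way as in the traceback.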
Environment
Current environment
- CUDA:
- GPU:
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- NVIDIA A100-SXM4-40GB
- available: True
- version: 12.1
- Lightning:
- lightning: 2.5.5
- lightning-utilities: 0.15.2
- pytorch-lightning: 2.5.5
- segmentation-models-pytorch: 0.5.0
- torch: 2.5.1+cu121
- torchmetrics: 1.8.2
- torchvision: 0.20.1+cu121
- Packages:
- affine: 2.4.0
- aiohappyeyeballs: 2.6.1
- aiohttp: 3.13.1
- aiohttp-cors: 0.8.1
- aiosignal: 1.4.0
- albucore: 0.0.24
- albumentations: 2.0.8
- alembic: 1.17.0
- annotated-types: 0.7.0
- appdirs: 1.4.4
- asttokens: 3.0.0
- attrs: 25.4.0
- autocommand: 2.2.2
- babel: 2.17.0
- backports.tarfile: 1.2.0
- backrefs: 5.9
- beautifulsoup4: 4.14.2
- bleach: 6.2.0
- bokeh: 3.8.0
- boto3: 1.40.55
- botocore: 1.40.55
- branca: 0.8.2
- cachetools: 6.2.1
- cairocffi: 1.7.1
- cairosvg: 2.8.2
- cartopy: 0.25.0
- certifi: 2025.10.5
- cffi: 2.0.0
- charset-normalizer: 3.4.4
- click: 8.2.1
- click-plugins: 1.1.1.2
- cligj: 0.7.2
- cloudpickle: 3.1.1
- cmdkit: 2.7.7
- colorama: 0.4.6
- colorcet: 3.1.0
- colorful: 0.5.7
- colorlog: 6.10.1
- comm: 0.2.3
- contourpy: 1.3.3
- cql2: 0.4.0
- crc32c: 2.8
- cssselect2: 0.8.0
- cucim-cu12: 25.10.0
- cuda-bindings: 13.0.2
- cuda-pathfinder: 1.3.1
- cuda-python: 13.0.2
- cupy-cuda12x: 14.0.0a1
- cupy-xarray: 0.1.4+14.g1c50016
- cycler: 0.12.1
- cyclopts: 3.24.0
- darts-acquisition: 0.1.0
- darts-ensemble: 0.1.0
- darts-export: 0.1.0
- darts-nextgen: 0.10.0.post13+bf81304
- darts-postprocessing: 0.1.0
- darts-preprocessing: 0.1.0
- darts-segmentation: 0.1.0
- darts-superresolution: 0.1.0
- darts-utils: 0.1.0
- dask: 2025.2.0
- datashader: 0.18.2
- debugpy: 1.8.17
- decorator: 5.2.1
- defusedxml: 0.7.1
- deprecated: 1.2.18
- distlib: 0.4.0
- distributed: 2025.2.0
- docstring-parser: 0.17.0
- docutils: 0.22.2
- donfig: 0.8.1.post1
- earthengine-api: 1.6.12
- executing: 2.2.1
- fastcore: 1.8.13
- fastjsonschema: 2.21.2
- fastrlock: 0.8.3
- filelock: 3.20.0
- folium: 0.20.0
- fonttools: 4.60.1
- frozenlist: 1.8.0
- fsspec: 2025.9.0
- geocube: 0.7.1
- geopandas: 1.1.1
- geoviews: 1.14.1
- ghp-import: 2.1.0
- gitdb: 4.0.12
- gitpython: 3.1.45
- google-api-core: 2.26.0
- google-api-python-client: 2.185.0
- google-auth: 2.41.1
- google-auth-httplib2: 0.2.0
- google-cloud-core: 2.4.3
- google-cloud-storage: 3.4.1
- google-crc32c: 1.7.1
- google-resumable-media: 2.7.2
- googleapis-common-protos: 1.71.0
- greenlet: 3.2.4
- griffe: 1.14.0
- grpcio: 1.75.1
- h5netcdf: 1.7.2
- h5py: 3.15.1
- hf-xet: 1.1.10
- holoviews: 1.21.0
- httplib2: 0.31.0
- huggingface-hub: 0.35.3
- hvplot: 0.12.1
- icechunk: 0.2.18
- idna: 3.11
- imageio: 2.37.0
- importlib-metadata: 8.7.0
- importlib-resources: 6.5.2
- inflect: 7.3.1
- iniconfig: 2.3.0
- ipykernel: 7.0.1
- ipython: 9.6.0
- ipython-pygments-lexers: 1.1.1
- ipywidgets: 8.1.7
- jaraco.collections: 5.1.0
- jaraco.context: 5.3.0
- jaraco.functools: 4.0.1
- jaraco.text: 3.12.1
- jedi: 0.19.2
- jinja2: 3.1.6
- jmespath: 1.0.1
- joblib: 1.5.2
- jsonschema: 4.25.1
- jsonschema-specifications: 2025.9.1
- jupyter-bokeh: 4.0.5
- jupyter-client: 8.6.3
- jupyter-core: 5.9.1
- jupyterlab-pygments: 0.3.0
- jupyterlab-widgets: 3.0.15
- kiwisolver: 1.4.9
- lazy-loader: 0.4
- lightning: 2.5.5
- lightning-utilities: 0.15.2
- linkify-it-py: 2.0.3
- llvmlite: 0.45.1
- locket: 1.0.0
- lovely-numpy: 0.2.16
- lovely-tensors: 0.1.19
- lz4: 4.4.4
- mako: 1.3.10
- mapclassify: 2.10.0
- markdown: 3.9
- markdown-it-py: 4.0.0
- markupsafe: 3.0.3
- matplotlib: 3.10.7
- matplotlib-inline: 0.1.7
- mdit-py-plugins: 0.5.0
- mdurl: 0.1.2
- mergedeep: 1.3.4
- mike: 2.1.3
- mistune: 3.1.4
- mkdocs: 1.6.1
- mkdocs-api-autonav: 0.4.0
- mkdocs-autorefs: 1.4.3
- mkdocs-get-deps: 0.2.0
- mkdocs-git-committers-plugin-2: 2.5.0
- mkdocs-git-revision-date-localized-plugin: 1.4.7
- mkdocs-glightbox: 0.5.1
- mkdocs-material: 9.6.22
- mkdocs-material-extensions: 1.3.1
- mkdocstrings: 0.30.1
- mkdocstrings-python: 1.18.2
- more-itertools: 10.3.0
- mpmath: 1.3.0
- msgpack: 1.1.2
- multidict: 6.7.0
- multipledispatch: 1.0.0
- names-generator: 0.2.0
- narwhals: 2.9.0
- nbclient: 0.10.2
- nbconvert: 7.16.6
- nbformat: 5.10.4
- nest-asyncio: 1.6.0
- networkx: 3.5
- nodeenv: 1.9.1
- numba: 0.62.1
- numcodecs: 0.15.1
- numpy: 2.3.4
- nvidia-cublas-cu12: 12.1.3.1
- nvidia-cuda-cupti-cu12: 12.1.105
- nvidia-cuda-nvrtc-cu12: 12.1.105
- nvidia-cuda-runtime-cu12: 12.1.105
- nvidia-cudnn-cu12: 9.1.0.70
- nvidia-cufft-cu12: 11.0.2.54
- nvidia-curand-cu12: 10.3.2.106
- nvidia-cusolver-cu12: 11.4.5.107
- nvidia-cusparse-cu12: 12.1.0.106
- nvidia-nccl-cu12: 2.21.5
- nvidia-nvjitlink-cu12: 12.8.93
- nvidia-nvtx-cu12: 12.1.105
- odc-geo: 0.4.10
- odc-loader: 0.5.1
- odc-stac: 0.4.0
- opencensus: 0.11.4
- opencensus-context: 0.1.3
- opencv-python-headless: 4.11.0.86
- opentelemetry-api: 1.38.0
- opentelemetry-exporter-prometheus: 0.59b0
- opentelemetry-proto: 1.38.0
- opentelemetry-sdk: 1.38.0
- opentelemetry-semantic-conventions: 0.59b0
- optuna: 4.5.0
- packaging: 25.0
- paginate: 0.5.7
- pandas: 2.3.3
- pandocfilters: 1.5.1
- panel: 1.8.2
- param: 2.2.1
- parso: 0.8.5
- partd: 1.4.2
- pathspec: 0.12.1
- pexpect: 4.9.0
- pillow: 11.3.0
- platformdirs: 4.5.0
- pluggy: 1.6.0
- prometheus-client: 0.23.1
- prompt-toolkit: 3.0.52
- propcache: 0.4.1
- proto-plus: 1.26.1
- protobuf: 6.33.0
- psutil: 7.1.1
- psycopg2-binary: 2.9.11
- ptyprocess: 0.7.0
- pure-eval: 0.2.3
- py-spy: 0.4.1
- pyarrow: 21.0.0
- pyasn1: 0.6.1
- pyasn1-modules: 0.4.2
- pycparser: 2.23
- pyct: 0.6.0
- pydantic: 2.12.3
- pydantic-core: 2.41.4
- pygments: 2.19.2
- pymdown-extensions: 10.16.1
- pynvml: 11.4.1
- pyogrio: 0.11.1
- pypalettes: 0.2.1
- pyparsing: 3.2.5
- pyperclip: 1.11.0
- pyproj: 3.7.2
- pyright: 1.1.406
- pyshp: 3.0.2.post1
- pystac: 1.14.1
- pystac-client: 0.9.0
- pytest: 8.4.2
- python-box: 7.3.2
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.5.5
- pytz: 2025.2
- pyviz-comms: 3.0.6
- pyyaml: 6.0.3
- pyyaml-env-tag: 1.1
- pyzmq: 27.1.0
- rasterio: 1.4.3
- ray: 2.50.1
- referencing: 0.37.0
- requests: 2.32.5
- rich: 14.2.0
- rich-rst: 1.3.2
- rioxarray: 0.19.0
- rpds-py: 0.27.1
- rsa: 4.9.1
- ruff: 0.14.4
- s3transfer: 0.14.0
- safetensors: 0.6.2
- scikit-image: 0.25.2
- scikit-learn: 1.7.2
- scipy: 1.16.2
- seaborn: 0.13.2
- segmentation-models-pytorch: 0.5.0
- selectolax: 0.3.29
- sentry-sdk: 2.42.1
- setuptools: 80.9.0
- shapely: 2.1.2
- simsimd: 6.5.3
- six: 1.17.0
- smart-geocubes: 0.0.9
- smart-open: 7.4.0
- smmap: 5.0.2
- sortedcontainers: 2.4.0
- soupsieve: 2.8
- spyndex: 0.8.0
- sqlalchemy: 2.0.44
- stack-data: 0.6.3
- stopuhr: 0.0.10
- stringzilla: 4.2.1
- sympy: 1.13.1
- tblib: 3.1.0
- threadpoolctl: 3.6.0
- tifffile: 2025.10.16
- timm: 1.0.20
- tinycss2: 1.4.0
- toml: 0.10.2
- tomli: 2.0.1
- toolz: 1.1.0
- torch: 2.5.1+cu121
- torchmetrics: 1.8.2
- torchvision: 0.20.1+cu121
- tornado: 6.5.2
- tqdm: 4.67.1
- traitlets: 5.14.3
- triton: 3.1.0
- typeguard: 4.3.0
- typing-extensions: 4.15.0
- typing-inspection: 0.4.2
- tzdata: 2025.2
- uc-micro-py: 1.0.3
- ultraplot: 1.65.1
- uritemplate: 4.2.0
- urllib3: 2.5.0
- verspec: 0.1.0
- virtualenv: 20.35.3
- wandb: 0.22.2
- watchdog: 6.0.0
- wcwidth: 0.2.14
- webencodings: 0.5.1
- wheel: 0.45.1
- widgetsnbextension: 4.0.14
- wrapt: 1.17.3
- xarray: 2025.10.1
- xarray-spatial: 0.4.0
- xee: 0.0.22
- xpystac: 0.5.0
- xyzservices: 2025.4.0
- yarl: 1.22.0
- zarr: 3.0.10
- zict: 3.0.0
- zipp: 3.23.0
- System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.12.9
- release: 5.15.0-1076-nvidia
- version: #77-Ubuntu SMP Tue Mar 25 23:43:36 UTC 2025
More info
Including the callback's configuration in the filename appears to be intended behavior: #19703
However, it could perhaps be handled differently, for example by sanitizing the action_name before it is used as a filename, or by offering an option similar to ModelCheckpoint's auto_insert_metric_name.
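The sanitizing idea could look roughly like the sketch below. The function sanitize_action_name is a hypothetical helper, not part of Lightning's API, and the choice of "_" as the replacement character is an assumption; where such a step would fit inside the profiler is left open here.

```python
import re


def sanitize_action_name(action_name: str, replacement: str = "_") -> str:
    """Hypothetical helper: replace path separators so the name is safe as a single filename."""
    # Replace both forward slashes and backslashes; all other characters are kept.
    return re.sub(r"[\\/]", replacement, action_name)


# Shortened stand-in for the action name shown in the error message.
action_name = "[Callback]ModelCheckpoint{'monitor': 'val/JaccardIndex', 'mode': 'max'}.setup"
safe_name = sanitize_action_name(action_name)
print(safe_name)
```

Replacing only the separators keeps the rest of the callback configuration readable in the filename, which matches the behavior that #19703 seems to want.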