Dataset cache name doesn't seem to include the tables used when cache_dir is set.

TL;DR:

```
if self._cache_dir is None:
    # Only creates UUID with tables if NO cache_dir provided
    id_str = json.dumps({
        "root": self.root,
        "tables": sorted(self.tables),  # <-- Only used here
        "dataset_name": self.dataset_name,
        "dev": self.dev,
    })
    cache_dir = Path(...) / str(uuid.uuid5(uuid.NAMESPACE_DNS, id_str))
else:
    # If cache_dir IS provided explicitly, just use it as-is
    cache_dir = Path(self._cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
```
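When the user specifies `cache_dir`, the directory is used as-is: no UUID is derived, so `self.tables`, `self.dev`, and `self.dataset_name` are not reflected in the cache location. Dataset instances that differ in their tables or `dev` flag but share the same `cache_dir` therefore resolve to the same cache, which can lead to confusing downstream failures during dataset initialization, with no obvious hint as to why certain tasks fail.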
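One possible direction, sketched below, is to keep honoring the user-supplied directory but namespace it with the same configuration fingerprint used in the default branch. This is only a sketch, not the library's implementation: `config_fingerprint` and `resolve_cache_dir` are hypothetical helper names, and the parameters mirror the attributes shown in the excerpt above.

```python
import json
import uuid
from pathlib import Path
from typing import Iterable, Optional, Union


def config_fingerprint(
    root: str, tables: Iterable[str], dataset_name: Optional[str], dev: bool
) -> str:
    """Hypothetical helper: deterministic UUID string for a dataset configuration."""
    id_str = json.dumps(
        {
            "root": root,
            "tables": sorted(tables),  # order-insensitive
            "dataset_name": dataset_name,
            "dev": dev,
        }
    )
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, id_str))


def resolve_cache_dir(
    base_dir: Union[str, Path],
    root: str,
    tables: Iterable[str],
    dataset_name: Optional[str],
    dev: bool,
) -> Path:
    """Hypothetical helper: place the cache in a per-configuration
    subdirectory of the user-supplied base directory.

    The caller is expected to create the directory, e.g. with
    ``path.mkdir(parents=True, exist_ok=True)``.
    """
    return Path(base_dir) / config_fingerprint(root, tables, dataset_name, dev)
```

With this layout, two configurations that differ only in `tables` or `dev` would get separate subdirectories under the same `cache_dir`, while the same configuration would keep resolving to the same path. Whether changing the on-disk layout for existing users is acceptable is a separate design question.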

Dataset cache name doesn't seem to include the tables used when cache_dir is set. #833