
Codegen emits 8 duplicate Extends* classes in capability.py #36

@pjordan


Summary

The generated src/ucp_sdk/models/schemas/capability.py contains 8 nearly identical Extends* classes: Extends, Extends2, Extends4, and Extends6 are structurally identical string types; Extends1, Extends3, Extends5, and Extends7 are structurally identical list types; and their *Item siblings are duplicated as well. This was flagged by the Gemini code review on #28:

The duplication harms readability and maintainability of the generated SDK surface.

Root cause

ucp/source/schemas/capability.json defines extends as an inline oneOf: [string, array<string>] inside $defs.base. preprocess_schemas.py::merge_all_of_to_node resolves the local $ref with copy.deepcopy and inlines the block into each of the 4 consumers (Base, PlatformSchema, BusinessSchema, ResponseSchema). distribute_properties_to_branches then propagates the oneOf into each variant.

By the time datamodel-codegen runs, the extends oneOf appears inline about 4 times across the schema. datamodel-codegen mints a numbered class for each anonymous inline subschema and does not deduplicate structurally identical types unless told to. The result is 4 string variants and 4 list variants, plus the list variants' companion *Item classes: 11 generated symbols where there should be 1–2.
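A minimal model of the duplication, assuming a stripped-down capability.json that keeps only the extends property. inline_base is a hypothetical stand-in for the $ref inlining done by merge_all_of_to_node (the real pass also merges allOf and distributes properties to branches). Each consumer receives its own deep copy, so the four extends nodes are four distinct objects with a single shared shape, which is exactly the situation a per-occurrence class generator turns into numbered duplicates.

```python
import copy
import json

# Simplified stand-in for ucp/source/schemas/capability.json:
# only the `extends` property of $defs.base is modeled.
SCHEMA = {
    "$defs": {
        "base": {
            "type": "object",
            "properties": {
                "extends": {
                    "oneOf": [
                        {"type": "string"},
                        {"type": "array", "items": {"type": "string"}},
                    ]
                }
            },
        }
    }
}

CONSUMERS = ["Base", "PlatformSchema", "BusinessSchema", "ResponseSchema"]


def inline_base(schema: dict) -> dict:
    """Hypothetical model of the inlining step: each consumer gets its own deep copy."""
    base = schema["$defs"]["base"]
    return {name: copy.deepcopy(base) for name in CONSUMERS}


def canonical(node) -> str:
    """Key-order-independent serialization used to compare subschemas structurally."""
    return json.dumps(node, sort_keys=True)


inlined = inline_base(SCHEMA)
extends_nodes = [inlined[name]["properties"]["extends"] for name in CONSUMERS]

# Four separate objects...
distinct_objects = len({id(node) for node in extends_nodes})
# ...but only one distinct shape.
distinct_shapes = len({canonical(node) for node in extends_nodes})
print(distinct_objects, distinct_shapes)  # 4 1
```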

Proposed fixes

Quick win (recommended first)

Add --reuse-model to the datamodel-codegen invocation in generate_models.sh (~line 73). This is the official flag for collapsing structurally identical models, and because it applies repo-wide it will also deduplicate any other accidental duplicates created by the allOf flattening pass.

Architectural follow-up

Add a hoist_duplicate_subschemas() pass to preprocess_schemas.py, called after merge_all_of_to_node, that:

  1. Walks the merged schema and hashes anonymous subschemas.
  2. Lifts any subschema appearing >=2 times into a top-level $defs entry.
  3. Replaces consumers with $ref pointers.

This stops emitting redundant inline schemas at the source rather than relying on codegen heuristics.
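One possible shape for the pass, sketched under stated assumptions. It treats only nodes carrying oneOf, anyOf, allOf, or properties as hoistable (so leaves like {"type": "string"} stay inline), fingerprints them with SHA-256 over a key-sorted JSON dump, names each hoisted entry after the key it was first found under (extends becomes Extends), and treats existing named $defs entries as roots that are scanned but never hoisted themselves. The helper names and the naming heuristic are hypothetical, not taken from preprocess_schemas.py.

```python
import copy
import hashlib
import json
from collections import Counter

# Assumption: only composite subschemas are hoisted; leaves such as
# {"type": "string"} stay inline so $defs is not flooded with trivial entries.
COMPOSITE_KEYS = {"oneOf", "anyOf", "allOf", "properties"}


def _fingerprint(node: dict) -> str:
    """Hash a subschema by its key-sorted JSON form, so key order does not matter."""
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _is_candidate(node) -> bool:
    return isinstance(node, dict) and "$ref" not in node and not COMPOSITE_KEYS.isdisjoint(node)


def _count(node, counts: Counter) -> None:
    """First pass: tally how often each candidate subschema occurs."""
    children = node.values() if isinstance(node, dict) else node if isinstance(node, list) else ()
    for child in children:
        if _is_candidate(child):
            counts[_fingerprint(child)] += 1
        _count(child, counts)


def _unique_name(hint, defs: dict) -> str:
    """Name a hoisted entry after the key it was found under (heuristic)."""
    base = hint[:1].upper() + hint[1:] if isinstance(hint, str) and hint else "Shared"
    name, n = base, 2
    while name in defs:
        name, n = f"{base}{n}", n + 1
    return name


def _rewrite(node, counts, defs, seen):
    """Second pass: rebuild node, replacing repeated candidates with $ref pointers."""
    if isinstance(node, dict):
        return {k: _rewrite_child(k, v, counts, defs, seen) for k, v in node.items()}
    if isinstance(node, list):
        return [_rewrite_child(None, v, counts, defs, seen) for v in node]
    return node


def _rewrite_child(hint, child, counts, defs, seen):
    if _is_candidate(child):
        fp = _fingerprint(child)
        if counts[fp] >= 2:
            if fp not in seen:
                name = _unique_name(hint, defs)
                seen[fp] = name
                defs[name] = {}  # reserve the name before recursing
                defs[name] = _rewrite(child, counts, defs, seen)
            return {"$ref": f"#/$defs/{seen[fp]}"}
    return _rewrite(child, counts, defs, seen)


def hoist_duplicate_subschemas(schema: dict) -> dict:
    """Return a new schema in which anonymous subschemas seen >= 2 times live in $defs."""
    original_defs = schema.get("$defs", {})
    body = {k: v for k, v in schema.items() if k != "$defs"}

    counts: Counter = Counter()
    for root in (body, *original_defs.values()):
        _count(root, counts)

    new_defs = dict.fromkeys(original_defs)  # reserve existing names first
    seen: dict = {}
    for name, definition in original_defs.items():
        new_defs[name] = _rewrite(definition, counts, new_defs, seen)
    new_body = _rewrite(body, counts, new_defs, seen)
    if new_defs:
        new_body["$defs"] = new_defs
    return new_body


# Usage: four consumers that each carry an inlined copy of `extends`.
CONSUMERS = ["Base", "PlatformSchema", "BusinessSchema", "ResponseSchema"]
EXTENDS = {"oneOf": [{"type": "string"}, {"type": "array", "items": {"type": "string"}}]}
merged = {"$defs": {n: {"type": "object", "properties": {"extends": copy.deepcopy(EXTENDS)}} for n in CONSUMERS}}

hoisted = hoist_duplicate_subschemas(merged)
print(sorted(hoisted["$defs"]))
# ['Base', 'BusinessSchema', 'Extends', 'PlatformSchema', 'ResponseSchema']
print(hoisted["$defs"]["Base"]["properties"]["extends"])
# {'$ref': '#/$defs/Extends'}
```

Because counting happens before replacement, a subschema that is duplicated only because its parent is duplicated can still end up with its own single-use $defs entry; recounting after the first pass, or counting only outermost occurrences, would avoid that.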

Verification

./generate_models.sh
grep -c "class Extends" src/ucp_sdk/models/schemas/capability.py

The count should drop from 11 to 1–2.
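The grep above is a one-off check. To keep the duplication from creeping back, a regression guard could run the same count after each regeneration. A minimal Python sketch, assuming the output path from this issue and a ceiling of 2 to match the 1–2 target; count_extends_classes and check_generated_models are hypothetical helper names. Unlike grep -c, which counts any line containing the substring, the regex only counts class definitions that start a line.

```python
import re
from pathlib import Path

# Path taken from this issue; adjust if the generator writes elsewhere.
GENERATED = Path("src/ucp_sdk/models/schemas/capability.py")

# Anchored at line start so only class definitions count;
# \w* also matches the *Item siblings (e.g. Extends1Item).
EXTENDS_CLASS = re.compile(r"^class Extends\w*", re.MULTILINE)


def count_extends_classes(source: str) -> int:
    """Count top-level class definitions whose name starts with Extends."""
    return len(EXTENDS_CLASS.findall(source))


def check_generated_models(path: Path = GENERATED, limit: int = 2) -> None:
    """Raise if regeneration reintroduces more Extends* classes than expected."""
    count = count_extends_classes(path.read_text(encoding="utf-8"))
    if count > limit:
        raise AssertionError(f"{path}: found {count} Extends* classes, expected at most {limit}")
```

Calling check_generated_models() from a test that runs after ./generate_models.sh would fail CI if the count rises above 2 again.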
