OpenLineage support #22944

@fm100

Is your feature request related to a problem or challenge?

OpenLineage has become a common standard for collecting lineage metadata from processing engines. DataFusion is increasingly used to build query engines, but each DataFusion-based project currently needs to implement lineage extraction independently. This leads to duplicated effort and inconsistent OpenLineage support.

Describe the solution you'd like

I would like DataFusion to expose OpenLineage support, either directly or through stable APIs/hooks that downstream engines can use.

Useful metadata to capture would include:

  • Resolved input and output datasets
  • Dataset schemas
  • Column-level lineage, where possible
  • Logical and/or physical plans, if appropriate
  • Query metadata such as query ID, status, timing, and errors

I do not have a strong preference on the implementation. A separate crate, feature flag, or stable lineage extraction API would all be reasonable options.
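To make the request more concrete, here is a rough sketch of what a lineage hook could look like, using only the Rust standard library. Every name in it (`QueryLineage`, `Dataset`, `ColumnLineage`, `LineageListener`, `RecordingListener`, `sample_event`) is hypothetical and does not exist in DataFusion today; it only mirrors the metadata listed above and is meant as a starting point for discussion, not a proposed design.

```rust
// Hypothetical data model for lineage metadata. None of these types are
// part of DataFusion; they only illustrate the fields requested above.

/// One column of a dataset schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

/// A resolved input or output dataset together with its schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Dataset {
    pub namespace: String,
    pub name: String,
    pub schema: Vec<Field>,
}

/// Column-level lineage: one output column and the (dataset, column)
/// pairs it was derived from.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnLineage {
    pub output_column: String,
    pub input_columns: Vec<(String, String)>,
}

/// Query status, carrying the error message on failure.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryStatus {
    Running,
    Succeeded,
    Failed { error: String },
}

/// Everything a listener would receive for one query.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryLineage {
    pub query_id: String,
    pub status: QueryStatus,
    pub duration_ms: Option<u64>,
    pub inputs: Vec<Dataset>,
    pub outputs: Vec<Dataset>,
    pub column_lineage: Vec<ColumnLineage>,
}

/// Hypothetical hook that a downstream engine (or an OpenLineage crate)
/// would implement to receive lineage events.
pub trait LineageListener {
    fn on_query_event(&mut self, event: &QueryLineage);
}

/// Example listener that simply keeps events in memory.
#[derive(Default)]
pub struct RecordingListener {
    pub events: Vec<QueryLineage>,
}

impl LineageListener for RecordingListener {
    fn on_query_event(&mut self, event: &QueryLineage) {
        self.events.push(event.clone());
    }
}

/// Builds an illustrative event for a query that copies one column
/// from `orders` into `orders_summary`.
pub fn sample_event() -> QueryLineage {
    let id = Field { name: "order_id".into(), data_type: "Int64".into() };
    QueryLineage {
        query_id: "q-1".into(),
        status: QueryStatus::Succeeded,
        duration_ms: Some(42),
        inputs: vec![Dataset {
            namespace: "example".into(),
            name: "orders".into(),
            schema: vec![id.clone()],
        }],
        outputs: vec![Dataset {
            namespace: "example".into(),
            name: "orders_summary".into(),
            schema: vec![id],
        }],
        column_lineage: vec![ColumnLineage {
            output_column: "order_id".into(),
            input_columns: vec![("orders".into(), "order_id".into())],
        }],
    }
}
```

An OpenLineage integration could then live entirely in a listener implementation that translates `QueryLineage` into OpenLineage run events (datasets with a namespace and name, plus schema and column-lineage facets), whether that ships as a separate crate or behind a feature flag.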

Describe alternatives you've considered

Each DataFusion-based engine could implement OpenLineage support independently by inspecting SQL, logical plans, or physical plans. However, this duplicates work, may depend on unstable internals, and can produce inconsistent lineage semantics.
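As an illustration of the kind of code each engine ends up rewriting, here is a minimal sketch of collecting input tables by walking a plan tree. `ToyPlan` is a made-up stand-in, not DataFusion's plan representation; a real implementation would have to handle many more node types and would be coupled to whatever plan internals the engine uses.

```rust
use std::collections::BTreeSet;

/// Toy plan tree used only for illustration. This is NOT DataFusion's
/// logical plan type; it has just enough variants to show the traversal.
pub enum ToyPlan {
    Scan { table: String },
    Filter { input: Box<ToyPlan> },
    Join { left: Box<ToyPlan>, right: Box<ToyPlan> },
}

/// Returns the de-duplicated, sorted set of table names scanned by the plan.
pub fn input_tables(plan: &ToyPlan) -> BTreeSet<String> {
    let mut tables = BTreeSet::new();
    collect(plan, &mut tables);
    tables
}

/// Depth-first walk that records the table name of every scan node.
fn collect(plan: &ToyPlan, tables: &mut BTreeSet<String>) {
    match plan {
        ToyPlan::Scan { table } => {
            tables.insert(table.clone());
        }
        ToyPlan::Filter { input } => collect(input, tables),
        ToyPlan::Join { left, right } => {
            collect(left, tables);
            collect(right, tables);
        }
    }
}
```

Even in this toy form, decisions such as how to name datasets or how to treat repeated scans of the same table are left to each implementer, which is where the inconsistent lineage semantics come from.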

Additional context

OpenLineage integration would make DataFusion more useful as a foundation for production query engines and data platforms, especially for projects that want lineage and observability support without building it from scratch.

I am not very familiar with the DataFusion codebase yet, but I would be happy to collaborate with the DataFusion community on the OpenLineage side and help shape the expected metadata/modeling requirements.

Labels: enhancement