Skip to content

Slow query: file lookup with subquery #5880

@bjester

Description

@bjester

This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.

Target branch: hotfixes

Observed behavior

The following query was observed performing poorly:

SELECT
    "contentcuration_file"."original_filename",
    "contentcuration_file"."file_size",
    "contentcuration_file"."checksum",
    "contentcuration_file"."file_format_id",
    "contentcuration_language"."readable_name",
    "contentcuration_contentnode"."title",
    T6."readable_name",
    "contentcuration_license"."license_name",
    "contentcuration_contentnode"."kind_id",
    "contentcuration_contentnode"."description",
    "contentcuration_contentnode"."author",
    "contentcuration_contentnode"."provider",
    "contentcuration_contentnode"."aggregator",
    "contentcuration_contentnode"."license_description",
    "contentcuration_contentnode"."copyright_holder",
    (
        SELECT U0."name"
        FROM "contentcuration_channel" U0
            LEFT OUTER JOIN "contentcuration_contentnode" U1 ON (U0."main_tree_id" = U1."id")
            LEFT OUTER JOIN "contentcuration_contentnode" U2 ON (U0."trash_tree_id" = U2."id")
        WHERE (U1."tree_id" = "contentcuration_contentnode"."tree_id" OR
            U2."tree_id" = "contentcuration_contentnode"."tree_id")
        LIMIT 1
    ) AS "channel_name"
FROM "contentcuration_file"
    LEFT OUTER JOIN "contentcuration_contentnode"
        ON ("contentcuration_file"."contentnode_id" = "contentcuration_contentnode"."id")
    LEFT OUTER JOIN "contentcuration_language"
        ON ("contentcuration_file"."language_id" = "contentcuration_language"."id")
    LEFT OUTER JOIN "contentcuration_language" T6
        ON ("contentcuration_contentnode"."language_id" = T6."id")
    LEFT OUTER JOIN "contentcuration_license"
        ON ("contentcuration_contentnode"."license_id" = "contentcuration_license"."id")
WHERE "contentcuration_file"."uploaded_by_id" = <REDACTED>

Expected behavior

The select subquery and main table joins are not ideal considering it joins against content nodes in order to get a channel name. It would be better to restructure this query to use CTEs to select data from the file table, then performs join to produce the final intended result.

User-facing consequences

TBD

Steps to reproduce

It is not immediately clear where this query originates. Some investigation will need to be performed to find it and optimized the source of this query.

Context

Production

Metadata

Metadata

Assignees

No one assigned

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions