Skip to content

ci(prow): migrate docs merged update postsubmits#4592

Open
wuhuizuo wants to merge 4 commits into
mainfrom
agent/coder/a41f455b
Open

ci(prow): migrate docs merged update postsubmits#4592
wuhuizuo wants to merge 4 commits into
mainfrom
agent/coder/a41f455b

Conversation

@wuhuizuo
Copy link
Copy Markdown
Contributor

Summary

  • migrate pingcap/docs/merged_update_docs and pingcap/docs-cn/merged_update_docs_cn to agent: kubernetes + cluster: default postsubmits
  • keep GitHub contexts docs/merged-ci and docs-cn/merged-ci while replacing the legacy Jenkins Groovy/pod-template flow with inline container specs
  • drop the hub.pingcap.net/jenkins/docs-cn-checker dependency in favor of a public pandoc/extra:latest-ubuntu runtime that installs python3, rclone, and fonts-wqy-microhei

Validation

  • .ci/update-prow-job-kustomization.sh
  • checkconfig against flattened prow-jobs/
  • flux build kustomization prow-jobs --path=prow-jobs --kustomization-file=../ee-ops/apps/gcp/prow/pre/prow-jobs.yaml --dry-run
  • runtime sanity check with docker run --rm --entrypoint bash pandoc/extra:latest-ubuntu ...

Notes

  • the new jobs read docs-cn-tencent-ak and docs-cn-tencent-bn from the docs-cn Kubernetes secret; the matching ExternalSecret still needs to exist in Prow post/job-ns before first execution

Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary

This PR replaces legacy Jenkins-based postsubmit jobs for documentation updates with Kubernetes-based Prow jobs. The migration involves switching to inline container specifications, adding new runtime dependencies (python3, rclone, and fonts), and using a public Docker image (pandoc/extra:latest-ubuntu). The changes are well-structured and adhere to Prow configuration conventions, but some areas need attention for robustness and maintainability.


Critical Issues

  1. Error Handling in PDF Generation Scripts

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and prow-jobs/pingcap/docs/docs-postsubmits.yaml

    • Lines: args: section

    • Problem: The merge_by_toc.py and generate_pdf.sh scripts lack robust error handling. If these scripts fail, the build will terminate without providing actionable feedback or cleanup.

    • Solution: Add error trapping and logging to these scripts using trap in Bash or explicit error messages.

      trap 'echo "Error occurred during PDF generation"; exit 1' ERR
      python3 scripts/merge_by_toc.py || { echo "Failed to merge by TOC"; exit 1; }
      scripts/generate_pdf.sh || { echo "Failed to generate PDF"; exit 1; }
  2. Missing Secrets Validation

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and prow-jobs/pingcap/docs/docs-postsubmits.yaml

    • Lines: env: section

    • Problem: The environment variables TENCENTCLOUD_RCLONE_CONN and TENCENTCLOUD_BUCKET_ID depend on Kubernetes secrets, but no fallback mechanism is provided if these secrets are unavailable.

    • Solution: Validate the presence of secrets explicitly before running the workflow and provide meaningful error messages if they are missing.

      if [[ -z "${TENCENTCLOUD_RCLONE_CONN}" || -z "${TENCENTCLOUD_BUCKET_ID}" ]]; then
        echo "Missing required secrets for cloud storage. Exiting."
        exit 1
      fi
  3. Potential Tab Character Check False Positives

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Lines: if find . -name '*.md' -print0 | xargs -0 grep -P '\t'; then
    • Problem: The tab character check may incorrectly flag valid markdown files containing tab characters in code blocks or other contexts.
    • Solution: Scope the check to exclude code blocks or specific markdown sections using more precise grep patterns.

Code Improvements

  1. Runtime Dependency Optimization

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Lines: apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei
    • Problem: Installing dependencies at runtime increases job execution time.
    • Solution: Build a custom Docker image with pre-installed dependencies to speed up job execution.
  2. Cluster Resource Allocation

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Lines: resources:
    • Problem: Both jobs request 12Gi of memory and 4 CPUs, which might be excessive for PDF generation tasks.
    • Solution: Benchmark the actual resource usage and adjust requests/limits accordingly to optimize cluster utilization.

Best Practices

  1. Improved Documentation for Secrets

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Problem: The usage of Kubernetes secrets (docs-cn-tencent-ak and docs-cn-tencent-bn) is not documented clearly.
    • Solution: Add inline comments or external documentation detailing the expected structure and purpose of these secrets.
  2. Testing Coverage for Migration

    • Problem: The PR does not mention automated tests to verify the functionality of the new Prow jobs.
    • Solution: Implement integration tests for the new workflow, ensuring the PDF generation and upload work as intended.
  3. Naming Consistency

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Problem: The job names (pingcap/docs-cn/merged_update_docs_cn, pingcap/docs/merged_update_docs) do not clearly distinguish between Jenkins and Prow implementations.
    • Solution: Add a suffix or prefix to indicate the Prow-based implementation, e.g., pingcap/docs-cn/merged_update_docs_cn_prow.

By addressing these issues, the PR will be more robust, maintainable, and aligned with best practices both for Kubernetes-based workflows and CI/CD systems.

Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary

This PR migrates documentation update postsubmit jobs from Jenkins to Prow with Kubernetes agents. It replaces Groovy-based pipelines and pod templates with inline container specifications in Prow configuration files. The legacy private container image dependency (hub.pingcap.net/jenkins/docs-cn-checker) is dropped in favor of the public pandoc/extra:latest-ubuntu image that installs required tools during runtime. The PR is well-structured but introduces a few critical issues related to security, error handling, and maintainability of the new configuration.


Critical Issues

  1. Security concern: Running containers as root (securityContext)

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml, prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Line: securityContext: runAsUser: 0
    • Issue: Running containers as the root user poses a security risk. If the container is compromised, the attacker may gain root access to the host.
    • Suggested Solution: Specify a non-root user for the container. If specific permissions are required, assign them to the user instead. For example:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
  2. Missing error handling for external dependencies (e.g., apt-get and rclone)

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml, prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Lines: Within the args block, e.g., apt-get install and rclone copyto commands
    • Issue: Commands like apt-get install and rclone copyto are critical for the job but lack error handling for failures. An apt-get failure could result in missing dependencies, and rclone copyto could silently fail without notifying the user.
    • Suggested Solution: Add explicit error handling for these commands. For example:
      apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei || { echo "Dependency installation failed"; exit 1; }
      rclone copyto output.pdf "${dst}/tidb-${version}-zh-manual.pdf" || { echo "PDF upload failed"; exit 1; }
  3. Potential race condition with shared secrets

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml, prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Lines: Environment variable configuration for TENCENTCLOUD_RCLONE_CONN and TENCENTCLOUD_BUCKET_ID.
    • Issue: Multiple concurrent jobs may access or overwrite shared secrets (docs-cn-tencent-ak and docs-cn-tencent-bn), leading to unpredictable behavior.
    • Suggested Solution: Use namespaced or job-specific secrets if possible, and ensure the secrets are not shared across jobs.

Code Improvements

  1. Decouple runtime dependency installation

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml, prow-jobs/pingcap/docs/docs-postsubmits.yaml

    • Lines: apt-get install block inside args.

    • Issue: Installing dependencies at runtime increases job execution time and introduces potential failure points.

    • Suggested Solution: Build and use a pre-configured container image that includes all required tools (Python, rclone, fonts). This reduces runtime variability and speeds up execution.

      Example Dockerfile for a custom image:

      FROM pandoc/extra:latest-ubuntu
      RUN apt-get update && apt-get install -y python3 rclone fonts-wqy-microhei && \
          rm -rf /var/lib/apt/lists/* && \
          fc-cache -fv >/dev/null || true

      Update job configuration to use the custom image:

      image: custom-pandoc-image:latest
  2. Refactor markdown validation logic

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml, prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Lines: find . -name '*.md' -print0 | xargs -0 grep -P '\t'
    • Issue: The current approach is hard to read and may fail in edge cases (e.g., filenames with special characters).
    • Suggested Solution: Refactor this logic into a Python script or use simpler Bash constructs. Example:
      grep -rnP '\t' --include="*.md" . || { echo "Tabs detected in markdown files"; exit 1; }

Best Practices

  1. Container resource limits

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml, prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Lines: Resource requests and limits.
    • Issue: Requests and limits are hard-coded without justification. This could lead to inefficient resource utilization or job failures under load.
    • Suggested Solution: Monitor resource usage of these jobs to fine-tune memory and CPU allocation.
  2. Testing coverage gaps

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml, prow-jobs/pingcap/docs/docs-postsubmits.yaml
    • Issue: No mention of how the changes are tested beyond runtime sanity checks.
    • Suggested Solution: Add automated tests for the new configuration in a staging environment to validate job behavior for edge cases (e.g., missing secrets, invalid markdown files).

Notes

  • Ensure that the external secret (docs-cn) mentioned in the PR description exists before merging, as it is a prerequisite for job execution.
  • Consider documenting the migration process and rationale for future maintainability.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates the postsubmit jobs for building and uploading documentation PDFs from Jenkins to Prow for the pingcap/docs and pingcap/docs-cn repositories. The reviewer recommends improving CI reliability and performance by baking system dependencies, such as Python, rclone, and fonts, into a custom Docker image instead of installing them at runtime via apt-get.

Comment on lines +28 to +32
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei
rm -rf /var/lib/apt/lists/*
fc-cache -fv >/dev/null || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Installing system dependencies like python3, rclone, and fonts at runtime via apt-get increases CI job duration and introduces external dependencies on package mirrors, which can lead to intermittent failures. Per the general rules, these dependencies should be baked into a custom Docker image for better performance and reliability. Additionally, baking the image would allow the job to run without root privileges (runAsUser: 0).

References
  1. For better CI performance and reliability, bake dependencies into the Docker image instead of installing them at runtime in CI scripts.

Comment on lines +28 to +32
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei
rm -rf /var/lib/apt/lists/*
fc-cache -fv >/dev/null || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Installing system dependencies like python3, rclone, and fonts at runtime via apt-get increases CI job duration and introduces external dependencies on package mirrors, which can lead to intermittent failures. Per the general rules, these dependencies should be baked into a custom Docker image for better performance and reliability. Additionally, baking the image would allow the job to run without root privileges (runAsUser: 0).

References
  1. For better CI performance and reliability, bake dependencies into the Docker image instead of installing them at runtime in CI scripts.

@wuhuizuo
Copy link
Copy Markdown
Contributor Author

/hold

Copy link
Copy Markdown
Contributor Author

@wuhuizuo wuhuizuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review pending local validation

@wuhuizuo
Copy link
Copy Markdown
Contributor Author

Blocking review point: this PR drops ^release-5.4$ from both prow-jobs/pingcap/docs/docs-postsubmits.yaml and prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml. main still runs merged_update_docs* for release-5.4, so merging this would silently stop merged PDF generation on that branch. Please either restore release-5.4 coverage or document that the branch is intentionally retired before re-requesting review.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wuhuizuo for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot removed the size/XL label May 15, 2026
Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary

This PR migrates postsubmit jobs from Jenkins to Prow with Kubernetes agents, replacing Groovy scripts and pod templates with inline container specifications. It also transitions from private dependencies to a public runtime (pandoc/extra:latest-ubuntu) and updates job definitions in Prow configuration files. The approach is clean and aligns well with modern CI/CD practices, but a few areas require attention for robustness and maintainability.


Critical Issues

  1. Missing Error Handling in PDF Generation

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and docs-postsubmits.yaml, lines where scripts/generate_pdf.sh is invoked.
    • Issue: If scripts/generate_pdf.sh fails, subsequent steps like rclone copyto will still be executed, potentially uploading incomplete or invalid files.
    • Suggestion: Add explicit error handling after PDF generation commands:
      scripts/generate_pdf.sh || { echo "PDF generation failed"; exit 1; }
  2. Hardcoded Image Versions

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and docs-postsubmits.yaml, container image references.
    • Issue: The image docs-merged-update-runtime@sha256:ce3647630176a08c... is hardcoded without a tag or version variable, making upgrades or rollbacks difficult.
    • Suggestion: Use a parameterized version or tag:
      image: us-docker.pkg.dev/pingcap-testing-account/hub/pingcap-qe/docs-merged-update-runtime:${VERSION_TAG}

Code Improvements

  1. Unsupported Markdown Tab Check

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and docs-postsubmits.yaml, lines checking for tab characters in markdown sources.
    • Issue: The check outputs a generic error message but doesn't specify which files contain tabs. This can make debugging tedious.
    • Suggestion: Modify the error message to include filenames:
      if find . -name '*.md' -print0 | xargs -0 grep -P '\t'; then
        echo "Found tab characters in markdown sources:" >&2
        find . -name '*.md' -print0 | xargs -0 grep -P '\t'
        exit 1
      fi
  2. Redundant Node Affinity Specification

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and docs-postsubmits.yaml, nodeAffinity section.
    • Issue: Node affinity is specified explicitly for amd64, which may not be future-proof if additional architectures need support.
    • Suggestion: Consider making this configurable or using kubernetes.io/os for broader compatibility:
      matchExpressions:
        - key: kubernetes.io/os
          operator: In
          values:
            - linux

Best Practices

  1. Missing Resource Limits for Python Processes

    • File: prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml and docs-postsubmits.yaml.
    • Issue: Python scripts (e.g., merge_by_toc.py) are run without specific resource constraints, which could lead to unpredictable behavior under high load.
    • Suggestion: Add resource limits for Python processes:
      python3 scripts/merge_by_toc.py --limit-memory=2G
  2. Documentation Gaps for Secrets

    • File: Secrets like docs-cn-tencent-ak and docs-cn-tencent-bn are referenced but their setup isn't documented in the PR description.
    • Issue: Lack of explanation about how these secrets should be created or updated may confuse new maintainers.
    • Suggestion: Add a README or update the PR description to include instructions for managing these secrets.
  3. Testing Coverage

    • Issue: The PR description mentions validation steps but doesn't specify automated tests for the PDF generation workflow.
    • Suggestion: Add unit tests or integration tests to verify scripts/merge_by_toc.py and scripts/generate_pdf.sh functionality, especially edge cases like missing TOC entries.

Conclusion

While the migration to Prow and Kubernetes is well-executed overall, addressing the above issues will significantly improve the reliability and maintainability of the pipeline. Focus on error handling, resource constraints, and documentation to ensure smooth execution and easy troubleshooting in production environments.

@ti-chi-bot ti-chi-bot Bot added the size/XXL label May 15, 2026
Copy link
Copy Markdown

@ti-chi-bot ti-chi-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary

This PR migrates two Jenkins-based postsubmit jobs (pingcap/docs/merged_update_docs and pingcap/docs-cn/merged_update_docs_cn) to Prow using Kubernetes agents. It replaces the legacy Jenkins pipeline and Groovy configurations with inline container specs, leveraging a public Docker image (pandoc/extra:latest-ubuntu) and Kubernetes secrets for credentials. The migration appears well-structured and adheres to Prow's standard configurations. The PR eliminates redundant Groovy pipeline files and pod templates, simplifying the CI/CD setup while maintaining functionality. Overall, the changes improve maintainability, but there are minor issues related to error handling and container configuration.


Critical Issues

  1. Error Handling in Markdown Validation

    • File: docs-cn-postsubmits.yaml: lines 55-57 and docs-postsubmits.yaml: lines 55-57
    • Issue: The find command checks for tab characters in markdown files but only reports the error without providing actionable context (e.g., file names with tabs). This can lead to debugging difficulties.
    • Suggested Solution: Modify the find command to include filenames in the output. Example:
      if find . -name '*.md' -print0 | xargs -0 grep -P '\t'; then
          echo "Found tab characters in markdown sources:" >&2
          find . -name '*.md' -print0 | xargs -0 grep -Pl '\t' >&2
          exit 1
      fi
  2. Ambiguity in Branch Handling

    • File: docs-cn-postsubmits.yaml: line 72 and docs-postsubmits.yaml: line 72
    • Issue: The case statement for branch identification relies on implicit patterns but does not validate unexpected branch formats. This could cause silent failures if a branch does not match any pattern.
    • Suggested Solution: Add a fallback default case to handle unexpected branches explicitly.
      *)
          echo "Error: Unexpected base ref: ${branch}" >&2
          exit 1
      ;;

Code Improvements

  1. Container Image Versioning

    • File: Dockerfile: line 1
    • Issue: The base image (pandoc/extra) is referenced by a SHA256 digest. While this ensures immutability, it may become outdated over time. A mechanism for updating the base image should be considered.
    • Suggested Solution: Introduce CI automation to periodically check for new versions of the image and update the digest.
  2. Resource Definition in Postsubmit Jobs

    • File: docs-cn-postsubmits.yaml: lines 93-95 and docs-postsubmits.yaml: lines 93-95
    • Issue: The resource limits (memory: 12Gi, cpu: "4") may be excessive for the PDF generation process and could lead to resource contention in shared Kubernetes clusters.
    • Suggested Solution: Profile the job's actual resource usage and adjust limits accordingly. For instance:
      resources:
        requests:
          memory: 4Gi
          cpu: "2"
        limits:
          memory: 8Gi
          cpu: "4"

Best Practices

  1. Testing Coverage for PDF Generation

    • Issue: No mention of automated tests for the scripts (merge_by_toc.py, generate_pdf.sh, etc.). These scripts are critical to the functionality of the pipeline.
    • Suggested Solution: Include unit tests for the scripts in the repository and integrate them into the CI pipeline.
  2. Documentation for Secrets Usage

    • File: docs-cn-postsubmits.yaml and docs-postsubmits.yaml
    • Issue: The use of Kubernetes secrets (docs-cn-tencent-ak and docs-cn-tencent-bn) is not documented clearly in the PR description or comments.
    • Suggested Solution: Add comments explaining the purpose of these secrets and ensure their values are documented securely for maintainers.
  3. Duplication in Branch Handling Logic

    • Files: docs-cn-postsubmits.yaml and docs-postsubmits.yaml
    • Issue: The branch handling logic is duplicated across both files.
    • Suggested Solution: Refactor the logic into a shared script that can be sourced by both configurations.

Summary of Changes Needed

  • Improve error handling in markdown validation and branch pattern matching.
  • Optimize resource limits for PDF generation containers.
  • Add automated testing for critical scripts and document secrets usage.
  • Refactor duplicated branch handling logic into reusable components.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant