ci(prow): migrate docs merged update postsubmits#4592
Conversation
There was a problem hiding this comment.
I have already done a preliminary review for you, and I hope to help you do a better job.
Summary
This PR replaces legacy Jenkins-based postsubmit jobs for documentation updates with Kubernetes-based Prow jobs. The migration involves switching to inline container specifications, adding new runtime dependencies (python3, rclone, and fonts), and using a public Docker image (pandoc/extra:latest-ubuntu). The changes are well-structured and adhere to Prow configuration conventions, but some areas need attention for robustness and maintainability.
Critical Issues
-
Error Handling in PDF Generation Scripts
-
File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlandprow-jobs/pingcap/docs/docs-postsubmits.yaml -
Lines:
args:section -
Problem: The
merge_by_toc.pyandgenerate_pdf.shscripts lack robust error handling. If these scripts fail, the build will terminate without providing actionable feedback or cleanup. -
Solution: Add error trapping and logging to these scripts using
trapin Bash or explicit error messages.trap 'echo "Error occurred during PDF generation"; exit 1' ERR python3 scripts/merge_by_toc.py || { echo "Failed to merge by TOC"; exit 1; } scripts/generate_pdf.sh || { echo "Failed to generate PDF"; exit 1; }
-
-
Missing Secrets Validation
-
File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlandprow-jobs/pingcap/docs/docs-postsubmits.yaml -
Lines:
env:section -
Problem: The environment variables
TENCENTCLOUD_RCLONE_CONNandTENCENTCLOUD_BUCKET_IDdepend on Kubernetes secrets, but no fallback mechanism is provided if these secrets are unavailable. -
Solution: Validate the presence of secrets explicitly before running the workflow and provide meaningful error messages if they are missing.
if [[ -z "${TENCENTCLOUD_RCLONE_CONN}" || -z "${TENCENTCLOUD_BUCKET_ID}" ]]; then echo "Missing required secrets for cloud storage. Exiting." exit 1 fi
-
-
Potential Tab Character Check False Positives
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlandprow-jobs/pingcap/docs/docs-postsubmits.yaml - Lines:
if find . -name '*.md' -print0 | xargs -0 grep -P '\t'; then - Problem: The tab character check may incorrectly flag valid markdown files containing tab characters in code blocks or other contexts.
- Solution: Scope the check to exclude code blocks or specific markdown sections using more precise grep patterns.
- File:
Code Improvements
-
Runtime Dependency Optimization
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlandprow-jobs/pingcap/docs/docs-postsubmits.yaml - Lines:
apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei - Problem: Installing dependencies at runtime increases job execution time.
- Solution: Build a custom Docker image with pre-installed dependencies to speed up job execution.
- File:
-
Cluster Resource Allocation
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlandprow-jobs/pingcap/docs/docs-postsubmits.yaml - Lines:
resources: - Problem: Both jobs request
12Giof memory and4CPUs, which might be excessive for PDF generation tasks. - Solution: Benchmark the actual resource usage and adjust requests/limits accordingly to optimize cluster utilization.
- File:
Best Practices
-
Improved Documentation for Secrets
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlandprow-jobs/pingcap/docs/docs-postsubmits.yaml - Problem: The usage of Kubernetes secrets (
docs-cn-tencent-akanddocs-cn-tencent-bn) is not documented clearly. - Solution: Add inline comments or external documentation detailing the expected structure and purpose of these secrets.
- File:
-
Testing Coverage for Migration
- Problem: The PR does not mention automated tests to verify the functionality of the new Prow jobs.
- Solution: Implement integration tests for the new workflow, ensuring the PDF generation and upload work as intended.
-
Naming Consistency
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlandprow-jobs/pingcap/docs/docs-postsubmits.yaml - Problem: The job names (
pingcap/docs-cn/merged_update_docs_cn,pingcap/docs/merged_update_docs) do not clearly distinguish between Jenkins and Prow implementations. - Solution: Add a suffix or prefix to indicate the Prow-based implementation, e.g.,
pingcap/docs-cn/merged_update_docs_cn_prow.
- File:
By addressing these issues, the PR will be more robust, maintainable, and aligned with best practices both for Kubernetes-based workflows and CI/CD systems.
There was a problem hiding this comment.
I have already done a preliminary review for you, and I hope to help you do a better job.
Summary
This PR migrates documentation update postsubmit jobs from Jenkins to Prow with Kubernetes agents. It replaces Groovy-based pipelines and pod templates with inline container specifications in Prow configuration files. The legacy private container image dependency (hub.pingcap.net/jenkins/docs-cn-checker) is dropped in favor of the public pandoc/extra:latest-ubuntu image that installs required tools during runtime. The PR is well-structured but introduces a few critical issues related to security, error handling, and maintainability of the new configuration.
Critical Issues
-
Security concern: Running containers as root (securityContext)
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml,prow-jobs/pingcap/docs/docs-postsubmits.yaml - Line:
securityContext: runAsUser: 0 - Issue: Running containers as the root user poses a security risk. If the container is compromised, the attacker may gain root access to the host.
- Suggested Solution: Specify a non-root user for the container. If specific permissions are required, assign them to the user instead. For example:
securityContext: runAsUser: 1000 runAsGroup: 1000
- File:
-
Missing error handling for external dependencies (e.g.,
apt-getandrclone)- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml,prow-jobs/pingcap/docs/docs-postsubmits.yaml - Lines: Within the
argsblock, e.g.,apt-get installandrclone copytocommands - Issue: Commands like
apt-get installandrclone copytoare critical for the job but lack error handling for failures. Anapt-getfailure could result in missing dependencies, andrclone copytocould silently fail without notifying the user. - Suggested Solution: Add explicit error handling for these commands. For example:
apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei || { echo "Dependency installation failed"; exit 1; } rclone copyto output.pdf "${dst}/tidb-${version}-zh-manual.pdf" || { echo "PDF upload failed"; exit 1; }
- File:
-
Potential race condition with shared secrets
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml,prow-jobs/pingcap/docs/docs-postsubmits.yaml - Lines: Environment variable configuration for
TENCENTCLOUD_RCLONE_CONNandTENCENTCLOUD_BUCKET_ID. - Issue: Multiple concurrent jobs may access or overwrite shared secrets (
docs-cn-tencent-akanddocs-cn-tencent-bn), leading to unpredictable behavior. - Suggested Solution: Use namespaced or job-specific secrets if possible, and ensure the secrets are not shared across jobs.
- File:
Code Improvements
-
Decouple runtime dependency installation
-
File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml,prow-jobs/pingcap/docs/docs-postsubmits.yaml -
Lines:
apt-get installblock insideargs. -
Issue: Installing dependencies at runtime increases job execution time and introduces potential failure points.
-
Suggested Solution: Build and use a pre-configured container image that includes all required tools (Python,
rclone, fonts). This reduces runtime variability and speeds up execution.Example Dockerfile for a custom image:
FROM pandoc/extra:latest-ubuntu RUN apt-get update && apt-get install -y python3 rclone fonts-wqy-microhei && \ rm -rf /var/lib/apt/lists/* && \ fc-cache -fv >/dev/null || true
Update job configuration to use the custom image:
image: custom-pandoc-image:latest
-
-
Refactor markdown validation logic
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml,prow-jobs/pingcap/docs/docs-postsubmits.yaml - Lines:
find . -name '*.md' -print0 | xargs -0 grep -P '\t' - Issue: The current approach is hard to read and may fail in edge cases (e.g., filenames with special characters).
- Suggested Solution: Refactor this logic into a Python script or use simpler Bash constructs. Example:
grep -rnP '\t' --include="*.md" . || { echo "Tabs detected in markdown files"; exit 1; }
- File:
Best Practices
-
Container resource limits
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml,prow-jobs/pingcap/docs/docs-postsubmits.yaml - Lines: Resource requests and limits.
- Issue: Requests and limits are hard-coded without justification. This could lead to inefficient resource utilization or job failures under load.
- Suggested Solution: Monitor resource usage of these jobs to fine-tune memory and CPU allocation.
- File:
-
Testing coverage gaps
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml,prow-jobs/pingcap/docs/docs-postsubmits.yaml - Issue: No mention of how the changes are tested beyond runtime sanity checks.
- Suggested Solution: Add automated tests for the new configuration in a staging environment to validate job behavior for edge cases (e.g., missing secrets, invalid markdown files).
- File:
Notes
- Ensure that the external secret (
docs-cn) mentioned in the PR description exists before merging, as it is a prerequisite for job execution. - Consider documenting the migration process and rationale for future maintainability.
There was a problem hiding this comment.
Code Review
This pull request migrates the postsubmit jobs for building and uploading documentation PDFs from Jenkins to Prow for the pingcap/docs and pingcap/docs-cn repositories. The reviewer recommends improving CI reliability and performance by baking system dependencies, such as Python, rclone, and fonts, into a custom Docker image instead of installing them at runtime via apt-get.
| export DEBIAN_FRONTEND=noninteractive | ||
| apt-get update | ||
| apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei | ||
| rm -rf /var/lib/apt/lists/* | ||
| fc-cache -fv >/dev/null || true |
There was a problem hiding this comment.
Installing system dependencies like python3, rclone, and fonts at runtime via apt-get increases CI job duration and introduces external dependencies on package mirrors, which can lead to intermittent failures. Per the general rules, these dependencies should be baked into a custom Docker image for better performance and reliability. Additionally, baking the image would allow the job to run without root privileges (runAsUser: 0).
References
- For better CI performance and reliability, bake dependencies into the Docker image instead of installing them at runtime in CI scripts.
| export DEBIAN_FRONTEND=noninteractive | ||
| apt-get update | ||
| apt-get install -y --no-install-recommends python3 rclone fonts-wqy-microhei | ||
| rm -rf /var/lib/apt/lists/* | ||
| fc-cache -fv >/dev/null || true |
There was a problem hiding this comment.
Installing system dependencies like python3, rclone, and fonts at runtime via apt-get increases CI job duration and introduces external dependencies on package mirrors, which can lead to intermittent failures. Per the general rules, these dependencies should be baked into a custom Docker image for better performance and reliability. Additionally, baking the image would allow the job to run without root privileges (runAsUser: 0).
References
- For better CI performance and reliability, bake dependencies into the Docker image instead of installing them at runtime in CI scripts.
|
/hold |
wuhuizuo
left a comment
There was a problem hiding this comment.
review pending local validation
|
Blocking review point: this PR drops ^release-5.4$ from both prow-jobs/pingcap/docs/docs-postsubmits.yaml and prow-jobs/pingcap/docs/docs-cn-postsubmits.yaml. main still runs merged_update_docs* for release-5.4, so merging this would silently stop merged PDF generation on that branch. Please either restore release-5.4 coverage or document that the branch is intentionally retired before re-requesting review. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
I have already done a preliminary review for you, and I hope to help you do a better job.
Summary
This PR migrates postsubmit jobs from Jenkins to Prow with Kubernetes agents, replacing Groovy scripts and pod templates with inline container specifications. It also transitions from private dependencies to a public runtime (pandoc/extra:latest-ubuntu) and updates job definitions in Prow configuration files. The approach is clean and aligns well with modern CI/CD practices, but a few areas require attention for robustness and maintainability.
Critical Issues
-
Missing Error Handling in PDF Generation
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlanddocs-postsubmits.yaml, lines wherescripts/generate_pdf.shis invoked. - Issue: If
scripts/generate_pdf.shfails, subsequent steps likerclone copytowill still be executed, potentially uploading incomplete or invalid files. - Suggestion: Add explicit error handling after PDF generation commands:
scripts/generate_pdf.sh || { echo "PDF generation failed"; exit 1; }
- File:
-
Hardcoded Image Versions
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlanddocs-postsubmits.yaml, container image references. - Issue: The image
docs-merged-update-runtime@sha256:ce3647630176a08c...is hardcoded without a tag or version variable, making upgrades or rollbacks difficult. - Suggestion: Use a parameterized version or tag:
image: us-docker.pkg.dev/pingcap-testing-account/hub/pingcap-qe/docs-merged-update-runtime:${VERSION_TAG}
- File:
Code Improvements
-
Unsupported Markdown Tab Check
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlanddocs-postsubmits.yaml, lines checking for tab characters in markdown sources. - Issue: The check outputs a generic error message but doesn't specify which files contain tabs. This can make debugging tedious.
- Suggestion: Modify the error message to include filenames:
if find . -name '*.md' -print0 | xargs -0 grep -P '\t'; then echo "Found tab characters in markdown sources:" >&2 find . -name '*.md' -print0 | xargs -0 grep -P '\t' exit 1 fi
- File:
-
Redundant Node Affinity Specification
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlanddocs-postsubmits.yaml,nodeAffinitysection. - Issue: Node affinity is specified explicitly for
amd64, which may not be future-proof if additional architectures need support. - Suggestion: Consider making this configurable or using
kubernetes.io/osfor broader compatibility:matchExpressions: - key: kubernetes.io/os operator: In values: - linux
- File:
Best Practices
-
Missing Resource Limits for Python Processes
- File:
prow-jobs/pingcap/docs/docs-cn-postsubmits.yamlanddocs-postsubmits.yaml. - Issue: Python scripts (e.g.,
merge_by_toc.py) are run without specific resource constraints, which could lead to unpredictable behavior under high load. - Suggestion: Add resource limits for Python processes:
python3 scripts/merge_by_toc.py --limit-memory=2G
- File:
-
Documentation Gaps for Secrets
- File: Secrets like
docs-cn-tencent-akanddocs-cn-tencent-bnare referenced but their setup isn't documented in the PR description. - Issue: Lack of explanation about how these secrets should be created or updated may confuse new maintainers.
- Suggestion: Add a README or update the PR description to include instructions for managing these secrets.
- File: Secrets like
-
Testing Coverage
- Issue: The PR description mentions validation steps but doesn't specify automated tests for the PDF generation workflow.
- Suggestion: Add unit tests or integration tests to verify
scripts/merge_by_toc.pyandscripts/generate_pdf.shfunctionality, especially edge cases like missing TOC entries.
Conclusion
While the migration to Prow and Kubernetes is well-executed overall, addressing the above issues will significantly improve the reliability and maintainability of the pipeline. Focus on error handling, resource constraints, and documentation to ensure smooth execution and easy troubleshooting in production environments.
There was a problem hiding this comment.
I have already done a preliminary review for you, and I hope to help you do a better job.
Summary
This PR migrates two Jenkins-based postsubmit jobs (pingcap/docs/merged_update_docs and pingcap/docs-cn/merged_update_docs_cn) to Prow using Kubernetes agents. It replaces the legacy Jenkins pipeline and Groovy configurations with inline container specs, leveraging a public Docker image (pandoc/extra:latest-ubuntu) and Kubernetes secrets for credentials. The migration appears well-structured and adheres to Prow's standard configurations. The PR eliminates redundant Groovy pipeline files and pod templates, simplifying the CI/CD setup while maintaining functionality. Overall, the changes improve maintainability, but there are minor issues related to error handling and container configuration.
Critical Issues
-
Error Handling in Markdown Validation
- File:
docs-cn-postsubmits.yaml: lines 55-57anddocs-postsubmits.yaml: lines 55-57 - Issue: The
findcommand checks for tab characters in markdown files but only reports the error without providing actionable context (e.g., file names with tabs). This can lead to debugging difficulties. - Suggested Solution: Modify the
findcommand to include filenames in the output. Example:if find . -name '*.md' -print0 | xargs -0 grep -P '\t'; then echo "Found tab characters in markdown sources:" >&2 find . -name '*.md' -print0 | xargs -0 grep -Pl '\t' >&2 exit 1 fi
- File:
-
Ambiguity in Branch Handling
- File:
docs-cn-postsubmits.yaml: line 72anddocs-postsubmits.yaml: line 72 - Issue: The
casestatement for branch identification relies on implicit patterns but does not validate unexpected branch formats. This could cause silent failures if a branch does not match any pattern. - Suggested Solution: Add a fallback
defaultcase to handle unexpected branches explicitly.*) echo "Error: Unexpected base ref: ${branch}" >&2 exit 1 ;;
- File:
Code Improvements
-
Container Image Versioning
- File:
Dockerfile: line 1 - Issue: The base image (
pandoc/extra) is referenced by a SHA256 digest. While this ensures immutability, it may become outdated over time. A mechanism for updating the base image should be considered. - Suggested Solution: Introduce CI automation to periodically check for new versions of the image and update the digest.
- File:
-
Resource Definition in Postsubmit Jobs
- File:
docs-cn-postsubmits.yaml: lines 93-95anddocs-postsubmits.yaml: lines 93-95 - Issue: The resource limits (
memory: 12Gi,cpu: "4") may be excessive for the PDF generation process and could lead to resource contention in shared Kubernetes clusters. - Suggested Solution: Profile the job's actual resource usage and adjust limits accordingly. For instance:
resources: requests: memory: 4Gi cpu: "2" limits: memory: 8Gi cpu: "4"
- File:
Best Practices
-
Testing Coverage for PDF Generation
- Issue: No mention of automated tests for the scripts (
merge_by_toc.py,generate_pdf.sh, etc.). These scripts are critical to the functionality of the pipeline. - Suggested Solution: Include unit tests for the scripts in the repository and integrate them into the CI pipeline.
- Issue: No mention of automated tests for the scripts (
-
Documentation for Secrets Usage
- File:
docs-cn-postsubmits.yamlanddocs-postsubmits.yaml - Issue: The use of Kubernetes secrets (
docs-cn-tencent-akanddocs-cn-tencent-bn) is not documented clearly in the PR description or comments. - Suggested Solution: Add comments explaining the purpose of these secrets and ensure their values are documented securely for maintainers.
- File:
-
Duplication in Branch Handling Logic
- Files:
docs-cn-postsubmits.yamlanddocs-postsubmits.yaml - Issue: The branch handling logic is duplicated across both files.
- Suggested Solution: Refactor the logic into a shared script that can be sourced by both configurations.
- Files:
Summary of Changes Needed
- Improve error handling in markdown validation and branch pattern matching.
- Optimize resource limits for PDF generation containers.
- Add automated testing for critical scripts and document secrets usage.
- Refactor duplicated branch handling logic into reusable components.
Summary
pingcap/docs/merged_update_docsandpingcap/docs-cn/merged_update_docs_cntoagent: kubernetes+cluster: defaultpostsubmitsdocs/merged-cianddocs-cn/merged-ciwhile replacing the legacy Jenkins Groovy/pod-template flow with inline container specshub.pingcap.net/jenkins/docs-cn-checkerdependency in favor of a publicpandoc/extra:latest-ubunturuntime that installspython3,rclone, andfonts-wqy-microheiValidation
.ci/update-prow-job-kustomization.shcheckconfigagainst flattenedprow-jobs/flux build kustomization prow-jobs --path=prow-jobs --kustomization-file=../ee-ops/apps/gcp/prow/pre/prow-jobs.yaml --dry-rundocker run --rm --entrypoint bash pandoc/extra:latest-ubuntu ...Notes
docs-cn-tencent-akanddocs-cn-tencent-bnfrom thedocs-cnKubernetes secret; the matching ExternalSecret still needs to exist in Prowpost/job-nsbefore first execution