Feat/135772 135785 bulk export page to local fs in pdf #8646

Open: wants to merge 27 commits into base feat/page-bulk-export

arafubeatbox (Contributor) commented Mar 31, 2024

Implementation details

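  • Implements bulk export in PDF format
  • To convert to PDF, the markdown is first converted to HTML with remark, and that HTML is then rendered to PDF with puppeteer
  • Puppeteer is launched via https://github.com/thomasdondorf/puppeteer-cluster
    • It lets us cap the number of puppeteer instances running at once
      • The default maximum is set to 10
    • It also has a built-in retry feature for failed tasks, but retries did not work when the task is awaited, so the retry logic is implemented in-house
      • Retrying is necessary because puppeteer can become unstable when there are many conversions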
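puppeteer-cluster's role here is to cap how many browser jobs run at once. As a rough, dependency-free illustration of that bounded-concurrency idea (this is not puppeteer-cluster's API, and `runWithConcurrencyLimit` is a hypothetical name), a fixed number of worker "lanes" can pull jobs from a shared index, so no more than `maxConcurrency` conversions are ever in flight:

```typescript
// Hypothetical helper illustrating a concurrency cap like puppeteer-cluster's maxConcurrency.
// It is NOT puppeteer-cluster's implementation, just the same idea without dependencies.
async function runWithConcurrencyLimit<T, R>(
  items: readonly T[],
  maxConcurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (maxConcurrency < 1) throw new RangeError('maxConcurrency must be >= 1');
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  // JavaScript is single-threaded, so `next++` cannot be claimed twice.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  };

  // Start at most maxConcurrency lanes; results keep the input order.
  const laneCount = Math.min(maxConcurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With a limit of 10 (the PR's default), 25 queued pages would be processed by 10 lanes. If one job rejects, `Promise.all` rejects right away while the other lanes keep going, so per-job error handling still has to live inside `worker`.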

Conversion results

In all of the runs below, memory usage showed no significant change.

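  • 10,000 pages
    • PDF size before compression: 942 MB
    • tar.gz size after compression: 676 MB
    • Export time: 1h6min
  • 10,000 pages × 10, run in parallel
    • PDF size before compression: 900 ~ 1.5MB × 10
    • tar.gz size after compression: approx. 680 ~ 780 MB × 10
    • Export time: 7h38min
      • All of the conversions finished at roughly the same time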
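A quick back-of-the-envelope reading of the figures above, using only the reported numbers. Assumptions: 7h38min is taken as 458 minutes, and since all 10 parallel exports finished at about the same time, that wall-clock time is treated as shared by all of them. The parallel run's uncompressed size is left out because its unit is unclear.

```typescript
// Derived figures from the measurements reported above; nothing here is a new measurement.
interface ExportRun {
  pages: number;
  minutes: number; // wall-clock export time
}

const pagesPerMinute = (run: ExportRun): number => run.pages / run.minutes;
const compressionRatio = (compressedMb: number, rawMb: number): number => compressedMb / rawMb;

const single: ExportRun = { pages: 10000, minutes: 66 }; // 1h6min
const parallel: ExportRun = { pages: 10000 * 10, minutes: 458 }; // 10 exports, 7h38min

console.log(compressionRatio(676, 942).toFixed(3)); // 0.718
console.log(pagesPerMinute(single).toFixed(1)); // 151.5
console.log(pagesPerMinute(parallel).toFixed(1)); // 218.3
```

Read this way, the single export compressed to roughly 72% of its raw size, and running 10 exports in parallel raised aggregate throughput by about 1.44× while each individual export took about 6.9× longer (458 vs. 66 minutes).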

Task

https://redmine.weseek.co.jp/issues/135785

Notes

Work is in progress on an implementation that keeps puppeteer-cluster running only while PDF conversion is taking place.
https://redmine.weseek.co.jp/issues/150248
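One way to keep a browser pool alive only while conversions are running is reference counting: start it when the first job arrives and close it when the last one finishes. This sketch assumes nothing about the in-progress implementation; `OnDemandResource` is a hypothetical name, and in practice `start`/`stop` might wrap puppeteer-cluster's `Cluster.launch(...)` and `cluster.close()`:

```typescript
// Hypothetical helper: lazily starts a shared resource (e.g. a browser cluster) on the first
// job and closes it when no jobs remain. Not the PR's implementation.
class OnDemandResource<R> {
  private instance: Promise<R> | null = null;
  private active = 0;
  private readonly start: () => Promise<R>;
  private readonly stop: (resource: R) => Promise<void>;

  constructor(start: () => Promise<R>, stop: (resource: R) => Promise<void>) {
    this.start = start;
    this.stop = stop;
  }

  async use<T>(job: (resource: R) => Promise<T>): Promise<T> {
    this.active++;
    // Concurrent first callers share one start-up promise, so the resource launches once.
    const current = this.instance ?? (this.instance = this.start());
    try {
      return await job(await current);
    } finally {
      this.active--;
      // The last job out closes the resource; the next job will start a fresh one.
      if (this.active === 0 && this.instance === current) {
        this.instance = null;
        await this.stop(await current);
      }
    }
  }
}
```

A job that arrives while the previous instance is still closing triggers a fresh start, so two instances can briefly overlap; a real implementation might prefer a short idle timeout over closing immediately.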

arafubeatbox changed the base branch from master to feat/page-bulk-export on March 31, 2024 15:04

reg-suit bot commented Mar 31, 2024

reg-suit detected visual differences.

Check this report, and review them.

🔴🔴🔴🔴
⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪
⚫⚫⚫⚫⚫⚫⚫⚫⚫⚫⚫⚫⚫⚫
🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵

What do the circles mean? The number of circles represents the number of changed images.
🔴 : Changed items, ⚪ : New items, ⚫ : Deleted items, and 🔵 Passed items

How can I change the check status? If reviewers approve this PR, the reg context status will be green automatically.


changeset-bot bot commented Jun 30, 2024

⚠️ No Changeset found

Latest commit: 594fe5a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

```diff
@@ -26,7 +28,8 @@ export class GcsMultipartUploader extends MultipartUploader implements IGcsMulti
   constructor(bucket: Bucket, uploadKey: string, maxPartSize: number) {
     super(uploadKey, maxPartSize);
 
-    this.file = bucket.file(this.uploadKey);
+    const namespace = configManager.getConfig('crowi', 'gcs:uploadNamespace');
+    this.file = bucket.file(urljoin(namespace || '', uploadKey));
```
arafubeatbox (Contributor Author) commented:

Fixed: the namespace for the upload destination was not being applied, so the upload path did not match the path used when retrieving the file.
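The underlying rule is that uploads and downloads must derive the object path from the namespace in exactly the same way. A minimal sketch of such a shared helper, assuming the namespace is optional and slash-separated; `toObjectPath` is hypothetical and only approximates `urljoin(namespace || '', uploadKey)`, not that library's exact behavior:

```typescript
// Hypothetical helper: build the storage object path from an optional namespace and a key.
// Using the same function for upload and download keeps the two paths consistent.
function toObjectPath(namespace: string | undefined, key: string): string {
  return [namespace ?? '', key]
    .map((part) => part.replace(/^\/+|\/+$/g, '')) // trim leading/trailing slashes
    .filter((part) => part.length > 0) // drop an empty or missing namespace
    .join('/');
}

console.log(toObjectPath('growi', 'attachment/report.pdf')); // growi/attachment/report.pdf
console.log(toObjectPath(undefined, 'attachment/report.pdf')); // attachment/report.pdf
```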


```ts
/**
 * Convert markdown string to html, then to PDF
 * When PDF conversion is unstable and an error occurs, it will retry up to the specified limit
```
arafubeatbox (Contributor Author) commented:

When there are many conversion requests, puppeteer errors can occur, so conversion is retried up to the specified number of times.
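Per the puppeteer-cluster README, tasks run via the awaited `cluster.execute()` reject directly and ignore the cluster's `retryLimit`, which matches the need for an in-house retry. A minimal sketch of such a wrapper; `withRetry` is a hypothetical name, and in the PR's setting `task` would wrap the awaited conversion call:

```typescript
// Hypothetical helper: run an async task, retrying up to `retryLimit` additional times.
// The last error is rethrown once all attempts fail.
async function withRetry<T>(task: () => Promise<T>, retryLimit: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retryLimit; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // e.g. a transient puppeteer failure under heavy load
    }
  }
  throw lastError;
}
```

With `retryLimit = 3`, a task is attempted at most four times. Adding a short delay between attempts is a common refinement when failures come from load.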

```diff
@@ -119,7 +119,7 @@ ENV appDir ${optDir}/growi
 # see: https://github.com/tianon/gosu/blob/1.13/INSTALL.md
 RUN set -eux; \
   apt-get update; \
-  apt-get install -y gosu; \
+  apt-get install -y gosu chromium; \
```
arafubeatbox (Contributor Author) commented Jul 11, 2024:

Confirmed that puppeteer works by running the image built with docker build.
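The PR does not show how puppeteer locates the Chromium installed by `apt-get`. Assuming a Debian-based image, where the `chromium` package provides `/usr/bin/chromium`, one option is puppeteer's `executablePath` launch option, which puppeteer-cluster accepts through its `puppeteerOptions` setting. `buildLaunchOptions` is a hypothetical helper, and reading `PUPPETEER_EXECUTABLE_PATH` here is an illustrative choice (puppeteer itself also honors that variable):

```typescript
// Hypothetical helper: launch options pointing puppeteer at the system Chromium.
// Assumes a Debian-based image where `apt-get install chromium` provides /usr/bin/chromium.
interface BrowserLaunchOptions {
  executablePath: string;
  headless: boolean;
}

function buildLaunchOptions(env: Record<string, string | undefined>): BrowserLaunchOptions {
  return {
    // An explicit override wins; otherwise fall back to the Debian package path.
    executablePath: env.PUPPETEER_EXECUTABLE_PATH ?? '/usr/bin/chromium',
    headless: true,
  };
}

// With puppeteer-cluster these would go into Cluster.launch({ puppeteerOptions: ... }).
console.log(buildLaunchOptions({}).executablePath); // /usr/bin/chromium
```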

arafubeatbox marked this pull request as ready for review July 11, 2024 16:33