enable pdf bulk export #9283

arafubeatbox · 2024-10-20T09:26:41Z

Specification

https://dev.growi.org/66ee8495830566b31e02c953#growi

Task

https://redmine.weseek.co.jp/issues/153348

Notes

The base branch is feat/135772-pdf-page-bulk-export, not feat/page-bulk-export.

changeset-bot · 2024-10-20T09:26:46Z

⚠️ No Changeset found

Latest commit: c8d4802

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

yuki-takei · 2024-11-21T15:35:27Z

apps/app/src/features/page-bulk-export/server/service/page-bulk-export/index.ts

+      .toString();
+
+    return htmlString;
+  }


Wouldn't it be better to keep and reuse the instance obtained from remark().use(html)?
I'm not sure how much of a difference it would make, though.

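Done.
Making it an instance variable shared by all jobs seemed likely to cause bugs when jobs are processed in parallel, so instead the instance is created per job before the stream starts and passed in as an argument.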

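As for the argument type: the return value of remark().use(html) is typed by unified as

unified.Processor<remark.RemarkOptions>

However, the unified version that remark depends on seems to differ from the one specified in GROWI's package.json, so annotating the argument with that type causes an error.
For that reason, the argument is left untyped.
(Aligning the versions would require the ESM-only remark packages, which in turn causes errors.)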
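A minimal sketch of the per-job approach described above, with an array standing in for the export stream. `createMarkdownProcessor`, `convertPageToHtml`, and `exportJobAsHtml` are hypothetical names, and the stub processor only imitates the `processSync(...).toString()` call shape rather than rendering Markdown. Deriving the parameter type with `ReturnType<typeof createMarkdownProcessor>` is one possible way to type the argument without importing `unified.Processor` directly; it is a suggestion, not what this PR does.

```typescript
// Hypothetical stand-in for `remark().use(html)`: it only wraps the text in
// <p> tags and is NOT a Markdown renderer. It mimics the
// `processSync(markdown).toString()` call shape.
function createMarkdownProcessor() {
  return {
    processSync(markdown: string) {
      const html = `<p>${markdown}</p>`;
      return { toString: (): string => html };
    },
  };
}

// Type derived from the factory itself, so no `unified` type has to be
// imported (and no second copy of unified is referenced).
type MarkdownProcessor = ReturnType<typeof createMarkdownProcessor>;

interface Page {
  path: string;
  body: string;
}

// Converts one page using the processor handed in by the caller.
function convertPageToHtml(page: Page, processor: MarkdownProcessor): string {
  return processor.processSync(page.body).toString();
}

// One processor per export job: created once before the job's pages are
// processed, reused for every page of that job, and never shared with
// other jobs that may run in parallel.
function exportJobAsHtml(pages: Page[]): string[] {
  const processor = createMarkdownProcessor();
  return pages.map((page) => convertPageToHtml(page, processor));
}
```

Because the type follows whatever the factory returns, swapping the stub for the real `remark().use(html)` call would keep the annotation in sync with the processor actually in use.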

I think aligning the versions should be handled in a separate task.

On the server side, ESM-only packages can be imported via dynamic import.

Search the source for import { dynamicImport } from '@cspell/dynamic-import';
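For background, a generic illustration of why such a helper is needed (this is not the implementation of `@cspell/dynamic-import`, which the codebase already uses): when server code is compiled by TypeScript with `module: commonjs`, `await import('x')` is rewritten into a `require()` call, which typically fails for ESM-only packages such as `remark`. Building the `import()` call at runtime keeps the compiler from rewriting it. `importEsm` is a hypothetical name, and bare-specifier resolution from such a helper may not be relative to the calling file, which is one reason to prefer the existing `dynamicImport` helper.

```typescript
// Hypothetical helper, NOT @cspell/dynamic-import. The `import()` lives in
// a string, so TypeScript cannot rewrite it into `require()` when emitting
// CommonJS; the dynamic import is evaluated by Node at runtime instead.
const importEsm = new Function('specifier', 'return import(specifier)') as <T = unknown>(
  specifier: string,
) => Promise<T>;

// Illustrative usage inside an async function on the server:
//   const { remark } = await importEsm<typeof import('remark')>('remark');
```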

https://redmine.weseek.co.jp/issues/159172
I created a task for this.

yuki-takei · 2024-11-22T12:10:24Z

apps/app/src/features/page-bulk-export/server/service/page-bulk-export/index.ts

+
+    return new Promise<void>((resolve, reject) => {
+      // Request sync job API until the pdf export is done. If pdf export status is updated in growi, send the status to pdf-converter.
+      const interval = setInterval(async() => {


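Running this setInterval once per job doesn't seem like a good idea.
Shouldn't we use node-cron as a system-wide scheduler instead of setInterval?

Also, when the server restarts, I'd like it to look up the jobs and resume them.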

References:

thread-deletion-cron.ts

vector-store-file-deletion-cron.ts
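A minimal sketch of the system-wide approach suggested above, assuming a single scheduled task (for example one registered with node-cron) that calls `runExportJobTick` on every run. The status names, `runExportJobTick`, `loadJobs`, and `processJob` are hypothetical placeholders; in GROWI the jobs would come from the database.

```typescript
// Illustrative status names; the real PageBulkExportJobStatus values may differ.
type JobStatus = 'exporting' | 'uploading' | 'completed' | 'failed';

interface ExportJob {
  id: string;
  status: JobStatus;
}

const IN_PROGRESS_STATUSES: ReadonlySet<JobStatus> = new Set<JobStatus>(['exporting', 'uploading']);

// One run of a single, system-wide scheduled task. Jobs are re-read from
// persistent storage on every run instead of being tracked by one in-memory
// setInterval per job, so jobs still in progress when the server restarted
// are picked up by the first run after startup.
function runExportJobTick(
  loadJobs: () => ExportJob[], // hypothetical: e.g. a database query
  processJob: (job: ExportJob) => void, // hypothetical per-job step
): string[] {
  const handledIds: string[] = [];
  for (const job of loadJobs()) {
    if (IN_PROGRESS_STATUSES.has(job.status)) {
      processJob(job);
      handledIds.push(job.id);
    }
  }
  return handledIds;
}
```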

Implementation idea notes

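There could be two kinds of cronJob.

One is a completion-waiting job that runs at a relatively short interval (e.g. 10 seconds) and transitions the status to fail / done.

The other is responsible for starting/stopping the completion-waiting job above (a longer interval of 2-3 minutes is fine).

If there is at least one in-progress job, start the completion-waiting job.

If there are no in-progress jobs, stop the completion-waiting job.

By starting this one on server restart, the completion-waiting job can be expected to start within a few minutes if needed.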
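The two-cron idea above can be sketched as a small controller. `StartStoppable` stands in for a scheduled task with `start()`/`stop()` methods; the sketch uses a plain interface so it runs on its own. `CompletionWatcherController` and `reconcile` are hypothetical names. The long-interval cron would call `reconcile` with the current number of in-progress jobs.

```typescript
// Anything that can be started and stopped, such as a scheduled cron task.
interface StartStoppable {
  start(): void;
  stop(): void;
}

// Called by the long-interval cron (every 2-3 minutes). It starts the
// short-interval "wait for completion" cron (~10 s) only while at least one
// job is in progress, and stops it once none remain.
class CompletionWatcherController {
  private readonly watcher: StartStoppable;
  private running = false;

  constructor(watcher: StartStoppable) {
    this.watcher = watcher;
  }

  // Returns whether the watcher is running after this call.
  reconcile(inProgressJobCount: number): boolean {
    if (inProgressJobCount > 0 && !this.running) {
      this.watcher.start();
      this.running = true;
    } else if (inProgressJobCount === 0 && this.running) {
      this.watcher.stop();
      this.running = false;
    }
    return this.running;
  }
}
```

On server startup only the controller needs to be started; its first run decides whether the watcher is needed, which is what lets jobs resume after a restart. Tracking `running` locally avoids calling `start()` on a task that is already running.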

In https://redmine.weseek.co.jp/issues/158220 the service was converted to a cron; the PDF conversion logic is implemented in request-pdf-converter and called from within the cron.

arafubeatbox · 2024-12-14T07:24:35Z

apps/pdf-converter/package.json

@@ -11,7 +11,7 @@
    "start:prod:ci": "pnpm start:prod --ci",
    "start:prod": "node dist/index.js",
    "lint": "pnpm eslint **/*.{js,ts}",
-    "gen:client-code": "tsed run generate-swagger --output ./specs && orval",
+    "gen:client-code": "SWAGGER_GENERATION=true tsed run generate-swagger --output ./specs && orval",


After changing apps/pdf-converter/src/service/pdf-convert.ts to initialize puppeteer in $onInit, tsed run generate-swagger started failing with a puppeteer error (Error: Unable to launch browser, error message: Could not find Chrome (ver. 130.0.6723.69)). Since launching puppeteer is not needed to generate the Swagger schema, I made it skip the launch when SWAGGER_GENERATION=true.
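A sketch of the guard described above, with the decision pulled into a pure function so it can be checked without launching a browser. `shouldLaunchBrowser`, `PdfConvertServiceSketch`, and the injected `launchBrowser` callback are hypothetical; in the PR the check sits in the Ts.ED service's `$onInit` in apps/pdf-converter/src/service/pdf-convert.ts.

```typescript
type Env = Record<string, string | undefined>;

// Swagger schema generation does not need puppeteer, so skip the launch
// when the process was started with SWAGGER_GENERATION=true.
function shouldLaunchBrowser(env: Env): boolean {
  return env.SWAGGER_GENERATION !== 'true';
}

// Hypothetical service shape; the real one is a Ts.ED service whose
// $onInit hook is called by the framework at startup.
class PdfConvertServiceSketch {
  browserStarted = false;
  private readonly env: Env;
  private readonly launchBrowser: () => Promise<void>;

  constructor(env: Env, launchBrowser: () => Promise<void>) {
    this.env = env;
    this.launchBrowser = launchBrowser;
  }

  async $onInit(): Promise<void> {
    if (!shouldLaunchBrowser(this.env)) {
      return;
    }
    await this.launchBrowser(); // e.g. start the puppeteer cluster
    this.browserStarted = true;
  }
}
```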

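Future developers who don't know about this behavior could put anything into any service's onInit, so it would probably be better to create an entry point used only for generate-swagger (this can be done in a follow-up story).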

arafubeatbox · 2024-12-14T08:01:57Z

apps/app/src/features/page-bulk-export/server/service/page-bulk-export-job-cron/index.ts

+
+      if (pageBulkExportJob.status === PageBulkExportJobStatus.exporting && pageBulkExportJob.format === PageBulkExportFormat.pdf) {
+        await requestPdfConverter(pageBulkExportJob);
+      }


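The expected flow is as follows:

The asynchronously executed createPageSnapshotsAsync at L166 completes, and the status becomes exporting.

From then on, requestPdfConverter runs on every cron execution.

Because statusOnPreviousCronExec stays the same until the PDF conversion completes, this check must run before L154.

In one of the cron executions, exportPagesToFsAsync at L169 is started asynchronously.

The asynchronously running exportPagesToFsAsync completes, finishing the HTML export.

The PDF conversion in pdf-converter completes.

requestPdfConverter detects the completed PDF conversion, and the status becomes uploading.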
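The ordering constraint in this flow can be illustrated with a simplified per-job cron handler. `requestPdfConverter` and `statusOnPreviousCronExec` come from the PR, but `handleCronTick`, the `pollConverter` and `startStep` callbacks, and all status values are hypothetical; the real service also handles other formats, errors, and cleanup.

```typescript
// Illustrative values only.
type ExportStatus = 'initializing' | 'exporting' | 'uploading' | 'completed' | 'failed';
type ConverterStatus = 'in_progress' | 'done' | 'failed';

interface ExportJobState {
  format: 'md' | 'pdf';
  status: ExportStatus;
  statusOnPreviousCronExec?: ExportStatus;
}

// Simplified handling of one job on one cron run.
function handleCronTick(
  job: ExportJobState,
  pollConverter: () => ConverterStatus, // where requestPdfConverter would be called
  startStep: (status: ExportStatus) => void, // e.g. kick off exportPagesToFsAsync
): void {
  // The PDF check must come first: while conversion is running the status
  // stays 'exporting', so the early return below would otherwise skip it.
  if (job.status === 'exporting' && job.format === 'pdf') {
    const converterStatus = pollConverter();
    if (converterStatus === 'done') {
      job.status = 'uploading';
    } else if (converterStatus === 'failed') {
      job.status = 'failed';
    }
  }

  // Nothing new to start if the status did not change since the last run.
  if (job.status === job.statusOnPreviousCronExec) {
    return;
  }
  job.statusOnPreviousCronExec = job.status;
  startStep(job.status);
}
```

In this sketch the first run with status 'exporting' starts the HTML export, later runs only poll the converter, and the change to 'uploading' is what lets the upload step start.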

mergify · 2024-12-16T12:15:01Z

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #9283 has been dequeued. The pull request has been merged manually at 5fd17c8.

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

enable pdf bulk export

e852b29

arafubeatbox mentioned this pull request Oct 20, 2024

add pdf-converter app #9279

Merged

arafubeatbox added 9 commits October 27, 2024 23:39

Merge branch 'feat/135772-155547-pdf-converter-tsed-devcontainer' into feat/135772-153348-enable-pdf-bulk-export-tsed

9c5a514

Merge branch 'feat/135772-155547-pdf-converter-tsed-devcontainer' into feat/135772-153348-enable-pdf-bulk-export-tsed

aa6154f

remove file added in wrong merge

7095c0f

run pnpm install

e00dfc4

Merge branch 'feat/135772-155547-pdf-converter-tsed-devcontainer' into feat/135772-153348-enable-pdf-bulk-export-tsed

ee444b3

fix pdf-converter port number

4209913

add noEmit false to pdf-converter tsconfig

52fa250

get pdf convert job status before cleanup

8b7cec1

Merge branch 'feat/135772-155547-pdf-converter-tsed-devcontainer' into feat/135772-153348-enable-pdf-bulk-export-tsed

b0636d3

Base automatically changed from feat/135772-155547-pdf-converter-tsed-devcontainer to feat/135772-pdf-page-bulk-export November 11, 2024 13:37

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

5fe093a

arafubeatbox marked this pull request as ready for review November 12, 2024 10:39

arafubeatbox added 3 commits November 12, 2024 20:01

add comments

45d88ed

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

48f12c9

change pdf-converter dev command to dev:pdf-converter

426cf56

arafubeatbox commented Nov 13, 2024

apps/pdf-converter/package.json

arafubeatbox added 2 commits November 19, 2024 20:51

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

d0ae507

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

7ee3eaf

yuki-takei requested changes Nov 21, 2024

yuki-takei requested changes Nov 22, 2024

arafubeatbox added 3 commits November 22, 2024 21:31

create remark instance before pdf conversion stream starts

4bae443

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

c07165d

add PageBulkExportPdfConvertCronService

b98f78e

yuki-takei requested changes Nov 25, 2024

arafubeatbox added 3 commits December 8, 2024 14:38

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

9bc3d72

add pdf convert process to PageBulkExportJobCronService

b2b30bc

convert md to html on pdf export

394cad5

arafubeatbox added 6 commits December 8, 2024 18:08

fix pageBulkExportJobCronSchedule

9e0a6ab

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

c06158c

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

f838283

Merge branch 'feat/135772-pdf-page-bulk-export' into feat/135772-153348-enable-pdf-bulk-export-tsed

4ada71c

fix pdf bulk export extension

1a78cf9

init puppeteerCluster on app launch and cleanup html dir

da13dea

arafubeatbox commented Dec 14, 2024

apps/pdf-converter/src/service/pdf-convert.ts

arafubeatbox commented Dec 14, 2024

add comments

726a16c

arafubeatbox mentioned this pull request Dec 14, 2024

use newest unified packages specified in package.json #9488

Merged

shorten timespan for html file check in pdf converter

c8d4802

yuki-takei approved these changes Dec 16, 2024

yuki-takei merged commit 5fd17c8 into feat/135772-pdf-page-bulk-export Dec 16, 2024
10 checks passed

yuki-takei deleted the feat/135772-153348-enable-pdf-bulk-export-tsed branch December 16, 2024 12:14

arafubeatbox mentioned this pull request Dec 16, 2024

Imprv/159173 159174 independent pdf converter client package #9489

Merged

arafubeatbox mentioned this pull request Dec 26, 2024

only skip puppeteer init on swagger generation #9528

Merged
