PDF to *.docx conversion: formatting errors (font, color, line breaks) #568

Open
dashugege opened this issue Mar 12, 2024 · 0 comments
Labels
type: bug Existing feature doesn't work correctly

Comments

@dashugege

try {

        // Load the PDF document
        PDDocument pdfDocument = PDDocument.load(new File(pdfFilePath));
        PDFTextStripper pdfStripper = new PDFTextStripper();
        pdfStripper.setSortByPosition(true);
        int totalPages = pdfDocument.getNumberOfPages();

        XWPFDocument wordDocument = new XWPFDocument();
        for (int pageIndex = 0; pageIndex < totalPages; pageIndex++) {

            pdfStripper.setStartPage(pageIndex+1);
            pdfStripper.setEndPage(pageIndex+1);
            PDResources resources = pdfDocument.getPages().get(pageIndex).getResources();
            String pageText = pdfStripper.getText(pdfDocument).trim();

            if (!pageText.isEmpty()) {
                XWPFParagraph paragraph = wordDocument.createParagraph();
                XWPFRun run = paragraph.createRun();
                // Set the paragraph alignment
                paragraph.setAlignment(ParagraphAlignment.LEFT);
                // Write the page text line by line, with a break between lines.
                // XWPFRun.setText(String) appends, so calling run.setText(pageText)
                // before this loop would write the whole page text a second time.
                String[] lines = pageText.split("\\r?\\n");
                for (int i = 0; i < lines.length; i++) {
                    if (i != 0) {
                        // Insert a line break
                        run.addCarriageReturn();
                    }
                    run.setText(lines[i]);
                }
            }

            for (COSName xObjectName : resources.getXObjectNames()) {
                PDXObject xObject = resources.getXObject(xObjectName);
                if (xObject instanceof PDImageXObject) {
                    PDImageXObject image = (PDImageXObject) xObject;
                    Bitmap bitmap = image.getImage();

                    // Save the image to a temporary file. The data is JPEG-compressed,
                    // so the file gets a .jpg name to match PICTURE_TYPE_JPEG below.
                    File tempImageFile = new File(parentPath, "convert_temp_image.jpg");
                    // FileOutputStream creates or truncates the file, and
                    // try-with-resources closes it even if compress() throws.
                    try (FileOutputStream fos = new FileOutputStream(tempImageFile)) {
                        bitmap.compress(Bitmap.CompressFormat.JPEG, 75, fos);
                    }

                    // Insert the image into the Word document
                    try (InputStream is = new FileInputStream(tempImageFile)) {
                        int format = XWPFDocument.PICTURE_TYPE_JPEG;
                        String fileName = tempImageFile.getName();
                        XWPFParagraph imgParagraph = wordDocument.createParagraph();
                        XWPFRun imgRun = imgParagraph.createRun();
                        imgRun.addBreak();
                        imgRun.addPicture(is, format, fileName, Units.toEMU(image.getWidth()), Units.toEMU(image.getHeight()));
                    }
                }
            }

        }
        // Save the Word document
        try (FileOutputStream out = new FileOutputStream(wordFilePath)) {
            wordDocument.write(out);
        }

        // Close the PDF document
        pdfDocument.close();

        System.out.println("PDF content was written to the Word file successfully!");


    } catch (Exception e) {
        e.printStackTrace();
    }
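
For the line-break part of this report, one thing worth checking is how the extracted page text is split into lines. If the stripper emits "\r\n" line endings (an assumption to verify for your platform and stripper settings), splitting on "\n" alone leaves a stray "\r" at the end of each line. Java's default split also drops trailing empty strings. Below is a minimal sketch in plain Java. The class and method names (`PageTextLines`, `splitLines`) are hypothetical and not part of PDFBox or POI.

```java
import java.util.Arrays;
import java.util.List;

class PageTextLines {
    // Hypothetical helper: splits extracted page text into lines.
    // "\\r?\\n" accepts both LF and CRLF line endings; the limit of -1
    // keeps trailing empty strings, so blank lines are not lost.
    static List<String> splitLines(String pageText) {
        return Arrays.asList(pageText.split("\\r?\\n", -1));
    }

    public static void main(String[] args) {
        List<String> lines = splitLines("Title\r\nfirst line\n\nlast line");
        for (int i = 0; i < lines.size(); i++) {
            System.out.println(i + ": [" + lines.get(i) + "]");
        }
    }
}
```

Each element can then be written with run.setText(...), with run.addCarriageReturn() between elements, as in the loop in the code above. This only addresses line breaks. Font and color are not carried by the plain string that getText returns, so they would need to be read from the PDF separately.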
dashugege added the "type: bug" (Existing feature doesn't work correctly) label on Mar 12, 2024