

RamanaMenda commented Dec 11, 2025

Removes the extra space before and after group items, resolving the issue raised in #2745.

Resolves #371
Resolves docling-project/docling#2745
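To illustrate the kind of change this PR targets, here is a minimal sketch that assumes a hypothetical serializer which joins the text runs of an inline group item with a space, even when the runs already carry their own blanks. The function names are illustrative only and are not part of the docling-core API.

```python
def serialize_inline_naive(runs: list[str]) -> str:
    # Hypothetical "before" behavior: inserts a space between every pair of
    # runs, even when a run already ends or begins with whitespace, so the
    # spaces around the group item double up.
    return " ".join(runs)


def serialize_inline_fixed(runs: list[str]) -> str:
    # Hypothetical "after" behavior: concatenates runs as-is and relies on the
    # extracted text to carry its own blanks, so no extra space is added.
    return "".join(runs)


runs = ["This is ", "bold", " text"]  # e.g. plain text, a bold group item, plain text
print(repr(serialize_inline_naive(runs)))  # 'This is  bold  text'
print(repr(serialize_inline_fixed(runs)))  # 'This is bold text'
```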


github-actions bot commented Dec 11, 2025

DCO Check Passed

Thanks @RamanaMenda, all your commits are properly signed off. 🎉


dosubot bot commented Dec 11, 2025

Related Documentation

Checked 7 published document(s) in 1 knowledge base(s). No updates required.



mergify bot commented Dec 11, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
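For reference, the title pattern from this rule can be checked locally with Python's `re` module. This sketch treats `~=` as a regex search (the leading `^` anchors it to the start of the title); `is_conventional` and the `fix(serializer)!` example title are hypothetical.

```python
import re

# Pattern copied from the "Enforce conventional commit" rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    # True when the title starts with an allowed type, an optional (scope),
    # an optional "!" breaking-change marker, and a colon.
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional("Remove Extra Space Before and After Group Items"))       # False
print(is_conventional("fix: Remove Extra Space Before and After Group Items"))  # True
print(is_conventional("fix(serializer)!: drop padding"))                       # True
```

The pattern is case-sensitive, so a title starting with `Fix:` would not satisfy the rule.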

…il.com>

I, Venkata Ramana Menda <[email protected]>, hereby add my Signed-off-by to this commit: 00cda36

Signed-off-by: Venkata Ramana Menda <[email protected]>

RamanaMenda (Author)

#371

RamanaMenda changed the title from "Remove Extra Space Before and After Group Items" to "fix: Remove Extra Space Before and After Group Items" on Dec 11, 2025
dolfim-ibm requested a review from ceberam on December 11, 2025 12:19

ceberam commented Dec 11, 2025

@RamanaMenda, thanks for suggesting a fix for this issue.
Please make sure all code checks pass before pushing new commits. I strongly recommend installing pre-commit in your local repository: it runs the code checks every time you execute `git commit`. Simply run `uv run pre-commit install` in your local repository.
Keep in mind that fixing the serialization in docling-core will affect the docling library. Some backend parsers may trim trailing blanks in formatted text, so after this PR the serialization may drop the blank space between words. In addition, some ground-truth files in the tests folder contain the (wrong) extra spaces and will need to be updated so that the regression tests pass.
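The risk described above can be sketched as follows, assuming (hypothetically) that the fixed serializer concatenates formatted runs without any separator. If a backend parser has already trimmed trailing blanks, adjacent words merge. `join_at_boundaries` is one possible mitigation, shown only for illustration; none of these functions are docling code.

```python
def concat_runs(runs: list[str]) -> str:
    # Concatenates runs as-is, relying on each run to carry its own blanks.
    return "".join(runs)


def join_at_boundaries(runs: list[str]) -> str:
    # Hypothetical mitigation: insert a single space between two runs only
    # when neither side of the boundary already has whitespace.
    out = ""
    for run in runs:
        if out and run and not out[-1].isspace() and not run[0].isspace():
            out += " "
        out += run
    return out


source_runs = ["This is ", "bold", " text"]
trimmed_runs = [r.rstrip() for r in source_runs]  # parser trims trailing blanks
print(repr(concat_runs(source_runs)))           # 'This is bold text'
print(repr(concat_runs(trimmed_runs)))          # 'This isbold text'
print(repr(join_at_boundaries(trimmed_runs)))   # 'This is bold text'
```

The boundary-aware join also inserts a space before a punctuation run (for example `["bold", ","]` becomes `"bold ,"`), so it trades one spacing error for another unless punctuation is special-cased.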
Therefore, I see the following pipeline to fix this issue:

  1. Fix the extra-space issue in docling-core and submit a PR.
  2. Get the PR approved, merged, and published in a new release of docling-core.
  3. Fix the backend parsers and ground-truth files in docling so that necessary blank spaces are not removed.
  4. Pin the new docling-core release in docling's `pyproject.toml` file.
  5. Update the `uv.lock` file (e.g., with `uv lock --upgrade-package docling-core`).
  6. Ensure all docling tests pass with the new release of docling-core.
  7. Submit a PR to the docling project with the changes above.
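Step 3 involves finding ground-truth files that still contain the extra spaces. Below is a minimal sketch of a hypothetical helper that scans Markdown files for runs of two or more spaces between words. The `*.md` glob and the rule of skipping table lines are assumptions about the ground-truth format, not a description of docling's actual test layout.

```python
import re
from pathlib import Path

# Two or more spaces between non-space characters. Leading indentation and
# trailing spaces (Markdown hard line breaks) are deliberately not matched.
DOUBLE_SPACE = re.compile(r"\S {2,}\S")


def find_double_spaces(root: Path, pattern: str = "*.md") -> list[tuple[Path, int]]:
    """Return (file, 1-based line number) for every suspicious line under root."""
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Skip Markdown table rows, whose cells are often padded with spaces.
            if line.lstrip().startswith("|"):
                continue
            if DOUBLE_SPACE.search(line):
                hits.append((path, lineno))
    return hits
```

For example, `find_double_spaces(Path("tests"))` would list candidate lines to review. Code blocks or intentionally aligned text would also be flagged, so the output is a starting point for manual review rather than a list of definite errors.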

