Plaintext draft parsing fails to extract document date and title with long author lists #5731

jennifer-richards opened this issue on May 31, 2023.

Describe the issue

On a draft where the document creation date appears after line 15, as happens with long author lists, the `_stripheaders()` helper method ends the first page just before the date. `PlaintextDraft` then uses this paginated output to extract (among other things) the title and creation date, assuming that both fields appear on the first page. Because the title follows the date in the header, both land on the second page, so neither can be extracted when the author list is long.

One fix is to modify `PlaintextDraft` to search the first two pages, rather than only the first, when extracting these fields.
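A minimal sketch of the two-page approach, for illustration only. It does not reproduce `_stripheaders()` or `PlaintextDraft`; the page contents, the date regex, and the title heuristic (first non-blank line after the first blank line) are hypothetical stand-ins. They show why a first-page-only search misses both fields and how widening the window to two pages recovers them.

```python
import re

# Hypothetical date pattern: "May 31, 2023" or "May 2023". PlaintextDraft's
# real patterns may differ.
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+(?:\d{1,2},\s+)?\d{4}\b"
)


def first_pages_text(pages, npages):
    """Join the first `npages` pages so fields pushed past a page break are seen."""
    return "\n".join(pages[:npages])


def extract_date(pages, npages=2):
    """Return the first date-like string in the first `npages` pages, or None."""
    match = DATE_RE.search(first_pages_text(pages, npages))
    return match.group(0) if match else None


def extract_title(pages, npages=2):
    """Hypothetical heuristic: the title is the first non-blank line after the
    header block, which ends at the first blank line."""
    lines = first_pages_text(pages, npages).splitlines()
    blanks = [i for i, line in enumerate(lines) if not line.strip()]
    if not blanks:
        return None  # the header block never ends within these pages
    for line in lines[blanks[0] + 1:]:
        if line.strip():
            return line.strip()
    return None


# Hypothetical paginated output for a draft with a long author list: 16 header
# lines fill page 1 and the page break falls just before the date.
header = ["Network Working Group", "Internet-Draft"]
header += [f"{'':50}Author {n}" for n in range(1, 15)]
pages = [
    "\n".join(header),
    "\n".join([
        f"{'':50}May 31, 2023",
        "",
        f"{'':20}An Example Draft With Many Authors",
        f"{'':20}draft-example-many-authors-00",
    ]),
]

print(extract_date(pages, npages=1))   # None (first page only)
print(extract_title(pages, npages=1))  # None
print(extract_date(pages))             # May 31, 2023
print(extract_title(pages))            # An Example Draft With Many Authors
```

Because `re.search` scans the joined text from the start, a date on page 1 still wins when one is present, so short-header drafts behave as before. One trade-off to watch: if a header has no date at all, a date-like string elsewhere on page 2 could now be picked up.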

Alternatively, `_stripheaders()` could be changed, but its current pagination is quite intentional, so I'm worried that changing it might have other consequences.
