How to highlight sentence over multiple lines? #614
Comments
Hah, that's a good one! For this to work you need to:
And this is a simplified version, since it only supports single matches! For multiple matches, you will need to make it even more complicated. Anyway, in code it would look like this:

import { useCallback, useState } from 'react';
import { Document, Page } from 'react-pdf';

// highlightPattern and samplePDF are assumed to be defined elsewhere
const stringToHighlight = 'Donec sodales placerat dui';

// You might want to merge the items a little smarter than that
function getTextItemWithNeighbors(textItems, itemIndex, span = 1) {
  return textItems
    .slice(Math.max(0, itemIndex - span), itemIndex + 1 + span)
    .filter(Boolean)
    .map(item => item.str)
    .join('');
}

// Returns [start, end) of the first occurrence of substring in string
function getIndexRange(string, substring) {
  const indexStart = string.indexOf(substring);
  const indexEnd = indexStart + substring.length;
  return [indexStart, indexEnd];
}

function Test() {
  const [textItems, setTextItems] = useState();

  const onPageLoadSuccess = useCallback(async page => {
    const textContent = await page.getTextContent();
    setTextItems(textContent.items);
  }, []);

  const customTextRenderer = useCallback(textItem => {
    if (!textItems) {
      return;
    }

    const { itemIndex } = textItem;

    // Plain substring test, so regex special characters are not interpreted
    const matchInTextItem = textItem.str.includes(stringToHighlight);

    if (matchInTextItem) {
      // Found full match within current item, no need for black magic
      return highlightPattern(textItem.str, stringToHighlight);
    }

    // Full match within current item not found, let's check if we can find it
    // spanned across multiple lines

    // Get text item with neighbors
    const textItemWithNeighbors = getTextItemWithNeighbors(textItems, itemIndex);

    const matchInTextItemWithNeighbors = textItemWithNeighbors.includes(stringToHighlight);

    if (!matchInTextItemWithNeighbors) {
      // No match
      return textItem.str;
    }

    // Now we need to figure out if the match we found was at least partially
    // in the line we're currently rendering
    const [matchIndexStart, matchIndexEnd] = getIndexRange(textItemWithNeighbors, stringToHighlight);
    const [textItemIndexStart, textItemIndexEnd] = getIndexRange(textItemWithNeighbors, textItem.str);

    if (
      // Match entirely in the previous line (end indices are exclusive)
      matchIndexEnd <= textItemIndexStart ||
      // Match entirely in the next line
      matchIndexStart >= textItemIndexEnd
    ) {
      return textItem.str;
    }

    // Match found was partially in the line we're currently rendering. Now
    // we need to figure out what "partially" exactly means

    // Find partial match in a line
    const indexOfCurrentTextItemInMergedLines = textItemWithNeighbors.indexOf(textItem.str);

    const matchIndexStartInTextItem = Math.max(0, matchIndexStart - indexOfCurrentTextItemInMergedLines);
    const matchIndexEndInTextItem = matchIndexEnd - indexOfCurrentTextItemInMergedLines;

    const partialStringToHighlight = textItem.str.slice(matchIndexStartInTextItem, matchIndexEndInTextItem);

    return highlightPattern(textItem.str, partialStringToHighlight);
  }, [stringToHighlight, textItems]);

  return (
    <Document file={samplePDF}>
      <Page
        customTextRenderer={customTextRenderer}
        onLoadSuccess={onPageLoadSuccess}
        pageNumber={1}
      />
    </Document>
  );
}

Yeah, I hate it too.
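For the multiple-match case mentioned above, one possible approach is to merge all text items of the page once, find every occurrence in the merged string, and map each occurrence back to per-item ranges. This is only a sketch: `findHighlightRanges` and `splitByRanges` are hypothetical helper names, not part of react-pdf or PDF.js, and it assumes items can be joined without a separator, as the snippet above does.

```javascript
// Hypothetical helper (not a react-pdf or PDF.js API).
// For every text item, returns the [start, end) ranges, relative to item.str,
// that overlap an occurrence of `needle` in the concatenated page text.
function findHighlightRanges(textItems, needle) {
  const ranges = textItems.map(() => []);
  if (!needle) return ranges;

  // Record where each item starts in the merged string.
  const offsets = [];
  let merged = '';
  for (const item of textItems) {
    offsets.push(merged.length);
    merged += item.str;
  }

  // Walk over all non-overlapping occurrences of the needle.
  let from = merged.indexOf(needle);
  while (from !== -1) {
    const to = from + needle.length;
    textItems.forEach((item, i) => {
      // Intersect the occurrence with the span covered by item i.
      const start = Math.max(from, offsets[i]);
      const end = Math.min(to, offsets[i] + item.str.length);
      if (start < end) {
        ranges[i].push([start - offsets[i], end - offsets[i]]);
      }
    });
    from = merged.indexOf(needle, to);
  }
  return ranges;
}

// Hypothetical helper: splits a string into highlighted and plain segments.
// Expects sorted, non-overlapping ranges, which findHighlightRanges produces.
function splitByRanges(str, ranges) {
  const segments = [];
  let cursor = 0;
  for (const [start, end] of ranges) {
    if (start > cursor) {
      segments.push({ text: str.slice(cursor, start), highlighted: false });
    }
    segments.push({ text: str.slice(start, end), highlighted: true });
    cursor = end;
  }
  if (cursor < str.length) {
    segments.push({ text: str.slice(cursor), highlighted: false });
  }
  return segments;
}
```

Inside a custom text renderer, the ranges for the current item would be looked up by `itemIndex` and the segments turned into whatever output the renderer is expected to return. Computing the ranges once per page, rather than per item, also avoids re-merging neighbors on every call.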
Thank you for the algorithm and the code. However, I noticed that depending on how the PDF is rendered, it may not work. Do you know why in some PDFs, each line will be wrapped in a
Absolutely! Things to consider:
I tried to implement this, and no matter what I do, it leads to an infinite re-render loop. Please consider having a look.
@pedro-surf You have a working example in my comment above, so you need to either share the full code with us or find the differences yourself. Perhaps you're creating your custom text renderer on every render because you forgot to use useCallback? Just a blind guess though.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 14 days.
This issue was closed because it has been stalled for 14 days with no activity.
The CodeSandbox example link is not opening.
Hello, thank you for providing the code for highlighting text spread over multiple lines! It works great. I have another question: when working with PDFs with multiple pages, what would that look like? Just implementing the code above doesn't seem to work. Thanks
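One way the multi-page case might be approached is to keep the text items of each page separately and bind one renderer per page, so that neighbors are only looked up within the same page. The sketch below is framework-free; `withPageItems`, `makeRendererForPage`, and `renderWithItems` are hypothetical names, and the page number is assumed to come from the `pageNumber` of the page object passed to `onLoadSuccess`.

```javascript
// Hypothetical helper: pure state update that stores the text items of one page.
// Intended for use as setItemsByPage(prev => withPageItems(prev, n, items)).
function withPageItems(itemsByPage, pageNumber, items) {
  return { ...itemsByPage, [pageNumber]: items };
}

// Hypothetical helper: builds a text renderer bound to a single page.
// renderWithItems(textItem, items) stands for the highlighting logic,
// for example the body of the customTextRenderer shown earlier in the thread.
function makeRendererForPage(itemsByPage, pageNumber, renderWithItems) {
  return textItem => {
    const items = itemsByPage[pageNumber];
    if (!items) {
      // Text content of this page has not been loaded yet
      return textItem.str;
    }
    return renderWithItems(textItem, items);
  };
}
```

In a component, the renderers would likely need to be memoized per page, otherwise a new function is created on every render, which is the situation the useCallback remark above warns about.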
This would probably be more efficient if instead of trying to use nearest-neighbor for every
I see there is a PR for this, but how can I use that PR?
I am just running into this. On the items there is a transform matrix. It looks like it might be possible to get the bounding box of each item: mozilla/pdf.js#5643 (comment)
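As a rough sketch of the bounding-box idea: a PDF.js text item carries a six-element `transform` array whose last two entries are the translation, plus `width` and `height`. Assuming an unrotated, unskewed item, and assuming `width` and `height` are in the same units as the translation, a box could be derived as below. `getItemBoundingBox` and `onSameLine` are hypothetical names, and the tolerance value is arbitrary.

```javascript
// Hypothetical helper. Assumes transform = [a, b, c, d, e, f] with b = c = 0
// (no rotation or skew), so (e, f) is the origin of the text on its baseline.
function getItemBoundingBox(item) {
  const x = item.transform[4];
  const y = item.transform[5];
  return {
    left: x,
    bottom: y,
    right: x + item.width,
    top: y + item.height,
  };
}

// Hypothetical helper: treats two items as part of the same line when their
// baselines are within `tolerance` units of each other.
function onSameLine(itemA, itemB, tolerance = 1) {
  return Math.abs(itemA.transform[5] - itemB.transform[5]) <= tolerance;
}
```

Note that the bottom edge here is the baseline, so descenders fall outside the box; rotated text would need the full matrix applied.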
If you just want to highlight text, https://markjs.io/ worked out of the box for me
What are you trying to achieve? Please describe.
I would like to highlight patterns which are spread over multiple lines.
If I try to highlight a sentence which is broken by a line break, nothing will be highlighted since each line belongs to its own tag.
Describe solutions you've tried
I thought of looking for the rest of the sentence in the following span in the DOM, but this solution seems to be really laborious.