Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement highlights properly and efficently #30

Open
karlicoss opened this issue Dec 27, 2019 · 10 comments
Open

implement highlights properly and efficently #30

karlicoss opened this issue Dec 27, 2019 · 10 comments
Labels
can-we-share? Can we reuse as much code with other projects as possible? performance

Comments

@karlicoss
Copy link
Owner

At the moment the algorithm is very basic and not super reliable.

Tried using diff-match-patch, but it was quite awkward. Doesn't seem to support patterns longer than 32 bits?
I think Hypothes.is is using it for highlights, but they have modified it?

@clintgibler
Copy link
Contributor

You probably already know this, but wanted to leave a note on some behavior I've observed regarding highlighting.

It appears that the granularity of highlighting is based on a block of text. That is, if you have one sentence out of a paragraph included in your org mode file, Promnesia will highlight the entire paragraph when the web page is visited.

For example:

#+TITLE: Promnesia
* My Notes
Here are some manual notes I've typed about this great project.

** Highlights
https://github.com/karlicoss/promnesia

This is unlike most modern browsers, where you can only see when you visited the
link.

Will cause the full paragraph to be highlighted:

TLDR: it lets you explore your browsing history in context: where you encountered it, in chat, on Twitter, on Reddit, or just in one of the text files on your computer. This is unlike most modern browsers, where you can only see when you visited the link.

@karlicoss
Copy link
Owner Author

karlicoss commented Jul 11, 2020

Yes! Thanks for a specific example though.
The 'algorithm' is pretty much the dumbest thing I've come up with, so there are artifacts at times:

// TODO not very effecient; replace with something existing (Hypothesis??)
function _highlight(text: string, idx: number) {
for (const line of text.split('\n')) {
// TODO filter too short strings? or maybe only pick the longest one?
const found = findText(unwrap(doc.body), line);
if (found == null) {
console.debug('No match found for %s', line);
continue;
}
console.debug("highlighting %o %s", found, line);
// $FlowFixMe
const target: HTMLElement = unwrap(found.nodeType == Node.TEXT_NODE ? found.parentElement : found);
if (target.classList.contains('toastify')) {
// TODO hacky...
continue;
}
// TODO why doesn't flow warn about this??
// target.name === 'body'
if (target === doc.body) {
// meh, but otherwise too spammy
console.warn('body matched for highlight; skipping it');
continue;
}
target.classList.add('promnesia-highlight');
const ref = doc.createElement('span');
ref.classList.add('promnesia-highlight-reference');
ref.classList.add('nonselectable');
ref.appendChild(doc.createTextNode(String(idx)));
target.insertAdjacentElement('beforeend', ref);
}
}

This specific thing might not be too hard to fix (e.g. find the exact bit to be highlighted, split the element in to spans, etc), but I've tried not to invest too much time in it so far because it's very hard to get it right (and with good performance), so ultimately I hope to collaborate/borrow code from Hypothes.is or Worldbrain Memex!

@clintgibler
Copy link
Contributor

Ah nice, thanks for the link to the implementation, that's interesting. I agree, doing this "well" and with good performance seems hard. +1 re: punting and borrowing from Hypothes.is / Worldbrain Memex 👍

Another thing I noticed is that sometimes only some of the sections would be highlighted, or only the first one (e.g. of 3 sections of copied text from a page, only the first in my notes file would be highlighted).

I was able to hack around that a bit by creating a new header for each block of text I wanted highlighted, and including the link to the web page under each header before the copied text.

I haven't poked at it in too much detail, it's Good Enough for now :)

@karlicoss
Copy link
Owner Author

More work on it here 9d21d18

@OmarAshkar
Copy link

Hi @karlicoss . I have an issue here when I tried the example above in an org file. Despite the URL is showing correctly, the highlight is not showing despite the exact text!

@karlicoss
Copy link
Owner Author

Hi @OmarAshkar can you share a snippet of org-mode and the website you clipped it from? So I could try to debug

@OmarAshkar
Copy link

@karlicoss I just used this same snippet with clipped text from the home page.

* Highlights
https://github.com/karlicoss/promnesia

This is unlike most modern browsers, where you can only see when you visited the
link.

I am using doom emacs and firefox on ubuntu.

@karlicoss
Copy link
Owner Author

Ah indeed, thanks, same happens for me. I guess in principle, I should have the whole https://github.com/karlicoss/promnesia highlighted because I have this repository cloned and indexed, wheras it has a few gaps.

@OmarAshkar
Copy link

@karlicoss Oh, the whole page is to be highlighted! I thought only the clipped part will be highlighted like hypothesis.is and memex? Either way is working anyway!

@karlicoss
Copy link
Owner Author

Ah no, indeed it will ideally highlight the sentence/paragraph only.
I just meant that because I have the whole README.org file indexed by promnesia, it should match all content on the page for me (because technically it's all "clipped")

@karlicoss karlicoss added the can-we-share? Can we reuse as much code with other projects as possible? label Dec 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can-we-share? Can we reuse as much code with other projects as possible? performance
Projects
None yet
Development

No branches or pull requests

3 participants