implement highlights properly and efficiently #30

karlicoss · 2019-12-27T23:01:11Z

At the moment the algorithm is very basic and not super reliable.

I tried using diff-match-patch, but it was quite awkward. It doesn't seem to support patterns longer than 32 characters?
I think Hypothes.is uses it for highlights, but perhaps they have modified it?
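For context, here is a minimal sketch of exact shift-and (Bitap) search, the bit-parallel technique behind diff-match-patch's `match_bitap`. The JavaScript version of diff-match-patch caps pattern length at `Match_MaxBits` (32 by default). This sketch does exact matching only, with no error tolerance, and `bitapSearch` is a hypothetical helper. It does show where a 32-character cap comes from: the search state is a bitmask with one bit per pattern character, and JavaScript bitwise operators work on 32-bit integers.

```javascript
// Exact shift-and (Bitap) search.
// Bit i of `state` is set when pattern[0..i] matches the text ending at position j.
function bitapSearch(text, pattern) {
  const m = pattern.length;
  if (m === 0) return 0;
  if (m > 32) {
    // JS bitwise operators act on 32-bit integers, so a single mask
    // cannot track more than 32 pattern characters.
    throw new Error(`pattern too long: ${m} > 32`);
  }
  // masks[c] has bit i set iff pattern[i] === c
  const masks = new Map();
  for (let i = 0; i < m; i++) {
    const c = pattern[i];
    masks.set(c, (masks.get(c) || 0) | (1 << i));
  }
  const accept = 1 << (m - 1); // bit for "the whole pattern matched"
  let state = 0;
  for (let j = 0; j < text.length; j++) {
    // Extend every partial match by one char, and start a new one at j.
    state = ((state << 1) | 1) & (masks.get(text[j]) || 0);
    if (state & accept) return j - m + 1; // start index of the match
  }
  return -1;
}
```

For example, `bitapSearch('hello world', 'world')` returns 6, while any pattern longer than 32 characters throws. Fuzzy Bitap keeps one such mask per allowed error, so it hits the same per-mask limit.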

clintgibler · 2020-07-11T20:19:33Z

You probably already know this, but wanted to leave a note on some behavior I've observed regarding highlighting.

It appears that the granularity of highlighting is based on a block of text. That is, if you have one sentence out of a paragraph included in your org mode file, Promnesia will highlight the entire paragraph when the web page is visited.

For example:

    #+TITLE: Promnesia
    * My Notes
    Here are some manual notes I've typed about this great project.

    ** Highlights
    https://github.com/karlicoss/promnesia

    This is unlike most modern browsers, where you can only see when you visited the
    link.

This causes the full paragraph to be highlighted:

TLDR: it lets you explore your browsing history in context: where you encountered it, in chat, on Twitter, on Reddit, or just in one of the text files on your computer. This is unlike most modern browsers, where you can only see when you visited the link.
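The example also shows a mismatch in shape: the org snippet is hard-wrapped (`...when you visited the` / `link.`), while the page renders the sentence as a single line, and the `_highlight` function quoted in the next comment searches each line of the note separately. Here is a minimal sketch, assuming whitespace is the only difference worth ignoring. It collapses whitespace runs in both strings, searches, and maps the hit back to offsets in the original page text. `findNormalized` is a hypothetical helper, not part of promnesia.

```javascript
// Find `needle` in `haystack`, treating any run of whitespace (spaces, tabs,
// newlines) as a single space in both strings.
// Returns {start, end} offsets into the ORIGINAL haystack, or null.
function findNormalized(haystack, needle) {
  let norm = "";
  const origIndex = []; // origIndex[k] = index in haystack of norm[k]
  let prevSpace = false;
  for (let i = 0; i < haystack.length; i++) {
    const isSpace = /\s/.test(haystack[i]);
    if (isSpace && prevSpace) continue; // collapse the rest of a whitespace run
    norm += isSpace ? " " : haystack[i];
    origIndex.push(i);
    prevSpace = isSpace;
  }
  const target = needle.trim().replace(/\s+/g, " ");
  if (target.length === 0) return null;
  const pos = norm.indexOf(target);
  if (pos === -1) return null;
  // target is trimmed, so its first and last chars map to real characters
  const last = pos + target.length - 1;
  return { start: origIndex[pos], end: origIndex[last] + 1 };
}
```

With the org snippet above as `needle` and the page paragraph as `haystack`, the wrapped sentence matches as one unit instead of as two lines. The short trailing `link.` never gets searched on its own, so it can't hit some unrelated part of the page.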

karlicoss · 2020-07-11T20:43:40Z

Yes! Thanks for a specific example though.
The 'algorithm' is pretty much the dumbest thing I've come up with, so there are artifacts at times:

promnesia/extension/src/sidebar.js, lines 215 to 249 in afceaa4:

    // TODO not very efficient; replace with something existing (Hypothesis??)
    function _highlight(text: string, idx: number) {
        for (const line of text.split('\n')) {
            // TODO filter too short strings? or maybe only pick the longest one?
            const found = findText(unwrap(doc.body), line);
            if (found == null) {
                console.debug('No match found for %s', line);
                continue;
            }
            console.debug("highlighting %o %s", found, line);
            // $FlowFixMe
            const target: HTMLElement = unwrap(found.nodeType == Node.TEXT_NODE ? found.parentElement : found);
            if (target.classList.contains('toastify')) {
                // TODO hacky...
                continue;
            }
            // TODO why doesn't flow warn about this??
            // target.name === 'body'
            if (target === doc.body) {
                // meh, but otherwise too spammy
                console.warn('body matched for highlight; skipping it');
                continue;
            }
            target.classList.add('promnesia-highlight');
            const ref = doc.createElement('span');
            ref.classList.add('promnesia-highlight-reference');
            ref.classList.add('nonselectable');
            ref.appendChild(doc.createTextNode(String(idx)));
            target.insertAdjacentElement('beforeend', ref);
        }
    }

This specific thing might not be too hard to fix (e.g. find the exact bit to be highlighted, split the element into spans, etc.), but I've tried not to invest too much time in it so far because it's very hard to get right (and with good performance), so ultimately I hope to collaborate with or borrow code from Hypothes.is or Worldbrain Memex!
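Highlighting only "the exact bit" means working out which text nodes a match covers, because a clipped sentence can run across several nodes (e.g. a link inside a paragraph). Here is a minimal sketch, assuming the text nodes under an element have already been collected in document order (for instance with `document.createTreeWalker(root, NodeFilter.SHOW_TEXT)`) and their contents concatenated. Given a match range in that concatenated string, it computes the offsets inside each node. The DOM step is left out. `mapRangeToNodes` is a hypothetical name.

```javascript
// Given the texts of consecutive text nodes and a [start, end) range in their
// concatenation, return the part of the range that falls inside each node.
function mapRangeToNodes(nodeTexts, start, end) {
  const pieces = [];
  let offset = 0; // position of the current node's first char in the concatenation
  for (let i = 0; i < nodeTexts.length; i++) {
    const nodeStart = offset;
    const nodeEnd = offset + nodeTexts[i].length;
    // overlap of [start, end) with [nodeStart, nodeEnd)
    const from = Math.max(start, nodeStart);
    const to = Math.min(end, nodeEnd);
    if (from < to) {
      pieces.push({ node: i, startOffset: from - nodeStart, endOffset: to - nodeStart });
    }
    offset = nodeEnd;
    if (offset >= end) break; // later nodes cannot overlap the range
  }
  return pieces;
}
```

Each piece has the shape that `Range.setStart(node, startOffset)` and `Range.setEnd(node, endOffset)` expect. Wrapping each per-node range in its own span highlights the sentence without marking the whole parent element.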

clintgibler · 2020-07-11T20:51:21Z

Ah nice, thanks for the link to the implementation, that's interesting. I agree, doing this "well" and with good performance seems hard. +1 re: punting and borrowing from Hypothes.is / Worldbrain Memex 👍

Another thing I noticed is that sometimes only some of the sections would be highlighted, or only the first one (e.g. of 3 sections of copied text from a page, only the first in my notes file would be highlighted).

I was able to hack around that a bit by creating a new header for each block of text I wanted highlighted, and including the link to the web page under each header before the copied text.

I haven't poked at it in too much detail, it's Good Enough for now :)
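Whatever the reason only the first section gets highlighted, one way to make every copied section count is to treat each blank-line-separated block of the note as its own highlight candidate, instead of each line. Here is a minimal sketch, assuming blocks are separated by blank lines. It also assumes that fragments below some length (like a lone `link.`) are too ambiguous to highlight, which is one possible answer to the `TODO filter too short strings?` in `_highlight`. `clippedBlocks` and the default `minLength` of 20 are hypothetical choices.

```javascript
// Split note text into blank-line-separated blocks, join hard-wrapped lines
// inside each block with single spaces, and drop blocks that are too short.
function clippedBlocks(note, minLength = 20) {
  return note
    .split(/\n\s*\n/) // a blank (or whitespace-only) line ends a block
    .map((block) =>
      block
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .join(" "))
    .filter((block) => block.length >= minLength);
}
```

This only handles plain clipped text. Org headings and bare URLs would still need to be filtered out before the remaining blocks are searched one by one.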

karlicoss · 2020-11-05T05:03:14Z

More work on it here 9d21d18

OmarAshkar · 2021-04-23T04:10:51Z

Hi @karlicoss. I have an issue when I try the example above in an org file: although the URL shows up correctly, the highlight does not appear, even though the text matches exactly!

karlicoss · 2021-04-24T07:06:53Z

Hi @OmarAshkar can you share a snippet of org-mode and the website you clipped it from? So I could try to debug

OmarAshkar · 2021-04-24T13:43:24Z

@karlicoss I just used this same snippet with clipped text from the home page.

    * Highlights
    https://github.com/karlicoss/promnesia

    This is unlike most modern browsers, where you can only see when you visited the
    link.

I am using doom emacs and firefox on ubuntu.

karlicoss · 2021-04-26T06:19:20Z

Ah indeed, thanks, the same happens for me. In principle, I should have the whole https://github.com/karlicoss/promnesia page highlighted, because I have this repository cloned and indexed, but in practice the highlighting has a few gaps.

OmarAshkar · 2021-04-26T07:00:38Z

@karlicoss Oh, so the whole page is supposed to be highlighted! I thought only the clipped part would be highlighted, like in Hypothes.is and Memex? Either way, it works!

karlicoss · 2021-04-26T07:05:05Z

Ah no, indeed it will ideally highlight the sentence/paragraph only.
I just meant that because I have the whole README.org file indexed by promnesia, it should match all content on the page for me (because technically it's all "clipped")

karlicoss added the highlights and performance labels Dec 27, 2019

karlicoss mentioned this issue Jul 11, 2020: User workflow documentation / understanding how components fit together #125 (open)

karlicoss removed the highlights label Nov 20, 2020

karlicoss mentioned this issue Mar 23, 2022: How does highlighting work in promnesia ? #284 (open)

karlicoss added the can-we-share? label ("Can we reuse as much code with other projects as possible?") Dec 31, 2022
