Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recently Edited Thoughts do not fully merge #323

Closed
raineorshine opened this issue Feb 7, 2020 · 15 comments
Closed

Recently Edited Thoughts do not fully merge #323

raineorshine opened this issue Feb 7, 2020 · 15 comments
Assignees

Comments

@raineorshine
Copy link
Contributor

Single-pass merging leaves too many duplicate and adjacent thoughts in the Recently Edited Thoughts list. Given that the list is updated on edit, the solution must be performant, circa O(n).

Screen Shot 2020-02-07 at 9 17 24 AM

@shresthabijay
Copy link
Contributor

shresthabijay commented Feb 11, 2020

Yes indeed current implementation is leaving many duplicates. I will start by looking for potential edge case in current logic that may be causing this behavior.

@shresthabijay
Copy link
Contributor

shresthabijay commented Feb 11, 2020

After spending quality time I found a weird edge case that causes duplicates in the recently edited thoughts list. Earlier we only cared about merging two thoughts with majority subcontext but we failed to see that when a thought is edited and merge happens the resulting subcontext may already be in the list and there is a possibility that it can merge with two or more different unique thoughts from the list to give same subcontext . Also we failed to account for the relation of newly merged context with all the thoughts. It may be adjacent to some of the thoughts already available in the list. Thus resulting in duplicate and adjacent thoughts.

Here are very specific steps that reproduces the issue:

Let's say we have following tree :

  • A
    • B
      • E
    • C
      • F
    • D
      • G

At this point we will have recently edited list

  1. A / D / G
  2. A / C / F
  3. A / B / E
  4. A

Now on editing thought B to BH will result in merge of context 3. and A / BH and list will look like this:

  • A
    • BH
      • E
    • C
      • F
    • D
      • G

Recently edited list:

  1. A / BH
  2. A
  3. A / D / G
  4. A / C / F

Now on editing thought C to CI , A / CI will merge with context 1. to give A which is already in the list.

  • A
    • BH
      • E
    • CI
      • F
    • D
      • G

Recently edited list

  1. A
  2. A
  3. A / CI
  4. A / D / G

Currently this is the only edge case I found that causes duplicates. It is also the reason for having multiple adjacent thoughts in the list One quick solution to this problem is to find duplicates and remove them from the list but I don't think it is the best one because we have to iterate the list multiple times to find and filter duplicates. We must think of a better performant solution to this issue. Do you have anything in mind ?

@raineorshine
Copy link
Contributor Author

Thank you for the detailed discovery! That does help clarify the problem.

One option is to store recentlyEdited as an object:

{
  [contextToString(pathToContext(path))]: {
    path: ...,
    lastUpdated: ...
  }
  ...
}

This would naturally get rid of duplicates. We don't rely on the order of recentlyEdited anyway since lastUpdated is used.

(Note that I intentionally convert the path to a context for indexing purposes)

The alternative is the tree structure I suggested before, e.g.:

{
  A: {
    B: {
      E: {
        __terminus__: true,
        path: ...
        lastUpdated: ...
      }
    },
    C: {
      F: {
        __terminus__: true,
        path: ...
        lastUpdated: ...
      }
    },
    D: {
      G: {
        __terminus__: true,
        path: ...
        lastUpdated: ...
      }
    }
  }
}

This structure has the most flexibility, as is has not only O(depth) lookups, but O(depth) adjacent lookups. This would be a good choice if we wanted to not just remove duplicates, but remove adjacent thoughts (or majority subcontext).

@shresthabijay
Copy link
Contributor

shresthabijay commented Feb 13, 2020

I think it would be better to go with the last structure that helps to remove both duplicates and adjacent thoughts. Also I have a question about adjacent thoughts and majority subcontext. So whenever a thought is edited we check if there are any thought in the list that shares majority context with it. If so we merge it and then we check again to make sure if new context shares majority with other thoughts on the list. Merge it again if it shares majority context until the resulting context has no thoughts on the list that share majority context.

List:

  1. A.B.C
  2. A

When we add D inside B , A.B.D will merge with A.B.C , then resulting A.B that shares majority subcontext with A should now be merged to become only A right?

Please let me know if I am wrong about something.

Also in the current implementation I have not merged thoughts if they are not updated within 2hrs of each other. Should adjacent entries( that are next to each other ) that were updated after 2hrs of each other but share majority subcontext be merged ?

@raineorshine
Copy link
Contributor Author

I think it would be better to go with the last structure that helps to remove both duplicates and adjacent thoughts.

Great, I think this will pay off in the long run.

So whenever a thought is edited we check if there are any thought in the list that shares majority context with it. If so we merge it and then we check again to make sure if new context shares majority with other thoughts on the list. Merge it again if it shares majority context until the resulting context has no thoughts on the list that share majority context.

It would be great to find an O(m) solution, where m is the number of thoughts to merge. I may have to reconsider the "majority subcontext" concept. The question is, how to prevent a chain of merges from occurring? This should be analogous to the problem of how to insert an item into a sorted list. If the list is already sorted, it takes O(log(n)) time to insert the item, rather than O(n).

At the very least we should get rid of the 2hr rule. I don't think it's necessary.

Let me think about the rest of the question and get back to you.

@raineorshine
Copy link
Contributor Author

Thinking through the recently edited algorithm from scratch, based on an initial feel from the currently implemented algorithm and further analysis. (Other times I may ask you to do this kind of analysis, but today I am thinking about this a lot.)

Given the breadcrumb rendering, it is easy to navigate to an ancestor when a deeply nested thought (long path) is in the recently edited list. e.g. if A.B.C.D. is in the recently edited list, we can easily navigate to A, A.B., and A.B.C. Thus ancestors really do not need to be stored separately in the recently edited thoughts list. The object structure would allow O(depth) lookup by checking each ancestor iteratively, but unless I am mistaken I think recently edited descendants would still require the tree structure.

Conclusion 1: When a thought is edited, remove all ancestors from the recently edited list.

What happens with nearby thoughts that are not direct ancestors, e.g. siblings, uncles, cousins? e.g. if A.B.C.D is edited, what happens to recently edited:

  • A.B.C.X (sibling)
  • A.B.Y.Z (cousin)

The desired behavior, if I were to guess, is that #1 should be replaced by A.B.C.D because they share the same context (A.B.C) and #2 should remain. This should occur for more "distantly related cousins", e.g. A.B.C.D.E.F.G should absorb A.B.C.x.

Conclusion 2: When a thought is edited, it should be merged with any other children of its ancestors, that is, recently edited thoughts whose context is an ancestor of the edited thought. The longer should replace the shorter.

The only exception I can think to the above two is that eventually deeply nested thoughts should "expire" and be replaced by edited ancestors. e.g., imagine that A.B.C.D.E.F.G.H was edited and added to the recently edited list months ago. It may be perpetually kept alive by ancestor edits, e.g. A.B.C. After a certain amount of time without editing the longer, it should be replaced by the shorter if the shorter is edited.

Conclusion 3: Ancestors should replace descendants if the descendant has not been edited in a long time.

There may indeed be further iterations after additional real-world usage, but I think this makes more sense the majority subcontext approach and the tree structure will allow O(depth) lookups.

Looking forward to hearing your response and opinion regarding design, performance, and implementation effort.

@raineorshine
Copy link
Contributor Author

Hi! I'd prefer that we figure out what we're doing before doing any implementation. I don't want to log hours for fixing duplicates if it is based on a data structure we know we will not be using.

@shresthabijay
Copy link
Contributor

shresthabijay commented Feb 14, 2020

Sure. I agree on that. We should be clear about what we are doing before we start implementing it.

At first let's be sure of all the rules when merge should happen. From you previous comment I can list out following conditions for the merge. I am still not sure about some indirect ancestor relation.

  1. Remove all the direct ancestors. If the newly edited thought is direct ancestor of thoughts in the list then merge them to the longer context.

    A and A.B-> A.B , A.B.C.D.E and A.B.C. -> A.B.C.D.E

  2. In case when they are not direct ancestors
    direct siblings : A.B and A.C -> A.B if A.B is the newly edited.
    direct cousins : A.B.C.D and A.B.X.Y should not merge.
    direct uncles : A.B.C.D.E and A.B.C.X ? (I am not sure about this)

    also about cases like these:
    A.B.C and A.B.D.E ?
    A.B.C.F and A.B.D.E ?
    A.B.C and A.B.D.E.F.G.H ?

  3. Ancestors should replace descendants if the descendant has not been edited in a long time.
    (When we do this we need to make sure that the new ancestor context doesn't have direct
    sibling relation with other thoughts in the list)

After sorting out how we merge the context, we need to think of proper way of implementing this. I am thinking of a tree structure as you suggested where each leaf nodes represent the recently edited thought. When thought is edited we can directly find if the node already exists since we are using nested objects. And if thought is not already on the tree , we can find the deepest node in the tree it shares common subcontext with and then we only have to traverse it's sibling nodes and their depth to find their relationship with the new thought context. I will write down a detailed explanation about the structure with performance factor after we have decided on how we merge thoughts.

@raineorshine
Copy link
Contributor Author

raineorshine commented Feb 14, 2020

  1. Remove all the direct ancestors. If the newly edited thought is direct ancestor of thoughts in the list then merge them to the longer context.

Yes, but note that this is a subset of the next case and thus does not need to be handled separately.

  1. In case when they are not direct ancestors

The general description is the following: Given edited thought P, merge each thought Q in the recently edited list where subsetThoughts(P, contextOf(Q)) === true || subsetThoughts(Q, contextOf(P)) === true. Keep the longer of P and Q.

direct uncles : A.B.C.D.E and A.B.C.X

A.B.C.D.E

A.B.C and A.B.D.E ?

A.B.D.E

A.B.C.F and A.B.D.E ?

✗ do not merge

A.B.C and A.B.D.E.F.G.H ?

A.B.D.E.F.G.H

This should be done in O(depth), where depth is the length of the newly edited thought.

When thought is edited we can directly find if the node already exists since we are using nested objects. And if thought is not already on the tree , we can find the deepest node in the tree it shares common subcontext

Yes, that's right.

We only have to traverse it's sibling nodes and their depth to find their relationship with the new thought context.

I'm not sure about this yet. I think we can simply take contextOf(P) and look for direct ancestors. See 2 above.

@shresthabijay
Copy link
Contributor

shresthabijay commented Feb 16, 2020

Hi this is my plan for implementing this using tree structure.
In this data structure all the unique paths to leaf nodes represents the recently edited thoughts.

 tree = {
    '_ROOT_': {
      a: {
        b: {
          d: {
            e: {
              leaf: true,
              lastUpdated: ...,
              path:...
            }
          }
        },
        c: {
          f: {
            leaf: true,
            lastUpdated:...,
            path: ...
          },
        }
      }
    }
  }

We will have following operations on this structure.

  1. findDeepestCommonNode(tree,path)
    This function will search for deepest common node that given path and tree shares.
    For example: if path is ['a','b','d','g'] with given tree it will return common path['a','b','d'] and it's node value. It starts searching for available nodes from the bottom of the given path. It will have time complexity of O(depth) depth of the given path. If given path is already available in the tree it's doesn't iterate and returns the node without iterating.

  2. findAllLeafNodes = (tree, startingPath)
    Provided with startingPath i.e any available nodes path in the tree, it will return all the leafNodes i.e all the recently edited thoughts data who are descendants of that startingPath.
    It uses recursion to find leaf nodes directly from the any node i.e startingPath.

For example: if startingPath is ['_ROOT_'] with given it will return all the recently edited thoughts i.e `[{ lastUpdated : ... , path : ['a', 'b', 'd, 'e'] } , { lastUpdated : ... , path : ['a', 'c', 'f'] } ]

if startingPath is given ['ROOT,'a','c'] it will return `[{ lastUpdated : ... , path : ['a', 'c', 'f'] } ]

Its time complexity is O(n) where n is the number of nodes under the startingPath node in the tree.

So we can can use these operations on tree to do following:

  1. onNodeAdd(tree,oldPath,newPath)
    This function will handle existingThoughtChange
onNodeAdd(tree,oldPath,newPath){
const { node: commonNode, path: commonPath } = findDeepestCommonNode(tree, oldPath)
if ( commonNode && commonNode.leaf ) // unset oldPath from the tree and set newPath and update timestamp
else{
 const leafNodes = findAllLeafNodes(tree, commonPath)
/*** 
logic for itearing leaf nodes i.e descendants recently edited data and merging for direct and indirect relations or adding new recently edited thought to the tree .
***/

/***
when ancestors should replace descendants if the descendant has not been edited in a long time
then new shorter context could bring regression of sibling relations etc. so call 'onNodeAdd' recursively with shorter context as 'newPath'.
***/
}

Similarly we can use same operations to create functions for handling onExistingThoughtDelete and onExistingThoughtMove .

  1. We can use findAllLeafNodes with startingPath as ['_ROOT_'] for getting all the recently edited thoughts array and sort them based on 'lastUpdated' for sidebar.

What you think about this data structure ? If you have any queries let me know. I would love to discuss about which data structure to use and possible changes we need in our plan.

@raineorshine
Copy link
Contributor Author

Thanks for the detailed description!

  1. findDeepestCommonNode(tree,path)
    This function will search for deepest common node that given path and tree shares.

Great. Let's call this findTreeDeepestSubcontext.

I think that path should not be stored in the leaf node, but rather be left implicit, i.e. derived from traversing the tree. The reason for this is that it will make much easier the process of updating the tree after editing a context with many descendants.

e.g. if a is edited to a1, then all you you would need to do effectively is:

tree.root.a1 = tree.root.a
delete tree.root.a

If the path was stored explicitly, we would have to iterate over all descendants and update them when any ancestor was changed. This is indeed a fault in the current contextIndex architecture.

If given path is already available in the tree it's doesn't iterate and returns the node without iterating.

I don't see how it could find the given path without iterating.

  1. findAllLeafNodes = (tree, startingPath)

Let's call this findTreeDescendants.

@shresthabijay
Copy link
Contributor

shresthabijay commented Feb 17, 2020

I think that path should not be stored in the leaf node, but rather be left implicit, i.e. derived from traversing the tree. The reason for this is that it will make much easier the process of updating the tree after editing a context with many descendants.

If the path was stored explicitly, we would have to iterate over all descendants and update them when any ancestor was changed. This is indeed a fault in the current contextIndex architecture.

Yes I understand your view but we gonna need to update the lastUpdated field for each recently edited thoughts(i.e leaf nodes) too right?.

I don't see how it could find the given path without iterating.

If we want to find deepestSubcontext of a path in a tree, we first directly take the full path and convert into nested index and try to access tree. Since tree is just an nested object it will return a value if that path is available. If available no need to iterate the path , else we check for common subcontext by iterating the given path.

Something like this.

export const findTreeDeepestSubcontext = (tree, path) => {
// reducePathToIndex will take a path and returns a string of path separated by "." ['a','b','c'] -->'a.b.c'
// at(object,[...nestedIndexStrings]) helps access javascript nested object

  const availableNode = at(tree, [reducePathToIndex(path)])[0] 
  if (availableNode) return { node: availableNode, path }
// only iterate if the whole path is not already available in tree
  const pathIndex = path.findIndex((value, index) => at(tree, [reducePathToIndex(path.slice(0, path.length - index))])[0])
  return pathIndex > -1 ? { node: at(tree, [reducePathToIndex(path.slice(0, path.length - pathIndex))])[0], path: path.slice(0, path.length - pathIndex) } : {}
}

@raineorshine
Copy link
Contributor Author

Yes I understand your view but we gonna need to update the lastUpdated field for each recently edited thoughts(i.e leaf nodes) too right?.

Yes, we should store lastUpdated, but not path.

If we want to find deepestSubcontext of a path in a tree, we first directly take the full path and convert into nested index and try to access tree. Since tree is just an nested object it will return a value if that path is available.

I think we may have different understandings of "iterating". Accessing tree.a.b.c.d.e is technically O(depth) since there are depth property dereferences. Converting it into object accessor syntax doesn't make it O(1). This is not important though, as O(depth) is just fine for our purposes. Since it's the same time complexity though, I do think the reducePathToIndex is unnecessary.

@raineorshine
Copy link
Contributor Author

One thing always to keep in mind when designing systems is encapsulation: How can I minimize how much one module needs to know about the workings of another? This concept can be applied to the recently edited list. existingThoughtChange etc should not know about the implementation details of the recently edited list. The issue with findTreeDeepestSubcontext and findAllLeafNodes is that they unnecessarily expose, if only in name, the tree implementation. They may still be used internally. A better external interface would be:

recentlyEditedInsert(path)
recentlyEditedDelete(path)

Does that make sense?

@shresthabijay
Copy link
Contributor

One thing always to keep in mind when designing systems is encapsulation: How can I minimize how much one module needs to know about the workings of another? This concept can be applied to the recently edited list. existingThoughtChange etc should not know about the implementation details of the recently edited list. The issue with findTreeDeepestSubcontext and findAllLeafNodes is that they unnecessarily expose, if only in name, the tree implementation. They may still be used internally. A better external interface would be:

recentlyEditedInsert(path)
recentlyEditedDelete(path)

Does that make sense?

Yes it makes sense. I would keep that in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants