Recently Edited Thoughts do not fully merge #323
Yes, indeed, the current implementation leaves many duplicates. I will start by looking for potential edge cases in the current logic that may be causing this behavior.
After spending some quality time on this, I found a weird edge case that causes duplicates in the recently edited thoughts list. Earlier, we only cared about merging two thoughts with majority … Here are the very specific steps that reproduce the issue. Let's say we have the following tree:
At this point, we will have the recently edited list:
Now, editing thought B to BH will result in a merge of …
Recently edited list:
Now, on editing thought C to CI, A/CI will merge with …
Recently edited list:
Currently, this is the only edge case I have found that causes duplicates. It is also the reason for having multiple adjacent thoughts in the list. One quick solution to this problem is to find duplicates and remove them from the list, but I don't think it is the best one, because we would have to iterate the list multiple times to find and filter duplicates. We must think of a better-performing solution to this issue. Do you have anything in mind?
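For comparison, the quick fix does not have to iterate the list more than once. A minimal sketch, assuming each entry is `{ path, lastUpdated }` with `path` a plain array of thought values; `contextKey` and `dedupeRecentlyEdited` are hypothetical names, with `contextKey` standing in for a string key such as `contextToString(pathToContext(path))`. A `Map` keyed by that string keeps only the most recent entry per context in a single O(n) pass. It removes exact duplicates only, not adjacent thoughts.

```js
// Hypothetical key function: joins thought values into one string.
// The separator '\u0000' is assumed never to appear in a thought value,
// so ['a', 'b'] and ['a.b'] do not collide.
const contextKey = path => path.join('\u0000')

// Removes duplicate entries in one pass, keeping the most recently updated
// entry for each path. O(n) time, O(n) extra space.
function dedupeRecentlyEdited(list) {
  const byKey = new Map()
  for (const entry of list) {
    const key = contextKey(entry.path)
    const existing = byKey.get(key)
    // Map.set on an existing key keeps that key's original position.
    if (!existing || entry.lastUpdated > existing.lastUpdated) {
      byKey.set(key, entry)
    }
  }
  return [...byKey.values()]
}

const list = [
  { path: ['A', 'B'], lastUpdated: 1 },
  { path: ['A', 'C'], lastUpdated: 2 },
  { path: ['A', 'B'], lastUpdated: 3 },
]
console.log(dedupeRecentlyEdited(list).map(entry => entry.path.join('/'))) // [ 'A/B', 'A/C' ]
```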
Thank you for the detailed discovery! That does help clarify the problem. One option is to store an object keyed by context:
```js
{
  [contextToString(pathToContext(path))]: {
    path: ...,
    lastUpdated: ...
  }
  ...
}
```
This would naturally get rid of duplicates. We don't rely on the order of … (Note that I intentionally convert the path to a context for indexing purposes.) The alternative is the tree structure I suggested before, e.g.:
```js
{
  A: {
    B: {
      E: {
        __terminus__: true,
        path: ...,
        lastUpdated: ...
      }
    },
    C: {
      F: {
        __terminus__: true,
        path: ...,
        lastUpdated: ...
      }
    },
    D: {
      G: {
        __terminus__: true,
        path: ...,
        lastUpdated: ...
      }
    }
  }
}
```
This structure has the most flexibility, as it has not only O(depth) lookups but also O(depth) adjacent lookups. This would be a good choice if we wanted not just to remove duplicates but also to remove adjacent thoughts (or majority subcontext).
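To make the O(depth) claim concrete, here is a minimal sketch of inserting into and looking up the tree above, assuming contexts are plain arrays of thought values. `treeInsert`, `treeGet`, and `hasChild` are hypothetical names. Each operation does one property access per level of the path and never scans siblings.

```js
// Metadata keys share the namespace with child thought values, as in the
// structure above, so a thought literally named 'path' or 'lastUpdated' would collide.
const hasChild = (node, value) => Object.prototype.hasOwnProperty.call(node, value)

// Inserts or updates the recently edited entry at `context`. O(depth).
function treeInsert(tree, context, entry) {
  let node = tree
  for (const value of context) {
    if (!hasChild(node, value)) node[value] = {}
    node = node[value]
  }
  node.__terminus__ = true
  Object.assign(node, entry)
  return tree
}

// Returns the entry stored at exactly `context`, or undefined. O(depth).
function treeGet(tree, context) {
  let node = tree
  for (const value of context) {
    if (!hasChild(node, value)) return undefined
    node = node[value]
  }
  return node.__terminus__ ? node : undefined
}

const tree = {}
treeInsert(tree, ['A', 'B', 'E'], { path: ['A', 'B', 'E'], lastUpdated: 1 })
treeInsert(tree, ['A', 'C', 'F'], { path: ['A', 'C', 'F'], lastUpdated: 2 })
console.log(treeGet(tree, ['A', 'C', 'F']).lastUpdated) // 2
console.log(treeGet(tree, ['A', 'B'])) // undefined: A/B is only an intermediate node
```

Inserting the same context twice updates the one node in place, which is why duplicates cannot arise in this representation.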
I think it would be better to go with the last structure, which helps remove both duplicates and adjacent thoughts. I also have a question about adjacent thoughts and majority subcontext. Whenever a thought is edited, we check whether any thought in the list shares majority context with it. If so, we merge them, and then we check again whether the new context shares majority context with other thoughts on the list. We keep merging until no thought on the list shares majority context with the resulting context. List:
When we add D inside B, A.B.D will merge with A.B.C, and then the resulting A.B, which shares majority … Please let me know if I am wrong about something. Also, in the current implementation, I do not merge thoughts that were not updated within 2 hours of each other. Should adjacent entries (that are next to each other) that were updated more than 2 hours apart but share majority …
Great, I think this will pay off in the long run.
It would be great to find an O(m) solution, where m is the number of thoughts to merge. I may have to reconsider the "majority subcontext" concept. The question is: how do we prevent a chain of merges from occurring? This should be analogous to the problem of inserting an item into a sorted list: if the list is already sorted, it takes O(log(n)) time to find where the item goes, rather than O(n). At the very least, we should get rid of the 2hr rule; I don't think it's necessary. Let me think about the rest of the question and get back to you.
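The sorted-list analogy, as a minimal sketch: assuming the list is kept sorted by `lastUpdated`, newest first, binary search finds the insertion index in O(log n) comparisons. `insertionIndex` and `insertSorted` are hypothetical names. Note that `splice` still shifts the elements after the index, so inserting into a plain array remains O(n) overall; only the search is logarithmic.

```js
// Returns the index at which an entry with timestamp `lastUpdated` should be
// inserted to keep `list` sorted by lastUpdated, newest first. O(log n).
function insertionIndex(list, lastUpdated) {
  let lo = 0
  let hi = list.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    // Entries at least as new as the inserted one stay ahead of it.
    if (list[mid].lastUpdated >= lastUpdated) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Inserts `entry` in sorted position (mutates and returns `list`).
function insertSorted(list, entry) {
  list.splice(insertionIndex(list, entry.lastUpdated), 0, entry)
  return list
}

const list = [
  { path: ['A'], lastUpdated: 30 },
  { path: ['B'], lastUpdated: 20 },
  { path: ['C'], lastUpdated: 10 },
]
insertSorted(list, { path: ['D'], lastUpdated: 15 })
console.log(list.map(entry => entry.path[0])) // [ 'A', 'B', 'D', 'C' ]
```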
I have been thinking through the recently edited algorithm from scratch, based on an initial feel from the currently implemented algorithm and further analysis. (Other times I may ask you to do this kind of analysis, but today I am thinking about this a lot.) Given the breadcrumb rendering, it is easy to navigate to an ancestor when a deeply nested thought (long path) is in the recently edited list, e.g. if …
Conclusion 1: When a thought is edited, remove all ancestors from the recently edited list.
What happens with nearby thoughts that are not direct ancestors, e.g. siblings, uncles, cousins? e.g. if …
The desired behavior, if I were to guess, is that #1 should be replaced by …
Conclusion 2: When a thought is edited, it should be merged with any other children of its ancestors, that is, recently edited thoughts whose context is an ancestor of the edited thought. The longer should replace the shorter.
The only exception I can think of to the above two is that deeply nested thoughts should eventually "expire" and be replaced by edited ancestors, e.g., imagine that …
Conclusion 3: Ancestors should replace descendants if the descendant has not been edited in a long time.
There may indeed be further iterations after additional real-world usage, but I think this makes more sense than the majority subcontext approach, and the tree structure will allow O(depth) lookups. Looking forward to hearing your response and opinion regarding design, performance, and implementation effort.
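A minimal sketch of Conclusions 1 to 3 over a flat list, to pin the rules down before choosing a data structure. Assumptions not stated in the thread: entries are `{ path, lastUpdated }` with `path` a plain array of thought values; "context" means `path.slice(0, -1)`; the root context does not count as an ancestor (otherwise every edit would merge every top-level entry); and `EXPIRE_MS` is an arbitrary placeholder for "a long time". All names are hypothetical. Because an entry whose context is an ancestor of the edited path is never longer than that path, "the longer replaces the shorter" always keeps the edited thought. This costs O(n · depth) per edit; the tree is what would reduce it.

```js
// Placeholder for "not edited in a long time" (Conclusion 3); the value is arbitrary.
const EXPIRE_MS = 7 * 24 * 60 * 60 * 1000

// true if paths `a` and `b` contain the same thought values
const samePath = (a, b) => a.length === b.length && a.every((value, i) => value === b[i])

// true if `a` is a strict prefix of `b`, i.e. `a` is an ancestor of `b`
const isAncestor = (a, b) => a.length < b.length && a.every((value, i) => value === b[i])

// Returns a new list with `path` recorded as edited at time `now` (newest first).
function recordEdit(list, path, now) {
  const kept = list.filter(entry => {
    // Duplicate of the edited thought: replaced by the new entry.
    if (samePath(entry.path, path)) return false
    // Conclusion 1: remove ancestors of the edited thought.
    if (isAncestor(entry.path, path)) return false
    // Conclusion 2: merge children of the edited thought's ancestors (siblings,
    // uncles, ...). Assumption: the root context is excluded.
    const context = entry.path.slice(0, -1)
    if (context.length > 0 && isAncestor(context, path)) return false
    // Conclusion 3: an edited ancestor replaces descendants that have gone stale.
    if (isAncestor(path, entry.path) && now - entry.lastUpdated > EXPIRE_MS) return false
    return true
  })
  return [{ path, lastUpdated: now }, ...kept]
}

const now = Date.now()
const before = [
  { path: ['A'], lastUpdated: now - 1000 },                // ancestor: removed (Conclusion 1)
  { path: ['A', 'X'], lastUpdated: now - 1000 },           // uncle: removed (Conclusion 2)
  { path: ['A', 'B', 'D'], lastUpdated: now - 1000 },      // sibling: removed (Conclusion 2)
  { path: ['Q', 'R'], lastUpdated: now - 1000 },           // unrelated: kept
  { path: ['A', 'B', 'C', 'E'], lastUpdated: now - 1000 }, // recent descendant: kept
]
console.log(recordEdit(before, ['A', 'B', 'C'], now).map(entry => entry.path.join('/')))
// [ 'A/B/C', 'Q/R', 'A/B/C/E' ]
```

Cousins such as A/Y/Z are left alone by this reading, since their context A/Y is not an ancestor of A/B/C.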
Hi! I'd prefer that we figure out what we're doing before doing any implementation. I don't want to log hours for fixing duplicates if it is based on a data structure we know we will not be using.
Sure, I agree. We should be clear about what we are doing before we start implementing it. First, let's be sure of all the rules for when a merge should happen. From your previous comment, I can list the following conditions for a merge. I am still not sure about some indirect ancestor relations.
After sorting out how we merge contexts, we need to think of a proper way of implementing this. I am thinking of a tree structure, as you suggested, where each leaf node represents a recently edited thought. When a thought is edited, we can directly find whether its node already exists, since we are using nested objects. And if the thought is not already in the tree, we can find the deepest node in the tree that it shares common …
Yes, but note that this is a subset of the next case and thus does not need to be handled separately.
The general description is the following: Given edited thought
✓
✓
✗ do not merge
✓ This should be done in
Yes, that's right.
I'm not sure about this yet. I think we can simply take …
Hi, this is my plan for implementing this using a tree structure:
```js
tree = {
  '_ROOT_': {
    a: {
      b: {
        d: {
          e: {
            leaf: true,
            lastUpdated: ...,
            path: ...
          }
        }
      },
      c: {
        f: {
          leaf: true,
          lastUpdated: ...,
          path: ...
        },
      }
    }
  }
}
```
We will have the following operations on this structure.
For example: if … if … Its time complexity is O(n), where n is the number of nodes under the … So we can use these operations on the tree to do the following:
```js
function onNodeAdd(tree, oldPath, newPath) {
  const { node: commonNode, path: commonPath } = findDeepestCommonNode(tree, oldPath)
  if (commonNode && commonNode.leaf) {
    // unset oldPath from the tree, set newPath, and update the timestamp
  } else {
    const leafNodes = findAllLeafNodes(tree, commonPath)
    /***
      logic for iterating the leaf nodes (i.e. the descendants' recently edited data) and merging
      for direct and indirect relations, or adding the new recently edited thought to the tree.
    ***/
    /***
      when ancestors replace descendants because a descendant has not been edited in a long time,
      the new shorter context could bring regressions of sibling relations etc.,
      so call 'onNodeAdd' recursively with the shorter context as 'newPath'.
    ***/
  }
}
```
Similarly, we can use the same operations to create functions for handling …
What do you think about this data structure? If you have any questions, let me know. I would love to discuss which data structure to use and any changes we need in our plan.
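For reference, a minimal sketch of what `findAllLeafNodes` could look like, assuming the node layout of the plan above (`leaf`, `lastUpdated`, and `path` stored alongside child keys under `_ROOT_`) and that `commonPath` is an array of keys below `_ROOT_`. The signature and return shape are assumptions. It walks to the common node in O(depth) and then visits every node under it once, which matches the O(n) estimate in the plan.

```js
// Keys that hold leaf metadata rather than child thoughts (per the plan's layout).
const META_KEYS = new Set(['leaf', 'lastUpdated', 'path'])

// Returns every leaf at or below `commonPath` (keys under tree._ROOT_).
function findAllLeafNodes(tree, commonPath) {
  let node = tree._ROOT_
  for (const key of commonPath) {
    if (!node || !Object.prototype.hasOwnProperty.call(node, key)) return []
    node = node[key]
  }
  if (!node) return []
  const leaves = []
  const stack = [node]
  // Iterative depth-first traversal; each node under commonPath is visited once.
  while (stack.length > 0) {
    const current = stack.pop()
    if (current.leaf) leaves.push({ path: current.path, lastUpdated: current.lastUpdated })
    for (const [key, child] of Object.entries(current)) {
      if (!META_KEYS.has(key)) stack.push(child)
    }
  }
  return leaves
}

const tree = {
  _ROOT_: {
    a: {
      b: { d: { e: { leaf: true, lastUpdated: 1, path: ['a', 'b', 'd', 'e'] } } },
      c: { f: { leaf: true, lastUpdated: 2, path: ['a', 'c', 'f'] } },
    },
  },
}
console.log(findAllLeafNodes(tree, ['a']).map(leaf => leaf.path.join('.'))) // [ 'a.c.f', 'a.b.d.e' ]
```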
Thanks for the detailed description!
Great. Let's call this … I think that … e.g. if
```js
tree.root.a1 = tree.root.a
delete tree.root.a
```
If the …
I don't see how it could find the given path without iterating.
Let's call this …
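The rename idea above (`tree.root.a1 = tree.root.a; delete tree.root.a`) generalizes to any depth. A minimal sketch with a hypothetical `renameTreeNode` helper: walking to the parent costs O(depth), and the move itself is O(1) no matter how large the subtree is, because only a reference is reassigned. Any `path` values stored inside leaves under the renamed node would still hold the old thought value; this sketch does not update them.

```js
// Moves the subtree at `path` to the sibling key `newValue`.
// Returns false if `path` does not exist in the tree.
function renameTreeNode(tree, path, newValue) {
  let parent = tree
  for (const value of path.slice(0, -1)) {
    if (!Object.prototype.hasOwnProperty.call(parent, value)) return false
    parent = parent[value]
  }
  const oldValue = path[path.length - 1]
  if (!Object.prototype.hasOwnProperty.call(parent, oldValue)) return false
  // Renaming to the same value is a no-op (the delete below would otherwise lose it).
  if (newValue === oldValue) return true
  // Note: a node already at `newValue` is overwritten, not merged.
  parent[newValue] = parent[oldValue]
  delete parent[oldValue]
  return true
}

const tree = { root: { a: { b: { leaf: true } } } }
renameTreeNode(tree, ['root', 'a'], 'a1')
console.log(Object.keys(tree.root)) // [ 'a1' ]
```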
Yes, I understand your view, but we are going to need to update the …
If we want to find … Something like this:
```js
import { get } from 'lodash'

// Returns the deepest node in the tree that matches a prefix of `path`, together with
// that prefix, or {} if not even the first thought of `path` is in the tree.
export const findTreeDeepestSubcontext = (tree, path) => {
  // Try the whole path first, then successively shorter prefixes, so we only
  // keep looking if the whole path is not already available in the tree.
  for (let length = path.length; length > 0; length--) {
    const subpath = path.slice(0, length)
    // lodash `get` accepts an array path directly, so thought values containing '.'
    // are not split apart the way a joined 'a.b.c' string path would be.
    const node = get(tree, subpath)
    if (node) return { node, path: subpath }
  }
  return {}
}
```
Yes, we should store …
I think we may have different understandings of "iterating". Accessing …
One thing to always keep in mind when designing systems is encapsulation: how can I minimize how much one module needs to know about the workings of another? This concept can be applied to the recently edited list:
```js
recentlyEditedInsert(path)
recentlyEditedDelete(path)
```
Does that make sense?
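A minimal sketch of that encapsulation, assuming a closure-based module: callers only see `recentlyEditedInsert`, `recentlyEditedDelete`, and a read accessor (`recentlyEditedList`, a hypothetical addition), so the internal representation can later change from a flat map to a tree without touching any call site. The `createRecentlyEdited` factory and the `Map` keyed by a joined path string are placeholders, not the structure the thread settled on.

```js
function createRecentlyEdited() {
  // Internal representation, hidden from callers; could be swapped for a tree.
  const entries = new Map()
  // Hypothetical key: '\u0000' is assumed never to appear in a thought value.
  const keyOf = path => path.join('\u0000')

  return {
    recentlyEditedInsert(path, lastUpdated = Date.now()) {
      // Delete first so a re-inserted path moves to the end of the insertion order.
      entries.delete(keyOf(path))
      entries.set(keyOf(path), { path, lastUpdated })
    },
    recentlyEditedDelete(path) {
      entries.delete(keyOf(path))
    },
    // Most recently inserted first; returns a new array on each call.
    recentlyEditedList() {
      return [...entries.values()].reverse()
    },
  }
}

const recentlyEdited = createRecentlyEdited()
recentlyEdited.recentlyEditedInsert(['A', 'B'])
recentlyEdited.recentlyEditedInsert(['A', 'B'])
console.log(recentlyEdited.recentlyEditedList().length) // 1
```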
Yes, it makes sense. I will keep that in mind.
Single-pass merging leaves too many duplicate and adjacent thoughts in the Recently Edited Thoughts list. Given that the list is updated on edit, the solution must be performant, circa O(n).