Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Independent Editing #495

Closed
raineorshine opened this issue Apr 1, 2020 · 47 comments · Fixed by #1349
Closed

Independent Editing #495

raineorshine opened this issue Apr 1, 2020 · 47 comments · Fixed by #1349
Labels
performance Improve performance refactor Refactor without changing behavior

Comments

@raineorshine
Copy link
Contributor

raineorshine commented Apr 1, 2020

Decouple thought identity from value so that thoughts can be edited without having to update all descendants.

  • a ← should be able to edit without changing all descendants
    • b

Migration strategy:

  1. Make Child.id required - This should still be done iteratively, as mentioned, since it has a large surface area. You can detect Child nodes that are missing an id and add them one by one.
  2. Convert Child.id from uuid to contextIndex key - Ultimately we need to unify Child.id and Parent.id. If we change Parent.id to a uuid to match Child.id, it will break hashContext. Thus we change Child.id to match Parent.id. They will be treated as separate universes at first, but this will make it easier to unify them later.
  3. Disable pull/push - We need to somehow limit the domain of the initial structural change since there are 20+ low-level functions. Disabling pull/push is the best seam I can see to split it along.
    (3b) Disable Context View - Here is another domain that can be disabled without affecting the initial conversion.
  4. Change Parent.id to an arbitrary uuid - This will change the thought structure and require changes in all low-level functions that are still enabled. Preserve their API to avoid any higher-level changes at this point. e.g. hashContext will traverse from the HOME context to the desired Parent and return its id instead of hashing the Context. (This will be slow. We can optimize it later by also accepting a Path, which will allow O(1) Parent lookup with head(path).id.)
  5. Remove Parent.context - We begin removing Context from State. We are anticipating that once Parent entries become stable (i.e. unchanged with edits and ancestors), Contexts for a given thought can change. Thus Context shifts from a unique key to an ad hoc structure that is eventually replaced with simpler and more efficient access by id. Doing this now will make it easier to refactor recursive functionality since Parents will no longer have any reference to their ancestors.
  6. Delete descendant updates - At this point, we can pretty much delete all the descendant update code in moveThought, editThought, and deleteThought, since they no longer depend on ancestors.
  7. Re-enable pull/push - All low-level push/pull functions will need to be updated to use the new state.thoughts structure.
  8. Convert Child to Parent or Parent.id - That is, wherever Child is used, replace it with a Parent.id (if persisted, e.g. Parent.children) or a direct reference to a Parent object (in memory business logic). Move Child.rank to Parent.rank.
  9. Convert ThoughtContext to Parent or Parent.id - Same as the previous steps, but for Lexeme.contexts.
  10. Refactor and optimize - We now have independent editing and unified Child, Parent, and ThoughtContext. We can begin refactoring and optimizing high-level functions which are performing some contortionist moves in order to access the new structure with their old logic. Luckily this can be done iteratively as they are not structural or functional changes. We can refactor systematically, or do some profiling to start with the most expensive functions.
@raineorshine raineorshine added the refactor Refactor without changing behavior label Apr 1, 2020
@raineorshine
Copy link
Contributor Author

raineorshine commented Apr 4, 2020

  • For O(1) update, descendants must be stored in a tree.
  • Therefore, hashed context groupings must be stored in a parallel structure.
    • The challenge is how to efficiently represent this structure since, e.g., a Path of length 5 entails 5 nested contexts.
  • Editing reflects this: changing a thought updates all descendants, i.e. they are all part of the same wider context, but does not update the thought or its descendants of the thought in a different context.

@raineorshine
Copy link
Contributor Author

Might be worth doing a survey of all references to thoughtIndex and contextIndex to see the affected surface area as well as better understand how they are being used.

@raineorshine
Copy link
Contributor Author

A contextIndex entry has to be indexed by a single, stable id rather than a hashed context, since an ancestor may change. Thus we can rule out wrapping hashContext with logic to lookup the stable is since its input (Context) is still composite. We’re looking at complete removal of hashContext and refactoring all call sites to use available information to reconstruct the Path if needed.

@raineorshine
Copy link
Contributor Author

raineorshine commented May 17, 2021

Thought structure requirements:

  • Forward links (Parent)
  • Back links (Lexeme)
  • Context overlap (a/b/c == b/c/a)

Context overlap is the main difference from traditional graph structures. How do we add this functionality without adding an ancestor dependency and thus O(depth) edits?

@raineorshine
Copy link
Contributor Author

raineorshine commented Jun 25, 2021

I'm willing to give up Context overlap and use a more straightforward graph structure. This should make the implementation fairly straightforward, though still large.

  1. Ensure every Child has an id and ids are preserved across edits.
  2. Index Parents (in the contextIndex) by id rather than hashContext.
  3. Use ids to look up children instead of Contexts.

@shresthabijay
Copy link
Contributor

A contextIndex entry has to be indexed by a single, stable id rather than a hashed context, since an ancestor may change. Thus we can rule out wrapping hashContext with logic to lookup the stable is since its input (Context) is still composite. We’re looking at complete removal of hashContext and refactoring all call sites to use available information to reconstruct the Path if needed.

I agree. I also came to a similar conclusion. If we don't want any recursive updates, we should completely remove the association of a context with thought too. A thought should only know its value and its children thought ids.

@shresthabijay
Copy link
Contributor

thoughtIndex:

{
  value: test_id,
  contexts: [
    { context: [a_id], rank: 0 },
    { context: [root_id], rank: 1 }
  ]
}

e.g. Contexts will be arrays of ids instead of arrays of values

@raineorshine What about move operation? We would still have to recursively update all the lexeme related to descendants after a single move operation. Are we only scoping this to edit operation?

@raineorshine
Copy link
Contributor Author

We're changing the core thought structure, so it will affect all operations.

Lexemes will point to a single Parent in the contextIndex rather than a Context.

@raineorshine
Copy link
Contributor Author

raineorshine commented Jun 27, 2021

I was thinking we should have a clear plan of action before starting an implementation.

The goal is to transition the contextIndex from indexing by Context to indexing by unique ids. This will allow thoughts to change without affecting descendants. Child thoughts will belong to other Child thoughts (by id) rather than belonging to Contexts. Child, Parent, and ThoughtContext will effectively all become the same thing (an entry in the contextIndex).

Here are some changes I believe will need to be made:

  • When creating a new thought, use a uuid as the new contextIndex key instead of hashing the Context.
  • Change Parent.context to Parent.value and add rank.
  • Change Child.id to point to the corresponding Parent in the contextIndex.
  • Change Lexeme.contexts to an array of Parent ids. We can rename the field now or at a later time if that's easier.
  • Remove ThoughtContext.
  • Rename hashContext to findContextId which will traverse the given Context starting at the root and return the id of the corresponding Parent. This provides a more gradual migration path to using only ids. Most callers of hashContext are just trying to look up the id of a Context, so this should be a drop-in replacement in most cases.

That is a large number of interrelated changes. I'm not sure how to divide-and-conquer.

We should completely defer IndexedDB and Firebase changes until it works in Redux state only.

Looking forward to hearing your thoughts!

@shresthabijay
Copy link
Contributor

I was thinking we should have a clear plan of action before starting an implementation.

The goal is to transition the contextIndex from indexing by Context to indexing by unique ids. This will allow thoughts to change without affecting descendants. Child thoughts will belong to other Child thoughts (by id) rather than belonging to Contexts. Child, Parent, and ThoughtContext will effectively all become the same thing (an entry in the contextIndex).

Here are some changes I believe will need to be made:

  • When creating a new thought, use a uuid as the new contextIndex key instead of hashing the Context.
  • Change Parent.context to Parent.value and add rank.
  • Change Child.id to point to the corresponding Parent in the contextIndex.
  • Change Lexeme.contexts to an array of Parent ids. We can rename the field now or at a later time if that's easier.
  • Remove ThoughtContext.
  • Rename hashContext to findContextId which will traverse the given Context starting at the root and return the id of the corresponding Parent. This provides a more gradual migration path to using only ids. Most callers of hashContext are just trying to look up the id of a Context, so this should be a drop-in replacement in most cases.

That is a large number of interrelated changes. I'm not sure how to divide-and-conquer.

We should completely defer IndexedDB and Firebase changes until it works in Redux state only.

Looking forward to hearing your thoughts!

Thanks for the details. I agree with all of these changes. I would like to add another change that seems necessary. We also need to generate context from a thoughtId. We need to show the context of thoughts when the context view is active.

One way would be to add parentId property to the Parent. Given a thoughtId we can get the Parent and traverse the way up to the root based on parentId to generate context. Association of thought with its immediate parent thought id would not introduce any necessity of recursive updates as well. We would need to update it only on move operation. But still, we would have complexity to first pull all necessary parent data to successfully traverse up.

@raineorshine
Copy link
Contributor Author

Thanks for the details. I agree with all of these changes. I would like to add another change that seems necessary. We also need to generate context from a thoughtId. We need to show the context of thoughts when the context view is active.

The context view shows all the contexts that the corresponding Lexeme appears in. We just take Child.value (or Parent.value since they will be the same) and hash it to look up the Lexeme from the thoughtIndex (as we do currently). Lexeme.contexts will contain a list of Parent ids that the Lexeme appears in, rather than a list of ThoughtContexts.

One way would be to add parentId property to the Parent. Given a thoughtId we can get the Parent and traverse the way up to the root based on parentId to generate context.

Thoughts can have multiple parents.

The visible parent of a thought is a product of the navigation path. If you get to a thought through a different path, it will have a different parent. This is why we build up the path prop as we move down the tree of Subthoughts. The view is a tree, but the underlying data structure is a graph.

Association of thought with its immediate parent thought id would not introduce any necessity of recursive updates as well. We would need to update it only on move operation. But still, we would have complexity to first pull all necessary parent data to successfully traverse up.

Correct, we need to update Parent.children and Lexeme.contexts on move. But it will no longer be recursive since they will just contain ids rather than full Contexts.

@shresthabijay
Copy link
Contributor

Thoughts can have multiple parents.

The visible parent of a thought is a product of the navigation path. If you get to a thought through a different path, it will have a different parent. This is why we build up the path prop as we move down the tree of Subthoughts. The view is a tree, but the underlying data structure is a graph.

The way we can get a thought in different paths is by using context view right ?

@shresthabijay
Copy link
Contributor

@raineorshine In the current implementation, we also show where the lexemes appeared in different contexts in ContextBreadcrumb.

  • a (~ Active context View)
  • b/c/a
  • e/f/a

Since now Lexeme.contexts will contain a list of Parent ids that the Lexeme appears in, rather than a list of ThoughtContexts, so we cannot show ContextBreadcrumbs. We cannot easily generate full context from ParentId alone, and can only show Parent value. But I don't think it would be as meaningful and semantic to not know the context where the lexeme appears. I am a little confused here.

@raineorshine
Copy link
Contributor Author

Yeah, that's less than ideal. The Context View is less useful that way. But we can't save a list of Contexts or Paths without creating recursive edits.

@shresthabijay
Copy link
Contributor

Yeah, that's less than ideal. The Context View is less useful that way. But we can't save a list of Contexts or Paths without creating recursive edits.

Yes, we cannot prevent recursive edits and also save a list of contexts. We should probably think about how this will affect the context view and sense-making.

@raineorshine
Copy link
Contributor Author

raineorshine commented Jul 4, 2021

@shresthabijay After devising and scrapping several complicated solutions that never worked quite right, I now have evidence that there is a very simple solution... or evidence that my brain does not work properly after a certain hour 😅. Please ask questions and let me know if I am missing something obvious!

Proposed Solution

Thoughts = Doubly linked tree
Lexemes = Sets of thoughts

  • Every thought has zero or more child thoughts.
  • Every thought has a single parent thought.
  • Each thought's "context" can be determined by traversing parents up to the root in O(depth).
  • Lexemes are deterministic sets of thoughts with the membership function hashThought.

When a thought is edited...

  • its value changes
  • it moves from one Lexeme to another
    • Lexemes are hashed thoughts, so we can look up the old Lexeme and the new Lexeme from the old value and the new value respectively in O(1)

When a thought is moved...

  • it is removed from its old parent’s children
    • A thought has a reference to its parent, so the old parent can be looked up in O(1)
  • it is added to its new parent’s children
    • The destination thought will also have a reference to its parent, so the new parent can be looked up in O(1)
  • it is updated to point to its new parent
    • The thought’s parent is held in a single property, so it can be changed in O(1)

@shresthabijay
Copy link
Contributor

shresthabijay commented Jul 4, 2021

@shresthabijay After devising and scrapping several complicated solutions that never worked quite right, I now have evidence that there is a very simple solution... or evidence that my brain does not work properly after a certain hour 😅. Please ask questions and let me know if I am missing something obvious!

Proposed Solution

Thoughts = Doubly linked tree
Lexemes = Sets of thoughts

  • Every thought has zero or more child thoughts.
  • Every thought has a single parent thought.
  • Each thought's "context" can be determined by traversing parents up to the root in O(depth).
  • Lexemes are deterministic sets of thoughts with the membership function hashThought.

@raineorshine I mentioned a similar solution up in the discussion. However, I didn't mention it as a doubly-linked tree.

One way would be to add parentId property to the Parent. Given a thoughtId we can get the Parent and traverse the way up to the root based on parentId to generate context. Association of thought with its immediate parent thought id would not introduce any necessity of recursive updates as well. We would need to update it only on move operation. But still, we would have complexity to first pull all necessary parent data to successfully traverse up.

As I mentioned above, the problem with this solution is that we may not have all the thoughts already loaded to traverse up the doubly-linked tree on the active context view. We will have to recursively traverse and pull from the remote or local source if we encounter any thought whose parent thought has not been loaded yet. Apart from that, I think this solution is fit for independent editing.

@raineorshine
Copy link
Contributor Author

@raineorshine I mentioned a similar solution up in the discussion. However, I didn't mention it as a doubly-linked tree.

Sorry, I was a little slow! I don't think I understood yet that Lexemes would handle multiple parents separately.

As I mentioned above, the problem with this solution is that we may not have all the thoughts already loaded to traverse up the doubly-linked tree on the active context view. We will have to recursively traverse and pull from the remote or local source if we encounter any thought whose parent thought has not been loaded yet.

I think that is an acceptable cost. A thought that appears in n contexts will take O(n * depth) to pull the complete contexts. That seems like a lot, but it is amenable to optimization: they can be preloaded and/or loaded iteratively. And what we gain is O(1) edits and moves, which is a massive improvement.

Apart from that, I think this solution is fit for independent editing.

I am so relieved. It's a huge change, but it really seems like the right solution.

We need to come up with a smart migration strategy to avoid trying to convert the app all at once, which would be a nearly impossible task.

  1. Most of the reducers use the basic building blocks of newThought, editThought, moveThought, and updateThoughts, to change the state of thoughts. If we can fully decouple the higher level commands from the low-level thought structure, we should be able to leave most of the reducers intact. That's sort of been the idea the whole time. Any places where we cut corners and access state.thoughts.contextIndex or state.thoughts.thoughtIndex directly violate this separation of concerns and need to be updated.
  2. Once the low level structure is isolated, we can change the structure of contextIndex and thoughtIndex and only have to update the low level functionality that touches them.
  3. Then we can iteratively replace Context-based functions (with the added layer of traversals to find ids) with ID-based functions that can access thoughts directly.
  4. We'll need to tighten up our use of ids on thoughts. They are still defined as optional, and I know there are some places where they don't get persisted properly after certain transformations. The new linked-list structure will depend on every thought having a stable id.

I haven't thought yet about how this affects all the pull logic. Presumably we'll want to get it working with Redux state first, but we should at least think ahead to have a rough idea of what we're getting into.

Thoughts?

@shresthabijay
Copy link
Contributor

I think that is an acceptable cost. A thought that appears in n contexts will take O(n * depth) to pull the complete contexts. That seems like a lot, but it is amenable to optimization: they can be preloaded and/or loaded iteratively. And what we gain is O(1) edits and moves, which is a massive improvement.

I completely agree.

@shresthabijay
Copy link
Contributor

shresthabijay commented Jul 5, 2021

I haven't thought yet about how this affects all the pull logic. Presumably we'll want to get it working with Redux state first, but we should at least think ahead to have a rough idea of what we're getting into.

I think in brief we will have to do the following things in the pull:

  • We will have to use thoughtIds instead of contextMap to pull the thoughts.
  • Then we have to pull lexemes for the pulled thoughts like before. But this time lexemes have thoughtIds instead of values, so we will have to pull those thoughts that haven't been pulled yet.

I am not sure where we will pull the context for the thoughtIds in the lexeme.

@shresthabijay
Copy link
Contributor

Any places where we cut corners and access state.thoughts.contextIndex or state.thoughts.thoughtIndex directly violate this separation of concerns and need to be updated.

I am still lacking a clear picture of the whole migration. Should we start by identifying such places ?

@shresthabijay
Copy link
Contributor

  • SearchSubthoughts (logic should probably be pulled out into a selector)

I too think we should abstract away any component logic that breaks the separation of concern into a separate selector or hook.

@raineorshine
Copy link
Contributor Author

raineorshine commented Jul 10, 2021

@raineorshine Apart from the mentioned functions, these are the remaining places where thoughts from the state are being accessed.

Interesting! I guess I expected state.thoughts to be more widely accessed than that. I guess we've done a pretty good job separating out low-level and high-level functions already. And thank god for reducerFlow.

  • TutorialNavigationNext: contextIndex is being accessed in the mapStateToProps but is not being used for anything else. I believe it's for rerendering the component. I don't know if we need this dependency. Please let me know if you do.

Yes, that needs to either be removed or replaced with a narrower slice. The Tutorial will need to be tested thoroughly with this change; we don't have any test coverage there. I would recommend ignoring this for now. It doesn't actually depend on any internal structure of the contextIndex.

  • pullQueue: Here it is being used for appendVisibleContexts to access the parent from the hashed context key. We can add a selector called getParentFromKey to abstract away the accessing of state.thoughts.contextIndex[key]. We could use this wherever the parent is being accessed directly using state.thoughts.contextIndex

Yes, that's a good intermediate abstraction. Ideally we abstract away the Parent object itself when possible. If the Parent structure changes, things will break. e.g. getAllChildren and isPending hide the structure of Parent. We won't be using hashContext once we key the contextIndex by Child id. Then there won't be an additional performance hit for calling getAllChildren, isPending, etc, that repeat their object access. We can also memoize selectors to optimize.

In summary, I would suggest using selectors like getAllChildren and isPending that do not expose the Parent object, and only use getParentFromKey if you have to. In appendVisibleContexts, you can use a combination of hasLexeme (formerly exists) and isPending.

  • getWhitelistedThoughts : Since it returns ThoughtIndices, It should remain low-level function.

Yes. Notably, it doesn't rely on the internal structure of contextIndex or thoughtIndex.

I don't think we can factor out the low-level functionalities from the above-mentioned low-level functions. I was thinking if we could remove it from pull but it too has to pass updates to updateThoughts and reconcile., so it is not possible.

Yeah, makes sense. As long as we are doing syncing and reconciliation ourselves, those functions will have to access to the low-level thought structure.

@raineorshine
Copy link
Contributor Author

raineorshine commented Jul 10, 2021

Decoupling state.thoughts was going to be our first step, but I guess it's not needed. So, what now? We need some way to break this task down into smaller pieces, a migration strategy, as we discussed. I'm willing to add some extra work to divide-and-conquer and avoid large-scale changes during which it won't compile. We will judge when such tradeoffs are worth it, and when we have to bite the bullet and tackle a large, atomic change.

Here were some of the changes mentioned earlier, but they are not presented in a sequential or comprehensive manner:

  • When creating a new thought, use a uuid as the new contextIndex key instead of hashing the Context.
  • Change Parent.context to Parent.value and add rank.
  • Change Child.id to point to the corresponding Parent in the contextIndex.
  • Change Lexeme.contexts to an array of Parent ids. We can rename the field now or at a later time if that's easier.
  • Remove ThoughtContext.
  • Rename hashContext to findContextId which will traverse the given Context starting at the root and return the id of the corresponding Parent. This provides a more gradual migration path to using only ids. Most callers of hashContext are just trying to look up the id of a Context, so this should be a drop-in replacement in most cases.

That is a large number of interrelated changes. I'm not sure how to divide-and-conquer.

Here's my attempt at a more linear and comprehensive migration strategy:

(Moved to issue description)

Open to feedback 😅

@shresthabijay
Copy link
Contributor

shresthabijay commented Jul 10, 2021

Here's my attempt at a more linear and comprehensive migration strategy:

It's a very good strategy. I would like to add something. After Child is converted to probably Parent.id we would have to change Path to be array of Parent id's.

Previously we would have path conflicts in such situation and we needed rank in the Path.

  • a
    • b
      • c (~)
        • b
        • b
        • b
  • k
    • b
      • c
  • m
    • b
      • c

As you can see have three exact same paths a.b.c.b

Not anymore after independent editing! We won't need rank in the Path because even with nested context view we will not have conflict in the path since every thought is now uniquely identified by an id. And contextViews key would be the hash of Path itself. Please let me know if I am wrong about this.

Also we would not need lastThoughtsFromContextChain and depend on SimplePath to get the thought we want to update. Just take the head of the Path and we have access to the exact Parent we are looking for. Exciting 😁

@raineorshine
Copy link
Contributor Author

That's right! Some things get so much easier.

Note that we may not want to change the type of Path itself, as it would force a refactoring of a large number of instances (244, at this time) all at once. Instead, we would probably want to introduce a different type, say ThoughtPath, that consisted of ids, then gradually port Path over to ThoughtPath. This technique is an important part of the overall migration strategy to avoid getting stuck in a large atomic change.

@shresthabijay
Copy link
Contributor

2. If we change Parent.id to a uuid to match Child.id, it will break hashContext

Do you mean we change contextIndex to have key represented by uuid instead of hashed context here? Because with current implementation just by adding arbitrary id to the Parent won't make any difference because id is not being using to find Parent anywhere.

@shresthabijay
Copy link
Contributor

4. Change Parent.id to an arbitrary uuid - This will change the thought structure and require changes in all low-level functions that are still enabled. Preserve their API to avoid any higher-level changes at this point. e.g. hashContext will traverse from the HOME context to the desired Parent and return its id instead of hashing the Context. (This will be slow. We can optimize it later by also accepting a Path, which will allow O(1) Parent lookup with head(path).id.)

I wanted to confirm this. This is how I understand this step. We are changing the key of contextIndex from being a hash of a context to the id of the Parent. So now hashContext needs to do tree traversal to find the path and return the thought id associated with it.

@raineorshine
Copy link
Contributor Author

  1. If we change Parent.id to a uuid to match Child.id, it will break hashContext

Do you mean we change contextIndex to have key represented by uuid instead of hashed context here? Because with current implementation just by adding arbitrary id to the Parent won't make any difference because id is not being using to find Parent anywhere.

In Step 2, I was suggesting that we change the Child.id to point to the contextIndex, however I see now that this isn't necessary. We will be removing Child objects completely in Step 8. I have crossed this step out.

  1. Change Parent.id to an arbitrary uuid - This will change the thought structure and require changes in all low-level functions that are still enabled. Preserve their API to avoid any higher-level changes at this point. e.g. hashContext will traverse from the HOME context to the desired Parent and return its id instead of hashing the Context. (This will be slow. We can optimize it later by also accepting a Path, which will allow O(1) Parent lookup with head(path).id.)

I wanted to confirm this. This is how I understand this step. We are changing the key of contextIndex from being a hash of a context to the id of the Parent. So now hashContext needs to do tree traversal to find the path and return the thought id associated with it.

Yes, exactly right.

@shresthabijay
Copy link
Contributor

shresthabijay commented Jul 11, 2021

  1. Change Parent.id to an arbitrary uuid - This will change the thought structure and require changes in all low-level functions that are still enabled. Preserve their API to avoid any higher-level changes at this point. e.g. hashContext will traverse from the HOME context to the desired Parent and return its id instead of hashing the Context. (This will be slow. We can optimize it later by also accepting a Path, which will allow O(1) Parent lookup with head(path).id.)

I wanted to confirm this. This is how I understand this step. We are changing the key of contextIndex from being a hash of a context to the id of the Parent. So now hashContext needs to do tree traversal to find the path and return the thought id associated with it.

Yes, exactly right.

@raineorshine After we ditch step 2, then step 4 would not be possible. If Child.id is not unified with Parent.id traversing the tree just from contextIndex with help of id won't be possible.

Let's say we do both steps 2 and 4.

  • Step 2: Requires adding an id to every path by hashing thought context.
  • Step 4: Requires changing hashContext to return thought id from tree traversal. (Also have to handle cases with active context view because path with active context view won't have entry in contextIndex)

Then Somewhere down the line, we have the following changes:

  • Change Parent.children from Child[] to be something like this { thoughtId, rank }[].
  • Change Path to be an ordered array of thought id.

In these steps we have to replace every change of step 2 by making child id to be head(path) and changes made in step 4 will no longer be required too. Also, all the low-level reducers need to be changed here anyway.

I am thinking we should directly change make these changes instead of working on Step 2 and step 4. What do you think?

@shresthabijay
Copy link
Contributor

shresthabijay commented Jul 12, 2021

@raineorshine Iterative changing of structures is causing confusion on migration. Specially fixing logic. Since much low-level logic is highly dependent on these structures I think it would be easy to first change the Parent, Path and Lexeme types at once, then deal with low-level changes. There could be possible changes in API. Most low-level functions that take Context param may have to be changed by the thoughtId. This approach will make it easy to think about the logical changes that need to be made. After that, we can see the effect on the high-level functions and make changes there. This is a wild change from the previous approach though.

@shresthabijay
Copy link
Contributor

shresthabijay commented Jul 12, 2021

@raineorshine Iterative changing of structures is causing confusion on migration. Specially fixing logic. Since much low-level logic is highly dependent on these structures I think it would be easy to first change the Parent, Path and Lexeme types at once, then deal with low-level changes. There could be possible changes in API. Most low-level functions that take Context param may have to be changed by the thoughtId. This approach will make it easy to think about the logical changes that need to be made. After that, we can see the effect on the high-level functions and make changes there. This is a wild change from the previous approach though.

I will try to explain my reasoning. At some point we will have to Change Path from being Child[] to ThoughtId[] and we will have to use ThoughtId to get Parent from contextIndex. Even though low-level reducers encapsulate most of the direct access of state.thoughts but structure of Path and Context is known to most part of the codebase. Most logic depends on this fundamental structure. So once we change the structure all these areas will get affected at once.

But I am worried about all at once migration. So I will change the structures and only fix getChildren which is the most used low level function used by most business logic that knows about Path and Context.

@raineorshine
Copy link
Contributor Author

raineorshine commented Jul 12, 2021

@raineorshine After we ditch step 2, then step 4 would not be possible. If Child.id is not unified with Parent.id traversing the tree just from contextIndex with help of id won't be possible.

Ah, good point! I guess we can't delete Step 2. It won't have an effect at first, but it is necessary for the new hashContext traversal logic.

Let's say we do both steps 2 and 4.

  • Step 2: Requires adding an id to every path by hashing thought context.

Since type Path = Child[], then Paths will get ids in Step 4. Step 2 is only changing the Child.id to key into the contextIndex.

  • Step 4: Requires changing hashContext to return thought id from tree traversal.

Yes

  • (Also have to handle cases with active context view because path with active context view won't have entry in contextIndex)

I didn't include it in the migration strategy, but I think we should bracket Context View functionality initially until we get Child and Parent unified. Then we can add Context View functionality back in when the normal view is working.

  • Change Parent.children from Child[] to be something like this { thoughtId, rank }[].

I don't think this is necessary once we move rank to Parent? Our current model doesn't allow thoughts to have more than one Parent in the contextIndex. Multiple parents are done through the thoughtIndex.

  • Change Path to be an ordered array of thought id.

This is Step 8.

In these steps we have to replace every change of step 2 by making child id to be head(path) and changes made in step 4 will no longer be required too. Also, all the low-level reducers need to be changed here anyway.

This will depend on the { thoughtId, rank }[] question above.

Regarding changing Path to be an ordered array of thought id, I think it's quite a bit better to do it as a later optimization. Path is everywhere throughout the app. Since the lower-level functions are decoupled, we only need to change the lower-level functions to get a working app with the new thought structure. Changing the structure of Path affects every level of abstraction, and must be done all at once. A temporary parallel type of ThoughtPath allows an easier and less risky gradual migration of the Path type, as suggested in the comment above.

I am thinking we should directly change make these changes instead of working on Step 2 and step 4. What do you think?

My main goal with the iterative approach is to reduce the risk of making this large architectural change. There is the risk of hitting a dead-end and the risk of just getting stuck or slowing down too much when there many changes made at once. The combined approach reduces the total work while increasing risk and decreasing transparency. More changes that have to be done in single batches means less opportunity to define milestones and evaluate progress, and thus less transparency. Adding more work in order to decrease risk and gain transparency feels like a really good tradeoff to me with this kind of thing. I'm not convinced that the total amount of work is the most important thing to optimize for.

@raineorshine
Copy link
Contributor Author

raineorshine commented Jul 12, 2021

@raineorshine Iterative changing of structures is causing confusion on migration. Specially fixing logic. Since much low-level logic is highly dependent on these structures I think it would be easy to first change the Parent, Path and Lexeme types at once, then deal with low-level changes.

I agree we need to make sure there's no confusion in our approach.

  • Parent and Lexeme are low-level types
    • Parent has 73 references
    • Lexeme has 85 references
  • Path is a high-level type
    • Path has 244 references

Using this above approximation of coupling patterns, we can see that Path is a more widespread dependency than Parent or Lexeme combined. I don't see how changing Path would be easier; it has way more surface area than the others.

Abstraction barriers allow us to insulate changes at a low-level from a higher-level. To me, it makes sense to first change the smaller number of low-level functions without affecting the large number of higher-level functions.

Most low-level functions that take Context param may have to be changed by the thoughtId.

I don't think we want to change the API for low-level functions all at once. We can wrap the new thought structure so that consumers (higher-level functions) can call the low-level functions as-is. Then we can introduce temporary parallel types to migrate higher-level functions gradually.

This approach will make it easy to think about the logical changes that need to be made. After that, we can see the effect on the high-level functions and make changes there.

I prefer this approach for small-to-medium sized changes and I understand the appeal. Having Typescript makes it very easy to take a "type-driven" approach. Change a type and watch it ripple throughout the system. Fix each type error and you'll have a working app again.

I don't think this approach works as well for large changes with nontrivial changes at 100's of call sites. The problem is that the app won't compile while you're making all these changes to fix the type. I think 4-5 hours of work is about the limit for getting from A to B when it doesn't compile in between. Your feedback loop goes down, and you're working for long stretches with a mental model of the application. A long list of type errors may give you obvious next steps, but they can easily hide problems that may go unseen until the end. You can refactor 90% only to realize there is something that doesn't work in the last 10%.

I think a "stepping stone" approach with multiple stopping points is a better approach for large-scale architectural changes. Temporary parallel types are another technique that allows for gradual change rather than all-at-once change.

Typescript differentiated itself from other strongly typed systems by adding any which allowed gradual transition of Javascript codebases. This decision was a big reason for its success. If you've converted any JS codebases to TS you know how nice it is to convert any gradually over time rather than having to do it at once.

I will try to explain my reasoning. At some point we will have to Change Path from being Child[] to ThoughtId[] and we will have to use ThoughtId to get Parent from contextIndex. Even though low-level reducers encapsulate most of the direct access of state.thoughts but structure of Path and Context is known to most part of the codebase. Most logic depends on this fundamental structure. So once we change the structure all these areas will get affected at once.

We have the same goal in mind. Since the structure of Path and Context is used by most parts of the codebase, we want to avoid changing its API all at once. We want to wean off the dependencies before changing the core types.

But I am worried about all at once migration. So I will change the structures and only fix getChildren which is the most used low level function used by most business logic that knows about Path and Context.

Do you mean change the internal implementation of getChildren, or change the API? And at what step in the migration process? If you can describe or show me the kind of change you have in mind here, I can better consider how it fits into the migration.

@shresthabijay
Copy link
Contributor

I prefer this approach for small-to-medium sized changes and I understand the appeal. Having Typescript makes it very easy to take a "type-driven" approach. Change a type and watch it ripple throughout the system. Fix each type error and you'll have a working app again.

I don't think this approach works as well for large changes with nontrivial changes at 100's of call sites. The problem is that the app won't compile while you're making all these changes to fix the type. I think 4-5 hours of work is about the limit for getting from A to B when it doesn't compile in between. Your feedback loop goes down, and you're working for long stretches with a mental model of the application. A long list of type errors may give you obvious next steps, but they can easily hide problems that may go unseen until the end. You can refactor 90% only to realize there is something that doesn't work in the last 10%.

I agree. I can see the problem with this approach now. It would be tedious to do and everything can go to waste if it doesn't work in the end. Thanks!

@shresthabijay
Copy link
Contributor

Do you mean change the internal implementation of getChildren, or change the API? And at what step in the migration process? If you can describe or show me the kind of change you have in mind here, I can better consider how it fits into the migration.

I tried this just with getChildren and it has cascading effects all over the codebase. So it doesn't work with gradual migration at all.

@raineorshine
Copy link
Contributor Author

Cool, good to know. We're getting closer!

@shresthabijay
Copy link
Contributor

shresthabijay commented Jul 18, 2021

4. hashContext will traverse from the HOME context to the desired Parent and return its id instead of hashing the Context. (This will be slow. We can optimize it later by also accepting a Path, which will allow O(1) Parent lookup with head(path).id.)

For this, we may need to change the API of hashContext. We need to pass state to hashContext. It will be a selector.

Also in many places hashContext is used to generate key for contextIndex. If the Parent for the given context is yet not available, we may need to return a new uuid.

@shresthabijay
Copy link
Contributor

I think preserving the Parent.id and making sure that Child.id is in sync with it also falls in the step. For example, after editing a thought, the hashContext of the new context may return new uuid as the Parent for new context is not available. We will need to copy the id of the old parent and preserve the id.

@raineorshine
Copy link
Contributor Author

  1. hashContext will traverse from the HOME context to the desired Parent and return its id instead of hashing the Context. (This will be slow. We can optimize it later by also accepting a Path, which will allow O(1) Parent lookup with head(path).id.)

For this, we may need to change the API of hashContext. We need to pass state to hashContext. It will be a selector.

Yes, I guess that's unavoidable. At least it is a trivial change. It will require a little work if state is not in scope at the call site.

You can trivially reduce the number of hashContext instances when the corresponding Path is available, i.e. replace hashContext(pathToContext(path)) with headId(path). You may want to do that first, so that you have fewer instances of hashContext to refactor. I saw quite a few instances that use that case.

Also in many places hashContext is used to generate key for contextIndex. If the Parent for the given context is yet not available, we may need to return a new uuid.

The Parent must exist somewhere, even if it is not in contextIndex yet. We never want to return a new uuid unless we are actually creating a new thought.

We might have to add a backlink from Child to Parent now that thoughts can only have one parent. It won't be relevant for a Path that crosses context views, but there will be other cases where we need to access the parent of a child. I'm not sure how a big a change that will be though.

I think preserving the Parent.id and making sure that Child.id is in sync with it also falls in the step. For example, after editing a thought, the hashContext of the new context may return new uuid as the Parent for new context is not available.

hashContext should never return a new uuid. I would rather have hashContext fail hard and have to refactor the caller than introduce new uuids there. New uuids should only be generated in a few key places, such as createThought or initialState. We don't want to risk an out-of-sync id.

We will need to copy the id of the old parent and preserve the id.

Yup, this is ultimately what we want everywhere, preserving ids from existing Paths and Parents. We just make hashContext backwards-compatible in a large subset of cases where all the thoughts are in the contextIndex to ease the migration.

@shresthabijay
Copy link
Contributor

@raineorshine I think for every Child there should be a corresponding Parent entry now. In previous implementation this was not the case.

@raineorshine
Copy link
Contributor Author

I just realized that we will still need an asynchronous recursive mechanism for deleting all descendants when we delete a thought. It should have resume support in case the connection to the server gets interrupted.

@shresthabijay
Copy link
Contributor

I just realized that we will still need an asynchronous recursive mechanism for deleting all descendants when we delete a thought. It should have resume support in case the connection to the server gets interrupted.

I was thinking about this. We need to have a separate mechanism to handle this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Improve performance refactor Refactor without changing behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants