Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Independent Editing #1349

Merged
merged 34 commits into from
Dec 6, 2021
Merged

Conversation

shresthabijay
Copy link
Contributor

@shresthabijay shresthabijay commented Jul 14, 2021

Fixes #495

Changes

Step 1 - 3

  • Made Child.id and ThoughtContext.id required.
  • Made Parent.id and Parent.value required.
  • Unified Parent.id, Child.id and ThoughtContext.id, where id is the hashed context instead of UUID. Mostly using hashContext to generate an id where Path and Lexeme are created or updated.
  • Disabled push, pull and context view.
  • Disabled id dependent tests and also tests related to pull, push, and context view.

Step 4

  • Changed Parent.id to UUID and persist it on thought updates.
  • Changed hashContext to traverse the thought tree and return parent UUID.
  • Fixed related tests.
  • Disabled data provider tests, dexie and firebase too.
  • Add relatedParent entry for every Child. If there is a child then it should have a parent entry too.

Notes:

  • Nested duplicate merge on moveThought is broken. All the affected tests have been skipped.

Step 5

  • Added backlink to the Parent making it similar to Doubly Linked List. Now it is possible to traverse in both directions of the thought tree.
  • Removed context from context from Parent and ThoughtContext.
  • All the context view-related tests are skipped.

Notes:

  • Most of the API that uses context instead of id has not been changed. For now, some utility functions like getContextsForThought have been added to generate context from the thought id so that these functions can work. These api's will change in later steps.

Step 6

  • Removed recursive descendant update logic in editThought and moveThought.
  • Removed contextView related logic in editThought and moveThought because it will possibly implemented in a different way.

Notes:

  • deleteThought still has recursive updates because we always have to remove all the descendant thoughts from the persistence layer. We possibly need to do the recursive deletion separately outside this reducer. Open to discussion.

Step 7 (Skipped)

  • Re-enable push and pull is skipped because it will be implemented later on.

Step 8 / Step 9

  • Change Path to an array of ids, Child to id, and ThoughtContext to id. Add rank and archived to Parent.

  • Lexeme.contexts now stores the actual thought id instead of its parent id. The reason is we want to access both the actual thought and its parent on 0(1). If we just store parent id, we can access the parent on 0(1) using the parent id in the Lexeme.contexts but to find the actual thought we need to iterate the children array. If we store the actual thought id, we can easily access the parent on O(1) too because thought has a direct backlink to its parent.

  • Add newRank to the moveThought since Path doesn't include rank anymore.

  • Removed use of context views logic where required.

  • Kept most of the API's of the functions intact. Some have been changed like removeContext which uses id instead of context.

  • Added getContextForThought which gives context from the given thought id.

  • Changed pathToContext to selector which takes a path and returns the context.

  • Changed most of the getChildren functions to return thoughts instead of just ids.

  • Disabled recently edited and URL update tests.

  • Fixed all remaining tests.

Changes after review of Step 8 / Step 9

  • Removed direct access to the thought structure in high level functions.
  • Added test helpers to abstract away use of hashContext in tests.
  • Native parts of the codebase have not been migrated fully yet. So native files have been excluded fromtsconfig temporarily.

Step 7 (Renable push and pull)

  • Set EM context pending status to false before importing intial settings.
  • getContext data helper now uses rankThoughtsFirstMatch logic to get the parent entry from the persistance layer.
  • Previously force pull for pending deletion started from the parent of the pending thought . Now the pending thought that is to be deleted and its descendants are pulled.
  • After the migration to id instead of context, we cannot delete a parent entry if it doesn't have any children. So while deleting a thought which is in an orphaned state due to the pending delete mechanism, we will run into situation we cannot get the direct parent because it is already deleted in the first part of the delete process. So temporarily I have introduced an orphaned param to the deleteThought payload that will ignore all the parent's children update logic. I have added proper comments for now. This may need to refactored later on.
  • Removed the flushMoves and flushEdits 🎉

Step 10.

  • Fix data provider tests.
  • Fix data integrity tests.

Remaining Steps

  • Change hashContext to getThoughtIdByContext`
  • Change Child to branded ThoughId
  • Migrate native files to the lastest changes too.

@shresthabijay
Copy link
Contributor Author

@raineorshine I have completed step 3 of migration. Please review the changes. Most of the changes are related to changing id to hashed context.

@shresthabijay
Copy link
Contributor Author

shresthabijay commented Jul 14, 2021

One thing I noticed is that Parent.id is sometimes a hashed context and sometimes it's a uuid. The data provider always uses contextIndex key as id on pushing. The uuid hash is never stored in the persistence layer. When pulling the data the Parent.id is always a hashed context. So when we say unifying Child.id with Parent.id it actually means unifying Child.id with a key of contextIndex at this point. Since we are gonna change the key to be a UUID in the next step, I don't think we need to make any changes to make it consistent.

@raineorshine raineorshine changed the title Independent editing Independent Editing Jul 14, 2021
@raineorshine
Copy link
Contributor

One thing I noticed is that Parent.id is sometimes a hashed context and sometimes it's a uuid. The data provider always uses contextIndex key as id on pushing. The uuid hash is never stored in the persistence layer. When pulling the data the Parent.id is always a hashed context. So when we say unifying Child.id with Parent.id it actually means unifying Child.id with a key of contextIndex at this point. Since we are gonna change the key to be a UUID in the next step, I don't think we need to make any changes to make it consistent.

Yes, odd, but I agree we will be unifying them in the next step anyway. When you re-enable push/pull, make sure you're getting back the expected id from the databases and not some special database id.

@shresthabijay
Copy link
Contributor Author

@raineorshine Please review the changes. Now, createId is only used in some places for Parent.id and Lexeme.id.

@raineorshine raineorshine force-pushed the independent-editing branch from 5ea8362 to ffe8fee Compare July 16, 2021 23:14
@shresthabijay shresthabijay force-pushed the independent-editing branch 2 times, most recently from 8984ee3 to 10d9224 Compare July 31, 2021 05:28
@raineorshine
Copy link
Contributor

@shresthabijay Hey Bijay! We discussed your status of this briefly in our meeting, but I find it helpful to also maintain a written record. Please post a written update when you get a chance. Here are two questions to frame your response:

  1. What step in the migration plan are you currently on? What % complete is it?
  2. Do you anticipate any steps changing, or any new steps we did not account for? (Please document past steps that changed if they haven't been documented already.)

@shresthabijay
Copy link
Contributor Author

shresthabijay commented Aug 29, 2021

  1. What step in the migration plan are you currently on? What % complete is it?

@raineorshine I am currently on step 8 and I have skipped step 7. I have updated the progress on the comment above. #1349 (comment). I am currently making tests work. Due to change inPath, most of the tests need to be fixed. Right now only 70% of tests are passing.

@shresthabijay
Copy link
Contributor Author

shresthabijay commented Aug 29, 2021

  • Lexeme.contexts now stores the actual thought id instead of its parent id. The reason is we want to access both the actual thought and its parent on 0(1). If we just store parent id, we can access the parent on 0(1) using the parent id in the Lexeme.contexts but to find the actual thought we need to iterate the children array. If we store the actual thought id, we can easily access the parent on O(1) too because thought has a direct backlink to its parent.

@raineorshine What are your thoughts on this ?

Temporaily exclude native files from ts
@shresthabijay
Copy link
Contributor Author

@raineorshine I have started to enable push and pull. After that, we need to do the optimizations. I will list some of them that I have on top of my head.

  • hashContext is being used to get thought id from the context. So we need to rename it to something like getThoughtIdByContext instead.
  • If path is available then use head directly instead of using hashContext to find the thought id.
  • Change API of high-level selectors to use thought id instead of context wherever possible. (Ex: getChildren selectors).
  • Reduce the use of simplifyPath where possible as we can access the thought directly from the id without need of simplified context.

@raineorshine
Copy link
Contributor

I would agree with the suggested optimizations. Once everything is functional, we should get the PR merged before optimizing.

There are a few more things that need to be done before we move to optimization:

  • push/pull (in progress)
  • recently edited
  • data provider, dexie, and firebase tests
  • URL update tests

Am I missing anything?

@shresthabijay
Copy link
Contributor Author

Am I missing anything?

That pretty much sums it. I am working on these.

@shresthabijay
Copy link
Contributor Author

I would agree with the suggested optimizations. Once everything is functional, we should get the PR merged before optimizing.

Sure.

@shresthabijay
Copy link
Contributor Author

@raineorshine I ran into some issues while re-enabling push and pull. I wanted to discuss it with you.

  • At loadLocalState before we import initial settings into __EM__, we need to make its pending status to false else contextIndex updates of __EM__ will be rejected by push.

  • We cannot directly access thought for a specific context from a DB. For example, we need to directly access __EM__/Settings to check if we need to import initial settings. We cannot do so since we now access it using id. We may need to implement a recursive finder similar to howhashContext works. But it would access thoughts directly from the DB instead of the state.

  • childIdsToThoughts can return null values. It is causing problems as I am re-enabling pull. So it needs to have a return type of Parent[] | null. This will make most of the getChildren selectors likely to return null values too.

If you are okay with these, I will make the changes.

@raineorshine
Copy link
Contributor

  • We cannot directly access thought for a specific context from a DB. For example, we need to directly access __EM__/Settings to check if we need to import initial settings. We cannot do so since we now access it using id. We may need to implement a recursive finder similar to howhashContext works. But it would access thoughts directly from the DB instead of the state.

Good in theory, but I don't think more than one trip to IndexedDB is acceptable performance on startup. getContext(db, [EM_TOKEN, 'Settings']) is only used to determine if the settings have been initialized already. I suggest we set a flag when the Settings have been initialized that we can look up in a single call.

  • childIdsToThoughts can return null values. It is causing problems as I am re-enabling pull. So it needs to have a return type of Parent[] | null. This will make most of the getChildren selectors likely to return null values too.

The getChildren selectors are designed to return noChildren, which is a stable empty array, when the Context is invalid or missing. Can we continue to use these semantics?

@shresthabijay
Copy link
Contributor Author

The getChildren selectors are designed to return noChildren, which is a stable empty array, when the Context is invalid or missing. Can we continue to use these semantics?

Sure I will do that.

@shresthabijay
Copy link
Contributor Author

Good in theory, but I don't think more than one trip to IndexedDB is acceptable performance on startup. getContext(db, [EM_TOKEN, 'Settings']) is only used to determine if the settings have been initialized already. I suggest we set a flag when the Settings have been initialized that we can look up in a single call.

That's much better thanks. I will implement it.

remove flush edits and moves

fix related tests
@shresthabijay
Copy link
Contributor Author

@raineorshine Are the changes intended to be pushed before the flush pending, moves, and edits mechanism can force pull the related data?

@raineorshine
Copy link
Contributor

@raineorshine Are the changes intended to be pushed before the flush pending, moves, and edits mechanism can force pull the related data?

Which changes exactly? When an edit is made?

With the new implementation, since edits and moves are O(1), I don't believe that we have to worry about pending thoughts. All of the Parents and Lexemes that need to be modified are already in memory.

That is, unless you are referring to the descendant merging logic? I think that needs to be rethought. There are too many things that can go wrong with non-atomic changes to multiple thoughts.

@shresthabijay
Copy link
Contributor Author

shresthabijay commented Oct 7, 2021

Which changes exactly? When an edit is made?

Yes from edit, move, delete etc. I noticed that when deleting the context with pending children there was some inconsistency for a while.

When a buffered thought is deleted.

  - a (Deleted)
    - b (Pending)
      - c 

After ['a'] is deleted, context and lexemes related to a is deleted from the state and ['a', 'b'] is kept in the state so that later it can be used for starting point for pending deletion. However, we are actually force pulling from ['a'] and deleting ['a'] as the starting point.

So we don't have ['a'] in the local state and the lexeme of a is also updated.

So flushDelete triggers pull from ['a']. At this point, the new changes from the queue has not been pushed bypushQueue. So getDescendantThoughts pulls the contextIndex of ['a'] and we get the data because it is available in the DB. But before pulling the thoughtIndexor lexeme, the changes are pushed. Now when the lexeme is pulled it is empty. This causes inconsistency in the state for a while. However since ['a'] and its descendants are deleted again, it doesn't seem to affect anything.

This will not be the problem after the latest commit. But I thought it was important to mention this issue. I wonder if similar issue is the reason for data integrity problems in the current implementation.

@raineorshine
Copy link
Contributor

Yes from edit, move, delete etc.

Deletes are recursive. Edits and moves are not recursive any more. They're O(1). Unless you're considering the handling of merging descendants, but you didn't specify that. I don't think you can group edits, moves, and deletes into one category any more.

I noticed that when deleting the context with pending children there was some inconsistency for a while.

Yes, it's an unfortunate aspect of the implementation in dev. For the user experience we have to delete the thought and its non-pending descendants synchronously, but then somehow we need to pull the descendants in to be deleted. The data is in an inconsistent state until the pending descendants are pulled and deleted. This is described in flushDeletes:

// In a 2-part delete, all of the descendants that are in the Redux store are deleted in Part I, and pending descendants are pulled and then deleted in Part II. (See flushDeletes)
// With the old style pull, Part II pulled the pending descendants as expected. With the new style pull that only pulls pending thoughts, pull will short circuit in Part II since there is no Parent to be marked as pending (it was deleted in Part I).
// We cannot simply include missing Parents in pull addition to pending Parents though. Some Parents are missing when a context is edited after it was added to the pullQueue but before the pullQueue was flushed. If pull includes missing Parents, we get data integrity issues when outdated local thoughts get pulled.
// Therefore, force the pull here to fetch all descendants to delete in Part II.
await dispatch(pull(pending, { force: true, maxDepth: Infinity }))

So flushDelete triggers pull from ['a']. At this point, the new changes from the queue has not been pushed bypushQueue. So getDescendantThoughts pulls the contextIndex of ['a'] and we get the data because it is available in the DB. But before pulling the thoughtIndexor lexeme, the changes are pushed. Now when the lexeme is pulled it is empty. This causes inconsistency in the state for a while. However since ['a'] and its descendants are deleted again, it doesn't seem to affect anything.

That makes sense. I may have missed that timing consideration. At the very least, it is inefficient, but it also creates additional opportunities for data inconsistencies.

This will not be the problem after the latest commit. But I thought it was important to mention this issue. I wonder if similar issue is the reason for data integrity problems in the current implementation.

That's good! Yes, very well could be.

So how does the latest commit solve this?

refactor expand thoughts to return index of paths instead of index of context
@raineorshine
Copy link
Contributor

I am getting an error on a fresh load: getDescendantThoughts: Cannot get parent for some ids. ['__ROOT__'] [].

At getDescendantThoughts.ts:64.

@raineorshine
Copy link
Contributor

Also getting expandThought: Base path head thought not found! whenever an empty thought is deleted.

At expandThoughts.ts:72

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Independent Editing
2 participants