Re-implemented the Trie crow uses to match rules with URLs #166

Merged: The-EDev merged 4 commits into master from better_trie on Jul 19, 2021


The-EDev
Member

This is one of the parts I avoided documenting because of how complex it was. Now here I am, digging into the deepest parts of it and making a better version.

With the poetic crap out of the way, here's what the old version did and how I changed it.

OLD

  • Nodes are placed in a list; initially the list contains only 1 head node.
  • Nodes have 3 variables: rule_index, param_childrens, and children.
    • rule_index is an integer that indicates where a rule name ends.
    • param_childrens is an array of 5 integers, representing the 5 different parameter types (<int>, <path>, etc.). If a child of the node is a parameter, its index (from the nodes list) is placed in the corresponding place in the array.
    • children is a map<string, uint> which contains the letter (or letters) associated with a child, and the child's index in the nodes list.
  • Adding a node to the trie involves increasing the size of the nodes list and then adding the index of the last element to the children of the parent node.
  • Searching is done by going through the tree one children list at a time, then recursively looking for the rest of the string in the correct child's children (if found). The idea is that checking an entire level happens in 1 function call, preventing unnecessary calls and jumps.
  • Optimizing simply flattens the entire trie. This is done by getting the children of a node and merging their children into them, then recursively doing the same thing for the new merged children. Merging involves concatenating the strings of a node's child and grandchild, and having the new result point to the grandchild's index. This did not happen in cases where the grandchild was a param node, or if the child had a rule index. There were some anomalies where merging did not occur, but they were not investigated (primarily because the implementation was to be changed anyway).
  • Printing the tree had a problem where params would not be in the correct position according to their level. This is due to CROW_LOG_DEBUG adding its own line end, and the fact that the printing function was adding the level spaces and the param in 2 different calls.
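
To make the old layout easier to picture, here is a minimal sketch of the index-based structure described in the list above. It is a reconstruction from that description only, not the code that was in routing.h: OldNode, OldTrie and add_child are hypothetical names, and treating 0 as "no rule" / "no param child" is an assumption.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reconstruction of the OLD node: every link is an index into
// the trie's nodes list rather than a pointer.
struct OldNode {
    unsigned rule_index = 0;                    // assumption: 0 means no rule ends here
    std::array<unsigned, 5> param_childrens{};  // one slot per parameter type; 0 assumed to mean "none"
    std::unordered_map<std::string, unsigned> children;  // key fragment -> index in the nodes list
};

struct OldTrie {
    std::vector<OldNode> nodes;

    OldTrie() : nodes(1) {}  // the list starts with only the head node (index 0)

    // Grow the nodes list, then store the new last index in the parent's children.
    unsigned add_child(unsigned parent, const std::string& key) {
        assert(parent < nodes.size());
        nodes.emplace_back();
        unsigned index = static_cast<unsigned>(nodes.size() - 1);
        nodes[parent].children[key] = index;  // index the vector again after it may have reallocated
        return index;
    }
};
```

Since the head node can never be anyone's child, index 0 is free to act as the "empty" marker in this sketch. Every lookup has to go through the trie's list, which is the coupling the new design removes.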

NEW

  • The Trie always has direct access to the head node only, with each node containing its own children.
  • Nodes have 4 variables: rule_index, key, param, and children.
    • rule_index remains unchanged.
    • key was previously the string in children, and is now part of the node itself.
    • param replaces param_childrens, being a single variable telling the parameter type of the node itself.
    • children is a list of node pointers, replacing the map.
  • Adding a node now involves adding a pointer to a new node to the children of its parent (a parent has to be given).
  • Searching was only changed as far as needed to be compatible with the variable changes, which simplified certain parts of the function, but the functionality was mostly left unchanged.
  • Optimization was almost completely rewritten. The optimize function now checks only the node itself and its child, and only merges if the node has 1 child and if the node itself has neither a rule index nor children with a param (because param overrides any key for a given node). This solves both the flattening issue and the no-merging anomalies.
  • Printing was cleaned up to prevent the spacing issue from happening, and some commented code was removed.
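
The new layout and merge rule can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the ParamType names, add_child, can_merge, optimize and find are hypothetical, 0 is assumed to mean "no rule", std::unique_ptr is used only to keep ownership simple (the description just says node pointers), and find handles literal keys only, ignoring parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class ParamType { Int, Uint, Double, String, Path, None };  // hypothetical names

struct Node {
    unsigned rule_index = 0;            // assumption: 0 means no rule ends here
    std::string key;                    // the fragment that used to live in the parent's map
    ParamType param = ParamType::None;  // parameter type of this node itself
    std::vector<std::unique_ptr<Node>> children;

    // Merge only when there is exactly 1 child, this node ends no rule,
    // and neither this node nor the child is a parameter.
    bool can_merge() const {
        return rule_index == 0 && param == ParamType::None &&
               children.size() == 1 && children[0]->param == ParamType::None;
    }
};

// Adding a node: the parent owns a pointer to the new child.
Node* add_child(Node& parent, std::string key, unsigned rule_index = 0) {
    assert(!key.empty());  // this sketch only adds literal key fragments
    parent.children.push_back(std::make_unique<Node>());
    Node* child = parent.children.back().get();
    child->key = std::move(key);
    child->rule_index = rule_index;
    return child;
}

// Absorb single non-param children into their parent, then recurse.
void optimize(Node& node) {
    while (node.can_merge()) {
        std::unique_ptr<Node> child = std::move(node.children[0]);
        node.key += child->key;
        node.rule_index = child->rule_index;
        node.children = std::move(child->children);
    }
    for (auto& c : node.children) optimize(*c);
}

// Literal-only lookup: returns the rule index, or 0 if nothing matches.
unsigned find(const Node& node, const std::string& url, std::size_t pos = 0) {
    if (url.compare(pos, node.key.size(), node.key) != 0) return 0;
    pos += node.key.size();
    if (pos == url.size()) return node.rule_index;
    for (const auto& c : node.children)
        if (unsigned r = find(*c, url, pos)) return r;
    return 0;
}
```

With a chain "/" -> "h" -> {"i", "o"} under the head, optimize collapses the chain into a single node with key "/h" and stops at the branch, so lookups for "/hi" and "/ho" resolve to the same rules before and after.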

The architectural changes were primarily giving the node more autonomy over data related to it, and fixing the optimization.

Performance-wise, 2 different tests consistently showed the following:

  • The old optimization function increased the memory consumption of the trie by about 10%.
  • The new implementation takes about 35% as much memory for the same tree as the old one (pre-optimization).
  • Post-optimization, the new implementation consumes 10% as much memory as the old implementation did, showing that the optimization (at least when it comes to memory) actually works.
  • Speed testing (for searching) shows that the new implementation is a lot more consistent, with much less standard deviation from the average time than the old version. It is also consistently faster, taking on average 50% as much time to find a string. To put things into perspective: old was 20µs with 40µs jumps and new is 10µs, so while it is a large gain in percentage terms, it is very small in actual time.

@The-EDev The-EDev requested a review from mrozigor July 10, 2021 20:44
@The-EDev The-EDev merged commit b18fbb1 into master Jul 19, 2021
@The-EDev The-EDev deleted the better_trie branch August 14, 2021 01:18