algod: Hot/Cold Data Directories and Resource Paths #5614
algorandskiy merged 51 commits into algorand:master
Codecov Report
@@ Coverage Diff @@
## master #5614 +/- ##
==========================================
+ Coverage 55.21% 55.51% +0.29%
==========================================
Files 473 473
Lines 66156 66244 +88
==========================================
+ Hits 36528 36774 +246
+ Misses 27164 26977 -187
- Partials 2464 2493 +29
... and 10 files with indirect coverage changes
algorandskiy
left a comment
I do not feel strongly about the OpenLedger interface change (see the suggestion), but the configuration must be well documented. Maybe even include a couple of examples there and in the PR description.
|
The more I think about the individual resource paths I exposed, the more I think they should be changed somewhat. I won't make the change presently, but I'm soliciting opinions from folks: I mention this in the description, but if a specific resource path is defined (like TrackerDBFilePath), no attempt is made to isolate it into a GenesisDir. This choice was made because, when a user selects a specific path to a resource, I didn't want to modify it away from what the user expected/input. That, and the way pathing was wired into the ledger (and other components) didn't make this path customization very easy to support. However, not isolating files by GenesisID means that a config used twice with a different genesis will have resource collisions. I can't really think of any argument to defend that other than to say that these are some very fine-grained options, so the ability to hurt oneself makes sense. I think I can fix this by changing these individual resource paths to "individual resource datadirs":
In this way, any individually specified resource will be defined by a directory, and that directory will auto isolate by Genesis ID. An exception should be made for |
I am pro this change. Not separating genesis dirs does allow for collisions that might not be expected by the end-user not reading the source code. Are there downsides to this approach vs using filepaths that you see? Other than potential implementation complications? |
|
config v29 is in master, please rebase |
The current challenge is making sure that the ledger (and maybe other components too) is handed these paths in a way that doesn't change its behavior too fundamentally. For example, the Ledger doesn't take a genesisDir as an input, it takes a dbPathPrefix, which is a little different. So, the goal is to make sure these paths are formatted in a way that is transparent to the component. But yes, I was basically sold that this improvement is required for this feature, so I've been chipping at it between reviews. It is indeed a little tricky. @algorandskiy thank you for the callout re v29 and more documentation in the config. Documentation: I plan on giving it a full pass once the names and specifics stop changing. Rebasing on v29, will do 👍 |
b9e3469 to ba7e57e
d78b6d5 to 994398f
algorandskiy
left a comment
I'm still not sold on the at-use checks like the ones below:
// if the node's HotGenesisDir is not defined, set it to the root genesis dir
if node.genesisDirs.HotGenesisDir == "" {
	node.genesisDirs.HotGenesisDir = node.genesisDirs.RootGenesisDir
}
...
blockDBPrefix := filepath.Join(dbPrefixes.ResolvedGenesisDirs.RootGenesisDir, dbPrefixes.DBFilePrefix)
if dbPrefixes.ResolvedGenesisDirs.HotGenesisDir != "" {
	trackerDBPrefix = filepath.Join(dbPrefixes.ResolvedGenesisDirs.HotGenesisDir, dbPrefixes.DBFilePrefix)
}
versus having it resolved inside some cfg.EnsureDirs method. @winder / @cce / @bbroder-algo / @iansuvak opinions?
I agree that centralizing the resolution logic would be cleaner but don't feel super strongly about it since it's not being done in too many places. There doesn't seem to be a downside to centralizing it though? |
|
Do not merge until the beta release is out |
|
@AlgoAxel please rebase into master. |
Co-authored-by: Ian Suvak <ian.suvak@gmail.com>
What
This PR adds a collection of configuration variables for use by node runners to better organize their file resources when running algod.
How
Two types of configuration variables have been added:
1 - HotDataDir/ColdDataDir
If HotDataDir or ColdDataDir is set, it is passed to the downstream components of the node (only implemented for MakeFull at the moment). Whichever one is not set falls back to the typical DataDir. Either one, or both, may be specified.
Note, some resources like config.json or minor runtime files may continue to exist only in the originally specified -d dataDir.
2 - Specific Resource FilePaths and Dirs
A collection of resource paths are exposed now in the config to give relay operators specific control over the location of some artifacts. These resources are:
Documenting our Storage Paths, Before and After
Previously
dataDir
dataDir is a path provided by node runners which effectively seeds the operation of the node. It is expected to contain a config.json, and unless otherwise specified, a genesis.json file is also expected. dataDir is specified at runtime through either the -d flag, or the ALGORAND_DATA OS environment variable. The node's operation will create files in the dataDir like node.log, or file locks.
genesisDir
genesisDir is a path that is referenced by most of the tools built by this repo. Once a genesis file is loaded, the genesis.ID is used as a subdirectory in the dataDir. This is done so that one node can operate on potentially many networks. Subcomponents of the node use the genesisDir to hold network-specific data like wallets and ledger/block databases.
Now
All directories [Root, Hot, Cold, Tracker ...], with the exception of Log paths, are "Ensured and Resolved" to Genesis Directories as part of the server initialization.
cfg.EnsureAndResolveGenesisDirs(myRoot, myGenesisID) will resolve every given directory to an absolute path, will attach a genesis directory, and will attempt to ensure the path exists.
Once a ResolvedGenesisDirectories is made, it is handed down to the MakeFull/MakeFollower node, which consults it while starting up its components. A LedgerDirsAndPrefix structure is also used to attach the ledger prefix so that the ledger components can handle their pathing with the "ledger" prefix like they usually do.
Additionally, the cadaver dir in agreement will fall back to using cfg.ColdDataDir. Previously, this had no fallback and would simply default to a blank string. This intentionally does not change that behavior (nor does it append a genesis dir, as that wasn't happening before).
Testing
Manual Testing
The manual testing is out of date, but at the time I confirmed that specifying directories worked as expected.
Unit Tests
For Config:
ensureAbsPath
EnsureAndResolveGenesisDirs on the config object, for success and error.
These tests did not confirm that logging configuration is functional, because that happens at the Server level. These tests also don't confirm the catchpoint directory because that path is only created once a catchpoint file is created.
Use Guide
Here's a quick rundown of how to use this feature-set.
Default
Don't set anything new, and your node is unaffected. Bits still land in your -d specified root, in whatever pathing they've always had.
Simple Disk Tuning
If you have a spare NVMe or other fast disk lying around, you could set the HotDataDir to a path on it. By doing that, fast-access resources will get created there, and anything else continues to live in your -d specified root.
Or perhaps you have a slower, larger drive that you want us to use: you could set ColdDataDir to a path on it. By doing that, slower/infrequent resources will get created there, and anything else continues to live in your -d root.
Or, you could specify both of those, and now the node's resources are separated into Hot/Cold, and located on the paths you specified.
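For reference, a small Go sketch that prints what the relevant config.json fragment might look like. The field names HotDataDir and ColdDataDir come from this PR. The struct, the helper name configFragment, and the mount paths are hypothetical, and a real config.json contains many other settings.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// diskTuning is a hypothetical, trimmed-down view of the config that holds
// only the two settings discussed here. Unset fields are omitted.
type diskTuning struct {
	HotDataDir  string `json:"HotDataDir,omitempty"`
	ColdDataDir string `json:"ColdDataDir,omitempty"`
}

// configFragment renders the settings as an indented JSON object.
func configFragment(hot, cold string) (string, error) {
	out, err := json.MarshalIndent(diskTuning{HotDataDir: hot, ColdDataDir: cold}, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Example mount points only; substitute paths on your own disks.
	frag, err := configFragment("/mnt/nvme/algod", "/mnt/hdd/algod")
	if err != nil {
		panic(err)
	}
	fmt.Println(frag)
}
```

Passing an empty string for either argument drops that key from the output, which mirrors leaving the setting unset so it falls back to the -d root.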
Advanced Disk Tuning
If you would like to fully customize pathing, you can specify any one of the new resource dirs, like TrackerDBDir. The related resource will be hosted there now.
Mix and Match
You can choose to set as many of the individual resource paths as you like. Any unspecified paths will resolve to their most appropriate fallback. Slow resources will use ColdDataDir, and fast resources will use HotDataDir. If either Hot or Cold isn't set, it falls back to your -d specified directory.
A good first change to make to get the most disk benefit is to specify HotDataDir to point at a dedicated high-speed disk. This would allow the most critical persisted resources to spend less time on IO, potentially increasing your node's performance.
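The fallback order described in this guide can be sketched in Go as follows. This is only an illustration under stated assumptions: the paths type, the firstNonEmpty helper, and the method names are hypothetical. The genesis subdirectory is left out for brevity. TrackerDBDir is treated as a fast resource, and coldResourceDir stands in for any slow resource.

```go
package main

import "fmt"

// firstNonEmpty returns the first non-empty candidate, or "" if all are empty.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if c != "" {
			return c
		}
	}
	return ""
}

// paths is a hypothetical, trimmed-down view of the settings discussed above.
type paths struct {
	Root         string // the -d data directory
	HotDataDir   string
	ColdDataDir  string
	TrackerDBDir string // an individually specified fast resource
}

// trackerDir prefers the specific setting, then HotDataDir, then the root.
func (p paths) trackerDir() string {
	return firstNonEmpty(p.TrackerDBDir, p.HotDataDir, p.Root)
}

// coldResourceDir stands in for any slow resource: ColdDataDir, then the root.
func (p paths) coldResourceDir() string {
	return firstNonEmpty(p.ColdDataDir, p.Root)
}

func main() {
	// Only HotDataDir is set, as in the "good first change" suggested above.
	p := paths{Root: "/var/lib/algorand", HotDataDir: "/mnt/nvme/algod"}
	fmt.Println("tracker DB:", p.trackerDir())
	fmt.Println("cold resources:", p.coldResourceDir())
}
```

With only HotDataDir set, the tracker database resolves to the fast disk while cold resources stay in the -d root.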