Reduce number of unwraps in servers crate #2707

Merged: 4 commits merged into mimblewimble:master from unwrap-servers on Mar 31, 2019

Conversation

@hashmap (Contributor) commented Mar 25, 2019

It doesn't include the stratum server, which has changed significantly in the 1.1 branch, nor adapters, which is big enough for a separate PR.
@hashmap hashmap added the task label Mar 25, 2019
@hashmap hashmap added this to the 1.0.3 milestone Mar 25, 2019
@JeremyRubin (Contributor) commented Mar 25, 2019

General concept ACK, good cleanup, looks correct. (edit: I mean utACK 18b86d7)

I think it might make sense to get rid of the option type entirely in Dandelion config -- can these ever be None now?

I don't love the unwrap_or_retry macro, but it works and I don't have a better suggestion. One addressable concern is that it's possible to enter an infinite loop (vs. just crashing). It may also be better to do an exponential backoff on the timeout up to 1 sec, as 1 sec is kinda slow. Then again, I don't think we're expecting these to fail (ever) given they used to be unwraps...
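For illustration, a minimal sketch (mine, not code from this PR; the helper name retry_with_backoff is hypothetical) of the exponential backoff being suggested: double the sleep after each failed attempt, capped at 1 second.

```rust
use std::{thread, time::Duration};

// Hypothetical helper, not from the PR: retry a fallible call, doubling the
// sleep between attempts up to a 1 second cap.
fn retry_with_backoff<T, E: std::fmt::Debug>(mut f: impl FnMut() -> Result<T, E>) -> T {
    let mut delay = Duration::from_millis(10);
    loop {
        match f() {
            Ok(v) => return v,
            Err(e) => {
                eprintln!("retrying after error: {:?}", e);
                thread::sleep(delay);
                delay = (delay * 2).min(Duration::from_secs(1));
            }
        }
    }
}
```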

nit: in the future, would be easier for me to review if you do one commit per module being refactored

@hashmap (Contributor, Author) commented Mar 25, 2019

@JeremyRubin thanks for the review!

I think it might make sense to get rid of the option type entirely in Dandelion config

My understanding is that it may be needed because we deserialize the TOML file into this structure and some of the fields may be missing. Perhaps we should make the fields private and treat it as an internal representation.

One addressable concern is that it's possible to enter an infinite loop (vs. just crashing).

That's what I was trying to prevent. This situation (e.g. chain.head() failing) shouldn't happen, but we experienced it when all LMDB readers were taken by a large number of peers (we fixed it, so it shouldn't happen until the next time). Exponential backoff is a good suggestion but may be overkill for this case.

in the future, would be easier for me to review if you do one commit per module being refactored

Fair point; I need to find a balance between the size of a change and compactness (in a topological sense). Some changes, like the introduction of Result, bubble up.

@JeremyRubin (Contributor) commented Mar 26, 2019

Hm, the deserialize argument is interesting. serde allows you to provide a default value should the field be missing. What do you think about using that?

Good point on the pausing if the failure is recoverable. I'm just worried about the case where failing is not recoverable and restarting is better, e.g., if we're running as a daemon and can restart (and make some progress) or if it's safest to just shut a malfunctioning node off completely. Given the tradeoffs in cryptocurrency, I'd rather shut my node down completely until manual intervention if it's not just a temporary resource limit induced race condition.

Re the nit, certainly not several PRs for this, just separate commits. I'm not sure if we adhere to 'every commit compiles', but I think it can be nice for reviewers to have the interface change separate from the refactorings, which can be scanned over more quickly. I always prefer more small commits, as that makes it easier to bisect if an issue comes up, and they can always be squashed on merge anyways. But this is off topic here -- we can chat about it out of band.

@@ -44,37 +44,19 @@ const DANDELION_STEM_PROBABILITY: usize = 90;
 #[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
 pub struct DandelionConfig {
 	/// Choose new Dandelion relay peer every n secs.
-	#[serde = "default_dandelion_relay_secs"]
-	pub relay_secs: Option<u64>,
+	#[serde(default = "default_dandelion_relay_secs")]
antiochp (Member):
This is going to cause all kinds of conflicts with the 1.1.0 branch.
It might be worth not touching the config here and just doing it on the 1.1.0 branch.
We fixed the "default" serde config like you do here, but we did not take the additional step of getting rid of the options (which I agree is 👍).

hashmap (Contributor, Author):

@antiochp ok, I've reverted this commit

antiochp (Member):

I suspect #[serde = "foo"] vs. #[serde(default = "foo")] has bitten us and continues to bite us all over our config parsing code.
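For context, a minimal sketch of the working form (assumptions mine: the 600 default and the use of the toml crate are illustrative, not grin's actual values). With #[serde(default = "...")], a field missing from the TOML gets the named function's value instead of failing deserialization; the #[serde = "..."] form evidently compiled but had no effect, per the comment above.

```rust
use serde::Deserialize;

// Hypothetical default for illustration; the real value lives in grin's
// config code.
fn default_dandelion_relay_secs() -> u64 {
    600
}

#[derive(Debug, Deserialize)]
struct DandelionConfig {
    // Correct form: serde calls the named function when the field is absent.
    #[serde(default = "default_dandelion_relay_secs")]
    relay_secs: u64,
}

fn main() {
    // Field missing from the TOML: the default function supplies the value.
    let cfg: DandelionConfig = toml::from_str("").unwrap();
    assert_eq!(cfg.relay_secs, 600);

    // Field present: the parsed value wins.
    let cfg: DandelionConfig = toml::from_str("relay_secs = 30").unwrap();
    assert_eq!(cfg.relay_secs, 30);
}
```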

	.get_previous_header(&txhashset_head)
	.map_err(|e| {
		error!(
			"chain error dirung getting a previous block header {}: {:?}",
antiochp (Member):
Unrelated - typo.

@@ -147,13 +161,13 @@ impl SyncRunner {
 	}

 	// if syncing is needed
-	let head = self.chain.head().unwrap();
+	let head = unwrap_or_retry!(self.chain.head());
antiochp (Member):
The syncer already loops and implicitly retries on the next pass, so it feels like we should just break out of this loop early if we fail any of these calls that could potentially fail?

I'm not convinced we need to handle these individual failures at this level of granularity.

antiochp (Member):
Maybe one option would be to just kill the sync thread on any kind of (retry-able) failure and kick off a new one to replace it?
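A rough sketch of that option (mine, under assumptions: run_sync and supervise_sync are hypothetical names and the respawn policy is illustrative, not grin code):

```rust
use std::thread;

// Stand-in for the sync loop body; it may panic or return on a failure.
fn run_sync() {
    // ... sync logic ...
}

// Supervisor: respawn the sync thread whenever it exits, so the node is
// never left running without one.
fn supervise_sync() -> thread::JoinHandle<()> {
    thread::spawn(|| loop {
        let handle = thread::spawn(run_sync);
        // join() returns Err if the sync thread panicked; log and respawn.
        if handle.join().is_err() {
            eprintln!("sync thread died, restarting");
        }
    })
}
```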

hashmap (Contributor, Author):
What do you mean by granularity? I think we should treat unwrap as a last resort when handling error cases, which is quite granular.

Unwrap kills a thread silently unless we expect it. Another option is to return a Result from the syncer and handle it at the upper level, but that seems too complicated to me.

antiochp (Member):
Ahh, I think I was misunderstanding what unwrap_or_retry!() was doing. I thought it was retrying the thing being wrapped in the macro; it's actually a continue on the outer loop.

Another option is to return a Result from the syncer and handle it at the upper level, but that seems too complicated to me.

I think this might be easier to read though: we spawn a simple loop that handles errors (by logging them and sleeping), and the loop calls sync_once() or something similar that returns a Result.
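A minimal sketch of that shape (mine; the ChainError stand-in, the one-second sleep, and the exact signature of sync_once are illustrative assumptions, not grin's actual code):

```rust
use std::{thread, time::Duration};

#[derive(Debug)]
struct ChainError(String); // stand-in for grin's chain error type

// One pass of the sync logic; every fallible call inside can use `?`.
fn sync_once() -> Result<(), ChainError> {
    // ... check head, compare difficulties, kick off header/body sync ...
    Ok(())
}

// The spawned outer loop owns the error handling: log, sleep, retry.
fn spawn_sync_loop() -> thread::JoinHandle<()> {
    thread::spawn(|| loop {
        if let Err(e) = sync_once() {
            eprintln!("sync error: {:?}, retrying", e);
            thread::sleep(Duration::from_secs(1));
        }
    })
}
```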

Contributor:

Ahh, I think I was misunderstanding what unwrap_or_retry!() was doing. I thought it was retrying the thing being wrapped in the macro; it's actually a continue on the outer loop.

Yeah, this wasn't obvious to me either. Either follow @antiochp's suggestion or add a comment on the continue statement in the macro noting that it's meant to continue the sync loop.
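For concreteness, a hypothetical reconstruction (based only on this discussion, not the PR's actual code) of such a macro with the suggested comment in place; it only works when expanded inside a loop, which is exactly the subtlety being flagged:

```rust
// Hypothetical sketch of unwrap_or_retry!, not the PR's exact code: on error,
// log, sleep, and continue the loop at the expansion site.
macro_rules! unwrap_or_retry {
    ($e:expr) => {
        match $e {
            Ok(v) => v,
            Err(e) => {
                eprintln!("sync error: {:?}, will retry", e);
                std::thread::sleep(std::time::Duration::from_secs(1));
                // NB: this continues the enclosing *sync loop*,
                // not the failed call itself.
                continue;
            }
        }
    };
}
```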

@antiochp (Member) left a comment:

I like the idea of cleaning up all the unwrap() calls.
Not entirely convinced we need granular retry logic in the syncer. Can we consider a retry on the entire syncer thread somehow?

@hashmap (Contributor, Author) commented Mar 27, 2019

@antiochp what would be the advantage of relaunching the sync_loop? I mean, it's easy to return a Result and replace continue with return Err(), but what would we gain from it? My main goal is to never end up with a node that has no sync thread; I'm fine with either tactical decision in this particular case.

@ignopeverell (Contributor):

@antiochp are you okay with the PR after @hashmap's response? It would also be easy, going forward, to lose information about the original error if we reduce the granularity.

@ignopeverell (Contributor):

Btw thanks for the review as well @JeremyRubin!

Make the server warping again. ❇️

@antiochp (Member) left a comment:

👍

@hashmap hashmap merged commit 325e328 into mimblewimble:master Mar 31, 2019
@hashmap hashmap deleted the unwrap-servers branch March 31, 2019 21:24
yeastplume added a commit that referenced this pull request Apr 1, 2019
* cleanup legacy "3 dot" check (#2625)

* Allow peers behind NAT to get up to preferred_max connections (#2543)

Allow peers behind NAT to get up to preferred_max connections.

If a peer has only outbound connections, it's most likely behind NAT and we should not stop it from getting more outbound connections.

* Reduce usage of unwrap in p2p crate (#2627)

Also change store crate a bit

* Simplify (and fix) output_pos cleanup during chain compaction (#2609)

* expose leaf pos iterator
use it for various things in txhashset when iterating over outputs

* fix

* cleanup

* rebuild output_pos index (and clear it out first) when compacting the chain

* fixup tests

* refactor to match on (output, proof) tuple

* add comments to compact() to explain what is going on.

* get rid of some boxing around the leaf_set iterator

* cleanup

* [docs] Add switch commitment documentation (#2526)

* remove references to no-longer existing switch commitment hash

(as switch commitments were removed in ca8447f
and moved into the blinding factor of the Pedersen Commitment)

* some rewording (points vs curves) and fix of small formatting issues

* Add switch commitment documentation

* [docs] Documents in grin repo translated into Korean (#2604)

* Start translating the M/W intro into Korean
* Translate into Korean
* Add Korean translation of intro
* Translate table_of_content.md into Korean.
* Finish translating table_of_content_KR.md into Korean, start translating State_KR.md
* Add state_KR.md & commit some translation in State_KR.md
* WIP state_KR.md translation
* Add build_KR.md && stratum_KR.md
* Finish translating stratum_KR.md & table_of_content_KR.md
* Rename intro.KR.md to intro_KR.md
* Add intro_KR.md file path to each language's intro.md
* Add Korean translation file path to stratum.md & table_of_contents.md
* Fix difference with grin/master

* Fix TxHashSet file filter for Windows. (#2641)

* Fix TxHashSet file filter for Windows.

* rustfmt

* Updating regexp

* Adding in test case

* Display the current download rate rather than the average when syncing the chain (#2633)

* When syncing the chain, calculate the displayed download speed using the current rate from the most recent iteration, rather than the average download speed from the entire syncing process.

* Replace the explicitly ignored variables in the pattern with an implicit ignore

* remove root = true from editorconfig (#2655)

* Add Medium post to intro (#2654)

Spoke to @yeastplume who agreed it makes sense to add the "Grin Transactions Explained, Step-by-Step" Medium post to intro.md

Open for suggestions on a better location.

* add a new configure item for log_max_files (#2601)

* add a new configure item for log_max_files

* rustfmt

* use a constant instead of multiple 32

* rustfmt

* Fix the build warning of deprecated trim_right_matches (#2662)

* [DOC] state.md, build.md and chain directory documents translated into Korean (#2649)

* Add md files for translation.

* Start translating fast-sync and code_structure; add build_KR.md, states_KR.md

* Add dandelion_KR.md && simulation_KR.md for Korean translation.

* Remove some unneeded md files for translation and rearrange the translation order.

* Add a period at the end of sentences & translate build.md into Korean

* Remove fast-sync_KR.md

* Finish build_KR.md translation

* Finish translating state_KR.md & add a phrase in state.md pointing to the other-language md files

* Translate blocks_and_headers.md && chain_sync.md into Korean

* Add '.' in chain_sync.md; translation finished in doc/chain dir.

* Fix some typos

* Api documentation fixes (#2646)

* Fix the API documentation for Chain Validate (v1/chain/validate).  It was documented as a POST, but it is actually a GET request, which can be seen in its handler ChainValidationHandler
* Update the API V1 route list response to include the headers and merkleproof routes.  Also clarify that for the chain/outputs route you must specify either byids or byheight to select outputs.

* refactor(ci): reorganize CI related code (#2658)

Break-down the CI related code into smaller more maintainable pieces.

* Specify grin or nanogrins in API docs where applicable (#2642)

* Set Content-Type in API client (#2680)

* Reduce number of unwraps in chain crate (#2679)

* fix: the restart of state sync doesn't work sometimes (#2687)

* let check_txhashset_needed return true on abnormal case (#2684)

* Reduce number of unwraps in api crate (#2681)

* Reduce number of unwraps in api crate

* Format use section

* Small QoL improvements for wallet developers (#2651)

* Small changes for wallet devs

* Move create_nonce into Keychain trait

* Replace match by map_err

* Add flag to Slate to skip fee check

* Fix secp dependency

* Remove check_fee flag in Slate

* Add Japanese edition of build.md (#2697)

* catch the panic to avoid the peer thread quitting early (#2686)

* catch the panic to avoid the peer thread quitting before taking the chance to ban
* move catch wrapper logic down into the util crate
* log the panic info
* keep txhashset.rs untouched
* remove a warning

* [DOC] dandelion.md, simulation.md ,fast-sync.md and pruning.md documents translate in Korean. (#2678)

* Show response code in API client error message (#2683)

It's hard to investigate what happens when an API client error is
printed out

* Add some better logging for get_outputs_by_id failure states (#2705)

* Switch commitment doc fixes (#2645)

Fix some typos and remove the use of parentheses in a
couple of places to make the reading flow a bit better.

* docs: update/add new README.md badges (#2708)

Replace existing badges with SVG counterparts and add a bunch of new ones.

* Update intro.md (#2702)

Add mention of censoring attack prevented by range proofs

* use sandbox folder for txhashset validation on state sync (#2685)

* use sandbox folder for txhashset validation on state sync

* rustfmt

* use temp directory as the sandbox instead of the actual db_root txhashset dir

* rustfmt

* move txhashset overwrite to the end of full validation

* fix travis-ci test

* rustfmt

* fix: hashset has 2 folders, including txhashset and header

* rustfmt

* (1) switch to rebuild_header_mmr instead of copying the sandbox header mmr; (2) lock txhashset when overwriting, opening and rebuilding

* minor improvement on sandbox_dir

* add Japanese edition of state.md (#2703)

* Attempt to fix broken TUI locale (#2713)

Can confirm that on the same machine the 1.0.2 TUI looks great while it is broken on the current master. Bumping the `cursive` version fixed it for me.
Fixes #2676

* clean the header folder in sandbox (#2716)

* Forgot to clean the header folder in the sandbox in #2685

* Reduce number of unwraps in servers crate (#2707)

It doesn't include the stratum server, which has changed significantly in the 1.1 branch, nor adapters, which is big enough for a separate PR.

* rustfmt

* change version to beta