Fix #334, add some doc. on how to replace the Manager in case of failure #335
base: master
doc/production.rst
Outdated
@@ -4,6 +4,15 @@
Production
**********

Unlike the Development environment, that uses Vagrant pre-configured virtual
In development.rst we wrote "development environment" in lowercase; to be consistent, we should use the same convention here.
doc/production.rst
Outdated
Replace the Manager in case of failure
--------------------------------------
In case the Manager machine suffers a failure of some sort, it has to be
replaced. In order to do this, the first thing to do is perform again the
"is perform" or "is to perform"?
I'm checking this with an English-speaking friend; I'll post the correct version ASAP.
Regarding the point below, it is not clear what "all station systems" refers to.
- Make sure that all the station systems and machines accept incoming
connections from the newly allocated Manager's IP address. Specifically, the
``TotalPower`` backend and the ``CalMux`` machines have to be tweaked in
order to allow them to be controlled by the new manager.
Where is the procedure?
This procedure involves logging in to those machines as root. If it has to be documented, this is not the place to do it. One suggestion is to perform this step in advance by allowing a range of addresses to control those machines, so that, in case of failure, this step can be skipped.
It is not clear to me how the Manager can be replicated without any information about this point. I think the procedure should be documented somewhere, and if this is not the place, we have to put a reference link to it here.
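The address-range suggestion above can be illustrated with a minimal sketch. It only shows the idea of checking whether a replacement Manager's IP address already falls inside a pre-authorized range; it is not the actual access-control configuration of the ``TotalPower`` backend or the ``CalMux`` machines, which is not described in this thread. The network `192.168.200.0/28` and the function name `manager_is_allowed` are hypothetical, chosen purely for illustration.

```python
import ipaddress

# Hypothetical range reserved in advance for the current and any future
# Manager machine; the real addresses are not given in this thread.
ALLOWED_MANAGER_RANGE = ipaddress.ip_network("192.168.200.0/28")


def manager_is_allowed(address, allowed=ALLOWED_MANAGER_RANGE):
    """Return True if `address` lies inside the pre-authorized range."""
    return ipaddress.ip_address(address) in allowed


if __name__ == "__main__":
    # A replacement Manager inside the range needs no changes on the
    # controlled machines; one outside the range still would.
    for candidate in ("192.168.200.5", "192.168.201.5"):
        print(candidate, manager_is_allowed(candidate))
```

If the controlled machines accepted the whole range in advance, choosing a replacement address inside it would make the root login step unnecessary at failure time.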
``discos-console`` and ``discos-storage`` machines (in case the DISCOS
control software is running on a distributed environment). This will allow
other services such as the Lustre service on the ``discos-storage`` machine
to point again to the correct IP address.
Is there a procedure to point to?
control software is running on a distributed environment). This will allow
other services such as the Lustre service on the ``discos-storage`` machine
to point again to the correct IP address.
- Perform the ssh key exchange procedure between the ``discos`` user of the
Does Mauro do all these things? :-D We need an example for him :-)
This is not a procedure that a generic observer can do. Performing the ssh key exchange requires knowing the passwords of both the `discos` and the `root` users.
I was joking. The point is that we have to write the documentation assuming that the reader is not a member of the DISCOS team...
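As a starting point for the example requested above, here is a minimal sketch of what a generic OpenSSH key exchange usually looks like. It only builds and prints the commands, it does not run them. The key type, the key path, and the choice of `discos-console` and `discos-storage` as targets are assumptions made for illustration; the exact machines and users involved are the ones the final documentation specifies, and the `root` side of the exchange is not covered here. The helper name `key_exchange_commands` is hypothetical.

```python
import shlex
from pathlib import Path


def key_exchange_commands(remote_user, remote_hosts,
                          key_path="~/.ssh/id_ed25519"):
    """Build (but do not run) the commands for a typical key exchange.

    The first command generates a key pair without a passphrase; each
    following command copies the public key to one remote host.
    """
    key = Path(key_path).expanduser()
    commands = [["ssh-keygen", "-t", "ed25519", "-f", str(key), "-N", ""]]
    for host in remote_hosts:
        commands.append(
            ["ssh-copy-id", "-i", f"{key}.pub", f"{remote_user}@{host}"]
        )
    return commands


if __name__ == "__main__":
    # Assumed targets, for illustration only.
    hosts = ["discos-console", "discos-storage"]
    for cmd in key_exchange_commands("discos", hosts):
        print(shlex.join(cmd))
```

Each `ssh-copy-id` call prompts for the remote user's password, which is why the step cannot be carried out by someone who does not know those credentials.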