Fix #334, add some doc. on how to replace the Manager in case of failure #335

Open
wants to merge 2 commits into
base: master

giuseppe-carboni
Member

No description provided.

@@ -4,6 +4,15 @@
Production
**********

Unlike the Development environment, that uses Vagrant pre-configured virtual
Member

In development.rst we wrote "development environment"; to be consistent, we should follow the same convention here.

Replace the Manager in case of failure
--------------------------------------
In case the Manager machine suffers a failure of some sort, it has to be
replaced. In order to do this, the first thing to do is perform again the
Member

"is perform" or "is to perform"?

Member Author

I'm checking this with an English-speaking friend; I'll post the correct version ASAP.

Member

About the point below: it is not clear what "all station systems" refers to.

- Make sure that all the station systems and machines accept incoming
connections from the newly allocated Manager's IP address. Specifically, the
``TotalPower`` backend and the ``CalMux`` machines have to be tweaked in
order to allow them to be controlled by the new manager.
Member

Where is the procedure?

Member Author

This procedure involves logging in to those machines as root; if it has to be documented, this is not the place to do it. One suggestion: we could perform this step in advance by allowing a range of addresses to control those machines, so that in case of failure this step can be skipped.

Member

It is not clear to me how the Manager can be replicated without any information about this point. I think the procedure should be documented somewhere, and if this is not the place, we have to put a reference link to it here.
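
To make the "range of addresses" suggestion above concrete, here is a minimal sketch of the membership check it implies. The network `192.0.2.0/28` and the name `ALLOWED_MANAGER_NETWORKS` are purely illustrative assumptions, not the addresses or configuration actually used by the ``TotalPower`` backend or the ``CalMux`` machines.

```python
import ipaddress

# Hypothetical range reserved in advance for the current and any
# replacement Manager machine (documentation-only address block).
ALLOWED_MANAGER_NETWORKS = ["192.0.2.0/28"]


def is_allowed(address, networks=ALLOWED_MANAGER_NETWORKS):
    """Return True if `address` falls inside any of the allowed networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in networks)


# A replacement Manager inside the reserved range needs no reconfiguration
# on the controlled machines; one outside the range would.
print(is_allowed("192.0.2.5"))
print(is_allowed("192.0.2.200"))
```

If the controlled machines accepted a whole range like this one, a new Manager allocated inside it would be accepted without logging in to them as root at replacement time.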

``discos-console`` and ``discos-storage`` machines (in case the DISCOS
control software is running on a distributed environment). This will allow
other services such as the Lustre service on the ``discos-storage`` machine
to point again to the correct IP address.
Member

Is there a procedure to point to?

- Perform the ssh key exchange procedure between the ``discos`` user of the
Member

Does Mauro do all these things? :-D We need an example for him :-)

Member Author

This is not a procedure that a generic observer can perform. The SSH key exchange requires knowing the passwords of both the discos and the root users.

Member

I was joking; the point is that we have to write the documentation assuming that the reader is not a member of the DISCOS team...
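
As a starting point for the example requested in this thread, the sketch below only builds and prints the commands for a one-way SSH key exchange; it does not run them. It assumes the exchange goes from the ``discos`` user on the new Manager to the ``discos-console`` and ``discos-storage`` machines, and that an Ed25519 key at the default path is acceptable. Neither assumption is confirmed by this PR, and `key_exchange_commands` is a hypothetical helper name. The exchange for the root user, also mentioned above, is not covered.

```python
import os
import shlex


def key_exchange_commands(user, hosts, key_path=None):
    """Build (without running) the commands for a one-way SSH key exchange."""
    if key_path is None:
        # Assumed default key location; adjust to the site's conventions.
        key_path = os.path.expanduser("~/.ssh/id_ed25519")
    # Generate a key pair with an empty passphrase (-N "").
    commands = [["ssh-keygen", "-t", "ed25519", "-f", key_path, "-N", ""]]
    # Install the public key on each remote machine; ssh-copy-id will
    # prompt for the remote user's password when actually executed.
    for host in hosts:
        commands.append(["ssh-copy-id", "-i", key_path + ".pub", f"{user}@{host}"])
    return commands


for cmd in key_exchange_commands("discos", ["discos-console", "discos-storage"]):
    print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; whoever performs the procedure still needs the passwords mentioned in the earlier comment.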
