Add a way to pass secrets to processes in a controlled fashion#406
Add a way to pass secrets to processes in a controlled fashion#406kostko merged 7 commits intogenialis:masterfrom
Conversation
4003ec2 to
b226d82
Compare
Codecov Report
@@ Coverage Diff @@
## master #406 +/- ##
==========================================
- Coverage 80.09% 79.55% -0.55%
==========================================
Files 181 186 +5
Lines 9899 10065 +166
Branches 1059 1069 +10
==========================================
+ Hits 7929 8007 +78
- Misses 1789 1875 +86
- Partials 181 183 +2
Continue to review full report at Codecov.
|
| "type": "string", | ||
| "pattern": "^basic:secret:$" | ||
| }, | ||
| "value": { |
There was a problem hiding this comment.
Enforce the secret format, i.e. "pattern": "^[0-9a-fA-F]{24}$". Otherwise some users may try to save secret directly in to the field.
resolwe/flow/executors/prepare.py
Outdated
| file content that should be created. Keys are filenames, values | ||
| are the strings that should be written into those files. | ||
| """ | ||
| data = Data.objects.get(pk=data_id) |
There was a problem hiding this comment.
As we are getting just a single object this doesn't make any difference in performance or the number of queries as even prefetching requires an additional query. We could use select_related here instead of prefetch to combine everything into a single join.
resolwe/flow/executors/docker/run.py
Outdated
| mappings.append({ | ||
| 'src': os.path.join(SETTINGS['FLOW_EXECUTOR']['RUNTIME_DIR'], str(self.data_id), 'secrets'), | ||
| 'dest': '/secrets', | ||
| 'mode': 'ro,Z' |
There was a problem hiding this comment.
It is not ok to set SELinux mode on the systems that don't support it. We had such problems mounting data dir from Amazon EBS. Runtime dir will be often on shared storage, so you have to take care of that.
There was a problem hiding this comment.
Hm, but we are using this also above for mounting passwd/group?
There was a problem hiding this comment.
Yes, those files are normally on local disk so it is ok. The difference here is that runtime dir can be on shared storage.
There was a problem hiding this comment.
Ok, I will move this into configuration as we have for data/upload dirs.
There was a problem hiding this comment.
But this means that you have to support different mounting points in the container.
What if we add a setting to enable/disable SELinux config, e.g. FLOW_ENABLE_SELINUX_CONF, and the you can construct mount here?
cc @tjanez
There was a problem hiding this comment.
But then we must change this everywhere as this is very strange to have some modes in settings and others are derived from this?
There was a problem hiding this comment.
But this means that you have to support different mounting points in the container.
I don't think we should support this even with existing data/upload etc. endpoints. They should be fixed.
There was a problem hiding this comment.
I've pushed a version which moves this into settings (I don't like this though). We can move it back if this makes sense.
There was a problem hiding this comment.
But then we must change this everywhere
Yes.
Actually we support this for data/upload dir:
resolwe/resolwe/flow/executors/docker/run.py
Line 158 in ada2c06
There was a problem hiding this comment.
I know, but this (doing search and replace in arbitrary scripts) is a very ugly hack that should be removed.
| pk=data_id | ||
| ).process) | ||
| files[ExecutorFiles.DATA] = model_to_dict(data) | ||
| files[ExecutorFiles.PROCESS] = model_to_dict(data.process) |
resolwe/flow/managers/base.py
Outdated
|
|
||
| # extend the settings with whatever the executor wants | ||
| self.executor.extend_settings(data_id, files) | ||
| self.executor.extend_settings(data_id, files, extra) |
There was a problem hiding this comment.
I don't like name extra as it doesn't apply that it contains sensitive data. Maybe secrets or protected_files?
There was a problem hiding this comment.
On the other hand, the docstring for extend_settings specifies this plainly as a field for general raw files (as opposed to JSON serializations in the files parameter). Secrets are just one kind of opaque blob, so usage agrees with the docstring's specification.
There was a problem hiding this comment.
But we also prevent listing of such dirs and restrict masks of files. So it may have unwanted consequences if you put such file directly into executor dir.
I would prefer to narrow use of this, make it secure (it already is) and clearly state that.
resolwe/flow/models/data.py
Outdated
| contributor=self.contributor | ||
| ) | ||
| except Secret.DoesNotExist: | ||
| raise PermissionDenied("Access to secret not allowed") |
| return secret.value | ||
|
|
||
|
|
||
| class Secret(models.Model): |
There was a problem hiding this comment.
I think we will need an extra to describe where the secret comes from. E.g. if you get user's API token first time, you don't have to ask for it in following requests.
There was a problem hiding this comment.
We can add an arbitrary JSON metadata column for that?
| run: | ||
| program: | | ||
| # Echo secret to output. NEVER do this in an actual process! | ||
| re-save secret "$(cat /secrets/{{token.handle}})" |
There was a problem hiding this comment.
It would be useful to add Jinja filter for this, i.e. {{ token | get_secret }}
There was a problem hiding this comment.
For which part? The whole command is specific for bash. Or did you mean just the {{token.handle}}?
There was a problem hiding this comment.
I mean that you write {{ token | get_secret }} in the process and it is replaced with $(cat /secrets/{{token.handle}}) by Jinja filter.
That would make it much more user friendly.
resolwe/flow/models/secret.py
Outdated
| secret = self.create(value=value, contributor=contributor) | ||
| return str(secret.handle) | ||
|
|
||
| def get_secret(self, handle, contributor=None): |
There was a problem hiding this comment.
Is there a use-case where contributor is not defined? If not i would prefer to make it mandatory.
0e5addb to
991af17
Compare
|
Now also pushed removal of |
991af17 to
eec8bb6
Compare
eec8bb6 to
b863e76
Compare
resolwe/flow/executors/docker/run.py
Outdated
| mappings = SETTINGS.get('FLOW_DOCKER_MAPPINGS', {}) | ||
| for map_ in mappings: | ||
| script = script.replace(map_['src'], map_['dest']) | ||
| # XXX: This is a huge hack and should be removed. |
There was a problem hiding this comment.
Can we mount everything on the same path as it is outside?
There was a problem hiding this comment.
I don't think this is the correct way to go as it is much cleaner to have fixed volumes inside the container, regardless of what is outside. In this way you can always rely on specific paths when you run inside Docker.
It would be much better to always defer path translation to the executor (actually to the prepare part of the executor) as it is executor specific (e.g., the local executor does nothing, the Docker executor uses fixed volumes, etc.). Not just blindly using *_DIR from settings.
There was a problem hiding this comment.
This hack has been removed in the latest commit.
resolwe/flow/models/secret.py
Outdated
| :param value: Secret value to store | ||
| :param contributor: User owning the secret | ||
| :param metadata: Optional metadata dictionary (must be JSON serializable) | ||
| :param expires: Optional expiry time |
There was a problem hiding this comment.
Maybe state which units are used for expires?
There was a problem hiding this comment.
Also maybe state it defaults to None.
b863e76 to
2b4c789
Compare
|
@jberci please review my last commit, which fixes some issues with incorrect singleton use in Channels workers, which got exposed when removing hacky path translation. |
d37e16b to
c525cf4
Compare
|
LGTM so far. The global manager instance patching is still a bit hacky on the whole, but the Channel handler proxy will probably take care of that, since the manager will then be constructed once, as it was pre-channelization. |
95c31dc to
5aa051e
Compare
|
@jberci I've now pushed an updated version of the "Remove hacky path translation in Docker executor" commit, which separates the Channels consumer from the manager and uses the global manager instance instead. |
e818f13 to
3d74236
Compare
| class SecretManager(models.Manager): | ||
| """Manager for Secret objects.""" | ||
|
|
||
| def create_secret(self, value, contributor, metadata=None, expires=None): |
There was a problem hiding this comment.
Should this method always be used when creating secrets? If yes, then update docs and maybe also raise error when using the usual .create(. In this case it would also be nicer to return handle, secret to remove the need to query the secret by its handle if you intend to use the secret directly.
Alternatively if .create is allowed, then override it and call it in create_secret.
There was a problem hiding this comment.
Both ways are ok I guess, but prevent confusion.
| value = EncryptedTextField() | ||
|
|
||
| #: secret metadata (not encrypted) | ||
| metadata = JSONField(default=dict) |
There was a problem hiding this comment.
Since this is arbitrary data, should a naming convention be followed to prevent conflicts?
resolwe/flow/managers/base.py
Outdated
| )) | ||
|
|
||
| # Ensure we have the correct executor loaded. | ||
| self.executor = self.load_executor(executor) # pylint: disable=attribute-defined-outside-init |
There was a problem hiding this comment.
Is this still needed now when only the singleton manager is used?
There was a problem hiding this comment.
This has already been changed (discover_engines is now called instead), but yes we need re-discovery as the executor may change with each message.
| logger.exception("Unhandled exception in executor") | ||
|
|
||
| # Send error report. | ||
| self.update_data_status(process_error=[str(error)], status=DATA_META['STATUS_ERROR']) |
There was a problem hiding this comment.
A quick question: are process_* fields extended when updated or are they overwritten? @jberci?
resolwe/flow/executors/docker/run.py
Outdated
| for template in mappings_template] | ||
| # Setup Docker volumes. | ||
| def add_volume(volumes, kind, base_dir_name, volume, path=None, read_only=True): | ||
| """Add a new volume to the volumes list.""" |
There was a problem hiding this comment.
This is an internal helper, but ok, if you think that it should be added ;-)
There was a problem hiding this comment.
You can add just a quick description of arguments. There are just a lot of them, so it takes some time to understand them :)
resolwe/flow/executors/docker/run.py
Outdated
|
|
||
| mappings.append({'src': passwd_file.name, 'dest': '/etc/passwd', 'mode': 'ro,Z'}) | ||
| mappings.append({'src': group_file.name, 'dest': '/etc/group', 'mode': 'ro,Z'}) | ||
| volumes.append({'src': passwd_file.name, 'dest': '/etc/passwd', 'options': 'ro,Z'}) |
There was a problem hiding this comment.
It would be nice to make these options configurable.
There was a problem hiding this comment.
self.volumes? Or move self.volumes = volumes.copy() to the end of the function.
There was a problem hiding this comment.
self.volumes is no longer needed now that we got rid of hacky path translation, I forgot to remove it.
resolwe/flow/executors/docker/run.py
Outdated
| # NOTE: Since the tools are shared among all containers they must use the shared SELinux | ||
| # label (z option) | ||
| self.mappings_tools = [{'src': tool, 'dest': '/usr/local/bin/resolwe/{}'.format(i), 'mode': 'ro,z'} | ||
| self.mappings_tools = [{'src': tool, 'dest': '/usr/local/bin/resolwe/{}'.format(i), 'options': 'ro,z'} |
There was a problem hiding this comment.
This SELinux label will also be a problem on a shared storage.
There was a problem hiding this comment.
I left this as it was before? So this was already a problem before?
There was a problem hiding this comment.
Yes, it was already a problem before, I just pointed it out as the part for determining modes is being refactored.
There was a problem hiding this comment.
Good point, I'll add another extra option for tools and one for passwd/group.
|
|
||
|
|
||
| # update FLOW_DOCKER_MAPPINGS setting if necessary | ||
| def _get_updated_docker_mappings(previous_settings=getattr(settings, 'FLOW_EXECUTOR', {})): |
| inputs['proc'] = { | ||
| 'data_id': data.id, | ||
| 'data_dir': settings.FLOW_EXECUTOR['DATA_DIR'], | ||
| 'data_dir': self.manager.get_executor().resolve_data_path(data, '..'), |
There was a problem hiding this comment.
Maybe add dedicated function for this instead of passing .. as filename?
There was a problem hiding this comment.
@kostko make the filename argument optional and return root data dir if it is not defined.
EDIT: Actually both parameters should be optional in this case.
|
|
||
|
|
||
| class BaseManager(BaseConsumer): | ||
| class BaseManager(object): |
There was a problem hiding this comment.
This should go into a separate commit.
9f5a974 to
b770740
Compare
b770740 to
9cccf35
Compare
| logger.exception("Unhandled exception in executor") | ||
|
|
||
| # Send error report. | ||
| self.update_data_status(process_error=[str(error)], status=DATA_META['STATUS_ERROR']) |
resolwe/flow/executors/run.py
Outdated
|
|
||
| if finish_fields is not None: | ||
| self._send_manager_command(ExecutorProtocol.FINISH, extra_fields=finish_fields) | ||
| # the feedback key deletes itself once the list is drained |
There was a problem hiding this comment.
Hmm... I don't understand this comment.
I know it's not yours but since you touched it, can you make it more elaborate?
Also, make it a sentence.
There was a problem hiding this comment.
I found it confusing as well. It seems to be talking about the Redis list used for communication, which is deleted when all elements are popped? @jberci can you confirm this is the case?
There was a problem hiding this comment.
It is the case, the comment is about Redis deleting the list once all items have been popped. This behaviour was not obvious to me when I was doing this. If memory serves, the code was also structured so that one would expect explicit key deletion at the end, so I put that comment there. Feel free to get rid of it.
There was a problem hiding this comment.
Thanks for the explanation. I like @kostko's rewording and I think we can leave it in.
resolwe/flow/managers/base.py
Outdated
| with open(file_path, 'wt') as json_file: | ||
| json.dump(files[file_name], json_file, cls=SettingsJSONifier) | ||
|
|
||
| # Save the secrets in the runtime dir, with permissions to prevent listing the given directory. |
There was a problem hiding this comment.
Could you wrap this line at 100 characters?
resolwe/flow/managers/base.py
Outdated
|
|
||
| # Save the secrets in the runtime dir, with permissions to prevent listing the given directory. | ||
| secrets_dir = os.path.join(runtime_dir, ExecutorFiles.SECRETS_DIR) | ||
| os.makedirs(secrets_dir, mode=stat.S_IWUSR | stat.S_IXUSR) |
There was a problem hiding this comment.
To me, mode=0o300 is actually more readable than mode=stat.S_IWUSR | stat.S_IXUSR.
If you agree, change it.
resolwe/flow/managers/base.py
Outdated
| self.executor.extend_settings(data_id, files, secrets) | ||
|
|
||
| # save the settings into the various files in the data dir | ||
| # save the settings into the various files in the runtime dir |
There was a problem hiding this comment.
Please, convert this to a sentence since you touched it 🙂 .
There was a problem hiding this comment.
It should have been a sentence in the first place. I'll fix it.
| @@ -0,0 +1,24 @@ | |||
| """Manager Channels consumer.""" | |||
|
|
||
| :param data: Data object instance | ||
| :param filename: Filename to resolve | ||
| :return: Resolved filename, which can be used to access the given data |
There was a problem hiding this comment.
Please, wrap the docstring at 72 characters.
There was a problem hiding this comment.
Should we have linters for these things? I never look at where I wrap things :-)
There was a problem hiding this comment.
It's not yet implemented in pycodestyle, but the PR @ PyCQA/pycodestyle#674 seems on right track and that it will got approval from pycodestyle's maintainer.
| """Resolve upload path for use with the executor. | ||
|
|
||
| :param filename: Filename to resolve | ||
| :return: Resolved filename, which can be used to access the given uploaded |
There was a problem hiding this comment.
Please, wrap the docstring at 72 characters.
resolwe/flow/executors/prepare.py
Outdated
|
|
||
| :param data: Data object instance | ||
| :param filename: Filename to resolve | ||
| :return: Resolved filename, which can be used to access the given data |
There was a problem hiding this comment.
Please, wrap the docstring at 72 characters.
resolwe/flow/executors/prepare.py
Outdated
| """Resolve upload path for use with the executor. | ||
|
|
||
| :param filename: Filename to resolve | ||
| :return: Resolved filename, which can be used to access the given uploaded |
There was a problem hiding this comment.
Please, wrap the docstring at 72 characters.
cfa388e to
0b5b928
Compare
| volumes += SETTINGS.get('FLOW_DOCKER_EXTRA_VOLUMES', []) | ||
|
|
||
| # Create Docker --volume parameters from volumes. | ||
| command_args['volumes'] = ' '.join(['--volume="{src}":"{dest}":{options}'.format(**volume) |
There was a problem hiding this comment.
@kostko Will this work if options is empty? Or do you have to omit the colon?
There was a problem hiding this comment.
Yes, it works if the options are empty (no need to omit the colon).
0b5b928 to
3236bb1
Compare
3236bb1 to
5c35716
Compare
DEPLOYMENT CONSIDERATIONS:
SECRET_KEY, keep it secure and backed up as it should already be ;-)FLOW_DOCKER_MAPPINGShas been removed in favor of newFLOW_DOCKER_VOLUME_EXTRA_OPTIONSandFLOW_DOCKER_EXTRA_VOLUMES.