Commit ea8d5ef

Merge pull request #297 from grycap/dev-esparig
Updated interLink integration documentation
2 parents 8c944dc + 482faea commit ea8d5ef

1 file changed

+19
-14
lines changed

docs/integration-interlink.md

+19-14
Original file line number / Diff line number / Diff line change
@@ -1,35 +1,40 @@
11
# Integration with interLink
22

33

4-
[interLink](https://intertwin-eu.github.io/interLink/) is an open-source development that aims to provide an abstraction for executing a Kubernetes pod on any remote resource capable of managing a Container execution lifecycle.
4+
The project [interLink](https://intertwin-eu.github.io/interLink/) is an open-source development that aims to provide an abstraction for executing a Kubernetes pod on any remote resource capable of managing a Container execution lifecycle.
55

6-
OSCAR uses the Kubernetes Virtual Node to translate a job request from the Kubernetes pod into a remote call. We have been using Interlink to interact with an HPC cluster. For more information check the [interLink landing page](https://intertwin-eu.github.io/interLink).
6+
OSCAR uses the Kubernetes Virtual Node to translate a job request from the Kubernetes pod into a remote call. We have been using interLink to interact with an HPC cluster. For more information, check this [video demo](https://youtu.be/NoVCfSxwtX0?si=emLcwTiUR897jFOg).
77

88
![Diagram](images/interlink.png)
99

10-
## Installation and use of Interlink Node in OSCAR cluster
10+
## Installation and use of interLink Node in OSCAR cluster
1111

12-
The Kubernetes cluster must have at least one virtual kubelet node. Those nodes will be tagged as `type=virtual-kubelet`. So, follow these steps to [add the Virtual node](https://intertwin-eu.github.io/interLink/docs/tutorial-admins/deploy-interlink) to the Kubernetes cluster. OSCAR detects these nodes by itself.
12+
The Kubernetes cluster must have at least one virtual kubelet node. Those nodes will be tagged as `type=virtual-kubelet`. Follow the documentation on the [interLink homepage](https://intertwin-eu.github.io/interLink/) to deploy an interLink virtual node to the Kubernetes cluster. OSCAR detects these nodes automatically.
1313

14-
Once the Virtual node and OSCAR are installed correctly, you use this node by adding the name of the virtual node in the `InterLinkNodeName` variable.
15-
Otherwise, to use a normal node of the Kubernetes cluster, leave it blank `""`
14+
Once the virtual node and OSCAR are installed correctly, you can use this node to offload your jobs to the configured remote host. To offload the jobs created by a service to an interLink node, set the name of the virtual node in the `interlink_node_name` variable of the [service FDL](https://docs.oscar.grycap.net/fdl/#service).
15+
16+
Otherwise, if the variable is not set (i.e., `""`), the job will be executed on a regular Kubernetes node.
1617

1718

1819
## Annotations, restrictions, and other things to keep in mind
1920

20-
- The [OSCAR services annotations](https://docs.oscar.grycap.net/fdl/#service) persist in the virtual node and affect the behavior of the offloaded jobs.
21+
- The [OSCAR services annotations](https://docs.oscar.grycap.net/fdl/#service) are applied to every job of that service. The annotations are used to pass configuration to the remote host.
2122

22-
- The memory and CPU defined in the OSCAR services field do not affect the offloaded job.
23+
- The memory and CPU defined in the OSCAR service are not applied to the jobs that are offloaded via interLink. Those parameters are set through annotations, as specified by the provider.
2324

24-
- To request resources in the offloaded job, use the [slurm flags](https://curc.readthedocs.io/en/latest/running-jobs/job-resources.html#slurm-resource-flags) `slurm-job.vk.io/flags`( `--job-name`, `--time=02:30:00`, `--cpus-per-task`, `--nodes`, `--mem`). For example, you can mount a system folder in an HPC cluster with the key annotation `job.vk.io/singularity-mounts` and value pattern `"--bind <outside-container>:<inside-container>"`. The offload jobs are executed in a remote HPC cluster. So, a persistent volume claim cannot be mounted. Another example is the annotation `job.vk.io/pre-exec`, which will execute a command before each execution.
25+
- When jobs are offloaded to a SLURM cluster, resources can be requested using [SLURM flags](https://curc.readthedocs.io/en/latest/running-jobs/job-resources.html#slurm-resource-flags). For example:
26+
- To request CPU and memory, use the annotation `slurm-job.vk.io/flags` (`--job-name`, `--time=02:30:00`, `--cpus-per-task`, `--nodes`, `--mem`).
27+
- To mount a system folder in an HPC cluster, use the annotation key `job.vk.io/singularity-mounts` with the value pattern `"--bind <outside-container>:<inside-container>"`.
28+
- To execute a command before each run, use the annotation `job.vk.io/pre-exec`.
2529

2630
- Any environment variable containing a special character could cause an error in the translation between the virtual node and the remote job. As a good practice, pass the environment variable encoded in Base64 and decode it inside the script.
2731

28-
Please note that interLink uses singularity to run a container with these characteristics:
32+
- Please note that when the remote host is a SLURM cluster, a Docker container will be translated into a Singularity one. There are some points to consider:
2933

30-
- You must reference the image container as singularity pattern `docker://ghcr.io/intertwin-eu/itwinai:0.0.1-3dgan-0.2`. Once the image is pulled, the image can be referenced by path `<path-of-container>/itwinaiv6.sif`.
31-
- Your script will not run as a privileged user in the container. Therefore, you cannot write in the regular file system. Use the `/tmp` folder.
32-
- The working directory is not the same in the container. Therefore, use absolute paths.
34+
- You must reference the container image with the `docker://` prefix, e.g., `docker://ghcr.io/intertwin-eu/itwinai:0.0.1-3dgan-0.2`.
35+
- Once the image has been pulled to the remote host, it can be referenced by its path `<path-of-container>/<image>.sif`.
36+
- Note that your script will not run as a privileged user in the container. Therefore, you cannot write to the regular file system. Use the `/tmp` folder instead if needed.
37+
- There is no working directory in singularity. Therefore, absolute paths are recommended.
3338

3439

35-
The support for interLink was integrated in the context of the [interTwin](https://www.intertwin.eu) project, with support from [Istituto Nazionale di Fisica Nucleare - INFN](https://home.infn.it/it/), who developed interLink, and [CERN](https://home.cern), who provided the development of [itwinai](https://github.com/interTwin-eu/itwinai), used as a platform for advanced AI/ML workflows in digital twin applications and a use case. Special thanks to the [IZUM Center](https://en-vegadocs.vega.izum.si) in Slovenia for providing access to the [HPC Vega](https://en-vegadocs.vega.izum.si) supercomputing facility to perform the testing.
40+
The interLink integration was developed in the context of the [interTwin](https://www.intertwin.eu) project, with support from [Istituto Nazionale di Fisica Nucleare - INFN](https://home.infn.it/it/), who developed interLink, and [CERN](https://home.cern), who provided the development of [itwinai](https://github.com/interTwin-eu/itwinai), used as a platform for advanced AI/ML workflows in digital twin applications and a use case. Special thanks to the [IZUM Center](https://izum.si) in Slovenia for providing access to the [HPC Vega](https://en-vegadocs.vega.izum.si) supercomputing facility to perform the testing.
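
The Base64 recommendation for environment variables in the diff above can be illustrated with a small round trip. This is a minimal sketch, not part of the commit: the helper names `encode_env_value` and `decode_env_value` are hypothetical, and it assumes the decoding side of the job has Python available (a shell-based script would use an equivalent Base64 tool instead).

```python
import base64


def encode_env_value(value: str) -> str:
    """Encode a value as Base64 text before setting it as an environment variable.

    Hypothetical helper: the result contains only letters, digits, '+', '/' and '='.
    """
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_env_value(encoded: str) -> str:
    """Recover the original value inside the offloaded job's script."""
    return base64.b64decode(encoded.encode("ascii")).decode("utf-8")


# A value with quotes, spaces and shell metacharacters, used only for illustration.
raw = 'pa$$word with "quotes" & spaces'
encoded = encode_env_value(raw)

# The round trip returns the original string unchanged.
assert decode_env_value(encoded) == raw
```

The encoded string is what would be placed in the service's environment; the script running on the remote host decodes it before use.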
