diff --git a/README.md b/README.md index 75fd000..7fe6056 100644 --- a/README.md +++ b/README.md @@ -45,33 +45,24 @@ To set up the platform locally, execute the [`setup.sh`](setup.sh) script. For a Dive into our demo examples to see the platform in action: -> TODO +- **Jupyter Notebooks (e2e)**: -[//]: # (- **Jupyter Notebooks**:) + - [Demo Wine quality ML pipeline.](tutorials/demo_notebooks/demo_pipeline) -[//]: # ( - Explore ML pipelines related to wine quality analysis.) + - [Demo Fairness and energy monitoring pipeline.](tutorials/demo_notebooks/demo_fairness_and_energy_monitoring) -[//]: # ( - Investigate fairness and energy monitoring in ML pipelines.) -[//]: # ( - [Demo Notebooks](tutorials/demo_notebooks)) +- **Project Use-Case (e2e)**: -[//]: # () -[//]: # (- **Project Use-Case**:) + - TODO -[//]: # ( - Examine the RD/IML4E Siemens use-case featuring TTPLA/YOLACT.) +- **ML tools demos**: -[//]: # ( - [Siemens Use-Case Project](https://bitbucket.org/siloai/rd-iml4e-ttpla-siemens-usecase/src/master/)) + - [Try out MLflow](tutorials/ml_components_demos/try-mlflow) -[//]: # () -[//]: # (- **Interactive Tutorials**:) + - [Try out Kubeflow Pipelines](tutorials/ml_components_demos/try-kubeflow-pipelines) -[//]: # ( - Get hands-on with MLflow, Kubeflow Pipelines, and KServe through these interactive resources:) - -[//]: # ( - [Try out MLflow](tutorials/resources/try-mlflow)) - -[//]: # ( - [Try out Kubeflow Pipelines](tutorials/resources/try-kubeflow-pipelines)) - -[//]: # ( - [Try out Kserve](tutorials/resources/try-kserve)) + - [Try out Kserve](tutorials/ml_components_demos/try-kserve) ## High-Level Architecture Overview diff --git a/tutorials/demo_notebooks/README.md b/tutorials/demo_notebooks/README.md new file mode 100644 index 0000000..e603c69 --- /dev/null +++ b/tutorials/demo_notebooks/README.md @@ -0,0 +1,5 @@ +## Demo notebooks + +- Wine quality ML pipeline: [demo_pipeline](./demo_pipeline) + +- [Fairness and energy monitoring 
pipeline.](./demo_fairness_and_energy_monitoring) \ No newline at end of file diff --git a/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/demo-pipeline-with-fairness-and-energy-monitoring.ipynb b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/demo-pipeline-with-fairness-and-energy-monitoring.ipynb new file mode 100644 index 0000000..330f5a5 --- /dev/null +++ b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/demo-pipeline-with-fairness-and-energy-monitoring.ipynb @@ -0,0 +1,1312 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f976353c", + "metadata": {}, + "source": [ + "# Demo KFP pipeline with fairness and energy monitoring" + ] + }, + { + "cell_type": "markdown", + "id": "b1a95f6d", + "metadata": {}, + "source": [ + "This notebook demonstrates fairness and energy consumption monitoring in a single-run OSS pipeline. The open-source GitHub repositories used to enable this monitoring are: \n", + "\n", + "- Data: https://github.com/socialfoundations/folktables\n", + "- Fairness: https://github.com/Trusted-AI/AIF360\n", + "- Energy consumption: https://github.com/hubblo-org/scaphandre\n", + "\n", + "The data is 2014 US Census PUMS data (https://www.census.gov/programs-surveys/acs/microdata.html) from California, which is preprocessed into the ACSIncome format by the Folktables suite. This aims to replicate a prediction task standardized in ML fairness research by the widely used UCI Adult Dataset. The task is to build a model that predicts from the available features whether an individual's income, given in the column PINCP, is higher than $50,000 (0 for no, 1 for yes). \n", + "\n", + "The sensitive attributes of this data are age (AGEP), gender (SEX), and ethnicity (RAC1P) with encodings found here: https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMS_Data_Dictionary_2014-2018.pdf.
In this simple demonstration we will set the privileged group as white men under 50 ([{\"AGEP\":1, \"SEX\": 1, \"RAC1P\": 1}]), and the unprivileged group as non-white women over 50 ([{\"AGEP\":0, \"SEX\": 0, \"RAC1P\": 0}]), so the binarization thresholds are 50 for AGEP, 1 for SEX, and 1 for RAC1P. AIF360 allows setting up more distinct groups by adding more dict elements to the privileged or unprivileged groups, like {'AGEP': 1}. The resulting scores for these groups should therefore be seen only as a technical demonstration and nothing else." + ] + }, + { + "cell_type": "markdown", + "id": "8a7b7901", + "metadata": {}, + "source": [ + "# Scaphandre, Prometheus alerts and Grafana dashboard setup" + ] + }, + { + "cell_type": "markdown", + "id": "f59d366c", + "metadata": {}, + "source": [ + "Before we run the pipeline, we will need to set up Scaphandre, Prometheus alerts and the Grafana dashboard to handle metrics. \n", + "\n", + "To set up Scaphandre, we will use the following official documentation in the given order to get the required commands:\n", + "\n", + "- Kubernetes: https://hubblo-org.github.io/scaphandre-documentation/tutorials/kubernetes.html\n", + "- Prometheus: https://hubblo-org.github.io/scaphandre-documentation/references/exporter-prometheus.html\n", + "- Grafana: https://hubblo-org.github.io/scaphandre-documentation/how-to_guides/get-process-level-power-in-grafana.html\n", + "\n", + "We will first need to clone the Scaphandre GitHub repository with the command 'git clone https://github.com/hubblo-org/scaphandre'. I cloned it into the folder where I stored the OSS clone, but it should be fine if it is anywhere else. When the cloning is done, call the command 'cd scaphandre', check that Helm is installed with 'helm version', and then call 'helm install scaphandre helm/scaphandre'.
\n", + "\n", + "However, you may need to change the default port value from 8080 to 8081 in 'scaphandre/helm/scaphandre/values.yaml', as seen in the provided 'modified-scaphandre-values.yaml', before doing the last step, because 8080 is the same port used by KFP, KServe, and Prometheus. If Scaphandre is already installed and you want to run it with the modified YAML, delete it with Helm using the command 'helm delete scaphandre' ('helm list' to confirm the name) and rerun the install command. \n", + "\n", + "You can check the Scaphandre configuration by running 'kubectl get pods' and then 'kubectl describe pod (pod name)', which should show 8081/TCP under Containers and Port. A faster way of doing this is 'kubectl get services'. To get Scaphandre to send metrics to Prometheus, we need to go into the pod with 'kubectl exec -it (pod name) -- /bin/bash' and then run 'scaphandre prometheus --port 8081'.\n", + "\n", + "If no errors are given, the setup is ready, and Prometheus should be able to query the energy consumption metrics. The Prometheus exporter can also be started with plain 'scaphandre prometheus', but this will cause errors because the default port is already in use. If the correct setup starts to throw errors unrelated to the port, caused either by a nonoptimal configuration or by an unsuitable Prometheus query, stop it with CTRL + C and rerun the starting command.\n", + "\n", + "To set up Prometheus alerts for fairness metrics, we must modify the default 'prometheus-config-map.yaml' to include fairness alerts. It is recommended that the default YAML is first moved somewhere safe, after which the provided 'fairness-alert-prometheus-config-map.yaml' is renamed to 'prometheus-config-map.yaml'.
Now we only need to run 'kubectl apply -k deployment/monitoring' and 'kubectl rollout\n", + "restart deployment/prometheus-deployment -n monitoring' and wait a moment for these modifications to take effect.\n", + "\n", + "To set up the Grafana dashboard for fairness and energy consumption metrics, we only need to click Import under Create and upload the provided JSON file named 'grafana_fairness_consumption_monitoring_1.json'. The dashboard will be empty, except for Prometheus alerts and consumption plots, until the KFP pipeline has completed the evaluation step. As long as Prometheus is capable of querying fairness and energy consumption metrics after running the KFP pipeline, Grafana should also be fine." + ] + }, + { + "cell_type": "markdown", + "id": "88972adb", + "metadata": {}, + "source": [ + "# KFP setup" + ] + }, + { + "cell_type": "markdown", + "id": "cfb90632", + "metadata": {}, + "source": [ + "The requirements for running this code in Jupyter using a virtual environment are:\n", + "\n", + "- pip install notebook\n", + "- pip install kfp~=1.8.14" + ] + }, + { + "cell_type": "markdown", + "id": "a11c53d8", + "metadata": {}, + "source": [ + "Below we provide the necessary imports and set up the KFP client." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "2e0fc367", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:09.760595598Z", + "start_time": "2023-07-03T06:52:09.357655573Z" + } + }, + "outputs": [], + "source": [ + "# Imports\n", + "import warnings\n", + "warnings.filterwarnings(\"ignore\")\n", + "\n", + "import kfp\n", + "import kfp.dsl as dsl\n", + "from kfp.aws import use_aws_secret\n", + "from kfp.v2.dsl import (\n", + " component,\n", + " Input,\n", + " Output,\n", + " Dataset,\n", + " Metrics,\n", + " Artifact,\n", + " Model\n", + ")" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## 1. Connect to client\n", + "\n", + "The default way of accessing Kubeflow is via port-forward.
This enables you to get started quickly without imposing any requirements on your environment. Run the following to port-forward Istio's Ingress-Gateway to local port `8080`:\n", + "\n", + "```sh\n", + "kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80\n", + "```" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d4b3d0fd", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:09.845815311Z", + "start_time": "2023-07-03T06:52:09.764499445Z" + } + }, + "outputs": [], + "source": [ + "import re\n", + "import requests\n", + "from urllib.parse import urlsplit\n", + "\n", + "def get_istio_auth_session(url: str, username: str, password: str) -> dict:\n", + " \"\"\"\n", + " Determine if the specified URL is secured by Dex and try to obtain a session cookie.\n", + " WARNING: only Dex `staticPasswords` and `LDAP` authentication are currently supported\n", + " (we default to using `staticPasswords` if both are enabled)\n", + "\n", + " :param url: Kubeflow server URL, including protocol\n", + " :param username: Dex `staticPasswords` or `LDAP` username\n", + " :param password: Dex `staticPasswords` or `LDAP` password\n", + " :return: auth session information\n", + " \"\"\"\n", + " # define the default return object\n", + " auth_session = {\n", + " \"endpoint_url\": url, # KF endpoint URL\n", + " \"redirect_url\": None, # KF redirect URL, if applicable\n", + " \"dex_login_url\": None, # Dex login URL (for POST of credentials)\n", + " \"is_secured\": None, # True if KF endpoint is secured\n", + " \"session_cookie\": None # Resulting session cookies in the form \"key1=value1; key2=value2\"\n", + " }\n", + "\n", + " # use a persistent session (for cookies)\n", + " with requests.Session() as s:\n", + "\n", + " ################\n", + " # Determine if Endpoint is Secured\n", + " ################\n", + " resp = s.get(url, allow_redirects=True)\n", + " if resp.status_code != 200:\n", +
" raise RuntimeError(\n", + " f\"HTTP status code '{resp.status_code}' for GET against: {url}\"\n", + " )\n", + "\n", + " auth_session[\"redirect_url\"] = resp.url\n", + "\n", + " # if we were NOT redirected, then the endpoint is UNSECURED\n", + " if len(resp.history) == 0:\n", + " auth_session[\"is_secured\"] = False\n", + " return auth_session\n", + " else:\n", + " auth_session[\"is_secured\"] = True\n", + "\n", + " ################\n", + " # Get Dex Login URL\n", + " ################\n", + " redirect_url_obj = urlsplit(auth_session[\"redirect_url\"])\n", + "\n", + " # if we are at `/auth?=xxxx` path, we need to select an auth type\n", + " if re.search(r\"/auth$\", redirect_url_obj.path):\n", + "\n", + " #######\n", + " # TIP: choose the default auth type by including ONE of the following\n", + " #######\n", + "\n", + " # OPTION 1: set \"staticPasswords\" as default auth type\n", + " redirect_url_obj = redirect_url_obj._replace(\n", + " path=re.sub(r\"/auth$\", \"/auth/local\", redirect_url_obj.path)\n", + " )\n", + " # OPTION 2: set \"ldap\" as default auth type\n", + " # redirect_url_obj = redirect_url_obj._replace(\n", + " # path=re.sub(r\"/auth$\", \"/auth/ldap\", redirect_url_obj.path)\n", + " # )\n", + "\n", + " # if we are at `/auth/xxxx/login` path, then no further action is needed (we can use it for login POST)\n", + " if re.search(r\"/auth/.*/login$\", redirect_url_obj.path):\n", + " auth_session[\"dex_login_url\"] = redirect_url_obj.geturl()\n", + "\n", + " # else, we need to be redirected to the actual login page\n", + " else:\n", + " # this GET should redirect us to the `/auth/xxxx/login` path\n", + " resp = s.get(redirect_url_obj.geturl(), allow_redirects=True)\n", + " if resp.status_code != 200:\n", + " raise RuntimeError(\n", + " f\"HTTP status code '{resp.status_code}' for GET against: {redirect_url_obj.geturl()}\"\n", + " )\n", + "\n", + " # set the login url\n", + " auth_session[\"dex_login_url\"] = resp.url\n", + "\n", + " ################\n", 
+ " # Attempt Dex Login\n", + " ################\n", + " resp = s.post(\n", + " auth_session[\"dex_login_url\"],\n", + " data={\"login\": username, \"password\": password},\n", + " allow_redirects=True\n", + " )\n", + " if len(resp.history) == 0:\n", + " raise RuntimeError(\n", + " f\"Login credentials were probably invalid - \"\n", + " f\"No redirect after POST to: {auth_session['dex_login_url']}\"\n", + " )\n", + "\n", + " # store the session cookies in a \"key1=value1; key2=value2\" string\n", + " auth_session[\"session_cookie\"] = \"; \".join([f\"{c.name}={c.value}\" for c in s.cookies])\n", + "\n", + " return auth_session" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import kfp\n", + "\n", + "KUBEFLOW_ENDPOINT = \"http://localhost:8080\"\n", + "KUBEFLOW_USERNAME = \"user@example.com\"\n", + "KUBEFLOW_PASSWORD = \"12341234\"\n", + "\n", + "auth_session = get_istio_auth_session(\n", + " url=KUBEFLOW_ENDPOINT,\n", + " username=KUBEFLOW_USERNAME,\n", + " password=KUBEFLOW_PASSWORD\n", + ")\n", + "\n", + "client = kfp.Client(host=f\"{KUBEFLOW_ENDPOINT}/pipeline\", cookies=auth_session[\"session_cookie\"])\n", + "# print(client.list_experiments())" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "id": "a179fefb", + "metadata": {}, + "source": [ + "# Pull data" + ] + }, + { + "cell_type": "markdown", + "id": "879abbf8", + "metadata": {}, + "source": [ + "Here we create a KFP component, which uses Folktables functions to get the data and preprocess it into a suitable format for the prediction task. This data is then made into an artifact for further usage. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "0bcfcfff", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:14.670519096Z", + "start_time": "2023-07-03T06:52:14.666471548Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"pandas~=1.4.2\",\"numpy\",\"folktables\"],\n", + " output_component_file='components/pull_data_component.yaml',\n", + ")\n", + "def pull_data(\n", + " state: str, \n", + " year: int, \n", + " data: Output[Dataset]\n", + "):\n", + " \"\"\"\n", + " Pull data component.\n", + " \"\"\"\n", + " import pandas as pd\n", + " import numpy as np\n", + " from folktables import ACSDataSource, ACSIncome\n", + " \n", + " pull_data_component_landmark = 'KFP_component'\n", + " \n", + " source = ACSDataSource(survey_year=year, horizon='1-Year', survey='person')\n", + " state_data = source.get_data(states=[state], download=True)\n", + " state_features, state_labels, _ = ACSIncome.df_to_pandas(state_data)\n", + " df = pd.concat([state_features,state_labels],axis=1)\n", + " df.to_csv(data.path, index=None)" + ] + }, + { + "cell_type": "markdown", + "id": "973f5aa1", + "metadata": {}, + "source": [ + "# Preprocess" + ] + }, + { + "cell_type": "markdown", + "id": "1f21ea37", + "metadata": {}, + "source": [ + "Here we create a component that changes sensitive attributes into binary columns using given thresholds, scales features (no sensitive attributes and predicted values), divides the data into three parts (train, test, indrift), and stores these parts as artifacts for later use." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "dadd5bf7", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:16.967863995Z", + "start_time": "2023-07-03T06:52:16.923690992Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"pandas~=1.4.2\", \"scikit-learn~=1.0.2\", \"numpy\"],\n", + " output_component_file='components/preprocess_component.yaml',\n", + ")\n", + "def preprocess(\n", + " data: Input[Dataset],\n", + " train_set: Output[Dataset],\n", + " test_set: Output[Dataset],\n", + " drift_set: Output[Dataset],\n", + " label_attribute: str,\n", + " sensitive_attributes: list,\n", + " splits: list,\n", + " group_thresholds: list\n", + "):\n", + " \"\"\"\n", + " Preprocess component.\n", + " \"\"\"\n", + " import pandas as pd\n", + " from sklearn.model_selection import train_test_split\n", + " from sklearn.preprocessing import StandardScaler\n", + " import numpy as np\n", + " import random\n", + " from itertools import islice\n", + " \n", + " preprocess_component_landmark = 'KFP_component'\n", + " \n", + " l_a = label_attribute\n", + " s_a = sensitive_attributes\n", + " g_t = group_thresholds\n", + " \n", + " data = pd.read_csv(data.path)\n", + " \n", + " attribute_amount = len(s_a) + 1\n", + " non_scalable_attribute_identity = []\n", + " non_scalable_attribute_values = []\n", + "\n", + " for i in range(0,attribute_amount):\n", + " if i == attribute_amount-1:\n", + " non_scalable_attribute_identity.append(l_a)\n", + " non_scalable_attribute_values.append(data[l_a].astype(int))\n", + " del data[l_a]\n", + " continue\n", + "\n", + " values = data[s_a[i]].copy()\n", + " values[values <= g_t[i]] = 1\n", + " values[values > g_t[i]] = 0\n", + " non_scalable_attribute_identity.append(s_a[i])\n", + " non_scalable_attribute_values.append(values)\n", + " del data[s_a[i]]\n", + "\n", + " scaler = StandardScaler()\n", + " bin_df = pd.DataFrame(scaler.fit_transform(data), 
\n", + " columns=data.columns) \n", + " \n", + " index = 0\n", + " for name in non_scalable_attribute_identity:\n", + " bin_df[name] = np.array(non_scalable_attribute_values[index]).astype(int)\n", + " index = index + 1\n", + "\n", + " index = np.array(bin_df.index)\n", + " random.seed(42)\n", + " random.shuffle(index)\n", + "\n", + " amounts = [round(bin_df.shape[0]*splits[0]), \n", + " round(bin_df.shape[0]*splits[1]), \n", + " round(bin_df.shape[0]*splits[2])]\n", + "\n", + " if bin_df.shape[0] < sum(amounts):\n", + " amounts[2] = amounts[2]+(bin_df.shape[0]-sum(amounts)) # shrink the last split so the sizes sum to the row count\n", + "\n", + " it = iter(index)\n", + "\n", + " sliced = [list(islice(it, 0, i)) for i in amounts]\n", + "\n", + " train_data = bin_df.loc[sliced[0]]\n", + " test_data = bin_df.loc[sliced[1]]\n", + " drift_data = bin_df.loc[sliced[2]]\n", + "\n", + " train_data.to_csv(train_set.path, index=None)\n", + " test_data.to_csv(test_set.path, index=None)\n", + " drift_data.to_csv(drift_set.path, index=None)" + ] + }, + { + "cell_type": "markdown", + "id": "89cf0229", + "metadata": {}, + "source": [ + "# Train" + ] + }, + { + "cell_type": "markdown", + "id": "651da6a2", + "metadata": {}, + "source": [ + "Here we create a component that defines the metrics, makes the datasets suitable for AIF360 metrics (with a favorable label of 1 and unfavorable label of 0), trains a logistic regression model, calculates metrics, and stores these metrics in MLflow for model comparison. Some of the code is reused from the original OSS pipeline. The fairness metrics used are statistical parity, disparate impact, equal opportunity difference, average odds difference, and the Theil index, which are the most common in the provided AIF360 tutorials. Notice that there are fewer dataset metrics than model metrics."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "7b400201", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:20.296259113Z", + "start_time": "2023-07-03T06:52:20.243432204Z" + } + }, + "outputs": [], + "source": [ + "from typing import NamedTuple\n", + "\n", + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"numpy\", \n", + " \"pandas~=1.4.2\",\n", + " \"aif360\",\n", + " \"scikit-learn~=1.0.2\", \n", + " \"mlflow~=1.25.0\", \n", + " \"boto3~=1.21.0\"],\n", + " output_component_file='components/train_component.yaml',\n", + ")\n", + "def train(\n", + " train_set: Input[Dataset],\n", + " test_set: Input[Dataset],\n", + " saved_model: Output[Model],\n", + " model_name: str,\n", + " label_attribute: str,\n", + " sensitive_attributes: list,\n", + " privilaged_groups: list,\n", + " unprivilaged_groups: list,\n", + " mlflow_experiment_name: str,\n", + " mlflow_tracking_uri: str,\n", + " mlflow_s3_endpoint_url: str\n", + ") -> NamedTuple(\"Output\", [('storage_uri', str), ('run_id', str),]):\n", + " \"\"\"\n", + " Train component.\n", + " \"\"\"\n", + " import numpy as np\n", + " import pandas as pd\n", + " from sklearn.linear_model import LogisticRegression\n", + " \n", + " from aif360.datasets import BinaryLabelDataset\n", + " from aif360.metrics import BinaryLabelDatasetMetric\n", + " from aif360.metrics import ClassificationMetric\n", + " from sklearn.metrics import accuracy_score,confusion_matrix,precision_score,recall_score,f1_score\n", + " \n", + " import mlflow\n", + " import mlflow.sklearn\n", + " import os\n", + " import logging\n", + " import pickle\n", + " from collections import namedtuple\n", + " \n", + " train_component_landmark = 'KFP_component'\n", + " \n", + " l_a = label_attribute\n", + " s_a = sensitive_attributes\n", + " p_g = privilaged_groups\n", + " u_g = unprivilaged_groups\n", + "\n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + " 
\n", + " def dataset_fairness(dataset,p_g,u_g):\n", + " metrics_list = []\n", + " dataset_metrics = BinaryLabelDatasetMetric(dataset, \n", + " privileged_groups = p_g,\n", + " unprivileged_groups = u_g)\n", + " \n", + " SP = dataset_metrics.mean_difference()\n", + " DI = dataset_metrics.disparate_impact()\n", + " \n", + " # Statistical parity\n", + " metrics_list.append({'name': 'D_SP', \n", + " 'value': SP })\n", + " \n", + " # Disparate impact\n", + " metrics_list.append({'name': 'D_DI', \n", + " 'value': DI })\n", + " \n", + " return metrics_list\n", + " \n", + " def model_metrics(dataset, pred, p_g, u_g):\n", + " metrics_list = []\n", + " \n", + " Acc = accuracy_score(dataset.labels, pred)\n", + " \n", + " # Accuracy\n", + " metrics_list.append({'name': 'M_Acc', \n", + " 'value': Acc })\n", + " \n", + " matrix = confusion_matrix(dataset.labels, pred) # rows: true label, columns: prediction, i.e. [[TN, FP], [FN, TP]]\n", + " \n", + " # True positives\n", + " metrics_list.append({'name': 'M_TP', \n", + " 'value': matrix[1][1]})\n", + " # False positives\n", + " metrics_list.append({'name': 'M_FP', \n", + " 'value': matrix[0][1]})\n", + " # False negatives\n", + " metrics_list.append({'name': 'M_FN', \n", + " 'value': matrix[1][0]})\n", + " # True negatives\n", + " metrics_list.append({'name': 'M_TN', \n", + " 'value': matrix[0][0]})\n", + " \n", + " dataset_pred = dataset.copy()\n", + " dataset_pred.labels = pred\n", + " \n", + " model_metrics = ClassificationMetric(\n", + " dataset,\n", + " dataset_pred,\n", + " privileged_groups = p_g,\n", + " unprivileged_groups = u_g)\n", + " \n", + " BA = (model_metrics.true_positive_rate() + model_metrics.true_negative_rate()) / 2\n", + " \n", + " # Balanced accuracy\n", + " metrics_list.append({'name': 'M_BA', \n", + " 'value': BA}) \n", + " \n", + " SP = model_metrics.mean_difference()\n", + " DI = model_metrics.disparate_impact()\n", + " AOD = model_metrics.average_odds_difference()\n", + " EOD = model_metrics.equal_opportunity_difference()\n", + " TI = model_metrics.theil_index()\n", +
" \n", + " # Statistical parity\n", + " metrics_list.append({'name': 'M_SP', \n", + " 'value': SP})\n", + " # Disparate impact\n", + " metrics_list.append({'name': 'M_DI', \n", + " 'value': DI})\n", + " \n", + " # Average odds difference\n", + " metrics_list.append({'name': 'M_AOD', \n", + " 'value': AOD})\n", + " \n", + " # Equal opportunity difference\n", + " metrics_list.append({'name': 'M_EOD', \n", + " 'value': EOD})\n", + " \n", + " # Theil index\n", + " metrics_list.append({'name': 'M_TI', \n", + " 'value': TI})\n", + " \n", + " return metrics_list\n", + " \n", + " os.environ['MLFLOW_S3_ENDPOINT_URL'] = mlflow_s3_endpoint_url\n", + "\n", + " # load data\n", + " logger.info(\"Setting up data\")\n", + " train_data = pd.read_csv(train_set.path)\n", + " test_data = pd.read_csv(test_set.path)\n", + " \n", + " train = BinaryLabelDataset(\n", + " favorable_label = 1,\n", + " unfavorable_label = 0,\n", + " df = train_data,\n", + " label_names = [l_a],\n", + " protected_attribute_names = s_a)\n", + "\n", + " test = BinaryLabelDataset(\n", + " favorable_label = 1,\n", + " unfavorable_label = 0,\n", + " df = test_data,\n", + " label_names = [l_a],\n", + " protected_attribute_names= s_a)\n", + " \n", + " logger.info(\"Checking training and test data fairness\")\n", + " train_fairness = dataset_fairness(train,p_g,u_g)\n", + " test_fairness = dataset_fairness(test,p_g,u_g)\n", + "\n", + " # The predicted column is \"Target\" which is either 0 or 1\n", + " train_x = train.features \n", + " test_x = test.features \n", + " train_y = train.labels \n", + " test_y = test.labels \n", + " \n", + " logger.info(f\"Using MLflow tracking URI: {mlflow_tracking_uri}\")\n", + " mlflow.set_tracking_uri(mlflow_tracking_uri)\n", + "\n", + " logger.info(f\"Using MLflow experiment: {mlflow_experiment_name}\")\n", + " mlflow.set_experiment(mlflow_experiment_name)\n", + "\n", + " with mlflow.start_run() as run:\n", + "\n", + " run_id = run.info.run_id\n", + " logger.info(f\"Run ID: 
{run_id}\")\n", + "\n", + " model = LogisticRegression(random_state=42)\n", + " \n", + " logger.info(\"Fitting model...\")\n", + " model.fit(train_x, train_y)\n", + "\n", + " logger.info(\"Predicting...\")\n", + " \n", + " predicted_qualities = model.predict(test_x)\n", + " \n", + " model_metrics = model_metrics(test, predicted_qualities, p_g, u_g)\n", + " \n", + " logger.info(\"Logging training data metrics to MLflow\")\n", + " for pair in train_fairness:\n", + " name = 'Tr_' + pair['name']\n", + " mlflow.log_metric(name, pair['value'])\n", + " \n", + " logger.info(\"Logging test data metrics to MLflow\")\n", + " for pair in test_fairness:\n", + " name = 'Te_' + pair['name']\n", + " mlflow.log_metric(name, pair['value'])\n", + " \n", + " logger.info(\"Logging model metrics to MLflow\")\n", + " for pair in model_metrics:\n", + " mlflow.log_metric(pair['name'], pair['value'])\n", + " \n", + " # save model to mlflow\n", + " logger.info(\"Logging trained model\")\n", + " mlflow.sklearn.log_model(\n", + " model,\n", + " model_name,\n", + " registered_model_name=\"USCensusLR\",\n", + " serialization_format=\"pickle\"\n", + " )\n", + "\n", + " logger.info(\"Logging predictions artifact to MLflow\")\n", + " np.save(\"predictions.npy\", predicted_qualities)\n", + " mlflow.log_artifact(\n", + " local_path=\"predictions.npy\", artifact_path=\"predicted_qualities/\"\n", + " )\n", + "\n", + " # save model as KFP artifact\n", + " logging.info(f\"Saving model to: {saved_model.path}\")\n", + " with open(saved_model.path, 'wb') as fp:\n", + " pickle.dump(model, fp, pickle.HIGHEST_PROTOCOL)\n", + "\n", + " # prepare output\n", + " output = namedtuple('Output', ['storage_uri', 'run_id'])\n", + "\n", + " # return str(mlflow.get_artifact_uri())\n", + " return output(mlflow.get_artifact_uri(), run_id)" + ] + }, + { + "cell_type": "markdown", + "id": "6bd87b60", + "metadata": {}, + "source": [ + "# Evaluate" + ] + }, + { + "cell_type": "markdown", + "id": "1d77dad7", + "metadata": {}, + 
"source": [ + "Here we define a component, which gets the stored metrics, pushes these into a Prometheus gateway, and evaluates these metrics with given thresholds before going to the next phase. The Prometheus gateway is a ready-made component of the OSS pipeline, which Prometheus will scrape by providing the correct gateway URL. Prometheus is set up with a slightly modified YAML configuration to alert when accuracy and fairness metrics exceed given thresholds. Prometheus also enables Grafana to easily visualize the given metrics to provide a general overview of the deployed model and the cluster. " + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "c86e08c1", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:43.016224090Z", + "start_time": "2023-07-03T06:52:42.961087479Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"numpy\", \"mlflow~=1.25.0\", \"prometheus_client\"],\n", + " output_component_file='components/evaluate_component.yaml',\n", + ")\n", + "def evaluate(\n", + " run_id: str,\n", + " mlflow_tracking_uri: str,\n", + " threshold_metrics: dict\n", + ") -> bool:\n", + " \"\"\"\n", + " Evaluate component: Compares metrics from training with given thresholds.\n", + "\n", + " Args:\n", + " run_id (string): MLflow run ID\n", + " mlflow_tracking_uri (string): MLflow tracking URI\n", + " threshold_metrics (dict): Minimum threshold values for each metric\n", + " Returns:\n", + " Bool indicating whether evaluation passed or failed.\n", + " \"\"\"\n", + " from mlflow.tracking import MlflowClient\n", + " from prometheus_client import CollectorRegistry, Gauge, push_to_gateway\n", + " import requests\n", + " import json\n", + " import logging\n", + " \n", + " evaluate_component_landmark = 'KFP_component'\n", + " \n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + "\n", + " client = 
MlflowClient(tracking_uri=mlflow_tracking_uri)\n", + " info = client.get_run(run_id)\n", + " training_metrics = info.data.metrics\n", + "\n", + " logger.info(f\"Training metrics: {training_metrics}\")\n", + " \n", + " registry = CollectorRegistry()\n", + " url = 'http://prometheus-pushgateway.monitoring.svc.cluster.local:9091'\n", + " for key, value in training_metrics.items():\n", + " metric = Gauge(key, 'Metric', registry = registry)\n", + " metric.set(value)\n", + " push_to_gateway(url, job = 'Metrics', registry = registry)\n", + " \n", + " # compare the evaluation metrics with the defined thresholds\n", + " for key, value in threshold_metrics.items():\n", + " if (key not in training_metrics) or (training_metrics[key] < value):\n", + " logger.error(f\"Metric {key} failed. Evaluation not passed!\")\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "markdown", + "id": "138a77e2", + "metadata": {}, + "source": [ + "# Deploy model" + ] + }, + { + "cell_type": "markdown", + "id": "90a5cdfc", + "metadata": {}, + "source": [ + "Here we define a component that deploys the model that passed evaluation into an inference service. This component is identical to the deploy model component of the original OSS demo pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "ef37a9bc", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:52.985034452Z", + "start_time": "2023-07-03T06:52:52.969987786Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.9\",\n", + " packages_to_install=[\"kserve\"],\n", + " output_component_file='components/deploy_model_component.yaml',\n", + ")\n", + "def deploy_model(model_name: str, storage_uri: str):\n", + " \"\"\"\n", + " Deploy the model as an inference service with KServe.\n", + " \"\"\"\n", + " from kubernetes import client\n", + " from kserve import KServeClient\n", + " from kserve import constants\n", + " from kserve import utils\n", + " from kserve import V1beta1InferenceService\n", + " from kserve import V1beta1InferenceServiceSpec\n", + " from kserve import V1beta1PredictorSpec\n", + " from kserve import V1beta1SKLearnSpec\n", + " import logging\n", + " \n", + " deploy_model_component_landmark = 'KFP_component'\n", + " \n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + " \n", + " model_uri = f\"{storage_uri}/{model_name}\"\n", + " logger.info(\"MODEL URI: %s\", model_uri)\n", + "\n", + " namespace = utils.get_default_target_namespace()\n", + " kserve_version='v1beta1'\n", + " api_version = constants.KSERVE_GROUP + '/' + kserve_version\n", + "\n", + " isvc = V1beta1InferenceService(\n", + " api_version=api_version,\n", + " kind=constants.KSERVE_KIND,\n", + " metadata=client.V1ObjectMeta(\n", + " name=model_name,\n", + " namespace=namespace,\n", + " annotations={'sidecar.istio.io/inject':'false'}\n", + " ),\n", + " spec=V1beta1InferenceServiceSpec(\n", + " predictor=V1beta1PredictorSpec(\n", + " service_account_name=\"kserve-sa\",\n", + " sklearn=V1beta1SKLearnSpec(\n", + " storage_uri=model_uri\n", + " )\n", + " )\n", + " )\n", + " )\n", + " KServe = KServeClient()\n", + " KServe.create(isvc)" + ] + }, + { + "cell_type": "markdown", 
+ "id": "7fb274bf", + "metadata": {}, + "source": [ + "# Inference" + ] + }, + { + "cell_type": "markdown", + "id": "8e3cad7e", + "metadata": {}, + "source": [ + "Here we define a component that tests out the deployed inference service. The only difference between this and the original OSS pipeline component is that this gives two already preprocessed samples for simplicity." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "b90d1839", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:52:58.534123083Z", + "start_time": "2023-07-03T06:52:58.519744724Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.9\", # kserve on python 3.10 comes with a dependency that fails to get installed\n", + " packages_to_install=[\"kserve\", \"scikit-learn~=1.0.2\"],\n", + " output_component_file='components/inference_component.yaml',\n", + ")\n", + "def inference(\n", + " model_name: str\n", + "):\n", + " \"\"\"\n", + " Test inference.\n", + " \"\"\"\n", + " from kserve import KServeClient\n", + " from kserve import utils\n", + " import requests\n", + " import pickle\n", + " import logging\n", + " \n", + " inference_component_landmark = 'KFP_component'\n", + " \n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + "\n", + " namespace = utils.get_default_target_namespace()\n", + " \n", + " input_sample = [\n", + " [-0.07237661, 0.92825499, 1.31739921, -1.31196944, -0.73834281,\n", + " -0.09041844, 0.18720611, 1. , 0. , 1. ],\n", + " [-0.60712221, -1.55312352, 1.31739921, 0.05298942, 1.63138481,\n", + " -0.54927344, 0.18720611, 1. , 0. , 1. 
]]\n", + " \n", + " # get inference service\n", + " KServe = KServeClient()\n", + "\n", + " # wait for deployment to be ready\n", + " KServe.get(model_name, namespace=namespace, watch=True, timeout_seconds=120)\n", + "\n", + " inference_service = KServe.get(model_name, namespace=namespace)\n", + " is_url = inference_service['status']['address']['url']\n", + "\n", + " logger.info(f\"\\nInference service status:\\n{inference_service['status']}\")\n", + " logger.info(f\"\\nInference service URL:\\n{is_url}\\n\")\n", + " \n", + " inference_input = {\n", + " 'instances': input_sample\n", + " }\n", + "\n", + " response = requests.post(is_url, json=inference_input)\n", + " logger.info(f\"\\nPrediction response:\\n{response.text}\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "82fb4b46", + "metadata": {}, + "source": [ + "# Pipeline" + ] + }, + { + "cell_type": "markdown", + "id": "43abfd0f", + "metadata": {}, + "source": [ + "The code below uses the previously defined components to create a KFP pipeline. This code modifies the original OSS pipeline by adding the new variables state, year, label_attribute, sensitive_attributes, splits, privilaged_groups, unprivilaged_groups, and group_thresholds so that the pull, preprocess, and train phases can use the new code."
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fa31281b", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:53:01.116169461Z", + "start_time": "2023-07-03T06:53:01.105292322Z" + } + }, + "outputs": [], + "source": [ + "@dsl.pipeline(\n", + " name='aif-pipeline',\n", + " description='A single-run pipeline for the UCI Adult prediction task',\n", + ")\n", + "def pipeline(\n", + " state: str,\n", + " year: int,\n", + " label_attribute: str,\n", + " sensitive_attributes: list,\n", + " splits: list,\n", + " privilaged_groups: list,\n", + " unprivilaged_groups: list,\n", + " group_thresholds: list,\n", + " mlflow_experiment_name: str,\n", + " mlflow_tracking_uri: str,\n", + " mlflow_s3_endpoint_url: str,\n", + " model_name: str,\n", + " threshold_metrics: dict\n", + "):\n", + " \"\"\"\n", + " pipeline component.\n", + " \"\"\"\n", + " pipeline_landmark = 'KFP_pipeline'\n", + " \n", + " pull_task = pull_data(state = state, year = year)\n", + "\n", + " preprocess_task = preprocess(data=pull_task.outputs[\"data\"],\n", + " label_attribute = label_attribute,\n", + " sensitive_attributes = sensitive_attributes,\n", + " splits = splits, group_thresholds = group_thresholds)\n", + "\n", + " train_task = train(\n", + " train_set = preprocess_task.outputs[\"train_set\"],\n", + " test_set = preprocess_task.outputs[\"test_set\"],\n", + " mlflow_experiment_name=mlflow_experiment_name,\n", + " mlflow_tracking_uri=mlflow_tracking_uri,\n", + " mlflow_s3_endpoint_url=mlflow_s3_endpoint_url,\n", + " model_name=model_name,\n", + " label_attribute = label_attribute,\n", + " sensitive_attributes = sensitive_attributes,\n", + " privilaged_groups = privilaged_groups,\n", + " unprivilaged_groups = unprivilaged_groups\n", + " )\n", + " \n", + " train_task.apply(use_aws_secret(secret_name=\"aws-secret\"))\n", + "\n", + " evaluate_trask = evaluate(\n", + " run_id=train_task.outputs[\"run_id\"],\n", + " mlflow_tracking_uri=mlflow_tracking_uri,\n", + "
threshold_metrics=threshold_metrics\n", + " )\n", + " \n", + " eval_passed = evaluate_trask.output\n", + "\n", + " with dsl.Condition(eval_passed == \"true\"):\n", + " deploy_model_task = deploy_model(\n", + " model_name=model_name,\n", + " storage_uri=train_task.outputs[\"storage_uri\"],\n", + " )\n", + "\n", + " inference_task = inference(\n", + " model_name=model_name\n", + " )\n", + " \n", + " inference_task.after(deploy_model_task)" + ] + }, + { + "cell_type": "markdown", + "id": "fab72374", + "metadata": {}, + "source": [ + "# Arguments" + ] + }, + { + "cell_type": "markdown", + "id": "5fecf692", + "metadata": {}, + "source": [ + "Here we define the arguments used by the pipeline. Notice how the data is split with a 0.5-0.3-0.2 ratio into train, test, and indrift datasets. We also see how AIF360 defines the compared groups in the privilaged_groups and unprivilaged_groups values. We can change the state and year if we want this pipeline to use other PUMS data. Just pick any state listed at https://www.bls.gov/respondents/mwr/electronic-data-interchange/appendix-d-usps-state-abbreviations-and-fips-codes.htm and choose a year in the range [2014,2018]. \n", + "\n", + "However, the memory requirements will increase since folktables will download the data, and then KFP needs to make a new artifact. KFP may also sometimes create separate artifacts for the same data, but this has not happened in this pipeline setup. If you want to check created artifacts, go to the KFP dashboard, and click artifacts and unknown.\n", + "\n", + "If the pipeline, as seen in the KFP dashboard, gets an error in the inference step, you must either change the model_name given in the arguments to something else or remove some of the existing inference services. The latter can be done by running 'kubectl get isvc -n kserve-inference' and using the listed names in 'kubectl -n kserve-inference delete isvc (model_name)'.
You can check these and other existing services with 'kubectl get services -A'. " + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "14ce0672", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:53:03.397382274Z", + "start_time": "2023-07-03T06:53:03.391544911Z" + } + }, + "outputs": [], + "source": [ + "# If we want the pipeline to obey certain thresholds, we can set them here\n", + "eval_threshold_metrics = {'M_Acc': 0.60}\n", + "\n", + "arguments = {\n", + " \"state\": \"CA\",\n", + " \"year\": 2014,\n", + " \"label_attribute\": \"PINCP\",\n", + " \"sensitive_attributes\": [\"AGEP\",\"SEX\",\"RAC1P\"],\n", + " \"splits\": [0.5,0.3,0.2],\n", + " \"group_thresholds\": [50,1,1],\n", + " \"privilaged_groups\": [{\"AGEP\":1, \"SEX\": 1, \"RAC1P\": 1}],\n", + " \"unprivilaged_groups\": [{\"AGEP\":0, \"SEX\": 0, \"RAC1P\": 0}],\n", + " \"mlflow_tracking_uri\": \"http://mlflow.mlflow.svc.cluster.local:5000\",\n", + " \"mlflow_s3_endpoint_url\": \"http://mlflow-minio-service.mlflow.svc.cluster.local:9000\",\n", + " \"mlflow_experiment_name\": \"demo-aif-notebook\",\n", + " \"model_name\": \"demo-aif-lr\",\n", + " \"threshold_metrics\": eval_threshold_metrics\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "7faba84c", + "metadata": {}, + "source": [ + "# Submit run" + ] + }, + { + "cell_type": "markdown", + "id": "7fdd08ad", + "metadata": {}, + "source": [ + "This block enables running the constructed pipeline. If you want to update the pipeline, rerun the code you have changed and then rerun this block. For some reason, running this KFP pipeline creates temporary files of unknown origin with a size range of [2,5] GB, which enable_caching, metadata writing, artifacts, or a suboptimal Docker configuration might cause.
On their own, these files are not a problem. However, if the root file system has, for example, 105 GB of disk space with only around 20 GB free because it is the default location for all kinds of software, you need to either delete these files manually (on Ubuntu 22.04, likely locations based on modification dates are /tmp and /run) or restart the computer after running KFP around 4-5 times to continue rerunning the pipeline. Checking your computer's free disk space before and after running the KFP pipeline is therefore recommended. It might also be worthwhile to check how much storage MinIO is using with the following port forwards:\n", + "\n", + "- MLflow MinIO = kubectl -n mlflow port-forward svc/mlflow-minio-service 9001:9001 (user and password are minioadmin)\n", + "- KFP MinIO = kubectl port-forward -n kubeflow svc/minio-service 9000:9000 (user is minio and password is minio123)\n", + "\n", + "The localhost URLs are:\n", + "\n", + "- MLflow MinIO = http://localhost:9001/\n", + "- KFP MinIO = http://localhost:9000/" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "2d98ed2b", + "metadata": { + "ExecuteTime": { + "end_time": "2023-07-03T06:53:06.478490268Z", + "start_time": "2023-07-03T06:53:06.091486547Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": "", + "text/html": "Experiment details." + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": "", + "text/html": "Run details."
+ }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": "RunPipelineResult(run_id=94fdfe7d-41f5-4e77-b9b3-80395abbca38)" + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "run_name = \"demo-aif-run\"\n", + "experiment_name = \"demo-aif-experiment\"\n", + "\n", + "client.create_run_from_pipeline_func(\n", + " pipeline_func=pipeline,\n", + " run_name=run_name,\n", + " experiment_name=experiment_name,\n", + " arguments=arguments,\n", + " mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE,\n", + " enable_caching=True,\n", + " namespace=\"kubeflow-user-example-com\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "8fb432f9", + "metadata": {}, + "source": [ + "# Demonstration confirmation" + ] + }, + { + "cell_type": "markdown", + "id": "55974312", + "metadata": {}, + "source": [ + "If the previous block did not create any errors, check how the run goes by port forwarding KFP, MLFlow, Pushgateway, Prometheus, and Grafana. Since KFP and Prometheus ports are the same, I recommend first waiting for the pipeline to run in the KFP dashboard, shutting it down, and then port forwarding Prometheus. Same for Grafana, since it uses the same port as MLFlow. 
Here are the commands required for these port forwards:\n", + "\n", + "- KFP = kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80\n", + "- MLflow = kubectl -n mlflow port-forward svc/mlflow 5000:5000\n", + "- Pushgateway = kubectl port-forward svc/prometheus-pushgateway 9091 -n monitoring\n", + "- Prometheus = kubectl port-forward svc/prometheus-service 8080 -n monitoring\n", + "- Grafana = kubectl port-forward svc/grafana 5000:3000 --namespace monitoring\n", + "\n", + "The localhost URLs are:\n", + "\n", + "- KFP = http://localhost:8080\n", + "- MLflow = http://localhost:5000/#/\n", + "- Pushgateway = http://localhost:9091/\n", + "- Prometheus = http://localhost:8080/alerts\n", + "- Grafana = http://localhost:5000/ (user and password are admin)\n", + "\n", + "A sign that everything went fine with the pipeline is that the run 'demo-aif-run' found in runs is green in KFP, the MLflow experiment 'demo-aif-notebook' shows numbers in all of the metric columns, Pushgateway has a job named metrics with scores for all the cases seen in the code, Prometheus alerts are working, and the Grafana dashboard shows different numbers. Demonstration-wise, you have reached the end, but we will still review Scaphandre metrics and demonstration debugging. " + ] + }, + { + "cell_type": "markdown", + "id": "9ddc36d9", + "metadata": {}, + "source": [ + "# Scaphandre metrics" + ] + }, + { + "cell_type": "markdown", + "id": "951b7aed", + "metadata": {}, + "source": [ + "The relevant energy consumption metrics in Prometheus and Grafana queries, as described in https://hubblo-org.github.io/scaphandre-documentation/references/exporter-prometheus.html, are:\n", + "\n", + "- scaph_host_power_microwatts\n", + "- scaph_process_power_consumption_microwatts (This shows all cluster processes)\n", + "- scaph_host_energy_microjoule\n", + "- scaph_socket_power_microwatts\n", + "\n", + "We can filter these metrics by giving suitable labels inside {} and using regex.
For example, if we want to get the power consumption of the whole cluster in watts, we need to write the query 'sum(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\"}) / 1000000' either in Prometheus or Grafana. The label \"Helm\" is used because the same metrics are duplicated in the current configuration, while dividing by 1000000 converts microwatts into watts.\n", + "\n", + "If we want to get more granular, like the power consumption of Prometheus, we need to write the query scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", exe=\"prometheus\"} / 1000000. Similarly, we can get the KFP (except for inference steps) power consumption with the query sum(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline =~\".*kfp.*\"}) / 1000000 by using regex.\n", + "\n", + "Other interesting labels like kubernetes_namespace only show default because Scaphandre most likely isn't configured to get the cluster namespaces. The instance label could provide even better granularity if the cluster IP addresses are stable. The most specific label is PID, which is, unfortunately, unique for all processes. Another option is the cmdline label, which allows identifying KFP components (except for inference steps) through the landmark variables placed in their code. So, for example, the power consumption of running the code for every component except inference can be queried with sum(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline =~\".*landmark.*\"}) / 1000000.
By giving more specifics, we only need the following filters for all components except inference:\n", + "\n", + "- Pull data = cmdline=~\".*(pull_data_component|python3-mpipinstall--quiet--no-warn-script-locationpandas~=1.4.2numpyfolktableskfp==1.8.22).*\"\n", + "- Preprocess = cmdline=~\".*(preprocess_component|python3-mpipinstall--quiet--no-warn-script-locationpandas~=1.4.2scikit-learn~=1.0.2numpykfp==1.8.22).*\"\n", + "- Train = cmdline=~\".*(train_component|python3-mpipinstall--quiet--no-warn-script-locationnumpypandas~=1.4.2aif360scikit-learn~=1.0.2mlflow~=1.25.0boto3~=1.21.0kfp==1.8.22).*\"\n", + "- Evaluate = cmdline=~\".*(evaluate_component|python3-mpipinstall--quiet--no-warn-script-locationnumpymlflow~=1.25.0prometheus_clientkfp==1.8.22).*\"\n", + "- Deploy = cmdline=~\".*(deploy_model_component|python3-mpipinstall--quiet--no-warn-script-locationkservekfp==1.8.22).*\"\n", + "\n", + "It is unknown why this matching technique does not work for inference services. To find them, we must first use regex negations to isolate KFP pipeline process runs.
This can be done with the following filters:\n", + "\n", + "- app_kubernetes_io_managed_by=\"Helm\"\n", + "- cmdline!~\".*(bin/|app/|conf/|--loglevelinfo|scaphandreprometheus--port8081).*\" \n", + "- exe!~\".*(containerd-shim|nginx|postgres|sleep|workflow-contro|pause|minio|grafana-server|systemd-journal|manager|etcd|kube-apiserver|kube-controller|kube-scheduler|local-path-prov|mysqld|node|persistence_age).*\"\n", + "\n", + "Now, we can add further negations to isolate inference service related processes with the following filters:\n", + "\n", + "- app_kubernetes_io_managed_by=\"Helm\"\n", + "- cmdline!~\".*(bin/|app/|conf/|--loglevelinfo|scaphandreprometheus--port8081|preprocess|pull_data|train|evaluate|deploy_model|numpy|python3-mpipinstall--quiet--no-warn-script-locationkservekfp==1.8.22|metadata|msklearnserver|python3server.py).*\"\n", + "- exe!~\".*(containerd-shim|nginx|postgres|sleep|workflow-contro|pause|minio|grafana-server|systemd-journal|manager|etcd|kube-apiserver|kube-controller|kube-scheduler|local-path-prov|mysqld|node|persistence_age).*\"\n", + "\n", + "Unfortunately, these filters are not code-agnostic. They use code-specific regex matches, which means the substring negations must be updated whenever the code is modified. This problem can be fixed by properly configuring Scaphandre in the cluster; the current setup is awkward because it requires manual actions and cannot provide more relevant label metadata. Additionally, in the Grafana plots the sum(sum_over_time({}[1d])) / (1000000 * 1000 * 24) only approximates the cumulative daily energy consumption per hour without accounting for possible downtime, which is why sum_over_time(avg_over_time(avg())) might be a better option."
+ ] + }, + { + "cell_type": "markdown", + "id": "6b624073", + "metadata": {}, + "source": [ + "# Demonstration debugging" + ] + }, + { + "cell_type": "markdown", + "id": "d9f69b5e", + "metadata": {}, + "source": [ + "If the run did create some errors, the first place to check for errors is the KFP logs for the components, which can be found by going into the runs, clicking the latest experiment, clicking the component with a red error, and finding its log tab. Usually, the error is caused by a coding error, so after fixing it, rerun the modified components and start the experiment.\n", + "\n", + "If this doesn't solve the issue, use the tests for the cluster to check if the parts are green. Notice that the cluster is fine as long as the component test gives cluster ready and the number of passed tests is either 36 or 37. The latter case can be caused by the website storing the test data being down. There might also be other errors, but as long as KFP is capable of running, these can be ignored. The tests are: \n", + "\n", + "- python tests/wait_deployment_ready.py --timeout 30 (virtual environment recommended and OSS root directory)\n", + "- pytest (virtual environment recommended)\n", + "\n", + "If the tests fail, removing the cluster and reinstalling everything is usually easier. A more surgical approach is to use kubectl to check the logs of pods in the given namespace and configure the used YAMLs to fix the issue. When the modified YAMLs have been saved, just apply them and then rollout restart the pods. It is recommended to stop any dashboards before doing this.
The required commands for cluster removal and kubectl fixing are:\n", + "\n", + "Cluster removal:\n", + "- Cluster deletion = kind delete cluster --name kind-ep\n", + "- Registry deletion = docker rm -f $(docker ps -aqf \"name=kind-registry\")\n", + "\n", + "Optional docker clean up:\n", + "- Show docker configuration = docker info\n", + "- Show containers = docker ps\n", + "- Show all containers = docker ps -a\n", + "- Delete containers = docker system prune (Be specific if you have other containers)\n", + "- Show all images = docker images -a\n", + "- Remove all images = docker image prune -a (Be specific if you have other images)\n", + "\n", + "Kubectl fixing:\n", + "- Show pods of monitoring namespace = kubectl get pods -n monitoring\n", + "- Show deployment of monitoring namespace = kubectl get deployment -n monitoring\n", + "- Apply YAMLs for monitoring = kubectl apply -k deployment/monitoring (OSS root directory)\n", + "- Show logs of a pod in monitoring namespace = kubectl logs (pod ID) -n monitoring\n", + "- Restart prometheus of monitoring namespace = kubectl rollout restart deployment prometheus-deployment -n monitoring\n", + "\n", + "As a final note, it is highly recommended that the Docker root directory (Docker Root Dir in docker info) is located in a place that does not take disk space from critical programs like the operating system. Docker and KFP produce a lot of data, which in the worst case takes so much disk space that the OS cannot start up normally without the help of IT support. To lessen the impact of these mishaps, it's always good practice to keep up-to-date backups of your work in GitHub or any other cloud service of your choice."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b7882f83", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/fairness-alert-prometheus-config-map.yaml b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/fairness-alert-prometheus-config-map.yaml new file mode 100644 index 0000000..fa140c4 --- /dev/null +++ b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/fairness-alert-prometheus-config-map.yaml @@ -0,0 +1,283 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: prometheus-server-conf + labels: + name: prometheus-server-conf +data: + prometheus.rules: |- + groups: + - name: devopscube demo alert + rules: + - alert: High Pod Memory + expr: sum(container_memory_usage_bytes) > 1 + for: 1m + labels: + severity: slack + annotations: + summary: High Memory Usage + - alert: Training Data Disparate Impact + expr: Tr_D_DI{job="kubernetes-service-endpoints"} < 0.8 or Tr_D_DI{job="kubernetes-service-endpoints"} > 1.2 + for: 1m + labels: + severity: slack + annotations: + summary: Disparate impact unfairness + - alert: Training Data Statistical Parity + expr: Tr_D_SP{job="kubernetes-service-endpoints"} < -0.2 or Tr_D_SP{job="kubernetes-service-endpoints"} > 0.2 + for: 1m + labels: + severity: slack + annotations: + summary: Statistical parity unfairness + - alert: Test Data Disparate Impact + expr: Te_D_DI{job="kubernetes-service-endpoints"} < 0.8 or Te_D_DI{job="kubernetes-service-endpoints"} > 1.2 + for: 1m + labels: + 
severity: slack + annotations: + summary: Disparate impact unfairness + - alert: Test Data Statistical Parity + expr: Te_D_SP{job="kubernetes-service-endpoints"} < -0.2 or Te_D_SP{job="kubernetes-service-endpoints"} > 0.2 + for: 1m + labels: + severity: slack + annotations: + summary: Statistical parity unfairness + - alert: Model Accuracy + expr: M_Acc{job="kubernetes-service-endpoints"} < 0.7 + for: 1m + labels: + severity: slack + annotations: + summary: Low Accuracy + - alert: Model Disparate Impact + expr: M_DI{job="kubernetes-service-endpoints"} < 0.8 or M_DI{job="kubernetes-service-endpoints"} > 1.2 + for: 1m + labels: + severity: slack + annotations: + summary: Disparate impact unfairness + - alert: Model Statistical Parity + expr: M_SP{job="kubernetes-service-endpoints"} < -0.2 or M_SP{job="kubernetes-service-endpoints"} > 0.2 + for: 1m + labels: + severity: slack + annotations: + summary: Statistical parity unfairness + - alert: Model Average Odds Difference + expr: M_AOD{job="kubernetes-service-endpoints"} < -0.1 or M_AOD{job="kubernetes-service-endpoints"} > 0.1 + for: 1m + labels: + severity: slack + annotations: + summary: Average odds difference unfairness + - alert: Model Equal Opportunity Difference + expr: M_EOD{job="kubernetes-service-endpoints"} < -0.2 or M_EOD{job="kubernetes-service-endpoints"} > 0.2 + for: 1m + labels: + severity: slack + annotations: + summary: Equal opportunity difference unfairness + - alert: Model Theil Index + expr: M_TI{job="kubernetes-service-endpoints"} > 0.05 + for: 1m + labels: + severity: slack + annotations: + summary: Theil index unfairness + prometheus.yml: |- + # my global config + global: + scrape_interval: 15s # By default, scrape targets every 15 seconds. + evaluation_interval: 15s # By default, evaluate rules every 15 seconds. + # scrape_timeout is set to the global default (10s). + + # Load and evaluate rules in this file every 'evaluation_interval' seconds.
+ rule_files: + # - 'alert.rules' + # - "first.rules" + # - "second.rules" + - /etc/prometheus/prometheus.rules + + alerting: + alertmanagers: + - scheme: http + static_configs: + - targets: + - "alertmanager.monitoring.svc:9093" + # - "alertmanager:9093" + + # A scrape configuration containing exactly one endpoint to scrape: + scrape_configs: + # Here it's Prometheus itself. + # The job name is added as a label `job=` to any timeseries scraped from this config. + + - job_name: 'prometheus' + + # Override the global default and scrape targets from this job every 5 seconds. + scrape_interval: 5s + + static_configs: + - targets: ['localhost:9090'] + + #- job_name: 'prometheus-pushgateway' + # scrape_interval: 10s + # static_configs: + # - targets: ['prometheus-pushgateway.monitoring.svc.cluster.local:9091'] + + - job_name: 'k8services' + kubernetes_sd_configs: + - role: endpoints + relabel_configs: + - source_labels: + - __meta_kubernetes_namespace + - __meta_kubernetes_service_name + action: drop + regex: default;kubernetes + - source_labels: + - __meta_kubernetes_namespace + regex: default + action: keep + - source_labels: [__meta_kubernetes_service_name] + target_label: job + + - job_name: 'k8pods' + kubernetes_sd_configs: + - role: pod + relabel_configs: + - source_labels: [__meta_kubernetes_pod_container_port_name] + regex: metrics + action: keep + - source_labels: [__meta_kubernetes_pod_container_name] + target_label: job + + - job_name: 'node-exporter' + kubernetes_sd_configs: + - role: endpoints + relabel_configs: + - source_labels: [__meta_kubernetes_endpoints_name] + regex: 'node-exporter' + action: keep + + # Collects all the metrics from the API servers + - job_name: 'kubernetes-apiservers' + + kubernetes_sd_configs: + - role: endpoints + scheme: https + + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + + relabel_configs: + - source_labels: 
[__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] + action: keep + regex: default;kubernetes;https + + # Collects all the kubernetes node metrics + - job_name: 'kubernetes-nodes' + + scheme: https + + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + + kubernetes_sd_configs: + - role: node + + relabel_configs: + - action: labelmap + regex: __meta_kubernetes_node_label_(.+) + - target_label: __address__ + replacement: kubernetes.default.svc:443 + - source_labels: [__meta_kubernetes_node_name] + regex: (.+) + target_label: __metrics_path__ + replacement: /api/v1/nodes/${1}/proxy/metrics + + # Pod metrics will get discovered and scraped if the pod metadata is annotated + # with prometheus.io/scrape and prometheus.io/port annotations. + - job_name: 'kubernetes-pods' + + kubernetes_sd_configs: + - role: pod + + relabel_configs: + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] + action: keep + regex: true + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] + action: replace + target_label: __metrics_path__ + regex: (.+) + - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] + action: replace + regex: ([^:]+)(?::\d+)?;(\d+) + replacement: $1:$2 + target_label: __address__ + - action: labelmap + regex: __meta_kubernetes_pod_label_(.+) + - source_labels: [__meta_kubernetes_namespace] + action: replace + target_label: kubernetes_namespace + - source_labels: [__meta_kubernetes_pod_name] + action: replace + target_label: kubernetes_pod_name + + # Collects all cAdvisor metrics + - job_name: 'kubernetes-cadvisor' + + scheme: https + + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + + kubernetes_sd_configs: + - role: node + + relabel_configs: + - action: 
labelmap + regex: __meta_kubernetes_node_label_(.+) + - target_label: __address__ + replacement: kubernetes.default.svc:443 + - source_labels: [__meta_kubernetes_node_name] + regex: (.+) + target_label: __metrics_path__ + replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor + + # Scrapes Service endpoints if the service metadata is annotated + # with prometheus.io/scrape and prometheus.io/port annotations + - job_name: 'kubernetes-service-endpoints' + + kubernetes_sd_configs: + - role: endpoints + + relabel_configs: + - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] + action: keep + regex: true + - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] + action: replace + target_label: __scheme__ + regex: (https?) + - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] + action: replace + target_label: __metrics_path__ + regex: (.+) + - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] + action: replace + target_label: __address__ + regex: ([^:]+)(?::\d+)?;(\d+) + replacement: $1:$2 + - action: labelmap + regex: __meta_kubernetes_service_label_(.+) + - source_labels: [__meta_kubernetes_namespace] + action: replace + target_label: kubernetes_namespace + - source_labels: [__meta_kubernetes_service_name] + action: replace + target_label: kubernetes_name diff --git a/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/grafana_fairness_and_consumption_monitoring_1.json b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/grafana_fairness_and_consumption_monitoring_1.json new file mode 100644 index 0000000..85888b0 --- /dev/null +++ b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/grafana_fairness_and_consumption_monitoring_1.json @@ -0,0 +1,1080 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": "-- Grafana --", + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & 
Alerts", + "target": { + "limit": 100, + "matchAny": false, + "tags": [], + "type": "dashboard" + }, + "type": "dashboard" + } + ] + }, + "editable": true, + "fiscalYearStartMonth": 0, + "graphTooltip": 0, + "id": 4, + "links": [], + "liveNow": false, + "panels": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 0 + }, + "id": 8, + "options": { + "alertInstanceLabelFilter": "", + "alertName": "", + "dashboardAlerts": false, + "groupBy": [], + "groupMode": "default", + "maxItems": 20, + "sortOrder": 1, + "stateFilter": { + "error": true, + "firing": true, + "inactive": false, + "noData": false, + "normal": false, + "pending": true + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "", + "interval": "", + "legendFormat": "", + "refId": "A" + } + ], + "title": "Prometheus Alerts", + "type": "alertlist" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "mappings": [ + { + "options": { + "pattern": "#[a-zA-Z ]+", + "result": { + "index": 0, + "text": "$1" + } + }, + "type": "regex" + } + ], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "Value #Accuracy" + }, + "properties": [ + { + "id": "displayName", + "value": "Accuracy" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #True positives" + }, + "properties": [ + { + "id": "displayName", + "value": "True positives" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #False positives" + }, + "properties": [ + { + "id": "displayName", + "value": "False positives" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #True negatives" + }, + "properties": [ + { + 
"id": "displayName", + "value": "True negatives" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #False negatives" + }, + "properties": [ + { + "id": "displayName", + "value": "False negatives" + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 0 + }, + "id": 6, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "fields": "", + "values": false + }, + "textMode": "auto" + }, + "pluginVersion": "8.4.3", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_Acc", + "format": "table", + "interval": "", + "legendFormat": "", + "refId": "Accuracy" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_TP", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "True positives" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_FP", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "False positives" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_TN", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "True negatives" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_FN", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "False negatives" + } + ], + "title": "Model Performance", + "transformations": [], + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "mappings": [], + "thresholds": { + "mode": 
"absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "Value #Training set disparate impact" + }, + "properties": [ + { + "id": "displayName", + "value": "Training set disparate impact" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Test set disparate impact" + }, + "properties": [ + { + "id": "displayName", + "value": "Test set disparate impact" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Training set statistical parity" + }, + "properties": [ + { + "id": "displayName", + "value": "Training set statistical parity" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Test set statistical parity" + }, + "properties": [ + { + "id": "displayName", + "value": "Test set statistical parity" + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 8 + }, + "id": 4, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "fields": "", + "values": false + }, + "text": {}, + "textMode": "auto" + }, + "pluginVersion": "8.4.3", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "Tr_D_DI", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Training set disparate impact" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "Te_D_DI", + "format": "table", + "interval": "", + "legendFormat": "", + "refId": "Test set disparate impact" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "Tr_D_SP", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Training set statistical parity" + }, + { + "datasource": { + "type": 
"prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "Te_D_SP", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Test set statistical parity" + } + ], + "title": "Dataset Fairness Metrics", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [ + { + "matcher": { + "id": "byName", + "options": "Value #Balanced accuracy" + }, + "properties": [ + { + "id": "displayName", + "value": "Balanced accuracy" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Disparate impact" + }, + "properties": [ + { + "id": "displayName", + "value": "Disparate impact" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Statistical parity" + }, + "properties": [ + { + "id": "displayName", + "value": "Statistical parity" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Equal odds" + }, + "properties": [ + { + "id": "displayName", + "value": "Equal odds" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Average odds" + }, + "properties": [ + { + "id": "displayName", + "value": "Average odds" + } + ] + }, + { + "matcher": { + "id": "byName", + "options": "Value #Theil index" + }, + "properties": [ + { + "id": "displayName", + "value": "Theil index" + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 8 + }, + "id": 2, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "center", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "fields": "", + "values": false + }, + "text": { + "titleSize": 15, + "valueSize": 40 + }, + "textMode": "auto" + }, + "pluginVersion": "8.4.3", + "targets": [ + { + "datasource": 
{ + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_BA", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Balanced accuracy" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_DI", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Disparate impact" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_SP", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Statistical parity" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_EOD", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Equal odds" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_AOD", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Average odds" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "M_TI", + "format": "table", + "hide": false, + "interval": "", + "legendFormat": "", + "refId": "Theil index" + } + ], + "title": "Model Fairness Metrics", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisGridShow": true, + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + 
"stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "watt" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 16 + }, + "id": 10, + "options": { + "legend": { + "calcs": [], + "displayMode": "hidden", + "placement": "bottom" + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "pluginVersion": "8.4.3", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\"}) / 1000000", + "interval": "", + "legendFormat": "", + "refId": "A" + } + ], + "title": "Cluster Power Consumption", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "kwatth" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 16 + }, + "id": 12, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "fields": "", + "values": false + }, + "textMode": "auto" + }, + "pluginVersion": "8.4.3", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Whole cluster", + "refId": "Whole cluster" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + 
}, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline!~\".*(bin/|app/|conf/|--loglevelinfo|scaphandreprometheus--port8081).*\", exe!~\".*(containerd-shim|nginx|postgres|sleep|workflow-contro|pause|minio|grafana-server|systemd-journal|manager|etcd|kube-apiserver|kube-controller|kube-scheduler|local-path-prov|mysqld|node|persistence_age).*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "KFP", + "refId": "Kubeflow pipeline" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline =~\".*mlflow.*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "MLflow", + "refId": "MLflow" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", exe=\"prometheus\"}[1d])) / (1000000 * 1000 * 24)", + "interval": "", + "legendFormat": "Prometheus", + "refId": "Prometheus" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline =~\".*grafana.*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Grafana", + "refId": "Grafana" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline =~\".*minio.*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Minio", + "refId": "Minio" + } + ], + "title": "Energy 
Consumption of Cluster (1day)", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + }, + "unit": "watt" + }, + "overrides": [ + { + "__systemRef": "hideSeriesFrom", + "matcher": { + "id": "byNames", + "options": { + "mode": "exclude", + "names": [ + "sum(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline!~\".*(bin/|app/|conf/|--loglevelinfo|scaphandreprometheus--port8081).*\", exe!~\".*(containerd-shim|nginx|postgres|sleep|workflow-contro|pause|minio|grafana-server|systemd-journal|manager|etcd|kube-apiserver|kube-controller|kube-scheduler|local-path-prov|mysqld|node|persistence_age).*\"}) / 1000000" + ], + "prefix": "All except:", + "readOnly": true + } + }, + "properties": [ + { + "id": "custom.hideFrom", + "value": { + "legend": false, + "tooltip": false, + "viz": true + } + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 24 + }, + "id": 14, + "options": { + "legend": { + "calcs": [], + "displayMode": "hidden", + "placement": "bottom" + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + 
"expr": "sum(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline!~\".*(bin/|app/|conf/|--loglevelinfo|scaphandreprometheus--port8081).*\", exe!~\".*(containerd-shim|nginx|postgres|sleep|workflow-contro|pause|minio|grafana-server|systemd-journal|manager|etcd|kube-apiserver|kube-controller|kube-scheduler|local-path-prov|mysqld|node|persistence_age).*\"}) / 1000000", + "interval": "", + "legendFormat": "", + "refId": "A" + } + ], + "title": "KFP Power Consumption", + "transformations": [], + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "fieldConfig": { + "defaults": { + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "kwatth" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 24 + }, + "id": 18, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "center", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "fields": "", + "values": false + }, + "textMode": "auto" + }, + "pluginVersion": "8.4.3", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline=~\".*(pull_data_component|python3-mpipinstall--quiet--no-warn-script-locationpandas~=1.4.2numpyfolktableskfp==1.8.22).*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Pull data", + "refId": "Pull data" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", 
cmdline=~\".*(preprocess_component|python3-mpipinstall--quiet--no-warn-script-locationpandas~=1.4.2scikit-learn~=1.0.2numpykfp==1.8.22).*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Preprocess", + "refId": "Preprocess" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline=~\".*(train_component|python3-mpipinstall--quiet--no-warn-script-locationnumpypandas~=1.4.2aif360scikit-learn~=1.0.2mlflow~=1.25.0boto3~=1.21.0kfp==1.8.22).*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Train", + "refId": "Train" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline=~\".*(evaluate_component|python3-mpipinstall--quiet--no-warn-script-locationnumpymlflow~=1.25.0prometheus_clientkfp==1.8.22).*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Evaluate", + "refId": "Evaluate" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", cmdline=~\".*(deploy_model_component|python3-mpipinstall--quiet--no-warn-script-locationkservekfp==1.8.22).*\"}[1d])) / (1000000 * 1000 * 24)", + "hide": false, + "interval": "", + "legendFormat": "Deploy", + "refId": "Deploy" + }, + { + "datasource": { + "type": "prometheus", + "uid": "P1809F7CD0C75ACF3" + }, + "exemplar": true, + "expr": "sum(sum_over_time(scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by=\"Helm\", 
cmdline!~\".*(bin/|app/|conf/|--loglevelinfo|scaphandreprometheus--port8081|preprocess|pull_data|train|evaluate|deploy_model|numpy|python3-mpipinstall--quiet--no-warn-script-locationkservekfp==1.8.22|metadata|msklearnserver|python3server.py).*\", exe!~\".*(containerd-shim|nginx|postgres|sleep|workflow-contro|pause|minio|grafana-server|systemd-journal|manager|etcd|kube-apiserver|kube-controller|kube-scheduler|local-path-prov|mysqld|node|persistence_age).*\"}[1d])) / (1000000 * 1000 * 24) ", + "interval": "", + "legendFormat": "Inference", + "refId": "Inference" + } + ], + "title": "Energy Consumption of KFP Components (1day)", + "type": "stat" + } + ], + "schemaVersion": 35, + "style": "dark", + "tags": [], + "templating": { + "list": [] + }, + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "OSS_Monitoring_1", + "uid": "kGRTQMX4k", + "version": 11, + "weekStart": "" +} \ No newline at end of file diff --git a/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/modified-scaphandre-values.yaml b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/modified-scaphandre-values.yaml new file mode 100644 index 0000000..e8dbc08 --- /dev/null +++ b/tutorials/demo_notebooks/demo_fairness_and_energy_monitoring/modified-scaphandre-values.yaml @@ -0,0 +1,24 @@ +image: + name: hubblo/scaphandre + tag: latest + +# original 8080 +port: 8081 + +resources: + limits: + memory: 75Mi + requests: + cpu: 75m + memory: 50Mi + +scaphandre: + command: prometheus + args: {} + extraArgs: + containers: true +# rustBacktrace: '1' + +# Run as root user to get proper permissions +userID: 0 +groupID: 0 diff --git a/tutorials/demo_notebooks/demo_pipeline/README.md b/tutorials/demo_notebooks/demo_pipeline/README.md new file mode 100644 index 0000000..305b719 --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/README.md @@ -0,0 +1,7 @@ +# Demo pipeline + +Jupyter notebook with a demo pipeline that uses the installed Kubeflow 
Pipelines, MLflow and KServe components. + +
+ +![Pipeline Graph](graph.png) \ No newline at end of file diff --git a/tutorials/demo_notebooks/demo_pipeline/components/.gitkeep b/tutorials/demo_notebooks/demo_pipeline/components/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/tutorials/demo_notebooks/demo_pipeline/components/deploy_model_component.yaml b/tutorials/demo_notebooks/demo_pipeline/components/deploy_model_component.yaml new file mode 100644 index 0000000..b1b0c1c --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/components/deploy_model_component.yaml @@ -0,0 +1,82 @@ +name: Deploy model +description: Deploy the model as an inference service with KServe. +inputs: +- {name: model_name, type: String} +- {name: storage_uri, type: String} +implementation: + container: + image: python:3.9 + command: + - sh + - -c + - |2 + + if ! [ -x "$(command -v pip)" ]; then + python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip + fi + + PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kserve' 'kfp==1.8.22' && "$0" "$@" + - sh + - -ec + - | + program_path=$(mktemp -d) + printf "%s" "$0" > "$program_path/ephemeral_component.py" + python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" + - |2+ + + import kfp + from kfp.v2 import dsl + from kfp.v2.dsl import * + from typing import * + + def deploy_model(model_name: str, storage_uri: str): + """ + Deploy the model as an inference service with KServe.
+ """ + import logging + from kubernetes import client + from kserve import KServeClient + from kserve import constants + from kserve import utils + from kserve import V1beta1InferenceService + from kserve import V1beta1InferenceServiceSpec + from kserve import V1beta1PredictorSpec + from kserve import V1beta1SKLearnSpec + + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + model_uri = f"{storage_uri}/{model_name}" + logger.info(f"MODEL URI: {model_uri}") + + # namespace = 'kserve-inference' + namespace = utils.get_default_target_namespace() + kserve_version='v1beta1' + api_version = constants.KSERVE_GROUP + '/' + kserve_version + + + isvc = V1beta1InferenceService( + api_version=api_version, + kind=constants.KSERVE_KIND, + metadata=client.V1ObjectMeta( + name=model_name, + namespace=namespace, + annotations={'sidecar.istio.io/inject':'false'} + ), + spec=V1beta1InferenceServiceSpec( + predictor=V1beta1PredictorSpec( + service_account_name="kserve-sa", + sklearn=V1beta1SKLearnSpec( + storage_uri=model_uri + ) + ) + ) + ) + KServe = KServeClient() + KServe.create(isvc) + + args: + - --executor_input + - {executorInput: null} + - --function_to_execute + - deploy_model diff --git a/tutorials/demo_notebooks/demo_pipeline/components/evaluate_component.yaml b/tutorials/demo_notebooks/demo_pipeline/components/evaluate_component.yaml new file mode 100644 index 0000000..64b07c3 --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/components/evaluate_component.yaml @@ -0,0 +1,74 @@ +name: Evaluate +description: 'Evaluate component: Compares metrics from training with given thresholds.' 
+inputs: +- {name: run_id, type: String, description: ' MLflow run ID'} +- {name: mlflow_tracking_uri, type: String, description: MLflow tracking URI} +- {name: threshold_metrics, type: JsonObject, description: Maximum threshold values + for each metric} +outputs: +- {name: Output, type: Boolean} +implementation: + container: + image: python:3.10 + command: + - sh + - -c + - |2 + + if ! [ -x "$(command -v pip)" ]; then + python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip + fi + + PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'numpy' 'mlflow~=2.4.1' 'kfp==1.8.22' && "$0" "$@" + - sh + - -ec + - | + program_path=$(mktemp -d) + printf "%s" "$0" > "$program_path/ephemeral_component.py" + python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" + - |2+ + + import kfp + from kfp.v2 import dsl + from kfp.v2.dsl import * + from typing import * + + def evaluate( + run_id: str, + mlflow_tracking_uri: str, + threshold_metrics: dict + ) -> bool: + """ + Evaluate component: Compares metrics from training with given thresholds. + + Args: + run_id (string): MLflow run ID + mlflow_tracking_uri (string): MLflow tracking URI + threshold_metrics (dict): Maximum threshold values for each metric + Returns: + Bool indicating whether evaluation passed or failed. + """ + from mlflow.tracking import MlflowClient + import logging + + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + client = MlflowClient(tracking_uri=mlflow_tracking_uri) + info = client.get_run(run_id) + training_metrics = info.data.metrics + + logger.info(f"Training metrics: {training_metrics}") + + # compare the evaluation metrics with the defined thresholds + for key, value in threshold_metrics.items(): + if key not in training_metrics or training_metrics[key] > value: + logger.error(f"Metric {key} failed.
Evaluation not passed!") + return False + return True + + args: + - --executor_input + - {executorInput: null} + - --function_to_execute + - evaluate diff --git a/tutorials/demo_notebooks/demo_pipeline/components/inference_component.yaml b/tutorials/demo_notebooks/demo_pipeline/components/inference_component.yaml new file mode 100644 index 0000000..2897fcd --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/components/inference_component.yaml @@ -0,0 +1,84 @@ +name: Inference +description: Test inference. +inputs: +- {name: model_name, type: String} +- {name: scaler_in, type: Artifact} +implementation: + container: + image: python:3.9 + command: + - sh + - -c + - |2 + + if ! [ -x "$(command -v pip)" ]; then + python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip + fi + + PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kserve' 'scikit-learn~=1.0.2' 'kfp==1.8.22' && "$0" "$@" + - sh + - -ec + - | + program_path=$(mktemp -d) + printf "%s" "$0" > "$program_path/ephemeral_component.py" + python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" + - |2+ + + import kfp + from kfp.v2 import dsl + from kfp.v2.dsl import * + from typing import * + + def inference( + model_name: str, + scaler_in: Input[Artifact] + ): + """ + Test inference. 
+ """ + from kserve import KServeClient + import requests + import pickle + import logging + from kserve import utils + + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + namespace = utils.get_default_target_namespace() + + input_sample = [[5.6, 0.54, 0.04, 1.7, 0.049, 5, 13, 0.9942, 3.72, 0.58, 11.4], + [11.3, 0.34, 0.45, 2, 0.082, 6, 15, 0.9988, 2.94, 0.66, 9.2]] + + logger.info(f"Loading standard scaler from: {scaler_in.path}") + with open(scaler_in.path, 'rb') as fp: + scaler = pickle.load(fp) + + logger.info(f"Standardizing sample: {input_sample}") + input_sample = scaler.transform(input_sample) + + # get inference service + KServe = KServeClient() + + # wait for deployment to be ready + KServe.get(model_name, namespace=namespace, watch=True, timeout_seconds=120) + + inference_service = KServe.get(model_name, namespace=namespace) + print(inference_service) + is_url = inference_service['status']['address']['url'] + + logger.info(f"\nInference service status:\n{inference_service['status']}") + logger.info(f"\nInference service URL:\n{is_url}\n") + + inference_input = { + 'instances': input_sample.tolist() + } + + response = requests.post(is_url, json=inference_input) + logger.info(f"\nPrediction response:\n{response.text}\n") + + args: + - --executor_input + - {executorInput: null} + - --function_to_execute + - inference diff --git a/tutorials/demo_notebooks/demo_pipeline/components/preprocess_component.yaml b/tutorials/demo_notebooks/demo_pipeline/components/preprocess_component.yaml new file mode 100644 index 0000000..a333ab9 --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/components/preprocess_component.yaml @@ -0,0 +1,71 @@ +name: Preprocess +description: Preprocess component.
+inputs: +- {name: data, type: Dataset} +- {name: target, type: String, default: quality, optional: true} +outputs: +- {name: scaler_out, type: Artifact} +- {name: train_set, type: Dataset} +- {name: test_set, type: Dataset} +implementation: + container: + image: python:3.10 + command: + - sh + - -c + - |2 + + if ! [ -x "$(command -v pip)" ]; then + python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip + fi + + PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'pandas~=1.4.2' 'scikit-learn~=1.0.2' 'kfp==1.8.22' && "$0" "$@" + - sh + - -ec + - | + program_path=$(mktemp -d) + printf "%s" "$0" > "$program_path/ephemeral_component.py" + python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" + - |2+ + + import kfp + from kfp.v2 import dsl + from kfp.v2.dsl import * + from typing import * + + def preprocess( + data: Input[Dataset], + scaler_out: Output[Artifact], + train_set: Output[Dataset], + test_set: Output[Dataset], + target: str = "quality", + ): + """ + Preprocess component. + """ + import pandas as pd + import pickle + from sklearn.model_selection import train_test_split + from sklearn.preprocessing import StandardScaler + + data = pd.read_csv(data.path) + + # Split the data into training and test sets. (0.75, 0.25) split. 
+ train, test = train_test_split(data) + + scaler = StandardScaler() + + train[train.drop(target, axis=1).columns] = scaler.fit_transform(train.drop(target, axis=1)) + test[test.drop(target, axis=1).columns] = scaler.transform(test.drop(target, axis=1)) + + with open(scaler_out.path, 'wb') as fp: + pickle.dump(scaler, fp, pickle.HIGHEST_PROTOCOL) + + train.to_csv(train_set.path, index=None) + test.to_csv(test_set.path, index=None) + + args: + - --executor_input + - {executorInput: null} + - --function_to_execute + - preprocess diff --git a/tutorials/demo_notebooks/demo_pipeline/components/pull_data_component.yaml b/tutorials/demo_notebooks/demo_pipeline/components/pull_data_component.yaml new file mode 100644 index 0000000..3379f42 --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/components/pull_data_component.yaml @@ -0,0 +1,46 @@ +name: Pull data +description: Pull data component. +inputs: +- {name: url, type: String} +outputs: +- {name: data, type: Dataset} +implementation: + container: + image: python:3.10 + command: + - sh + - -c + - |2 + + if ! [ -x "$(command -v pip)" ]; then + python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip + fi + + PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'pandas~=1.4.2' 'kfp==1.8.22' && "$0" "$@" + - sh + - -ec + - | + program_path=$(mktemp -d) + printf "%s" "$0" > "$program_path/ephemeral_component.py" + python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" + - |2+ + + import kfp + from kfp.v2 import dsl + from kfp.v2.dsl import * + from typing import * + + def pull_data(url: str, data: Output[Dataset]): + """ + Pull data component. 
+ """ + import pandas as pd + + df = pd.read_csv(url, sep=";") + df.to_csv(data.path, index=None) + + args: + - --executor_input + - {executorInput: null} + - --function_to_execute + - pull_data diff --git a/tutorials/demo_notebooks/demo_pipeline/components/train_component.yaml b/tutorials/demo_notebooks/demo_pipeline/components/train_component.yaml new file mode 100644 index 0000000..271ce63 --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/components/train_component.yaml @@ -0,0 +1,153 @@ +name: Train +description: Train component. +inputs: +- {name: train_set, type: Dataset} +- {name: test_set, type: Dataset} +- {name: mlflow_experiment_name, type: String} +- {name: mlflow_tracking_uri, type: String} +- {name: mlflow_s3_endpoint_url, type: String} +- {name: model_name, type: String} +- {name: alpha, type: Float} +- {name: l1_ratio, type: Float} +- {name: target, type: String, default: quality, optional: true} +outputs: +- {name: saved_model, type: Model} +- {name: storage_uri, type: String} +- {name: run_id, type: String} +implementation: + container: + image: python:3.10 + command: + - sh + - -c + - |2 + + if ! 
[ -x "$(command -v pip)" ]; then + python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip + fi + + PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'numpy' 'pandas~=1.4.2' 'scikit-learn~=1.0.2' 'mlflow~=2.4.1' 'boto3~=1.21.0' 'kfp==1.8.22' && "$0" "$@" + - sh + - -ec + - | + program_path=$(mktemp -d) + printf "%s" "$0" > "$program_path/ephemeral_component.py" + python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" + - |2+ + + import kfp + from kfp.v2 import dsl + from kfp.v2.dsl import * + from typing import * + + def train( + train_set: Input[Dataset], + test_set: Input[Dataset], + saved_model: Output[Model], + mlflow_experiment_name: str, + mlflow_tracking_uri: str, + mlflow_s3_endpoint_url: str, + model_name: str, + alpha: float, + l1_ratio: float, + target: str = "quality", + ) -> NamedTuple("Output", [('storage_uri', str), ('run_id', str),]): + """ + Train component. 
+ """ + import numpy as np + import pandas as pd + from sklearn.linear_model import ElasticNet + from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score + import mlflow + import mlflow.sklearn + import os + import logging + import pickle + from collections import namedtuple + + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + def eval_metrics(actual, pred): + rmse = np.sqrt(mean_squared_error(actual, pred)) + mae = mean_absolute_error(actual, pred) + r2 = r2_score(actual, pred) + return rmse, mae, r2 + + os.environ['MLFLOW_S3_ENDPOINT_URL'] = mlflow_s3_endpoint_url + + # load data + train = pd.read_csv(train_set.path) + test = pd.read_csv(test_set.path) + + # The predicted column is "quality" which is a scalar from [3, 9] + train_x = train.drop([target], axis=1) + test_x = test.drop([target], axis=1) + train_y = train[[target]] + test_y = test[[target]] + + logger.info(f"Using MLflow tracking URI: {mlflow_tracking_uri}") + mlflow.set_tracking_uri(mlflow_tracking_uri) + + logger.info(f"Using MLflow experiment: {mlflow_experiment_name}") + mlflow.set_experiment(mlflow_experiment_name) + + with mlflow.start_run() as run: + + run_id = run.info.run_id + logger.info(f"Run ID: {run_id}") + + model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42) + + logger.info("Fitting model...") + model.fit(train_x, train_y) + + logger.info("Predicting...") + predicted_qualities = model.predict(test_x) + + (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities) + + logger.info("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio)) + logger.info(" RMSE: %s" % rmse) + logger.info(" MAE: %s" % mae) + logger.info(" R2: %s" % r2) + + logger.info("Logging parameters to MLflow") + mlflow.log_param("alpha", alpha) + mlflow.log_param("l1_ratio", l1_ratio) + mlflow.log_metric("rmse", rmse) + mlflow.log_metric("r2", r2) + mlflow.log_metric("mae", mae) + + # save model to mlflow + logger.info("Logging trained model") 
+ mlflow.sklearn.log_model( + model, + model_name, + registered_model_name="ElasticnetWineModel", + serialization_format="pickle" + ) + + logger.info("Logging predictions artifact to MLflow") + np.save("predictions.npy", predicted_qualities) + mlflow.log_artifact( + local_path="predictions.npy", artifact_path="predicted_qualities/" + ) + + # save model as KFP artifact + logging.info(f"Saving model to: {saved_model.path}") + with open(saved_model.path, 'wb') as fp: + pickle.dump(model, fp, pickle.HIGHEST_PROTOCOL) + + # prepare output + output = namedtuple('Output', ['storage_uri', 'run_id']) + + # return str(mlflow.get_artifact_uri()) + return output(mlflow.get_artifact_uri(), run_id) + + args: + - --executor_input + - {executorInput: null} + - --function_to_execute + - train diff --git a/tutorials/demo_notebooks/demo_pipeline/demo-pipeline.ipynb b/tutorials/demo_notebooks/demo_pipeline/demo-pipeline.ipynb new file mode 100644 index 0000000..4cdc71e --- /dev/null +++ b/tutorials/demo_notebooks/demo_pipeline/demo-pipeline.ipynb @@ -0,0 +1,1038 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "# Demo KFP pipeline" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Install requirements:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-06-13T07:46:40.628249367Z", + "start_time": "2023-06-13T07:46:36.822607765Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: kfp~=1.8.14 in /home/joaquin/anaconda3/lib/python3.9/site-packages (1.8.16)\n", + "Requirement already satisfied: PyYAML<6,>=5.3 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (5.4.1)\n", + "Requirement already satisfied: tabulate<1,>=0.8.6 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from 
kfp~=1.8.14) (0.8.9)\n", + "Requirement already satisfied: absl-py<2,>=0.9 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (1.2.0)\n", + "Requirement already satisfied: google-auth<3,>=1.6.1 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (2.17.3)\n", + "Requirement already satisfied: kfp-server-api<2.0.0,>=1.1.2 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (1.8.5)\n", + "Requirement already satisfied: requests-toolbelt<1,>=0.8.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (0.9.1)\n", + "Requirement already satisfied: google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (2.11.0)\n", + "Requirement already satisfied: click<9,>=7.1.2 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (8.0.4)\n", + "Requirement already satisfied: strip-hints<1,>=0.1.8 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (0.1.10)\n", + "Requirement already satisfied: docstring-parser<1,>=0.7.3 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (0.15)\n", + "Requirement already satisfied: typer<1.0,>=0.3.2 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (0.6.1)\n", + "Requirement already satisfied: cloudpickle<3,>=2.0.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (2.0.0)\n", + "Requirement already satisfied: uritemplate<4,>=3.0.1 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (3.0.1)\n", + "Requirement already satisfied: jsonschema<4,>=3.0.1 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (3.2.0)\n", + "Requirement already satisfied: google-api-python-client<2,>=1.7.8 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (1.12.11)\n", + "Requirement already satisfied: fire<1,>=0.3.1 in 
/home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (0.4.0)\n", + "Requirement already satisfied: kfp-pipeline-spec<0.2.0,>=0.1.16 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (0.1.16)\n", + "Requirement already satisfied: kubernetes<19,>=8.0.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (18.20.0)\n", + "Requirement already satisfied: protobuf<4,>=3.13.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (3.20.3)\n", + "Requirement already satisfied: Deprecated<2,>=1.2.7 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (1.2.13)\n", + "Requirement already satisfied: google-cloud-storage<3,>=1.20.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (1.42.2)\n", + "Requirement already satisfied: pydantic<2,>=1.8.2 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp~=1.8.14) (1.10.2)\n", + "Requirement already satisfied: wrapt<2,>=1.10 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from Deprecated<2,>=1.2.7->kfp~=1.8.14) (1.12.1)\n", + "Requirement already satisfied: six in /home/joaquin/anaconda3/lib/python3.9/site-packages (from fire<1,>=0.3.1->kfp~=1.8.14) (1.16.0)\n", + "Requirement already satisfied: termcolor in /home/joaquin/anaconda3/lib/python3.9/site-packages (from fire<1,>=0.3.1->kfp~=1.8.14) (2.0.1)\n", + "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->kfp~=1.8.14) (1.56.4)\n", + "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->kfp~=1.8.14) (2.29.0)\n", + "Requirement already satisfied: google-auth-httplib2>=0.0.3 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from 
google-api-python-client<2,>=1.7.8->kfp~=1.8.14) (0.1.0)\n", + "Requirement already satisfied: httplib2<1dev,>=0.15.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-api-python-client<2,>=1.7.8->kfp~=1.8.14) (0.20.4)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-auth<3,>=1.6.1->kfp~=1.8.14) (4.7.2)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-auth<3,>=1.6.1->kfp~=1.8.14) (0.2.8)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-auth<3,>=1.6.1->kfp~=1.8.14) (4.2.2)\n", + "Requirement already satisfied: google-cloud-core<3.0dev,>=1.6.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-cloud-storage<3,>=1.20.0->kfp~=1.8.14) (2.3.2)\n", + "Requirement already satisfied: google-resumable-media<3.0dev,>=1.3.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-cloud-storage<3,>=1.20.0->kfp~=1.8.14) (1.3.1)\n", + "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-resumable-media<3.0dev,>=1.3.0->google-cloud-storage<3,>=1.20.0->kfp~=1.8.14) (1.1.2)\n", + "Requirement already satisfied: cffi>=1.0.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=1.3.0->google-cloud-storage<3,>=1.20.0->kfp~=1.8.14) (1.15.0)\n", + "Requirement already satisfied: pycparser in /home/joaquin/anaconda3/lib/python3.9/site-packages (from cffi>=1.0.0->google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=1.3.0->google-cloud-storage<3,>=1.20.0->kfp~=1.8.14) (2.21)\n", + "Requirement already satisfied: pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from 
httplib2<1dev,>=0.15.0->google-api-python-client<2,>=1.7.8->kfp~=1.8.14) (3.0.4)\n", + "Requirement already satisfied: pyrsistent>=0.14.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from jsonschema<4,>=3.0.1->kfp~=1.8.14) (0.18.0)\n", + "Requirement already satisfied: attrs>=17.4.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from jsonschema<4,>=3.0.1->kfp~=1.8.14) (21.4.0)\n", + "Requirement already satisfied: setuptools in /home/joaquin/anaconda3/lib/python3.9/site-packages (from jsonschema<4,>=3.0.1->kfp~=1.8.14) (67.7.2)\n", + "Requirement already satisfied: python-dateutil in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp-server-api<2.0.0,>=1.1.2->kfp~=1.8.14) (2.8.2)\n", + "Requirement already satisfied: certifi in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp-server-api<2.0.0,>=1.1.2->kfp~=1.8.14) (2022.9.24)\n", + "Requirement already satisfied: urllib3>=1.15 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kfp-server-api<2.0.0,>=1.1.2->kfp~=1.8.14) (1.26.9)\n", + "Requirement already satisfied: requests-oauthlib in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kubernetes<19,>=8.0.0->kfp~=1.8.14) (1.3.1)\n", + "Requirement already satisfied: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from kubernetes<19,>=8.0.0->kfp~=1.8.14) (0.58.0)\n", + "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.1->kfp~=1.8.14) (0.4.8)\n", + "Requirement already satisfied: typing-extensions>=4.1.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from pydantic<2,>=1.8.2->kfp~=1.8.14) (4.1.1)\n", + "Requirement already satisfied: idna<4,>=2.5 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->kfp~=1.8.14) (3.3)\n", + 
"Requirement already satisfied: charset-normalizer<4,>=2 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->kfp~=1.8.14) (2.0.4)\n", + "Requirement already satisfied: wheel in /home/joaquin/anaconda3/lib/python3.9/site-packages (from strip-hints<1,>=0.1.8->kfp~=1.8.14) (0.37.1)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /home/joaquin/anaconda3/lib/python3.9/site-packages (from requests-oauthlib->kubernetes<19,>=8.0.0->kfp~=1.8.14) (3.2.1)\n" + ] + } + ], + "source": [ + "%%bash\n", + "\n", + "pip install kfp~=1.8.14" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Imports:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:20:34.510143690Z", + "start_time": "2023-11-09T09:20:34.016137561Z" + } + }, + "outputs": [], + "source": [ + "import warnings\n", + "warnings.filterwarnings(\"ignore\")\n", + "\n", + "import kfp\n", + "import kfp.dsl as dsl\n", + "from kfp.aws import use_aws_secret\n", + "from kfp.v2.dsl import (\n", + " component,\n", + " Input,\n", + " Output,\n", + " Dataset,\n", + " Metrics,\n", + " Artifact,\n", + " Model\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## 1. Connect to client\n", + "\n", + "The default way of accessing Kubeflow is via port-forward. This enables you to get started quickly without imposing any requirements on your environment. 
Run the following to port-forward Istio's Ingress-Gateway to local port `8080`:\n", + "\n", + "```sh\n", + "kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "outputs": [], + "source": [ + "import re\n", + "import requests\n", + "from urllib.parse import urlsplit\n", + "\n", + "def get_istio_auth_session(url: str, username: str, password: str) -> dict:\n", + " \"\"\"\n", + " Determine if the specified URL is secured by Dex and try to obtain a session cookie.\n", + " WARNING: only Dex `staticPasswords` and `LDAP` authentication are currently supported\n", + " (we default to using `staticPasswords` if both are enabled)\n", + "\n", + " :param url: Kubeflow server URL, including protocol\n", + " :param username: Dex `staticPasswords` or `LDAP` username\n", + " :param password: Dex `staticPasswords` or `LDAP` password\n", + " :return: auth session information\n", + " \"\"\"\n", + " # define the default return object\n", + " auth_session = {\n", + " \"endpoint_url\": url, # KF endpoint URL\n", + " \"redirect_url\": None, # KF redirect URL, if applicable\n", + " \"dex_login_url\": None, # Dex login URL (for POST of credentials)\n", + " \"is_secured\": None, # True if KF endpoint is secured\n", + " \"session_cookie\": None # Resulting session cookies in the form \"key1=value1; key2=value2\"\n", + " }\n", + "\n", + " # use a persistent session (for cookies)\n", + " with requests.Session() as s:\n", + "\n", + " ################\n", + " # Determine if Endpoint is Secured\n", + " ################\n", + " resp = s.get(url, allow_redirects=True)\n", + " if resp.status_code != 200:\n", + " raise RuntimeError(\n", + " f\"HTTP status code '{resp.status_code}' for GET against: {url}\"\n", + " )\n", + "\n", + " auth_session[\"redirect_url\"] = resp.url\n", + "\n", + " # if we were NOT redirected, then the endpoint is UNSECURED\n", + " if len(resp.history) == 0:\n", + " 
auth_session[\"is_secured\"] = False\n", + " return auth_session\n", + " else:\n", + " auth_session[\"is_secured\"] = True\n", + "\n", + " ################\n", + " # Get Dex Login URL\n", + " ################\n", + " redirect_url_obj = urlsplit(auth_session[\"redirect_url\"])\n", + "\n", + " # if we are at `/auth?=xxxx` path, we need to select an auth type\n", + " if re.search(r\"/auth$\", redirect_url_obj.path):\n", + "\n", + " #######\n", + " # TIP: choose the default auth type by including ONE of the following\n", + " #######\n", + "\n", + " # OPTION 1: set \"staticPasswords\" as default auth type\n", + " redirect_url_obj = redirect_url_obj._replace(\n", + " path=re.sub(r\"/auth$\", \"/auth/local\", redirect_url_obj.path)\n", + " )\n", + " # OPTION 2: set \"ldap\" as default auth type\n", + " # redirect_url_obj = redirect_url_obj._replace(\n", + " # path=re.sub(r\"/auth$\", \"/auth/ldap\", redirect_url_obj.path)\n", + " # )\n", + "\n", + " # if we are at `/auth/xxxx/login` path, then no further action is needed (we can use it for login POST)\n", + " if re.search(r\"/auth/.*/login$\", redirect_url_obj.path):\n", + " auth_session[\"dex_login_url\"] = redirect_url_obj.geturl()\n", + "\n", + " # else, we need to be redirected to the actual login page\n", + " else:\n", + " # this GET should redirect us to the `/auth/xxxx/login` path\n", + " resp = s.get(redirect_url_obj.geturl(), allow_redirects=True)\n", + " if resp.status_code != 200:\n", + " raise RuntimeError(\n", + " f\"HTTP status code '{resp.status_code}' for GET against: {redirect_url_obj.geturl()}\"\n", + " )\n", + "\n", + " # set the login url\n", + " auth_session[\"dex_login_url\"] = resp.url\n", + "\n", + " ################\n", + " # Attempt Dex Login\n", + " ################\n", + " resp = s.post(\n", + " auth_session[\"dex_login_url\"],\n", + " data={\"login\": username, \"password\": password},\n", + " allow_redirects=True\n", + " )\n", + " if len(resp.history) == 0:\n", + " raise RuntimeError(\n", + " 
f\"Login credentials were probably invalid - \"\n", + " f\"No redirect after POST to: {auth_session['dex_login_url']}\"\n", + " )\n", + "\n", + " # store the session cookies in a \"key1=value1; key2=value2\" string\n", + " auth_session[\"session_cookie\"] = \"; \".join([f\"{c.name}={c.value}\" for c in s.cookies])\n", + "\n", + " return auth_session" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:06.585931779Z", + "start_time": "2023-11-09T09:21:06.541812426Z" + } + } + }, + { + "cell_type": "code", + "execution_count": 5, + "outputs": [], + "source": [ + "import kfp\n", + "\n", + "KUBEFLOW_ENDPOINT = \"http://localhost:8080\"\n", + "KUBEFLOW_USERNAME = \"user@example.com\"\n", + "KUBEFLOW_PASSWORD = \"12341234\"\n", + "\n", + "auth_session = get_istio_auth_session(\n", + " url=KUBEFLOW_ENDPOINT,\n", + " username=KUBEFLOW_USERNAME,\n", + " password=KUBEFLOW_PASSWORD\n", + ")\n", + "\n", + "client = kfp.Client(host=f\"{KUBEFLOW_ENDPOINT}/pipeline\", cookies=auth_session[\"session_cookie\"])\n", + "# print(client.list_experiments())" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:07.538682906Z", + "start_time": "2023-11-09T09:21:07.167349929Z" + } + } + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## 2. Components\n", + "\n", + "There are different ways to define components in KFP. Here, we use the **@component** decorator to define the components as Python function-based components.\n", + "\n", + "The **@component** decorator converts the function into a factory function that creates pipeline steps that execute this function. This example also specifies the base container image to run your component in." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Pull data component:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:12.772797912Z", + "start_time": "2023-11-09T09:21:12.759653707Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"pandas~=1.4.2\"],\n", + " output_component_file='components/pull_data_component.yaml',\n", + ")\n", + "def pull_data(url: str, data: Output[Dataset]):\n", + " \"\"\"\n", + " Pull data component.\n", + " \"\"\"\n", + " import pandas as pd\n", + "\n", + " df = pd.read_csv(url, sep=\";\")\n", + " df.to_csv(data.path, index=None)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Preprocess component:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:14.766141325Z", + "start_time": "2023-11-09T09:21:14.763181627Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"pandas~=1.4.2\", \"scikit-learn~=1.0.2\"],\n", + " output_component_file='components/preprocess_component.yaml',\n", + ")\n", + "def preprocess(\n", + " data: Input[Dataset],\n", + " scaler_out: Output[Artifact],\n", + " train_set: Output[Dataset],\n", + " test_set: Output[Dataset],\n", + " target: str = \"quality\",\n", + "):\n", + " \"\"\"\n", + " Preprocess component.\n", + " \"\"\"\n", + " import pandas as pd\n", + " import pickle\n", + " from sklearn.model_selection import train_test_split\n", + " from sklearn.preprocessing import StandardScaler\n", + "\n", + " data = pd.read_csv(data.path)\n", + "\n", + " # Split the data into training and test sets. 
(0.75, 0.25) split.\n", + " train, test = train_test_split(data)\n", + "\n", + " scaler = StandardScaler()\n", + "\n", + " train[train.drop(target, axis=1).columns] = scaler.fit_transform(train.drop(target, axis=1))\n", + " test[test.drop(target, axis=1).columns] = scaler.transform(test.drop(target, axis=1))\n", + "\n", + " with open(scaler_out.path, 'wb') as fp:\n", + " pickle.dump(scaler, fp, pickle.HIGHEST_PROTOCOL)\n", + "\n", + " train.to_csv(train_set.path, index=None)\n", + " test.to_csv(test_set.path, index=None)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Train component:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:16.238280070Z", + "start_time": "2023-11-09T09:21:16.221167022Z" + } + }, + "outputs": [], + "source": [ + "from typing import NamedTuple\n", + "\n", + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"numpy\", \"pandas~=1.4.2\", \"scikit-learn~=1.0.2\", \"mlflow~=2.4.1\", \"boto3~=1.21.0\"],\n", + " output_component_file='components/train_component.yaml',\n", + ")\n", + "def train(\n", + " train_set: Input[Dataset],\n", + " test_set: Input[Dataset],\n", + " saved_model: Output[Model],\n", + " mlflow_experiment_name: str,\n", + " mlflow_tracking_uri: str,\n", + " mlflow_s3_endpoint_url: str,\n", + " model_name: str,\n", + " alpha: float,\n", + " l1_ratio: float,\n", + " target: str = \"quality\",\n", + ") -> NamedTuple(\"Output\", [('storage_uri', str), ('run_id', str),]):\n", + " \"\"\"\n", + " Train component.\n", + " \"\"\"\n", + " import numpy as np\n", + " import pandas as pd\n", + " from sklearn.linear_model import ElasticNet\n", + " from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score\n", + " import mlflow\n", + " import mlflow.sklearn\n", + " import os\n", + " import logging\n", + " import pickle\n", + " from collections 
import namedtuple\n", + "\n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + "\n", + " def eval_metrics(actual, pred):\n", + " rmse = np.sqrt(mean_squared_error(actual, pred))\n", + " mae = mean_absolute_error(actual, pred)\n", + " r2 = r2_score(actual, pred)\n", + " return rmse, mae, r2\n", + "\n", + " os.environ['MLFLOW_S3_ENDPOINT_URL'] = mlflow_s3_endpoint_url\n", + "\n", + " # load data\n", + " train = pd.read_csv(train_set.path)\n", + " test = pd.read_csv(test_set.path)\n", + "\n", + " # The predicted column is \"quality\" which is a scalar from [3, 9]\n", + " train_x = train.drop([target], axis=1)\n", + " test_x = test.drop([target], axis=1)\n", + " train_y = train[[target]]\n", + " test_y = test[[target]]\n", + "\n", + " logger.info(f\"Using MLflow tracking URI: {mlflow_tracking_uri}\")\n", + " mlflow.set_tracking_uri(mlflow_tracking_uri)\n", + "\n", + " logger.info(f\"Using MLflow experiment: {mlflow_experiment_name}\")\n", + " mlflow.set_experiment(mlflow_experiment_name)\n", + "\n", + " with mlflow.start_run() as run:\n", + "\n", + " run_id = run.info.run_id\n", + " logger.info(f\"Run ID: {run_id}\")\n", + "\n", + " model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)\n", + "\n", + " logger.info(\"Fitting model...\")\n", + " model.fit(train_x, train_y)\n", + "\n", + " logger.info(\"Predicting...\")\n", + " predicted_qualities = model.predict(test_x)\n", + "\n", + " (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)\n", + "\n", + " logger.info(\"Elasticnet model (alpha=%f, l1_ratio=%f):\" % (alpha, l1_ratio))\n", + " logger.info(\" RMSE: %s\" % rmse)\n", + " logger.info(\" MAE: %s\" % mae)\n", + " logger.info(\" R2: %s\" % r2)\n", + "\n", + " logger.info(\"Logging parameters to MLflow\")\n", + " mlflow.log_param(\"alpha\", alpha)\n", + " mlflow.log_param(\"l1_ratio\", l1_ratio)\n", + " mlflow.log_metric(\"rmse\", rmse)\n", + " mlflow.log_metric(\"r2\", r2)\n", + " 
mlflow.log_metric(\"mae\", mae)\n", + "\n", + " # save model to mlflow\n", + " logger.info(\"Logging trained model\")\n", + " mlflow.sklearn.log_model(\n", + " model,\n", + " model_name,\n", + " registered_model_name=\"ElasticnetWineModel\",\n", + " serialization_format=\"pickle\"\n", + " )\n", + "\n", + " logger.info(\"Logging predictions artifact to MLflow\")\n", + " np.save(\"predictions.npy\", predicted_qualities)\n", + " mlflow.log_artifact(\n", + " local_path=\"predictions.npy\", artifact_path=\"predicted_qualities/\"\n", + " )\n", + "\n", + " # save model as KFP artifact\n", + " logging.info(f\"Saving model to: {saved_model.path}\")\n", + " with open(saved_model.path, 'wb') as fp:\n", + " pickle.dump(model, fp, pickle.HIGHEST_PROTOCOL)\n", + "\n", + " # prepare output\n", + " output = namedtuple('Output', ['storage_uri', 'run_id'])\n", + "\n", + " # return str(mlflow.get_artifact_uri())\n", + " return output(mlflow.get_artifact_uri(), run_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Evaluate component:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:17.469593336Z", + "start_time": "2023-11-09T09:21:17.457874998Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.10\",\n", + " packages_to_install=[\"numpy\", \"mlflow~=2.4.1\"],\n", + " output_component_file='components/evaluate_component.yaml',\n", + ")\n", + "def evaluate(\n", + " run_id: str,\n", + " mlflow_tracking_uri: str,\n", + " threshold_metrics: dict\n", + ") -> bool:\n", + " \"\"\"\n", + " Evaluate component: Compares metrics from training with given thresholds.\n", + "\n", + " Args:\n", + " run_id (string): MLflow run ID\n", + " mlflow_tracking_uri (string): MLflow tracking URI\n", + " threshold_metrics (dict): Maximum allowed value for each metric\n", + " Returns:\n", + " Bool indicating whether 
evaluation passed or failed.\n", + " \"\"\"\n", + " from mlflow.tracking import MlflowClient\n", + " import logging\n", + "\n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + "\n", + " client = MlflowClient(tracking_uri=mlflow_tracking_uri)\n", + " info = client.get_run(run_id)\n", + " training_metrics = info.data.metrics\n", + "\n", + " logger.info(f\"Training metrics: {training_metrics}\")\n", + "\n", + " # compare the evaluation metrics with the defined thresholds\n", + " for key, value in threshold_metrics.items():\n", + " if key not in training_metrics or training_metrics[key] > value:\n", + " logger.error(f\"Metric {key} failed. Evaluation not passed!\")\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Deploy model component:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:18.863549700Z", + "start_time": "2023-11-09T09:21:18.853657914Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.9\",\n", + " packages_to_install=[\"kserve\"],\n", + " output_component_file='components/deploy_model_component.yaml',\n", + ")\n", + "def deploy_model(model_name: str, storage_uri: str):\n", + " \"\"\"\n", + " Deploy the model as an inference service with KServe.\n", + " \"\"\"\n", + " import logging\n", + " from kubernetes import client\n", + " from kserve import KServeClient\n", + " from kserve import constants\n", + " from kserve import utils\n", + " from kserve import V1beta1InferenceService\n", + " from kserve import V1beta1InferenceServiceSpec\n", + " from kserve import V1beta1PredictorSpec\n", + " from kserve import V1beta1SKLearnSpec\n", + "\n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + "\n", + " model_uri = f\"{storage_uri}/{model_name}\"\n", + 
" logger.info(f\"MODEL URI: {model_uri}\")\n", + "\n", + " # namespace = 'kserve-inference'\n", + " namespace = utils.get_default_target_namespace()\n", + " kserve_version='v1beta1'\n", + " api_version = constants.KSERVE_GROUP + '/' + kserve_version\n", + "\n", + "\n", + " isvc = V1beta1InferenceService(\n", + " api_version=api_version,\n", + " kind=constants.KSERVE_KIND,\n", + " metadata=client.V1ObjectMeta(\n", + " name=model_name,\n", + " namespace=namespace,\n", + " annotations={'sidecar.istio.io/inject':'false'}\n", + " ),\n", + " spec=V1beta1InferenceServiceSpec(\n", + " predictor=V1beta1PredictorSpec(\n", + " service_account_name=\"kserve-sa\",\n", + " sklearn=V1beta1SKLearnSpec(\n", + " storage_uri=model_uri\n", + " )\n", + " )\n", + " )\n", + " )\n", + " KServe = KServeClient()\n", + " KServe.create(isvc)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Inference component:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:20.907252671Z", + "start_time": "2023-11-09T09:21:20.888353997Z" + } + }, + "outputs": [], + "source": [ + "@component(\n", + " base_image=\"python:3.9\", # kserve on python 3.10 comes with a dependency that fails to get installed\n", + " packages_to_install=[\"kserve\", \"scikit-learn~=1.0.2\"],\n", + " output_component_file='components/inference_component.yaml',\n", + ")\n", + "def inference(\n", + " model_name: str,\n", + " scaler_in: Input[Artifact]\n", + "):\n", + " \"\"\"\n", + " Test inference.\n", + " \"\"\"\n", + " from kserve import KServeClient\n", + " import requests\n", + " import pickle\n", + " import logging\n", + " from kserve import utils\n", + "\n", + " logging.basicConfig(level=logging.INFO)\n", + " logger = logging.getLogger(__name__)\n", + "\n", + " namespace = utils.get_default_target_namespace()\n", + "\n", + " input_sample = [[5.6, 0.54, 0.04, 1.7, 0.049, 5, 13, 
0.9942, 3.72, 0.58, 11.4],\n", + " [11.3, 0.34, 0.45, 2, 0.082, 6, 15, 0.9988, 2.94, 0.66, 9.2]]\n", + "\n", + " logger.info(f\"Loading standard scaler from: {scaler_in.path}\")\n", + " with open(scaler_in.path, 'rb') as fp:\n", + " scaler = pickle.load(fp)\n", + "\n", + " logger.info(f\"Standardizing sample: {scaler_in.path}\")\n", + " input_sample = scaler.transform(input_sample)\n", + "\n", + " # get inference service\n", + " KServe = KServeClient()\n", + "\n", + " # wait for deployment to be ready\n", + " KServe.get(model_name, namespace=namespace, watch=True, timeout_seconds=120)\n", + "\n", + " inference_service = KServe.get(model_name, namespace=namespace)\n", + " print(inference_service)\n", + " is_url = inference_service['status']['address']['url']\n", + "\n", + " logger.info(f\"\\nInference service status:\\n{inference_service['status']}\")\n", + " logger.info(f\"\\nInference service URL:\\n{is_url}\\n\")\n", + "\n", + " inference_input = {\n", + " 'instances': input_sample.tolist()\n", + " }\n", + "\n", + " response = requests.post(is_url, json=inference_input)\n", + " logger.info(f\"\\nPrediction response:\\n{response.text}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## 3. 
Pipeline\n", + "\n", + "Pipeline definition:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:22.101290347Z", + "start_time": "2023-11-09T09:21:22.097143160Z" + } + }, + "outputs": [], + "source": [ + "@dsl.pipeline(\n", + " name='demo-pipeline',\n", + " description='An example pipeline that trains, evaluates, and deploys a model.',\n", + ")\n", + "def pipeline(\n", + " url: str,\n", + " target: str,\n", + " mlflow_experiment_name: str,\n", + " mlflow_tracking_uri: str,\n", + " mlflow_s3_endpoint_url: str,\n", + " model_name: str,\n", + " alpha: float,\n", + " l1_ratio: float,\n", + " threshold_metrics: dict,\n", + "):\n", + " pull_task = pull_data(url=url)\n", + "\n", + " preprocess_task = preprocess(data=pull_task.outputs[\"data\"])\n", + "\n", + " train_task = train(\n", + " train_set=preprocess_task.outputs[\"train_set\"],\n", + " test_set=preprocess_task.outputs[\"test_set\"],\n", + " target=target,\n", + " mlflow_experiment_name=mlflow_experiment_name,\n", + " mlflow_tracking_uri=mlflow_tracking_uri,\n", + " mlflow_s3_endpoint_url=mlflow_s3_endpoint_url,\n", + " model_name=model_name,\n", + " alpha=alpha,\n", + " l1_ratio=l1_ratio\n", + " )\n", + " train_task.apply(use_aws_secret(secret_name=\"aws-secret\"))\n", + "\n", + " evaluate_task = evaluate(\n", + " run_id=train_task.outputs[\"run_id\"],\n", + " mlflow_tracking_uri=mlflow_tracking_uri,\n", + " threshold_metrics=threshold_metrics\n", + " )\n", + "\n", + " eval_passed = evaluate_task.output\n", + "\n", + " with dsl.Condition(eval_passed == \"true\"):\n", + " deploy_model_task = deploy_model(\n", + " model_name=model_name,\n", + " storage_uri=train_task.outputs[\"storage_uri\"],\n", + " )\n", + "\n", + " inference_task = inference(\n", + " model_name=model_name,\n", + " scaler_in=preprocess_task.outputs[\"scaler_out\"]\n", + " )\n", + " inference_task.after(deploy_model_task)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "Pipeline arguments:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:23.384109041Z", + "start_time": "2023-11-09T09:21:23.372786659Z" + } + }, + "outputs": [], + "source": [ + "# Specify pipeline argument values\n", + "\n", + "eval_threshold_metrics = {'rmse': 0.9, 'r2': 0.3, 'mae': 0.8}\n", + "\n", + "arguments = {\n", + " \"url\": \"http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv\",\n", + " \"target\": \"quality\",\n", + " \"mlflow_tracking_uri\": \"http://mlflow.mlflow.svc.cluster.local:5000\",\n", + " \"mlflow_s3_endpoint_url\": \"http://mlflow-minio-service.mlflow.svc.cluster.local:9000\",\n", + " \"mlflow_experiment_name\": \"demo-notebook\",\n", + " \"model_name\": \"wine-quality\",\n", + " \"alpha\": 0.5,\n", + " \"l1_ratio\": 0.5,\n", + " \"threshold_metrics\": eval_threshold_metrics\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## 4. Submit run" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-11-09T09:21:25.461963519Z", + "start_time": "2023-11-09T09:21:25.011639525Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": "", + "text/html": "Experiment details." + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": "", + "text/html": "Run details." 
+ }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": "RunPipelineResult(run_id=311c8e22-4bcb-4681-8fa6-475c47645352)" + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "run_name = \"demo-run\"\n", + "experiment_name = \"demo-experiment\"\n", + "\n", + "client.create_run_from_pipeline_func(\n", + " pipeline_func=pipeline,\n", + " run_name=run_name,\n", + " experiment_name=experiment_name,\n", + " arguments=arguments,\n", + " mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE,\n", + " enable_caching=False,\n", + " namespace=\"kubeflow-user-example-com\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## 5. Check run" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "### Kubeflow Pipelines UI\n", + "\n", + "The default way of accessing Kubeflow is via port-forward. This enables you to get started quickly without imposing any requirements on your environment. Run the following to port-forward Istio's Ingress-Gateway to local port `8080`:\n", + "\n", + "```sh\n", + "kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80\n", + "```\n", + "\n", + "After running the command, you can access the Kubeflow Central Dashboard by doing the following:\n", + "\n", + "1. Open your browser and visit [http://localhost:8080/](http://localhost:8080/). You should get the Dex login screen.\n", + "2. Login with the default user's credential. The default email address is `user@example.com` and the default password is `12341234`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "### MLFlow UI\n", + "\n", + "To access MLFlow UI, open a terminal and forward a local port to MLFlow server:\n", + "\n", + "
\n", + "\n", + "```bash\n", + "$ kubectl -n mlflow port-forward svc/mlflow 5000:5000\n", + "```\n", + "\n", + "
\n", + "\n", + "Now MLFlow's UI should be reachable at [`http://localhost:5000`](http://localhost:5000)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## 6. Check deployed model\n", + "\n", + "```bash\n", + "# get inference services\n", + "kubectl -n kubeflow-user-example-com get inferenceservice\n", + "\n", + "# get deployed model pods\n", + "kubectl -n kubeflow-user-example-com get pods\n", + "\n", + "# delete inference service\n", + "kubectl -n kubeflow-user-example-com delete inferenceservice wine-quality\n", + "```\n", + "
\n", + "\n", + "If something goes wrong, check the logs with:\n", + "\n", + "
\n", + "\n", + "```bash\n", + "kubectl logs -n kubeflow-user-example-com kserve-container\n", + "\n", + "kubectl logs -n kubeflow-user-example-com queue-proxy\n", + "\n", + "kubectl logs -n kubeflow-user-example-com storage-initializer\n", + "```\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [], + "metadata": { + "collapsed": false + } + } + ], + "metadata": { + "kernelspec": { + "display_name": "iml4e", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.15 (default, Nov 24 2022, 08:57:44) \n[Clang 14.0.6 ]" + }, + "vscode": { + "interpreter": { + "hash": "2976e1db094957a35b33d12f80288a268286b510a60c0d029aa085f0b10be691" + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/tutorials/demo_notebooks/demo_pipeline/graph.png b/tutorials/demo_notebooks/demo_pipeline/graph.png new file mode 100644 index 0000000..14bf10c Binary files /dev/null and b/tutorials/demo_notebooks/demo_pipeline/graph.png differ diff --git a/tutorials/ml_components_demos/try-kserve/README.md b/tutorials/ml_components_demos/try-kserve/README.md new file mode 100644 index 0000000..a5bca9a --- /dev/null +++ b/tutorials/ml_components_demos/try-kserve/README.md @@ -0,0 +1,74 @@ +# Sklearn inference service + +This directory contains an example of an inference service (kserve) using a sklearn model. + +See [sklearn-iris-model.yaml](sklearn-iris-model.yaml). 
+ +## Deploy the model inference service + +```bash +# tutorials/ml_components_demos/try-kserve +kubectl apply -f sklearn-iris-model.yaml +``` + +Check that the inference service was deployed correctly: + +```bash +$ kubectl get inferenceservice -n kubeflow-user-example-com + +NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE +sklearn-iris http://sklearn-iris.kserve-inference.example.com True 100 sklearn-iris-predictor-default-00001 48m +``` + +> It might take a few minutes to become "READY". + +## Try a prediction request + +First, configure the domain name: + +```bash +kubectl patch cm config-domain --patch '{"data":{"example.com":""}}' -n knative-serving +``` + + +Determine the name of the ingress gateway to the inference service: + + +```bash +INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}') + +echo $INGRESS_GATEWAY_SERVICE +``` + +Port-forward to the ingress gateway service: + +```bash +kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80 +``` + +Start another terminal and set the following variables: + +```bash + +export MODEL_NAME=sklearn-iris +export INGRESS_HOST=localhost +export INGRESS_PORT=8080 +export SERVICE_HOSTNAME=$(kubectl -n kubeflow-user-example-com get inferenceservice $MODEL_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3) + +echo $SERVICE_HOSTNAME +``` + +To send a prediction to your model, you need an authentication token. You can get this token from your +browser after logging in to Kubeflow: look for it in the cookies (developer mode). Alternatively, you can get the token +programmatically, as in the [`predict.py`](predict.py) script. 
+ +```bash +SESSION= +``` + +Send a prediction request: + +```bash +# tutorials/ml_components_demos/try-kserve +curl -v --cookie "authservice_session=${SESSION}" -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./iris-input.json +``` \ No newline at end of file diff --git a/tutorials/ml_components_demos/try-kserve/iris-input.json b/tutorials/ml_components_demos/try-kserve/iris-input.json new file mode 100644 index 0000000..7783972 --- /dev/null +++ b/tutorials/ml_components_demos/try-kserve/iris-input.json @@ -0,0 +1,6 @@ +{ + "instances": [ + [6.8, 2.8, 4.8, 1.4], + [6.0, 3.4, 4.5, 1.6] + ] +} diff --git a/tutorials/ml_components_demos/try-kserve/predict.py b/tutorials/ml_components_demos/try-kserve/predict.py new file mode 100644 index 0000000..ba015b4 --- /dev/null +++ b/tutorials/ml_components_demos/try-kserve/predict.py @@ -0,0 +1,161 @@ +import re +from urllib.parse import urlsplit +from kserve import KServeClient +import requests +import logging + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + + +def get_istio_auth_session(url: str, username: str, password: str) -> dict: + """ + Determine if the specified URL is secured by Dex and try to obtain a session cookie. 
+ WARNING: only Dex `staticPasswords` and `LDAP` authentication are currently supported + (we default to using `staticPasswords` if both are enabled) + + :param url: Kubeflow server URL, including protocol + :param username: Dex `staticPasswords` or `LDAP` username + :param password: Dex `staticPasswords` or `LDAP` password + :return: auth session information + """ + # define the default return object + auth_session = { + "endpoint_url": url, # KF endpoint URL + "redirect_url": None, # KF redirect URL, if applicable + "dex_login_url": None, # Dex login URL (for POST of credentials) + "is_secured": None, # True if KF endpoint is secured + "session_cookie": None # Resulting session cookies in the form "key1=value1; key2=value2" + } + + # use a persistent session (for cookies) + with requests.Session() as s: + + ################ + # Determine if Endpoint is Secured + ################ + resp = s.get(url, allow_redirects=True) + if resp.status_code != 200: + raise RuntimeError( + f"HTTP status code '{resp.status_code}' for GET against: {url}" + ) + + auth_session["redirect_url"] = resp.url + + # if we were NOT redirected, then the endpoint is UNSECURED + if len(resp.history) == 0: + auth_session["is_secured"] = False + return auth_session + else: + auth_session["is_secured"] = True + + ################ + # Get Dex Login URL + ################ + redirect_url_obj = urlsplit(auth_session["redirect_url"]) + + # if we are at `/auth?=xxxx` path, we need to select an auth type + if re.search(r"/auth$", redirect_url_obj.path): + + ####### + # TIP: choose the default auth type by including ONE of the following + ####### + + # OPTION 1: set "staticPasswords" as default auth type + redirect_url_obj = redirect_url_obj._replace( + path=re.sub(r"/auth$", "/auth/local", redirect_url_obj.path) + ) + # OPTION 2: set "ldap" as default auth type + # redirect_url_obj = redirect_url_obj._replace( + # path=re.sub(r"/auth$", "/auth/ldap", redirect_url_obj.path) + # ) + + # if we are 
at `/auth/xxxx/login` path, then no further action is needed (we can use it for login POST) + if re.search(r"/auth/.*/login$", redirect_url_obj.path): + auth_session["dex_login_url"] = redirect_url_obj.geturl() + + # else, we need to be redirected to the actual login page + else: + # this GET should redirect us to the `/auth/xxxx/login` path + resp = s.get(redirect_url_obj.geturl(), allow_redirects=True) + if resp.status_code != 200: + raise RuntimeError( + f"HTTP status code '{resp.status_code}' for GET against: {redirect_url_obj.geturl()}" + ) + + # set the login url + auth_session["dex_login_url"] = resp.url + + ################ + # Attempt Dex Login + ################ + resp = s.post( + auth_session["dex_login_url"], + data={"login": username, "password": password}, + allow_redirects=True + ) + if len(resp.history) == 0: + raise RuntimeError( + f"Login credentials were probably invalid - " + f"No redirect after POST to: {auth_session['dex_login_url']}" + ) + + # store the session cookies in a "key1=value1; key2=value2" string + auth_session["session_cookie"] = "; ".join([f"{c.name}={c.value}" for c in s.cookies]) + + return auth_session + + +# =========================== GET AUTH SESSION TOKEN + +KUBEFLOW_ENDPOINT = "http://localhost:8080" +KUBEFLOW_USERNAME = "user@example.com" +KUBEFLOW_PASSWORD = "12341234" +NAMESPACE = "kubeflow-user-example-com" + +auth_session = get_istio_auth_session( + url=KUBEFLOW_ENDPOINT, + username=KUBEFLOW_USERNAME, + password=KUBEFLOW_PASSWORD +) +logger.info(auth_session["session_cookie"]) + +TOKEN = auth_session["session_cookie"].replace("authservice_session=", "") +logger.info("Token: %s", TOKEN) + +# =========================== SEND PREDICTION REQUEST + +# $ kubectl port-forward --namespace istio-system svc/istio-ingressgateway 8080:80 + +model_name = "sklearn-iris" + +# get inference service +KServe = KServeClient() + +# wait for deployment to be ready +KServe.get(model_name, namespace=NAMESPACE, watch=True, timeout_seconds=120) 
+ +inference_service = KServe.get(model_name, namespace=NAMESPACE) +is_url = inference_service['status']['address']['url'] + +logger.info(f"\nInference service status:\n{inference_service['status']}") +logger.info(f"\nInference service URL:\n{is_url}\n") + +inference_input = { + "instances": [ + [6.8, 2.8, 4.8, 1.4], + [6.0, 3.4, 4.5, 1.6] + ] +} + +response = requests.post( + "http://localhost:8080/v1/models/sklearn-iris:predict", + json=inference_input, + cookies={"authservice_session": TOKEN}, + headers={ + "Host": "sklearn-iris.kubeflow-user-example-com.example.com" + } +) + +logger.info(f"{response}") +logger.info(f"\nPrediction response:\n{response.text}\n") diff --git a/tutorials/ml_components_demos/try-kserve/sklearn-iris-model.yaml b/tutorials/ml_components_demos/try-kserve/sklearn-iris-model.yaml new file mode 100644 index 0000000..2212bfd --- /dev/null +++ b/tutorials/ml_components_demos/try-kserve/sklearn-iris-model.yaml @@ -0,0 +1,9 @@ +apiVersion: "serving.kserve.io/v1beta1" +kind: "InferenceService" +metadata: + name: sklearn-iris + namespace: kubeflow-user-example-com +spec: + predictor: + sklearn: + storageUri: "gs://kfserving-examples/models/sklearn/1.0/model" \ No newline at end of file diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/.gitignore b/tutorials/ml_components_demos/try-kubeflow-pipelines/.gitignore new file mode 100644 index 0000000..6aee440 --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/.gitignore @@ -0,0 +1,166 @@ +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. 
+*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ +cover/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +.pybuilder/ +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +# For a library or package, you might want to ignore these files since the code is +# intended to run in multiple environments; otherwise, check them in: +# .python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# poetry +# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. +# This is especially recommended for binary packages to ensure reproducibility, and is more +# commonly ignored for libraries. +# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control +#poetry.lock + +# pdm +# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. +#pdm.lock +# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it +# in version control. +# https://pdm.fming.dev/#use-with-ide +.pdm.toml + +# PEP 582; used by e.g. 
github.com/David-OConnor/pyflow and github.com/pdm-project/pdm +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +# pytype static type analyzer +.pytype/ + +# Cython debug symbols +cython_debug/ + +# PyCharm +# JetBrains specific template is maintained in a separate JetBrains.gitignore that can +# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore +# and can be added to the global gitignore or merged into this file. For a more nuclear +# option (not recommended) you can uncomment the following to ignore the entire idea folder. +#.idea/ +.python-version + +.vscode + +pipeline.yaml +versions/ diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/Dockerfile b/tutorials/ml_components_demos/try-kubeflow-pipelines/Dockerfile new file mode 100644 index 0000000..fa12fec --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/Dockerfile @@ -0,0 +1,10 @@ +FROM python:3.8.1 + +WORKDIR /app + +COPY requirements.txt . +RUN pip install -r requirements.txt --quiet --no-cache-dir + +COPY train.py . + +CMD ["python", "train.py"] diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/README.md b/tutorials/ml_components_demos/try-kubeflow-pipelines/README.md new file mode 100644 index 0000000..ffb85d3 --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/README.md @@ -0,0 +1,102 @@ +# Sample Kubeflow component and pipeline + +This is a sample of a Kubeflow Pipeline component and pipeline adapted from [here](https://github.com/kubeflow/pipelines/tree/sdk/release-1.8/components/sample/keras/train_classifier). 
+ +The purpose is to show how to create a simple pipeline component and run a KFP pipeline. +By default, the example uses a local Kind cluster (`kind-ep`) and the local docker repository. Modify the files appropriately for your own environment if needed. + +This example uses custom containers for components. You may also want to learn about [building Python function-based components](https://www.kubeflow.org/docs/components/pipelines/sdk-v2/python-function-components/) as an alternative approach. + +This example uses [Pipelines SDK v2](https://www.kubeflow.org/docs/components/pipelines/sdk-v2/). + +## Pre-requisites + +Ensure your `kubectl` has the correct context pointing to the desired cluster. For example, for the `kind-ep` cluster: + +```bash +kubectl config use-context kind-kind-ep +``` + +## Push container image + +The file [`train.py`](./train.py) contains a sample model training script. MLflow is used for experiment tracking. + +The file [`Dockerfile`](./Dockerfile) contains the commands to assemble a Docker image for training. + +The image is built with [`build_image.sh`](./build_image.sh). Read through the script. By default, images are loaded into the local cluster directly from the local docker repository using the `kind load docker-image` command. + +Build and load the image into the cluster: + +```bash +./build_image.sh +``` + +## Create component + +The Kubeflow pipeline component for training is defined in [`component.yaml`](./component.yaml). See the documentation on [component specification](https://www.kubeflow.org/docs/components/pipelines/reference/component-spec/) to understand how components are defined. In brief, every component has: + +- metadata such as name and description +- implementation specifying how to execute the component instance: Docker image, command and arguments +- interface specifying the inputs and outputs + +Update the container image under `implementation.container.image` so that it matches the image built with `build_image.sh`. 
+ +## Create pipeline + +The file [`pipeline.py`](./pipeline.py) contains the definition for the Kubeflow pipeline: + +```python +# pipeline.py +import kfp + +# Load component from YAML file +train_op = kfp.components.load_component_from_file('component.yaml') + +@kfp.dsl.pipeline(name='Example Kubeflow pipeline', description='Pipeline to test an example component') +def pipeline(): + train_task = train_op() + +def compile(): + kfp.compiler.Compiler().compile( + pipeline_func=pipeline, + package_path='pipeline.yaml' + ) + +if __name__ == '__main__': + compile() +``` + +Compile the pipeline to `pipeline.yaml`: + +```bash +python pipeline.py +``` + +Submit a pipeline run to Kubeflow Pipelines: + +```bash +python submit.py +``` + +You can also submit the pipeline file manually in the Kubeflow Pipelines UI. + +## Dashboards + +The default way of accessing Kubeflow is via port-forward. This enables you to get started quickly without imposing any requirements on your environment. Run the following to port-forward Istio's Ingress-Gateway to local port `8080`: + +```sh +kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80 +``` + +After running the command, you can access the Kubeflow Central Dashboard by doing the following: + +1. Open your browser and visit [http://localhost:8080/](http://localhost:8080/). You should get the Dex login screen. +2. Log in with the default user's credentials. The default email address is `user@example.com` and the default password is `12341234`. + +To access the MLflow UI, forward a local port to the MLflow server with: + +```bash +kubectl -n mlflow port-forward svc/mlflow 5000:5000 +``` + +Then access the MLflow UI at [`http://localhost:5000`](http://localhost:5000). 
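Before opening the dashboards, it can help to confirm that both port-forwards are actually listening. The following is a minimal sketch using only the Python standard library; the helper name `is_reachable` is hypothetical, and the URLs assume the default local ports used in the commands above.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 403), so it is up
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar
        return False


if __name__ == "__main__":
    # Assumed local endpoints from the port-forward commands
    for name, url in [("Kubeflow", "http://localhost:8080/"),
                      ("MLflow", "http://localhost:5000/")]:
        state = "reachable" if is_reachable(url) else "not reachable"
        print(f"{name} at {url}: {state}")
```

If an endpoint is reported as not reachable, the corresponding `kubectl port-forward` command is probably not running in another terminal.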
diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/build_image.sh b/tutorials/ml_components_demos/try-kubeflow-pipelines/build_image.sh new file mode 100755 index 0000000..4fd8b51 --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/build_image.sh @@ -0,0 +1,19 @@ +#!/usr/bin/env bash + +set -eux + +IMAGE_NAME=try-out/kfp-test-image + +IMAGE_TAG=test-kfp + +FULL_IMAGE_NAME=${IMAGE_NAME}:${IMAGE_TAG} + +cd "$(dirname "$0")" + +docker build -t "$FULL_IMAGE_NAME" . + +# load the image into the local "kind" cluster with name "kind-ep" +kind load docker-image "$FULL_IMAGE_NAME" --name kind-ep + +# to push the image to a remote repository instead +# docker push "$FULL_IMAGE_NAME" \ No newline at end of file diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/component.yaml b/tutorials/ml_components_demos/try-kubeflow-pipelines/component.yaml new file mode 100644 index 0000000..f6158f9 --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/component.yaml @@ -0,0 +1,9 @@ +name: KFP example +description: Example component using MLflow +inputs: [] +outputs: [] +implementation: + container: + image: try-out/kfp-test-image:test-kfp + command: [python, train.py] + args: [] diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/pipeline.py b/tutorials/ml_components_demos/try-kubeflow-pipelines/pipeline.py new file mode 100644 index 0000000..a351588 --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/pipeline.py @@ -0,0 +1,16 @@ +import kfp + +train_op = kfp.components.load_component_from_file('component.yaml') + +@kfp.dsl.pipeline(name='Example Kubeflow pipeline', description='Pipeline to test an example component') +def pipeline(): + train_task = train_op() + +def compile(): + kfp.compiler.Compiler().compile( + pipeline_func=pipeline, + package_path='pipeline.yaml' + ) + +if __name__ == '__main__': + compile() diff --git 
a/tutorials/ml_components_demos/try-kubeflow-pipelines/requirements.txt b/tutorials/ml_components_demos/try-kubeflow-pipelines/requirements.txt new file mode 100644 index 0000000..a36e12c --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/requirements.txt @@ -0,0 +1,2 @@ +kfp~=1.8.12 +mlflow diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/submit.py b/tutorials/ml_components_demos/try-kubeflow-pipelines/submit.py new file mode 100644 index 0000000..412f4bf --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/submit.py @@ -0,0 +1,125 @@ +import re +import requests +from urllib.parse import urlsplit +import kfp + + +def get_istio_auth_session(url: str, username: str, password: str) -> dict: + """ + Determine if the specified URL is secured by Dex and try to obtain a session cookie. + WARNING: only Dex `staticPasswords` and `LDAP` authentication are currently supported + (we default to using `staticPasswords` if both are enabled) + + :param url: Kubeflow server URL, including protocol + :param username: Dex `staticPasswords` or `LDAP` username + :param password: Dex `staticPasswords` or `LDAP` password + :return: auth session information + """ + # define the default return object + auth_session = { + "endpoint_url": url, # KF endpoint URL + "redirect_url": None, # KF redirect URL, if applicable + "dex_login_url": None, # Dex login URL (for POST of credentials) + "is_secured": None, # True if KF endpoint is secured + "session_cookie": None # Resulting session cookies in the form "key1=value1; key2=value2" + } + + # use a persistent session (for cookies) + with requests.Session() as s: + + ################ + # Determine if Endpoint is Secured + ################ + resp = s.get(url, allow_redirects=True) + if resp.status_code != 200: + raise RuntimeError( + f"HTTP status code '{resp.status_code}' for GET against: {url}" + ) + + auth_session["redirect_url"] = resp.url + + # if we were NOT redirected, then 
the endpoint is UNSECURED + if len(resp.history) == 0: + auth_session["is_secured"] = False + return auth_session + else: + auth_session["is_secured"] = True + + ################ + # Get Dex Login URL + ################ + redirect_url_obj = urlsplit(auth_session["redirect_url"]) + + # if we are at `/auth?=xxxx` path, we need to select an auth type + if re.search(r"/auth$", redirect_url_obj.path): + + ####### + # TIP: choose the default auth type by including ONE of the following + ####### + + # OPTION 1: set "staticPasswords" as default auth type + redirect_url_obj = redirect_url_obj._replace( + path=re.sub(r"/auth$", "/auth/local", redirect_url_obj.path) + ) + # OPTION 2: set "ldap" as default auth type + # redirect_url_obj = redirect_url_obj._replace( + # path=re.sub(r"/auth$", "/auth/ldap", redirect_url_obj.path) + # ) + + # if we are at `/auth/xxxx/login` path, then no further action is needed (we can use it for login POST) + if re.search(r"/auth/.*/login$", redirect_url_obj.path): + auth_session["dex_login_url"] = redirect_url_obj.geturl() + + # else, we need to be redirected to the actual login page + else: + # this GET should redirect us to the `/auth/xxxx/login` path + resp = s.get(redirect_url_obj.geturl(), allow_redirects=True) + if resp.status_code != 200: + raise RuntimeError( + f"HTTP status code '{resp.status_code}' for GET against: {redirect_url_obj.geturl()}" + ) + + # set the login url + auth_session["dex_login_url"] = resp.url + + ################ + # Attempt Dex Login + ################ + resp = s.post( + auth_session["dex_login_url"], + data={"login": username, "password": password}, + allow_redirects=True + ) + if len(resp.history) == 0: + raise RuntimeError( + f"Login credentials were probably invalid - " + f"No redirect after POST to: {auth_session['dex_login_url']}" + ) + + # store the session cookies in a "key1=value1; key2=value2" string + auth_session["session_cookie"] = "; ".join([f"{c.name}={c.value}" for c in s.cookies]) + + return 
auth_session + + +KUBEFLOW_ENDPOINT = "http://localhost:8080" +KUBEFLOW_USERNAME = "user@example.com" +KUBEFLOW_PASSWORD = "12341234" +NAMESPACE = "kubeflow-user-example-com" + +auth_session = get_istio_auth_session( + url=KUBEFLOW_ENDPOINT, + username=KUBEFLOW_USERNAME, + password=KUBEFLOW_PASSWORD +) + +client = kfp.Client(host=f"{KUBEFLOW_ENDPOINT}/pipeline", cookies=auth_session["session_cookie"]) + +created_run = client.create_run_from_pipeline_package( + pipeline_file="pipeline.yaml", + arguments={}, + enable_caching=False, + run_name="try-kfp-run", + experiment_name="try-kfp-experiment", + namespace=NAMESPACE +) diff --git a/tutorials/ml_components_demos/try-kubeflow-pipelines/train.py b/tutorials/ml_components_demos/try-kubeflow-pipelines/train.py new file mode 100644 index 0000000..b39ed7b --- /dev/null +++ b/tutorials/ml_components_demos/try-kubeflow-pipelines/train.py @@ -0,0 +1,24 @@ +import mlflow + +MLFLOW_TRACKING_URI = "http://mlflow.mlflow.svc.cluster.local:5000" +MLFLOW_EXPERIMENT_NAME = "Kubeflow Pipeline try-out run" + +print('START') + + +def main(): + mlflow.set_tracking_uri(MLFLOW_TRACKING_URI) + experiment = mlflow.get_experiment_by_name(MLFLOW_EXPERIMENT_NAME) + + if experiment is None: + experiment_id = mlflow.create_experiment(MLFLOW_EXPERIMENT_NAME) + else: + experiment_id = experiment.experiment_id + + with mlflow.start_run(experiment_id=experiment_id) as run: + mlflow.log_param("my", "param") + mlflow.log_metric("score", 100) + + +if __name__ == '__main__': + main() diff --git a/tutorials/ml_components_demos/try-mlflow/README.md b/tutorials/ml_components_demos/try-mlflow/README.md new file mode 100644 index 0000000..14e7199 --- /dev/null +++ b/tutorials/ml_components_demos/try-mlflow/README.md @@ -0,0 +1,132 @@ +## Test MLflow deployment + +First, make sure that both MLflow ([http://localhost:5000](http://localhost:5000)) +and MinIO ([http://localhost:9000](http://localhost:9000)) are accessible: + +MLflow: + 
+```bash +kubectl -n mlflow port-forward svc/mlflow 5000:5000 +``` + +MinIO: + +```bash +kubectl -n mlflow port-forward svc/mlflow-minio-service 9000:9000 +``` + +### Create an experiment run + +Create a new working directory and a virtual environment with your method of choice. + +Install dependencies: + +```bash +pip install mlflow google-cloud-storage scikit-learn boto3 +``` + +Create a sample Python file named `train.py` adapted from [train.py](https://github.com/mlflow/mlflow/blob/master/examples/sklearn_elasticnet_wine/train.py) used in the [MLflow tutorial](https://www.mlflow.org/docs/latest/tutorials-and-examples/tutorial.html): + +```python +# train.py +# Adapted from https://github.com/mlflow/mlflow/blob/master/examples/sklearn_elasticnet_wine/train.py +import os +import logging +import sys + +import mlflow +import mlflow.sklearn +import numpy as np +import pandas as pd +from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score +from sklearn.model_selection import train_test_split +from sklearn.linear_model import ElasticNet +import os + +os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'http://localhost:9000/' +os.environ['AWS_ACCESS_KEY_ID'] = 'minioadmin' +os.environ['AWS_SECRET_ACCESS_KEY'] = 'minioadmin' + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +MLFLOW_TRACKING_URI = "http://localhost:5000" +MLFLOW_EXPERIMENT_NAME = "mlflow-minio-test" + + +def eval_metrics(actual, pred): + rmse = np.sqrt(mean_squared_error(actual, pred)) + mae = mean_absolute_error(actual, pred) + r2 = r2_score(actual, pred) + return rmse, mae, r2 + + +def main(): + np.random.seed(40) + + # Read the wine-quality csv file from the URL + csv_url = ( + "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv" + ) + + data = pd.read_csv(csv_url, sep=";") + + # Split the data into training and test sets. (0.75, 0.25) split. 
+ train, test = train_test_split(data) + + # The predicted column is "quality" which is a scalar from [3, 9] + train_x = train.drop(["quality"], axis=1) + test_x = test.drop(["quality"], axis=1) + train_y = train[["quality"]] + test_y = test[["quality"]] + + alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5 + l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5 + + logger.info(f"Using MLflow tracking URI: {MLFLOW_TRACKING_URI}") + mlflow.set_tracking_uri(MLFLOW_TRACKING_URI) + + logger.info(f"Using MLflow experiment: {MLFLOW_EXPERIMENT_NAME}") + mlflow.set_experiment(MLFLOW_EXPERIMENT_NAME) + + with mlflow.start_run(): + lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42) + + logger.info("Fitting model...") + + lr.fit(train_x, train_y) + + logger.info("Finished fitting") + + predicted_qualities = lr.predict(test_x) + + (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities) + + logger.info("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio)) + logger.info(" RMSE: %s" % rmse) + logger.info(" MAE: %s" % mae) + logger.info(" R2: %s" % r2) + + logger.info("Logging parameters to MLflow") + mlflow.log_param("alpha", alpha) + mlflow.log_param("l1_ratio", l1_ratio) + mlflow.log_metric("rmse", rmse) + mlflow.log_metric("r2", r2) + mlflow.log_metric("mae", mae) + + logger.info("Logging trained model") + mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel") + +if __name__ == '__main__': + main() + +``` + +Run the script: + +```bash +python train.py +``` + +After the script finishes, navigate to the MLflow UI at [http://localhost:5000](http://localhost:5000), +and you should see your run under the experiment "mlflow-minio-test". Browse the run parameters, metrics and artifacts. \ No newline at end of file
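The UI check described above can also be scripted. The following is a minimal sketch, assuming the MLflow port-forward to `http://localhost:5000` is still active and that the tracking server exposes the standard MLflow REST API under `/api/2.0/mlflow`. The helper names (`experiment_lookup_url`, `summarize_run`, `fetch_runs`) are hypothetical and not part of this repository.

```python
import json
import urllib.parse
import urllib.request

# Same values as in train.py above.
MLFLOW_TRACKING_URI = "http://localhost:5000"
MLFLOW_EXPERIMENT_NAME = "mlflow-minio-test"


def experiment_lookup_url(base_uri, experiment_name):
    """Build the REST URL that looks an experiment up by name."""
    query = urllib.parse.urlencode({"experiment_name": experiment_name})
    return f"{base_uri.rstrip('/')}/api/2.0/mlflow/experiments/get-by-name?{query}"


def summarize_run(run):
    """Flatten one run from a runs/search response into a plain dict."""
    data = run.get("data", {})
    return {
        "run_id": run["info"]["run_id"],
        "params": {p["key"]: p["value"] for p in data.get("params", [])},
        "metrics": {m["key"]: m["value"] for m in data.get("metrics", [])},
    }


def fetch_runs(base_uri, experiment_name, max_results=5):
    """Query the tracking server for recent runs of the named experiment."""
    # Step 1: resolve the experiment name to its ID.
    with urllib.request.urlopen(experiment_lookup_url(base_uri, experiment_name)) as resp:
        experiment_id = json.load(resp)["experiment"]["experiment_id"]

    # Step 2: search runs within that experiment.
    body = json.dumps({"experiment_ids": [experiment_id], "max_results": max_results})
    request = urllib.request.Request(
        f"{base_uri.rstrip('/')}/api/2.0/mlflow/runs/search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        runs = json.load(resp).get("runs", [])
    return [summarize_run(run) for run in runs]
```

With the port-forward running, `fetch_runs(MLFLOW_TRACKING_URI, MLFLOW_EXPERIMENT_NAME)` should return one entry per logged run, including the `alpha` and `l1_ratio` parameters and the `rmse`, `mae` and `r2` metrics. The sketch uses only the standard library, so it does not depend on the `mlflow` client version installed locally.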