Merge branch 'staging' into simonz/dockerfile
SimonYansenZhao committed Nov 12, 2024
2 parents 040ba1b + 12bc1e4 commit 06accbf
Showing 11 changed files with 91 additions and 40 deletions.
14 changes: 11 additions & 3 deletions .github/actions/azureml-test/action.yml
@@ -15,9 +15,15 @@ inputs:
TEST_KIND:
required: true
description: Type of test - unit or nightly
AZUREML_TEST_CREDENTIALS:
AZUREML_TEST_UMI_CLIENT_ID:
required: true
description: Credentials for AzureML login
description: AzureML User-managed identity client ID
AZUREML_TEST_UMI_TENANT_ID:
required: true
description: AzureML User-managed identity tenant ID
AZUREML_TEST_UMI_SUB_ID:
required: true
description: AzureML User-managed identity subscription ID
AZUREML_TEST_SUBID:
required: true
description: AzureML subscription ID
@@ -53,7 +59,9 @@ runs:
- name: Log in to Azure
uses: azure/login@v2
with:
creds: ${{ inputs.AZUREML_TEST_CREDENTIALS }}
client-id: ${{ inputs.AZUREML_TEST_UMI_CLIENT_ID }}
tenant-id: ${{ inputs.AZUREML_TEST_UMI_TENANT_ID }}
subscription-id: ${{ inputs.AZUREML_TEST_UMI_SUB_ID }}
- name: Submit tests to AzureML
shell: bash
run: |
6 changes: 5 additions & 1 deletion .github/workflows/azureml-cpu-nightly.yml
@@ -64,6 +64,8 @@ jobs:
needs: get-test-groups
name: ${{ join(matrix.*, ', ') }}
runs-on: ubuntu-latest
permissions:
id-token: write # This is required for requesting the JWT
strategy:
max-parallel: 50 # Usage limits: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
matrix:
@@ -79,7 +81,9 @@ jobs:
EXP_NAME: recommenders-nightly-${{ matrix.test-group }}-python${{ matrix.python-version }}-${{ github.ref }}
ENV_NAME: recommenders-${{ github.sha }}-python${{ matrix.python-version }}${{ contains(matrix.test-group, 'gpu') && '-gpu' || '' }}${{ contains(matrix.test-group, 'spark') && '-spark' || '' }}
TEST_KIND: 'nightly'
AZUREML_TEST_CREDENTIALS: ${{ secrets.AZUREML_TEST_CREDENTIALS }}
AZUREML_TEST_UMI_CLIENT_ID: ${{ secrets.AZUREML_TEST_UMI_CLIENT_ID }}
AZUREML_TEST_UMI_TENANT_ID: ${{ secrets.AZUREML_TEST_UMI_TENANT_ID }}
AZUREML_TEST_UMI_SUB_ID: ${{ secrets.AZUREML_TEST_UMI_SUB_ID }}
AZUREML_TEST_SUBID: ${{ secrets.AZUREML_TEST_SUBID }}
PYTHON_VERSION: ${{ matrix.python-version }}
TEST_GROUP: ${{ matrix.test-group }}
6 changes: 5 additions & 1 deletion .github/workflows/azureml-gpu-nightly.yml
@@ -64,6 +64,8 @@ jobs:
needs: get-test-groups
name: ${{ join(matrix.*, ', ') }}
runs-on: ubuntu-latest
permissions:
id-token: write # This is required for requesting the JWT
strategy:
max-parallel: 50 # Usage limits: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
matrix:
@@ -79,7 +81,9 @@ jobs:
EXP_NAME: recommenders-nightly-${{ matrix.test-group }}-python${{ matrix.python-version }}-${{ github.ref }}
ENV_NAME: recommenders-${{ github.sha }}-python${{ matrix.python-version }}${{ contains(matrix.test-group, 'gpu') && '-gpu' || '' }}${{ contains(matrix.test-group, 'spark') && '-spark' || '' }}
TEST_KIND: 'nightly'
AZUREML_TEST_CREDENTIALS: ${{ secrets.AZUREML_TEST_CREDENTIALS }}
AZUREML_TEST_UMI_CLIENT_ID: ${{ secrets.AZUREML_TEST_UMI_CLIENT_ID }}
AZUREML_TEST_UMI_TENANT_ID: ${{ secrets.AZUREML_TEST_UMI_TENANT_ID }}
AZUREML_TEST_UMI_SUB_ID: ${{ secrets.AZUREML_TEST_UMI_SUB_ID }}
AZUREML_TEST_SUBID: ${{ secrets.AZUREML_TEST_SUBID }}
PYTHON_VERSION: ${{ matrix.python-version }}
TEST_GROUP: ${{ matrix.test-group }}
2 changes: 1 addition & 1 deletion .github/workflows/azureml-release-pipeline.yml
@@ -37,7 +37,7 @@ jobs:
- name: Setup python
uses: actions/setup-python@v5
with:
python-version: "3.8"
python-version: "3.10"
- name: Install wheel package
run: pip install wheel
- name: Create wheel from setup.py
6 changes: 5 additions & 1 deletion .github/workflows/azureml-spark-nightly.yml
@@ -63,6 +63,8 @@ jobs:
needs: get-test-groups
name: ${{ join(matrix.*, ', ') }}
runs-on: ubuntu-latest
permissions:
id-token: write # This is required for requesting the JWT
strategy:
max-parallel: 50 # Usage limits: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
matrix:
@@ -78,7 +80,9 @@ jobs:
EXP_NAME: recommenders-nightly-${{ matrix.test-group }}-python${{ matrix.python-version }}-${{ github.ref }}
ENV_NAME: recommenders-${{ github.sha }}-python${{ matrix.python-version }}${{ contains(matrix.test-group, 'gpu') && '-gpu' || '' }}${{ contains(matrix.test-group, 'spark') && '-spark' || '' }}
TEST_KIND: 'nightly'
AZUREML_TEST_CREDENTIALS: ${{ secrets.AZUREML_TEST_CREDENTIALS }}
AZUREML_TEST_UMI_CLIENT_ID: ${{ secrets.AZUREML_TEST_UMI_CLIENT_ID }}
AZUREML_TEST_UMI_TENANT_ID: ${{ secrets.AZUREML_TEST_UMI_TENANT_ID }}
AZUREML_TEST_UMI_SUB_ID: ${{ secrets.AZUREML_TEST_UMI_SUB_ID }}
AZUREML_TEST_SUBID: ${{ secrets.AZUREML_TEST_SUBID }}
PYTHON_VERSION: ${{ matrix.python-version }}
TEST_GROUP: ${{ matrix.test-group }}
6 changes: 5 additions & 1 deletion .github/workflows/azureml-unit-tests.yml
@@ -53,6 +53,8 @@ jobs:
needs: get-test-groups
name: ${{ join(matrix.*, ', ') }}
runs-on: ubuntu-latest
permissions:
id-token: write # This is required for requesting the JWT
strategy:
max-parallel: 50 # Usage limits: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
matrix:
@@ -68,7 +70,9 @@ jobs:
EXP_NAME: recommenders-unit-${{ matrix.test-group }}-python${{ matrix.python-version }}-${{ github.sha }}
ENV_NAME: recommenders-${{ github.sha }}-python${{ matrix.python-version }}${{ contains(matrix.test-group, 'gpu') && '-gpu' || '' }}${{ contains(matrix.test-group, 'spark') && '-spark' || '' }}
TEST_KIND: 'unit'
AZUREML_TEST_CREDENTIALS: ${{ secrets.AZUREML_TEST_CREDENTIALS }}
AZUREML_TEST_UMI_CLIENT_ID: ${{ secrets.AZUREML_TEST_UMI_CLIENT_ID }}
AZUREML_TEST_UMI_TENANT_ID: ${{ secrets.AZUREML_TEST_UMI_TENANT_ID }}
AZUREML_TEST_UMI_SUB_ID: ${{ secrets.AZUREML_TEST_UMI_SUB_ID }}
AZUREML_TEST_SUBID: ${{ secrets.AZUREML_TEST_SUBID }}
PYTHON_VERSION: ${{ matrix.python-version }}
TEST_GROUP: ${{ matrix.test-group }}
2 changes: 1 addition & 1 deletion README.md
@@ -144,7 +144,7 @@ We provide a [benchmark notebook](examples/06_benchmarks/movielens.ipynb) to ill

This project welcomes contributions and suggestions. Before contributing, please see our [contribution guidelines](CONTRIBUTING.md).

This project adheres to [Microsoft's Open Source Code of Conduct](CODE_OF_CONDUCT.md) in order to foster a welcoming and inspiring community for all.
This project adheres to this [Code of Conduct](CODE_OF_CONDUCT.md) in order to foster a welcoming and inspiring community for all.

## Build Status

11 changes: 7 additions & 4 deletions SETUP.md
@@ -50,16 +50,19 @@ pip install recommenders[spark]
# c. Run the notebook.
```

## Setup for Azure Databricks
## Setup for Databricks

The following instructions were tested on Azure Databricks Runtime 12.2 LTS (Apache Spark version 3.3.2) and 11.3 LTS (Apache Spark version 3.3.0).
As of April 2023, Databricks Runtime 13 is not yet supported as it is on Python 3.10.
The following instructions were tested on Databricks Runtime 15.4 LTS (Apache Spark version 3.5.0), 14.3 LTS (Apache Spark version 3.5.0), 13.3 LTS (Apache Spark version 3.4.1), and 12.2 LTS (Apache Spark version 3.3.2). We have tested the runtime on Python 3.9, 3.10, and 3.11.

After an Azure Databricks cluster is provisioned:
After a Databricks cluster is provisioned:
```bash
# 1. Go to the "Compute" tab on the left of the page, click on the provisioned cluster and then click on "Libraries".
# 2. Click the "Install new" button.
# 3. In the popup window, select "PyPI" as the library source. Enter "recommenders[examples]" as the package name. Click "Install" to install the package.
# 4. Now, repeat step 3 for the packages below:
# a. numpy<2.0.0
# b. pandera<=0.18.3
# c. scipy<=1.13.1
```
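
The three pins in step 4 are upper bounds, so a quick check in a notebook cell can confirm the cluster picked them up. The following is a minimal sketch, not part of the commit: the helper names `release_tuple`, `satisfies`, and `check_installed` are hypothetical, and the comparison is deliberately simplified (it only looks at the leading numeric release segments and ignores pre-release tags). For strict PEP 440 semantics, `packaging.specifiers.SpecifierSet` is the more robust choice.

```python
import re
from importlib import metadata

# Upper bounds copied from step 4 of the Databricks instructions.
PINS = {
    "numpy": ("<", "2.0.0"),
    "pandera": ("<=", "0.18.3"),
    "scipy": ("<=", "1.13.1"),
}


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segments, e.g. '1.13.1rc1' -> (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies(version: str, op: str, bound: str) -> bool:
    """Simplified check of `version op bound` for op in {'<', '<='}."""
    v, b = release_tuple(version), release_tuple(bound)
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    width = max(len(v), len(b))
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    if op == "<":
        return v < b
    if op == "<=":
        return v <= b
    raise ValueError(f"Unsupported operator: {op!r}")


def check_installed(pins=PINS) -> dict:
    """Map each package to True/False, or None when it is not installed."""
    results = {}
    for name, (op, bound) in pins.items():
        try:
            results[name] = satisfies(metadata.version(name), op, bound)
        except metadata.PackageNotFoundError:
            results[name] = None
    return results
```

Calling `check_installed()` after the libraries finish installing returns a dictionary keyed by package name; any `False` entry means the corresponding pin was not applied.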

### Prepare Azure Databricks for Operationalization
6 changes: 4 additions & 2 deletions setup.py
@@ -36,23 +36,25 @@
"nltk>=3.8.1,<4", # requires tqdm
"notebook>=6.5.5,<8", # requires ipykernel, jinja2, jupyter, nbconvert, nbformat, packaging, requests
"numba>=0.57.0,<1",
"numpy<2.0.0", # FIXME: Remove numpy<2.0.0 once cornac releases a version newer than 2.2.1 that resolves ImportError: numpy.core.multiarray failed to import.
"pandas>2.0.0,<3.0.0", # requires numpy
"pandera[strategies]>=0.6.5,<0.18;python_version<='3.8'", # For generating fake datasets
"pandera[strategies]>=0.15.0;python_version>='3.9'",
"retrying>=1.3.4,<2",
"scikit-learn>=1.2.0,<2", # requires scipy; introduces a breaking change that affects feature_extraction.text.TfidfVectorizer.min_df
"scikit-surprise>=1.1.3",
"scipy>=1.10.1,<=1.13.1", # FIXME: Remove scipy<=1.13.1 once cornac releases a version newer than 2.2.1. See #2128
"seaborn>=0.13.0,<1", # requires matplotlib, packaging
"statsmodels<=0.14.1;python_version<='3.8'",
"statsmodels>=0.14.4;python_version>='3.9'",
"transformers>=4.27.0,<5", # requires packaging, pyyaml, requests, tqdm
]

# shared dependencies
extras_require = {
"gpu": [
"fastai>=2.7.11,<3",
"numpy<1.25.0;python_version<='3.8'",
"nvidia-ml-py>=11.525.84",
"spacy<=3.7.5;python_version<='3.8'",
"tensorflow>=2.8.4,!=2.9.0.*,!=2.9.1,!=2.9.2,!=2.10.0.*,<2.16", # Fixed TF due to constant security problems and breaking changes #2073
"tf-slim>=1.1.0", # No python_requires in its setup.py
"torch>=2.0.1,<3",
58 changes: 38 additions & 20 deletions tests/README.md
@@ -216,28 +216,46 @@ Then, follow the steps below to create the AzureML infrastructure:
- Name: `azureml-test-workspace`
- Resource group: `recommenders_project_resources`
- Location: *Make sure you have enough quota in the location you choose*
2. Create two new clusters: `cpu-cluster` and `gpu-cluster`. Go to compute, then compute cluster, then new.
1. Create two new clusters: `cpu-cluster` and `gpu-cluster`. Go to compute, then compute cluster, then new.
- Select the CPU VM base. Anything above 64GB of RAM, and 8 cores should be fine.
- Select the GPU VM base. Anything above 56GB of RAM, and 6 cores, and an NVIDIA K80 should be fine.
3. Add the subscription ID to GitHub action secrets [here](https://github.com/recommenders-team/recommenders/settings/secrets/actions). Create a new repository secret called `AZUREML_TEST_SUBID` and add the subscription ID as the value.
4. Make sure you have installed [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli), and that you are logged in: `az login`.
5. Select your subscription: `az account set -s $AZURE_SUBSCRIPTION_ID`.
6. Create a Service Principal: `az ad sp create-for-rbac --name $SERVICE_PRINCIPAL_NAME --role contributor --scopes /subscriptions/$AZURE_SUBSCRIPTION_ID --json-auth`. This will output a JSON blob with the credentials of the Service Principal:
```
{
"clientId": "XXXXXXXXXXXXXXXXXXXXX",
"clientSecret": "XXXXXXXXXXXXXXXXXXXXX",
"subscriptionId": "XXXXXXXXXXXXXXXXXXXXX",
"tenantId": "XXXXXXXXXXXXXXXXXXXXX",
"activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
"resourceManagerEndpointUrl": "https://management.azure.com/",
"activeDirectoryGraphResourceId": "https://graph.windows.net/",
"sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
"galleryEndpointUrl": "https://gallery.azure.com/",
"managementEndpointUrl": "https://management.core.windows.net/"
}
```
7. Add the output as github's action secret `AZUREML_TEST_CREDENTIALS` under repository's **Settings > Security > Secrets and variables > Actions**.
1. Add the subscription ID to GitHub action secrets
[here](https://github.com/recommenders-team/recommenders/settings/secrets/actions).
* Create a new repository secret called `AZUREML_TEST_SUBID` and
add the subscription ID as the value.
1. Set up [login with OpenID Connect
(OIDC)](https://github.com/marketplace/actions/azure-login#login-with-openid-connect-oidc-recommended)
for GitHub Actions.
1. Create a user-assigned managed identity (UMI) and assign the
following 3 roles of the AzureML workspace created above to the
UMI (See [Create a user-assigned managed
identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azp#create-a-user-assigned-managed-identity)):
* AzureML Compute Operator
* AzureML Data Scientist
* Reader
1. [Create a federated identity credential on the
UMI](https://learn.microsoft.com/en-us/entra/workload-id/workload-identity-federation-create-trust-user-assigned-managed-identity?pivots=identity-wif-mi-methods-azp#github-actions-deploying-azure-resources)
with the following settings:
* Name: A unique name for the federated identity credential
within your application.
* Issuer: Set to `https://token.actions.githubusercontent.com`
for GitHub Actions.
* Subject: The subject claim format, e.g.,
`repo:recommenders-team/recommenders:ref:refs/heads/<branch-name>`:
+ `repo:recommenders-team/recommenders:pull_request`
+ `repo:recommenders-team/recommenders:ref:refs/heads/staging`
+ `repo:recommenders-team/recommenders:ref:refs/heads/main`
* Description: (Optional) A description of the credential.
* Audiences: Specifies who can use this credential; for GitHub
Actions, use `api://AzureADTokenExchange`.
1. Create 3 Actions secrets
* `AZUREML_TEST_UMI_TENANT_ID`
* `AZUREML_TEST_UMI_SUB_ID`
* `AZUREML_TEST_UMI_CLIENT_ID`

and use the UMI's tenant ID, subscription ID and client ID as the
values of the secrets, respectively, under the repository's
**Settings > Security > Secrets and variables > Actions**.
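
The subject claims listed for the federated identity credential follow two fixed patterns, one for pull requests and one per branch. As a minimal sketch (the function names below are hypothetical and only illustrate the string formats quoted above), they could be generated like this when registering one credential per subject:

```python
# Repository named in the subject examples above.
REPO = "recommenders-team/recommenders"


def branch_subject(repo: str, branch: str) -> str:
    """Subject claim for workflows triggered on a branch."""
    return f"repo:{repo}:ref:refs/heads/{branch}"


def pull_request_subject(repo: str) -> str:
    """Subject claim for workflows triggered by a pull request."""
    return f"repo:{repo}:pull_request"


def expected_subjects(repo: str, branches=("staging", "main")) -> list:
    """All subjects listed in the setup steps: pull requests plus each branch."""
    return [pull_request_subject(repo)] + [branch_subject(repo, b) for b in branches]
```

Each returned string corresponds to one federated identity credential on the UMI, since a credential matches a single subject.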


## How to execute tests in your local environment
14 changes: 9 additions & 5 deletions tests/ci/azureml_tests/post_pytest.py
@@ -89,8 +89,12 @@ def parse_args():
run_id=run.info.run_id,
dst_path=args.log_dir,
)
log_path = pathlib.Path("user_logs/std_log.txt")
with open(pathlib.Path(args.log_dir) / log_path, "r") as file:
print(f"\nDumping logs in {log_path}")
print("=====================================")
print(file.read())
log_path = next(
(path for path in pathlib.Path(args.log_dir).rglob("std_log.txt")),
None
)
if log_path is not None:
with open(log_path, "r") as file:
print(f"\nDumping logs in {log_path}")
print("=====================================")
print(file.read())
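
The change above replaces a hard-coded `user_logs/std_log.txt` path with a recursive search, so a missing or relocated log no longer raises an error. A minimal standalone sketch of that lookup, assuming hypothetical helper names `find_std_log` and `read_std_log` that are not part of the commit:

```python
import pathlib
import tempfile
from typing import Optional


def find_std_log(log_dir: str) -> Optional[pathlib.Path]:
    """Return the first std_log.txt found anywhere under log_dir, or None."""
    return next(pathlib.Path(log_dir).rglob("std_log.txt"), None)


def read_std_log(log_dir: str) -> Optional[str]:
    """Return the log contents, or None when no std_log.txt was downloaded."""
    log_path = find_std_log(log_dir)
    if log_path is None:
        return None
    with open(log_path, "r") as file:
        return file.read()
```

Because `rglob` yields matches lazily, `next(..., None)` stops at the first hit. If several `std_log.txt` files exist under the directory, which one is returned depends on traversal order, so this sketch (like the diff) assumes there is at most one of interest.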
