object_store: Support Azure Fabric OAuth Provider #6382

Merged: 5 commits, Sep 21, 2024


@RobinLin666 RobinLin666 commented Sep 11, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

In Azure Fabric, a token service issues the user access token. This PR implements a provider that fetches the token from that service and refreshes it automatically, so that long-running read and write operations keep working.

What changes are included in this PR?

This pull request extends the Azure integration in the object_store module with a new FabricTokenOAuthProvider, which adds support for authenticating through fabric token services.

Azure Builder Enhancements:

  • Added new fields to MicrosoftAzureBuilder to support fabric token services: fabric_token_service_url, fabric_workload_host, fabric_session_token, and fabric_cluster_identifier. (object_store/src/azure/builder.rs, R175-R182)
  • Updated AzureConfigKey to include new configuration keys related to fabric token services. (object_store/src/azure/builder.rs, R347-R374)
  • Modified impl AsRef<str> and impl FromStr for AzureConfigKey to handle the new fabric token service keys. (object_store/src/azure/builder.rs)
  • Enhanced MicrosoftAzureBuilder to set and get the new fabric token service fields. (object_store/src/azure/builder.rs)
  • Added logic to MicrosoftAzureBuilder to create a FabricTokenOAuthProvider if the fabric token service fields are provided. (object_store/src/azure/builder.rs, R919-R942)
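
As a rough illustration of the last two points, the sketch below shows one way environment variables could be mapped onto the four fabric settings and how a builder might decide whether a fabric provider can be created. It is written in Python for readability and is not the crate's Rust code. The key names come from this PR description; the function name `fabric_config_from_env` is hypothetical, and the rule that all four settings must be present is an assumption.

```python
from typing import Mapping, Optional

# Key names taken from the PR description; everything else here is illustrative.
FABRIC_KEYS = (
    "azure_fabric_token_service_url",
    "azure_fabric_workload_host",
    "azure_fabric_session_token",
    "azure_fabric_cluster_identifier",
)


def fabric_config_from_env(env: Mapping[str, str]) -> Optional[dict]:
    """Return the fabric settings if all four are set, otherwise None.

    Assumption: a fabric provider is only usable when every setting is present.
    """
    # Environment variables are upper case; config keys are lower case.
    lowered = {name.lower(): value for name, value in env.items()}
    config = {key: lowered[key] for key in FABRIC_KEYS if key in lowered}
    return config if len(config) == len(FABRIC_KEYS) else None


example_env = {
    "AZURE_FABRIC_TOKEN_SERVICE_URL": "https://example.invalid/api/v1/proxy/access",
    "AZURE_FABRIC_WORKLOAD_HOST": "https://example.invalid",
    "AZURE_FABRIC_SESSION_TOKEN": "session-token",
    "AZURE_FABRIC_CLUSTER_IDENTIFIER": "cluster-id",
}
print(fabric_config_from_env(example_env) is not None)
```

With a complete set of variables the function returns a dict keyed by the lower-case config names; dropping any one of them makes it return None, which is where a builder would fall back to its other credential sources.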

Credential Enhancements:

These changes collectively enhance the Azure integration by supporting more complex authentication mechanisms, particularly for environments utilizing fabric token services.
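
The auto-refresh behavior named in the rationale can be pictured as a small cache that hands out the current token and fetches a new one shortly before expiry. The Python sketch below is only an illustration of that idea, not the provider's implementation: the class name `RefreshingToken`, the `fetch` callable, and the 300-second margin are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Caches a token and fetches a new one shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # hypothetical: returns (token, expiry in epoch seconds)
        margin_s: float = 300.0,                 # hypothetical safety margin before expiry
        clock: Callable[[], float] = time.time,  # injectable clock, useful for testing
    ) -> None:
        self._fetch = fetch
        self._margin_s = margin_s
        self._clock = clock
        self._cached: Optional[Tuple[str, float]] = None

    def get(self) -> str:
        """Return a token that is valid for longer than the safety margin."""
        now = self._clock()
        if self._cached is None or self._cached[1] - now <= self._margin_s:
            # No token yet, or the cached one is about to expire: fetch a fresh one.
            self._cached = self._fetch()
        return self._cached[0]
```

Callers ask `get()` for a token before each request, so a long read or write never has to know when the underlying token was last renewed.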

Are there any user-facing changes?

With this change, setting the following environment variables in a Fabric Notebook makes the access token refresh automatically.

# For Fabric Spark Notebook
# (`spark` and `notebookutils` are assumed to be provided by the notebook runtime)
import os
import urllib.parse

# Parse the workload endpoint once; its scheme, host, and path are reused below.
workload_endpoint = urllib.parse.urlparse(f"{spark.conf.get('trident.lakehouse.tokenservice.endpoint')}/access")
# Environment variable names are the config keys in upper case.
os.environ['azure_fabric_token_service_url'.upper()] = f"https://{spark.conf.get('spark.tokenServiceEndpoint')}/api/v1/proxy{workload_endpoint.path}"
os.environ['azure_fabric_workload_host'.upper()] = f"{workload_endpoint.scheme}://{workload_endpoint.hostname}"
os.environ['azure_fabric_session_token'.upper()] = spark.conf.get("trident.session.token")
os.environ['azure_fabric_cluster_identifier'.upper()] = spark.conf.get("spark.synapse.clusteridentifier")
os.environ['azure_storage_token'.upper()] = notebookutils.credentials.getToken("storage")

# For Fabric Python Notebook
# (`notebookutils` is assumed to be provided by the notebook runtime)
import os
import urllib.parse

from notebookutils.common import configs

# Parse the workload endpoint once; its scheme, host, and path are reused below.
workload_endpoint = urllib.parse.urlparse(f"{configs.workload_endpoint()}/access")
# Environment variable names are the config keys in upper case.
os.environ['azure_fabric_token_service_url'.upper()] = f"{configs.ts_endpoint()}/api/v1/proxy{workload_endpoint.path}"
os.environ['azure_fabric_workload_host'.upper()] = f"{workload_endpoint.scheme}://{workload_endpoint.hostname}"
os.environ['azure_fabric_session_token'.upper()] = configs.session_token()
os.environ['azure_fabric_cluster_identifier'.upper()] = configs.cluster_identifier()
os.environ['azure_storage_token'.upper()] = notebookutils.credentials.getToken("storage")

Users can then read and write Delta tables without passing storage_options.

from deltalake import DeltaTable
dt = DeltaTable('abfss://[email protected]/LH.Lakehouse/Tables/dbo/test')
df = dt.to_pyarrow_dataset().head(10).to_pandas()

@github-actions github-actions bot added the object-store Object Store Interface label Sep 11, 2024
@tustvold

The JWT logic seems a little odd to me, are we just using it to decode the expiry? If so could we avoid the additional dependency?

@RobinLin666

> The JWT logic seems a little odd to me, are we just using it to decode the expiry? If so could we avoid the additional dependency?

Hi @tustvold, thank you for your review. Yes, the token service returns only a JWT, so I need to decode it to get the expiry. Any advice on doing this without the dependency?

@tustvold

https://jwt.io/introduction you should be able to simply split the string, base64 decode the middle chunk and parse the JSON
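
The suggestion above can be sketched in a few lines. This is a Python illustration using only the standard library, not the code that was merged; the function name `jwt_expiry` and the sample token are hypothetical. It reads the standard `exp` claim and does not verify the signature.

```python
import base64
import json


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (seconds since the Unix epoch) without verifying the token."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWT segments are base64url-encoded with '=' padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return int(claims["exp"])


def _b64url(obj: dict) -> str:
    """Hypothetical helper, used only to build a sample token for the demo."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build an unsigned sample token: header.payload.signature
sample = ".".join([_b64url({"alg": "none"}), _b64url({"exp": 1726900000}), "sig"])
print(jwt_expiry(sample))  # prints 1726900000
```

Skipping signature verification is reasonable here only because the token is used as an opaque credential and the expiry merely decides when to fetch a new one.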

@alamb

alamb commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

@tustvold tustvold left a comment

I don't have a way to test this, but it looks plausible to me. Thank you.

Perhaps @roeap you might be able to give this one a once over as well?

@RobinLin666

Thanks, all. Please help merge the PR if there are no further questions.

@alamb

alamb commented Sep 19, 2024

Let's wait a day or two before merging to see if @roeap has some time to review. This is getting very close.

Thanks for your patience @RobinLin666 and the help @tustvold

@roeap roeap left a comment

LGTM! Thanks for adding this @RobinLin666.

Unfortunately I also don't have access to a fabric workspace to validate, but I assume @RobinLin666 can see this work live :).

@RobinLin666

RobinLin666 commented Sep 21, 2024

Thanks @roeap, yes, I have validated this in a Fabric Notebook.

[two images attached]

(For testing, I added some print output, which I removed from the PR.)

@@ -336,6 +344,34 @@ pub enum AzureConfigKey {
/// - `disable_tagging`
DisableTagging,

/// Fabric token service url

I double checked: this enum is marked #[non_exhaustive], so it is OK to add new variants without breaking the API.

@alamb
Copy link
Contributor

alamb commented Sep 21, 2024

Thank you very much @tustvold @RobinLin666 and @roeap 🙏

@alamb alamb merged commit d727503 into apache:master Sep 21, 2024
16 checks passed