Skip to content

Conversation

@allisonwang-db
Copy link
Contributor

What changes were proposed in this pull request?

This PR updates the options field to use a case-insensitive dictionary to keep the behavior consistent with the Scala side (which uses CaseInsensitiveStringMap). Currently, options are stored in a normal Python dictionary which can be confusing to users. For instance:

class MyDataSource(DataSource):
    def __init__(self, options):
        self.api_key = options.get("API_KEY") # <- This is None

spark.read.format(..).option("API_KEY", my_key).load(...)

Here, options will not have this "API_KEY" as everything is converted to lowercase on the Scala side.

Why are the changes needed?

To improve usability.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@allisonwang-db allisonwang-db force-pushed the spark-46568-ds-options branch from 0cf2162 to 9075dab Compare January 3, 2024 08:17
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix Python flake8 check failure, @allisonwang-db .

flake8 checks failed:
./python/pyspark/sql/datasource.py:19:1: F401 'typing.final' imported but unused
from typing import final, Any, Dict, Iterator, List, Sequence, Tuple, Type, Union, TYPE_CHECKING

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@HyukjinKwon
Copy link
Member

linter failure seems related. let's fix up.

@allisonwang-db
Copy link
Contributor Author

@HyukjinKwon fixed!

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 4.

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jan 5, 2024
…ive dictionary

### What changes were proposed in this pull request?

This PR updates the `options` field to use a case-insensitive dictionary to keep the behavior consistent with the Scala side (which uses `CaseInsensitiveStringMap`). Currently, `options` are stored in a normal Python dictionary which can be confusing to users. For instance:
```python
class MyDataSource(DataSource):
    def __init__(self, options):
        self.api_key = options.get("API_KEY") # <- This is None

spark.read.format(..).option("API_KEY", my_key).load(...)
```
Here, `options` will not have this "API_KEY" as everything is converted to lowercase on the Scala side.

### Why are the changes needed?

To improve usability.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#44564 from allisonwang-db/spark-46568-ds-options.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants