Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adding support for dictionary writes to online store #4156

Merged
merged 10 commits into from
Apr 30, 2024

Conversation

franciscojavierarceo
Copy link
Member

@franciscojavierarceo franciscojavierarceo commented Apr 28, 2024

What this PR does / why we need it:

This PR enables support for passing a Python Dict object to write to the online store. This is a first step along the path to optimizing feature writes to the online store.

Note, this won't speed up anything and, in fact, introduces minor latency to serialize the dict into a DataFrame. In a subsequent PR I will directly use the dictionary object as it propagates through the Feast backend layer to write rather than any of the existing pandas code. When I do this next step, I will measure the performance benefits to highlight the latency gains.

Which issue(s) this PR fixes:

N/A

Fixes

N/A

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
@franciscojavierarceo franciscojavierarceo changed the title Online store writes feat: Adding support for dictionary writes to online store Apr 28, 2024
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review April 28, 2024 10:34
@@ -1406,7 +1407,8 @@ def push(
def write_to_online_store(
self,
feature_view_name: str,
df: pd.DataFrame,
df: Optional[pd.DataFrame] = None,
input_dict: Optional[Dict] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this make more sense as a single param with a Union type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do a type check I just thought I'd be explicit about it first...because if I make it a single parameter then it would make sense to change the name of the input argument, which means then I'd have to make a breaking change or do a staged rollout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do a staggered release for this and get feedback from folks but I wouldn't want to make any breaking changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, df name is really awkward. maybe call the new one something other than input_dict so that it can become the universal one after we deprecate and remove df? just inputs for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like that. I'll update.

@sudohainguyen
Copy link
Collaborator

hey quick question, how will the change optimize feature writes? 🤔
as I can see this is about parsing an input dictionary to a dataframe, would you mind walking me through the mechanism?

@HaoXuAI
Copy link
Collaborator

HaoXuAI commented Apr 30, 2024

hey quick question, how will the change optimize feature writes? 🤔 as I can see this is about parsing an input dictionary to a dataframe, would you mind walking me through the mechanism?

I think this is the same question for me :)

@franciscojavierarceo
Copy link
Member Author

franciscojavierarceo commented Apr 30, 2024

hey quick question, how will the change optimize feature writes? 🤔 as I can see this is about parsing an input dictionary to a dataframe, would you mind walking me through the mechanism?

@HaoXuAI @sudohainguyen
It won't yet. First I want to introduce this at this layer and add the tests. Then I want to measure the incremental performance of skipping pandas all together using cProfiler. So this is step 1 that just introduces the argument to the user.

I added some notes to the PR to be more explicit about the plan here.

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
@tokoko
Copy link
Collaborator

tokoko commented Apr 30, 2024

@franciscojavierarceo btw, what is Dict here exactly? Is this always Dict[str, np.array] or there are other options? P.S. If you're gonna spend some time performance testing this, I think adding pyarrow table in the mix would be a good idea.

Signed-off-by: Francisco Javier Arceo <[email protected]>
@franciscojavierarceo
Copy link
Member Author

@franciscojavierarceo btw, what is Dict here exactly? Is this always Dict[str, np.array] or there are other options? P.S. If you're gonna spend some time performance testing this, I think adding pyarrow table in the mix would be a good idea.

I haven't spent any time using pyarrow but definitely open to it.

Signed-off-by: Francisco Javier Arceo <[email protected]>
@@ -1406,7 +1407,8 @@ def push(
def write_to_online_store(
self,
feature_view_name: str,
df: pd.DataFrame,
df: Optional[pd.DataFrame] = None,
inputs: Optional[Union[Dict[str, List[Any]], pd.DataFrame]] = None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tokoko updated

@jeremyary jeremyary merged commit abfac01 into feast-dev:master Apr 30, 2024
14 checks passed
tmihalac pushed a commit to tmihalac/feast that referenced this pull request May 3, 2024
…#4156)

* feat: Adding support for dictionary writes to online store

Signed-off-by: Francisco Javier Arceo <[email protected]>

* Simple approach

Signed-off-by: Francisco Javier Arceo <[email protected]>

* lint

Signed-off-by: Francisco Javier Arceo <[email protected]>

* adding error if both are missing

Signed-off-by: Francisco Javier Arceo <[email protected]>

* rename dict to input_dict

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated input arg to test

Signed-off-by: Francisco Javier Arceo <[email protected]>

* Renaming function argument

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated docstring

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated type signature

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated type to be more explicit

Signed-off-by: Francisco Javier Arceo <[email protected]>

---------

Signed-off-by: Francisco Javier Arceo <[email protected]>
tmihalac pushed a commit to tmihalac/feast that referenced this pull request May 3, 2024
…#4156)

* feat: Adding support for dictionary writes to online store

Signed-off-by: Francisco Javier Arceo <[email protected]>

* Simple approach

Signed-off-by: Francisco Javier Arceo <[email protected]>

* lint

Signed-off-by: Francisco Javier Arceo <[email protected]>

* adding error if both are missing

Signed-off-by: Francisco Javier Arceo <[email protected]>

* rename dict to input_dict

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated input arg to test

Signed-off-by: Francisco Javier Arceo <[email protected]>

* Renaming function argument

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated docstring

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated type signature

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated type to be more explicit

Signed-off-by: Francisco Javier Arceo <[email protected]>

---------

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Theodor Mihalache <[email protected]>
franciscojavierarceo pushed a commit that referenced this pull request May 24, 2024
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24)

### Bug Fixes

* Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6))
* Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff))
* Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084)
* Changed the code the way mysql container is initialized.  ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126)
* Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594))
* Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669)
* Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde))
* Get rid of empty string `name_alias` during feature view projection deserialization  ([#4116](#4116)) ([65056ce](65056ce))
* Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164))
* Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0))
* Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae))
* Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7))
* Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7))
* Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb))
* Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7))
* Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4))
* Update doc ([#4153](#4153)) ([e873636](e873636))
* Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3))
* Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96))
* Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11))

### Features

* Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc))
* Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0))
* Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98))
* Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419))
* Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48))
* Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745))
* Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5))
* Adding support for dictionary writes to online store  ([#4156](#4156)) ([abfac01](abfac01))
* Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640))
* Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef))
* Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562))
* Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621))
* Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9))
* Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4))
* Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150))
* Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f))
* Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea))
* Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716))
* Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d))
* Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114)

### Reverts

* Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
franciscojavierarceo pushed a commit that referenced this pull request May 27, 2024
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24)

### Bug Fixes

* Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6))
* Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff))
* Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084)
* Changed the code the way mysql container is initialized.  ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126)
* Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594))
* Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669)
* Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde))
* Get rid of empty string `name_alias` during feature view projection deserialization  ([#4116](#4116)) ([65056ce](65056ce))
* Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164))
* Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0))
* Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae))
* Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7))
* Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7))
* Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb))
* Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7))
* Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4))
* Update doc ([#4153](#4153)) ([e873636](e873636))
* Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3))
* Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96))
* Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11))

### Features

* Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc))
* Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0))
* Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98))
* Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419))
* Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48))
* Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745))
* Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5))
* Adding support for dictionary writes to online store  ([#4156](#4156)) ([abfac01](abfac01))
* Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640))
* Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef))
* Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562))
* Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621))
* Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9))
* Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4))
* Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150))
* Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f))
* Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea))
* Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716))
* Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d))
* Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114)

### Reverts

* Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants