Fix DL estimators for getting the output df schema #2611

irasit · 2021-01-21T22:26:13Z

Checklist before submitting

Did you read the contributor guide?
Did you update the docs?
[x] Did you write any tests to validate this change?
Did you update the CHANGELOG, if this change affects users?

Description

PR #2373 introduced a large delay when generating the output schema. Instead, the output schema can be derived directly from the input DataFrame schema and the label columns.

Fixes #2536.
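A minimal sketch of the idea, not the code merged in this PR. It models a schema as an ordered list of (name, type) fields standing in for Spark's `StructType`, and it assumes, hypothetically, that each output column takes the data type of the label column it predicts and that the label columns are present in the input schema. `Field`, `derive_output_schema`, and the `__output` column name are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a Spark StructField: a column name and a type name."""
    name: str
    data_type: str


def derive_output_schema(input_schema: Sequence[Field],
                         label_cols: Sequence[str],
                         output_cols: Sequence[str]) -> List[Field]:
    """Build the transform output schema from column metadata only.

    Assumption: each output column has the same type as its label column.
    """
    # Same guard as in the diff: one output column per label column.
    if len(label_cols) != len(output_cols):
        raise ValueError(
            f"label_cols and output_cols must match 1:1, got "
            f"{len(label_cols)} and {len(output_cols)}")

    by_name = {f.name: f for f in input_schema}
    missing = [c for c in label_cols if c not in by_name]
    if missing:
        raise ValueError(f"label columns not found in input schema: {missing}")

    # Keep every input column, then append one output column per label,
    # copying the label column's type.
    output = list(input_schema)
    for label, out in zip(label_cols, output_cols):
        output.append(Field(out, by_name[label].data_type))
    return output


schema = [Field("features", "array<float>"), Field("y", "double")]
print(derive_output_schema(schema, ["y"], ["y__output"]))
```

In this sketch the schema is computed from column names and types alone, without reading any rows of the DataFrame.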

Review process to land

All tests and other checks must succeed.
At least one member of the technical steering committee must review and approve.
If any member of the technical steering committee requests changes, they must be addressed.

irasit · 2021-01-21T22:35:28Z

horovod/spark/common/util.py

+
+
+def get_spark_df_output_schema(input_df_schema, label_cols, output_cols):
+    if len(label_cols) != len(output_cols):


@tgaddair Please check here. Is it safe to assume that label_cols and output_cols always match 1:1?
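For context on the question above: the length check in the diff matters because Python's built-in `zip` stops at the shorter of its inputs, so a mismatched `output_cols` would silently drop label columns instead of failing. A small illustration, where `pair_columns` is a hypothetical helper applying the same guard:

```python
def pair_columns(label_cols, output_cols):
    """Pair each label column with its output column, requiring a 1:1 mapping."""
    if len(label_cols) != len(output_cols):
        raise ValueError(
            f"expected one output column per label column, got "
            f"{len(label_cols)} label(s) and {len(output_cols)} output(s)")
    return list(zip(label_cols, output_cols))


labels = ["label_a", "label_b"]
outputs = ["label_a__output"]

# Without a guard, zip() stops at the shorter list and 'label_b' is dropped.
print(list(zip(labels, outputs)))  # [('label_a', 'label_a__output')]

# With the guard, the mismatch is reported instead.
try:
    pair_columns(labels, outputs)
except ValueError as err:
    print(err)
```

On Python 3.10 and later, `zip(labels, outputs, strict=True)` raises `ValueError` on a length mismatch and could replace the explicit check.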

tgaddair

LGTM!

github-actions · 2021-02-05T00:43:15Z

Unit Test Results

    691 files (+18)    691 suites (+18)    4h 43m 32s ⏱️ (+6m 47s)
    539 tests (+1): 510 passed ✔️ (+1), 29 skipped 💤 (±0), 0 failed ❌ (±0)
    14 190 runs (+318): 10 730 passed ✔️ (+196), 3 460 skipped 💤 (+122), 0 failed ❌ (±0)

Results for commit ea692ad. ± Comparison against base commit 2a775b2.

irasit force-pushed the df_schema branch from 092d114 to ca15684 Compare January 21, 2021 22:31

irasit commented Jan 21, 2021

irasit force-pushed the df_schema branch from 420a9d0 to c072c60 Compare January 29, 2021 01:41

irasit requested a review from tgaddair January 30, 2021 00:39

irasit force-pushed the df_schema branch from 37aff2f to 086fa50 Compare February 2, 2021 08:40

irasit and others added 11 commits February 2, 2021 00:41

df_schema

5f381ac

Signed-off-by: Peng Zhang <[email protected]>

fix_df_schema

c261aeb

Signed-off-by: Peng Zhang <[email protected]>

addUnitTest

fe643a7

Signed-off-by: Peng Zhang <[email protected]>

fix_test_transform_multi_class

56e501c

Signed-off-by: Peng Zhang <[email protected]>

fixTest

d7a4b0a

Signed-off-by: Peng Zhang <[email protected]>

fix_rossmann

2cbee13

Signed-off-by: Peng Zhang <[email protected]>

fix_torch_loss

3464bd9

Signed-off-by: Peng Zhang <[email protected]>

fix_multi_class

93b1c90

Signed-off-by: Peng Zhang <[email protected]>

rebase

43017e3

Signed-off-by: Peng Zhang <[email protected]>

rebase

a607003

Signed-off-by: Peng Zhang <[email protected]>

Add Intel(R) MPI support for horovodrun (horovod#2374)

72771a6

Signed-off-by: Yana Shchyokotova <[email protected]> Signed-off-by: Peng Zhang <[email protected]>

irasit force-pushed the df_schema branch from 086fa50 to 72771a6 Compare February 2, 2021 08:41

fix_rebase

eb940b8

Signed-off-by: Peng Zhang <[email protected]>

irasit force-pushed the df_schema branch from ed27487 to eb940b8 Compare February 2, 2021 08:47

tgaddair approved these changes Feb 4, 2021

tgaddair merged commit ea692ad into horovod:master Feb 4, 2021

irasit deleted the df_schema branch February 5, 2021 03:48
