Basic support for non-string names. #1784

ueshin · 2020-09-22T00:40:23Z

Currently, names in Koalas (e.g., df.columns, df.columns.names, df.index.names) must be strings or tuples of strings, but Koalas should allow the other data types that Spark supports.

before:

>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Index(['0', '1'], dtype='object')

after:

>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Int64Index([0, 1], dtype='int64')

itholic

Overall it looks fine, but I wonder whether it's okay to abandon some of the features we could support before?

databricks/koalas/tests/test_series.py

itholic · 2020-09-24T08:22:18Z

databricks/koalas/indexes.py

@@ -1066,7 +1064,7 @@ def drop(self, labels):
        >>> index.drop([1])
        Int64Index([2, 3], dtype='int64')
        """
-        if not isinstance(labels, (tuple, list)):
+        if not is_list_like(labels):  # TODO: tuple?


I think we may not need to consider tuples here, because an Index cannot hold tuple values in Koalas anyway.

>>> ks.Index([(1, 2), 3])  # Cannot have a tuple for Index as mixed type
Traceback (most recent call last):
...
pyarrow.lib.ArrowInvalid: Could not convert (1, 2) with type tuple: did not recognize Python value type when inferring an Arrow data type
>>> ks.Index([(1, 2), (3, 4)])  # This is not gonna be an Index but MultiIndex
MultiIndex([(1, 2),
            (3, 4)],
           )

Oh, but if we have any plan to support tuples as values, I think it's okay to leave the TODO as it is.
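
To make the TODO concrete, here is a minimal sketch comparing the old `isinstance` check with a list-like check. It is an illustration only: `looks_list_like` is a hypothetical, simplified stand-in for `is_list_like` (the real function handles more cases), and `old_check` just wraps the condition from the removed line.

```python
from collections.abc import Iterable


def old_check(labels):
    """Condition from the removed line: only tuple and list count as collections."""
    return isinstance(labels, (tuple, list))


def looks_list_like(obj):
    """Hypothetical, simplified stand-in for is_list_like (assumption, not the real code)."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


candidates = {
    "list": [1, 2],
    "tuple": (1, 2),
    "set": {1, 2},
    "range": range(2),
    "str": "ab",
    "int": 1,
}

# Print how each check classifies each kind of input.
for name, value in candidates.items():
    print(f"{name:>5}: old={old_check(value)} new={looks_list_like(value)}")
```

Both checks treat a tuple such as (1, 2) as a collection of labels rather than as a single label, which is the ambiguity the TODO points at. The list-like check additionally accepts sets and ranges, while strings and scalars are still treated as single labels.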

databricks/koalas/indexes.py

ueshin · 2020-09-25T00:54:50Z

Let me revert some commits. Maybe we should discuss input checks separately.

itholic

Looks good enough in terms of basic support!

Let's discuss the details separately.

ueshin · 2020-09-26T00:42:53Z

Thanks! I'll merge this now. I'll submit separate PRs for each input check.

Fixes type annotations. After #1784, those should accept non-string tuples.

ueshin added 6 commits September 21, 2020 17:39

Support non-string names.

8feb146

Fix.

e40eb5a

Merge branch 'master' into non-string_names

7fd6461

Fix input check.

1f06c4b

Fix.

45dac56

Fix.

a89ca56

ueshin marked this pull request as ready for review September 24, 2020 01:34

ueshin requested review from HyukjinKwon and itholic September 24, 2020 02:40

itholic reviewed Sep 24, 2020

ueshin added 2 commits September 24, 2020 17:55

Revert some commits.

c761116

Merge branch 'master' into non-string_names

6bd0895

ueshin changed the title ~~Support non-string names.~~ Basic support for non-string names. Sep 25, 2020

ueshin force-pushed the non-string_names branch from c542a3d to 6bd0895 Compare September 25, 2020 01:38

itholic approved these changes Sep 25, 2020

ueshin merged commit 043978a into databricks:master Sep 26, 2020

ueshin deleted the non-string_names branch September 26, 2020 00:43

ueshin mentioned this pull request Oct 15, 2020

Fix type annotations. #1853

Merged

HyukjinKwon pushed a commit that referenced this pull request Oct 16, 2020

Fix type annotations. (#1853)

85e6cf8

Fixes type annotations. After #1784, those should accept non-string tuples.
