Series(data=df, index=None) seems to be incorrectly instantiated #1732

Closed
LucasG0 opened this issue Aug 27, 2020 · 10 comments

@LucasG0
Contributor

LucasG0 commented Aug 27, 2020

Hi, I saw that it is now possible to create a Series from a DataFrame without any additional index parameter.
However, I cannot print a Series created this way, or call sort_index on it.
I guess the Series is incorrectly instantiated, and other operations would probably fail too.

>>> import databricks.koalas as ks
>>> df = ks.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> ser = ks.Series(df)
>>> print(ser)
Traceback (most recent call last):
  File "D:\Dev\Utils\Miniconda\envs\koalas-dev\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Dev\koalas\databricks\koalas\series.py", line 5232, in __repr__
    pser = self._kdf._get_or_create_repr_pandas_cache(max_display_count)[self.name]
  File "D:\Dev\Utils\Miniconda\envs\koalas-dev\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\Dev\Utils\Miniconda\envs\koalas-dev\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: None

>>> ser.sort_index()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Dev\koalas\databricks\koalas\series.py", line 2227, in sort_index
    kdf = self._kdf[[self.name]].sort_index(
  File "D:\Dev\koalas\databricks\koalas\frame.py", line 10130, in __getitem__
    return self.loc[:, list(key)]
  File "D:\Dev\koalas\databricks\koalas\indexing.py", line 441, in __getitem__
    cols_sel
  File "D:\Dev\koalas\databricks\koalas\indexing.py", line 313, in _select_cols
    return self._select_cols_by_iterable(cols_sel, missing_keys)
  File "D:\Dev\koalas\databricks\koalas\indexing.py", line 1175, in _select_cols_by_iterable
    raise KeyError("['{}'] not in index".format(name_like_string(key)))
KeyError: "['__none__'] not in index"

@HyukjinKwon
Member

Can you share the full error message? It looks like you're running it on Windows; is that correct?

@LucasG0
Contributor Author

LucasG0 commented Aug 28, 2020

Yes, I am running it on Windows. I edited the issue to include the full error message.

@HyukjinKwon
Member

@LucasG0, are you able to create a pandas DataFrame and select via df[column_name]?

@LucasG0
Contributor Author

LucasG0 commented Aug 28, 2020

Yes, using pandas 1.0.5.

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 1, 1], "b": [2, 2, 2]})
>>> df['a']
0    1
1    1
2    1
Name: a, dtype: int64
>>> df['b']
0    2
1    2
2    2
Name: b, dtype: int64

@LucasG0
Contributor Author

LucasG0 commented Aug 28, 2020

Actually, it fails with an index parameter too.

>>> df = ks.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> ser = ks.Series(df, index=df.index.to_numpy())
>>> print(ser)
Traceback (most recent call last):
  File "mytest.py", line 22, in <module>
    print(ser)
  File "D:\Dev\koalas\databricks\koalas\series.py", line 5239, in __repr__
    pser = self._kdf._get_or_create_repr_pandas_cache(max_display_count)[self.name]
  File "D:\Dev\Utils\Miniconda\envs\koalas-dev\lib\site-packages\pandas\core\frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "D:\Dev\Utils\Miniconda\envs\koalas-dev\lib\site-packages\pandas\core\indexing.py", line 1553, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "D:\Dev\Utils\Miniconda\envs\koalas-dev\lib\site-packages\pandas\core\indexing.py", line 1640, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([0, 1, 2, 3, 4], dtype='int64')] are in the [columns]"

@ueshin
Collaborator

ueshin commented Aug 28, 2020

I guess passing a DataFrame to the Series constructor is not supported?

@LucasG0
Contributor Author

LucasG0 commented Aug 29, 2020

Indeed, the Series docs do not say that a DataFrame can be passed as data.
But the constructor seems to handle it, so I wonder whether this behavior is intended.

@ueshin
Collaborator

ueshin commented Aug 29, 2020

Ah, yeah, but the constructor accepting a DataFrame is intended only for internal use so far.
What's the behavior of pandas?

@LucasG0
Contributor Author

LucasG0 commented Aug 30, 2020

Alright, pandas does not support it either.
For internal purposes (#1737), I found the function first_series and used it instead of passing a DataFrame to the Series constructor.
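
For anyone landing here from user code rather than Koalas internals: a minimal sketch, in plain pandas, of ways to get a Series out of a single-column DataFrame without passing the DataFrame to the Series constructor. The variable names are only illustrative, and this assumes the same selections carry over to a Koalas DataFrame, which is not verified here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})

# Select the column by label; the result is a Series named "a".
by_label = df["a"]

# Select the first column by position, without needing to know its name.
by_position = df.iloc[:, 0]

# Collapse the single-column frame along the column axis into a Series.
squeezed = df.squeeze(axis="columns")

print(by_label.name)
print(by_label.equals(by_position) and by_label.equals(squeezed))
```

All three results keep the column name and the original index, so printing or calling sort_index on them works as usual.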

@HyukjinKwon
Member

I'll tentatively resolve this ticket since pandas doesn't support it either.
