Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions pandas/core/_algos/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
"""
core._algos is for algorithms that operate on ndarray and ExtensionArray.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use algos instead of _algos here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ambivalent on this, as I do like the non-private name, but we have a bunch of import algorithms as algos and i want to avoid ambiguity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let’s just create core.algorithms as a subdir

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good point. The underscore is still only a small difference though. So maybe we should think of yet another name (or actually make algorithms.py into a module and rethink it more broader)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yes, similar as what @jreback said.

But Brock, you wanted to better distinguish pure array algos, that don't need to deal with series/dataframe etc (which I think is a good goal). But then making algorithms.py into a submodule (and use that for the function in this PR) defeats that purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then making algorithms.py into a submodule (and use that for the function in this PR) defeats that purpose?

That's my thought too.

So maybe we should think of yet another name

Suggestions?

These should:

- Assume that any Index, Series, or DataFrame objects have already been unwrapped.
- Assume that any list arguments have already been cast to ndarray/EA.
- Not depend on Index, Series, or DataFrame, nor import any of these.
- May dispatch to ExtensionArray methods, but should not import from core.arrays.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what else you are planning to put here, but it could also be pure-numpy algos? That would also have value in being very clear.
Eg the shift you moved here now is strictly numpy arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I have no problem with some functions in here being explicitly ndarray-only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't speaking about some functions, but the full file.

But so, what other functions do you envision to put here? (as for a single function it's not worth creating a module I think)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't speaking about some functions, but the full file.

That may end up being a reasonable organization, but it isn't obvious to me. e.g. if we end up with a shift that dispatches to shift_ea and shift_ndarray, would we want those to all be in the same file or split across multiple files?

"""
29 changes: 29 additions & 0 deletions pandas/core/_algos/needsname.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
import numpy as np

from pandas.core.dtypes.common import ensure_platform_int


def shift(values: np.ndarray, periods: int, axis: int, fill_value) -> np.ndarray:
new_values = values

# make sure array sent to np.roll is c_contiguous
f_ordered = values.flags.f_contiguous
if f_ordered:
new_values = new_values.T
axis = new_values.ndim - axis - 1

if np.prod(new_values.shape):
new_values = np.roll(new_values, ensure_platform_int(periods), axis=axis)

axis_indexer = [slice(None)] * values.ndim
if periods > 0:
axis_indexer[axis] = slice(None, periods)
else:
axis_indexer[axis] = slice(periods, None)
new_values[tuple(axis_indexer)] = fill_value

# restore original order
if f_ordered:
new_values = new_values.T

return new_values
22 changes: 2 additions & 20 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
from pandas.core.dtypes.missing import is_valid_nat_for_dtype, isna

from pandas.core import missing, nanops, ops
from pandas.core._algos.needsname import shift
from pandas.core.algorithms import checked_add_with_arr, take, unique1d, value_counts
from pandas.core.arrays.base import ExtensionArray, ExtensionOpsMixin
import pandas.core.common as com
Expand Down Expand Up @@ -773,26 +774,7 @@ def shift(self, periods=1, fill_value=None, axis=0):

fill_value = self._unbox_scalar(fill_value)

new_values = self._data

# make sure array sent to np.roll is c_contiguous
f_ordered = new_values.flags.f_contiguous
if f_ordered:
new_values = new_values.T
axis = new_values.ndim - axis - 1

new_values = np.roll(new_values, periods, axis=axis)

axis_indexer = [slice(None)] * self.ndim
if periods > 0:
axis_indexer[axis] = slice(None, periods)
else:
axis_indexer[axis] = slice(periods, None)
new_values[tuple(axis_indexer)] = fill_value

# restore original order
if f_ordered:
new_values = new_values.T
new_values = shift(self._data, periods, axis, fill_value)

return type(self)._simple_new(new_values, dtype=self.dtype)

Expand Down
22 changes: 2 additions & 20 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,6 @@
from pandas.core.dtypes.common import (
_NS_DTYPE,
_TD_DTYPE,
ensure_platform_int,
is_bool_dtype,
is_categorical,
is_categorical_dtype,
Expand Down Expand Up @@ -65,6 +64,7 @@
isna,
)

from pandas.core._algos.needsname import shift
import pandas.core.algorithms as algos
from pandas.core.arrays import (
Categorical,
Expand Down Expand Up @@ -1317,25 +1317,7 @@ def shift(self, periods, axis: int = 0, fill_value=None):
# that, handle boolean etc also
new_values, fill_value = maybe_upcast(self.values, fill_value)

# make sure array sent to np.roll is c_contiguous
f_ordered = new_values.flags.f_contiguous
if f_ordered:
new_values = new_values.T
axis = new_values.ndim - axis - 1

if np.prod(new_values.shape):
new_values = np.roll(new_values, ensure_platform_int(periods), axis=axis)

axis_indexer = [slice(None)] * self.ndim
if periods > 0:
axis_indexer[axis] = slice(None, periods)
else:
axis_indexer[axis] = slice(periods, None)
new_values[tuple(axis_indexer)] = fill_value

# restore original order
if f_ordered:
new_values = new_values.T
new_values = shift(new_values, periods, axis, fill_value)

return [self.make_block(new_values)]

Expand Down