Skip to content

Conversation

Chiwendaiyue
Copy link

@Chiwendaiyue Chiwendaiyue commented Oct 7, 2025

  1. What does this PR do?

Adds insert_level method to MultiIndex for inserting new levels at specified positions.

  1. Motivation

This addresses the feature request in issue #62558 for a simple way to add levels to MultiIndex at given positions.

  1. Key changes
  • Implement insert_level method in pandas/core/indexes/multi.py
  • Add comprehensive test suite in pandas/tests/indexes/multi/test_insert_level.py
  • Handle various edge cases and error conditions
  1. Testing
  • All new tests pass
  • Existing functionality remains unchanged
  • Handles scalar values and array-like inputs
  • Proper error handling for invalid inputs
  1. Example usage

idx = pd.MultiIndex.from_tuples([('A', 1), ('B', 2)])
result = idx.insert_level(1, 'new_level')

cloudboat added 2 commits October 7, 2025 20:43
- Implement insert_level method for MultiIndex to insert new levels at specified positions
- Add comprehensive test cases for the new functionality
- Fix level names handling to match expected behavior

Resolves: MultiIndex level insertion feature request
- Implement insert_level method for MultiIndex to insert new levels at specified positions
- Add comprehensive test cases for the new functionality
- Fix level names handling to match expected behavior

Resolves: MultiIndex level insertion feature request
@Chiwendaiyue Chiwendaiyue marked this pull request as draft October 7, 2025 13:05
@Chiwendaiyue Chiwendaiyue marked this pull request as ready for review October 7, 2025 13:18
@Chiwendaiyue Chiwendaiyue changed the title multiindex-insert-level ENH: Implement MultiIndex.insert_level for inserting levels at specified positions Oct 7, 2025
@Chiwendaiyue Chiwendaiyue marked this pull request as draft October 7, 2025 13:29
cloudboat added 4 commits October 7, 2025 21:57
- Implement insert_level method for MultiIndex to insert new levels at specified positions
- Add comprehensive test cases for the new functionality
- Fix level names handling to match expected behavior

Resolves: MultiIndex level insertion feature request
@Chiwendaiyue Chiwendaiyue marked this pull request as ready for review October 7, 2025 15:49
Copy link
Member

@Alvaro-Kothe Alvaro-Kothe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Here are some suggestions against the test suite.

It seems that you committed fastparquet and pyarrow accidentally.

Also, can you add an entry to whatsnew?

Comment on lines 8 to 12
def setup_method(self):
self.simple_idx = pd.MultiIndex.from_tuples(
[("A", 1), ("B", 2), ("C", 3)], names=["level1", "level2"]
)
self.empty_idx = pd.MultiIndex.from_tuples([], names=["level1", "level2"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I would prefer that you define this in the test body that should use it.

)
self.empty_idx = pd.MultiIndex.from_tuples([], names=["level1", "level2"])

def test_insert_level_basic(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you parametrize this test?

result = self.simple_idx.insert_level(0, "new_val", name="new_level")
assert result.names[0] == "new_level"

def test_insert_level_edge_positions(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go into the first test.

result_end = self.simple_idx.insert_level(2, "end")
assert result_end.nlevels == 3

def test_insert_level_error_cases(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you parametrize this?

with pytest.raises(ValueError, match="Length of values must match"):
self.simple_idx.insert_level(1, ["too", "few"])

def test_insert_level_with_different_data_types(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests could go into the first test too.

Comment on lines 78 to 91
def test_debug_names():
idx = pd.MultiIndex.from_tuples(
[("A", 1), ("B", 2), ("C", 3)], names=["level1", "level2"]
)
print("Original names:", idx.names)

result = idx.insert_level(0, "new_value")
print("Result names:", result.names)

expected = pd.MultiIndex.from_tuples(
[("new_value", "A", 1), ("new_value", "B", 2), ("new_value", "C", 3)],
names=[None, "level1", "level2"],
)
print("Expected names:", expected.names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this exist?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a leftover debug function from earlier development. I've deleted it from the test file.


tm.assert_index_equal(original, self.simple_idx)

assert result.nlevels == original.nlevels + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion feels redundant against the first test.

Comment on lines 38 to 40
def test_insert_level_with_name(self):
result = self.simple_idx.insert_level(0, "new_val", name="new_level")
assert result.names[0] == "new_level"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also test adding names in the first test and remove this test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience!!!! I've made all the requested improvements such as “Moved test data into individual test methods”. All tests continue to pass and the implementation is ready for review!

(2, "end"),
],
)
def test_insert_level_edge_positions(self, position, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would also be better in the first test. Asserting the whole series is better than asserting its size.

"value",
[100, 1.5, None],
)
def test_insert_level_with_different_data_types(self, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, you are only asserting its size. Move it to the first test to verify with tm.assert_index_equal(result, expected)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've consolidated all appropriate tests into the main parametrized test to provide comprehensive validation with tm.assert_index_equal, if here are some other problems, I would fix them immediately, thank you!!!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file seems unrelated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, It might be remnants of a merge conflict - looks like an issue from resolving conflicts.I have made it the same as the main branch.

fastparquet Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you committed this by accident

pyarrow Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you committed this by accident

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git rm pyarrow

Comment on lines 231 to 232
-

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to remove the -

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I fixed it

Comment on lines 2763 to 2765
new_tuple = list(tup)
new_tuple.insert(position, value[i])
new_tuples.append(tuple(new_tuple))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary, this conversion of tuple -> list -> tuple?

new_tuple = list(tup)
new_tuple.insert(position, value[i])
new_tuples.append(tuple(new_tuple))
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this branch? Do any of your test cases fall here?

Comment on lines 2771 to 2776
if self.names is not None:
new_names = list(self.names)
else:
new_names = [None] * self.nlevels

new_names.insert(position, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.names is not None:
new_names = list(self.names)
else:
new_names = [None] * self.nlevels
new_names.insert(position, name)
if self.names is not None:
new_names = self.names[:position] + [name] + self.names[position + 1:]
else:
new_names = [None] * (position) + [name] + [None] * (self.nlevel - position)

Is there a case where self.names is None?

Copy link
Author

@Chiwendaiyue Chiwendaiyue Oct 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I research the Constructors, If the user has not named it, the result._names = [None] * len(levels) would make it a list class. Can it be regard as self.names would never be None? Maybe here can be new_names = self.names[:position] + [name] + self.names[position:]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be regard as self.names would never be None?

The MultiIndex constructor contains these lines

result._names = [None] * len(levels)
if names is not None:
# handles name validation
result._set_names(names)

That indicates it would never be None. So I think it's safe to remove the branching.

pyarrow Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git rm pyarrow

…ersions & REF: Remove unnecessary else branch in MultiIndex.insert_level & REF: Simplify names handling in MultiIndex.insert_level
Comment on lines 2729 to 2733
if position == 0:
new_tuple = (value[i],) + tup
elif position == len(tup):
new_tuple = tup + (value[i],)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this if-else conditions. The slicing in the else branch handles the edges.

>>> idx.insert_level(1, ["L1", "L2"], name="z")
MultiIndex([('A', 'L1', 1), ('B', 'L2', 2)],
names=['x', 'z', 'y'])
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you removed the documentation accidentally.

Copy link
Member

@Alvaro-Kothe Alvaro-Kothe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Chiwendaiyue
Copy link
Author

@Alvaro-Kothe Just checking if there's anything else needed before this can be merged? All checks are passing and the feature is ready. Thanks!

@Alvaro-Kothe
Copy link
Member

It needs the approval of a core member. Just wait.

- :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
- :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
- Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`).
- Added :meth:`MultiIndex.insert_level` to insert new levels at specified positions in a MultiIndex (:issue:`62558`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put insert_level in the API reference?

Comment on lines 2722 to 2723
Value(s) to use for the new level. If scalar, broadcast to all items.
If array-like, length must match the length of the index.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this accept only an array-like?

result = self._reorder_ilevels(order)
return result

def insert_level(self, position: int, value, name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def insert_level(self, position: int, value, name=None):
def insert_level(self, position: int, value, name: Hashable=lib.no_default) -> MultiIndex:

Comment on lines +2759 to +2765
for i, tup in enumerate(self):
new_tuple = tup[:position] + (value[i],) + tup[position:]
new_tuples.append(new_tuple)

new_names = self.names[:position] + [name] + self.names[position:]

return MultiIndex.from_tuples(new_tuples, names=new_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using from_tuples (which will be slow), can you calculate the level and codes of the new values (you can find examples of this in this file) and interpose that with the existing self.levels and self.codes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mroeschke
I'm working on this and have a question about the optimal approach.
I started with a levels/codes based implementation:

def insert_level(self, position: int, value, name: Hashable = lib.no_default) -> MultiIndex:
    #...
    new_level = Index(value)
    new_codes_for_level = new_level.get_indexer(value)
    
    new_levels = self.levels[:position] + [new_level] + self.levels[position:]
    new_codes = self.codes[:position] + [new_codes_for_level] + self.codes[position:]
    new_names = self.names[:position] + [name] + self.names[position:]
    
    return MultiIndex(levels=new_levels, codes=new_codes, names=new_names, verify_integrity=False)

However, I'm encountering issues with None value handling where new_level.get_indexer(value) fails when the new level contains duplicate values (like [None, None, None]).Should I:Continue with levels/codes optimization and implement custom codes mapping to handle duplicates, similar to how from_arrays handles it internally?Or just use the simpler from_tuples approach that builds new tuples and delegates to the well-tested MultiIndex.from_tuples method?The from_tuples approach would be more reliable for edge cases but potentially less performant for large indices.
Thanks for your guidance!

Comment on lines 875 to 883
def test_insert_level_integration():
idx = MultiIndex.from_tuples([("A", 1), ("B", 2)])

df = pd.DataFrame({"data": [10, 20]}, index=idx)
new_idx = idx.insert_level(0, "group1")
df_new = df.set_index(new_idx)

assert df_new.index.nlevels == 3
assert len(df_new) == 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove this test? I don't think it's necessarily testing anything new

import pandas._testing as tm


class TestMultiIndexInsertLevel:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove this class and make all the tests free functions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants