Skip to content

Conversation

@jorgecarleitao
Copy link
Member

@jorgecarleitao jorgecarleitao commented Oct 28, 2020

This PR splits the array.rs into multiple modules, so that it is a bit easier to navigate in the project, particularly array.rs. Semantically, there should be no difference, and this change is also backward compatible, public items are only exposed in mod.rs.

This change was initially proposed by @paddyhoran on ARROW-9361, and this is just a proposed implementation.

Some notes:

  • Each commit is a different array
  • I named the files array_[type] to distinguish from other modules that do not contain arrays.
  • null and Union were not renamed
  • Some functions were moved around / incorporated in impl since they were used in multiple places.

@github-actions
Copy link

@andygrove
Copy link
Member

Moving the arrays to separate files is nice. I see there are some functionality changes in the PR too. Were these necessary as part of the refactoring or is this separate?

@jorgecarleitao
Copy link
Member Author

The reason is that there was a private function (slice_data) in array.rs that was used on two different array types. I moved that function to be part of the ArrayData, ArrayData::slice, so that it can be used in multiple places (and is really a function that literally slices ArrayData). This is change 1.

Since I made slice pub, I also added a unit test to it (change 2). I also noticed that there was some code duplication between that function (previously in array.rs, now in data.rs), and thus pulled that duplicated code to a separate function (count_nulls) for DRY (change 3). Theses changes implied that the pub(crate) data from ArrayData was no longer needed to be pub at all, and thus I just privatized them (this is not a public change, so all backward compatible), change 4. All these changes are done in the first commit only.

Neither the DRY, unit-testing nor privatization was necessary as part of this PR. I can revert them if you think we should do this cleanup in a separate PR.

@andygrove
Copy link
Member

Thanks. I just wanted to understand which parts needed review. I don't think there is a need to split into separate PRs.

@sunchao
Copy link
Member

sunchao commented Oct 31, 2020

@jorgecarleitao since this is a large PR, could you put more details in the PR description? e.g., what changes are being proposed for this PR and what's the reason behind it. Thanks.

@jorgecarleitao
Copy link
Member Author

@sunchao , makes sense. I updated the PR's description with more details. Some of the rational was initially presented on Jira itself, here I just took the issue and implemented it.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR commit by commit

I carefully reviewed the two commits that had obvious logic consolidations (d65a9c8 and 2fac5c9)

For the remaining commits that appeared to just be moving code around, I spot checked them and they looked good but I didn't verify line by line that the content was the same.

All in all, looks like a great step forward in code reorganization to me
👍

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy that most of the change is reorganising modules.
There's a change here that's also in #8590, and because of the reorganisation, we're going to get a lot of merge conflicts on other PRs.

@jorgecarleitao I'd like to also merge #8590, what order would you prefer?
Also, this has a merge conflict, may you please kindly rebase when you can.

@alamb
Copy link
Contributor

alamb commented Nov 7, 2020

I also agree this PR is likely to cause a bunch of conflicts, so the less time it is left outstanding the better.

@nevi-me nevi-me added the needs-rebase A PR that needs to be rebased by the author label Nov 7, 2020
Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Thanks @jorgecarleitao

@jorgecarleitao
Copy link
Member Author

@alamb and @nevi-me , thanks a lot for taking the time to review this.

I have rebased this against master. I am fine with either order, rebasing this one is tedious but straightforward work as all changes are continuous blocks of code.

@jorgecarleitao jorgecarleitao removed the needs-rebase A PR that needs to be rebased by the author label Nov 7, 2020
@nevi-me nevi-me closed this in e7d56ee Nov 10, 2020
@jorgecarleitao jorgecarleitao deleted the clean_slice branch November 10, 2020 06:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants