
Conversation

@jorgecarleitao
Member

@jorgecarleitao jorgecarleitao commented Feb 14, 2021

The background and rationale for this are described here; the idea is that this is groundwork to make our buffers typed, so that we can start introducing strong typing in the crate.

This change is backward incompatible:

  1. Our allocator is now generic over a type `T: NativeType`, which means that we can now allocate memory for specific types.
  2. The allocator moved from `memory` to a new module, `alloc` (inspired by `std::alloc`).

Necessary steps to migrate existing code:

  1. `use arrow::memory` -> `use arrow::alloc`
  2. `memory::allocate_aligned(...)` -> `alloc::allocate_aligned::<u8>(...)`

Note how `NativeType` contains `to_le_bytes`; we will use this method for IPC, where we need to serialize buffers with a specific endianness. This is groundwork to enable support for multiple endiannesses.
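As a rough illustration, here is a minimal, self-contained sketch of the shape this describes — hypothetical names, not the crate's actual definitions, and ignoring details such as cache-line alignment and allocation-failure handling that the real allocator deals with:

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical stand-in for `NativeType`: a fixed-size primitive that can
/// serialize itself to little-endian bytes (the hook later used for IPC).
pub trait NativeType: Copy {
    type Bytes: AsRef<[u8]>;
    fn to_le_bytes(&self) -> Self::Bytes;
}

impl NativeType for u8 {
    type Bytes = [u8; 1];
    fn to_le_bytes(&self) -> Self::Bytes {
        u8::to_le_bytes(*self)
    }
}

impl NativeType for i32 {
    type Bytes = [u8; 4];
    fn to_le_bytes(&self) -> Self::Bytes {
        i32::to_le_bytes(*self)
    }
}

/// Allocate space for `len` values of `T` (len > 0), aligned for `T`.
/// The caller must free the pointer with `deallocate_aligned::<T>(ptr, len)`.
pub fn allocate_aligned<T: NativeType>(len: usize) -> *mut T {
    let layout = Layout::array::<T>(len).expect("allocation size overflows usize");
    unsafe { alloc(layout) as *mut T }
}

/// Free a pointer previously returned by `allocate_aligned::<T>(len)`.
pub unsafe fn deallocate_aligned<T: NativeType>(ptr: *mut T, len: usize) {
    let layout = Layout::array::<T>(len).unwrap();
    dealloc(ptr as *mut u8, layout);
}

fn main() {
    // Migration, roughly: `memory::allocate_aligned(64)` becomes
    // `alloc::allocate_aligned::<u8>(64)`.
    let ptr: *mut u8 = allocate_aligned::<u8>(64);
    unsafe { deallocate_aligned(ptr, 64) };
}
```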

@jorgecarleitao jorgecarleitao changed the title [Rust] Refactored allocator to be a generic over type T ARROW-11627: [Rust] Make allocator be a generic over type T Feb 14, 2021

@apache apache deleted a comment from github-actions bot Feb 14, 2021
@jorgecarleitao
Member Author

jorgecarleitao commented Feb 14, 2021

cc: @sunchao @nevi-me @pitrou @andygrove @alamb, this is a core change to the arrow crate, as it will enable us to allocate memory regions without losing the type information about what they contain. I would appreciate knowing if anyone sees an issue here.

The general direction is to stop transmuting byte buffers, which is unsafe and has caused us too many bugs, and instead use Rust's type system on them.

AFAIK, this will also enable us to support multiple endiannesses, but we are some PRs away from that. For now, we just stick to little endian (AFAIK we do not support big endian atm), and at least we make it explicit.
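To make the direction concrete, here is a small self-contained sketch (hypothetical code, not taken from the crate) contrasting the transmute-style access being moved away from with a buffer that keeps its element type:

```rust
/// Old pattern: an untyped byte buffer reinterpreted at the call site.
/// This is unsafe and wrong if the bytes are misaligned or the length is
/// not a multiple of 4 — the class of bug mentioned above.
fn sum_reinterpreted(bytes: &[u8]) -> i32 {
    let values: &[i32] = unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const i32, bytes.len() / 4)
    };
    values.iter().sum()
}

/// Typed direction: the buffer carries its element type, so the compiler
/// checks it and no reinterpretation is needed.
struct TypedBuffer<T> {
    values: Vec<T>,
}

fn sum_typed(buffer: &TypedBuffer<i32>) -> i32 {
    buffer.values.iter().sum()
}
```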

cc @maxburke, since you depend on the `memory` module of this crate.

@codecov-io

Codecov Report

Merging #9495 (a15c5b1) into master (8547c61) will decrease coverage by 0.06%.
The diff coverage is 51.45%.


@@            Coverage Diff             @@
##           master    #9495      +/-   ##
==========================================
- Coverage   82.12%   82.05%   -0.07%     
==========================================
  Files         235      236       +1     
  Lines       54729    54774      +45     
==========================================
  Hits        44944    44944              
- Misses       9785     9830      +45     
Impacted Files                                    Coverage Δ
rust/arrow/src/alloc/types.rs                     0.00% <0.00%> (ø)
rust/arrow/src/alloc/mod.rs                       92.68% <92.68%> (ø)
rust/arrow/src/array/array_list.rs                93.40% <100.00%> (ø)
rust/arrow/src/array/raw_pointer.rs               100.00% <100.00%> (ø)
rust/arrow/src/buffer.rs                          95.62% <100.00%> (+0.35%) ⬆️
rust/arrow/src/bytes.rs                           53.12% <100.00%> (ø)
rust/parquet/src/encodings/encoding.rs            94.86% <0.00%> (-0.20%) ⬇️
rust/arrow/src/array/transform/fixed_binary.rs    84.21% <0.00%> (+5.26%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8547c61...8ac48f4.

Contributor

@alamb alamb left a comment

I went through this carefully and it looks like a significant improvement to me. I would not call myself an expert in this area (low level Rust allocators) but the changes make sense to me.

I vote :shipit:. @nevi-me or @vertexclique, do you have any thoughts about this PR?

Contributor

FWIW I know you didn't introduce this comment in this PR (it just moved) but I find this comment more confusing than enlightening.

Contributor

I've written this before, which part is confusing? Let's rephrase.

Contributor

I think I am overwhelmed by the detail before I get the context of what the information is for.

Maybe if the comment started with something like

// Pick the best memory alignment based on the target architecture.
// 
// The rationale for doing so is ....

Contributor

Suggested change
- /// * ptr must denote a block of memory currently allocated via this allocator,
+ /// * ptr must denote a block of memory previously returned from `allocate_aligned` or
+ /// `allocate_aligned_zeroed`

Contributor

Suggested change
- /// * ptr must be currently allocated via this allocator,
+ /// ptr must denote a block of memory previously returned from `allocate_aligned` or
+ /// `allocate_aligned_zeroed`

Contributor

If your allocation size is even anywhere near overflowing a 64-bit usize you are likely going to have problems 😆

Member Author

I agree. I do think that this comment applies mostly to 32-bit systems, where usize = u32.
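For reference, a small sketch of how the overflow case can be caught with the standard library alone (not necessarily what the PR does): `Layout::array` reports overflow of `len * size_of::<T>()` instead of wrapping, which is the scenario most relevant on 32-bit targets where `usize` is 32 bits:

```rust
use std::alloc::Layout;

/// Compute the layout for `len` elements of `T`, returning `None` instead of
/// wrapping if the total size overflows.
fn layout_for<T>(len: usize) -> Option<Layout> {
    Layout::array::<T>(len).ok()
}

fn main() {
    assert!(layout_for::<u64>(1024).is_some());
    // usize::MAX elements of 8 bytes each cannot be represented.
    assert!(layout_for::<u64>(usize::MAX).is_none());
}
```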

Contributor

👍

Contributor

Suggested change
- /// Whether a DataType is a valid type for this physical representation.
+ /// Whether an array element of `data_type` can be stored using this `NativeType` as
+ /// its physical representation. For example, `i32` stores `DataType::Int32` as well as
+ /// `DataType::Date32`
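A hedged illustration of what the suggested wording describes (hypothetical types and method, not the crate's actual API): one 32-bit physical type can back more than one logical type.

```rust
/// Hypothetical logical data types, mirroring the example in the suggestion.
enum DataType {
    Int32,
    Date32,
    Int64,
}

/// Hypothetical check: can an array of `data_type` use this Rust type as its
/// physical (in-memory) representation?
trait PhysicalRepr {
    fn is_valid(data_type: &DataType) -> bool;
}

impl PhysicalRepr for i32 {
    fn is_valid(data_type: &DataType) -> bool {
        // `i32` stores `DataType::Int32` as well as `DataType::Date32`:
        // both are 32-bit values at the physical level.
        matches!(data_type, DataType::Int32 | DataType::Date32)
    }
}

fn main() {
    assert!(<i32 as PhysicalRepr>::is_valid(&DataType::Date32));
    assert!(!<i32 as PhysicalRepr>::is_valid(&DataType::Int64));
}
```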

@alamb
Contributor

alamb commented Feb 16, 2021

Any thoughts @nevi-me or @maxburke ?

@alamb
Contributor

alamb commented Feb 18, 2021

I plan to merge this on the weekend unless I hear any comments otherwise

@alamb
Contributor

alamb commented Feb 20, 2021

@jorgecarleitao this needs a rebase :(

But then I think we'll (finally) get it in. Woohoo!

Contributor

@vertexclique vertexclique left a comment

The atomics need some improvement, and there is no need for SeqCst, since ordering doesn't matter while test threads are still running; what is needed here is atomicity.

In addition to all this, there is an existing zz_memory_check.rs. That code will fail, because the ordering is not related to thread access. So I have refactored it for you:

#[cfg(feature = "memory-check")]
mod tests {
    use crate::alloc::ALLOCATIONS;
    use std::sync::atomic::*;

    // verify that there is no data un-allocated
    #[test]
    fn test_memory_check() {
        let mut i: usize = 0;
        unsafe {
            while ALLOCATIONS.load(Ordering::Relaxed) != 0 {
                i+=1;
                spin_loop_hint();
                
                // Fair __mm_pause can be 10m.
                assert!(i < 10_000_000);
            }
        }
    }
}

Comment on lines +34 to +35
Contributor

Suggested change
  // If this number is not zero after all objects have been `drop`, there is a memory leak
- pub static mut ALLOCATIONS: AtomicIsize = AtomicIsize::new(0);
+ #[cfg(feature = "memory-check")]
+ pub static mut ALLOCATIONS: AtomicIsize = AtomicIsize::new(0);

Contributor

Suggested change
- ALLOCATIONS.fetch_add(size as isize, std::sync::atomic::Ordering::SeqCst);
+ #[cfg(feature = "memory-check")]
+ ALLOCATIONS.fetch_add(size as isize, Ordering::Relaxed);

Contributor

Suggested change
- ALLOCATIONS.fetch_add(size as isize, std::sync::atomic::Ordering::SeqCst);
+ #[cfg(feature = "memory-check")]
+ ALLOCATIONS.fetch_add(size as isize, Ordering::Relaxed);

Contributor

Suggested change
- ALLOCATIONS.fetch_sub(size as isize, std::sync::atomic::Ordering::SeqCst);
+ #[cfg(feature = "memory-check")]
+ ALLOCATIONS.fetch_sub(size as isize, Ordering::Relaxed);

Comment on lines +123 to +127
Contributor

Suggested change
- ALLOCATIONS.fetch_add(
-     new_size as isize - old_size as isize,
-     std::sync::atomic::Ordering::SeqCst,
- );
+ #[cfg(feature = "memory-check")]
+ ALLOCATIONS.fetch_add(
+     new_size.checked_sub(old_size).unwrap() as isize,
+     Ordering::Relaxed,
+ );
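Taken together, the suggestions above amount to a pattern like the following (a self-contained sketch, not the PR's exact code, with the `#[cfg(feature = "memory-check")]` gating omitted): only atomicity of the counter is needed, so `Relaxed` ordering suffices for the updates.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Net bytes currently allocated; if this is non-zero after all buffers have
// been dropped, something leaked. (Illustrative, not the crate's actual static.)
static ALLOCATIONS: AtomicIsize = AtomicIsize::new(0);

fn on_allocate(size: usize) {
    // Relaxed is enough: each update only needs to be atomic, not ordered
    // with respect to surrounding memory operations.
    ALLOCATIONS.fetch_add(size as isize, Ordering::Relaxed);
}

fn on_deallocate(size: usize) {
    ALLOCATIONS.fetch_sub(size as isize, Ordering::Relaxed);
}

fn on_reallocate(old_size: usize, new_size: usize) {
    // A signed delta handles both growing and shrinking reallocations.
    ALLOCATIONS.fetch_add(new_size as isize - old_size as isize, Ordering::Relaxed);
}

fn main() {
    on_allocate(64);
    on_reallocate(64, 128);
    on_deallocate(128);
    assert_eq!(ALLOCATIONS.load(Ordering::Relaxed), 0);
}
```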

@vertexclique
Contributor

@alamb Please don't get it in, it will slow down allocations with MOESI communication between cores.

Comment on lines +23 to +26
Contributor

Suggested change
- use std::{
-     alloc::{handle_alloc_error, Layout},
-     sync::atomic::AtomicIsize,
- };
+ use std::alloc::{handle_alloc_error, Layout};
+ use std::sync::atomic::*;

Contributor

It would be better to add the checked_sub check written below for memory-check; also, go here.

Comment on lines +40 to +43
Contributor

TIOLI (take it or leave it): adding to_ne_bytes and to_be_bytes would be cool. Why not :)

@jorgecarleitao
Member Author

@vertexclique, thanks a lot for going through this and for your comments.

For my own understanding, could you give some insight into the ordering comments and the spin_loop_hint?

Note that this PR does not modify the ALLOCATIONS bit. That is the same as it was before. It was added in 2.0 in the context of FFI; I measured no performance difference (maybe 2-3%).

The change set of this PR is essentially bf1a7eb

@vertexclique
Contributor

> For my own understanding, could you give some insight into the ordering comments and the spin_loop_hint?

The loop hint is for waiting on the CPU without OS intervention until the expected value is reached in this test.

> Note that this PR does not modify the ALLOCATIONS bit. That is the same as it was before. It was added in 2.0 in the context of FFI; I measured no performance difference (maybe 2-3%).

> The change set of this PR is essentially bf1a7eb

I've only just seen this; probably I wasn't a reviewer or I simply missed the review cycle.
I don't think the difference can be seen in test execution, but it can be seen in the normal execution of the code.
Moreover, in the actual release code it is not needed and can be filtered out.

Contributor

@alamb alamb left a comment

For what it is worth, here is a pointer to the ALLOCATIONS code that was moved (aka confirming that the behavior has not changed):

https://github.com/apache/arrow/pull/9495/files#diff-d2c73184ac4d2feb55c895ed48bc8af4433d309276de1fd9bb0aa21fd9588f14L227-L230

@vertexclique I suggest we get this PR in and then update/optimize the use of ALLOCATIONS as a follow-on PR. I think @jorgecarleitao has many other PRs/plans waiting on this PR, so we should not delay it unnecessarily.

@nevi-me nevi-me added the needs-rebase label Feb 24, 2021
@vertexclique
Contributor

vertexclique commented Feb 24, 2021

@alamb @jorgecarleitao @nevi-me Yeah, I am totally ok merging it like this. We can keep track of the suggestions in a Jira ticket. :)

@vertexclique
Contributor

FYI: I've assigned these improvements to myself. Truly let's not block each other: https://issues.apache.org/jira/browse/ARROW-11760 @alamb @jorgecarleitao

@nevi-me
Contributor

nevi-me commented Feb 24, 2021

I've been away for over a week due to personal issues.
When Jorge started work on this, we had discussed the need for Bytes or Buffer to know its byte width, so that we could correct our zero-copy slicing/offsets. I had taken a different approach from this PR (directly tracking the byte_size in the Buffer struct), but because I made a lot of changes to the code base, it was difficult for me to reconcile the changes.

This PR seems to deal more with the allocator, and not yet necessarily with Buffer. I'll try to see what impact it has on my (draft) work when it's merged. [EDIT: this is done in https://github.com//pull/9496, which I'm now looking at]

> The change set of this PR is essentially bf1a7eb

I'm fine with getting this PR merged. I'm a bit overwhelmed by the detail, but looking at bf1a7eb has certainly helped me with reviewing the change.

@nevi-me nevi-me removed the needs-rebase label Feb 24, 2021
@nevi-me
Contributor

nevi-me commented Feb 24, 2021

I've rebased, and I'll merge this when CI is green

@alamb
Contributor

alamb commented Feb 24, 2021

Thanks @nevi-me

@alamb
Contributor

alamb commented Feb 24, 2021

> FYI: I've assigned these improvements to myself. Truly let's not block each other:

Thank you @vertexclique !

@nevi-me nevi-me closed this in b5ac048 Feb 25, 2021
alamb pushed a commit that referenced this pull request Feb 25, 2021
Polars uses the `arrow::memory` module. With the backwards incompatible change of #9495, the API is refactored to `arrow::alloc`.

By making `alloc` public, users can shift to the new changes.

Closes #9572 from ritchie46/make_alloc_public

Authored-by: Ritchie Vink <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>