Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

Allow hashing DataArrays with NA values #154

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alyst
Copy link
Member

@alyst alyst commented Jun 15, 2015

Without this DataFrames.nonunique() and friends do not work on frames with NA rows.

@johnmyleswhite
Copy link
Member

This seems like a good strategy. I'm confused why the old approach wouldn't have worked, though -- it seems like it should just be too slow.

@alyst
Copy link
Member Author

alyst commented Jun 15, 2015

For me on v0.4 the old code was throwing InexactError or something like that when NAs were hashed.

@johnmyleswhite
Copy link
Member

I'm going to hold off merging for a bit to give others a chance to review, but the CI failure seems unrelated and this seems good to go.

@alyst
Copy link
Member Author

alyst commented Jun 15, 2015

I've just thought it might be better to use findnext(BitVector) to skip NAs. I can resubmit an improved version.

@johnmyleswhite
Copy link
Member

Sounds good. I would do some profiling to make sure that it's worth the effort; for the almost no-NA case I imagine it will be meaningfully slower to use findnext.

@alyst
Copy link
Member Author

alyst commented Jun 15, 2015

OK, I've replaced the PR with the findnext() version. Both approaches should do more or less the same bit magic, so it's hard to say what would happen in the average "dense" case, but for "sparse" case findnext() should be faster.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants