Describe the issue: HashingVectorizer behaves differently from FeatureHasher. HashingVectorizer can work directly on plain strings like
JUNK_FOOD_DOCS = (
"the pizza pizza beer copyright",
"the pizza burger beer copyright",
"the the pizza beer beer copyright",
"the burger beer beer copyright",
"the coke burger coke copyright",
"the coke burger burger",
)
but FeatureHasher expects each sample to be an iterable of strings (tokens), like:
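To make the two input shapes concrete, here is a minimal sketch using the scikit-learn classes of the same names. It assumes the dask-ml versions take the same kinds of input, which is not confirmed here. The value `n_features=16` and the name `tokenized` are illustrative choices, not taken from the test suite.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import HashingVectorizer

JUNK_FOOD_DOCS = (
    "the pizza pizza beer copyright",
    "the pizza burger beer copyright",
    "the the pizza beer beer copyright",
    "the burger beer beer copyright",
    "the coke burger coke copyright",
    "the coke burger burger",
)

# HashingVectorizer tokenizes internally, so each sample is one raw string.
hv = HashingVectorizer(n_features=16)  # n_features is an arbitrary small value
X_hv = hv.transform(JUNK_FOOD_DOCS)

# FeatureHasher does no tokenizing: with input_type="string", each sample
# must already be an iterable of string tokens, so split each document first.
tokenized = [doc.split() for doc in JUNK_FOOD_DOCS]
fh = FeatureHasher(n_features=16, input_type="string")
X_fh = fh.transform(tokenized)

print(X_hv.shape, X_fh.shape)
```

Splitting on whitespace is only a stand-in for a real tokenizer. HashingVectorizer also lowercases, filters tokens, and normalizes rows by default, so the two matrices need not be identical.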
* text matrix
* splitting the string creates the expected input to FeatureHasher #964
* FeatureHasher issue #963
* addressing categories_ type mismatch when auto by explicitly setting dtype on test data to object #964
* reverted to just ubuntu for time saving
Which is the correct behavior?
Minimal Complete Verifiable Example:
See dask-ml/tests/feature_extraction/test_text.py: test_basic()
Anything else we need to know?:
This is illustrated in the failing test.
Environment: