-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Id column converter #63
base: master
Are you sure you want to change the base?
Conversation
@invisiblefunnel could you take a look at that? |
@TheFrok yes thanks for the reminder. I made some changes in the master branch, can you rebase this branch onto master? I will try to make time to add comments this week. |
for column_pair in dependencies: | ||
if check_column_pair(column_pair): | ||
continue | ||
warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why produce a warning here as opposed to raising an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that it might be intentional, for example int8 and int16, or something like that.
partridge/readers.py
Outdated
@@ -1,20 +1,18 @@ | |||
from collections import defaultdict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep imports in alphabetical order. It looks like the only import related change in this file should be the removal of from .utilities import remove_node_attributes
.
tests/test_feed.py
Outdated
@@ -1,4 +1,6 @@ | |||
import datetime | |||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please group the pandas import with the pytest import according to these guidelines: https://www.python.org/dev/peps/pep-0008/#imports.
@@ -34,9 +35,13 @@ def __init__( | |||
self._locks: Dict[str, RLock] = {} | |||
if isinstance(source, self.__class__): | |||
self._read = source.get | |||
self._proxy_feed = bool(self._view) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the value of _proxy_feed
not to depend on whether in feed is initialized from a path or another feed object. Is that possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it could probably just be bool(self.view)
out side the of if block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that and it didn't work.
Do you prefer passing proxy as a parameter to feed.init ?
self._convert_types(filename, df) | ||
df = df.reset_index(drop=True) | ||
df = self._transform(filename, df) | ||
if self._proxy_feed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the choice should be to filter+prune OR convert+transform. Can you tell me a bit about how you are thinking about this behavior? I will need to think through the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see it for each table you filter you create a feed, and each feed is the source of the next one. except for the last layer the feeds (the proxy feeds) are only responsible for filtering the table according to the filter and the already filtered table (pruning). That's why those proxy feed you only need to filter and prune, before my change you did that by removing the transform and convert data from the configuration.
The last feed layer doesn't need to deal with the pruning and filtering since it doesn't even get a view as a parameter, and the lower level feeds are doing the pruning already.
Tell me if I missed something
and converts the dependent column before pruning
since after converting one column for `trips.txt` for example comparing other columns to that would be impossible without access to their convert configuration
it with a `is_dummy` attribute in the Feed class that prevent transforms and dtype_conversion
f63d00a
to
56bae87
Compare
Added a function to detect problematic converters (maybe should be somewhere else?)
and converts the dependent column before pruning
deleted
config_
var in_load_feed
(so all feed would have access to the configuration) and replaced it with a_proxy_feed
attribute in the Feed class that prevent transforms and dtype_conversion when they are not needed