Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Id column converter #63

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Conversation

TheFrok
Copy link

@TheFrok TheFrok commented Feb 18, 2020

Added a function to detect problematic converters (maybe should be somewhere else?)
and converts the dependent column before pruning

deleted config_ var in _load_feed (so all feed would have access to the configuration) and replaced it with a _proxy_feed attribute in the Feed class that prevent transforms and dtype_conversion when they are not needed

@TheFrok
Copy link
Author

TheFrok commented Mar 1, 2020

@invisiblefunnel could you take a look at that?
I can't add a reviewer for some reason

@invisiblefunnel
Copy link
Contributor

@TheFrok yes thanks for the reminder. I made some changes in the master branch, can you rebase this branch onto master? I will try to make time to add comments this week.

for column_pair in dependencies:
if check_column_pair(column_pair):
continue
warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why produce a warning here as opposed to raising an exception?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that it might be intentional, for example int8 and int16, or something like that.

@@ -1,20 +1,18 @@
from collections import defaultdict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep imports in alphabetical order. It looks like the only import related change in this file should be the removal of from .utilities import remove_node_attributes.

@@ -1,4 +1,6 @@
import datetime
import pandas as pd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please group the pandas import with the pytest import according to these guidelines: https://www.python.org/dev/peps/pep-0008/#imports.

@@ -34,9 +35,13 @@ def __init__(
self._locks: Dict[str, RLock] = {}
if isinstance(source, self.__class__):
self._read = source.get
self._proxy_feed = bool(self._view)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the value of _proxy_feed not to depend on whether in feed is initialized from a path or another feed object. Is that possible?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could probably just be bool(self.view) out side the of if block

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that and it didn't work.
Do you prefer passing proxy as a parameter to feed.init ?

self._convert_types(filename, df)
df = df.reset_index(drop=True)
df = self._transform(filename, df)
if self._proxy_feed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the choice should be to filter+prune OR convert+transform. Can you tell me a bit about how you are thinking about this behavior? I will need to think through the logic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it for each table you filter you create a feed, and each feed is the source of the next one. except for the last layer the feeds (the proxy feeds) are only responsible for filtering the table according to the filter and the already filtered table (pruning). That's why those proxy feed you only need to filter and prune, before my change you did that by removing the transform and convert data from the configuration.
The last feed layer doesn't need to deal with the pruning and filtering since it doesn't even get a view as a parameter, and the lower level feeds are doing the pruning already.

Tell me if I missed something

and converts the dependent column before pruning
since after converting one column for `trips.txt` for example comparing
other columns to that would be impossible without access to their convert
configuration
it with a `is_dummy` attribute in the Feed class that prevent transforms and dtype_conversion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants