Id column converter #63

TheFrok · 2020-02-18T08:07:36Z

Added a function that detects problematic converters (maybe it belongs somewhere else?) and converts the dependent column before pruning.

Deleted the config_ variable in _load_feed, so that every feed has access to the configuration, and replaced it with a _proxy_feed attribute on the Feed class that prevents transforms and dtype conversion when they are not needed.
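To illustrate why the dependent column needs the same converter before pruning, here is a minimal sketch in plain Python. It is not partridge's code: the table contents and the `to_int` converter are made up for illustration. Once `trip_id` is converted in one table but not in the table that depends on it, membership tests silently match nothing.

```python
# Hypothetical raw values as read from two GTFS files (everything starts as str).
trips_trip_ids = ["1", "2", "3"]
stop_times_trip_ids = ["1", "1", "2", "4"]

# Illustrative converter, applied to trips.txt only.
to_int = int
converted_trips = {to_int(v) for v in trips_trip_ids}

# Pruning stop_times against the converted trips without converting stop_times:
# a str never equals an int, so every row is dropped.
kept_unconverted = [v for v in stop_times_trip_ids if v in converted_trips]

# Converting the dependent column with the same converter first keeps the matches.
kept_converted = [
    to_int(v) for v in stop_times_trip_ids if to_int(v) in converted_trips
]

print(kept_unconverted)  # []
print(kept_converted)    # [1, 1, 2]
```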

TheFrok · 2020-03-01T15:17:01Z

@invisiblefunnel could you take a look at that?
I can't add a reviewer for some reason

invisiblefunnel · 2020-03-02T00:27:53Z

@TheFrok yes thanks for the reminder. I made some changes in the master branch, can you rebase this branch onto master? I will try to make time to add comments this week.

invisiblefunnel · 2020-03-02T00:30:57Z

partridge/gtfs.py

+            for column_pair in dependencies:
+                if check_column_pair(column_pair):
+                    continue
+                warn(


Why produce a warning here as opposed to raising an exception?

TheFrok · 2020-03-06

I thought that it might be intentional, for example int8 and int16, or something like that.
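For readers following the thread, the warn-instead-of-raise behavior could look roughly like the sketch below. This is an assumption-laden illustration, not the code in this PR: the two-argument `check_column_pair` signature, the shape of `dependencies` and `converters`, and the helper `warn_on_mismatched_converters` are all hypothetical.

```python
import warnings
from typing import Callable, Dict, Iterable, Tuple

# (filename, column) identifies one column; a pair links two dependent columns.
Column = Tuple[str, str]
Converters = Dict[str, Dict[str, Callable]]


def check_column_pair(pair: Tuple[Column, Column], converters: Converters) -> bool:
    """Return True when both columns of a dependency use the same converter."""
    (file_a, col_a), (file_b, col_b) = pair
    conv_a = converters.get(file_a, {}).get(col_a)
    conv_b = converters.get(file_b, {}).get(col_b)
    return conv_a is conv_b


def warn_on_mismatched_converters(
    dependencies: Iterable[Tuple[Column, Column]], converters: Converters
) -> None:
    """Warn (rather than raise) for every dependency with differing converters."""
    for pair in dependencies:
        if check_column_pair(pair, converters):
            continue
        (file_a, col_a), (file_b, col_b) = pair
        warnings.warn(
            f"{file_a}:{col_a} and {file_b}:{col_b} use different converters; "
            "pruning may compare mismatched values"
        )


# Usage: trips.txt converts trip_id, stop_times.txt does not.
converters = {"trips.txt": {"trip_id": int}, "stop_times.txt": {}}
dependencies = [(("trips.txt", "trip_id"), ("stop_times.txt", "trip_id"))]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_mismatched_converters(dependencies, converters)
print(len(caught))  # 1
```

A warning lets a deliberate mismatch through, while an exception would force every dependent pair to share a converter; which is preferable is exactly the question raised above.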

invisiblefunnel · 2020-03-02T00:36:36Z

partridge/readers.py

@@ -1,20 +1,18 @@
-from collections import defaultdict


Please keep imports in alphabetical order. It looks like the only import-related change in this file should be the removal of `from .utilities import remove_node_attributes`.

invisiblefunnel · 2020-03-02T00:37:54Z

tests/test_feed.py

@@ -1,4 +1,6 @@
 import datetime
+import pandas as pd


Please group the pandas import with the pytest import according to these guidelines: https://www.python.org/dev/peps/pep-0008/#imports.


invisiblefunnel · 2020-03-02T00:40:50Z

partridge/gtfs.py

@@ -34,9 +35,13 @@ def __init__(
        self._locks: Dict[str, RLock] = {}
        if isinstance(source, self.__class__):
            self._read = source.get
+            self._proxy_feed = bool(self._view)


I would prefer the value of _proxy_feed not to depend on whether the feed is initialized from a path or from another feed object. Is that possible?

TheFrok · 2020-03-06

Yes, it could probably just be bool(self._view) outside of the if block.

TheFrok · 2020-03-06

I tried that and it didn't work.
Do you prefer passing proxy as a parameter to Feed.__init__?

invisiblefunnel · 2020-03-02T00:42:46Z

partridge/gtfs.py

-                self._convert_types(filename, df)
-                df = df.reset_index(drop=True)
-                df = self._transform(filename, df)
+                if self._proxy_feed:


I'm not sure the choice should be to filter+prune OR convert+transform. Can you tell me a bit about how you are thinking about this behavior? I will need to think through the logic.

TheFrok · 2020-03-06

As I see it, each table you filter creates a feed, and each feed is the source of the next one. Except for the last layer, the feeds (the proxy feeds) are only responsible for filtering their table according to the filter and to the already filtered tables (pruning). That's why the proxy feeds only need to filter and prune; before my change you did that by removing the transform and convert data from the configuration.
The last feed layer doesn't need to deal with pruning and filtering, since it doesn't even get a view as a parameter and the lower-level feeds already do the pruning.

Tell me if I missed something
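The layering described above can be sketched with a toy class. This is only an illustration under stated assumptions, not partridge's Feed: `ToyFeed`, the `DEPENDENCIES` map, and the sample tables are hypothetical, and a single proxy layer stands in for the one-layer-per-filtered-table chain.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

Row = Dict[str, object]
Table = List[Row]

# Hypothetical dependency map: child file -> (parent file, shared column).
DEPENDENCIES: Dict[str, Tuple[str, str]] = {
    "stop_times.txt": ("trips.txt", "trip_id"),
}


class ToyFeed:
    """Toy stand-in for a layered feed; not partridge's Feed class."""

    def __init__(
        self,
        source: Union[Dict[str, Table], "ToyFeed"],
        view: Optional[Dict[str, Dict[str, Set[str]]]] = None,
        converters: Optional[Dict[str, Dict[str, Callable]]] = None,
    ) -> None:
        self._source = source
        self._view = view or {}
        self._converters = converters or {}
        # A layer with a view only filters and prunes; the last layer has no view.
        self._proxy_feed = bool(self._view)

    def get(self, filename: str) -> Table:
        if isinstance(self._source, ToyFeed):
            rows = self._source.get(filename)
        else:
            rows = self._source[filename]
        if self._proxy_feed:
            return self._prune(filename, self._filter(filename, rows))
        return self._convert(filename, rows)

    def _filter(self, filename: str, rows: Table) -> Table:
        allowed = self._view.get(filename, {})
        return [r for r in rows if all(r[c] in vals for c, vals in allowed.items())]

    def _prune(self, filename: str, rows: Table) -> Table:
        if filename not in DEPENDENCIES:
            return rows
        parent, column = DEPENDENCIES[filename]
        keep = {r[column] for r in self.get(parent)}
        return [r for r in rows if r[column] in keep]

    def _convert(self, filename: str, rows: Table) -> Table:
        convs = self._converters.get(filename, {})
        return [
            {c: convs[c](v) if c in convs else v for c, v in r.items()}
            for r in rows
        ]


raw = {
    "trips.txt": [
        {"trip_id": "1", "route_id": "A"},
        {"trip_id": "2", "route_id": "B"},
    ],
    "stop_times.txt": [
        {"trip_id": "1", "stop_sequence": "1"},
        {"trip_id": "2", "stop_sequence": "1"},
    ],
}
proxy = ToyFeed(raw, view={"trips.txt": {"route_id": {"A"}}})
final = ToyFeed(proxy, converters={"stop_times.txt": {"stop_sequence": int}})
print(final.get("stop_times.txt"))  # [{'trip_id': '1', 'stop_sequence': 1}]
```

In this toy version the proxy layer compares raw, unconverted values while pruning, and only the last layer converts, which sidesteps the mismatch problem rather than solving it the way this PR does.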

TheFrok added 9 commits March 10, 2020 18:09

7b6ac45 Added a function to detect problematic converters, and converts the dependent column before pruning
57b095c Added a test and fixed the warning message
410e1c6 Black compatible and documentation improvements
be5f265 left the converters attribute in the dummy feed in _load_feed, since after converting one column for `trips.txt` for example comparing other columns to that would be impossible without access to their convert configuration
9186b3f deleted config_ var in _load_feed and replaced it with a `is_dummy` attribute in the Feed class that prevent transforms and dtype_conversion
a23b27c replaced dummy parameter with auto calculated proxy attribute
2a138bc moved config method to run only once
705053d removed unused import
56bae87 fixed import order

TheFrok force-pushed the id-column-converter branch from f63d00a to 56bae87 Compare March 10, 2020 16:10
