Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added static typing to data_utils.py #662

Merged
merged 28 commits into from
Oct 18, 2022

Conversation

Sanketh7
Copy link
Contributor

@JGSweets
Copy link
Contributor

@Sanketh7 looks like the branch needs updating as well.

@taylorfturner taylorfturner added the static_typing mypy static typing issues label Sep 27, 2022
Copy link
Contributor

@taylorfturner taylorfturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments mainly around formatting and +1 to @JGSweets comment around not casting

dataprofiler/data_readers/data_utils.py Show resolved Hide resolved
dataprofiler/data_readers/data_utils.py Outdated Show resolved Hide resolved
dataprofiler/data_readers/data_utils.py Show resolved Hide resolved
dataprofiler/data_readers/data_utils.py Show resolved Hide resolved
JGSweets
JGSweets previously approved these changes Sep 27, 2022
@JGSweets JGSweets dismissed their stale review September 27, 2022 21:21

comments needed addressing

JGSweets
JGSweets previously approved these changes Sep 28, 2022
auto-merge was automatically disabled October 3, 2022 15:00

Head branch was pushed to by a user without write access

@taylorfturner taylorfturner enabled auto-merge (squash) October 3, 2022 15:42
Copy link
Contributor

@taylorfturner taylorfturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment around the dict isinstance

dataprofiler/data_readers/data_utils.py Outdated Show resolved Hide resolved
ksneab7
ksneab7 previously approved these changes Oct 14, 2022
dataprofiler/data_readers/data_utils.py Outdated Show resolved Hide resolved
micdavis
micdavis previously approved these changes Oct 18, 2022
@micdavis micdavis enabled auto-merge (squash) October 18, 2022 15:45
@@ -90,7 +105,7 @@ def unicode_to_str(data, ignore_dicts=False):
# if data is a dictionary
if isinstance(data, dict) and not ignore_dicts:
return {
unicode_to_str(key, ignore_dicts=True): unicode_to_str(
cast(str, unicode_to_str(key, ignore_dicts=True)): unicode_to_str(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no guarantee this is a string, this could be an int, right?

Copy link
Contributor Author

@Sanketh7 Sanketh7 Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the cast (not needed anymore after I changed JSONType).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome -- thx, @Sanketh7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it removed @Sanketh7 ? looks like it is still there ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed by 8de0cef

@@ -99,7 +114,11 @@ def unicode_to_str(data, ignore_dicts=False):
return data


def json_to_dataframe(json_lines, selected_columns=None, read_in_string=False):
def json_to_dataframe(
json_lines: List[JSONType],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should actually validate this, @taylorfturner can df = pd.DataFrame(json_lines) take in any list of JSON?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works assuming List[JSONType] but should it be Dict[str, List[JSONType]]?

from typing import Dict, List, Union
import pandas as pd

JSONType = Union[str, int, float, bool, None, List, Dict]

test_iter_list = []
test_iter_list.append(['test', 'test'])
test_iter_list.append([1,2,3])
test_iter_list.append([1.02, 2.02, 3.03])
test_iter_list.append([True, False])
test_iter_list.append([None, None])
test_iter_list.append([['test', 'test'], ['test', 'test']])
test_iter_list.append([{'test': 'test', 'test': 'test'}])


for test_iter in test_iter_list: 
    pd.DataFrame(test_iter)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@taylorfturner taylorfturner Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sanketh7, I think the docstring for this on L126 :type json_lines: list(dict) needs updating

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does work. So I think we are GTG
for test_iter in test_iter_list: json_to_dataframe(test_iter)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List[list] works

In [307]: data = data_generator("""[["test", "test"], ["test", "test"]]""".splitlines())
In [308]: read_json(data)
Out[308]: [[['test', 'test'], ['test', 'test']]]

auto-merge was automatically disabled October 18, 2022 16:13

Head branch was pushed to by a user without write access

@JGSweets JGSweets enabled auto-merge (squash) October 18, 2022 19:36
@taylorfturner taylorfturner dismissed their stale review October 18, 2022 19:43

dismissing my own

@JGSweets JGSweets merged commit e4e54b6 into capitalone:main Oct 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
High Priority Dramatic improvement, inaccurate calculation(s) or bug / feature making the library unusable static_typing mypy static typing issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants