This repository has been archived by the owner on May 21, 2024. It is now read-only.

Walk through the "pachyderm user" onboarding track, and make it work with python-pachyderm #256

Open
msteffen opened this issue May 3, 2021 · 1 comment

@msteffen
Contributor

msteffen commented May 3, 2021

General feedback from users has been that python-pachyderm is hard to use. For example, we've heard "there's no way to parse a Pachyderm file as a CSV or a pandas DataFrame", but our `get_file` implementation supports the Python iterator interface, so it's an open question (at least to me) why this doesn't work. Do we support it incorrectly? Are the functions confusingly named or hard to find?
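One possible explanation for the CSV complaint (an assumption, not something this thread confirms): `pd.read_csv()` expects a file-like object with a `read()` method, so an object that *only* supports iterating over byte chunks would be rejected. The sketch below bridges that gap with the standard library alone. `ChunkReader` and `rows_from_chunks` are hypothetical names for illustration, not part of python-pachyderm.

```python
import csv
import io
from typing import Iterable, Iterator, List


class ChunkReader(io.RawIOBase):
    """Hypothetical adapter: exposes an iterator of byte chunks as a readable stream."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks: Iterator[bytes] = iter(chunks)
        self._leftover = b""

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Pull chunks until there is data to hand out or the iterator is exhausted.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(buf), len(self._leftover))
        buf[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def rows_from_chunks(chunks: Iterable[bytes], encoding: str = "utf-8") -> List[List[str]]:
    # BufferedReader adds efficient read()/readline(); TextIOWrapper decodes bytes
    # incrementally, so multi-byte characters split across chunks are handled.
    text = io.TextIOWrapper(io.BufferedReader(ChunkReader(chunks)), encoding=encoding, newline="")
    return list(csv.reader(text))
```

Because the wrapped `TextIOWrapper` provides both `read()` and iteration, the same stream should also be accepted by `pd.read_csv()`; this sketch has not been tried against a real Pachyderm cluster.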

Walking through the "pachyderm user" onboarding track, which uses `pachctl`, and making it equally doable with python-pachyderm should be a big step forward for python-pachyderm's usability.

@albscui albscui self-assigned this May 7, 2021
@albscui

albscui commented May 7, 2021

Hmm, I'm also curious why loading into pandas doesn't seem to work. I uploaded a sample CSV file to a local Pachyderm cluster and was able to `get_file()` it and load it into a pandas DataFrame.

An example workflow:

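```python
import python_pachyderm as pachyderm
import pandas as pd

client = pachyderm.Client()  # connect to a local cluster using the default settings
# The commit is given as a (repo, commit-or-branch) tuple, followed by the file path
pf = client.get_file(('REPO', 'COMMIT_OR_BRANCH'), 'CSV_FILE')
df = pd.read_csv(pf)  # this works!
```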

This works because pandas' `read_csv()` accepts any file-like object with a `read()` method, and `PFSFile`, the object returned by `get_file()`, has a `read()` method.
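To check this duck-typing argument without a running cluster, the sketch below feeds `pd.read_csv()` a hypothetical `FakePFSFile` wrapping an in-memory CSV. It is not python-pachyderm's `PFSFile`; it only mimics the methods that matter here. pandas' own file-like check (`pandas.api.types.is_file_like`) looks for `__iter__` in addition to `read()`, so the stand-in defines both.

```python
import io

import pandas as pd


class FakePFSFile:
    """Hypothetical stand-in for PFSFile: just read() and __iter__, no cluster needed."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        # pandas calls read(n) repeatedly; returning bytes is fine for the C parser.
        return self._buf.read(size)

    def __iter__(self):
        # pandas' is_file_like() requires __iter__ as well as read()/write().
        return iter(self._buf)


pf = FakePFSFile(b"name,qty\napple,3\nbanana,5\n")
df = pd.read_csv(pf)
print(df)
```

A plain iterator of chunks (for example, a generator) has `__iter__` but no `read()`, so `is_file_like` returns `False` for it and `read_csv()` should refuse it.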
