10-minute tutorial with Koalas with a live notebook to try out #843

Closed
wants to merge 4 commits into from

HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Sep 29, 2019

This PR takes over #34 and #196

This PR also adds an integration with Binder, which provides a Docker-based Jupyter notebook built against the repository branch right away.

You can try it by clicking the Binder badge. The initial launch seems to take five minutes or more, because heavy dependencies such as PySpark and Java (which do not seem to be built in) have to be installed.

[Screenshot: Screen Shot 2019-09-29 at 3 50 26 PM]

As a bonus, pandoc seems to support converting Jupyter files to RST files, so I might be able to generate documentation from this file and add it to our official documentation (based on #842).

@HyukjinKwon
Member Author

I need to proofread this one more time. I will do so within a few days, but I would appreciate any feedback you have in the meantime.

@codecov-io

codecov-io commented Sep 29, 2019

Codecov Report

Merging #843 into master will increase coverage by 0.28%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #843      +/-   ##
==========================================
+ Coverage   94.17%   94.46%   +0.28%     
==========================================
  Files          32       32              
  Lines        6007     6409     +402     
==========================================
+ Hits         5657     6054     +397     
- Misses        350      355       +5

| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/missing/frame.py | 100% <0%> (ø) ⬆️ |
| databricks/koalas/series.py | 95.24% <0%> (+0.16%) ⬆️ |
| databricks/koalas/frame.py | 96.58% <0%> (+0.48%) ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 391df25...ab7c2b2. Read the comment docs.

@itholic
Contributor

itholic commented Sep 29, 2019

@HyukjinKwon, this looks like great work. I took a look at it, fixed some typos and code (mostly to keep the conventions consistent), and added some content on Plotting and Getting data in/out.
GitHub does not currently seem to support uploading the ipynb file type:

[Screenshot: Screenshot 2019-09-29 7 49 12 PM]

So I appended an extra "pdf" string to the file name in order to upload it. You will probably need to remove it before opening the file in Jupyter Notebook.

10-minutes-to-Koalas.ipynb.pdf

Could you check this file when you are available?

Good evening :)

@HyukjinKwon
Member Author

Thx, let me push it first and clean up.

HyukjinKwon force-pushed the tutorial branch 2 times, most recently from 750d09c to f0b9aae (September 29, 2019 13:51)

@gatorsmile
Collaborator

This is great!

HyukjinKwon force-pushed the tutorial branch 2 times, most recently from a95048a to acc7331 (September 30, 2019 01:00)

@ueshin
Collaborator

ueshin commented Sep 30, 2019

This is great!

In the Jupyter notebook,

  • Pandas -> pandas
  • The explanation below In[1] is no longer correct.
  • We don't need >>> and ... in each command.
  • I'd prefer ks.from_pandas(pdf) to ks.DataFrame(pdf) (In[10], In[48])
  • ks.DataFrame(sdf) -> sdf.to_koalas() might be better? (In[16])

@HyukjinKwon
Member Author

Nice, I will address them soon. As for >>> and ..., I added them in case we reuse the notebook for our docs. Otherwise there is no way to distinguish code from output :).
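
To illustrate the point about prompts: when code and output share one text block, the `>>>` and `...` markers are what let a tool tell them apart. The following is only a minimal sketch using Python's standard `doctest` parser. The `cell_text` content is hypothetical and not taken from the tutorial notebook, and this is not necessarily how the documentation will actually be generated.

```python
import doctest

# Hypothetical doctest-style text (illustrative only, not from the notebook).
cell_text = """
>>> total = 0
>>> for n in [1, 2, 3]:
...     total += n
>>> total
6
"""

parser = doctest.DocTestParser()

# get_examples() splits the text into Example objects: lines starting with
# ">>> " or "... " become the command (.source), and the lines that follow,
# up to the next prompt or blank line, become the expected output (.want).
examples = parser.get_examples(cell_text)

commands = [ex.source for ex in examples]
outputs = [ex.want for ex in examples]

for ex in examples:
    print("code:  ", repr(ex.source))
    print("output:", repr(ex.want))
```

Without the prompts, the line `6` would be indistinguishable from a command, which is the ambiguity the comment above describes.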

@HyukjinKwon
Member Author

Let me merge this, given the generally positive feedback. The last change only touched README.md. I will commit each contribution separately to credit it properly.

I will still have to touch this file in a separate PR to generate the official documentation from it. I can address more comments here if you have any.

6 participants