Adds from, read, write, to_s methods #52
Conversation
The to_format method is pending, and is quite ambiguous: it isn't clear which format instance it should return. Also, the YARD docs are yet to be updated here, since they were reworked for markdown support in PR #50; it would be better to proceed after that PR is merged.
Force-pushed Minor arguments issue for remote files parsing (2316fad → b376af1)
On it.
Well, I have a HUGE concern now that I see the changes. Let's discuss it?
README.md (Outdated)
- df = Daru::IO::Importers::Format.new(args).call
+ instance = Daru::IO::Importers::Format.new(opts)
+ df1 = instance.from(connection)
+ df2 = instance.read(path)
Doesn't look informative to me :(
README.md (Outdated)
@@ -130,8 +133,8 @@ Imports a **Daru::DataFrame** from a **.csv** or **.csv.gz** file.
  require 'daru/io/importers/csv'

  #! Usage from Daru::IO
- df1 = Daru::IO::Importers::CSV.new('path/to/file.csv', skiprows: 10, col_sep: ' ').call
- df2 = Daru::IO::Importers::CSV.new('path/to/file.csv.gz', skiprows: 10, compression: :gzip).call
+ df1 = Daru::IO::Importers::CSV.new(skiprows: 10, col_sep: ' ').read('path/to/file.csv')
Funny thing here (which we'll probably postpone, considering how little time we have) is that it looks absolutely, totally, astonishingly unnatural.
If we think of an importer as an object, usable on its own, then in my head the following is logical:
importer = Daru::IO::Importers::CSV.read('mydata.csv', col_sep: ' ') # reads once from file, converts data to intermediate representation
# or
importer = Daru::IO::Importers::CSV.parse(some_string_from_other_source, col_sep: ' ')
df1 = importer.call(skiprows: 10, count: 20) # first 10 lines to one DF, quickly
df2 = importer.call(skiprows: 20) # the rest to another one, quickly
This way we know why we created the object and what we've passed to the constructor.
But what is the object created with Daru::IO::Importers::CSV.new(skiprows: 10, col_sep: ' ') - what's its "physical meaning" or "semantics"?.. ¯\_(ツ)_/¯
Does it make sense to you?..
The main idea I had behind Importers.new(opts).read(path) was that it would facilitate automation when the Importer is used to collect similar data from LOTS of sources - for example, a folder containing lots of CSV files, with maybe some daily trends in each file. That way, the opts (options) given at initialization can be re-used for multiple sources, rather than being passed on each call as in Importers::CSV.read(path).call(opts). This is what I had in mind while favoring a semantic like Importers::CSV.new(opts).read(path):
csv_files = Dir.glob('contains/1000/csvs/*.csv') # full paths to all .csv files
csv_importer = Daru::IO::Importers::CSV.new(skiprows: 10, col_sep: ' ')
csv_files.map do |filename|
  csv_importer.read(filename) # one DataFrame per file, same options
end.reduce(:concat)
README.md
Outdated
@@ -195,8 +198,8 @@ Imports an **Array** of **Daru::DataFrame**s from a **.html** file or website.
  require 'daru/io/importers/html'

  #! Usage from Daru::IO
- df1 = Daru::IO::Importers::HTML.new('https://some/url/with/tables', match: 'market', name: 'Shares analysis').call
- df2 = Daru::IO::Importers::HTML.new('path/to/file.html', match: 'market', name: 'Shares analysis').call
+ df1 = Daru::IO::Importers::HTML.new(match: 'market', name: 'Shares analysis').read('https://some/url/with/tables')
What I've said above relates to all importers, but here it is extremely visible:
importer = Daru::IO::Importers::HTML.new('https://some/url/with/20Mb/of/tables') # URL read once, Nokogiri document parsed once
df1 = importer.call(match: 'prices')
df2 = importer.call(match: 'goods')
df3 = importer.call(match: 'clients')
Do you feel it?..
Ah, I can see this. But I feel that the existing setup also makes sense when gathering similar data from multiple sources - something like importing various repositories' data from the GitHub API:
json_importer = Daru::IO::Importers::JSON.new(RepositoryName: '$..full_name', Stars: '$..stargazers_count', Size: '$..size', Forks: '$..forks_count')
%w[athityakumar zverok v0dro].map do |username|
json_importer.read("https://api.github.com/users/#{username}/repos")
end.reduce(:concat)
But of course, your suggestion does better when the data source is pretty huge - we definitely don't want to read it again & again. I can change that, and it can be used like this even for the example I gave:
json_options = {RepositoryName: '$..full_name', Stars: '$..stargazers_count', Size: '$..size', Forks: '$..forks_count'}
%w[athityakumar zverok v0dro].map do |username|
Daru::IO::Importers::JSON.read("https://api.github.com/users/#{username}/repos").call(json_options)
end.reduce(:concat)
No issues, refactoring this shouldn't take much time (hopefully). 😅
@zverok - I think I've explained what I had in mind while favoring the approach of Importers::Format.new(opts).read(file) over Importers::Format.read(file).call(opts). Which way should we proceed?
Yep, your last code example is the "right" way (as for me). Creating an importer from just the JSONPaths provides no value (nothing "heavy" is done at creation, and re-parsing the JSONPaths on each call is cheap).
@zverok - Acknowledged. I'll update this PR by tomorrow. 👍
(That's why it is sometimes better to discuss code than just ideas in GitHub issues... We could have gone faster if it weren't so hard for me to imagine everything beforehand. Sorry for that.)
@zverok - Added the changes. Please review again. 😄
instance = Daru::IO::Exporters::Format.new(df, opts)
instance.to_s        #=> Provides a file-writable string, which can be used in web applications for file download purposes
instance.to          #=> Provides a Format instance
instance.write(path) #=> Writes to the given path
Exporters API, on the other hand, looks reasonable.
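(As an aside, that to_s string is exactly what a web handler needs. A minimal sketch, assuming a CSV exporter and a hypothetical Rails action - send_data is standard Rails, and @data_frame is an assumed instance variable:)
# Hypothetical Rails controller action built on the Exporters API above
def download
  exporter = Daru::IO::Exporters::CSV.new(@data_frame)
  # to_s returns the file-writable string; stream it as an attachment
  send_data exporter.to_s, filename: 'report.csv', type: 'text/csv'
end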
…ll(opts) Rubocop, YARD Docs & README are still a mess. But nevertheless, there's a great deal of code porting that has been done, and I NEVER want to have any of these changes accidentally reverted. RSpec is passing. 🎉
#
# @example Reading from csv.gz file
#   instance = Daru::IO::Importers::CSV.read("matrix_test.csv.gz")
def read(path)
I'm having a dilemma about YARD-documenting the read (and from) methods. Basically, there are 2 read methods - the Importers::Format#read instance method and the Importers::Base.read class method. So, I'm showcasing an example of Daru::IO::Importers::CSV.read(path) usage, though technically it's Daru::IO::Importers::CSV.new.read(path) that gets called by Daru::IO::Importers.read(path). So, though it's technically incorrect, I'd like it to stay this way, as users would find the method's showcase easier to follow.
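For context, a minimal sketch of the delegation being described (hedged: module layout simplified for self-containedness; the actual daru-io code differs):
module Daru
  module IO
    module Importers
      class Base
        # Class-level convenience: Importers::CSV.read(path) builds a
        # default instance and delegates to the instance-level #read.
        def self.read(path)
          new.read(path)
        end
      end

      class CSV < Base
        # Instance-level read: parses the file at +path+ and returns
        # self, so that #call can be invoked on it afterwards.
        def read(path)
          # ... parse the file into an intermediate representation ...
          self
        end
      end
    end
  end
end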
You can imitate docs for Daru::IO::Importers::CSV.read (as if it exists and is defined in this file) with YARD's @!method directive. I believe it would be an ideal solution from all possible points of view, WDYT?
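Something along these lines might work (a sketch only, not tested against daru-io's YARD setup; the @param/@return details are illustrative):
# @!method self.read(path)
#   Imitated class-level read; actually dispatched through Importers::Base.
#   @param path [String] path to the .csv or .csv.gz file
#   @return [Daru::IO::Importers::CSV] an importer wrapping the parsed file
#   @example Reading from a csv file
#     instance = Daru::IO::Importers::CSV.read('matrix_test.csv')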
Is it ready for the new round of review?
@zverok - Apart from the YARD doc issue discussed above, yes.
OK, on it.
- df1 = Daru::IO::Importers::CSV.new('path/to/file.csv', skiprows: 10, col_sep: ' ').call
- df2 = Daru::IO::Importers::CSV.new('path/to/file.csv.gz', skiprows: 10, compression: :gzip).call
+ df1 = Daru::IO::Importers::CSV.read('path/to/file.csv').call(skiprows: 10, col_sep: ' ')
+ df2 = Daru::IO::Importers::CSV.read('path/to/file.csv.gz').call(skiprows: 10, compression: :gzip)
Note to future (maybe create a GitHub issue?): in fact, options should be split into (file-)reading ones and DF-importing ones, like this: Daru::IO::Importers::CSV.read('path/to/file.csv', skiprows: 10, col_sep: ' ').call(order: [....]) -- only this way can file reading be performed once on .read, and the conversion then several times on .call. But it would be a bit tricky, so not in this PR. (See the sketch below.)
Opened #53 for this. 👍