Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow-up of GSoC: remove obsolete parts from main gem #405

Open
zverok opened this issue Sep 1, 2017 · 23 comments
Open

Follow-up of GSoC: remove obsolete parts from main gem #405

zverok opened this issue Sep 1, 2017 · 23 comments
Milestone

Comments

@zverok
Copy link
Collaborator

zverok commented Sep 1, 2017

  1. I believe after release of daru-io there should be a period of cleaning up daru of duplicating things (importers and exporters), and related outdated stuff, like mentioned here: Please do not show the Install optional dependencies message everytime on runtime  #404
  2. I believe daru-io should be hard dependency for daru (like rspec installs rspec-core, rspec-expectations and so on) -- maybe in future with extraction of daru-dataframe-only gem, maybe not. But the point is, if somebody just install daru and wants his DF from CSV, they don't need to think about daru-io explicitly.
  3. Currently I am not sure about parts of daru that can be removed due to daru-views existance, subject to investigate.
  4. I believe such API change is marking next major version, and probably several other things should be done before the release, like Drop support for Ruby 2.0 #336, and probably Enhance specs style  #317 and Make all inline docs proper YARD #318
  5. And by this time we probably can do a Great Cleanup, like removing completely everything that was deprecated in last versions, and deprecate a bunch of another stuff, and process #cleanup-tagged issues, and merge some stuff like Defining group_by#aggregate  #340 and ...

I am ready to lead the work (and do the most complicated parts myself), and propose the following (really hi-level) timeline:

  • September: finalizing GSoC, planning new release and define roadmap 🤔 💭
  • October: hard work :rage1: :rage2: :rage3: :rage4:
  • Beginning of November, hopefully (the earlier, the better, maybe I am overestimating the complexity of the road): Big Shiny Release 🎉

@v0dro @lokeshh @athityakumar @Shekharrajak @parthm @rainchen please feel free to share your thoughts, and notify if you are willing to take some part of the work.

@zverok
Copy link
Collaborator Author

zverok commented Sep 1, 2017

Probably, the work should be done on some daru-1.0 branch, and (as GitHub allows it) PRs on the road should be merged there, not in master?.. Or master would became our moving target, and stable branch should be maintained for current bugs fixing/small enchancements.

@athityakumar
Copy link
Member

Since it's usually an understanding that master (or default branch) is usually the stable branch, I feel that it'd be better if we continue on daru-1.0 branch. Regarding the port of daru-io, I'd definitely like to contribute in removing obsolete IO parts from daru. A few pointers related to this porting :

  • IMO, #save and #load methods are better suited to reside within daru.
  • All other IO methods have already been ported to daru-io, and can be removed from daru.
  • Though #to_html seems like an Exporter method at first look, I feel that it'd be better to have it in daru-view. Of course, it can be ported to daru-io too if that'd be better.

@zverok
Copy link
Collaborator Author

zverok commented Sep 1, 2017

Since it's usually an understanding that master (or default branch) is usually the stable branch, I feel that it'd be better if we continue on daru-1.0 branch.

👍

Regarding the port of daru-io, I'd definitely like to contribute in removing obsolete IO parts from daru.

👍

IMO, #save and #load methods are better suited to reside within daru.

👍

All other IO methods have already been ported to daru-io, and can be removed from daru.

👍

Though #to_html seems like an Exporter method at first look, I feel that it'd be better to have it in daru-view.

👍

Looks like we are on the same page currently :)

@Shekharrajak
Copy link
Member

daru uses Nyaplot, Gruff plotting libraries. daru-view uses Nyaplot (same as daru but it is able to generate html code) , HighCharts, GoogleCharts for plotting.

So if we want to shift the whole plotting libraries to daru-view then we need to shift the Gruff library to daru-view (without any change) and need some changes in daru-view/nyaplot module.

I will try to do this part of work.

@parthm
Copy link
Contributor

parthm commented Sep 3, 2017

@zverok looks like a good plan. I would be glad to help on some pieces as time permits.

Since this is toward v1.0, it may be worth reviewing Daru API for consistency. As 1.0 is a major version I would think it would be significantly more difficult to make any API adjustments post 1.0. Two examples that come to mind are the discussion on to/from-read/write on issue #280 and the inconsistency between DataFrame#vector_sum and Vector#sum in issue #391 (the latter ignores nil by default, the former doesn't). With Daru usage growing, it may be worth using the major version change to iron out any API inconsistencies.

A somewhat related question would be, does Daru have a deprecation policy and versioning guideline in place to help with breaking changes? This can help users with migration to newer versions of Daru and becomes important due to its growing adoption.

@parthm
Copy link
Contributor

parthm commented Sep 3, 2017

Just too keep track of another API proposal discussed, issue #131 talks about changing #dv to #to_daru_vector.

@zverok
Copy link
Collaborator Author

zverok commented Sep 3, 2017

I would be glad to help on some pieces as time permits.

Nice to hear that!

Since this is toward v1.0, it may be worth reviewing Daru API for consistency.

Yes, it would definitely a huge review. As @v0dro officially made me the maintainer, I am planning a Big Autumn Cleanup for a huge part of our ancestry :trollface: Hold your hats.

A somewhat related question would be, does Daru have a deprecation policy and versioning guideline in place to help with breaking changes? This can help users with migration to newer versions of Daru and becomes important due to its growing adoption.

We have deprecate API (which marks methods as deprecated and prints warnings), but to that point never removed anything deprecated. Probably, it is a good point to explicitly set the policy (and even add some automation tool for check it, like "should ve drop this or that methods in current version?")

@zverok
Copy link
Collaborator Author

zverok commented Sep 3, 2017

Just too keep track of another API proposal discussed, issue #131 talks about changing #dv to #to_daru_vector.

To be honest, current preferred by community (outside Rails one, ofc) Ruby style is "avoid monkey patches of core classes if only possible". I believe we'll follow this rule in 1.0

@v0dro
Copy link
Member

v0dro commented Sep 5, 2017

@baarkerlounger, since you're working on release cycles, will it possible for you to take inspiration from your previous work and other gems and frame a 'release policy' for daru which we will consistetly follow for every single new release in the future? I think it is very important given that this is becoming a popular project and more people have started contributing to it. Please share your thoughts on this issue: #411

@v0dro
Copy link
Member

v0dro commented Sep 5, 2017

Also, I think the next version (after the great cleanup) should be 0.2 not 1.0. We'll release 1.0 once the data storage issue is sorted out: #328

@zverok
Copy link
Collaborator Author

zverok commented Sep 5, 2017

Also, I think the next version (after the great cleanup) should be 0.2 not 1.0. We'll release 1.0 once the data storage issue is sorted out: #328

Well, you are the boss, but I believe with current usage, number of GitHub stars, and so on and so forth -- the new version with seriously updated docs, removed deprecations, new importers/exporters, ... well, I really believe it could be called The One-Point-Oh.

As about #328 -- I am not sure that it should become the main implementation of Daru. Because Daru is useful also as a small, easy-to-install, almost-no-dependencies concept of DataFrame and Vector, and with daru-io and daru-view even more so (like "take those 300 rows from DB and export them to Excel").

I understand the goal to compete with pandas, yet, to be honest, I still believe all internal speedups with dependencies on side libraries and compiled codes should be only plugins, not the core of the library.

Of course, you may have another opinion, and, like I've said, you are the boss.

@baarkerlounger
Copy link
Contributor

I guess the outcome of this version numbering should be part of the 'release policy' - personally I like have x.y.z with x number bumps being used only for breaking changes that will require user code to change, y being bumped for new features, z being used for bug fixes only.

That way users can have consistent expectations about behaviour from the numbering. So I think 0.2.0 or 1.0.0 should depend on whether the great cleanup is backwards compatible or not.

@v0dro
Copy link
Member

v0dro commented Sep 5, 2017

Hmmmm yes @baarkerlounger is right. @zverok I don't think there will be too many breaking changes due to the cleanup given that we're essentially only moving out the IO and view methods, right?

About the data storage, I think it will probably be a plugin which will over-ride all core daru functionality once required.

@baarkerlounger
Copy link
Contributor

I also think there should be another release when the currently open PRs are merged.

@parthm
Copy link
Contributor

parthm commented Sep 5, 2017

Based on autumn cleanup review @zverok mentoned in an earlier note, if we have API consistency being handled in this release, labeling it 1.0 might make sense. In case it's only internal cleanup I suppose we could call it 0.2.

I still believe all internal speedups with dependencies on side libraries and compiled codes should be only plugins, not the core of the library.

Just to cast my vote here, this approach makes sense IMHO. It makes it very easy to get started with Daru. Having a huge install by default can sometimes be painful. I had issues earlier when I tried the full install of SciRuby (IIRC it was some compilation issues with gsl) and nearly ended up going with Pandas. Fortunately, a Daru only install worked smoothly and I could happily use Ruby/Daru.

@zverok
Copy link
Collaborator Author

zverok commented Sep 6, 2017

@zverok I don't think there will be too many breaking changes due to the cleanup given that we're essentially only moving out the IO and view methods, right?

Well, my idea of the cleanup (described in original issue) is:

  • move out IO/view
  • it already will introduce some changes and most probably incompatibilities, therefore...
  • remove what was deprecated for a long time
  • go through all unmerged PRs, unfinished issues and style and fix it

So, I've meant that splitting out daru-io is a BIG move, and it could be a reason for dropping the outdated things, and doing a New Big Shiny release.

@v0dro
Copy link
Member

v0dro commented Oct 8, 2017

Yes I agree. @zverok do you plan to work on this?

@zverok
Copy link
Collaborator Author

zverok commented Oct 8, 2017

Right now sitting and looking into the large file with a plan :) Will publish today or tomorrow's morning!

@zverok zverok added this to the Version 1.0 milestone Oct 8, 2017
@zverok
Copy link
Collaborator Author

zverok commented Oct 8, 2017

I've created branch v-1-pre targeting all "big" (e.g. incompatible) changes and GitHub issues milestone.
The latter is NOT complete currently (I am still reviewing through all we have).

As for myself, I am planning currently to work on updating specs (and making a list of methods to deprecate/redesign), and if evrybody is OK with it I'll work just in the branch, without PRs.

@baarkerlounger
Copy link
Contributor

@zverok why no PRs? Will end up with a potentially huge branch to review?

@zverok
Copy link
Collaborator Author

zverok commented Oct 8, 2017

@baarkerlounger because I am planning mostly to do "maintenance" tasks (change minimal Ruby version, facelift code style here and there and so on), and also gradually update specs between that, it would be work-in-progress of preparation of Big Release. It would be iterations like "look at Index, fix specs, discuss possible deprecations, fix a bit more specs, cleanup some method" and so on.

(And one huge PR "fix all specs" would be almost un-reviewable anyways.)

@athityakumar
Copy link
Member

I've submitted PR #430 to port IO modules away from daru, please review. (I'll resolve the merge conflict along with review changes) 😄

@zverok
Copy link
Collaborator Author

zverok commented Nov 11, 2017

JFYI: not sleeping. Started to refactor specs & docs, which somehow leaded to HUGE lib review, current results (WIP, not passing specs at the moment) in https://github.com/SciRuby/daru/tree/awfully-big-refactoring branch. Current plan is to finish it there, then merge into v-1-pre branch, then fix everything that will left of milestone :) ETA ~1-2 weeks on my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants