joining on index #234

Open
genya0407 opened this issue Aug 26, 2016 · 12 comments

Comments

@genya0407
Contributor

I'd like a way to join two Daru::DataFrame objects on their index,
similar to the index-based join in pandas.

@athityakumar
Member

athityakumar commented Feb 6, 2017

If I've understood properly,

Input

2.3.1 :006 > l1 = Daru::DataFrame.new({ a: [1,2,3], b: [4,5,6]}, index: ['x','y','z'])
 => #<Daru::DataFrame(3x2)>
       a   b
   x   1   4
   y   2   5
   z   3   6 
2.3.1 :007 > l2 = Daru::DataFrame.new({ c: [7,8,9], d: [10,11,12]}, index: ['x','y','z'])
 => #<Daru::DataFrame(3x2)>
       c   d
   x   7  10
   y   8  11
   z   9  12 

Then, desired output is something like

 => #<Daru::DataFrame(3x4)>
       a   b   c   d
   x   1   4   7   10
   y   2   5   8   11
   z   3   6   9   12 

Right? I'd like to work on this. 😄
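
A minimal plain-Ruby sketch of the index-aligned join described above. It assumes each frame is modeled as a hash from index label to a row hash; `join_on_index` is a hypothetical helper for illustration, not part of Daru's API.

```ruby
# Each frame is modeled as { index_label => { column => value } }.
# This is a plain-Ruby stand-in for illustration, not Daru's API.
def join_on_index(left, right)
  # Keep only labels present in both frames, in the left frame's order,
  # and combine the two rows that share each label.
  (left.keys & right.keys).each_with_object({}) do |label, joined|
    joined[label] = left[label].merge(right[label])
  end
end

l1 = {
  'x' => { a: 1, b: 4 },
  'y' => { a: 2, b: 5 },
  'z' => { a: 3, b: 6 }
}
l2 = {
  'x' => { c: 7, d: 10 },
  'y' => { c: 8, d: 11 },
  'z' => { c: 9, d: 12 }
}

joined = join_on_index(l1, l2)
joined.each { |label, row| puts "#{label}  #{row.values.join('  ')}" }
```

Because rows are matched by label rather than by position, the result would stay the same even if the rows of l2 were listed in a different order.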

@athityakumar
Member

I had a look at core/merge.rb, but I can't figure out how to try the join function from the console (irb). Any help would be appreciated.

@v0dro
Member

v0dro commented Feb 7, 2017

You can start a session with rake pry and then call join on your DataFrame like df.join(....).

@Shekharrajak
Member

Shekharrajak commented Mar 2, 2017

If no one is working on this, I'd like to give it a try.

@v0dro
Member

v0dro commented Mar 4, 2017

@Shekharrajak yes you can.

@Shekharrajak
Member

I think join should operate on the index by default, right?

@v0dro
Member

v0dro commented Mar 19, 2017

@gnilrets WDYT?

@gnilrets
Contributor

Makes sense to me...

@Shekharrajak
Member

It took me a while to understand core/merge.rb, since the file has no examples or docs.

  1. I see that merge is defined in dataframe.rb and takes no options. But a merge combined with a condition is exactly what the different types of join are.

  2. merge should be defined in core/merge.rb and should accept the same kind of options as join.

  3. One special case: when df1 and df2 have one or more columns in common, a suffix must be added to those columns. (In that case we would first add the suffix to the shared columns of both DataFrames and then pass them to join.)

E.g.

irb(main):065:0> left
=> #<Daru::DataFrame(3x2)>
       A   B
   0  A0  B0
   1  A1  B1
   2  A2  B2
irb(main):066:0> right
=> #<Daru::DataFrame(3x2)>
       A   c
  K0  A0  D0
  K2  A1  D2
  K3  A3  D3

irb(main):064:0> left.merge(right)
=> #<Daru::DataFrame(3x4)>
     A_1   B A_2   c
   0  A0  B0  A0  D0
   1  A1  B1  A1  D2
   2  A2  B2  A3  D3
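
The suffixing rule from point 3 can be sketched in plain Ruby as follows. `suffix_common_columns` is a hypothetical helper written for illustration, not a Daru method; it only renames column names and assumes the `_1` / `_2` suffixes seen in the output above.

```ruby
# Hypothetical helper (not part of Daru): rename the columns shared by
# both frames with _1 / _2 suffixes, leaving unique columns untouched.
def suffix_common_columns(left_cols, right_cols)
  common = left_cols & right_cols
  rename = lambda do |cols, suffix|
    cols.map { |col| common.include?(col) ? :"#{col}_#{suffix}" : col }
  end
  [rename.call(left_cols, 1), rename.call(right_cols, 2)]
end

left_cols, right_cols = suffix_common_columns(%i[A B], %i[A c])
p left_cols   # => [:A_1, :B]
p right_cols  # => [:A_2, :c]
```

Renaming before the join means the join itself never sees a column name collision.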


Problem with merge:

  • It doesn't merge on the index.

See this example:

irb(main):002:0* left = Daru::DataFrame.new({
irb(main):003:2* :A => ['A0', 'A1', 'A2'],
irb(main):004:2* :B => ['B0', 'B1', 'B2']},
irb(main):005:1* index: ['K0', 'K1', 'K2'])
=> #<Daru::DataFrame(3x2)>
       A   B
  K0  A0  B0
  K1  A1  B1
  K2  A2  B2
irb(main):006:0>
irb(main):007:0* right = Daru::DataFrame.new({
irb(main):008:2* :C => ['C0', 'C2', 'C3'],
irb(main):009:2* :D => ['D0', 'D2', 'D3']},
irb(main):010:1* index: ['K0', 'K2', 'K3'])
=> #<Daru::DataFrame(3x2)>
       C   D
  K0  C0  D0
  K2  C2  D2
  K3  C3  D3

# The index values differ, but merge still combines the rows by position

irb(main):011:0> left.merge(right)
=> #<Daru::DataFrame(3x4)>
       A   B   C   D
   0  A0  B0  C0  D0
   1  A1  B1  C2  D2
   2  A2  B2  C3  D3

# When no index is passed, the result is fine.

irb(main):012:0> left = Daru::DataFrame.new({
irb(main):013:2* :A => ['A0', 'A1', 'A2'],
irb(main):014:2* :B => ['B0', 'B1', 'B2']},
irb(main):015:1* )
=> #<Daru::DataFrame(3x2)>
       A   B
   0  A0  B0
   1  A1  B1
   2  A2  B2
irb(main):016:0> right = Daru::DataFrame.new({
irb(main):017:2* :C => ['C0', 'C2', 'C3'],
irb(main):018:2* :D => ['D0', 'D2', 'D3']},
irb(main):019:1* )
=> #<Daru::DataFrame(3x2)>
       C   D
   0  C0  D0
   1  C2  D2
   2  C3  D3
irb(main):020:0> left.merge(right)
=> #<Daru::DataFrame(3x4)>
       A   B   C   D
   0  A0  B0  C0  D0
   1  A1  B1  C2  D2
   2  A2  B2  C3  D3
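
For contrast, here is a sketch of what an index-aware join could return for the K0..K3 frames above. It is plain Ruby with frames modeled as hashes keyed by index label; `join_on_index` and its `how:` option are assumptions made for illustration and do not describe Daru's existing join.

```ruby
# Frames are modeled as { index_label => { column => value } }.
# `join_on_index` and `how:` are hypothetical, not Daru's API.
def join_on_index(left, right, how: :inner)
  labels =
    case how
    when :inner then left.keys & right.keys   # labels in both frames
    when :outer then left.keys | right.keys   # labels in either frame
    when :left  then left.keys
    when :right then right.keys
    else raise ArgumentError, "unknown join type: #{how}"
    end

  left_cols  = left.values.flat_map(&:keys).uniq
  right_cols = right.values.flat_map(&:keys).uniq
  # A row of nils, used when a label is missing from one side.
  blank = ->(cols) { cols.map { |col| [col, nil] }.to_h }

  labels.each_with_object({}) do |label, joined|
    left_row  = left[label]  || blank.call(left_cols)
    right_row = right[label] || blank.call(right_cols)
    joined[label] = left_row.merge(right_row)
  end
end

left = {
  'K0' => { A: 'A0', B: 'B0' },
  'K1' => { A: 'A1', B: 'B1' },
  'K2' => { A: 'A2', B: 'B2' }
}
right = {
  'K0' => { C: 'C0', D: 'D0' },
  'K2' => { C: 'C2', D: 'D2' },
  'K3' => { C: 'C3', D: 'D3' }
}

inner = join_on_index(left, right)               # rows K0 and K2 only
outer = join_on_index(left, right, how: :outer)  # rows K0..K3, nil where missing
```

Under this sketch, row K2 pairs A2/B2 with C2/D2, whereas the positional merge shown earlier pairs A2/B2 with C3/D3.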

References:

  1. http://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join

  2. http://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/#mergetypes

@v0dro
Member

v0dro commented Mar 26, 2017

If I understand you correctly, you are proposing that parts of join should be folded into merge, right? Why is that? You can already do the same thing by passing options to join.

The purpose of this issue is to come up with suitable APIs to join two dataframes on index.

Also, how is your first example (in the section below Problem with merge) any different from the first? We all know that joining on index is not yet supported. Your example demonstrates exactly that.

And why do you want to perform a merge in the first place when the problem is with join?

@Shekharrajak
Member

Shekharrajak commented Mar 26, 2017

Actually, my comment above is not closely related to this issue. My point is that once joining on index is supported, merge can be done as an inner join (the default join type for merge) with on: index as the default. So we don't need to solve a separate merge issue.

@v0dro
Member

v0dro commented Mar 28, 2017

Why should the default be changed from :inner to :index? Is there any compelling reason?

So we don't need to solve a separate merge issue.

What exactly do you mean by this?
