lab 06 #22

sunaynagoel · 2019-09-30T22:28:45Z

Is the Lab for Wk 06 going to be actually due on Oct 3rd or are you going to move the due date?

lecy · 2019-10-01T12:04:58Z

I'm adjusting the date. This semester has gotten the best of me.

Lab 6 will be short and sweet, the remainder of assignments will be posted tonight. I appreciate the patience.

RickyDuran · 2019-10-02T13:26:56Z

@lecy Lab 6 is showing up in the schedule, but the page is not live yet.

I am not sure if it is supposed to be.

Thanks!
Ricky

lecy · 2019-10-02T15:55:11Z

Thanks Ricky - it was a broken link, should be working now.

jmacost5 · 2019-10-02T17:29:33Z

So I made this code for the first part of the lab 6, how do I find the variable for year?

Salaries <- 
Salaries %>% 
mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) )
group_by( Year, Teams)
summarize(Salaries)

lecy · 2019-10-02T17:39:24Z

You can find data dictionaries or peek at your data using:

help( Salaries )
head( Salaries )

jmacost5 · 2019-10-02T17:45:16Z

So when I replace year with yearID in the code it still cannot find year

lecy · 2019-10-02T17:53:07Z

I think you might be confusing dataset names (Teams) with ID names in the dataset.

sunaynagoel · 2019-10-02T18:21:09Z

I am working on the same question as well. I do have a question about merging? This is taking forever and not producing any result?


merge(Teams, Salaries, by.x= "teamID",by.y = "yearID", all = "TRUE") %>%
pander()

lecy · 2019-10-02T18:30:05Z

By.x and by.y are used when variable names differ (e.g. “team.id” and “teamID”). You should be using a compound ID instead.

c(“id1”,”id2”)

sunaynagoel · 2019-10-02T18:48:01Z

It is doing the same thing . What am I doing wrong ?

merge (Teams, Salaries, by= c("teamID","yearID"),all = "TRUE") %>%
pander()

lecy · 2019-10-02T19:24:22Z

TRUE is an object not a string in R?

merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )

I am assuming you have already aggregated your Salaries dataset up to the team level by year?

I'm not sure you want to keep all because the Teams dataset covers all seasons back to 1885, while the Salaries dataset only starts in 1985.

jmacost5 · 2019-10-02T20:08:40Z

yearID is what is being said the name of the variable year is, I put both year and yearID in so I am not understanding what I am doing wrong to not get the right variable for year? Am I not understanding what the question is asking?

year <- 
Salaries <- 
  Salaries %>% 
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) )
  merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
group_by(Teams)
summarize(Salaries)
pander()

sunaynagoel · 2019-10-02T20:11:57Z

@jmacost5 I am working on the same thing but I am going piece by piece. I have this. I am sure I missing something here.

#new variable for total salary team wise
Salaries %>%
  group_by(teamID, yearID)%>%
  summarise ( n=n(),
              totalsalary= sum(salary.adj,na.rm = TRUE))%>%
  
   pander()

sunaynagoel · 2019-10-02T20:14:29Z

@lecy

TRUE is an object not a string in R?

When I put it under quotes it gives me error.
merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
I am assuming you have already aggregated your Salaries dataset up to the team level by year?

I tried.
I'm not sure you want to keep all because the Teams dataset covers all seasons back to 1885, while the Salaries dataset only starts in 1985.

I don't want to keep all the data. i just want to keep data from 1985.

lecy · 2019-10-02T21:59:01Z

Which merge would make sense here?

Note that in base R the merge arguments translate to:

merge( ..., all=TRUE )  # creates an outer join
merge( ..., all=FALSE)  # creates an inner join
merge( ..., all.x=TRUE)  # creates a left join
merge( ..., all.y=TRUE)  # creates a right join

lecy · 2019-10-02T22:03:14Z

@jmacost5 I would first create your new team salaries dataset, then merge it with Teams. Not sure what a couple of these steps are doing.

year <- 
Salaries <- 
  Salaries %>% 
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) )
  merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
group_by(Teams)
summarize(Salaries)
pander()

Maybe something like?

team.salaries <- 
  Salaries %>% 
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( ... ) %>%
  summarize( ... )

dat.teams.salaries <- merge( Teams, team.salaries, by=c("teamID","yearID"), all=? )

sunaynagoel · 2019-10-03T03:50:02Z

The submission link for Lab-06 is not working for me.

etbartell · 2019-10-03T04:18:53Z

Here is my code for creating the aggregate salary dataset:

team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% pander()

It results in 4 columns of data without titles (should be teamID, yearID, n, & team.budget). Then when I try to merge this dataset with Teams I get this error message:

Unified <- 
  merge(team.Salaries, Teams, by=c("teamID", "yearID"), all=F)

Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns

I can only assume this is because summary table had no column titles and therefore no variable names to match with Teams. I've updated and reloaded all of the packages in use and re-started R-studio several times. I'm not sure what else could be the problem.

jmacost5 · 2019-10-03T04:22:52Z

Here is my code for creating the aggregate salary dataset:
team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% pander()
It results in 4 columns of data without titles (should be teamID, yearID, n, & team.budget). Then when I try to merge this dataset with Teams I get this error message:
Unified <- 
  merge(team.Salaries, Teams, by=c("teamID", "yearID"), all=F)
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns

I can only assume this is because summary table had no column titles and therefore no variable names to match with Teams. I've updated and reloaded all of the packages in use and re-started R-studio several times. I'm not sure what else could be the problem.

I am not super good at this but I believe that the all=F needs to be all= TRUE

etbartell · 2019-10-03T04:25:16Z

@jmacost5 I tried all 4 merging possibilities. All gave the same error message. I agree though, that one should be correct.

jmacost5 · 2019-10-03T04:29:29Z

this is giving me the 4 variables, not the labels, but it should work. {r} team.Salaries <- Salaries %>% mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>% group_by( teamID, yearID ) %>% summarize( n = n(), team.budget = sum(salary.adj) ) %>% filter( n > 24 ) %>% pander() dat.team.salaries <- merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
My question thought is how to make a variable for the cost per win, and do we have to scale it up for inflation?

etbartell · 2019-10-03T04:42:35Z

@jmacost5 That's the thing. We have to merge Teams with the new dataset ("team.Salaries") rather than the original "Salaries" dataset. This will combine inflation-adjusted team budgets with Win stats so that we can calculate cost-per-win. The problem I'm stuck on is that it won't let me merge the two datasets because the labels are missing from my "team.Salaries" set.

etbartell · 2019-10-03T04:52:23Z

I would think it should look something like this but anyone please feel free to show me where I'm wrong:

# aggregate team salaries, adjust team budgets for inflation, and screen out teams with less than 25 players
team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% pander()

# merge aggregate salary dataset with Teams
dat.team.salaries <-
  merge(team.Salaries, Teams, by=c("teamID","yearID"), all=T)

# calculate cost per-win and list the 25 lowest-paying teams
dat.team.salaries %>%
  mutate(cost.per.win = team.budget/W) %>%
  arrange(cost.per.win) %>%
  select("yearID","teamID","lgID","Rank","G","W","n","team.budget","cost.per.win") %>%
  head(25) %>% pander()

lecy · 2019-10-03T05:19:25Z

@etbartell It's strange because when I run the code you posted here I get the correct results.

head( Unified )
  teamID yearID  n team.budget lgID franchID divID Rank   G Ghome  W  L DivWin
1    ANA   1997 31    54596239   AL      ANA     W    2 162    82 84 78      N
2    ANA   1998 34    70278139   AL      ANA     W    2 162    81 85 77      N
3    ANA   1999 40    91548199   AL      ANA     W    4 162    81 70 92      N
4    ANA   2000 30    82584880   AL      ANA     W    3 162    81 82 80      N
5    ANA   2001 30    74058241   AL      ANA     W    3 162    81 75 87      N
6    ANA   2002 28    93359559   AL      ANA     W    2 162    81 99 63      N
  WCWin LgWin WSWin   R   AB    H X2B X3B  HR  BB   SO  SB CS HBP SF  RA  ER
1     N     N     N 829 5628 1531 279  25 161 617  953 126 72  45 57 794 730
2     N     N     N 787 5630 1530 314  27 147 510 1028  93 45  48 41 783 720
3     N     N     N 711 5494 1404 248  22 158 511 1022  71 45  43 42 826 762
4     N     N     N 864 5628 1574 309  34 236 608 1024  93 52  47 43 869 805
5     N     N     N 691 5551 1447 275  26 158 494 1001 116 52  77 53 730 671
6     Y     Y     Y 851 5678 1603 333  32 152 462  805 117 51  74 64 644 595
   ERA CG SHO SV IPouts   HA HRA BBA  SOA   E  DP    FP           name
1 4.52  9   5 39   4364 1506 202 605 1050 123 140 0.980 Anaheim Angels
2 4.49  3   5 52   4332 1481 164 630 1091 106 146 0.983 Anaheim Angels
3 4.79  4   7 37   4294 1472 177 624  877 106 156 0.983 Anaheim Angels
4 5.00  5   3 46   4344 1534 228 662  846 134 182 0.978 Anaheim Angels
5 4.20  6   1 43   4313 1452 168 525  947 103 142 0.983 Anaheim Angels
6 3.69  7  14 54   4357 1345 169 509  999  87 151 0.986 Anaheim Angels
                        park attendance BPF PPF teamIDBR teamIDlahman45
1 Edison International Field    1767330 102 102      ANA            ANA
2 Edison International Field    2519280 102 102      ANA            ANA
3 Edison International Field    2253123  99 100      ANA            ANA
4 Edison International Field    2066982 102 103      ANA            ANA
5 Edison International Field    2000919 101 101      ANA            ANA
6 Edison International Field    2305547 100  99      ANA            ANA
  teamIDretro
1         ANA
2         ANA
3         ANA
4         ANA
5         ANA
6         ANA

I have not encountered this issue before with the functions stripping column names.

Can you print your session info while you are getting these results?

sessionInfo()

And send me your RMD file by email?

sunaynagoel · 2019-10-03T05:21:15Z

Here is my code for creating the aggregate salary dataset:
team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% pander()
It results in 4 columns of data without titles (should be teamID, yearID, n, & team.budget). Then when I try to merge this dataset with Teams I get this error message:
Unified <- 
  merge(team.Salaries, Teams, by=c("teamID", "yearID"), all=F)
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns

I can only assume this is because summary table had no column titles and therefore no variable names to match with Teams. I've updated and reloaded all of the packages in use and re-started R-studio several times. I'm not sure what else could be the problem.

Try
Unified <-
merge (Teams, team.salaries, all = FALSE)

lecy · 2019-10-03T05:30:19Z

@sunaynagoel Link for submission should be active now.

etbartell · 2019-10-03T06:17:55Z

@lecy Sending my Rmd now. Here is the sessionInfo:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Lahman_7.0-1 pander_0.6.3 dplyr_0.8.3 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       digest_0.6.21    crayon_1.3.4     assertthat_0.2.1 R6_2.4.0         magrittr_1.5     pillar_1.4.2     rlang_0.4.0     
 [9] tools_3.6.1      glue_1.3.1       purrr_0.3.2      xfun_0.9         compiler_3.6.1   pkgconfig_2.0.3  knitr_1.25       tidyselect_0.2.5
[17] tibble_2.1.3

lecy · 2019-10-03T07:06:03Z

@etbartell Here is the problem:

# calculate cost per-win and list the 25 lowest-paying teams
team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% pander()

Assignment occurs after the last operation on the right-hand side has completed. What is the last operation here?

So it was assigning the formatted pander table to your team.Salaries object, not the data frame.

class( team.Salaries )
[1] "NULL"

Easy fix, remove the pander() function:

team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 )

jmacost5 · 2019-10-03T07:18:56Z

For some reason when I put in my code it says that team.budget is not found.

team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% pander()
dat.team.salaries <- merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
 mutate(cost.per.win = team.budget/W) %>%
  arrange(cost.per.win) %>%
  select("yearID","teamID","lgID","Rank","G","W","n","team.budget","cost.per.win") %>%
  head(25) %>% pander()

lecy · 2019-10-03T07:29:11Z

@jmacost5 Check out the comment right above. You need to remove pander() during assignment.

jmacost5 · 2019-10-03T15:14:39Z

I removed pander, I honestly thought I did after the comment, but I am getting the same error.

Error in mutate(cost.per.win = team.budget/W) :
object 'team.budget' not found

team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% pander()
dat.team.salaries <- merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
 mutate(cost.per.win = team.budget/W)
  arrange(cost.per.win)
  select("yearID","teamID","lgID","Rank","G","W","n","team.budget","cost.per.win")
  head(25)

etbartell · 2019-10-03T17:52:30Z

@jmacost5 I separated the merge chunk from the rest of it and it finally gave me the right table:

dat.team.salaries <-
  merge(team.Salaries, Teams, by=c("teamID","yearID"), all=T)

dat.team.salaries %>%
  mutate(cost.per.win = team.budget/W) %>%
  arrange(cost.per.win) %>%
  select("yearID","teamID","lgID","Rank","G","W","n","team.budget","cost.per.win") %>%
  head(25)

etbartell · 2019-10-03T18:08:24Z

@lecy Thank you, that worked! I'm just curious though, why would the pander() function get rid of the column names? I thought I had defined the team.Salaries data frame with the group_by, summary, and filter functions, and that it would just retain the names from Salaries.

jmacost5 · 2019-10-03T18:22:55Z

I am still getting a the same error when I remove the

team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 ) %>% 
dat.team.salaries <- 
  merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
dat.team.salaries %>%
 mutate(cost.per.win = team.budget/W) %>%
  arrange(cost.per.win) %>%
  select("yearID","teamID","lgID","Rank","G","W","n","team.budget","cost.per.win") %>%
  head(25)

Error: object 'team.budget' not found

etbartell · 2019-10-03T18:38:11Z

@jmacost5 Your code there doesn't include the merge, and also you need to remove the pander() function after the filter.

jmacost5 · 2019-10-03T18:47:29Z

I literally swear that I removed pander I do not understand how it keeps popping up. Can you see if the code is ok now?

lecy · 2019-10-03T18:59:18Z

@etbartell Thanks for catching that! @jmacost5 is it working now?

jmacost5 · 2019-10-03T19:06:27Z

It is still giving me an error, It is the code listed above (I edited it)

Error in Salaries %>% mutate(salary.adj = salary * (1.03)^(max(yearID) - :
could not find function "%>%<-"

lecy · 2019-10-03T19:55:53Z

@jmacost5 The pipe operator lives in the dplyr package, so make sure you have library(dplyr) somewhere.

jmacost5 · 2019-10-03T19:57:30Z

I do, in the beginning of the paper I have all the library() for the assignment

# note that you cannot include an install.packages() call in an RMD file when knitting
library( dplyr )
library( pander )
library( Lahman )

lecy · 2019-10-03T19:59:02Z

Are you getting the error during a knit, or working interactively with code chunks?

If it's interactive, you have to load the dplyr package by running that chunk at the top first, or typing the library() command into the console.

jmacost5 · 2019-10-03T20:00:56Z

Yes I am, I ran all the chunks above (i just found this function) and I tried knitting it and I still get the error, the run all above chunks should cause the programs to be loaded right? I looked and it says that they are installed too.

lecy · 2019-10-03T20:01:31Z

Send me your RMD by email please.

lecy · 2019-10-03T21:19:20Z

Ok, here are the issues:

(1) You left a pipe operator at the end of the line when creating your team budget dataset.
(2) You create a team budget dataset, but then use the original Salaries dataset in the merge step.

team.Salaries <-
  Salaries %>%
  mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
  group_by( teamID, yearID ) %>%
  summarize( n = n(), team.budget = sum(salary.adj) ) %>%
  filter( n > 24 )    # %>%     never end a recipe with a pipe! 

dat.team.salaries <- 
  # merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )   # need to use your new dataset, not original 
  merge( Teams, team.Salaries, by=c("teamID","yearID"), all=TRUE )

lecy added the lab-06 label Oct 15, 2019

lab 06 #22

lab 06 #22

Comments

sunaynagoel commented Sep 30, 2019

lecy commented Oct 1, 2019

RickyDuran commented Oct 2, 2019

lecy commented Oct 2, 2019

jmacost5 commented Oct 2, 2019

lecy commented Oct 2, 2019

jmacost5 commented Oct 2, 2019

lecy commented Oct 2, 2019

sunaynagoel commented Oct 2, 2019

lecy commented Oct 2, 2019

sunaynagoel commented Oct 2, 2019

lecy commented Oct 2, 2019 • edited Loading

jmacost5 commented Oct 2, 2019

sunaynagoel commented Oct 2, 2019 • edited Loading

sunaynagoel commented Oct 2, 2019

lecy commented Oct 2, 2019

lecy commented Oct 2, 2019

sunaynagoel commented Oct 3, 2019

etbartell commented Oct 3, 2019 • edited Loading

jmacost5 commented Oct 3, 2019

etbartell commented Oct 3, 2019

jmacost5 commented Oct 3, 2019

etbartell commented Oct 3, 2019

etbartell commented Oct 3, 2019

lecy commented Oct 3, 2019

sunaynagoel commented Oct 3, 2019

lecy commented Oct 3, 2019

etbartell commented Oct 3, 2019 • edited by lecy Loading

lecy commented Oct 3, 2019 • edited Loading

jmacost5 commented Oct 3, 2019

lecy commented Oct 3, 2019 • edited Loading

jmacost5 commented Oct 3, 2019

etbartell commented Oct 3, 2019 • edited Loading

etbartell commented Oct 3, 2019

jmacost5 commented Oct 3, 2019 • edited Loading

etbartell commented Oct 3, 2019

jmacost5 commented Oct 3, 2019 • edited Loading

lecy commented Oct 3, 2019

jmacost5 commented Oct 3, 2019

lecy commented Oct 3, 2019 • edited Loading

jmacost5 commented Oct 3, 2019

lecy commented Oct 3, 2019

jmacost5 commented Oct 3, 2019 • edited Loading

lecy commented Oct 3, 2019

lecy commented Oct 3, 2019

lecy commented Oct 2, 2019 •

edited

Loading

sunaynagoel commented Oct 2, 2019 •

edited

Loading

etbartell commented Oct 3, 2019 •

edited

Loading

etbartell commented Oct 3, 2019 •

edited by lecy

Loading

lecy commented Oct 3, 2019 •

edited

Loading

lecy commented Oct 3, 2019 •

edited

Loading

etbartell commented Oct 3, 2019 •

edited

Loading

jmacost5 commented Oct 3, 2019 •

edited

Loading

jmacost5 commented Oct 3, 2019 •

edited

Loading

lecy commented Oct 3, 2019 •

edited

Loading

jmacost5 commented Oct 3, 2019 •

edited

Loading