-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lab 06 #22
Comments
I'm adjusting the date. This semester has gotten the best of me. Lab 6 will be short and sweet, the remainder of assignments will be posted tonight. I appreciate the patience. |
@lecy Lab 6 is showing up in the schedule, but the page is not live yet. I am not sure if it is supposed to be. Thanks! |
Thanks Ricky - it was a broken link, should be working now. |
So I made this code for the first part of the lab 6, how do I find the variable for year?
|
You can find data dictionaries or peek at your data using: help( Salaries )
head( Salaries ) |
So when I replace year with yearID in the code it still cannot find year |
I think you might be confusing dataset names (Teams) with ID names in the dataset. |
I am working on the same question as well. I do have a question about merging? This is taking forever and not producing any result?
|
By.x and by.y are used when variable names differ (e.g. “team.id” and “teamID”). You should be using a compound ID instead. c(“id1”,”id2”) |
It is doing the same thing . What am I doing wrong ?
|
TRUE is an object not a string in R? merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE ) I am assuming you have already aggregated your Salaries dataset up to the team level by year? I'm not sure you want to keep all because the Teams dataset covers all seasons back to 1885, while the Salaries dataset only starts in 1985. |
yearID is what is being said the name of the variable year is, I put both year and yearID in so I am not understanding what I am doing wrong to not get the right variable for year? Am I not understanding what the question is asking?
|
@jmacost5 I am working on the same thing but I am going piece by piece. I have this. I am sure I missing something here.
|
|
@jmacost5 I would first create your new team salaries dataset, then merge it with Teams. Not sure what a couple of these steps are doing. year <-
Salaries <-
Salaries %>%
mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) )
merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE )
group_by(Teams)
summarize(Salaries)
pander() Maybe something like? team.salaries <-
Salaries %>%
mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
group_by( ... ) %>%
summarize( ... )
dat.teams.salaries <- merge( Teams, team.salaries, by=c("teamID","yearID"), all=? ) |
The submission link for Lab-06 is not working for me. |
Here is my code for creating the aggregate salary dataset:
It results in 4 columns of data without titles (should be teamID, yearID, n, & team.budget). Then when I try to merge this dataset with Teams I get this error message:
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns I can only assume this is because summary table had no column titles and therefore no variable names to match with Teams. I've updated and reloaded all of the packages in use and re-started R-studio several times. I'm not sure what else could be the problem. |
I am not super good at this but I believe that the all=F needs to be all= TRUE |
@jmacost5 I tried all 4 merging possibilities. All gave the same error message. I agree though, that one should be correct. |
this is giving me the 4 variables, not the labels, but it should work. |
@jmacost5 That's the thing. We have to merge Teams with the new dataset ("team.Salaries") rather than the original "Salaries" dataset. This will combine inflation-adjusted team budgets with Win stats so that we can calculate cost-per-win. The problem I'm stuck on is that it won't let me merge the two datasets because the labels are missing from my "team.Salaries" set. |
I would think it should look something like this but anyone please feel free to show me where I'm wrong:
|
@etbartell It's strange because when I run the code you posted here I get the correct results. head( Unified )
teamID yearID n team.budget lgID franchID divID Rank G Ghome W L DivWin
1 ANA 1997 31 54596239 AL ANA W 2 162 82 84 78 N
2 ANA 1998 34 70278139 AL ANA W 2 162 81 85 77 N
3 ANA 1999 40 91548199 AL ANA W 4 162 81 70 92 N
4 ANA 2000 30 82584880 AL ANA W 3 162 81 82 80 N
5 ANA 2001 30 74058241 AL ANA W 3 162 81 75 87 N
6 ANA 2002 28 93359559 AL ANA W 2 162 81 99 63 N
WCWin LgWin WSWin R AB H X2B X3B HR BB SO SB CS HBP SF RA ER
1 N N N 829 5628 1531 279 25 161 617 953 126 72 45 57 794 730
2 N N N 787 5630 1530 314 27 147 510 1028 93 45 48 41 783 720
3 N N N 711 5494 1404 248 22 158 511 1022 71 45 43 42 826 762
4 N N N 864 5628 1574 309 34 236 608 1024 93 52 47 43 869 805
5 N N N 691 5551 1447 275 26 158 494 1001 116 52 77 53 730 671
6 Y Y Y 851 5678 1603 333 32 152 462 805 117 51 74 64 644 595
ERA CG SHO SV IPouts HA HRA BBA SOA E DP FP name
1 4.52 9 5 39 4364 1506 202 605 1050 123 140 0.980 Anaheim Angels
2 4.49 3 5 52 4332 1481 164 630 1091 106 146 0.983 Anaheim Angels
3 4.79 4 7 37 4294 1472 177 624 877 106 156 0.983 Anaheim Angels
4 5.00 5 3 46 4344 1534 228 662 846 134 182 0.978 Anaheim Angels
5 4.20 6 1 43 4313 1452 168 525 947 103 142 0.983 Anaheim Angels
6 3.69 7 14 54 4357 1345 169 509 999 87 151 0.986 Anaheim Angels
park attendance BPF PPF teamIDBR teamIDlahman45
1 Edison International Field 1767330 102 102 ANA ANA
2 Edison International Field 2519280 102 102 ANA ANA
3 Edison International Field 2253123 99 100 ANA ANA
4 Edison International Field 2066982 102 103 ANA ANA
5 Edison International Field 2000919 101 101 ANA ANA
6 Edison International Field 2305547 100 99 ANA ANA
teamIDretro
1 ANA
2 ANA
3 ANA
4 ANA
5 ANA
6 ANA I have not encountered this issue before with the functions stripping column names. Can you print your session info while you are getting these results? sessionInfo() And send me your RMD file by email? |
Try |
@sunaynagoel Link for submission should be active now. |
@lecy Sending my Rmd now. Here is the sessionInfo: R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Lahman_7.0-1 pander_0.6.3 dplyr_0.8.3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 digest_0.6.21 crayon_1.3.4 assertthat_0.2.1 R6_2.4.0 magrittr_1.5 pillar_1.4.2 rlang_0.4.0
[9] tools_3.6.1 glue_1.3.1 purrr_0.3.2 xfun_0.9 compiler_3.6.1 pkgconfig_2.0.3 knitr_1.25 tidyselect_0.2.5
[17] tibble_2.1.3
|
@etbartell Here is the problem: # calculate cost per-win and list the 25 lowest-paying teams
team.Salaries <-
Salaries %>%
mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
group_by( teamID, yearID ) %>%
summarize( n = n(), team.budget = sum(salary.adj) ) %>%
filter( n > 24 ) %>% pander() Assignment occurs after the last operation on the right-hand side has completed. What is the last operation here? So it was assigning the formatted pander table to your team.Salaries object, not the data frame. class( team.Salaries )
[1] "NULL" Easy fix, remove the pander() function: team.Salaries <-
Salaries %>%
mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
group_by( teamID, yearID ) %>%
summarize( n = n(), team.budget = sum(salary.adj) ) %>%
filter( n > 24 ) |
For some reason when I put in my code it says that team.budget is not found.
|
@jmacost5 Check out the comment right above. You need to remove pander() during assignment. |
I removed pander, I honestly thought I did after the comment, but I am getting the same error. Error in mutate(cost.per.win = team.budget/W) :
|
@jmacost5 I separated the merge chunk from the rest of it and it finally gave me the right table:
|
@lecy Thank you, that worked! I'm just curious though, why would the pander() function get rid of the column names? I thought I had defined the team.Salaries data frame with the group_by, summary, and filter functions, and that it would just retain the names from Salaries. |
I am still getting a the same error when I remove the
Error: object 'team.budget' not found |
@jmacost5 Your code there doesn't include the merge, and also you need to remove the pander() function after the filter. |
I literally swear that I removed pander I do not understand how it keeps popping up. Can you see if the code is ok now? |
@etbartell Thanks for catching that! @jmacost5 is it working now? |
It is still giving me an error, It is the code listed above (I edited it) Error in Salaries %>% mutate(salary.adj = salary * (1.03)^(max(yearID) - : |
@jmacost5 The pipe operator lives in the dplyr package, so make sure you have |
I do, in the beginning of the paper I have all the library() for the assignment
|
Are you getting the error during a knit, or working interactively with code chunks? If it's interactive, you have to load the dplyr package by running that chunk at the top first, or typing the library() command into the console. |
Yes I am, I ran all the chunks above (i just found this function) and I tried knitting it and I still get the error, the run all above chunks should cause the programs to be loaded right? I looked and it says that they are installed too. |
Send me your RMD by email please. |
Ok, here are the issues: (1) You left a pipe operator at the end of the line when creating your team budget dataset. team.Salaries <-
Salaries %>%
mutate( salary.adj = salary*(1.03)^( max(yearID) - yearID ) ) %>%
group_by( teamID, yearID ) %>%
summarize( n = n(), team.budget = sum(salary.adj) ) %>%
filter( n > 24 ) # %>% never end a recipe with a pipe!
dat.team.salaries <-
# merge( Teams, Salaries, by=c("teamID","yearID"), all=TRUE ) # need to use your new dataset, not original
merge( Teams, team.Salaries, by=c("teamID","yearID"), all=TRUE ) |
Is the Lab for Wk 06 going to be actually due on Oct 3rd or are you going to move the due date?
The text was updated successfully, but these errors were encountered: