[ci] changed CRAN mirror to fix failing installs #2954

Merged
merged 6 commits into from
Mar 28, 2020

jameslamb
Collaborator

@jameslamb jameslamb commented Mar 27, 2020

This fixes the issue we are seeing in Mac builds across all PRs right now.

@StrikerRUS I know you tried a similar change in #2949 (comment) and said it didn't work. I don't see anything obviously wrong with your approach, but I do know the change I'm suggesting in this PR works on my Mac.

Changes:

  1. Change from RStudio CRAN mirror to https://cloud.r-project.org/
  2. Make that explicit, instead of relying on .Rprofile

Since we only have one install.packages() call, I don't think we need to deal with the complexity of when and if the .Rprofile file gets sourced.
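A minimal sketch of what "make that explicit" could look like, written in Python only so it can be checked without an R installation. It builds an `Rscript` command that passes the mirror through the `repos` argument of `install.packages()` instead of relying on `.Rprofile`. The helper name and the package names are hypothetical and are not taken from this PR's diff.

```python
import shlex

# Mirror named in this PR's description.
CRAN_MIRROR = "https://cloud.r-project.org/"


def build_install_command(packages, repos=CRAN_MIRROR):
    """Return an argv list that installs `packages` from an explicit mirror.

    Hypothetical helper: the real CI script may be laid out differently.
    """
    pkg_vector = ", ".join(f"'{pkg}'" for pkg in packages)
    r_expr = f"install.packages(c({pkg_vector}), repos = '{repos}')"
    return ["Rscript", "-e", r_expr]


# Package names below are placeholders for illustration only.
cmd = build_install_command(["data.table", "jsonlite"])
print(shlex.join(cmd))
```

Passing `repos` at the call site means the result does not depend on whether `.Rprofile` was sourced in the CI environment.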

on my Mac:

Screen Shot 2020-03-27 at 10 23 28 AM

My best guess about what happened is:

  1. R binaries for macOS became temporarily unavailable on CRAN
  2. this broke whatever process mirrors them from CRAN's main repository to https://cran.rstudio.com
  3. CRAN repository was still broken when @StrikerRUS tried it (see that comment above)
  4. CRAN is now fixed so it succeeded for me
  5. RStudio still hasn't caught up (maybe CRAN fixed the issue in a way that breaks RStudio's mirroring or something)

Even if that is true and a few hours from now the RStudio repo will be back up, I still think this PR should be accepted. It removes unnecessary indirection (with .Rprofile) and points our builds at the real source of truth for R packages.

@jameslamb
Collaborator Author

ugh same error on Travis as #2949 (comment). When it worked on my Mac I was so ready to believe it was transient 😫

image

But the Azure DevOps build succeeded 0_o

I am running the same version of R on my Mac and you can see from my screenshot that it clearly worked there. If you click around in a browser at https://cran.rstudio.com/bin/macosx/el-capitan/contrib/ you can see that all the folders are now empty, but all the packages we need are there at https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/.

I am confused
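The browser check described above (one mirror's folders empty, the other populated) could be scripted roughly as follows. This is a sketch that assumes the mirror serves a plain HTML directory index with one link per file, and that macOS binaries use the `.tgz` extension. The function names are hypothetical, and the sample HTML is made up for illustration rather than copied from either mirror.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a directory index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def binary_packages(index_html):
    """Return the sorted .tgz links found in an HTML directory listing."""
    parser = _LinkCollector()
    parser.feed(index_html)
    return sorted(h for h in parser.hrefs if h.endswith(".tgz"))


def fetch_index(url, timeout=30):
    """Download a directory index page (network access required)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Offline illustration with made-up listings.
empty_listing = '<a href="../">Parent Directory</a>'
full_listing = '<a href="../">Parent Directory</a><a href="R6_2.4.1.tgz">R6</a>'
print(binary_packages(empty_listing))
print(binary_packages(full_listing))
```

Calling `binary_packages(fetch_index(url))` for each of the two contrib URLs above would show whether a mirror currently lists any binaries at all.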

@jameslamb
Collaborator Author

jameslamb commented Mar 27, 2020

This PR seems to have fixed the installation errors!

But now I see this on the lint task:

image removed (pasted the wrong one in original comment)

image

That seems like an unrelated problem since we use conda setup for lintr. So maybe we have two problems :/

@StrikerRUS
Collaborator

@jameslamb
I just reverted that temporary fix for libiconv in lint job from my PR and everything was OK.
22d4076

@jameslamb
Collaborator Author

@jameslamb
I just reverted that temporary fix for libiconv in lint job from my PR and everything was OK.
22d4076

oh great! it seems like this PR is passing now. Did you restart the lint task? I couldn't do that (I forgot the trick to be able to rebuild on Travis without closing and opening the PR)

@StrikerRUS
Collaborator

Did you restart the lint task?

Yep :-)
I'm afraid that this libiconv error is sporadic.

Let me rerun it several times to confirm we are not facing it anymore.

.ci/test_r_package.sh (outdated review thread, resolved)
@StrikerRUS
Collaborator

Let me rerun it several times to confirm we are not facing it anymore.

https://travis-ci.org/github/microsoft/LightGBM/jobs/667785188?utm_medium=notification&utm_source=github_status

I re-ran it 3 times and the third run errored!

@StrikerRUS
Collaborator

Exactly the same versions of installed packages in green and red builds...

@jameslamb
Collaborator Author

Let me rerun it several times to confirm we are not facing it anymore.

https://travis-ci.org/github/microsoft/LightGBM/jobs/667785188?utm_medium=notification&utm_source=github_status

I re-ran it 3 times and the third run errored!

ah!!! I'm afraid 😬

I traced a chain of comments on errors from conda/conda#8838 to conda-forge/libarchive-feedstock#35 (comment) where I see

...can cause problems if the package is installed with a version of libxml2 that does not require libiconv (such as the packages in defaults) 

I can see in this failed build we ended up with libxml2 from a default channel

libxml2            pkgs/main/linux-64::libxml2-2.9.9-hea5a465_1

In this successful build we got the same thing:

libxml2            pkgs/main/linux-64::libxml2-2.9.9-hea5a465_1

WHAT IS HAPPENING haha
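To compare what two builds resolved without reading the logs by eye, a plan line like the ones quoted above can be split into its parts. This is a rough sketch that assumes the `name  channel/subdir::name-version-build` layout seen in those two lines. `PlanEntry` and `parse_plan_line` are hypothetical names, not part of conda.

```python
from typing import NamedTuple


class PlanEntry(NamedTuple):
    name: str
    channel: str
    subdir: str
    version: str
    build: str


def parse_plan_line(line):
    """Parse one package line copied from conda's install plan output.

    Assumes the layout 'name  channel/subdir::name-version-build'.
    """
    name, spec = line.split()
    channel_path, dist = spec.split("::")
    channel, subdir = channel_path.rsplit("/", 1)
    # The package name itself may contain dashes, so split from the right.
    _, version, build = dist.rsplit("-", 2)
    return PlanEntry(name, channel, subdir, version, build)


failed = parse_plan_line("libxml2            pkgs/main/linux-64::libxml2-2.9.9-hea5a465_1")
passed = parse_plan_line("libxml2            pkgs/main/linux-64::libxml2-2.9.9-hea5a465_1")
print(failed)
print("same libxml2 in both builds:", failed == passed)
```

Since both builds resolved the same entry from `pkgs/main`, the channel of `libxml2` alone does not separate the failing run from the passing one.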

@jameslamb
Collaborator Author

@StrikerRUS if you put this back (22d4076) I wonder if it would work? Even though it's sometimes redundant, it would guarantee we have libiconv, right?

@StrikerRUS
Collaborator

if you put this back (22d4076)

I'll create a new branch and try several rebuilds there to not pollute other PRs. I'll get back to you ASAP (~3-5 re-runs of lint job).

@StrikerRUS
Collaborator

StrikerRUS commented Mar 27, 2020

I'll create a new branch and try several rebuilds there to not pollute other PRs. I'll get back to you ASAP (~3-5 re-runs of lint job).

Unfortunately, no difference with explicitly installed libiconv
https://travis-ci.org/github/microsoft/LightGBM/jobs/667822501

@jameslamb
Collaborator Author

jameslamb commented Mar 27, 2020

I'll create a new branch and try several rebuilds there to not pollute other PRs. I'll get back to you ASAP (~3-5 re-runs of lint job).

Unfortunately, no difference with explicitly installed libiconv
https://travis-ci.org/github/microsoft/LightGBM/jobs/667822501

I don't see libiconv in the logs of that job, other than in the error message. Are you sure conda is attempting it?

Also how are you rebuilding the tasks? Do you just have more permissions on our Travis than I do? I'd be happy to investigate this if I could rebuild tasks

^ nevermind, I can do this on my fork

@StrikerRUS
Collaborator

StrikerRUS commented Mar 27, 2020

^ nevermind, I can do this on my fork

Just re-login: #2827 (comment).

I don't see libiconv in the logs of that job, other than in the error message. Are you sure conda is attempting it?

Hmm, there are so many issues with CI these days, I'm not sure of anything 🙁 For example, right now Travis status is yellow while all jobs have finished already.

I'll try to do it again.

@jameslamb
Collaborator Author

I added an explicit install of libxml2 from conda-forge (following conda-forge/libarchive-feedstock#35 (comment)) and so far got 3 consecutive successful runs of the lint task:

I'm going to try to get to 10 consecutive builds, and if they work I'll add that to this PR

  1. https://travis-ci.org/github/jameslamb/LightGBM/builds/667852382
  2. https://travis-ci.org/github/jameslamb/LightGBM/builds/667854550

At this point Travis stopped loading for me. I'm worried that the service is down :/. It's had some struggles the past week

https://www.traviscistatus.com/

image

That would explain why we're seeing this:

image

They had an incident related to it yesterday

image

Collaborator

@StrikerRUS StrikerRUS left a comment

@jameslamb OK, let's take a break and get back to CI errors tomorrow.
I'm approving this PR. Feel free to add explicit conda installation to the lint job if you think it can help and merge the PR (if Travis will let you do that 😄 ).

@jameslamb
Collaborator Author

@jameslamb OK, let's take a break and get back to CI errors tomorrow.
I'm approving this PR. Feel free to add explicit conda installation to the lint job if you think it can help and merge the PR (if Travis will let you do that 😄 ).

🤝 deal, I'll push that change and then let's come back to it tomorrow. Thanks for the help!

@jameslamb
Collaborator Author

Travis definitely did have an outage today! (link)

This went up a bit after we stopped working on this:

image

I'll rebuild two more times on my personal fork to make sure this change works, then merge.

@jameslamb
Collaborator Author

ok now that the Travis outage seems to be behind us, did 3 more consecutive builds, all passed the lint task:

  1. https://travis-ci.org/github/jameslamb/LightGBM/builds/667959243
  2. https://travis-ci.org/github/jameslamb/LightGBM/builds/667959839
  3. https://travis-ci.org/github/jameslamb/LightGBM/builds/667960317

I feel ok with merging this.
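For a sporadic failure like the libiconv one, it can help to estimate how much a streak of green builds actually shows. The sketch below assumes runs are independent and share a fixed failure probability, which is a simplification. The 1/3 rate is only the single observation from earlier in this thread (one failure in three re-runs), not a measured value.

```python
import math


def prob_all_pass(failure_rate, runs):
    """Probability that `runs` independent builds all pass by luck alone."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, max_luck):
    """Smallest streak length whose chance of passing by luck is <= max_luck."""
    if not 0.0 < failure_rate < 1.0 or not 0.0 < max_luck < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(max_luck) / math.log(1.0 - failure_rate))


assumed_rate = 1 / 3  # assumption taken from a single 3-run sample
for n in (3, 6, 10):
    print(n, round(prob_all_pass(assumed_rate, n), 4))
print("runs for <= 5% luck:", runs_needed(assumed_rate, 0.05))
```

Under that assumed rate, three green runs would still happen by chance roughly 30% of the time, which is why a longer streak (such as the 10 builds mentioned earlier) gives noticeably stronger evidence.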

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023