Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] add CMake + R 3.6 test back (fixes #3469) #4053

Merged
merged 6 commits into from
Mar 10, 2021
Merged

Conversation

jameslamb
Copy link
Collaborator

I think this PR fixes #3469 😬

My theory

Using https://github.com/nelsonjchen/reverse-rdp-windows-github-actions, (mentioned in #3469 (comment)), I RDP'd into a GitHub Actions runner running the windows-latest image and try some experiments. I discovered something confusing and scary that I think could be the root cause of CI instability mentioned in #3469.

I manually copied the content of test_r_package_windows.ps1 into a Powershell shell running in that runner, a few lines at a time.

Immediately after installing Rtools, the directory C:\Rtools was visible in Windows Explorer but C:\Rtools\mingw_64 didn't exist yet! Even though, in powershell, the command had exited successfully and returned control to the shell.

image

After around 20 seconds, the mingw_64 directory finally popped up.

image

And when I say "it popped up", I don't think that's just the effect of buffering or a delay in showing the files. I hit "refresh" in Windows Explorer several times and did not see mingw_64 show up until after 20 seconds or so had passed. So I think the install returned control to powershell before all the files were actually written, somehow 😱 .

This type of problem could explain what we've seen in #3469, where the R 3.6 + CMake job fails but not always. Recall that the failures look like this:

  The C compiler

    "C:/Rtools/mingw_64/bin/gcc.exe"

  is not able to compile a simple test program.

...

    C:/Rtools/mingw_64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.3/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lshell32

It makes sense to me that that could happen if all of the contents of C:\Rtools\mingw_64 (which include a lot of things used by ld.exe) aren't available yet when CMake tries to compile LightGBM!

Proposed Changes

I know fixing a problem by just introducing a sleep isn't ideal, but I think it would be difficult to figure out the full list of files needed by ld.exe when compiling LightGBM. I found that with this fix, I got two consecutive runs of the R 3.6 + CMake + Windows job (5 builds on each run) to complete successfully on my fork.

image

@jameslamb jameslamb requested a review from StrikerRUS March 8, 2021 06:16
@jameslamb jameslamb requested a review from Laurae2 as a code owner March 8, 2021 06:16
Comment on lines +42 to +47
Remove-From-Path ".*chocolatey.*"
Remove-From-Path ".*Chocolatey.*"
Remove-From-Path ".*Git.*mingw64.*"
Remove-From-Path ".*msys64.*"
Remove-From-Path ".*rtools40.*"
Remove-From-Path ".*Strawberry.*"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these paths all had some msys2 or mingw components in them.

I dumped the full PATH below so you can see what's there, including why I needed both chocolatey and Chocolatey 😂

$env:PATH.split(";")
full contents of PATH before removals
C:\Program Files\PowerShell\7
C:\Users\runneradmin\.dotnet\tools
C:\Program Files\MongoDB\Server\4.4\bin
C:\aliyun-cli
C:\vcpkg
C:\cf-cli
C:\Program Files (x86)\NSIS\
C:\Program Files\Mercurial\
C:\hostedtoolcache\windows\stack\2.5.1\x64
C:\ProgramData\chocolatey\lib\ghc.8.10.3\tools\ghc-8.10.3\bin
C:\Program Files\dotnet
C:\mysql-5.7.21-winx64\bin
C:\Program Files\R\R-4.0.4\bin\x64
C:\SeleniumWebDrivers\GeckoDriver
C:\Program Files (x86)\sbt\bin
C:\Rust\.cargo\bin
C:\Program Files (x86)\GitHub CLI
C:\Program Files\Git\bin
C:\Program Files (x86)\pipx_bin
C:\hostedtoolcache\windows\go\1.15.8\x64\bin
C:\hostedtoolcache\windows\Python\3.7.9\x64\Scripts
C:\hostedtoolcache\windows\Python\3.7.9\x64
C:\hostedtoolcache\windows\Ruby\2.5.8\x64\bin
C:\Program Files\Java\jdk8u282-b08\bin
C:\npm\prefix
C:\Program Files\Microsoft SDKs\Azure\Azure Dev Spaces CLI
C:\Program Files\Microsoft SDKs\Azure\Azure Dev Spaces CLI\
C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin
C:\ProgramData\kind
C:\windows\system32
C:\windows
C:\windows\System32\Wbem
C:\windows\System32\WindowsPowerShell\v1.0\
C:\windows\System32\OpenSSH\
C:\ProgramData\Chocolatey\bin
C:\Program Files\Microsoft\Web Platform Installer\
C:\Program Files\Docker
C:\Program Files\PowerShell\7\
C:\Program Files\dotnet\
C:\Program Files\Microsoft SQL Server\130\Tools\Binn\
C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\
C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\
C:\Program Files (x86)\Microsoft SQL Server\110\DTS\Binn\
C:\Program Files (x86)\Microsoft SQL Server\120\DTS\Binn\
C:\Program Files (x86)\Microsoft SQL Server\130\DTS\Binn\
C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\
C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn\
C:\Program Files\nodejs\
C:\ProgramData\chocolatey\lib\pulumi\tools\Pulumi\bin
C:\ProgramData\chocolatey\lib\maven\apache-maven-3.6.3\bin
C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code
C:\Program Files\Microsoft SDKs\Service Fabric\Tools\ServiceFabricLocalClusterManager
C:\Program Files\OpenSSL\bin
C:\Strawberry\c\bin
C:\Strawberry\perl\site\bin
C:\Strawberry\perl\bin
C:\Program Files\Git\cmd
C:\Program Files\Git\mingw64\bin
C:\Program Files\Git\usr\bin
c:\tools\php
C:\Program Files (x86)\sbt\bin
C:\Program Files\TortoiseSVN\bin
C:\SeleniumWebDrivers\ChromeDriver\
C:\SeleniumWebDrivers\EdgeDriver\
C:\Program Files\CMake\bin
C:\Program Files\Amazon\AWSCLIV2\
C:\Program Files\Amazon\SessionManagerPlugin\bin\
C:\Program Files\Amazon\AWSSAMCLI\bin\
C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin
C:\Program Files (x86)\Microsoft BizTalk Server\

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look this up, but Strawberry is a perl environment for Windows: https://strawberryperl.com/. It's Program Files contain some mingw tools used to compile C programs

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea and implementation of PATH cleaning! Thank you very much! Please take a look at some my comments below.

.ci/test_r_package_windows.ps1 Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
.ci/test_r_package_windows.ps1 Outdated Show resolved Hide resolved
@@ -95,6 +113,10 @@ Write-Output "Installing Rtools"
./Rtools.exe /VERYSILENT /SUPPRESSMSGBOXES /DIR=$RTOOLS_INSTALL_PATH ; Check-Output $?
Write-Output "Done installing Rtools"

# wait for all Rtools files to be written
Write-Output "Sleeping to allow Rtools install to finish"
Start-Sleep -Seconds 60
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use Start-Process -FilePath Rtools.exe -NoNewWindow -Wait -ArgumentList ... above to not wait manually?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Seems I missed that:

I hit "refresh" in Windows Explorer several times and did not see mingw_64 show up until after 20 seconds or so had passed. So I think the install returned control to powershell before all the files were actually written, somehow 😱 .

Indeed, very interesting observation! 👍

Given that, please ignore my comment above about -Wait argument (it will not help). Instead, I believe we should do something like

while True:
    sleep(1)
    if exists("$RTOOLS_INSTALL_PATH/mingw_64"):
        break

instead of hardcoded sleep(60).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱 I didn't know about -Wait! I don't write Powershell code anywhere else except this one script in this project, so I'm not very knowledgeable about it. Yes that seems a lot better than a sleep! Let me try that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you think that writing all files into mingw_64 takes some significant time and this is not just about delay in mingw_64 folder creation?

Copy link
Collaborator Author

@jameslamb jameslamb Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha we posted our responses at about the same time, I didn't see your second comment.

I don't think this while loop approach is exactly correct a reliable fix. The directory mingw_64 could exist without all of the other files and folders in it being populated yet. This is what I meant by this explanation in the PR description

know fixing a problem by just introducing a sleep isn't ideal, but I think it would be difficult to figure out the full list of files needed by ld.exe when compiling LightGBM

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see... Could then please try -Wait workaround in RDP before merging this? I'm not an expert in PowerShell too, so maybe some magic will happen 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure! I won't be able to test it until later tonight though, maybe 6-8 hours from now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice, I forgot about r-windows/r-tools. I vaguely remember reading a blog post about it once. If that's how Jeroen does it, it's definitely a good sign.

Happy to say that switching to a -Wait did work! I could see that the Rtools install command didn't return control to powershell until both ming_32 and mingw_64 directories were created.

image

image

Remove-From-Path ".*Chocolatey.*"
Remove-From-Path ".*Git.*mingw64.*"
Remove-From-Path ".*msys64.*"
Remove-From-Path ".*rtools40.*"
Copy link
Collaborator

@StrikerRUS StrikerRUS Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about to remove this particular directory physically?
Looks like there could be conflicts during our own installation (but for R 4.x job)

$RTOOLS_INSTALL_PATH = "C:\rtools40"

Copy link
Collaborator Author

@jameslamb jameslamb Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had something like Remove-Item "C:\rtools40" -Recurse", but it was failing with an error saying something like "insufficient permission to remove this, must be an administrator or this may be a system file" (don't remember the exact error message).

Although, I was just looking at your work in RGF...maybe we could borrow your solution from there to get past that error? 😀

https://github.com/RGF-team/rgf/blob/ab328a01d3af809c0c1fd4122d8b70b9fb2e9757/.ci/r_tests_windows.ps1#L20

Remove-Item C:\rtools40 -Force -Recurse -ErrorAction Ignore

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so! This command takes some time (several seconds) but I think it won't be excess and can protect us from some other problems.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I don't know why, but I had to manually install qpdf for Windows R jobs (I chose pacman package manager because it's bundled with Rtools and one-line solution) to pass --as-cran checks for vignettes.
Maybe this info will help you to save some time on other projects or even here in LightGBM.
https://github.com/RGF-team/rgf/blob/ab328a01d3af809c0c1fd4122d8b70b9fb2e9757/.ci/r_tests_windows.ps1#L42

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! Maybe we'll have to worry about qpdf on #3946 , actually, so that is timely advicce

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! Hope this PR will really fix that issue this time!

@jameslamb
Copy link
Collaborator Author

As soon as #4055 builds successfully, I'll merge that, then merge master into this branch, then merge this.

@jameslamb jameslamb merged commit 13680d8 into master Mar 10, 2021
@jameslamb jameslamb deleted the fix/r3.6-ci branch March 10, 2021 00:54
StrikerRUS added a commit that referenced this pull request Mar 25, 2021
* [docs]Add alt text on images

* Update docs/GPU-Windows.rst

Co-authored-by: James Lamb <[email protected]>

* Update docs/GPU-Windows.rst

Co-authored-by: James Lamb <[email protected]>

* Apply suggestions from code review

Co-authored-by: James Lamb <[email protected]>

* Apply suggestions from code review

Co-authored-by: James Lamb <[email protected]>

* Merge main branch commit updates (#1)

* [docs] Add alt text to image in Parameters-Tuning.rst (#4035)

* [docs] Add alt text to image in Parameters-Tuning.rst

Add alt text to Leaf-wise growth image, as part of #4028

* Update docs/Parameters-Tuning.rst

Co-authored-by: James Lamb <[email protected]>

Co-authored-by: James Lamb <[email protected]>

* [ci] [R-package] upgrade to R 4.0.4 in CI (#4042)

* [docs] update description of deterministic parameter (#4027)

* update description of deterministic parameter to require using with force_row_wise or force_col_wise

* Update include/LightGBM/config.h

Co-authored-by: Nikita Titov <[email protected]>

* update docs

Co-authored-by: Nikita Titov <[email protected]>

* [dask] Include support for init_score (#3950)

* include support for init_score

* use dataframe from init_score and test difference with and without init_score in local model

* revert refactoring

* initial docs. test between distributed models with and without init_score

* remove ranker from tests

* test value for root node and change docs

* comma

* re-include parametrize

* fix incorrect merge

* use single init_score and the booster_ attribute

* use np.float64 instead of float

* [ci] ignore untitle Jupyter notebooks in .gitignore (#4047)

* [ci] prevent getting incompatible dask and distributed versions (#4054)

* [ci] prevent getting incompatible dask and distributed versions

* Update .ci/test.sh

Co-authored-by: Nikita Titov <[email protected]>

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [ci] fix R CMD CHECK note about example timings (fixes #4049) (#4055)

* [ci] fix R CMD CHECK note about example timings (fixes #4049)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [ci] add CMake + R 3.6 test back (fixes #3469) (#4053)

* [ci] add CMake + R 3.6 test back (fixes #3469)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* Update .ci/test_r_package_windows.ps1

* -Wait and remove rtools40

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [dask] include multiclass-classification task in tests (#4048)

* include multiclass-classification task and task_to_model_factory dicts

* define centers coordinates. flatten init_scores within each partition for multiclass-classification

* include issue comment and fix linting error

* Update index.rst (#4029)

Add alt text to logo image

Co-authored-by: James Lamb <[email protected]>

* [dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059)

* [dask] raise more informative error for duplicates in 'machines'

* uncomment

* avoid test failure

* Revert "avoid test failure"

This reverts commit 9442bdf.

* [dask] add tutorial documentation (fixes #3814, fixes #3838) (#4030)

* [dask] add tutorial documentation (fixes #3814, fixes #3838)

* add notes on saving the model

* quick start examples

* add examples

* fix timeouts in examples

* remove notebook

* fill out prediction section

* table of contents

* add line back

* linting

* isort

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* move examples under python-guide

* remove unused pickle import

Co-authored-by: Nikita Titov <[email protected]>

* set 'pending' commit status for R Solaris optional workflow (#4061)

* [docs] add Yu Shi to repo maintainers (#4060)

* Update FAQ.rst

* Update CODEOWNERS

* set is_linear_ to false when it is absent from the model file (fix #3778) (#4056)

* Add CMake option to enable sanitizers and build gtest (#3555)

* Add CMake option to enable sanitizer

* Set up gtest

* Address reviewer's feedback

* Address reviewer's feedback

* Update CMakeLists.txt

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: Nikita Titov <[email protected]>

* added type hint (#4070)

* [ci] run Dask examples on CI (#4064)

* Update Parallel-Learning-Guide.rst

* Update test.sh

* fix path

* address review comments

* [python-package] add type hints on Booster.set_network() (#4068)

* [python-package] add type hints on Booster.set_network()

* change behavior

* [python-package] Some mypy fixes (#3916)

* Some mypy fixes

* address James' comments

* Re-introduce pass in empty classes

* Update compat.py

Remove extra lines

* [dask] [ci] fix flaky network-setup test (#4071)

* [tests][dask] simplify code in Dask tests (#4075)

* simplify Dask tests code

* enable CI

* disable CI

* Revert "[ci] prevent getting incompatible dask and distributed versions (#4054)" (#4076)

This reverts commit 4e9c976.

* Fix parsing of non-finite values (#3942)

* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as of array of pointers to rows (as opposed to existing method LGBM_BoosterPredictForMat which requires data given as contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

* remove duplicate functions from merge

* Fix parsing of non-finite values.  Current implementation silently returns zero when input string is "inf", "-inf", or "nan" when compiled with VS2017, so instead just explicitly check for these values and fail if there is no match.  No attempt to optimise string allocations in this implementation since it is usually rarely invoked.

* Dummy commit to trigger CI

* Also handle -nan in double parsing method

* Update include/LightGBM/utils/common.h

Remove trailing whitespace to pass linting tests

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>

* [dask] remove unused imports from typing (#4079)

* Range check for DCG position discount lookup (#4069)

* Add check to prevent out of index lookup in the position discount table. Add debug logging to report number of queries found in the data.

* Change debug logging location so that we can print the data file name as well.

* Revert "Change debug logging location so that we can print the data file name as well."

This reverts commit 3981b34.

* Add data file name to debug logging.

* Move log line to a place where it is output even when query IDs are read from a separate file.

* Also add the out-of-range check to rank metrics.

* Perform check after number of queries is initialized.

* Update

* [ci] upgrade R CI scripts to work on Ubuntu 20.04 (#4084)

* [ci] install additional LaTeX packages in R CI jobs

* update autoconf version

* bump upper limit on package size to 100

* [SWIG] Add streaming data support + cpp tests (#3997)

* [feature] Add ChunkedArray to SWIG

* Add ChunkedArray
* Add ChunkedArray_API_extensions.i
* Add SWIG class wrappers

* Address some review comments

* Fix linting issues

* Move test to tests/test_ChunkedArray_manually.cpp

* Add test note

* Move ChunkedArray to include/LightGBM/utils/

* Declare more explicit types of ChunkedArray in the SWIG API.

* Port ChunkedArray tests to googletest

* Please C++ linter

* Address StrikerRUS' review comments

* Update SWIG doc & disable ChunkedArray<int64_t>

* Use CHECK_EQ instead of assert

* Change include order (linting)

* Rename ChunkedArray -> chunked_array files

* Change header guards

* Address last comments from StrikerRUS

* store all CMake files in one place (#4087)

* v3.2.0 release (#3872)

* Update VERSION.txt

* update appveyor.yml and configure

* fix Appveyor builds

Co-authored-by: James Lamb <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>

* [ci] Bump version for development (#4094)

* Update .appveyor.yml

* Update cran-comments.md

* Update VERSION.txt

* update configure

Co-authored-by: James Lamb <[email protected]>

* [ci] fix flaky Azure Pipelines jobs (#4095)

* Update test.sh

* Update setup.sh

* Update .vsts-ci.yml

* Update test.sh

* Update setup.sh

* Update .vsts-ci.yml

* Update setup.sh

* Update setup.sh

Co-authored-by: Subham Agrawal <[email protected]>
Co-authored-by: James Lamb <[email protected]>
Co-authored-by: shiyu1994 <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: jmoralez <[email protected]>
Co-authored-by: marcelonieva7 <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Co-authored-by: Deddy Jobson <[email protected]>
Co-authored-by: Alberto Ferreira <[email protected]>
Co-authored-by: mjmckp <[email protected]>
Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: ashok-ponnuswami-msft <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: James Lamb <[email protected]>
Co-authored-by: Subham Agrawal <[email protected]>
Co-authored-by: shiyu1994 <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: jmoralez <[email protected]>
Co-authored-by: marcelonieva7 <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Co-authored-by: Deddy Jobson <[email protected]>
Co-authored-by: Alberto Ferreira <[email protected]>
Co-authored-by: mjmckp <[email protected]>
Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: ashok-ponnuswami-msft <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R-package] [ci] mingw gcc often fails on CMake builds
2 participants