Fix apt repo issue for docker #3823

Merged
merged 7 commits into from
Oct 6, 2020

Conversation

ydcjeff
Contributor

@ydcjeff ydcjeff commented Oct 3, 2020

What does this PR do?

Fixes # (issue)

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team October 3, 2020 17:33
@ydcjeff
Contributor Author

ydcjeff commented Oct 3, 2020

Docker builds related to nightly will fail at the moment, since PyTorch has released the 1.7 RC version.

@@ -34,7 +34,7 @@ SHELL ["/bin/bash", "-c"]

ENV PATH="$PATH:/root/.local/bin"

-RUN apt-get update && apt-get install -y --no-install-recommends \
+RUN apt-get clean && apt-get update && apt-get install -y --no-install-recommends \
Member

@Borda Borda Oct 3, 2020

This seems strange: you need an update before install, not a clean.

Suggested change
-RUN apt-get clean && apt-get update && apt-get install -y --no-install-recommends \
+RUN apt-get update && apt-get install -y --no-install-recommends \

Contributor Author

@ydcjeff ydcjeff Oct 4, 2020

Turns out it's a CDN issue. Some suggest running clean and then update,
but let's wait and see whether a CDN update solves this.

NVIDIA/nvidia-docker#1392
NVIDIA/nvidia-docker#877
NVIDIA/nvidia-docker#1328

Member

ok, the problem is in

Reading package lists...
E: Failed to fetch https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/Packages.gz  File has unexpected size (47871 != 49498). Mirror sync in progress? [IP: 152.195.19.142 443]
   Hashes of expected file:
    - Filesize:49498 [weak]
    - SHA256:332f3ee4e353b8a5e5a2bdd8fdbd47cf140c73822b82b328815f122e09e195a0
    - SHA1:4dc8ef9a3ee3c97b3c26d46e07fdd83997e6880b [weak]
    - MD5Sum:bbff3b9c3462257479d72521ee78ec29 [weak]
   Release file created at: Wed, 23 Sep 2020 22:09:13 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.

not the update step...

Contributor Author

That comes from this, I think

Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Packages [49.5 kB]
Err:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Packages
  File has unexpected size (47871 != 49498). Mirror sync in progress? [IP: 152.195.19.142 443]
  Hashes of expected file:
   - Filesize:49498 [weak]
   - SHA256:332f3ee4e353b8a5e5a2bdd8fdbd47cf140c73822b82b328815f122e09e195a0
   - SHA1:4dc8ef9a3ee3c97b3c26d46e07fdd83997e6880b [weak]
   - MD5Sum:bbff3b9c3462257479d72521ee78ec29 [weak]
  Release file created at: Wed, 23 Sep 2020 22:09:13 +0000
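
To confirm from CI logs that a failed build hit this specific repository problem, the size mismatch can be pulled out of the apt output. This is a minimal sketch, assuming the message format shown in the log above; `size_mismatch` is a hypothetical helper name and is not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): read apt output on stdin and
# print "RECEIVED EXPECTED" for every "File has unexpected size" message.
# Assumes the message format seen in the log above.
size_mismatch() {
    sed -n 's/.*File has unexpected size (\([0-9][0-9]*\) != \([0-9][0-9]*\)).*/\1 \2/p'
}

# Example use (not executed here):
#   apt-get update 2>&1 | tee /tmp/apt-update.log
#   size_mismatch < /tmp/apt-update.log
```

Fed the `E: Failed to fetch` line from the log above, it prints `47871 49498`: the index that was received is 1627 bytes shorter than the size recorded for it, which matches apt's "Mirror sync in progress?" hint.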

Member

So skipping the update just hides the NVIDIA package issue; it does not solve it. :[
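
If the failure is a transient mirror sync, as the apt message suggests it might be, one option that keeps `apt-get update` in place is to retry it a bounded number of times. The sketch below is an assumption, not what this PR merged: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary values. It cannot help if the repository metadata stays inconsistent for longer than the retry window.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR).
# retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0 or MAX_ATTEMPTS attempts have been made.
retry() {
    max_attempts=$1
    delay_seconds=$2
    shift 2
    attempt=1
    while :; do
        # Success on any attempt ends the loop immediately.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, sleeping ${delay_seconds}s" >&2
        sleep "$delay_seconds"
        attempt=$((attempt + 1))
    done
}

# Example use inside a build step (not executed here):
#   retry 5 60 apt-get update
```

Unlike dropping the update step, this still fails the build when the index never becomes consistent, so the third-party problem stays visible.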

@mergify mergify bot requested a review from a team October 3, 2020 18:28
@@ -28,7 +28,7 @@ ENV CONDA_ENV=lightning
# show system inforation
RUN lsb_release -a && cat /etc/*-release

-RUN apt-get update && \
+RUN apt-get clean && apt-get update && \
Member

Suggested change
-RUN apt-get clean && apt-get update && \
+RUN apt-get update && \

@mergify mergify bot requested a review from a team October 4, 2020 08:07
@mergify mergify bot requested a review from a team October 4, 2020 08:08
@Borda Borda added 3rd party Related to a 3rd-party ci Continuous Integration labels Oct 4, 2020
@codecov

codecov bot commented Oct 4, 2020

Codecov Report

Merging #3823 into master will decrease coverage by 4%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #3823    +/-   ##
=======================================
- Coverage      84%     80%    -4%     
=======================================
  Files         111     111            
  Lines        8793    9379   +586     
=======================================
+ Hits         7362    7515   +153     
- Misses       1431    1864   +433     

@Borda Borda added the priority: 0 High priority task label Oct 4, 2020
@mergify mergify bot requested a review from a team October 4, 2020 08:57
@ydcjeff
Contributor Author

ydcjeff commented Oct 4, 2020

Is it okay to use CUDA 11?

@mergify mergify bot requested a review from a team October 5, 2020 07:50
@ydcjeff
Contributor Author

ydcjeff commented Oct 5, 2020

Downgrading to Ubuntu 16.04 works fine.

Member

@Borda Borda left a comment

Please, no 5-year-old OS.

@mergify mergify bot requested a review from a team October 5, 2020 14:11
@williamFalcon williamFalcon merged commit 90929fa into Lightning-AI:master Oct 6, 2020
@ydcjeff ydcjeff deleted the docker/fix-nvidia branch October 6, 2020 04:08
# FROM nvidia/cuda:${CUDA_VERSION}-devel
# FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04
# FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu18.04
FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu16.04
Member

This is bad: this OS version is too old and is almost unsupported (support ends in April 2021).

@Borda Borda removed the priority: 0 High priority task label Oct 6, 2020
@mergify mergify bot requested a review from a team October 6, 2020 08:50
@Borda Borda mentioned this pull request Oct 6, 2020
@Borda Borda added this to the 0.10.0 milestone Oct 7, 2020
Labels
3rd party Related to a 3rd-party ci Continuous Integration
5 participants