Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework docker development environment #2859

Merged

Conversation

lunkwill42
Copy link
Member

@lunkwill42 lunkwill42 commented Mar 4, 2024

This PR aims to do several things to the docker-compose based development environment:

  • Removed redundant build commands that make things take a long time
  • Switch to installing everything as an unprivileged user into a virtualenv inside the container. Running fewer things as root is better, and we can take advantage of a single pip cache for reduced install times. Also, at some point we will need to move to Debian Bookworm, which will deny pip from installing system-level packages, so a virtualenv will be necessary anyway.
  • Maps the unprivileged nav user to a UID/GID pair at build time, so we don't need to have strange entrypoint magic to dynamically switch the nav users uid/gid every time the container starts.
  • Adds cache mounts to Dockerfile in an attempt to speed up image rebuild times.

Docs are updated. Essential difference for developers is that a UID/GID value now needs to be passed when the container images are built, but this can be automated and reduced to a call to make docker.

Copy link

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 56.69%. Comparing base (9cfd877) to head (59185f3).
Report is 7 commits behind head on master.

Files Patch % Lines
python/nav/startstop.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2859   +/-   ##
=======================================
  Coverage   56.69%   56.69%           
=======================================
  Files         602      602           
  Lines       43971    43971           
=======================================
  Hits        24931    24931           
  Misses      19040    19040           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Mar 4, 2024

Test results

     12 files       12 suites   11m 53s ⏱️
3 320 tests 3 320 ✔️ 0 💤 0
9 435 runs  9 435 ✔️ 0 💤 0

Results for commit 59185f3.

♻️ This comment has been updated with latest results.

@lunkwill42 lunkwill42 force-pushed the feature/docker-python-improvements branch from 2d14d19 to ff3e26b Compare March 4, 2024 14:58
@lunkwill42 lunkwill42 requested a review from hmpf March 4, 2024 15:01
@lunkwill42 lunkwill42 force-pushed the feature/docker-python-improvements branch from ff3e26b to c4a0510 Compare March 4, 2024 15:12
Copy link

sonarcloud bot commented Mar 4, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@lunkwill42 lunkwill42 self-assigned this Mar 4, 2024
@lunkwill42 lunkwill42 marked this pull request as ready for review March 4, 2024 16:18
Copy link
Contributor

@podliashanyk podliashanyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review, see comments and suggestion below

Dockerfile Show resolved Hide resolved
doc/hacking/using-docker.rst Outdated Show resolved Hide resolved
podliashanyk

This comment was marked as resolved.

Copy link
Contributor

@podliashanyk podliashanyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in Netmap.
When I go to Netmap tool on web, an alert pops up. There is a 500 Internal Server Error on GET .../netmap/grapgh/layer2/....
Let me know if you can reproduce it. If not, I will provide a more thorough bug report.
Screenshot 2024-03-06 at 09 49 32

@lunkwill42
Copy link
Member Author

Let me know if you can reproduce it. If not, I will provide a more thorough bug report.

Not reproducible for me.

Copy link
Contributor

@hmpf hmpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try it out and check that netmap-link.

Dockerfile Show resolved Hide resolved
docker-compose.yml Show resolved Hide resolved

# Ensure latest NAV code is built
mydir=$(dirname $0)
"$mydir/build.sh"
cd /source


django-admin check && exec django-admin runserver --insecure 0.0.0.0:80
django-admin check && exec django-admin runserver --insecure 0.0.0.0:8080
Copy link
Contributor

@hmpf hmpf Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline after? The side-by-side diff in github ends with a red stop-sign for the new code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bah humbug #blames-editor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't pre-commit handle this for us though... lesigh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably... should be cleaned up in the latest push, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-commit doesn't handle it for us since we don't have the hook end-of-file-fixer added (like we have in Argus). Should I make a PR to add that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do!

@podliashanyk
Copy link
Contributor

Let me know if you can reproduce it. If not, I will provide a more thorough bug report.

Not reproducible for me.

Describe the bug

With changes in this PR, accessing the Netmap tool in the web leads to an alert pop-up and an error message in the browser console. No graph is loaded.

To Reproduce

Steps to reproduce the behavior:

  1. Check out c4a0510. Run make docker and then docker compose up.
  2. Go to /netmap
  3. See an alert pop-up with message "Error loading graph, please try to reload the page".
  4. Open Console in browser dev tools and see an error message "GET http://localhost/netmap/graph/layer2/3/?_=1709804282265 500 (Internal Server Error)"
  5. See that no graph has been loaded

Expected behavior

Netmap tool in web works without prompting alerts or error messages, and the graph is loaded properly with all the nodes.

Screenshots

Screenshot 2024-03-06 at 09 49 32

Tracebacks

Logs from the web container:

2024-03-07 10:43:21 Internal Server Error: /netmap/graph/layer2/3/
2024-03-07 10:43:21 Traceback (most recent call last):
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
2024-03-07 10:43:21     response = get_response(request)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
2024-03-07 10:43:21     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
2024-03-07 10:43:21     return view_func(*args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
2024-03-07 10:43:21     return self.dispatch(request, *args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
2024-03-07 10:43:21     response = self.handle_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
2024-03-07 10:43:21     self.raise_uncaught_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
2024-03-07 10:43:21     raise exc
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
2024-03-07 10:43:21     response = handler(request, *args, **kwargs)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/api.py", line 195, in get
2024-03-07 10:43:21     return Response(get_topology_graph(layer, load_traffic, view))
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/graph.py", line 51, in get_topology_graph
2024-03-07 10:43:21     return _json_layer2(load_traffic, view=view)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/cache.py", line 70, in get_traffic
2024-03-07 10:43:21     cached = cache.get(cache_key)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/cache/backends/filebased.py", line 34, in get
2024-03-07 10:43:21     with open(fname, 'rb') as f:
2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'
2024-03-07 10:43:21 [Thu Mar 07 10:43:21 2024] [ERROR] [pid=63 django.request] Internal Server Error: /netmap/graph/layer2/3/
2024-03-07 10:43:21 Traceback (most recent call last):
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
2024-03-07 10:43:21     response = get_response(request)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
2024-03-07 10:43:21     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
2024-03-07 10:43:21     return view_func(*args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
2024-03-07 10:43:21     return self.dispatch(request, *args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
2024-03-07 10:43:21     response = self.handle_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
2024-03-07 10:43:21     self.raise_uncaught_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
2024-03-07 10:43:21     raise exc
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
2024-03-07 10:43:21     response = handler(request, *args, **kwargs)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/api.py", line 195, in get
2024-03-07 10:43:21     return Response(get_topology_graph(layer, load_traffic, view))
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/graph.py", line 51, in get_topology_graph
2024-03-07 10:43:21     return _json_layer2(load_traffic, view=view)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/cache.py", line 70, in get_traffic
2024-03-07 10:43:21     cached = cache.get(cache_key)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/cache/backends/filebased.py", line 34, in get
2024-03-07 10:43:21     with open(fname, 'rb') as f:
2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'
2024-03-07 10:43:21 [07/Mar/2024 10:43:21] "GET /netmap/graph/layer2/3/?_=1709804601074 HTTP/1.1" 500 105474

Environment (please complete the following information):

  • OS: macOS Sonoma 14.3.1
  • Browser: Firefox, Chrome, Safari, Vivaldi

Additional context

MacBook with Apple chip M2.

@lunkwill42
Copy link
Member Author

2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'

Still not reproducible for me But, you could be re-using an outdated cache mount, @podliashanyk ? Have you completely destroyed your docker compose environment and removed the associated volumes? Specifically, nav_nav_cache needs to go:

docker volume rm nav_nav_cache

@lunkwill42
Copy link
Member Author

Specifically, these volumes are defined for re-use in docker-compose.yml. It sometimes behooves one to remove them entirely when rebuilding from scratch:

nav/docker-compose.yml

Lines 107 to 111 in c4a0510

volumes:
nav_config:
driver: local
nav_cache:
driver: local

@hmpf
Copy link
Contributor

hmpf commented Mar 7, 2024

docker volume rm nav_nav_cache

Sounds like a good candidate for a "make nuke"-rule.

@johannaengland
Copy link
Contributor

johannaengland commented Mar 7, 2024

I also cannot reproduce the netmap bug and the /watchdog/ page also loads normally for me

@podliashanyk
Copy link
Contributor

podliashanyk commented Mar 7, 2024

2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'

Still not reproducible for me But, you could be re-using an outdated cache mount, @podliashanyk ? Have you completely destroyed your docker compose environment and removed the associated volumes? Specifically, nav_nav_cache needs to go:

docker volume rm nav_nav_cache

Just in case I have also docker pruned everything related to NAV containers, and rebuilt everything from scratch. The bug still appears. I noticed that the same happens with Watch Dog tool, except that /watchdog/ page doesn't load at all.

@lunkwill42
Copy link
Member Author

Sounds like a good candidate for a "make nuke"-rule.

Sure, if you can find a volume nuke command that works in all cases. docker volume rm nav_nav_cache only works if you have exactly 1 NAV docker environment, and your checked out copy of NAV resides in a directory called nav. I have previously looked for docker compose-commands that would do the same but guarantee that it only works on objects related to the current context - I have been unsuccessful (blind, maybe?) thus far. Most importantly, we don't want make commands that potentially nuke unrelated things.

@hmpf
Copy link
Contributor

hmpf commented Mar 7, 2024

My make nuke involves docker system prune =) Nuke it from orbit, it's the only way to be sure!

The build step used to be there for "caching" purposes, and for the
build files to be stored with owner privileges for the `nav` user,
rather than dropping files owned by root into the mounted `/source`
directory.

Using the `build` module does not give this effect, since it does
everything inside a separate virtual environment, meaning the subsequent
`pip install` command receives no caching benefit.  In fact, we just
end up with two redundant (and potentially slow) builds of NAV and its
dependencies.
This enables us to better detect which build steps are taking too
much time.
This adds Docker image build time cache mounts for the pip cache
directory, in an effort to reduce build times for the docker-based
dev environment.
When docker image rebuilds are needed, using cache mounts can improve
the rebuild speed, by ensuring APT's download cache is persistent
between rebuilds.
Prep for installing everything NAV-related as an unprivileged user
inside the development container.  This also ensures installating Python
stuff will keep working once the image transitions to Debian Bookworm,
which will not allow pip to install packages at the system level.
These now all need to be able to run as an unprivileged user (but they
can gain root using sudo, if need be)
Since the web process now runs as an unprivileged user inside the
container, it cannot listen to port 80.
Simplifies passing of the current uid/gid combo to `docker compose build`
@lunkwill42 lunkwill42 force-pushed the feature/docker-python-improvements branch from c4a0510 to 4f87336 Compare March 7, 2024 13:22
@lunkwill42
Copy link
Member Author

My make nuke involves docker system prune =) Nuke it from orbit, it's the only way to be sure!

Yeah, I don't think we should provide that kind of shoot-yourself-in-the-foot service to just anyone. If you know how to use it, you can do it yourself 😆

@lunkwill42
Copy link
Member Author

I just pushed updated docs and a slightly altered method for getting the UID/GID into the images - which hopefully reduces the hassle.

Also, I realized that the nuke command you may be looking for is docker compose down --volumes

For the new Docker image to work properly, the PATH variable needs
to propagate through user changes performed by `nav start` commands.
There were, however, two barriers:  First, su is used to drop privileges
before starting nav daemons, but this resets the PATH variable to
the ones defined in `/etc/login.defs`.  Secondly, `nav start` uses
a `-` to tell `su` to use a login shell, in which case the PATH is still
reset.  The login shell option was introduced by a fix for Uninett#2218, but
it's not clear that the use or non-use of a login shell was the actual
problem solved, it was just that the order of the arguments needed to
be portable.

Tagged Uninett#2218 just in case this turns into another BSD regression.
Makefile Show resolved Hide resolved
Copy link
Contributor

@hmpf hmpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll run a manual test also.


Rebuilding the NAV code from scratch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A complete rebuild of the NAV code can be initiated by::

docker-compose restart nav
docker compose restart nav
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably necessary to remind ppl about upgrading docker somewhere..

@hmpf
Copy link
Contributor

hmpf commented Mar 8, 2024

Bug in Netmap. When I go to Netmap tool on web, an alert pops up. There is a 500 Internal Server Error on GET .../netmap/grapgh/layer2/....

Fresh install of NAV, nothing added to seeddb.

If I visit /netmap/ I get an empty graph, since seeddb is empty.

If I visit /netmap/graph/layer2/ I get the friendly DRF view of the API and no errors.

@hmpf hmpf self-requested a review March 8, 2024 09:22
Copy link
Contributor

@podliashanyk podliashanyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the reworked docs a lot! 👏 Also great work with considerably cutting down time of docker builds and container start ups 🙌

doc/hacking/using-docker.rst Show resolved Hide resolved
@podliashanyk
Copy link
Contributor

Let me know if you can reproduce it. If not, I will provide a more thorough bug report.

Not reproducible for me.

Describe the bug

With changes in this PR, accessing the Netmap tool in the web leads to an alert pop-up and an error message in the browser console. No graph is loaded.

...

Environment (please complete the following information):

  • OS: macOS Sonoma 14.3.1
  • Browser: Firefox, Chrome, Safari, Vivaldi

Additional context

MacBook with Apple chip M2.

Resolved in ffa18e2 (co-authored with @lunkwill42)

@lunkwill42 lunkwill42 merged commit 43909ae into Uninett:master Apr 23, 2024
10 of 11 checks passed
@lunkwill42 lunkwill42 deleted the feature/docker-python-improvements branch April 23, 2024 09:00
lunkwill42 added a commit that referenced this pull request Apr 30, 2024
After the Docker development container was re-engineered (in #2859) to
run as a non-root user, this helper script stopped working, as it now
needs sudo to run the nav start/stop command.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants