Reorganize and deduplicate Docker configurations #471

Open
giohappy opened this issue Aug 25, 2023 · 23 comments

@giohappy
Contributor

giohappy commented Aug 25, 2023

Context

Problems

  • We have duplicated configurations
  • Building and publishing from a single place (geonode-project's own Docker configurations) is convenient BUT the versioning of images would be impossible. This is already happening for Geoserver.
  • For most cases, there's no need to have local builds. This can also be problematic for bug tracking and troubleshooting within projects.

Solutions

Solution A

  • Restore all the specific repos for the Docker images, including Letsencrypt, Nginx, and Docker, and remove the docker configs from geonode and geonode-project
    • These will be the repos from which the official images will be tracked and built
  • Adopt multi-stage builds inside projects if a custom image is needed.
  • Pros:
    • Cleaner management
    • Versioning
  • Cons:
    • More work for its maintenance
    • Building and deploying custom images in projects is ultimately left to the end developer

Solution B

  • Centralize all the configurations inside the geonode-project but make its docker-compose.yml use the official images by default
  • Deprecate the specific Docker repos, with the downside of losing versioning for the individual images
  • For this we already have a draft PR
  • Pros:
    • Easier to manage
    • A Docker configuration already available for custom builds
  • Cons:
    • No versioning for images

Opinions?

@gannebamm

@ridoo Could you please take a look at those two options and comment on them from a dev/ops perspective?

@ridoo

ridoo commented Aug 29, 2023

@gannebamm yes, I will find some time for this by tomorrow.

@ridoo

ridoo commented Aug 30, 2023

We discussed this internally and we very much welcome the approach to reorganize the Docker setup and to clean up the repositories in order to improve maintainability, but also to simplify things and to make development more transparent!

I would weigh our Docker experience higher than our Django experience. So we created a PR to propose a perspective on how to set up a custom GeoNode project from the Docker view. However, this PR might look quite radical at first glance, as it leaves out the project generation from a template completely. This may be the common way in the Django world, but it overlaps with concepts from the Docker world.

I have to admit, however, that there may be very good reasons for having such a Django template (at least for people coming from the Django world who are more used to this) which I am not aware of. Also, we left out the option to start development locally using paver, just to make our proposal as pure as possible.

However, as you are providing Docker base images (which are also used in the Ansible and Vagrant setups), we do not see the need for having such a template anymore. You could restructure this repository and provide ways to intercept customizations the Docker way. This would also ease the maintainability of customized geonode projects as discussed on the mailing list (see here and here) or within this discussion on how to improve upgrading to newer versions.

So, how we see the geonode-project right now (you already mentioned some of these points):

  • The template has too much logic including duplicated code from geonode upstream
  • The repository offers too many deployment options (which seem not to be maintained)
  • Non-transparent source of the Docker images

What we propose:

  • Minimal setup using plain docker-compose using base images
  • Just use Docker setup (or at least separate Vagrant/Ansible/... deployment options to ensure a working setup configuration for a once tagged version)
  • Use Docker concepts to make changes on your base images
  • Re-work Docker volumes
    • We do not see the need for the nginx image to store configuration in a volume
    • Scrutinize the necessity of using data containers (e.g. geoserver_data)
  • Be transparent on building images for each component
    • Document the source from where the image has been built (on hub.docker.com and a corresponding README)
    • Use GitHub Actions .. please build them differently (the link is meant as an example .. image tagging is not what we propose here .. see next point on this)
    • maybe the given links can serve as starter for further discussions
  • For components, use appropriate tags that
    • indicate what GeoNode version the image was built for
    • do not force a rebuild of the whole stack when just one image has to be rebuilt
  • Archive repositories to make clear which ones are not maintained anymore

All this would improve maintaining a geonode project. However, it would require a few more things to smooth the path:

  • No re-build of already tagged images (as pointed out in the issue about geonode login)
  • Base images built with good environment defaults which would leave the .env file for overrides (which might make the override_dev_env.sample unnecessary)

I hope this proposal is not too radical, but as said before, we just wanted to contribute a non-Django perspective using Docker. Looking forward to your feedback.

@giohappy
Contributor Author

giohappy commented Aug 31, 2023

@ridoo thanks for your proposal. I went through your drastic PR.

  • I guess you're in favor of using dedicated repos for each image
  • I like the idea of using the Dockerfiles with just the FROM command. They could also be used as a base for multi-stage builds

However, as you are providing Docker base images (which are also used in the Ansible and Vagrant setups), we do not see the need for having such a template, anymore. You could restructure this repository and provide ways to intercept customizations the Docker-way

There are a lot of cases where a Docker deployment is not an option, so when discussing the architecture of this repo this must be kept in mind.

Regarding the template project, the purpose of {{project-name}} was to automate the assignment of a name to your sample_app, and related settings, Dockerfiles, without having to create or rename it manually.
I agree that we should minimize the duplicated stuff, and remove what is not strictly required for the customization of a project. However, a utility that bootstraps the sample app would be useful. It could create it instead of rendering a template, though.

The following parts are particularly useful:

  • create / give a name to the sample app. This was the purpose of {{project.name}}
  • configure some settings, in particular these
  • the app should be able to override the geonode.settings, which AFAICS is not covered in your layout at the moment
  • requirements.txt: The app should be able to install its own requirements. Here the app would need to extend the Dockerfile to add and run its own requirements

We could use something like pipx for the bootstrap.

Re-work Docker volumes

  • Nginx: I agree if you ship the SSL certificates inside the image (with a customized Dockerfile). This change could be problematic for the upgrade of all the existing projects. It would mean having to customize the Dockerfile for each of them. Of course, it's doable, but...
  • With data containers you mean the data-dir-conf image, right? Well, it's just used to bootstrap the default data directory, without having to ship it inside the Geoserver image. Alternative solutions?

For components, use appropriate tags

  • We have started with the geonode images. For the others, I was considering tagging the code and the images with the version of the upstream software + an incremental tag for the image code. E.g. geoserver-2.23.0-0.1. Opinions?

Base images built with good environment defaults

As you know, there are several env variables that depend on the deployment, in particular the internal and external hostnames, ports, and whether you have HTTPS or not. But yes, we can minimize the .env to what is strictly required.

@ridoo

ridoo commented Aug 31, 2023

Regarding the template project, the purpose of {{project-name}} was to automate the assignment of a name to your sample_app, and related settings, Dockerfiles, without having to create or rename it manually.

Yes, actually, what we propose is not a template project anymore. So there is no replacement. The structure represents more or less a project skeleton to start from: just a sample setup showing how to structure your Docker development, building directly upon the upstream images without having to bootstrap anything first.

You are also right, that Docker is not for everything! So maybe, we could separate two things here:

  1. How to start the Docker way?
  2. How to start the Django way?

In the end, we are all free to set up our own customized project structure of a GeoNode project. However, our proposal tries to combine the following: a) staying as close to the upstream as possible (without the hassle of backporting template-specific changes to the generated project), b) getting rid of all the duplicated bootstrap code and logic (which you have to maintain) for generating the project (and decoupling it from the upstream template), c) focusing on Docker concepts, as the project actually produces Docker images.

Let me answer/comment on your feedback:

  • configure some settings, in particular these

You could add these in the settings.py provided by the proposal (see below).

  • the app should be able to override the geonode.settings, which AFAICS is not covered in your layout at the moment

Well, in fact there is an override mechanism: we load geonode.settings in the provided settings.py, which is mounted into the django/celery containers. To make it take effect, we point the DJANGO_SETTINGS_MODULE environment variable at this settings.py. This way we are able to override things without changing anything in the regular setup.
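
A minimal sketch of what such an override module could look like, assuming it is the file mounted into the containers and selected through DJANGO_SETTINGS_MODULE. The setting `MY_PROJECT_TITLE` and the environment variable of the same name are hypothetical and only illustrate the pattern. The guarded import is an assumption added so the file also runs where geonode is not installed.

```python
# settings.py (hypothetical project-level override module)
import os

try:
    # Start from the upstream defaults shipped with GeoNode.
    from geonode.settings import *  # noqa: F401,F403
except ImportError:
    # geonode is not importable outside the container; the sketch still
    # runs, it just has no upstream defaults to build on.
    pass

# Hypothetical project-specific setting, read from the environment with a
# default so that the .env file only has to carry real overrides.
MY_PROJECT_TITLE = os.getenv("MY_PROJECT_TITLE", "My GeoNode project")
```

Anything defined after the import wins over the upstream value of the same name, which is what makes this a plain Python override rather than a template.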

  • requirements.txt The app should be able to install its own requirements. Here the app would need to write its own entrypoint.sh which overrides the one from GeoNode image...

For which use cases would adding more Docker RUN commands to the docker/geonode/Dockerfile, or even mounting a custom requirements.txt, not be enough? I am open to thinking about that a bit more deeply.

  • Nginx: I agree if you ship the SSL certificates inside the image (with a customized Dockerfile). This change could be problematic for the upgrade of all the existing projects. It would mean having to customize the Dockerfile for each of them. Of course, it's doable, but...

Hmm, there are two nginx-related volumes: a) one holding nginx-conf and b) one holding the certificates. We do not see any advantage in storing the nginx-conf as a volume. Maybe it is just me/us, but it seems unusual that, when re-configuring the container, one has to delete the volume instead of just re-creating the container. But this might be a personal view on that (unless it would mark a volume and related config unnecessary).

  • With data containers you mean the data-dir-conf image, right? Well, it's just used to bootstrap the default data directory, without having to ship it inside the Geoserver image. Alternative solutions?

Well, this depends on the use cases. We were always re-using what has been prepared in the geonode/geoserver image and for now did not bother. For sure, you have a broader view on the needed flexibility on that configuration. Anyway, I would guess, that having a good default packaged in the image would make things easier to maintain. When using the base image, you can choose to make adjustments via volume mounts as you like.

  • We have started with the geonode images. For the others, I was considering tagging the code and the images with the version of the upstream software + an incremental tag for the image code. E.g. geoserver-2.23.0-0.1. Opinions?

My personal preference would be, to directly see which image is meant to be deployed next to a GeoNode version. Taking GeoNode version 4.1.2 as an example: Tag the GeoServer image with 4.1.2-2.23.0. This way, I can see the image belongs to GeoNode 4.1.2 and builds upon GeoServer 2.23.0 (to be strict, it is not the original GeoServer, but hey, let's not make things unnecessarily complex). Later you could build another GeoServer image tagged with version 4.1.2-2.23.2 and make transparent this image includes GeoServer 2.23.2 and is meant to be used in GeoNode 4.1.2.

  • As you know, there are several env variables that depend on the deployment, in particular the internal and external hostnames, ports, and whether you have HTTPS or not. But yes, we can minimize the .env to what is strictly required.

Yes, I agree. Picking sensible defaults can be tricky, but you actually do so already in the settings.py. However, there are variables which do have a default value in the settings.py, but there are still locations which read them from the environment (in entrypoint.sh or task.py, for example). You could solve that by just declaring them with a default in the Dockerfile. However, there are A LOT of variables, so I am not 100% sure you want to put everything there.

EDIT

Forgot to answer on this completely:

I guess you're in favor of using dedicated repos for each image

Actually the opposite :). We would prefer to see the Docker image configurations in the GeoNode core repository. I am not sure whether this somehow interferes with image tagging and the requirement not to override once-tagged images in the registry. However, I think it makes sense to put the whole Docker setup in a docker directory.

@gannebamm

@mwallschlaeger, does this restructure have an impact on your Kubernetes rework?

@gannebamm

Just some quick feedback from me:

My personal preference would be, to directly see which image is meant to be deployed next to a GeoNode version. Taking GeoNode version 4.1.2 as an example: Tag the GeoServer image with 4.1.2-2.23.0. This way, I can see the image belongs to GeoNode 4.1.2 and builds upon GeoServer 2.23.0 (to be strict, it is not the original GeoServer, but hey, let's not make things unnecessarily complex). Later you could build another GeoServer image tagged with version 4.1.2-2.23.2 and make transparent this image includes GeoServer 2.23.2 and is meant to be used in GeoNode 4.1.2.

I like that tagging semantic.

We do not see any advantages to store the nginx-conf as volume.

Thünen also got rid of the nginx-conf volume since it did not help us but only created confusion.

requirements.txt The app should be able to install its own requirements. Here the app would need to write its own entrypoint.sh which overrides the one from GeoNode image...

For what use cases adding more Docker RUN commands to the docker/geonode/Dockerfile or even mounting a custom requirements.txt would not be enough? I am open to think about that a bit deeper.

I think extending the Dockerfile should work for our use cases 🤔

@giohappy
Contributor Author

@ridoo thx for the clarification and your comments. I think we're converging.

Well, in fact there is an override mechanism: We do load geonode.settings in the provided settings.py. It is mounted into the django/celery containers. To take effect, we set this settings.py via DJANGO_SETTINGS_MODULE environment variable. This way we are able to override things without changing anything in the regular setup.

I didn't notice it was mounted inside docker-compose.yml. In that case it looks ok to me. I would only rename it to local_settings.py which is more typical in Django lingo.

For what use cases adding more Docker RUN commands to the docker/geonode/Dockerfile or even mounting a custom requirements.txt would not be enough? I am open to think about that a bit deeper.

I would add a requirements.txt, for the non-Docker case, as we have now.
We use the custom requirements.txt a lot in our projects. It would be much quicker if it was mounted (overriding the default requirements.txt) by default inside the geonode Dockerfile.

Hmm, there are two nginx-related volumes: a) taking nginx-conf and b) taking the certificates. We do not see any advantages to store the nginx-conf as volume. Maybe it is just me/us, but it seems unusual when re-configuring the container that one has to delete the volume instead of just re-creating the container. But this might be a personal view on that (unless it would mark a volume and related config unnecessary).

Oh, ok. Yes makes sense!

Well, this depends on the use cases. We were always re-using what has been prepared in the geonode/geoserver image and for now did not bother.

So you would favor putting the data dir inside the GeoServer image and letting it create the volume? I'm open to it. We need to be very careful not to touch an already existing volume when the container is run.

My personal preference would be, to directly see which image is meant to be deployed next to a GeoNode version. Taking GeoNode version 4.1.2 as an example: Tag the GeoServer image with 4.1.2-2.23.0. This way, I can see the image belongs to GeoNode 4.1.2 and builds upon GeoServer 2.23.0 (to be strict, it is not the original GeoServer, but hey, let's not make things unnecessarily complex). Later you could build another GeoServer image tagged with version 4.1.2-2.23.2 and make transparent this image includes GeoServer 2.23.2 and is meant to be used in GeoNode 4.1.2.

This was my initial idea, but we have situations where the image must be rebuilt (a fix, or whatever) for the same component and/or GeoNode version. For those cases, you would republish the image with the same tag.
In any case I would reverse the versions: geonode/geoserver:2.23.0-4.1.2, since the first version naturally binds to the geoserver label.
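
To compare the two orderings side by side, here is a small sketch that renders a component tag from a GeoNode version and an upstream component version. The function name `component_tag` and its `component_first` flag are made up for illustration; nothing in this thread says such a helper exists in the GeoNode repositories.

```python
def component_tag(image, geonode_version, upstream_version, component_first=True):
    """Render an image reference that carries both versions in one tag."""
    if component_first:
        # Ordering suggested in this comment: upstream version first.
        tag = f"{upstream_version}-{geonode_version}"
    else:
        # Ordering from the earlier proposal: GeoNode version first.
        tag = f"{geonode_version}-{upstream_version}"
    return f"{image}:{tag}"


print(component_tag("geonode/geoserver", "4.1.2", "2.23.0"))
print(component_tag("geonode/geoserver", "4.1.2", "2.23.0", component_first=False))
```

Either way the tag stays a single string, so the choice only affects how tags sort and read in the registry, not how they are consumed in docker-compose.yml.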

Actually the opposite :). We would favor to see the Docker image configurations in the GeoNode core repository. Not sure if this somehow interferes with image tagging and the requirement to not override once tagged images in the registry. However, I think it makes sense to put the whole docker setup in a docker directory.

Let's say we have this:

  • we found a bug inside the Nginx image for GeoNode 4.1.2
  • we fix it inside the geonode repo on branch 4.1.x:
    • should we release a new version of GeoNode (4.1.3) to include the fix for the Nginx image?
    • shall we tag again the fixed image as nginx:4.1.2-1.25.1 and republish it?

@ridoo

ridoo commented Aug 31, 2023

thx for the clarification and your comments. I think we're converging.

Good to read that!

I didn't notice it was mounted inside docker-compose.yml. In that case it looks ok to me. I would only rename it to local_settings.py which is more typical in Django lingo.

Naming should not be an issue. I am fine with that. But in any case: the PR is just a proposal to help clarify our discussion.

I would add a requirements.txt, for the non-Docker case, as we have now.

What would be the non-Docker case? I do see the use case of a non-Docker build (starting all services as non-containers), but would this have to reside along with the Docker setup? I always thought of the geonode-project as some customization structure which a user wants to build, run and develop in a Docker context anyway. So far, I have never started a non-Docker setup, to be honest. For debugging I would prefer to use a devcontainer setup anyway (as you already do in GeoNode core).

We use the custom requirements.txt a lot in our projects. It would be much quicker if it was mounted (overriding the default requirements.txt) by default inside the geonode Dockerfile.

Yes, you could do so: mount your custom requirements_customized.txt and fire RUN pip install -r requirements_customized.txt to re-install packages already installed in the base image. I chose a different name to avoid overriding the original requirements.txt in the container. This should be possible from the customized ./docker/geonode/Dockerfile.

However, whether this turns out to be practical and useful would have to be tested and evaluated from the different perspectives we all naturally have.

So you would favor putting the data dir inside the GeoServer image and letting it create the volume? I'm open to it. We need to be very careful not to touch an already existing volume when the container is run.

You always have to take care before deleting volumes, which may hold more important data than "just" configuration (maybe one more reason to get rid of unnecessary volumes). Anyway, it becomes tricky here, as the way the geoserver_data dir is prepared right now is (as you mentioned in another ticket) convoluted. But I like that, so let's take the ride for version 2.23.0:

  • data-docker Repository
    • The original geoserver data image downloads a zip file containing the prepared data dir (not including everything for the GeoNode context)
    • Builds an image just by unzipping and copying the prepared directory structure to a well-known data directory
    • This image already contains most prepared data (but not all) for a /geoserver_data directory
  • geonode-project
    • geonode-project configures the data dir image in its docker-compose.yml
    • It puts a volume mount on the /geoserver_data and does a lot of foo magic using the environment configuration to configure settings in a persistent volume (i.e. the /geoserver_data directory)
    • It also adds a geofence template which seems to be essential to let GeoNode and GeoServer work together. If this workflow should stay as it is, this file could be part of the prepared directory structure at least.

I see the need to configure this data directory from the outside, which is not ideal (container config vs. persisted config), but I fear this is the way GeoServer has to be configured at the moment (please correct me if anyone has other ideas here). However, I do not see the need to "hide" this directory from the repository, as it is part of the overall configuration, right? Just include it next to the ./docker/geoserver/Dockerfile (where the base image is built).

  • In any case I would reverse the versions: geonode/geoserver:2.23.0-4.1.2, since the first version naturally binds to the geoserver label

My preference would be to place first those versions which change less often. But I get your point and will not put personal preferences first.

Let's say we have this:

  • we found a bug inside the Nginx image for GeoNode 4.1.2
  • we fix it inside the geonode repo on branch 4.1.x:
    • should we release a new version of GeoNode (4.1.3) to include the fix for the Nginx image?
    • shall we tag again the fixed image as nginx:4.1.2-1.25.1 and republish it?

Well, I do not vote for re-releasing the whole stack when a component changes. In addition, it would not make sense to include the component version (you could just use 4.1.2 for all components then). However, you are completely right. This would have to be resolved and it should be clear how tagging is done before throwing everything together. Let me think about this further.

@mwallschlaeger
Member

Hey, I just went through this thread and also want to present my perspective on this, even though I have only minimal experience with geonode-project, but at least some Docker knowledge.

I would disagree with your suggestion for image tags like geonode/geoserver:2.23.0-4.1.2, as it seems very uncommon to me. In my opinion the geonode container should be tagged geonode/geonode:$MAJOR_RELEASE.$MINOR_RELEASE.$BUILD_NUMBER (e.g. geonode/geonode:4.1.15), and geonode/geonode:$MAJOR_RELEASE.$MINOR_RELEASE (in this case geonode/geonode:4.1) is the latest build number from that release. Further, geonode/geonode:latest should be the latest geonode/geonode:$MAJOR_RELEASE.$MINOR_RELEASE. For the geoserver image I would think of geonode/geoserver:2.3.0 without linking it to a specific geonode version: a) a geoserver version could be linked to multiple geonode versions, b) simplicity.

Let's assume that the geoserver major version does not change within a geonode version, so the release notes and the docker-compose file could provide info about the required geoserver version and only link to $MAJOR_RELEASE.$MINOR_RELEASE.

From the Kubernetes perspective, having constant image builds based on commits on the master branch is really appreciated for the geonode/geoserver and the geonode/geonode image. Also, getting rid of the geonode:geoserver-data-dir repo would reduce the complexity of the geonode deployment a bit. It may seem odd at first glance, but why can't the geonode:geoserver image itself hold the data, as it is required to run the image anyway?

When it comes to geonode-project, geonode-k8s cannot use geonode-project images atm. The main reason is that geonode-k8s overwrites some initial steps like the entrypoint.sh and the tasks.py, which wouldn't work together with a geonode-project container.

@ridoo

ridoo commented Sep 4, 2023

From the kubernetes perspective having constant image builds based on commits on the master branch is really appreciated for the geonode/geoserver and the geonode/geonode image

If you want to stay cutting-edge, constant builds of the main dev branch are a bonus (but not a requirement) for non-Kubernetes environments, too. Once built centrally, it would save time and resources in other places IMO.

However, it is more important NOT to push new images to an already tagged FULL version (including bugfix). I want to rely on the promise that an image will not change when I choose a very specific version of it. Use a multi-tagging approach instead: 4.1.2 is not necessarily equal to 4.1. Here's a good read on using multiple Docker tags to leverage semantic versioning.
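
The rule can be sketched in a few lines: a release publishes its full version plus a floating MAJOR.MINOR tag, and only the floating tag may ever be overwritten. The names `tags_for_release` and `may_push` are hypothetical, and the set of published tags stands in for whatever the registry actually reports.

```python
import re

# A "full" version in the sense used above: MAJOR.MINOR.PATCH.
FULL_VERSION = re.compile(r"\d+\.\d+\.\d+")


def tags_for_release(version):
    """Return the tags to publish for a full MAJOR.MINOR.PATCH release."""
    if not FULL_VERSION.fullmatch(version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, _patch = version.split(".")
    # The full version is published once; the MAJOR.MINOR tag floats.
    return [version, f"{major}.{minor}"]


def may_push(tag, published):
    """Allow a push unless it would overwrite a published full version."""
    return not (FULL_VERSION.fullmatch(tag) and tag in published)


published = {"4.1.2", "4.1"}
print(tags_for_release("4.1.3"))
print(may_push("4.1.2", published))
print(may_push("4.1", published))
```

With this split, pinning 4.1.2 gives a stable image, while following 4.1 picks up later patch releases.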

For the geoserver image I would think of geonode/geoserver:2.3.0 without linking it to a specific geonode version. a) geoserver version could be linked to multiple geonode versions, b) simplicity.

Again, I find it handy to see which version is intended to run in what GeoNode context. But I do not see this as a hard requirement. More important would be to document (on the GeoNode side) what GeoServer (et al.) versions shall be used.

Also getting rid of the geonode:geoserver-data-dir repo would reduce the amount of complexity for the geonode deployment a bit. It seems odd on the first glance, but why can't the geonode:geoserver image itself hold the data, as it is required to run the image anyway.

I agree with that. It may be possible to do the whole entrypoint.sh foo on each startup. The geonode/geoserver image could provide a data-dir template for that. However, I think the data dir is a location that GeoServer admins rely on to edit and store things. It should stay stable beyond a GeoServer version. So for a quick setup and testing it is better to go for a template approach, while for production one should externalize the data-dir as a volume mount.

When it comes to geonode-project, geonode-k8s cannot use geonode-project images atm. The main reason for that is that geonode-k8s is overwriting some initial steps like the entrypoint.sh and the tasks.py. Which wouldn't work together with a geonode-project container.

@mwallschlaeger I did not go through all the changes which were required, but I think there may be room to converge things here. What do you think?

Let's say we have this:

  • we found a bug inside the Nginx image for GeoNode 4.1.2
  • we fix it inside the geonode repo on branch 4.1.x:
    • should we release a new version of GeoNode (4.1.3) to include the fix for the Nginx image?
    • shall we tag again the fixed image as nginx:4.1.2-1.25.1 and republish it?

Well, I do not vote for re-releasing the whole stack when a component changes. In addition, it would not make sense to include the component version (you could just use 4.1.2 for all components then). However, you are completely right. This would have to be resolved and it should be clear how tagging is done before throwing everything together. Let me think about this further.

You could release 4.1.3 if you think it is an important bugfix. However, if 1.25.1 is the patched version of nginx, you could at least publish it using multiple version tags: 1.25 and 1.25.1. For the 4.1.x branch you may reference just the minor version (1.25) in the docker-compose.yml file. Re-building a minor version locally is just fine for development, but only push/override, for example, 4.1 with a truly released version, e.g. 4.1.2.

@giohappy
Contributor Author

giohappy commented Sep 5, 2023

This discussion is very useful and interesting, but I think it's becoming too complex to follow since we're mixing many concepts.
What do you think if we open a discussion for each point?

  • Docker image tagging
  • Docker configurations architecture
  • GeoNode Project architecture

With regard to the last point, we had an internal call a few days ago. The change would have a relevant impact:

  • Documentation and training should be updated
  • Planned upgrades of existing projects will be more complicated, in particular for those with significant customizations

We would be in favor of creating an alternative/experimental repo with the new proposal (could be geonode-project-v2, or something different). However, before creating such a repo we still have to iron out a common vision, since there are still some corners to be rounded.

On the side of Docker, we're almost aligned

  • We still have to agree on an effective and simple tagging approach, but we're not so far.
  • We also agree on moving Docker configurations under the geonode repo (although this still depends on how we want to manage the image tagging and releases in relation to geonode's own tagging and release cycles), and just leave the geonode-project Dockerfiles empty (of course with the FROM command!).

Do you agree on creating the three discussions?

@ridoo

ridoo commented Sep 5, 2023

Hey @giohappy .. this is a very good discussion, though it grew a bit long already.

Let me try to summarize briefly:

  • A pure Docker project using base images is able to cover most use cases (except the non-docker setup ;-))
  • Remove unnecessary volume mounts
    • At least nginx-conf
    • Remove complexity of the geoserver data-dir genesis (provide even a template structure within the base image)
  • Maintaining the base images
    • Build all images from GeoNode core and provide a good README at the registry stating where the images come from
    • Use consistent and less complex versioning of Docker images. We discussed this internally again and do not see any problem with re-releasing the whole stack. We would even drop the version of the base image (e.g. -ubuntu-22.10 or 1.25.1-alpine). Everything should be transparent enough and therefore reasonable -- 4.1.2 is just fine and patch versions are for free :-)
    • Use a GitHub release workflow which lets you release GeoNode with a single click
  • Do not override full version (including patch/bugfix version) once tagged and published

EDIT: (since we were writing comments almost concurrently)

In my opinion, we should create a docker-project repository (as you proposed) and move the current status of the PR there as the initial commit. We can document it as a draft (for now). From my point of view, we should all use this to test our very own use cases. Of course, we can use the discussions forum of the repository. 👍

  • Documentation and training should be updated

I can discuss internally, where we can help.

  • Planned upgrades of existing projects will be more complicated, in particular for those with significant customizations

This should be a one-time task .. and upgrading to a new version would also require effort even without the new project structure. However, as you said, you can just open a parallel track (two repositories) and see how both get used over time.

  • We still have to agree on an effective and simple tagging approach, but we're not so far.

Yes, you have to decide on that. Our preference would be the simplest one. As we are doing open-source, everything is transparent enough. You do not build the same version on different base images, so there is no need to have such fine-grained version tagging. And, if there's a bug, just trigger the whole release train -- it's a one-click task nowadays.

@giohappy
Contributor Author

giohappy commented Sep 7, 2023

In my opinion, we should create a docker-project repository (as you proposed) and move the current state of the PR there as the initial commit. We can document it as a draft (for now). From my point of view, we should all use it to test our own use cases. Of course, we can use the discussions forum of the repository.

@ridoo I would try to make a new version for the project that supports both Docker and "bare metal" setups. As we said it's a matter of rounding some corners in your proposal. I'm already testing it and I should be able to submit my response.

One thing that hasn't worked so far is configuring DJANGO_SETTINGS_MODULE from the compose file. I think the complicated precedence rules make the same variable in the .env file (which is set to geonode.settings) take precedence.

Yes, you have to decide on that. Our preference would be the simplest one. As we are doing open-source, everything is transparent enough. You do not build the same version on different base images, so there is no need to have such fine-grained version tagging. And, if there's a bug, just trigger the whole release train -- it's a one-click task nowadays.

I will review your proposal in the next few days. I think I will open a dedicated issue for that. We should move in steps with this refactoring...

@ridoo

ridoo commented Sep 7, 2023

No hurry on that. We will test such a setup in a Thünen-related project which starts in the next weeks. One goal is to reveal issues when running the whole thing in a real-world setup. @gannebamm will keep you posted on the progress, I am sure 😄

Regarding the DJANGO_SETTINGS_MODULE, I would bet you could just remove it from the .env file. It is a central configuration and would not be changed anyway, would it?
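To see which value actually wins inside the running container, a small check like the following might help. It is only a sketch: it assumes the project's manage.py follows the Django default of calling `os.environ.setdefault("DJANGO_SETTINGS_MODULE", ...)`, so a value already present in the environment is never replaced by the project's own default. `effective_settings_module` and `my_project.settings` are placeholder names.

```python
import os
from typing import Mapping

VARIABLE = "DJANGO_SETTINGS_MODULE"


def effective_settings_module(environ: Mapping[str, str], fallback: str) -> str:
    """Mimic os.environ.setdefault(): the fallback applies only if the variable is unset."""
    return environ.get(VARIABLE, fallback)


if __name__ == "__main__":
    # "my_project.settings" is a placeholder for the project's own settings module.
    print(effective_settings_module(os.environ, "my_project.settings"))
```

If this prints geonode.settings when run in the django container, the value from the .env file reached the process, and removing it there should let the project default apply.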

@giohappy
Contributor Author

After further internal discussions, we came up with the following proposal for the Docker configurations:

  1. SOLUTION C (in addition to the two mentioned in the initial description): Move the docker configurations, all together, in a new single repo dedicated only to them. It's a compromise between specific repos for the single services and keeping everything inside the geonode repo. The latter was also suggested by @ridoo but we think it would clutter the repo (issues, tags, etc.) and make it very hard to follow.
  2. Tag the images with their own version. Rationale: We don't need to bind an image to a version of GeoNode, since it's the job of the docker-compose file to make the match. We rather want to be able to publish new versions for the same software and let users adapt the compose file.
    Example:
    • I have geonode/geoserver:2.23.0-v1.0.0 and it's used by GeoNode 4.2.0.
    • I need to fix the image. I tag it as geonode/geoserver:2.23.0-v1.0.1 and make GeoNode 4.2.x use it.
    • If someone wants to use the fixed image in their GeoNode 4.2.0 they can change their compose file accordingly.
  3. Adopt the proposal from @ridoo to make the local Dockerfile only inherit from the published images by default
  4. Remove the Dockerfiles from the geonode vanilla. We think geonode vanilla should be... vanilla :) It pulls the published images

For the moment we won't touch the geonode-project's code structure. This will be evaluated again afterwards.
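The tagging scheme from point 2 can be sketched in a few lines. This is only an illustration of the `<software version>-v<image version>` convention used in the example above; `ImageTag`, `parse_image_tag` and `bump_patch` are hypothetical helpers and not part of any GeoNode tooling.

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    repository: str                 # e.g. "geonode/geoserver"
    software: str                   # version of the packaged software, e.g. "2.23.0"
    revision: tuple[int, int, int]  # version of the image itself, e.g. (1, 0, 0)


# Assumed layout: <repository>:<software version>-v<major>.<minor>.<patch>
_TAG_RE = re.compile(
    r"^(?P<repo>[^:]+):(?P<software>\d+(?:\.\d+)*)-v(?P<rev>\d+\.\d+\.\d+)$"
)


def parse_image_tag(reference: str) -> ImageTag:
    """Split an image reference into repository, software version and image revision."""
    match = _TAG_RE.match(reference)
    if match is None:
        raise ValueError(f"not in <repo>:<software>-v<revision> form: {reference!r}")
    major, minor, patch = (int(part) for part in match["rev"].split("."))
    return ImageTag(match["repo"], match["software"], (major, minor, patch))


def bump_patch(reference: str) -> str:
    """Return the reference for a fixed image of the same software version."""
    tag = parse_image_tag(reference)
    major, minor, patch = tag.revision
    return f"{tag.repository}:{tag.software}-v{major}.{minor}.{patch + 1}"


print(bump_patch("geonode/geoserver:2.23.0-v1.0.0"))
# geonode/geoserver:2.23.0-v1.0.1
```

The software version stays untouched when only the image is fixed, which is what decouples the image from a specific GeoNode release.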

@ridoo

ridoo commented Sep 26, 2023

Hey @giohappy good to read your approach.

I have two comments:

  • I guess, geonode base image will reside on geonode/geonode? Otherwise, where do you get the geonode context from to build the image (git clone or something?).
  • Good to see, that overwriting image tags is not under discussion anymore :). However, versioning suffix seems a bit overkill ;-), but I get your point.

@giohappy
Contributor Author

giohappy commented Oct 6, 2023

I guess, geonode base image will reside on geonode/geonode? Otherwise, where do you get the geonode context from to build the image (git clone or something?).

yes, sure.

Good to see, that overwriting image tags is not under discussion anymore :). However, versioning suffix seems a bit overkill ;-), but I get your point.

I don't see other solutions to avoid overwriting, have the flexibility to apply fixes to an image, and decouple images from specific geonode versions.

@giohappy
Contributor Author

giohappy commented Oct 6, 2023

The repo for the Docker configurations has been bootstrapped. For the moment it just contains copies of the original configurations.

@giohappy
Contributor Author

giohappy commented Oct 9, 2023

Workflows for the automatic build and publishing of Docker images have been created.
They can be improved but at least we have something:

  • automatic build and publishing of latest images on pushes to the master branch
  • automatic build and publishing of tagged images when a release is published
  • manual builds
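The three triggers could be summarized as a small decision function. This is a rough sketch and not the actual workflow code: the event names are the standard GitHub Actions ones (`push`, `release`, `workflow_dispatch`), while `tags_to_publish` and the rule that a manual build takes an explicitly provided tag are assumptions made for illustration.

```python
from typing import Optional


def tags_to_publish(event: str, branch: Optional[str] = None,
                    release_tag: Optional[str] = None) -> list[str]:
    """Map a workflow trigger to the image tags that would be built and pushed."""
    if event == "push" and branch == "master":
        return ["latest"]              # pushes to master refresh the moving tag
    if event == "release" and release_tag:
        return [release_tag]           # a published release gets its own fixed tag
    if event == "workflow_dispatch" and release_tag:
        return [release_tag]           # assumption: manual builds name their tag
    return []                          # anything else publishes nothing


print(tags_to_publish("push", branch="master"))
print(tags_to_publish("release", release_tag="2.23.0-v1.0.1"))
```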

@giohappy
Contributor Author

giohappy commented Oct 9, 2023

First step into the refactoring of Docker images usage in geonode-project

@gannebamm

@gannebamm will keep you posted on the progress, I am sure 😄

Sorry for the late response here. We have tested the proposed approach in our latest development project and were quite happy with the tooling. You can see the outcome here: https://github.com/GeoNodeUserGroup-DE/geonode-blueprint-docker
We tried to be transparent about the reasoning behind the repo here: https://github.com/GeoNodeUserGroup-DE/geonode-blueprint-docker?tab=readme-ov-file#background

We will further gather experience with the approach and also see how upgrades between versions will work out. The first upgrades in between our development cycle went relatively smoothly (4.1 -> 4.2.2)

I think for now this issue can be closed @giohappy? Otherwise, I would follow your idea and switch to a discussion.

@giohappy
Contributor Author

@gannebamm it would be useful to carry on the discussion about the project layout. I see your geonode-blueprint-docker implements some of the ideas discussed here. It would be great if your layout and the "official" one converged.
