Fixed small bugs after trying wiki test-kitchen #259

Merged
tomhughes merged 5 commits into openstreetmap:master from migurski:master
Mar 18, 2020

@migurski
Contributor

migurski commented Jan 15, 2020

  • Updated Ubuntu from 16.04 to 18.04 based on current running hardware
  • Added Mediawiki Apt repository to support Parsoid install; added apt as mediawiki dependency to complete installation
  • Added missing Wiki user account; added accounts as mediawiki dependency to complete installation
  • Added dummy passwords in test data bag

Test plan:

  1. Add test kitchen configuration like:

       - name: migurski_testing
         run_list:
           - recipe[munin]
           - recipe[wiki]
           - role[wiki]
    
  2. Run kitchen converge migurski-testing-ubuntu-1804

Result

Still seeing some late-stage failures detailed in this log, but gets further than previously: https://gist.github.com/migurski/e08c0207d18462258498b198f1ae3607

@migurski
Contributor Author

I made some edits in 4542304 that specify the apt and account cookbooks as dependencies of mediawiki based on requested changes. It seems like munin might also be a dependent cookbook in this case; I’ll need some more time to re-run from scratch and test that change before making further edits to this PR.

@migurski
Contributor Author

In 638131d I completed the addition of the "wiki" user account in the accounts cookbook and updated the correct signing key for the Mediawiki apt repository. Munin seemed to be a pre-requisite as well, and there’s now a configuration in test-kitchen for the wiki so this works:

kitchen converge wiki-ubuntu-1804

The Mediawiki cookbook still seems buggy to me. On a fresh run, there’s a SQL error which will disappear on a second run: https://gist.github.com/migurski/e2096952dbff2779223751d9ab5fa7d5 — this non-idempotent behavior makes me very uncomfortable.

@tomhughes
Member

Pretty sure it worked fine last time I deployed a new wiki but mediawiki could easily have changed/broken something in between. They do like making life hard for us so it's all a bit hit and miss how well things work I'm afraid.

@migurski
Contributor Author

migurski commented Jan 15, 2020

I moved to a style that I think matches what’s in otrs: recipes for apt, accounts, and munin all added to the test configuration and removed from dependencies within the wiki and mediawiki cookbooks. The new configuration is still not picking up the wiki user by default, and dies here regardless of the role/recipe order in the run list:

FATAL: Chef::Exceptions::UserIDNotFound: mediawiki_site[wiki.openstreetmap.org] (wiki::default line 30) had an error: Chef::Exceptions::UserIDNotFound: directory[/srv/wiki.openstreetmap.org] (/tmp/kitchen/cache/cookbooks/mediawiki/resources/site.rb line 94) had an error: Chef::Exceptions::UserIDNotFound: cannot determine user id for 'wiki', does the user exist on this system?

I’ve also tried moving role[wiki] to the top of the wiki configuration run list but this surfaces a problem in the Elasticsearch default attributes:

undefined method `|' for {}:Chef::Node::VividMash

The method call is here:

default[:apt][:sources] |= ["elasticsearch#{node[:elasticsearch][:version]}"]
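
A minimal plain-Ruby sketch of why that line can fail, assuming the apt attribute defaults have not been loaded yet when it runs, so that `default[:apt][:sources]` is auto-vivified as an empty hash-like object rather than an array. `vivid_hash` is a toy stand-in and not Chef's real VividMash, the helper names are made up, and the version string is a placeholder.

```ruby
# Toy stand-in for Chef's auto-vivifying attribute container (NOT Chef's
# real VividMash class): reading a missing key creates an empty nested hash.
def vivid_hash
  Hash.new { |hash, key| hash[key] = vivid_hash }
end

# Mirrors the failing line. `a |= b` expands to `a = a | b`, and the
# vivified empty hash has no `|` method, so this raises NoMethodError.
def naive_append(attrs, source)
  attrs[:apt][:sources] |= [source]
end

# One possible guard: coerce whatever is there to an Array before the union.
def guarded_append(attrs, source)
  attrs[:apt][:sources] = Array(attrs[:apt][:sources]) | [source]
end

attrs = vivid_hash
begin
  naive_append(attrs, "elasticsearch6.x") # placeholder version string
rescue NoMethodError => e
  puts "naive append failed: #{e.class}"
end

guarded_append(attrs, "elasticsearch6.x")
guarded_append(attrs, "elasticsearch6.x") # the union keeps entries unique
puts attrs[:apt][:sources].inspect
```

Whether a guard like this or a change to the attribute load order is the better fix for the real cookbook is a separate question; the sketch only shows where the `|` error comes from.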

@tomhughes
Member

Have you added test/data_bags/accounts/wiki.json to define the wiki account?

@migurski
Contributor Author

Good hint. I did that, and then found an early fatal error due to a missing node[:etc] value which I fixed in the default attributes in e87c666. Unsure if this is the right way to go about supplying values, or if the cookbook should be more resilient to missing values?

@gravitystorm
Contributor

> Unsure if this is the right way to go about supplying values, or if the cookbook should be more resilient to missing values?

As I commented earlier, the right way to do things is to make each cookbook or role converge on its own. Putting dependent recipes in the test-kitchen config is just a workaround for these missing dependencies.

I don't understand what advantage there is to having all these missing dependencies, but from the comments above, @tomhughes seems quite keen on keeping it that way. 🤷‍♂️

> As a general rule if you're having to alter the cookbooks to test them then there's probably something wrong somewhere...

Yes, the problem is that the cookbooks need to be fixed to declare their implicit dependencies! Listing those recipes in the test-kitchen config only works around the problem, and it's not something we should perpetuate when a volunteer is offering to fix it properly by using include_recipe.

Still, it's @tomhughes who gets the final say on this topic, so if we can't persuade him otherwise, I guess we keep layering on the workarounds.

@tomhughes
Member

So first I would need to study this in more detail - to be honest I'm not entirely sure exactly what include_recipe does as such and what the implications of it potentially being pulled in more than once are. I assume it's much the same as listing the recipe in the role except that a topological sort is done to order the overall set of recipes correctly which should be fine.
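
For reference, Chef's documentation describes include_recipe as loading the named recipe at the point of the call, with later includes of an already-loaded recipe being ignored, so as far as the documentation goes it is an in-order expansion rather than a topological sort. The toy model below is plain Ruby and not Chef's implementation; the recipe contents and resource names are invented for illustration.

```ruby
require "set"

# Toy model of a Chef run (NOT Chef's implementation). Each recipe is a list
# of steps: [:include, "other::recipe"] or [:resource, "name"].
RECIPES = {
  "wiki::default"      => [[:include, "mediawiki::default"],
                           [:resource, "mediawiki_site[wiki]"]],
  "mediawiki::default" => [[:include, "apt::default"],
                           [:include, "accounts::default"],
                           [:resource, "package[mediawiki-deps]"]],
  "munin::default"     => [[:include, "apt::default"],
                           [:resource, "package[munin-node]"]],
  "apt::default"       => [[:resource, "apt_repository[example]"]],
  "accounts::default"  => [[:resource, "user[wiki]"]]
}.freeze

# Expands a run list depth-first and in order, loading each recipe at most once.
def compile(run_list, recipes = RECIPES)
  seen = Set.new
  resources = []
  load_recipe = lambda do |name|
    # Set#add? returns nil when the name is already present, so a repeated
    # include is a no-op instead of a second copy of the recipe.
    return unless seen.add?(name)

    recipes.fetch(name).each do |kind, target|
      kind == :include ? load_recipe.call(target) : resources << target
    end
  end
  run_list.each { |name| load_recipe.call(name) }
  resources
end

puts compile(["munin::default", "wiki::default"])
```

In this model apt::default is pulled in twice but its resources appear once, ahead of everything that included it, which is the property that makes repeated includes harmless.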

The larger problem is that many of our recipes make no sense, or won't work at all, without some sort of configuration - so there is no point for example pulling in the recipe to configure the network without the host role that contains the attributes describing the network configuration for the machine. That's an extreme example perhaps as not many things will need to pull that in but the general scope of this problem is mind boggling.

I realise that our chef configuration is not at all modern, and may not even be done well on its own terms, but fixing that is a separate issue to writing tests - we shouldn't try and redesign our chef setup as a side effect of adding a test, it should be a separate task.

The trickiest issues do generally revolve around shared resources like users and apt repositories which is why we configure them in slightly odd ways and why you can't just use a resource in the way @migurski was initially trying to do. That said for relatively simple cases like here just including the recipes is probably enough so long as you at least have the wiki role which turns on the right accounts and repositories.

As an aside I would just point out that the wiki cookbook is probably not the best place to start - that is specific to configuring wiki.openstreetmap.org and it would probably be better to start with the mediawiki cookbook which handles generic installation of a mediawiki instance - the wiki cookbook then builds on that to create one specific wiki.

Generally speaking I would love to have tests, although I remain unconvinced that the kind of granularity @gravitystorm is proposing is practical and/or useful - mostly my concern is not "will cookbook X run in some default configuration" but "will the total set of things deployed on machine X work" which is a rather different question. That said, individual cookbook tests are a good start; I just worry about having to duplicate things to make it work, and that actually increasing the risk in real deployments from having conflicting information.

Really the main reason I haven't pursued tests more though is that running them turns out to be such a nightmare, and so slow, that I find them impractical to use in real life. That's why we have so few and why I stopped after writing a few of the simpler ones.

@gravitystorm
Contributor

> As an aside I would just point out that the wiki cookbook is probably not the best place to start

The reason that I've been suggesting the wiki cookbook is that it's the system that has the most outside interest, i.e. from non-sysadmins, over the last 3 years. We have people who want to change the configuration of wiki.openstreetmap.org, and it would be easier for them if they could converge the recipes locally, and easier for admins if it was easy to check PRs locally before deploying them. But without the ability to test convergence using test-kitchen, outside contributors are coding blind, PRs are hard to review, and we only find out if the changes actually work when we run them on the live servers.

Sure, if test coverage was my goal, I'd suggest starting elsewhere! But my goal is to get other people involved in this repo, so targeting the cookbook with the most outside interest seems like the right approach, since it's the most likely one to get external PRs.

> mostly my concern is not "will cookbook X run in some default configuration" but "will the total set of things deployed on machine X work" which is a rather different question.

That's totally understandable. But for other contributors wanting to try writing some code, it's probably better to have smaller testable options (like testing individual cookbooks), rather than having to run an all-up integration test each time they change a line of code.

> Really the main reason I haven't pursued tests more though is that running them turns out to be such a nightmare, and so slow, that I find them impractical to use in real life.

Sure, it's definitely an overhead if you're happy to commit to the repo and run the changes on the live servers to check that they work. And without splitting the repo up into smaller parts, it's unrealistic to run any CI service (imagine having it converge every cookbook when a PR only affects one of them!). And we haven't even touched on using test-kitchen to actually do some tests, since we're so far only checking if the code converges as a binary yes/no, and not doing anything like using chefspec or a kitchen verifier to make sure e.g. services are running or throwing configuration errors.

But having the recipes and roles converging locally - as a bare minimum step - is still useful and helps other people get involved, and that's what I think we should work together on.

@tomhughes
Member

> The reason that I've been suggesting the wiki cookbook is that it's the system that has the most outside interest, i.e. from non-sysadmins, over the last 3 years. We have people who want to change the configuration of wiki.openstreetmap.org, and it would be easier for them if they could converge the recipes locally, and easier for admins if it was easy to check PRs locally before deploying them. But without the ability to test convergence using test-kitchen, outside contributors are coding blind, PRs are hard to review, and we only find out if the changes actually work when we run them on the live servers.

I was really just trying to say that the wiki cookbook probably isn't what people think it is - most of the configuration of the wiki is actually in the mediawiki cookbook.

The biggest problem with the wiki anyway is not having anybody that is confident enough in mediawiki to be able to review the PRs and requests.

> Sure, it's definitely an overhead if you're happy to commit to the repo and run the changes on the live servers to check that they work. And without splitting the repo up into smaller parts, it's unrealistic to run any CI service (imagine having it converge every cookbook when a PR only affects one of them!). And we haven't even touched on using test-kitchen to actually do some tests, since we're so far only checking if the code converges as a binary yes/no, and not doing anything like using chefspec or a kitchen verifier to make sure e.g. services are running or throwing configuration errors.

Actually some of them do test that the right services are running afterwards!

My real point though is that whenever I try and run them I spend the first three hours trying to convince vagrant to work and then every test run takes about ten minutes because it has to spin up a VM which it then probably forgets to shutdown again so I find them cluttering up my system weeks later :-(

I wish I knew a good solution to this!

@gravitystorm
Contributor

> Actually some of them do test that the right services are running afterwards!

Awesome :-)

> then probably forgets to shutdown again

Yeah, running 'converge' leaves the VM available for inspection; you need to run 'destroy' to get rid of it. That's much less frustrating than running 'test', which destroys the VM automatically at the end of the run - that makes development super slow!

> I wish I knew a good solution to this!

I use container-based drivers (kitchen-docker, see also kitchen-dokken) whenever possible, since they are faster to spin up than a VM, but there are some networking edge cases that they can't handle. So they are more useful for individual cookbook testing and less useful for all-up role tests. We can mix and match them in the kitchen config - individual suites can have different drivers.

@mmd-osm
Contributor

mmd-osm commented Jan 16, 2020

> I spend the first three hours trying to convince vagrant to work and then every test run takes about ten minutes because it has to spin up a VM which it then probably forgets to shutdown again so I find them cluttering up my system weeks later :-(

> I use container-based drivers (kitchen-docker, see also kitchen-dokken) whenever possible,

I described this in more detail here: #74 (comment) - It's much more lightweight than virtualbox. Unfortunately, it's not really documented anywhere in this repo.

@tomhughes
Member

Well frankly I hate docker as well, though in theory podman should be able to stand in and avoid many of its problems. I am working on proving that theory - so far there are a few edge cases with the compatibility.

I think originally we came to the conclusion that containers wouldn't work because they didn't have an init so you couldn't start services or anything. I know a container can have an init but is kitchen actually able to use it in that way?

Finally I am somewhat concerned that all the URLs to kitchen.ci on the kitchen github are dead - the domain no longer exists. Is kitchen dead technology?

@migurski
Contributor Author

migurski commented Jan 16, 2020

The kitchen.ci DNS failure also gives me pause. Maybe there will be an update on https://ourincrediblejourney.tumblr.com/ … 

Reading back through this thread it sounds like my PR is still an incremental step forward even though it doesn’t quite address all problems. I am happy to continue poking at this if it is useful. I wonder if we can merge this set of changes before starting the next set? I’m interested to hear a resolution on the include_recipe question for example. I definitely agree that being able to run a selected piece of OSM’s infra locally in order to make changes is vital for ensuring quality community suggestions.

As far as testing goes, I am tinkering with getting these tests running on a clean Ubuntu system. Right now I’m struggling to get past an early error:

Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.

If it worked, this could be configured to run in parallel on a temporary EC2 server where it might be integrated into the CI for this repository?

kitchen test --parallel --destroy=always all && notify-passed || notify-failed

@migurski
Contributor Author

Poking this thread another time: I think I’ve made the requested changes, and this PR incrementally improves the repo even if the wiki cookbook does not yet fully complete.

@tomhughes
Member

I'm still working on it but I need to get my test kitchen environment up and working again before I can look at this in detail.

@mmd-osm
Contributor

mmd-osm commented Jan 19, 2020

For the fun of it, I was following the steps in https://www.openstreetmap.org/user/migurski/diary/391942 on Ubuntu 18.04 / virtualbox. Running the script took about 25 minutes and failed with the same error message that was mentioned for Mac OS / Vagrant in the blog post:

The error is caused by the CheckUser wiki extension.

       Creating cu_log table ...done.
       ...index cuc_ip_hex_time already set on cu_changes table.
       ...index cuc_user_ip_time already set on cu_changes table.
       ...have cuc_private field in cu_changes table.
       ...site_stats is populated...done.
       ...Update 'populate rev_len and ar_len' already logged as completed.
       ...Update 'populate rev_sha1' already logged as completed.
       ...img_sha1 column of image table already populated.
       ...protocol-relative URLs in externallinks table already fixed.
       ...fa_sha1 column of filearchive table already populated.
       ...*_from_namespace column of backlink tables already populated.
       ...Update 'FixDefaultJsonContentPages' already logged as completed.
       ...Update 'cleanup empty categories' already logged as completed.
       ...RFC and PMID already added to interwiki database table.
       ...Update 'populate pp_sortkey' already logged as completed.
       ...Update 'populate ip_changes' already logged as completed.
       ...externallinks table indexes up to date
       Starting poulation of cu_changes with recentchanges rc_id from 1 to 100
       ...migrating rc_id from 1 to 100
       STDERR: [ccaedd880af180fc311d3c19] [no req]   Wikimedia\Rdbms\DBQueryError from line 1587 of /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
       Query: INSERT  INTO `cu_changes` (cuc_timestamp,cuc_user,cuc_user_text,cuc_namespace,cuc_title,cuc_comment,cuc_minor,cuc_page_id,cuc_this_oldid,cuc_last_oldid,cuc_type,cuc_ip,cuc_ip_hex) VALUES ('20200119093941',NULL,'MediaWiki default','0','Main_Page','','0','1','1','0','1','127.0.0.1','7F000001')
       Function: PopulateCheckUserTable::doDBUpdates
       Error: 1048 Column 'cuc_user' cannot be null (localhost)
       
       Backtrace:
       #0 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(1556): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
       #1 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(1274): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
       #2 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(2149): Wikimedia\Rdbms\Database->query(string, string)
       #3 /srv/wiki.openstreetmap.org/w/extensions/CheckUser/maintenance/populateCheckUserTable.php(98): Wikimedia\Rdbms\Database->insert(string, array, string)
       #4 /srv/wiki.openstreetmap.org/w/maintenance/Maintenance.php(1719): PopulateCheckUserTable->doDBUpdates()
       #5 /srv/wiki.openstreetmap.org/w/maintenance/update.php(215): LoggedUpdateMaintenance->execute()
       #6 /srv/wiki.openstreetmap.org/w/maintenance/doMaintenance.php(96): UpdateMediaWiki->execute()
       #7 /srv/wiki.openstreetmap.org/w/maintenance/update.php(275): require_once(string)
       #8 {main}
       ---- End output of php maintenance/update.php --quick ----
       Ran php maintenance/update.php --quick returned 255
       
-----> Destroying <wiki-ubuntu-1804>...
       ==> default: Forcing shutdown of VM...
       ==> default: Destroying VM and associated drives...
       Vagrant instance <wiki-ubuntu-1804> destroyed.
       Finished destroying <wiki-ubuntu-1804> (0m4.46s).
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Converge failed on instance <wiki-ubuntu-1804>.  Please see .kitchen/logs/wiki-ubuntu-1804.log for more details
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

@mmd-osm
Contributor

mmd-osm commented Jan 19, 2020

By the way, a good way to speed things up a bit is to add extra CPUs and memory:

platforms:
  # Ubuntu 18.04 is what’s used on the live OSM servers,
  # according to https://hardware.openstreetmap.org/
  - name: ubuntu-18.04
    driver:
      customize:
        cpus: 4
        memory: 4096

@mmd-osm
Contributor

mmd-osm commented Jan 19, 2020

Besides the CheckUser extension issue, there's also an issue with generating SSL certificates, which currently fails inside test-kitchen:

FATAL: Net::HTTPServerException: mediawiki_site[wiki.openstreetmap.org] (wiki::default line 30) had an error: Net::HTTPServerException: ssl_certificate[wiki.openstreetmap.org] (/tmp/kitchen/cache/cookbooks/mediawiki/resources/site.rb line 489) had an error: Net::HTTPServerException: 404 "Not Found"

@tomhughes
Member

Issuing SSL certificates requires co-operation with another machine, but it should just create a self signed one if that's not available.
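
A rough sketch of the fallback described above, using Ruby's standard OpenSSL library. The `issuer` callable, the method names, the example domain and the 30-day lifetime are all assumptions made for illustration; this is not the code of the ssl_certificate resource in this repository.

```ruby
require "openssl"

# Builds a throwaway self-signed certificate for one domain.
# Returns [certificate, private_key].
def self_signed_certificate(domain)
  key = OpenSSL::PKey::RSA.new(2048)
  name = OpenSSL::X509::Name.new([["CN", domain]])

  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # value 2 means X.509 v3
  cert.serial = 1
  cert.subject = name
  cert.issuer = name # issuer equals subject, i.e. self-signed
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = Time.now + 30 * 24 * 60 * 60 # assumed 30-day lifetime
  cert.sign(key, OpenSSL::Digest.new("SHA256"))

  [cert, key]
end

# Asks the (hypothetical) issuer first and falls back to a self-signed
# certificate when the issuer cannot be reached or returns an error.
def certificate_for(domain, issuer:)
  issuer.call(domain)
rescue StandardError => e
  warn "issuing certificate failed (#{e.message}), using a self-signed one"
  self_signed_certificate(domain)
end

# Simulates a test environment where the issuing machine is unreachable.
unreachable_issuer = ->(_domain) { raise IOError, "certificate host not reachable" }
cert, _key = certificate_for("wiki.example.test", issuer: unreachable_issuer)
puts cert.subject
```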

I repeat though that I am working on this as fast as I can. As well as several evenings last week I have spent this afternoon on it as well. There is really no point spending a lot of time on a new test until the existing ones are working properly again.

@mmd-osm
Contributor

mmd-osm commented Jan 19, 2020

No worries. I have those tests running in the background, check from time to time how they're doing and post some results here. The idea is to get a rough overview of what kind of issues lie ahead and need some kind of fix eventually.

@migurski
Contributor Author

@mmd-osm, can you share any details on how you installed vagrant on Ubuntu to get as far as you did? So far I’ve only been able to get to this stage on Mac OS as the host.

@mmd-osm
Contributor

mmd-osm commented Jan 19, 2020

I played with test kitchen and the osm-chef repo back in 2018. It may very well be that I encountered the same issue as you did and somehow fixed it (likely by something I found on StackOverflow). But quite honestly I don't recall the details.

One thing you could look into is RSA/DSS keys for remote login via SSH. Those tend to cause some issues sometimes. Also when you open up virtualbox, there's an option to inspect the log file for the respective VM.

@mmd-osm
Contributor

mmd-osm commented Jan 19, 2020

Latest update from my side: after temporarily turning off the CheckUser wiki extension and the SSL for Apache, the wiki cookbook actually builds without error. So those would be the two topics that need some closer look.

Log file:
       Recipe: wiki::default
         * mediawiki_site[wiki.openstreetmap.org] action update
           * template[/srv/wiki.openstreetmap.org/w/LocalSettings.php] action create
             - update content in file /srv/wiki.openstreetmap.org/w/LocalSettings.php from 2da3dd to 5024f9
             --- /srv/wiki.openstreetmap.org/w/LocalSettings.php	2020-01-19 18:40:37.410909028 +0000
             +++ /srv/wiki.openstreetmap.org/w/.chef-LocalSettings20200119-2220-1mvw4s7.php	2020-01-19 18:47:34.807503028 +0000
             @@ -331,6 +331,7 @@
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-CirrusSearch.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-DisableAccount.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Skin-Vector.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Scribunto.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Poem.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-AntiSpoof.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-CleanChanges.inc.php');
             @@ -339,6 +340,9 @@
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-ParserFunctions.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-VisualEditor.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-UniversalLanguageSelector.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-MultiMaps.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-CodeEditor.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-CodeMirror.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-LocalisationUpdate.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-DismissableSiteNotice.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Elastica.inc.php');
             @@ -353,15 +357,20 @@
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Skin-MinervaNeue.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-InputBox.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-WikiEditor.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-OsmWikibase.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-MobileFrontend.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Nuke.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Skin-Modern.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-SpamBlacklist.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Skin-CologneBlue.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-osmtaginfo.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Echo.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Babel.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-ConfirmEdit.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-AbuseFilter.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-TimedMediaHandler.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-PdfHandler.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Thanks.inc.php');
       require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Skin-MonoBook.inc.php');
             +require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Wikibase.inc.php');
           * execute[/srv/wiki.openstreetmap.org/w/maintenance/update.php] action run
             - execute php maintenance/update.php --quick
           * execute[/srv/wiki.openstreetmap.org/w/maintenance/update.php] action run
             - execute php maintenance/update.php --quick
         
       Recipe: elasticsearch::default
         * service[elasticsearch] action restart
           - restart service service[elasticsearch]
       
       Running handlers:
       Running handlers complete
       Chef Infra Client finished, 460/686 resources updated in 26 minutes 04 seconds
       Downloading files from <wiki-ubuntu-1804>
       Finished converging <wiki-ubuntu-1804> (26m52.12s).
-----> Setting up <wiki-ubuntu-1804>...
       Finished setting up <wiki-ubuntu-1804> (0m0.00s).
-----> Verifying <wiki-ubuntu-1804>...
       Preparing files for transfer
       Transferring files to <wiki-ubuntu-1804>
       Finished verifying <wiki-ubuntu-1804> (0m0.00s).
-----> Destroying <wiki-ubuntu-1804>...
       ==> default: Forcing shutdown of VM...
       ==> default: Destroying VM and associated drives...
       Vagrant instance <wiki-ubuntu-1804> destroyed.
       Finished destroying <wiki-ubuntu-1804> (0m4.89s).
       Finished testing <wiki-ubuntu-1804> (27m50.82s).
-----> Test Kitchen is finished. (27m51.01s)
Diff:
diff --git a/.kitchen.yml b/.kitchen.yml
index 687a142a..d65a0b53 100644
--- a/.kitchen.yml
+++ b/.kitchen.yml
@@ -13,6 +13,10 @@ platforms:
   # Ubuntu 18.04 is what’s used on the live OSM servers,
   # according to https://hardware.openstreetmap.org/
   - name: ubuntu-18.04
+    driver:
+      customize:
+        cpus: 4
+        memory: 4096
 
 suites:
   - name: accounts
diff --git a/cookbooks/mediawiki/resources/site.rb b/cookbooks/mediawiki/resources/site.rb
index 55228af0..94057f41 100644
--- a/cookbooks/mediawiki/resources/site.rb
+++ b/cookbooks/mediawiki/resources/site.rb
@@ -393,11 +393,11 @@ action :create do
     update_site false
   end
 
-  mediawiki_extension "CheckUser" do
-    site new_resource.site
-    template "mw-ext-CheckUser.inc.php.erb"
-    update_site false
-  end
+#  mediawiki_extension "CheckUser" do
+#    site new_resource.site
+#    template "mw-ext-CheckUser.inc.php.erb"
+#    update_site true
+#  end
 
   mediawiki_extension "DismissableSiteNotice" do
     site new_resource.site
@@ -486,9 +486,9 @@ action :create do
     backup false
   end
 
-  ssl_certificate new_resource.site do
-    domains [new_resource.site] + Array(new_resource.aliases)
-  end
+#  ssl_certificate new_resource.site do
+#    domains [new_resource.site] + Array(new_resource.aliases)
+#  end
 
   apache_site new_resource.site do
     cookbook "mediawiki"
diff --git a/cookbooks/mediawiki/templates/default/apache.erb b/cookbooks/mediawiki/templates/default/apache.erb
index d5485a62..36715815 100644
--- a/cookbooks/mediawiki/templates/default/apache.erb
+++ b/cookbooks/mediawiki/templates/default/apache.erb
@@ -23,9 +23,9 @@
 
   ServerAdmin webmaster@openstreetmap.org
 
-  SSLEngine on
-  SSLCertificateFile /etc/ssl/certs/<%= @name %>.pem
-  SSLCertificateKeyFile /etc/ssl/private/<%= @name %>.key
+#  SSLEngine on
+#  SSLCertificateFile /etc/ssl/certs/<%= @name %>.pem
+#  SSLCertificateKeyFile /etc/ssl/private/<%= @name %>.key
 
   CustomLog /var/log/apache2/<%= @name %>-secure-access.log combined
   ErrorLog /var/log/apache2/<%= @name %>-secure-error.log

@migurski
Contributor Author

Great @mmd-osm! I expect @tomhughes to have an opinion on this. If I can get this cookbook running under Ubuntu without incident, I’d be willing to set up and fund a three-month experiment with EC2 to see if working tests support better contributions here.

@mmd-osm
Contributor

mmd-osm commented Jan 20, 2020

Let's reflect for a moment about the ultimate goals of this thing:

> We have people who want to change the configuration of wiki.openstreetmap.org, and it would be easier for them if they could converge the recipes locally, and easier for admins if it was easy to check PRs locally before deploying them

I guess the EC2 instance wouldn't help with the first point (local testing). Regarding the second point, would an admin be able to log on to the EC2 instance and do a live test of those changes? I mean right now all we get back from test kitchen is some status that the build was successful. It won't tell us anything about functional correctness:

Like in my example I removed SSL and an important Wiki extension to fight spam, still the overall test kitchen result was successful. Deploying such a change for sure would cause some major issues.
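
To make that gap concrete, here is a plain-Ruby sketch of the kind of functional check that would have caught both removals. It is only an illustration: a real setup would more likely use a kitchen verifier such as InSpec, the method name is hypothetical, and the Ext-CheckUser.inc.php file name is assumed from the Ext-<Name>.inc.php pattern in the log above.

```ruby
# Hypothetical post-converge check: returns a list of problems found in the
# rendered Apache vhost and LocalSettings.php contents (passed in as strings).
def functional_problems(vhost:, local_settings:)
  problems = []

  # A commented-out "#  SSLEngine on" must not count, so require the
  # stripped line to start with the directive itself.
  ssl_on = vhost.each_line.any? { |line| line.strip.start_with?("SSLEngine on") }
  problems << "SSL is not enabled in the Apache vhost" unless ssl_on

  # Assumed file name, following the Ext-<Name>.inc.php pattern in the log.
  check_user = local_settings.each_line.any? do |line|
    line.strip.start_with?("require_once") && line.include?("Ext-CheckUser.inc.php")
  end
  problems << "CheckUser extension is not loaded" unless check_user

  problems
end

# Inputs resembling the converged-but-broken state from the diff above.
vhost = <<~CONF
  ServerAdmin webmaster@openstreetmap.org
  #  SSLEngine on
CONF
settings = "require_once('/srv/wiki.openstreetmap.org/w/LocalSettings.d/Ext-Nuke.inc.php');\n"

puts functional_problems(vhost: vhost, local_settings: settings)
```

A converge that succeeds while this check reports problems is exactly the case described above: the run is green but the deployed wiki would be missing SSL and CheckUser.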

Which pieces are really missing here to turn this into something useful?

@tomhughes
Member

What's missing is for you to all be quiet for a few days until I can investigate properly and find out what the real problems are and how to fix them!

I'm not really sure what EC2 has to do with anything, or how it would be used.

@Firefishy
Member

🎆 IT PASSED! https://travis-ci.org/openstreetmap/chef/jobs/650967907

@migurski
Contributor Author

It is a beast of a test!

The funky CheckUser fork might be a cause for concern, though I see there are other instances of MW plugins that we host ourselves. Curious to see if MW accepts our changes.

@tomhughes
Member

@Firefishy I'm waiting to see what happens on the mediawiki bug - I'd prefer not to switch to a fork and I definitely don't want to deploy this without some indication from upstream that the change is not going to break things.

@migurski
Contributor Author

I’m continuing to rebase as needed while waiting for follow-up on https://phabricator.wikimedia.org/T244283

migurski and others added 5 commits March 17, 2020 21:07
- Added missing databags to fix transient errors
- Added wiki test suite with appropriate cookbooks
…wiki install

Fixes CheckUser failure logged at https://travis-ci.org/openstreetmap/chef/jobs/650204578

    Wikimedia\Rdbms\DBQueryError from line 1587 of /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
    Query: INSERT  INTO `cu_changes` (cuc_timestamp,cuc_user,cuc_user_text,cuc_namespace,cuc_title,cuc_comment,cuc_minor,cuc_page_id,cuc_this_oldid,cuc_last_oldid,cuc_type,cuc_ip,cuc_ip_hex) VALUES ('20200203011622',NULL,'MediaWiki default','0','Main_Page','','0','1','1','0','1','127.0.0.1','7F000001')
    Function: PopulateCheckUserTable::doDBUpdates
    Error: 1048 Column 'cuc_user' cannot be null (localhost)
    Backtrace:
    #0 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(1556): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
    #1 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(1274): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
    #2 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(2149): Wikimedia\Rdbms\Database->query(string, string)
    #3 /srv/wiki.openstreetmap.org/w/extensions/CheckUser/maintenance/populateCheckUserTable.php(98): Wikimedia\Rdbms\Database->insert(string, array, string)
    #4 /srv/wiki.openstreetmap.org/w/maintenance/Maintenance.php(1719): PopulateCheckUserTable->doDBUpdates()
    #5 /srv/wiki.openstreetmap.org/w/maintenance/update.php(215): LoggedUpdateMaintenance->execute()
    #6 /srv/wiki.openstreetmap.org/w/maintenance/doMaintenance.php(96): UpdateMediaWiki->execute()
    #7 /srv/wiki.openstreetmap.org/w/maintenance/update.php(275): require_once(string)
    #8 {main}
@migurski
Contributor Author

migurski commented Mar 18, 2020

MW closed https://phabricator.wikimedia.org/T244283 earlier this morning so I’ve removed the fork of CheckUser and rebased this PR to re-run tests. The wiki test runs clean now!

I looked at each of the other test failures, and all but one fail because of the same regrettably flaky git.openstreetmap.org checkout:

Failed to connect to git.openstreetmap.org port 443: Connection timed out

The civicrm test fails because it can’t check out veda-consulting-company/uk.co.vedaconsulting.mailchimp@25798ff, though the ref named in the attributes does still exist:

default[:civicrm][:extensions][:mailchimp][:name] = "uk.co.vedaconsulting.mailchimp"
default[:civicrm][:extensions][:mailchimp][:repository] = "https://github.com/veda-consulting/uk.co.vedaconsulting.mailchimp.git"
default[:civicrm][:extensions][:mailchimp][:revision] = "25798ffe209650f12162a0cdc3c7ee9203d950f2"

Might be worth bringing that Mailchimp plugin up to a newer commit or master. Here’s what’s changed since 25798ffe.

@tomhughes tomhughes merged commit 9d3b263 into openstreetmap:master Mar 18, 2020
migurski added a commit to migurski/chef that referenced this pull request Mar 19, 2020
migurski added a commit to migurski/chef that referenced this pull request Mar 19, 2020
migurski added a commit to migurski/chef that referenced this pull request Mar 19, 2020
@mmd-osm
Contributor

mmd-osm commented Apr 4, 2020

Regarding the timeout issues we're seeing with GitHub Actions, I would actually raise an issue with the GitHub Actions team for further analysis. Maybe we're hitting some kind of limit?

This doesn't only affect git.openstreetmap.org but also fetching keys from Ubuntu keyservers, and some other external sites that are being queried in some of the cookbooks. All of them bail out with a timeout once in a while.

Moving git.openstreetmap.org to another DC won't be sufficient to address this issue in that case. In theory, there might also be some subtle bug in the Docker + dokken + test-kitchen approach we're using.
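
As a starting point for that kind of analysis, the timeouts could be tallied per host from saved CI logs. The following is a minimal sketch in plain Ruby, assuming the failures appear in the log in the same form as the git.openstreetmap.org error quoted earlier in this thread. The helper name `timeouts_by_host` and the host `keyserver.example.org` are hypothetical.

```ruby
# Matches connection errors of the form quoted earlier in the thread:
#   Failed to connect to git.openstreetmap.org port 443: Connection timed out
TIMEOUT_PATTERN = /Failed to connect to (\S+) port (\d+): Connection timed out/

# Hypothetical helper: counts timeout errors per host in a block of log text.
def timeouts_by_host(log_text)
  counts = Hash.new(0)
  log_text.each_line do |line|
    match = TIMEOUT_PATTERN.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

# Sample log text; "keyserver.example.org" is a placeholder host.
sample = <<~LOG
  Failed to connect to git.openstreetmap.org port 443: Connection timed out
  Some unrelated line
  Failed to connect to keyserver.example.org port 443: Connection timed out
  Failed to connect to git.openstreetmap.org port 443: Connection timed out
LOG

timeouts_by_host(sample).each { |host, count| puts "#{host}: #{count}" }
```

If the counts turn out to be spread across many unrelated hosts, that would point away from git.openstreetmap.org as the sole cause.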

@tomhughes
Member

I believe @grischard is already working on making contact.

@mmd-osm
Contributor

mmd-osm commented Apr 4, 2020

Ok, sounds good. I've read this bit here on https://wiki.osmfoundation.org/wiki/Operations_Working_Group/Minutes/2020-04-10 and thought that there might be a bit of root cause analysis needed.

Fixing the failing CI tests

Issue: Unreliable https://github.com/openstreetmap/chef/actions
Should we make Github Primary? Should we move git.openstreetmap.org to another data centre?
