Fixed small bugs after trying wiki test-kitchen#259
Fixed small bugs after trying wiki test-kitchen#259tomhughes merged 5 commits intoopenstreetmap:masterfrom
Conversation
|
I made some edits in 4542304 that specify the |
|
In 638131d I completed the addition of the "wiki" user account in the The Mediawiki cookbook still seems buggy to me. On a fresh run, there’s a SQL error which will disappear on a second run: https://gist.github.com/migurski/e2096952dbff2779223751d9ab5fa7d5 — this non-idempotent behavior makes me very uncomfortable. |
|
Pretty sure it worked fine last time I deployed a new wiki but mediawiki could easily have changed/broken something in between. They do like making life hard for us so it's all a bit hit and miss how well things work I'm afraid. |
|
I moved to a style that I think matches what’s in I’ve also tried moving The method call is here: |
|
Have you added |
|
Good hint. I did that, and then found an early fatal error due to a missing |
As I commented earlier, the right way to do things is to make each cookbook or role converge on its own. Putting dependent recipes in the test-kitchen config is just a workaround for these missing dependencies. I don't understand what advantage there is to having all these missing dependencies, but from the comments above, @tomhughes seems quite keen on keeping it that way. 🤷♂️
Yes, the problem is the cookbooks need to be fixed in order to include their implicit dependencies! Working around that (via the test-kitchen config) is a workaround - and not something we should perpetuate when a volunteer is volunteering to fix it properly by using Still, it's @tomhughes who gets the final say on this topic, so if we can't persuade him otherwise, I guess we keep layering on the workarounds. |
|
So first I would need to study this in more detail - to be honest I'm not entirely sure exactly what The larger problem is that many of our recipes make no sense, or won't work at all, without some sort of configuration - so there is no point for example pulling in the recipe to configure the network without the host role that contains the attributes describing the network configuration for the machine. That's an extreme example perhaps as not many things will need to pull that in but the general scope of this problem is mind boggling. I realise that our chef configuration is not at all modern, and may not even be done well on it's own terms, but fixing that is a separate issue to writing tests - we shouldn't try and redesign our chef setup as a side effect of adding a test, it should be a separate task. The trickiest issues do generally revolve around shared resources like users and apt repositories which is why we configure them in slightly odd ways and why you can't just use a resource in the way @migurski was initially trying to do. That said for relatively simple cases like here just including the recipes is probably enough so long as you at least have the As an aside I would just point out that the Generally speaking I would love to have tests, although I remain unconvinced that the kind of granularity @gravitystorm is practical and/or useful - mostly my concern is not "will cookbook X run in some default configuration" but "will the total set of things deployed on machine X work" which is a rather different question. That said individual cookbook tests is a good start I just worry about having to duplicate things to make it work and that actually increasing the risk in real deployments from having conflicting information. Really the main reason I haven't pursued tests more though is that running them turns out to be such a nightmare, and so slow, that I find them impractical to use in real life. That's why we have so few and why I stopped after writing a few of the simpler ones. |
The reason that I've been suggesting the wiki cookbook is that it's the system that has the most outside interest, i.e. from non-sysadmins, over the last 3 years. We have people who want to change the configuration of wiki.openstreetmap.org, and it would be easier for them if they could converge the recipes locally, and easier for admins if it was easy to check PRs locally before deploying them. But without the ability to test convergence using test-kitchen, outside contributors are coding blind, PRs are hard to review, and we only find out if the changes actually work when we run them on the live servers. Sure, if test coverage was my goal, I'd suggest starting elsewhere! But my goal is to get other people involved in this repo, so targetting the cookbook with the most outside interest seems like the right approach, since it's the most likely one to get external PRs.
That's totally understandable. But for other contributors wanting to try writing some code, it's probably better to have smaller testable options (like testing individual cookbooks), rather than having to run an all-up integration test each time they change a line of code.
Sure, it's definitely an overhead if you're happy to commit to the repo and run the changes on the live servers to check that they work. And without splitting the repo up into smaller parts, it's unrealistic to run any CI service (imagine having it converge every cookbook when a PR only affects one of them!). And we haven't even touched on using test-kitchen to actually do some tests, since we're so far only checking if the code converges as a binary yes/no, and not doing anything like using chefspec or a kitchen verifier to make sure e.g. services are running or throwing configuration errors. But having the recipes and roles converging locally - as a bare minimum step - is still useful and helps other people get involved, and that's what I think we should work together on. |
I was really just trying to say that the The biggest problem with the wiki anyway is not having anybody that is confident enough in mediawiki to be able to review the PRs and requests.
Actually some of them do test that the right services are running afterwards! My real point though is that whenever I try and run them I spend the first three hours trying to convince vagrant to work and then every test run takes about ten minutes because it has to spin up a VM which it then probably forgets to shutdown again so I find them cluttering up my system weeks later :-( I wish I knew a good solution to this! |
Awesome :-)
Yeah, running 'converge' leaves the VM available for inspection, you need to run 'destroy' to get rid of it. That's much less frustrating than running 'test', which destroys the vm automatically at the end of the run - that makes development super slow!
I use container-based drivers (kitchen-docker, see also kitchen-dokken) whenever possible, since they are faster to spin up than a VM, but there's some networking edge cases that they can't handle. So they are more useful for individual cookbook testing and less useful for all-up role tests. We can mix and match them in the kitchen config - individual suites can have different drivers. |
I described this in more detail here: #74 (comment) - It's much more lightweight than virtualbox . Unfortuantely, it's not really documented anywhere in this repo. |
|
Well frankly I hate docker as well, though in theory podman should be able to stand in and avoid many of it's problems. I am working on proving that theory - so far there are a few edge cases with the compatibility. I think originally we came to the conclusion that containers wouldn't work because they didn't have an init so you couldn't start services or anything. I know a container can gave an init but is kitchen actually able to use it in that way? Finally I am somewhat concerned that all the URLs to kitchen.ci on the kitchen github are dead - the domain no longer exists. Is kitchen dead technology? |
|
The kitchen.ci DNS failure also gives me pause. Maybe there will be an update on https://ourincrediblejourney.tumblr.com/ … Reading back through this thread it sounds like my PR is still an incremental step forward even though it doesn’t quite address all problems. I am happy to continue poking at this if it is useful. I wonder if we can merge this set of changes before starting the next set? I’m interested to hear a resolution on the As far as testing goes, I am tinkering with getting these tests running on a clean Ubuntu system. Right now I’m struggling to get past an early error: If it worked, this could be configured to run in parallel on a temporary EC2 server where it might be integrated into the CI for this repository? |
|
Poking this thread another time: I think I’ve made the requested changes, and this PR incrementally improves the repo even if the |
|
I'm still working on it but I need to get my test kitchen environment up and working again before I can look at this in detail. |
|
For the fun of it, I was following the steps in https://www.openstreetmap.org/user/migurski/diary/391942 on Ubuntu 18.04 / virtualbox. Running the script took about 25 minutes and failed with the same error message that was mentioned for Mac OS / Vagrant in the blog post: The error is caused by the CheckUser wiki extension. |
|
By the way, a good way to speed things up a bit is to add extra CPUs and memory: |
|
Besides the CheckUser extension issue, there's also an issue with generating SSL certificates, which currently fails inside test-kitchen: |
|
Issuing SSL certificates requires co-operation with another machine, but it should just create a self signed one if that's not available. I repeat though that I am working on this as fast as I can. As well as several evenings last week I have spent this afternoon on it as well. There is really no point spending a lot of time on a new test until the existing ones are working properly again. |
|
No worries. I have those tests running in the background, check from time to time how they're doing and post some results here. The idea is to get a rough overview of what kind of issues lie ahead and need some kind of fix eventually. |
|
@mmd-osm, can you share any details on how you installed vagrant on Ubuntu to get as far as you did? So far I’ve only been able to get to this stage on Mac OS as the host. |
|
I played with test kitchen and the osm-chef repo back in 2018. It may very well be that I encountered the same issue as you did and somehow fixed it (likely by something I found on StackOverflow). But quite honestly I don't recall the details. One thing you could look into is RSA/DSS keys for remote log on via SSH. Those tend to cause some issues sometime. Also when you open up virtualbox, there's an option to inspect the log file for the respective VM. |
|
Latest update from my side: after temporarily turning off the CheckUser wiki extension and the SSL for Apache, the wiki cookbook actually builds without error. So those would be the two topics that need some closer look. Log file (click to expand)Diff: (click to expand) |
|
Great @mmd-osm! I expect @tomhughes to have an opinion on this. If I can get this cookbook running under Ubuntu without incident, I’d be willing to set up and fund a three-month experiment with EC2 to see if working tests support better contributions here. |
|
Let's reflect for a moment about the ultimate goals of this thing:
I guess the EC2 instance wouldn't help with the first point (local testing). Regarding the second point, would an admin be able to log on to the EC2 instance and do a live test of those changes? I mean right now all we get back from test kitchen is some status that the build was successful. It won't tell anything about functional correctness: Like in my example I removed SSL and an important Wiki extension to fight spam, still the overall test kitchen result was successful. Deploying such a change for sure would cause some major issues. Which pieces are really missing here to turn this into something useful? |
|
What's missing is for you to all be quiet for a few days until I can investigate properly and find out what the real problems are and how to fix them! I'm not really sure what EC2 has to do with anything, or how it would be used. |
|
It is a beast of a test! The funky CheckUser fork might be a cause for concern, though I see there are other instances of MW plugins that we host ourselves. Curious to see if MW accepts our changes. |
|
@Firefishy I'm waiting to see what happens on the mediawiki bug - I'd prefer not to switch to a fork and I definitely don't want to deploy this without some indication from upstream that the change is not going to break things. |
|
I’m continuing to rebase as-needed while waiting for follow up on https://phabricator.wikimedia.org/T244283 |
aaa395f to
4e2c627
Compare
- Added missing databags to fix transient errors - Added wiki test suite with appropriate cookbooks
…wiki install Fixes CheckUser failure logged at https://travis-ci.org/openstreetmap/chef/jobs/650204578 Wikimedia\Rdbms\DBQueryError from line 1587 of /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? Query: INSERT INTO `cu_changes` (cuc_timestamp,cuc_user,cuc_user_text,cuc_namespace,cuc_title,cuc_comment,cuc_minor,cuc_page_id,cuc_this_oldid,cuc_last_oldid,cuc_type,cuc_ip,cuc_ip_hex) VALUES ('20200203011622',NULL,'MediaWiki default','0','Main_Page','','0','1','1','0','1','127.0.0.1','7F000001') Function: PopulateCheckUserTable::doDBUpdates Error: 1048 Column 'cuc_user' cannot be null (localhost) Backtrace: #0 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(1556): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string) openstreetmap#1 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(1274): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean) openstreetmap#2 /srv/wiki.openstreetmap.org/w/includes/libs/rdbms/database/Database.php(2149): Wikimedia\Rdbms\Database->query(string, string) openstreetmap#3 /srv/wiki.openstreetmap.org/w/extensions/CheckUser/maintenance/populateCheckUserTable.php(98): Wikimedia\Rdbms\Database->insert(string, array, string) openstreetmap#4 /srv/wiki.openstreetmap.org/w/maintenance/Maintenance.php(1719): PopulateCheckUserTable->doDBUpdates() openstreetmap#5 /srv/wiki.openstreetmap.org/w/maintenance/update.php(215): LoggedUpdateMaintenance->execute() openstreetmap#6 /srv/wiki.openstreetmap.org/w/maintenance/doMaintenance.php(96): UpdateMediaWiki->execute() openstreetmap#7 /srv/wiki.openstreetmap.org/w/maintenance/update.php(275): require_once(string) openstreetmap#8 {main}
|
MW closed https://phabricator.wikimedia.org/T244283 earlier this morning so I’ve removed the fork of CheckUser and rebased this PR to re-run tests. The wiki test runs clean now! I looked at each of the other test failures, and all of them except one fail due to the same regrettable flaky The civiccrm test fails because it can’t check out veda-consulting-company/uk.co.vedaconsulting.mailchimp@25798ff though the ref named in the attributes does still exist: chef/cookbooks/civicrm/attributes/default.rb Lines 23 to 25 in 8fe15dd Might be worth bringing that Mailchimp plugin up to a newer commit or master. Here’s what’s changed since |
…streetmap/stateofthemap Sha ref 03879a9 matches, see openstreetmap#259 (comment)
…p/openstreetmap-website Sha ref e8974292 matches, see openstreetmap#259 (comment)
…ap/openstreetmap/cgimap Sha ref e785788 matches, see openstreetmap#259 (comment)
|
Regarding the timeout issues we're seeing with Github Actions, I would actually raise an issue with Github Actions team for further analysis. Maybe we're hitting some kind of limit? This doesn't only affect Moving git.openstreetmap.org to another DC won't be sufficient to address this issue in that case. In theory, there might also be some subtle bug in the Docker + dokken + test-kitchen approach we're using. |
|
I believe @grischard is already working on making contact. |
|
Ok, sounds good. I've read this bit here on https://wiki.osmfoundation.org/wiki/Operations_Working_Group/Minutes/2020-04-10 and thought that there might be a bit of root cause analysis needed.
|
Added Mediwaiki Apt repository to support Parsoid installAddedaptasmediawikidependency to complete installationAdded missing Wiki user accountAddedaccountsasmediawikidependency to complete installationTest plan:
Add test kitchen configuration like:
Run
kitchen converge migurski-testing-ubuntu-1804Result
Still seeing some late-stage failures detailed in this log, but gets further than previously: https://gist.github.com/migurski/e08c0207d18462258498b198f1ae3607