work #25

ghost · 2020-09-21T13:22:43Z

Work out number of errors in Scrapy log file
Consider the number of errors when deciding whether to back up
Don't back up subsets (sample, date filters)
Don't back up if data files are missing
Add Sentry
Add test run option
Delete the files we create, plus log and data files
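
As a rough illustration of the first few items, here is a minimal sketch that counts ERROR records in a Scrapy log and applies the "no subsets, no missing data" rules. It assumes Scrapy's default log line format. The names `count_errors`, `should_archive` and the `max_errors` limit are hypothetical and are not taken from this PR; the real decision rule may differ.

```python
import re

# Scrapy's default log format produces lines such as:
#   2020-09-21 13:22:43 [scrapy.core.scraper] ERROR: Spider error processing ...
# Assumption: the log file uses that default format.
ERROR_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ERROR: ")


def count_errors(lines):
    """Count log records at ERROR level in an iterable of log lines."""
    return sum(1 for line in lines if ERROR_LINE.match(line))


def should_archive(error_count, max_errors, is_subset, data_exists):
    """Hypothetical decision helper; max_errors is an illustrative limit only."""
    if is_subset:
        # Sample runs and date-filtered runs are never backed up.
        return False
    if not data_exists:
        # Nothing to back up if the data files are missing.
        return False
    return error_count <= max_errors
```

Traceback lines that follow an ERROR record do not match the pattern, so each error is counted once.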

ghost · 2020-09-21T13:50:20Z

Move Archive to S3 and decommission server

ghost · 2020-09-22T15:02:33Z

Marking this as a draft while I add more things; I'll take it out of draft when it's ready for review.

Work out number of errors in Scrapy log file
Consider number of errors in decision to backup or not
Don't backup subsets (sample, date filters)
Don't backup if data files missing
Add Sentry
Don't check deleted collections

ghost · 2020-09-24T13:18:41Z

@jpmckinney Now ready for review - thanks

ghost · 2020-09-29T09:30:42Z

@jpmckinney Can you look at this? Thanks

jpmckinney

Looks good. Would be good to do the small tidying in the comments, and create issues for anything longer.

manage.py

jpmckinney · 2020-09-30T00:33:51Z

ocdskingfisherarchive/archive.py

+            print(
+                "Collection " + str(collection.database_id) + " result: " + ("Archive" if should_archive else "Leave")
+            )


Instead of both logging and printing, you should just add appropriate handlers.

If we are sure we want logging messages to go to the console, then yes. However, I remember some discussion about minimising noisy cron jobs, which would mean we don't. In that case a print is useful: the operator can see the results straight away without having to go and look in the log files. That was my thinking behind adding the print.
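
To make the handler suggestion concrete, here is a minimal sketch using only the standard `logging` module: a file handler records everything, and a console handler shows results at a configurable level, so a cron job can stay quiet while an operator running the command by hand still sees results. The function name `configure_logging` and the logger name `ocdskingfisherarchive` are assumptions for illustration, not code from this PR.

```python
import logging
import sys


def configure_logging(log_path, console_level=logging.INFO):
    """Attach a file handler (all messages) and a console handler (filtered)."""
    # Assumption: the package logs under this logger name.
    logger = logging.getLogger("ocdskingfisherarchive")
    logger.setLevel(logging.DEBUG)

    # Close and drop existing handlers so repeated calls do not duplicate output.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)

    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(console_level)
    console_handler.setFormatter(logging.Formatter("%(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

With this in place, the print could become `logger.info("Collection %s result: %s", collection.database_id, "Archive" if should_archive else "Leave")`. A cron entry could pass `console_level=logging.WARNING` to keep routine output out of cron mail.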

ocdskingfisherarchive/archive.py

ocdskingfisherarchive/collection.py

jpmckinney · 2020-09-30T01:03:10Z

ocdskingfisherarchive/collection.py

+            data_dir = self.config.directory_data + '/' + self.source_id + '/' + \
+                self.data_version.strftime("%Y%m%d_%H%M%S")
+            # We use os.system here so we know the exact command so we can set up sudo correctly
+            return1 = os.system('sudo -u ocdskfs /bin/rm -rf '+data_dir)


Why not just set things up so that the ocdskfs user runs the archive command? Seems like a lot of trouble otherwise (changes to the sudo command require changes to deploy...).

This matches the current setup as closely as possible, to minimise changes on the server. We already have sudo set up: https://github.com/open-contracting/deploy/blob/master/salt/ocdskingfisherarchive/archive.sudoers

Note also that deploying this needs some Salt changes anyway (config file, logging config file, new pillar variables, pip). I've already done that initial work and can present it soon, but minimising changes for now seemed sensible.
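
For reference, the same deletion can be expressed without string concatenation or a shell. The following is only a sketch: it keeps the exact `sudo -u ocdskfs /bin/rm -rf` command from the diff, builds the path with `os.path.join`, and passes an argument list to `subprocess.run`. The helper names are hypothetical, and it assumes the existing sudoers rule permits the command in this form.

```python
import os
import subprocess


def data_dir_name(directory_data, source_id, data_version):
    """Build the data directory path from its parts (hypothetical helper)."""
    return os.path.join(directory_data, source_id, data_version.strftime("%Y%m%d_%H%M%S"))


def build_delete_command(data_dir):
    """Return the command from the diff as an argument list.

    Passing a list means no shell parses the path, so spaces or shell
    metacharacters in data_dir cannot change the command.
    """
    return ["sudo", "-u", "ocdskfs", "/bin/rm", "-rf", data_dir]


def delete_data_dir(data_dir):
    """Run the delete command and return its exit code (0 means success)."""
    # check=False: report failure through the return value instead of raising.
    return subprocess.run(build_delete_command(data_dir), check=False).returncode
```

The command stays spelled out in one place, so it can still be compared directly against the sudoers file.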

ocdskingfisherarchive/scrapy_log_file.py

jpmckinney

I added a couple commits and created issues for outstanding comments.

jpmckinney · 2020-09-30T17:02:50Z

I'm going to merge so that I can do the other admin setting changes. Let me know if you have any issue with my two commits.

ghost marked this pull request as draft September 22, 2020 15:02

ghost force-pushed the james-work-in-progress-4 branch from 39fb570 to 1c72b6d Compare September 22, 2020 15:15

work: errors, subsets, data missing, sentry

a319064

Work out number of errors in Scrapy log file
Consider number of errors in decision to backup or not
Don't backup subsets (sample, date filters)
Don't backup if data files missing
Add Sentry
Don't check deleted collections

ghost force-pushed the james-work-in-progress-4 branch from 1c72b6d to a319064 Compare September 23, 2020 07:49

cli: Add test run option

6994560

ghost changed the title ~~work: errors, subsets, data missing~~ work Sep 23, 2020

work: Delete local files we create

428858a

ghost force-pushed the james-work-in-progress-4 branch 4 times, most recently from 013366a to b031cdf Compare September 23, 2020 11:19

work: Delete data files and log files

b782b5c

ghost force-pushed the james-work-in-progress-4 branch from b031cdf to b782b5c Compare September 23, 2020 11:31

work: Delete files downloaded from S3 after use

9b2d7b0

ghost marked this pull request as ready for review September 24, 2020 13:18

jpmckinney reviewed Sep 30, 2020

jarofgreen added 10 commits September 30, 2020 10:30

cli: change test-run to dry-run

b23963e

#25

archive: Remove unnecessary hierarchy

5884778

#25

archive: Spelling and Grammar in log messages

7497527

#25

archive: Grammar in log messages

66229b9

#25

archive: Skip not Leave in messages

ac7c8cc

#25

scrapy_log_file: Rename method and docstring

2c32918

#25

collection: _cache_scrapyd_log_file_info should not return anything

78679e9

#25

collection: Simpler isdir checks

e264da1

#25

collection: Simpler bool check

c6cfb0a

#25

collection: Move common code to _get_data_dir_name, use os.path

3116059

jarofgreen added 4 commits September 30, 2020 13:32

collection: calculate S3 path with join

a5a34d6

#25

archive: Also backup if earlier collection has same errors and local …

8228dbf

…is bigger #25

archive: Correct debug message (both grammer and correctness)

73db65c

#25 " fewer or equal errors" should have been in 8228dbf

collection: use python @property for log file

66121ca

#25

ghost mentioned this pull request Sep 30, 2020

Document Current Filters open-contracting/kingfisher-collect#506

Closed

jarofgreen added 2 commits September 30, 2020 15:55

s3: bug fix; create logger

a634b59

database_process: Get collections in order

d2bb979

This was referenced Sep 30, 2020

Try to read files from S3 into memory #26

Open

Consider changing the deployment to run this as the ocdskfs user, to avoid use of sudo #27

Closed

Instead of both logging and printing, add appropriate handlers #28

Closed

jpmckinney added 2 commits September 30, 2020 12:44

Use the same short flag for --dry-run as most other CLI tools

7dc6114

Use methods instead of repeating their bodies

824b1c2

jpmckinney approved these changes Sep 30, 2020

jpmckinney merged commit 873f0a8 into new-master Sep 30, 2020

jpmckinney deleted the james-work-in-progress-4 branch September 30, 2020 17:03
