-
Notifications
You must be signed in to change notification settings - Fork 0
Conversation
ghost
commented
Sep 21, 2020
•
edited by ghost
Loading
edited by ghost
- Work out number of errors in Scrapy log file
- Consider number of errors in decision to backup or not
- Don't backup subsets (sample, date filters)
- Don't backup if data files missing
- Add Sentry
- Add test run option
- Delete files we make and log and data files
Making draft while adding more things; will take out of draft when ready for review |
39fb570
to
1c72b6d
Compare
Work out number of errors in Scrapy log file Consider number of errors in decision to backup or not Don't backup subsets (sample, date filters) Don't backup if data files missing Add Sentry Don't check deleted collections
1c72b6d
to
a319064
Compare
013366a
to
b031cdf
Compare
b031cdf
to
b782b5c
Compare
@jpmckinney Now ready for review - thanks |
@jpmckinney Can you look at this? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Would be good to do the small tidying in the comments, and create issues for anything longer.
ocdskingfisherarchive/archive.py
Outdated
print( | ||
"Collection " + str(collection.database_id) + " result: " + ("Archive" if should_archive else "Leave") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of both logging and printing, you should just add appropriate handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are sure we want logging messages to go to console, then yes. However I remembered some chat about minimising noisy cron's, which would mean we don't. In that case having a print is good, so the operator can see results straight away without having to then go and look in the log files. That was my thinking behind putting the print in.
ocdskingfisherarchive/collection.py
Outdated
data_dir = self.config.directory_data + '/' + self.source_id + '/' + \ | ||
self.data_version.strftime("%Y%m%d_%H%M%S") | ||
# We use os.system here so we know the exact command so we can set up sudo correctly | ||
return1 = os.system('sudo -u ocdskfs /bin/rm -rf '+data_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just set things up so that the ocdskfs user runs the archive command? Seems like a lot of trouble otherwise (changes to the sudo command require changes to deploy...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is actually matching the current setup as closely as possible, to minimise changes on server. We already have sudo set up: https://github.com/open-contracting/deploy/blob/master/salt/ocdskingfisherarchive/archive.sudoers
And also noting that we have to have some salt changes when deploying this (config file, logging config file, new pillar variables, pip); I've already done that initial work and can present it soon. But minimising changes for now seemed sensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a couple commits and created issues for outstanding comments.
I'm going to merge so that I can do the other admin setting changes. Let me know if you have any issue with my two commits. |