This repository has been archived by the owner on May 12, 2021. It is now read-only.

work #25

Merged
merged 23 commits into from
Sep 30, 2020

Conversation

@ghost ghost commented Sep 21, 2020

  • Work out number of errors in Scrapy log file
  • Consider number of errors in decision to back up or not
  • Don't back up subsets (sample, date filters)
  • Don't back up if data files missing
  • Add Sentry
  • Add test run option
  • Delete the files we create, as well as log and data files
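
As a rough illustration of the first few items, here is a minimal sketch of counting errors in a crawl log and folding that count into the back-up decision. It assumes the log uses Scrapy's default line format (timestamp, logger name in brackets, level). The names `count_errors`, `should_back_up` and `max_errors` are hypothetical and are not taken from this pull request.

```python
import re

# Assumption: the log uses Scrapy's default LOG_FORMAT,
# "%(asctime)s [%(name)s] %(levelname)s: %(message)s".
ERROR_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ERROR: ")


def count_errors(lines):
    """Count log records emitted at ERROR level (traceback lines are ignored)."""
    return sum(1 for line in lines if ERROR_LINE.match(line))


def should_back_up(error_count, is_subset, data_files_exist, max_errors=0):
    """Hypothetical decision rule combining the checks listed above."""
    if is_subset:  # sample or date-filtered crawl
        return False
    if not data_files_exist:
        return False
    return error_count <= max_errors


sample_log = [
    "2020-09-21 10:00:00 [scrapy.core.engine] INFO: Spider opened",
    "2020-09-21 10:00:01 [scrapy.core.scraper] ERROR: Spider error processing <GET http://example.com>",
    "Traceback (most recent call last):",
    "2020-09-21 10:00:02 [scrapy.core.engine] INFO: Closing spider (finished)",
]
errors = count_errors(sample_log)
```

The threshold is left as a parameter because the acceptable number of errors is a policy choice that this description does not specify.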

ghost commented Sep 22, 2020

Making draft while adding more things; will take out of draft when ready for review

@ghost ghost marked this pull request as draft September 22, 2020 15:02
@ghost ghost force-pushed the james-work-in-progress-4 branch from 39fb570 to 1c72b6d Compare September 22, 2020 15:15
Work out number of errors in Scrapy log file
Consider number of errors in decision to back up or not
Don't back up subsets (sample, date filters)
Don't back up if data files missing
Add Sentry
Don't check deleted collections
@ghost ghost force-pushed the james-work-in-progress-4 branch from 1c72b6d to a319064 Compare September 23, 2020 07:49
@ghost ghost changed the title from "work: errors, subsets, data missing" to "work" Sep 23, 2020
@ghost ghost force-pushed the james-work-in-progress-4 branch 4 times, most recently from 013366a to b031cdf Compare September 23, 2020 11:19
@ghost ghost force-pushed the james-work-in-progress-4 branch from b031cdf to b782b5c Compare September 23, 2020 11:31
@ghost ghost marked this pull request as ready for review September 24, 2020 13:18

ghost commented Sep 24, 2020

@jpmckinney Now ready for review - thanks

ghost commented Sep 29, 2020

@jpmckinney Can you look at this? Thanks

@jpmckinney jpmckinney left a comment

Looks good. Would be good to do the small tidying in the comments, and create issues for anything longer.

manage.py Outdated
Comment on lines 37 to 39
print(
"Collection " + str(collection.database_id) + " result: " + ("Archive" if should_archive else "Leave")
)
Member

Instead of both logging and printing, you should just add appropriate handlers.

Author

If we are sure we want logging messages to go to the console, then yes. However, I remembered some discussion about minimising noisy cron jobs, which would mean we don't. In that case a print is useful, so the operator can see results straight away without having to look in the log files. That was my thinking behind putting the print in.
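
One way to reconcile the two positions is to log the result once and let handler levels decide where it appears. The sketch below is only an illustration using the standard `logging` module. The function names, the logger name and the use of in-memory streams in place of a real log file are assumptions, not code from this pull request.

```python
import io
import logging
import sys


def configure_logger(name, file_stream, console_stream=sys.stderr,
                     console_level=logging.WARNING):
    """Attach a log-file handler and a console handler with separate levels."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    for handler in list(logger.handlers):  # avoid duplicate handlers on re-configuration
        logger.removeHandler(handler)

    # Stand-in for logging.FileHandler(path); a stream keeps the sketch self-contained.
    file_handler = logging.StreamHandler(file_stream)
    file_handler.setLevel(logging.INFO)
    logger.addHandler(file_handler)

    # WARNING by default keeps cron output quiet; pass logging.INFO when run by hand.
    console_handler = logging.StreamHandler(console_stream)
    console_handler.setLevel(console_level)
    logger.addHandler(console_handler)
    return logger


def report(logger, database_id, should_archive):
    """Log the per-collection result once; handlers decide where it is shown."""
    logger.info("Collection %s result: %s", database_id,
                "Archive" if should_archive else "Leave")


log_file, console = io.StringIO(), io.StringIO()
logger = configure_logger("archive-demo", log_file, console, console_level=logging.INFO)
report(logger, 123, True)
```

With the default `console_level`, the same call writes only to the log file, so a cron run stays quiet while an interactive run can opt in to seeing results immediately.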

ocdskingfisherarchive/archive.py Outdated
ocdskingfisherarchive/archive.py Outdated
ocdskingfisherarchive/archive.py Outdated
ocdskingfisherarchive/collection.py Outdated
ocdskingfisherarchive/collection.py Outdated
data_dir = self.config.directory_data + '/' + self.source_id + '/' + \
self.data_version.strftime("%Y%m%d_%H%M%S")
# We use os.system here so we know the exact command so we can set up sudo correctly
return1 = os.system('sudo -u ocdskfs /bin/rm -rf '+data_dir)
Member

Why not just set things up so that the ocdskfs user runs the archive command? Seems like a lot of trouble otherwise (changes to the sudo command require changes to deploy...).

Author

So this is actually matching the current setup as closely as possible, to minimise changes on server. We already have sudo set up: https://github.com/open-contracting/deploy/blob/master/salt/ocdskingfisherarchive/archive.sudoers

Note also that deploying this requires some Salt changes (config file, logging config file, new pillar variables, pip); I've already done that initial work and can present it soon. But minimising changes for now seemed sensible.
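
If the sudo approach stays, the exact command can still be kept explicit while avoiding string concatenation through a shell. The following is a minimal sketch using `subprocess.run` with an argument list. The function names are hypothetical, and whether the existing sudoers rule matches this invocation is an assumption that would need checking against the deploy repository.

```python
import posixpath
import subprocess
from datetime import datetime


def build_delete_command(directory_data, source_id, data_version, user="ocdskfs"):
    """Build the exact sudo command as an argument list (no shell involved)."""
    data_dir = posixpath.join(directory_data, source_id,
                              data_version.strftime("%Y%m%d_%H%M%S"))
    return ["sudo", "-u", user, "/bin/rm", "-rf", data_dir]


def delete_data_directory(directory_data, source_id, data_version):
    """Run the command and return its exit status, like os.system's result."""
    command = build_delete_command(directory_data, source_id, data_version)
    return subprocess.run(command, check=False).returncode


# Example values are illustrative only.
example = build_delete_command("/data", "example_spider", datetime(2020, 9, 21, 15, 2, 0))
```

Because each argument is passed separately, a source ID containing spaces or shell metacharacters cannot change which command runs. The path itself is still not validated here.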

ocdskingfisherarchive/scrapy_log_file.py Outdated
ocdskingfisherarchive/scrapy_log_file.py

@jpmckinney jpmckinney left a comment

I added a couple commits and created issues for outstanding comments.

@jpmckinney
Member

I'm going to merge so that I can do the other admin setting changes. Let me know if you have any issue with my two commits.

@jpmckinney jpmckinney merged commit 873f0a8 into new-master Sep 30, 2020
@jpmckinney jpmckinney deleted the james-work-in-progress-4 branch September 30, 2020 17:03