Commit 009d705

feat: limit to number of urls checked
1 parent 2303fc1 commit 009d705

3 files changed: +7 -0 lines changed

example.env (+1)

@@ -2,6 +2,7 @@ ACCESS_ID=cca
 API_KEY="secret..."
 HOST=api.summon.serialssolutions.com
 KOHA_DOMAIN=library.cca.edu
+LINKCHECK_LIMIT=500
 LINKCHECK_LOGFILE=data/linkcheck.log
 LINKCHECK_REPORT="https://library.cca.edu/cgi-bin/koha/svc/report?id=345"
 LINKCHECK_OPAC_URL="https://library.cca.edu/cgi-bin/koha/opac-detail.pl?biblionumber={id}"
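
The `{id}` placeholder in `LINKCHECK_OPAC_URL` has the shape of a Python `str.format` field. A minimal sketch of how it might be filled, assuming `str.format` is the interpolation mechanism (the diff does not show that part of the script); `opac_link` is a hypothetical helper and the biblionumber is made up:

```python
# value copied from example.env
LINKCHECK_OPAC_URL = "https://library.cca.edu/cgi-bin/koha/opac-detail.pl?biblionumber={id}"


def opac_link(template: str, biblionumber: int) -> str:
    """Fill the {id} placeholder with a record's biblionumber (hypothetical helper)."""
    return template.format(id=biblionumber)


# 12345 is an invented biblionumber, used only for illustration
link = opac_link(LINKCHECK_OPAC_URL, 12345)
```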

linkcheck/linkcheck.py (+5)

@@ -42,13 +42,17 @@ def quote(list):
 
 
 def main() -> None:
+    count = 0
     report = httpx.get(config["LINKCHECK_REPORT"])
     for bib in report.json():
         # bibs are arrays like [urls string, title, biblionumber]
         urls, title, id = bib
         # urls are separated by " | "
         urls = urls.split(" | ")
         for url in urls:
+            count += 1
+            if config.get("LINKCHECK_LIMIT") and count > int(config["LINKCHECK_LIMIT"]):
+                break
             try:
                 r = httpx.get(url, follow_redirects=True)
                 status = r.status_code
@@ -108,3 +112,4 @@ def signal_handler(sig, frame) -> None:
 # TODO but the script keeps running
 signal.signal(signal.SIGINT, signal_handler)
 main()
+summarize()
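
A note on the limit check in `main()`: `break` leaves only the inner `for url in urls` loop, so the outer `for bib in report.json()` loop still visits every remaining record, incrementing `count` once and breaking again each time. No further requests are made once the limit is reached, but the iteration does not stop early. One possible alternative is to flatten the report into a single stream of URLs and cap it with `itertools.islice`. This is a sketch, not the commit's code: `iter_urls` and `limited` are hypothetical helpers, the rows are assumed to have the `[urls, title, biblionumber]` shape described in the comment, and the sample data is invented.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def iter_urls(rows: Iterable[list]) -> Iterator[tuple[str, str, int]]:
    """Flatten report rows into (url, title, biblionumber) tuples."""
    for urls, title, bib_id in rows:
        # urls are separated by " | ", as in the original script
        for url in urls.split(" | "):
            yield url, title, bib_id


def limited(rows: Iterable[list], limit: Optional[int]) -> Iterator[tuple[str, str, int]]:
    """Yield at most `limit` URLs; a limit of None means no cap."""
    # islice with a stop of None passes every item through
    return islice(iter_urls(rows), limit)


# invented sample data standing in for report.json()
rows = [
    ["https://example.com/a | https://example.com/b", "Title one", 1],
    ["https://example.com/c", "Title two", 2],
]
checked = [url for url, _title, _bib_id in limited(rows, 2)]
```

With a single flat loop there is only one place to stop, so the counter and the nested `break` are no longer needed.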

linkcheck/readme.md (+1)

@@ -6,6 +6,7 @@ Takes a public Koha report and checks each URL (`856$u`) to see if they resolve
 
 The script uses the same .env file as the root project or it can take environment variables.
 
+- `LINKCHECK_LIMIT` number of links to check (leave undefined for all of them)
 - `LINKCHECK_REPORT` URL to a Koha report that returns item URLs (see [report.sql](./report.sql)). Report must be Public.
 - `LINKCHECK_OPAC_URL` catalog link for individual records, should include `biblionumber={id}` in it (id is interpolated)
 - `LINKCHECK_LOGFILE` path to logged CSV, defaults to the data dir named "YYYY-MM-DD-linkcheck.csv" with today's date
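
The "leave undefined for all of them" behaviour of `LINKCHECK_LIMIT` can be sketched as a small parsing step. This is an illustration rather than the script's code: `parse_limit` is a hypothetical helper, and it assumes the setting arrives as a string (or is missing) in a dict-like `config`. Treating an empty value as "no limit" mirrors the truthiness check in the commit.

```python
from typing import Mapping, Optional


def parse_limit(config: Mapping[str, Optional[str]]) -> Optional[int]:
    """Return LINKCHECK_LIMIT as an int, or None when it is unset or empty."""
    raw = config.get("LINKCHECK_LIMIT")
    if not raw:
        # missing key, None, or empty string: check every link
        return None
    # a non-numeric value raises ValueError, as int() does in the commit
    return int(raw)


limit_unset = parse_limit({})
limit_set = parse_limit({"LINKCHECK_LIMIT": "500"})
```

Parsing once up front avoids calling `int()` on every URL and gives a single place to validate the value.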
