Bump Wikipedia export version #738

Merged
maxjakob merged 1 commit into master from wikipedia-export-version
Jan 31, 2025

@maxjakob maxjakob commented Jan 31, 2025

Fixes the following error:

> python _tools/parse_documents.py dewiki-20250120-pages-articles.xml.bz2

Traceback (most recent call last):
  File "/Users/maxjakob/src/rally-tracks/wikipedia/_tools/parse_documents.py", line 59, in <module>
    to_json(file_name)
  File "/Users/maxjakob/src/rally-tracks/wikipedia/_tools/parse_documents.py", line 28, in to_json
    for doc_data in doc_generator(fp):
  File "/Users/maxjakob/src/rally-tracks/wikipedia/_tools/parse_documents.py", line 16, in doc_generator
    yield parse_page(element, namespaces)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maxjakob/src/rally-tracks/wikipedia/_tools/parse_documents.py", line 42, in parse_page
    "title": element.find("title", XML_NAMESPACES).text,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'text'

See https://www.mediawiki.org/wiki/Help:Export#Export_format
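
The traceback is consistent with a namespace mismatch: `element.find` returns `None` when the namespace URI in the lookup map differs from the one declared in the dump, and `.text` then fails. A minimal sketch of that behaviour follows, assuming the export version is encoded in the namespace URI as described on the linked page. The sample XML, the `mw` prefix, the two version numbers, and the `namespace_of` helper are illustrative assumptions, not the contents of `parse_documents.py`.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature dump; a real dump declares its namespace on <mediawiki>.
SAMPLE = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">'
    "<page><title>Berlin</title></page>"
    "</mediawiki>"
)

# Illustrative namespace maps; the actual XML_NAMESPACES value is not shown in this PR.
OLD_NS = {"mw": "http://www.mediawiki.org/xml/export-0.10/"}
NEW_NS = {"mw": "http://www.mediawiki.org/xml/export-0.11/"}


def namespace_of(element):
    """Return the namespace URI embedded in an element tag, or '' if there is none."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:tag.index("}")]
    return ""


root = ET.fromstring(SAMPLE)

# With the outdated URI nothing matches, so find() returns None
# and a subsequent .text access raises AttributeError.
stale_lookup = root.find("mw:page", OLD_NS)

# With the URI that matches the dump, the lookup succeeds.
page = root.find("mw:page", NEW_NS)
title = page.find("mw:title", NEW_NS).text

# Reading the URI off the root element is one way to avoid hard-coding the version.
detected = {"mw": namespace_of(root)}
```

Deriving the namespace from the root element, as in the last line, would keep the parser working across future export version bumps, at the cost of silently accepting format changes that a pinned version would surface.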

@kderusso kderusso left a comment

:rubber-stamp:

@maxjakob maxjakob merged commit 6e2396f into master Jan 31, 2025
25 checks passed