Generator directory format #6

Merged (1 commit, Jul 10, 2023)

Conversation

@newsch (Collaborator) commented Jun 23, 2023

I decided to break up the next steps into smaller PRs compared to the last one.

This PR updates the program to create the folder structure that the map generator expects, e.g.:

.
├── de.wikipedia.org
│  └── wiki
│     ├── Coal_River_Springs_Territorial_Park
│     │  ├── de.html
│     │  └── ru.html
│     ├── Ni'iinlii_Njik_(Fishing_Branch)_Territorial_Park
│     │  ├── de.html
│     │  └── en.html
│    ...
├── en.wikipedia.org
│  └── wiki
│     ├── Arctic_National_Wildlife_Refuge
│     │  ├── de.html
│     │  ├── en.html
│     │  ├── es.html
│     │  ├── fr.html
│     │  └── ru.html
│     │
│     │ **NOTE: Article titles with a `/` are not escaped, so "Baltimore/Washington_International_Airport" becomes two subfolders as below.**
│     │
│     ├── Baltimore
│     │  └── Washington_International_Airport
│     │     ├── de.html
│     │     ├── en.html
│     │     ├── es.html
│     │     ├── fr.html
│     │     └── ru.html
│    ...
└── wikidata
   ├── Q59320
   │  ├── de.html
   │  ├── en.html
   │  ├── es.html
   │  ├── fr.html
   │  └── ru.html
   ├── Q120306
   │  ├── de.html
   │  ├── en.html
   │  ├── es.html
   │  ├── fr.html
   │  └── ru.html
  ...

While the old description scraper would write duplicates for the same article's title and qid, this implementation writes symlinks in the wikipedia tree that point to the wikidata files.

I know I can change what the generator looks for, but I figured it would be easier to have this working and then change them together instead of debugging both at the same time while neither works.

The goal is that with this PR, the parser will be a drop-in replacement for the current scraper, even if the speed and HTML size are not what we'd like.

Remaining work for this PR:

  • handle articles without QIDs (yes, they exist! 🤷)
  • only write symlinks for requested redirects
  • handle updating existing files (e.g. timestamps): moved to "Skip articles that haven't changed between dumps" (#9)
  • do a test run with the generator and multiple languages
  • add documentation for running with multiple languages

@newsch changed the title from "Finalize directory format" to "Generator directory format" Jun 23, 2023
@biodranik (Member) left a comment:

Good approach )

src/main.rs Outdated
Ok(title) => title,
};

// NOTE: Some wikipedia titles have '/' in them.
@biodranik (Member):

How are they processed in the generator?

@newsch (Collaborator, author):

The generator only works with complete wikipedia urls (and wikidata QIDs). The OSM tags like `en:Article_Title` are converted to urls somewhere early in the OSM ingestion process.
It dumps the urls to a file for the descriptions scraper; then, when it adds them to the mwm files, it strips the protocol, appends the url to the base directory, and looks for language html files in the folder at that location.
It doesn't do any special processing for articles with a slash in the title; they are just another subdirectory down. I'll update the diagram to show that.
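
For illustration only, here is a minimal Rust sketch of that lookup path, assuming a hypothetical `article_dir` helper (this is not the generator's actual C++ code):

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: map a Wikipedia URL to the directory the generator
/// scans for language files, mirroring the "strip the protocol, append the
/// url to the base directory" step described above.
fn article_dir(base: &Path, url: &str) -> PathBuf {
    let without_protocol = url
        .trim_start_matches("https://")
        .trim_start_matches("http://");
    // "en.wikipedia.org/wiki/Berlin" -> base/en.wikipedia.org/wiki/Berlin
    base.join(without_protocol)
}

fn main() {
    let dir = article_dir(Path::new("descriptions"), "https://en.wikipedia.org/wiki/Berlin");
    assert_eq!(dir, PathBuf::from("descriptions/en.wikipedia.org/wiki/Berlin"));
    // The generator then looks for files like `en.html` or `de.html` in `dir`.
}
```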

@biodranik (Member):

Good. Please don't forget that if something can be simplified or improved by changing the current generator approach, then it makes sense to do it.

@newsch (Collaborator, author):

I'm working on a list of changes that would be helpful.

src/main.rs Outdated

// NOTE: Some wikipedia titles have '/' in them.
let wikipedia_dir = title.get_dir(base.as_ref().to_owned());
// TODO: handle incorrect links, directories
@biodranik (Member):

For example?

@newsch (Collaborator, author):

The "incorrect links, directories" refers to updating a directory tree from a previous run, instead of starting from scratch. Right now the behavior is to skip any file that exists.

/// Write selected article to disk.
///
/// - Write page contents to wikidata page (`wikidata.org/wiki/QXXX/lang.html`).
/// - If the page has no wikidata qid, write contents to wikipedia location (`lang.wikipedia.org/wiki/article_title/lang.html`).
@biodranik (Member):

`lang` is used twice here in the path, but only one file is always stored in the directory, right?

@newsch (Collaborator, author):

The behavior that the generator/scraper expects is to write all available translations in each directory.
So for the article for Berlin, if there are OSM tags for `wikipedia:en=Berlin`, `wikipedia:de=Berlin`, `wikipedia:fr=Berlin` and `wikidata=Q64`, and the generator keeps them all, then there will be four folders with duplicates of all language copies:

en.wikipedia.org/wiki/Berlin/{en.html, de.html, fr.html, ...}
de.wikipedia.org/wiki/Berlin/{en.html, de.html, fr.html, ...}
fr.wikipedia.org/wiki/Berlin/{en.html, de.html, fr.html, ...}
wikidata/Q64/{en.html, de.html, fr.html, ...}

Now, I don't understand exactly how the generator picks which tags to use yet, but just from looking at the Canada Yukon region map there are duplicated copies of wikipedia items there.

For this program, we only see one language at a time, so we write that copy to the master wikidata directory. When we later get the same article in a different language, we write it to the same wikidata directory.

Once all the languages have been processed, it would look like:

en.wikipedia.org/wiki/Berlin/ -> wikidata/Q64/
de.wikipedia.org/wiki/Berlin/ -> wikidata/Q64/
fr.wikipedia.org/wiki/Berlin/ -> wikidata/Q64/
wikidata/Q64/{en.html, de.html, fr.html, ...}
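
A minimal sketch of this write-once-then-symlink flow (Unix-only, with made-up helper names; not the PR's actual code) could look like:

```rust
use std::fs;
use std::io::Write;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Write one language's HTML under wikidata/QXXX/, then link each matched
/// wikipedia title directory to that wikidata directory.
fn write_article(
    base: &Path,
    qid: &str,
    lang: &str,
    html: &str,
    wikipedia_dirs: &[&str], // e.g. ["en.wikipedia.org/wiki/Berlin", ...]
) -> std::io::Result<()> {
    let wikidata_dir = base.join("wikidata").join(qid);
    fs::create_dir_all(&wikidata_dir)?;
    fs::File::create(wikidata_dir.join(format!("{lang}.html")))?.write_all(html.as_bytes())?;

    for dir in wikipedia_dirs {
        let link = base.join(dir);
        if let Some(parent) = link.parent() {
            fs::create_dir_all(parent)?;
        }
        if !link.exists() {
            // An absolute target keeps the sketch simple; a relative target
            // (e.g. "../../../wikidata/QXXX") would be more portable.
            symlink(wikidata_dir.canonicalize()?, &link)?;
        }
    }
    Ok(())
}
```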

@@ -132,6 +152,11 @@ impl FromStr for WikidataQid {
///
/// assert!(WikipediaTitleNorm::from_url("https://en.wikipedia.org/not_a_wiki_page").is_err());
/// assert!(WikipediaTitleNorm::from_url("https://wikidata.org/wiki/Q12345").is_err());
///
/// assert!(
/// WikipediaTitleNorm::from_url("https://de.wikipedia.org/wiki/Breil/Brigels").unwrap() !=
@biodranik (Member):

Can `/` be percent-escaped in such cases? How does the generator handle it now?

@newsch (Collaborator, author):

I guess it could be, I haven't looked for that. Wikipedia works with either.

See below for more details, but the generator should decode those before dumping the urls.

It looks like a handful of encoded titles still slip through, but none with `%2F` (`/`).
I made an issue with some notes about this in #7.

From my reading of where it first adds a wikipedia tag and later writes it back out as a url:

  1. If the tag looks like a url instead of the expected lang:Article Title format, take what's after .wikipedia.org/wiki/, url decode it, replace underscores with spaces, then concat that with the lang at the beginning of the url and store it.
  2. Otherwise attempt to check if it's a url, replace underscores with spaces, and store it.
  3. To transform it back into a url, replace spaces with underscores in the title, escape any %s, and add it to the end of https://lang.wikipedia.org/wiki/.

Glancing at the url decoding, I don't think there's anything wrong with it: it should handle arbitrary characters, although neither the encoding nor the decoding looks unicode-aware.
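
To make the round trip concrete, here is a rough Rust sketch of steps 1 and 3 (the generator itself is C++; these helper names are made up and percent-decoding is omitted):

```rust
/// Step 1 (simplified): "https://en.wikipedia.org/wiki/Article_Title" -> "en:Article Title".
fn url_to_tag(url: &str) -> Option<String> {
    let rest = url.strip_prefix("https://")?;
    let (host, path) = rest.split_once("/wiki/")?;
    let lang = host.strip_suffix(".wikipedia.org")?;
    Some(format!("{lang}:{}", path.replace('_', " ")))
}

/// Step 3: "en:Article Title" -> "https://en.wikipedia.org/wiki/Article_Title".
fn tag_to_url(tag: &str) -> Option<String> {
    let (lang, title) = tag.split_once(':')?;
    let title = title.replace(' ', "_").replace('%', "%25");
    Some(format!("https://{lang}.wikipedia.org/wiki/{title}"))
}

fn main() {
    // A '/' in the title passes through unescaped.
    assert_eq!(
        tag_to_url("de:Breil/Brigels").as_deref(),
        Some("https://de.wikipedia.org/wiki/Breil/Brigels")
    );
    assert_eq!(url_to_tag("https://en.wikipedia.org/wiki/Berlin").as_deref(), Some("en:Berlin"));
}
```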

@@ -145,7 +170,7 @@ impl WikipediaTitleNorm {
title.trim().replace(' ', "_")
}

// https://en.wikipedia.org/wiki/Article_Title
// https://en.wikipedia.org/wiki/Article_Title/More_Title
@biodranik (Member):

Is more than one slash in the title possible?

@newsch (Collaborator, author):

Yes, there are a handful, for example https://en.wikipedia.org/wiki/KXTV/KOVR/KCRA_Tower.

There are 39 present in the generator urls:
$ grep -E '^https://\w+\.wikipedia\.org/wiki/.+/.+/' /tmp/wikipedia_urls.txt | sort | uniq
https://de.wikipedia.org/wiki/Darum/Gretesch/Lüstringen
https://de.wikipedia.org/wiki/Kienhorst/Köllnseen/Eichheide
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Erlangen/A#Altstädter_Friedhof_2/3,_Altstädter_Friedhof
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/001-1/099)
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/001–1/099)
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/001–1/099)#Evang._Christuskirche
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/100–1/199)
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/200–1/299)
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/300–1/399)
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/400–1/499)
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/500–1/580)
https://de.wikipedia.org/wiki/Liste_der_Baudenkmäler_in_Neuss_(1/500–1/580)#Schulgeb.C3.A4ude
https://de.wikipedia.org/wiki/Rhumeaue/Ellerniederung/Gillersheimer_Bachtal
https://de.wikipedia.org/wiki/Speck_/_Wehl_/_Helpenstein
https://de.wikipedia.org/wiki/Veldrom/Feldrom/Kempen
https://de.wikipedia.org/wiki/VHS_Witten/Wetter/Herdecke
https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Österreich/JE/Bach
https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Österreich/JE/Judenberg
https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Österreich/JE/Kramerberg
https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Österreich/JE/Loasleiten
https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Österreich/JE/Pelzereck
https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Österreich/JE/Theresienberg
https://de.wikipedia.org/wiki/Wohnanlage_Arzbacher_Straße/Thalkirchner_Straße/Wackersberger_Straße/Würzstraße
https://en.wikipedia.org/wiki/Abura/Asebu/Kwamankese_District
https://en.wikipedia.org/wiki/Ajumako/Enyan/Essiam_District
https://en.wikipedia.org/wiki/Bibiani/Anhwiaso/Bekwai_Municipal_District
https://en.wikipedia.org/wiki/Clapp/Langley/Crawford_Complex
https://en.wikipedia.org/wiki/KXTV/KOVR/KCRA_Tower
https://en.wikipedia.org/wiki/SAIT/AUArts/Jubilee_station
https://en.wikipedia.org/wiki/Santa_Cruz/Graciosa_Bay/Luova_Airport
https://fr.wikipedia.org/wiki/Landunvez#/media/Fichier:10_Samson_C.jpg
https://gl.wikipedia.org/wiki/Moaña#/media/Ficheiro:Plano_de_Moaña.png
https://it.wikipedia.org/wiki/Tswagare/Lothoje/Lokalana
https://lb.wikipedia.org/wiki/Lëscht_vun_den_nationale_Monumenter_an_der_Gemeng_Betzder#/media/Fichier:Roodt-sur-Syre,_14_rue_d'Olingen.jpg
https://pt.wikipedia.org/wiki/Wikipédia:Wikipédia_na_Universidade/Cursos/Rurtugal/Gontães
https://ru.wikipedia.org/wiki/Алажиде#/maplink/0
https://uk.wikipedia.org/wiki/Вікіпедія:Вікі_любить_пам'ятки/Волинська_область/Старовижівський_район
https://uk.wikipedia.org/wiki/Вікіпедія:Вікі_любить_пам'ятки/Київська_область/Броварський_район
https://uk.wikipedia.org/wiki/Вікіпедія:Вікі_любить_пам'ятки/Полтавська_область/Семенівський_район

@newsch mentioned this pull request Jun 29, 2023
@newsch marked this pull request as ready for review June 30, 2023 21:24
@newsch (Collaborator, author) commented Jun 30, 2023

I ran them with all languages on my machine. I only have 4 cores, so more than two instances didn't show much of an improvement.
I didn't run into any errors, but there is a race condition between checking if the folder for a QID exists and creating it.
If we decide to do parallelism by running multiple instances, that should be handled. But I think we will be better off running multiple decompression threads internally.

Speaking of which, after investigating pgzip further, my understanding is it can only parallelize decompressing files that it compressed in a specific way. I'll make another issue for investigating other gunzip implementations.

@biodranik (Member):

Parallelism is the next step; it can be done using existing tools. Let's lower its priority.

Why is there a race condition with QID? Aren't they created from a separate pass over the OSM dump?

@newsch (Collaborator, author) commented Jul 3, 2023

Why is there a race condition with QID? Aren't they created from a separate pass over the OSM dump?

When running multiple instances in parallel, they could process different translations of an article at the same time, and interleave between checking that the QID folder doesn't exist and creating it.

The same thing could hypothetically happen with article title folders, but since each dump is in a different language it shouldn't occur.

It is probably unlikely to occur, and it won't take down the entire program.
I can add special handling for the error to mitigate it.

@biodranik (Member):

Aren't file system operations atomic? Adding a handler for the case "tried to create it but it was already created by another process" is a good idea.

@newsch (Collaborator, author) commented Jul 4, 2023

Yes, individual syscalls should be atomic, but I don't think there are any guarantees between the call to `path.is_dir()` and `fs::create_dir(&path)`.

It looks like `create_dir_all` explicitly handles this, though, by checking if the directory exists after getting an error. So it should not be a problem after all.
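
For reference, a race-tolerant version of that check-then-create step (a sketch with a hypothetical `ensure_dir` helper; `create_dir_all` already behaves like this) would be:

```rust
use std::{fs, io, path::Path};

/// Instead of `path.is_dir()` followed by `fs::create_dir(&path)`, attempt the
/// creation and treat AlreadyExists (another process won the race) as success.
fn ensure_dir(path: &Path) -> io::Result<()> {
    match fs::create_dir(path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(()),
        Err(e) => Err(e),
    }
}
```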

@newsch added this to the v0.1 milestone Jul 4, 2023
@newsch requested a review from biodranik July 5, 2023 14:21
README.md Outdated
To serve as a drop-in replacement for the descriptions scraper:
- Install this tool to `$PATH` as `om-wikiparser`.
- Download [the dumps in the desired languages](https://dumps.wikimedia.org/other/enterprise_html/runs/) (Use the files with the format `${LANG}wiki-NS0-${DATE}-ENTERPRISE-HTML.json.tar.gz`).
- Set `WIKIPEDIA_ENTERPRISE_DUMPS` to the list of the dump files to process
@biodranik (Member):

List? Delimited by what? Any example? Is specifying a directory with dumps better?

@newsch (Collaborator, author):

I meant a shell list/array(?), separated by spaces.

One example is a glob, so using a directory and then referencing `$WIKIPEDIA_DUMP_DIRECTORY/*.json.tar.gz` might be clearer?

@biodranik (Member):

It's better to mention list item separators explicitly and provide some example for clarity.

README.md Outdated
@@ -4,5 +4,24 @@ _Extracts articles from [Wikipedia database dumps](https://en.wikipedia.org/wiki

## Usage

To serve as a drop-in replacement for the descriptions scraper:
- Install this tool to `$PATH` as `om-wikiparser`.
@biodranik (Member):

Why should it be at PATH? Can it be run from any directory?

@newsch (Collaborator, author):

It doesn't need to be; the example script reads more clearly to me if it's in the context of the `intermediate_data` directory. It could also be run as `../../../wikiparser/target/release/om-wikiparser`, with `cargo run --release` from the wikiparser directory, or anything else.

@biodranik (Member):

...then why suggest installing the tool at PATH?

@newsch (Collaborator, author):

So that you can always reference it as `om-wikiparser` wherever you are, without worrying about where it is relative to you, or copying it into your working directory.

I meant this as an explanation of how to use it, not a step-by-step for what to run on the build server filesystem.

Maybe writing a shell script to use on the maps server instead would be helpful?

Would you prefer:

# Transform intermediate files from generator.
cut -f 2 id_to_wikidata.csv > wikidata_ids.txt
tail -n +2 wiki_urls.txt | cut -f 3 > wikipedia_urls.txt
# Begin extraction.
for dump in $WIKIPEDIA_ENTERPRISE_DUMPS
do
  tar xzOf $dump | $WIKIPARSER_DIR/target/release/om-wikiparser \
    --wikidata-ids wikidata_ids.txt \
    --wikipedia-urls wikipedia_urls.txt \
    descriptions/
done

or

# Transform intermediate files from generator.
maps_build=~/maps_build/$BUILD_DATE/intermediate_data
cut -f 2 $maps_build/id_to_wikidata.csv > $maps_build/wikidata_ids.txt
tail -n +2 $maps_build/wiki_urls.txt | cut -f 3 > $maps_build/wikipedia_urls.txt
# Begin extraction.
for dump in $WIKIPEDIA_ENTERPRISE_DUMPS
do
  tar xzOf $dump | ./target/release/om-wikiparser \
    --wikidata-ids $maps_build/wikidata_ids.txt \
    --wikipedia-urls $maps_build/wikipedia_urls.txt \
    $maps_build/descriptions/
done

@biodranik (Member):

  1. Can it be wrapped in a helper script that can be easily customized and run on the generator, maybe directly from the wikiparser repo? :)
  2. cargo run -r may be even better instead of a path to binary :) But it's also ok to hard-code the path or use $WIKIPARSER_BINARY var.

Think about me testing your code soon on a production server. Less surprises = less stress ;-)

@biodranik (Member):

Btw, it may make sense to also print/measure time taken to execute some commands after the first run on the whole planet, to have some reference starting values.

@newsch (Collaborator, author):

I will update the README to be more of an explanation, and make another issue/PR for a script that handles the build directory, timing, backtraces, saving logs, etc.

README.md Outdated
- Set `WIKIPEDIA_ENTERPRISE_DUMPS` to the list of the dump files to process
- Run the following from within the `intermediate_data` subdirectory of the maps build directory:
```shell
# transform intermediate files from generator
@biodranik (Member):

Is extracting ids directly from the osm pbf planet dump better than relying on the intermediate generator files? What are pros and cons?

@newsch (Collaborator, author) commented Jul 6, 2023:

Pros:

  • Independent of the generator process. Can be run as soon as planet file is updated.

Cons:

  • Need to keep osm query in sync with generator's own multi-step filtering and transformation process.
  • Need to match generator's multi-step processing of urls exactly.

When I did this earlier it was with the osmfilter tool; I only tested it on the Yukon region, and it output more entries than the generator did.

I can create an issue for this, but the rough steps to get that working are:

  • Convert osmfilter query to osmium command so it can work on pbf files directly.
  • Dig into generator map processing to try to improve querying.
  • Compare processing of a complete planet with generator output.
  • Write conversion of osmium output for wikiparser to use.

@biodranik (Member):

  1. Does it make sense to create an issue to document the existing generator's output format, and propose some improvements if necessary? What kind of complex transformations are done in the generator now, and why?
  2. What's wrong with outputting more URLs? I assume that the generator may now filter OSM POIs/types that we are not supporting yet. In the worst case, some more articles will be extracted from the planet, right? Do you remember how big the percentage of "unnecessary" articles is?
  3. osmfilter can work with o5m, osmconvert can process pbf. There is also https://docs.rs/osmpbf/latest/osmpbf/ for direct pbf processing if it makes the approach simpler. How good is the osmium tool compared to other options?

It would be great to have a well-defined and independent API between the generator and wikiparser, to avoid complications when supporting it in the longer term. WDYT?

@newsch (Collaborator, author):

It would be great to have a well-defined and independent API between the generator and wikiparser, to avoid complications when supporting it in the longer term.

Absolutely agree!

Does it make sense to create an issue to document the existing generator's output format, and propose some improvements if necessary? What kind of complex transformations are done in the generator now, and why?

I think so; do you mean the wikipedia/wikidata files or the mwm format in general?

By transformations: when I looked at it last, it looked like it was doing some sort of merging of ways/nodes/shapes to get a single parent object.
When I compared the OSM IDs that it output with the Wikidata IDs, they didn't match up with what I got from osmfilter, even when the urls were the same. Not a problem for the wikiparser, as long as the QIDs/articles are all caught, but it made it harder to tell if they were doing the same thing.

As we talked about before, there are also multiple layers of filtering nodes by amenity or other tags, and I only looked at the final layer when I was trying this osmfilter approach (based on ftypes_matcher.cpp).

What's wrong in outputting more URLs? I assume that the generator may filter now OSM POIs/types that we are not supporting yet. In the worst case, some more articles will be extracted from the planet, right?

As you say, the worst case isn't a problem for the end user, but I want to do more comparisons with the whole planet file to be confident that this is really a superset of them.

Do you remember how big is the percent of "unnecessary" articles?

That was around 25%, but that was in the Yukon territory, so there weren't very many nodes, and I would guess it's not comparable to the planet.

How good is the osmium tool compared to other options?

I haven't looked into osmium much, but my understanding is that it is at least as powerful as osmfilter/osmconvert. I know we talked about using pbfs directly at some point, so that's why I mentioned it.

@biodranik (Member):

I think so, do you mean the wikipedia/wikidata files or the mwm format in general?

I meant those files that are required for wikiparser to work. It actually may make sense to keep it in README or some other doc, not in an issue.

README.md Outdated (resolved)
README.md (resolved)
src/main.rs Outdated (resolved)
src/main.rs Outdated (resolved)
src/wm/page.rs Outdated (resolved)

The merged commit's message:
The map generator expects a certain folder structure created by the
current scraper to add the article content into the mwm files.

- Article html is written to wikidata directory.
- Directories are created for any matched titles and symlinked to the
  wikidata directory.
- Articles without a QID are written to article title directory.
- Article titles containing `/` are not escaped, so multiple
  subdirectories are possible.

The output folder hierarchy looks like this:

    .
    ├── de.wikipedia.org
    │  └── wiki
    │     ├── Coal_River_Springs_Territorial_Park
    │     │  ├── de.html
    │     │  └── ru.html
    │     ├── Ni'iinlii_Njik_(Fishing_Branch)_Territorial_Park
    │     │  ├── de.html
    │     │  └── en.html
    │    ...
    ├── en.wikipedia.org
    │  └── wiki
    │     ├── Arctic_National_Wildlife_Refuge
    │     │  ├── de.html
    │     │  ├── en.html
    │     │  ├── es.html
    │     │  ├── fr.html
    │     │  └── ru.html
    │     ├── Baltimore
    │     │  └── Washington_International_Airport
    │     │     ├── de.html
    │     │     ├── en.html
    │     │     ├── es.html
    │     │     ├── fr.html
    │     │     └── ru.html
    │    ...
    └── wikidata
       ├── Q59320
       │  ├── de.html
       │  ├── en.html
       │  ├── es.html
       │  ├── fr.html
       │  └── ru.html
       ├── Q120306
       │  ├── de.html
       │  ├── en.html
       │  ├── es.html
       │  ├── fr.html
       │  └── ru.html
      ...

Signed-off-by: Evan Lloyd New-Schmidt <[email protected]>