Adding support for additional sites: Open Archives & Genealogy Online #94

Open
coret opened this issue Mar 15, 2021 · 3 comments

@coret

coret commented Mar 15, 2021

I'd like to recommend two sites for inclusion in SmartCopy: Open Archives (Dutch and Belgian archive sources, 270M profiles) and Genealogy Online (online trees, 60M profiles).

Both websites are multilingual and use microdata to semantically tag information. Open Archives also has a well-documented API.

If there's more info on how to add a site to SmartCopy, I (founder of both sites) am willing to help.

@eljeffeg
Owner

Nice, I'll take a look when I have a moment. I haven't created any docs on how to add additional sites - it just sort of grew and I haven't gone back to make it dev friendly.

Regarding your site: funny coincidence, I saw one of your records today in a MyHeritage match, but unfortunately the links were dead. It was a marriage record for Charlotte Lacasse, Jan 21 1800, St-Charles De Bellechasse, Qc.

Here are the links:
https://www.genealogieonline.nl/en/les-celtes-base-1/I2668634.php
https://www.genealogieonline.nl/en/les-celtes-base-1/I2101343.php

@coret
Author

coret commented Mar 18, 2021

@jeffg2k, Genealogy Online is a platform where genealogists can publish their genealogical data (and images). Genealogists can also unpublish their data. The list of removed publications is available to MyHeritage too (as it is for Ancestry), but possibly they don't update their matches that often.

@coret
Author

coret commented Mar 21, 2021

Are there sources which use microdata and are included in SmartCopy?
Those could form the basis of a collection .js file for Open Archives (records with archival data) and Genealogy Online (family trees), as both websites use microdata. This means less HTML parsing and regexing, as the data is semantically marked up.
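To illustrate why microdata reduces parsing work, here is a minimal sketch of reading `itemprop` values from a page. The function name `collectItemProps` is hypothetical and not part of SmartCopy; it flattens all properties into one object, ignores nested `itemscope` boundaries, and only checks a few of the attributes in which microdata can store values, so treat it as a starting point rather than a complete microdata parser.

```javascript
// Hypothetical helper (not part of SmartCopy): collect every itemprop value
// found under a root element into a plain object of arrays.
// Assumption: nesting of itemscope elements is ignored for simplicity.
function collectItemProps(root) {
  const props = {};
  for (const el of root.querySelectorAll('[itemprop]')) {
    const name = el.getAttribute('itemprop');
    // Some elements carry the value in an attribute (e.g. <meta content>,
    // <time datetime>, <a href>); otherwise fall back to the text content.
    const value =
      el.getAttribute('content') ||
      el.getAttribute('datetime') ||
      el.getAttribute('href') ||
      el.textContent.trim();
    if (!props[name]) props[name] = [];
    props[name].push(value);
  }
  return props;
}
```

In a content script this could be called as `collectItemProps(document)`; which property names actually appear depends on the page's markup.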

To easily see the semantically enriched data of an Open Archives page (as Google does), see https://search.google.com/structured-data/testing-tool/u/0/?hl=nl#url=https%3A%2F%2Fwww.openarch.nl%2Felo%3Abada0d37-3a2d-02ca-cf36-446ed4359927%2Fen (click on http://historical-data.org/HistoricalRecord in the right pane).

The URL structure of Open Archives is:
https://www.openarch.nl/{record collection}:{record guid}{optional language: /en|/de|/fr}
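As a rough sketch of how a collection file might recognize such URLs, the function below splits an Open Archives record URL into its parts. The name `parseOpenArchUrl` and the returned field names are hypothetical, and the regular expression assumes the collection and GUID contain no slashes.

```javascript
// Hypothetical helper: split an Open Archives record URL into its parts,
// following the URL structure described above.
function parseOpenArchUrl(url) {
  const { hostname, pathname } = new URL(url);
  if (hostname !== 'www.openarch.nl') return null;
  // {record collection}:{record guid}, then an optional /en, /de or /fr
  const match = pathname.match(/^\/([^/:]+):([^/]+?)(?:\/(en|de|fr))?\/?$/);
  if (!match) return null;
  return {
    collection: match[1],
    guid: match[2],
    language: match[3] || null, // null when no language suffix is present
  };
}
```

For the example record linked above, `parseOpenArchUrl('https://www.openarch.nl/elo:bada0d37-3a2d-02ca-cf36-446ed4359927/en')` yields the collection `elo`, the GUID, and the language `en`.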

For the same semantic view of a Genealogy Online page, see https://search.google.com/structured-data/testing-tool/u/0/?hl=nl#url=https%3A%2F%2Fwww.genealogieonline.nl%2Fen%2Fkwartierstaat-hans-flipse%2FI2263.php (click on https://schema.org/Person in the right pane).

The URL structure of Genealogy Online:
https://www.genealogieonline.nl/{optional language: en/|de/|fr/}{publication_uri}/{person_uri}
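A matching sketch for Genealogy Online person pages follows. Again, `parseGenealogyOnlineUrl` and the returned field names are hypothetical; the pattern assumes exactly one publication segment and one person segment after the optional language prefix.

```javascript
// Hypothetical helper: split a Genealogy Online person URL into its parts,
// following the URL structure described above.
function parseGenealogyOnlineUrl(url) {
  const { hostname, pathname } = new URL(url);
  if (hostname !== 'www.genealogieonline.nl') return null;
  // optional en/, de/ or fr/, then {publication_uri}/{person_uri}
  const match = pathname.match(/^\/(?:(en|de|fr)\/)?([^/]+)\/([^/]+)$/);
  if (!match) return null;
  return {
    language: match[1] || null, // null when no language prefix is present
    publication: match[2],
    person: match[3],
  };
}
```

For the example page linked above, this returns the language `en`, the publication `kwartierstaat-hans-flipse`, and the person `I2263.php`.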
