Support for filae.com #28

Open
raphink opened this issue Jan 16, 2017 · 15 comments

@raphink
Contributor

raphink commented Jan 16, 2017

In France, Filae.com has lots of information and trees. Would you be willing to accept a PR to support it?

@eljeffeg
Owner

Let me investigate how to query data from the site and how difficult it would be to create a parser for it. Do you have a link to a public profile or tree I could use for testing? One challenge will be that I don't speak French and I don't see an option to set the site language to English.

I also have a request in for geneanet.org, so I'll be looking at that as well.

@raphink
Contributor Author

raphink commented Jan 19, 2017

I'm afraid you need a premium account to see profiles. They do give you a 1 month trial though (which I'm currently using). Geneanet would be interesting, too.

@raphink
Contributor Author

raphink commented Jan 19, 2017

I could try to code it already. Do you need a Geni Pro account to use this?

@eljeffeg
Owner

You need a Geni Pro account to be able to copy over family members (API restriction on Geni).

I often use http://www.nosorigines.qc.ca for free French / Canadian trees, so I've considered adding that. What I've tried to do thus far is focus on either the most popular or free sites.

@raphink
Contributor Author

raphink commented Jan 20, 2017

So getting a MyHeritage data subscription only makes sense with Geni Pro, then?

@eljeffeg
Owner

I think the MyHeritage Data subscription has a lot of value for research. SmartCopy just makes it easier to copy that data from MyHeritage to Geni. But SmartCopy can work on many websites, including trees and records at FamilySearch.

@raphink
Contributor Author

raphink commented Jan 20, 2017

@eljeffeg
Owner

I have to say, geneanet.org is horribly constructed. It's so unstructured. No classes, ids, uggg. It's one of the toughest pages I've seen for parsing.

@raphink
Contributor Author

raphink commented Jan 27, 2017

Yes indeed. It's a mess of a DOM...

@eljeffeg
Owner

In some cases, it has been easier to grab only the relationships from the initial page and then download and parse the pages of the parents, siblings, and children themselves. The data structure of a profile page is usually more complete (it contains the gender, for example) and easier to parse than the relationship info on the focus page.

Here is one page I was using as a test. The death date is giving several profiles a problem. http://gw.geneanet.org/genevtabouis?lang=en&pz=sosa+fictif&nz=legoutiere&ocz=0&p=augustine+madeleine+augustine&n=riviere
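
A minimal Python sketch of that two-pass idea, for illustration only. The names `LinkCollector`, `relative_urls`, `collect_profiles`, `fetch`, and `parse_profile` are hypothetical and not part of any existing parser, and the sketch treats every link on the focus page as a relative, which a real page would need to narrow down.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def relative_urls(focus_html, base_url):
    """First pass: take only the links to relatives from the focus page."""
    collector = LinkCollector()
    collector.feed(focus_html)
    return [urljoin(base_url, href) for href in collector.links]


def collect_profiles(focus_html, base_url, fetch, parse_profile):
    """Second pass: download each relative's own page and parse it as a profile.

    fetch(url) returns the page HTML and parse_profile(html) returns the
    parsed data; both are supplied by the caller (assumptions of this sketch).
    """
    return {
        url: parse_profile(fetch(url))
        for url in relative_urls(focus_html, base_url)
    }


# Usage with stubbed pages instead of real network requests.
focus = '<ul><li><a href="/p/jean">Jean</a></li><li><a href="/p/marie">Marie</a></li></ul>'
pages = {
    "http://example.org/p/jean": "sex=M",
    "http://example.org/p/marie": "sex=F",
}
profiles = collect_profiles(
    focus, "http://example.org/tree", pages.__getitem__, lambda html: {"raw": html}
)
```

The trade-off is one extra request per relative in exchange for parsing a single, more regular page layout.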

@eljeffeg
Owner

eljeffeg commented Jan 27, 2017

We'll also need to redirect the getcode if the url contains type=tree or type=medias
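
A rough Python sketch of that check, under stated assumptions: `needs_redirect` and `profile_url` are hypothetical helper names, and dropping the `type` parameter to reach the plain profile view is a guess at what the redirect would do, not confirmed behavior of the site.

```python
from urllib.parse import parse_qsl, urlencode, urlparse

# Query values named in the comment above as non-profile views.
NON_PROFILE_TYPES = {"tree", "medias"}


def needs_redirect(url):
    """Return True if the URL carries type=tree or type=medias."""
    pairs = parse_qsl(urlparse(url).query, keep_blank_values=True)
    return any(key == "type" and value in NON_PROFILE_TYPES for key, value in pairs)


def profile_url(url):
    """Rebuild the URL without its `type` parameter (assumed to give the profile page)."""
    parts = urlparse(url)
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "type"
    ]
    return parts._replace(query=urlencode(pairs)).geturl()
```

A caller would test `needs_redirect(url)` first and, only when it is true, fetch `profile_url(url)` instead of the original address.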

@eljeffeg
Owner

Is filae.com still something you're working on? I hadn't seen any changes to that branch.

@raphink
Contributor Author

raphink commented Feb 19, 2017

No, I haven't really worked on it. The main reason is that I hardly use the genealogical trees I find there, as there are many more trees on Geneanet. So the main value of Filae is its records, but they're not really parseable most of the time. I might want to work on this again later, but for now it's not worth the effort.

@Tuisto59

Tuisto59 commented May 6, 2018

I'm a French Python coder and I wrote a small parser that crawls Geneanet.
It runs the search, iterates over the result pages, goes through each tree, and builds the ancestor (ascendant) table of the family tree using Geneanet's own option. I also have a premium account, but it works with a normal account.
It uses requests, re, and pandas, so just a couple of libraries plus the Python standard library. It's simple.
The goal is to build a consensus tree. I will also develop a search on the most recently found ancestors, repeating the search again and again until nothing new turns up.
The difficult part is the logic of the algorithm: implementing all the reasoning of someone researching a family tree.
That means comparing the data from each tree against the others, choosing the right one as a genealogist would, and searching again to reach new ancestors.
With a premium account it is also possible to search for the same individual in other trees that carry new or additional information, such as parents, then parse those trees, fill in the gaps, and find further ancestors until the top of the tree is reached.

I was also looking at parsing the data behind the digitized microfilms on Filae.com to get all the indexed records. I have access to it through my local family tree association.

Filae stole every digitized image from every department of France and took advantage of a legal void to index them. Billions of images were collected by this company, and all of them were sent to other companies for image processing and indexing (like FamilySearch's indexed image search).

Like Filae, I have also written parsers for Belgian images (I can download them in HD) and for the departmental archives of departments 62, 80, 59, and 02. All in Python.

If you are interested, contact me at: yoan [dot] bouzin {at} gmail [dot] com

@raphink
Contributor Author

raphink commented May 7, 2018

@Tuisto59 we already have a parser for Geneanet. The problem with Filae is that its DOM is pretty bad to parse.
