"Node has no left sibling" when calling paginate_webentity_pagelinks #462

Open
dale-wahl opened this issue Jun 16, 2022 · 6 comments

@dale-wahl

dale-wahl commented Jun 16, 2022

I'm running into an issue getting the links from a particular webentity. I keep receiving a "Node has no left sibling" message instead. I'm assuming it has to do with a particular link since I'm able to get the first set of links from the webentity. Is there any way I can go find the culprit to remove it and collect the rest of the links for the network? Thanks!

Backend Docker container logs:

2022-06-16 08:02:53+0000 [DEBUG - QUERY from MYIP, 192.168.0.3] {u'params': [10393, 10, u'31607|230|0#2Zm', False, u'ic-2-356581'], u'method': u'store.paginate_webentity_pagelinks_network'}
2022-06-16 08:02:53+0000 [INFO - ic-2-356581] Traph client query: paginate_webentity_pagelinks [10393, ["s:https|h:org|h:immunize|h:www|", "s:http|h:org|h:immunize|", "s:http|h:org|h:immunize|h:www|"]] {"include_outbound": false, "pagination_token": "0#2Zm", "source_page_count": 10}
2022-06-16 08:02:53+0000 [INFO - ic-2-356581] Traph server answer: {"query": "paginate_webentity_pagelinks", "code": "success", "result": {"done": false, "token": "0#9pv", "count_pagelinks": 1187, "count_sourcepages": 10, "pagelinks": [["s:https|h:org|h:immunize|h:www|p:vw|", "s:https|h:org|h:immunize|h:www|p:vw|p:|", 1], ["s:https|h:org|h:immunize|h:www|p:vw|p:archive.asp|", "s:https|h:org|h:immunize|h:www|p:vw|", 1], ["s:https|h:org|h:immunize|h:www|p:vw|p:|", "s:https|h:org|h:immunize|h:www|p:vax-and-covid-19|p:|", 1], ["s:https|h:org|h:immunize|h:www|p:vw|p ... [132994 cars truncated]
2022-06-16 08:02:53+0000 [DEBUG - ANSWER] store.paginate_webentity_pagelinks_network: "{\"jsonrpc\": \"2.0\", \"result\": {\"code\": \"success\", \"result\": {\"token\": \"32794|240|0#9pv\", \"links\": [[\"s:https|h:org|h:immunize|h:www|p:vw|\", \"s:https|h:org|h:immunize|h:www|p:vw|p:|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:archive.asp|\", \"s:https|h:org|h:immunize|h:www|p:vw|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:|\", \"s:https|h:org|h:immunize|h:www|p:vax-and-covid-19|p:|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:|\", \"s:https|h:org|h:immunize|h:www|p:laws|p:exemptions.asp|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:|\", \"s:https|h:org|h:immunize|h:www|p:news|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:|\", \"s:https|h:org|h:immunize|h:www|p:subscribe|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:|\", \"s:https|h:org|h:immunize|h:www|p:acip|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:|\", \"s:https|h:org|h:immunize|h:www|p:shop|\", 1], [\"s:https|h:org|h:immunize|h:www|p:vw|p:|\", \"s:https|h:org|h:immunize|h:www|p:mening ... [137206 cars truncated]
2022-06-16 08:02:53+0000 [DEBUG - QUERY from MYIP, 192.168.0.3] {u'params': [10393, 10, u'32794|240|0#9pv', False, u'ic-2-356581'], u'method': u'store.paginate_webentity_pagelinks_network'}
2022-06-16 08:02:53+0000 [INFO - ic-2-356581] Traph client query: paginate_webentity_pagelinks [10393, ["s:https|h:org|h:immunize|h:www|", "s:http|h:org|h:immunize|", "s:http|h:org|h:immunize|h:www|"]] {"include_outbound": false, "pagination_token": "0#9pv", "source_page_count": 10}
2022-06-16 08:02:53+0000 [INFO - ic-2-356581] Traph server answer: {"query": {"args": [10393, ["s:https|h:org|h:immunize|h:www|", "s:http|h:org|h:immunize|", "s:http|h:org|h:immunize|h:www|"]], "method": "paginate_webentity_pagelinks", "kwargs": {"include_outbound": false, "pagination_token": "0#9pv", "source_page_count": 10}}, "message": "Node has no left sibling.", "code": "fail"}
2022-06-16 08:02:53+0000 [DEBUG - ANSWER] store.paginate_webentity_pagelinks_network: "{\"jsonrpc\": \"2.0\", \"result\": {\"query\": {\"args\": [10393, [\"s:https|h:org|h:immunize|h:www|\", \"s:http|h:org|h:immunize|\", \"s:http|h:org|h:immunize|h:www|\"]], \"method\": \"paginate_webentity_pagelinks\", \"kwargs\": {\"include_outbound\": false, \"pagination_token\": \"0#9pv\", \"source_page_count\": 10}}, \"message\": \"Node has no left sibling.\", \"code\": \"fail\"}, \"id\": null}"
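
The logs above show a token-based pagination loop: each successful answer returns a token that is sent back in the next query, and the query sent with the second token fails. A minimal Python sketch of such a loop follows, written so that it keeps the links gathered so far and reports which token triggered the failure. The method name and parameter order are copied from the logged query; everything else (`collect_pagelinks`, the `Transport` callable, the use of `None` for the first page, and an empty token as the end-of-pagination signal) is a hypothetical illustration and not Hyphe's documented client API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A transport is any callable that sends one JSON-RPC request and returns the
# decoded "result" object. How it reaches the Hyphe backend (URL, port, HTTP
# library) is deliberately left out: those details do not appear in the logs.
Transport = Callable[[str, list], Dict[str, Any]]

METHOD = "store.paginate_webentity_pagelinks_network"  # method name from the logs


def collect_pagelinks(
    call: Transport,
    webentity_id: int,
    corpus: str,
    source_page_count: int = 10,
    start_token: Optional[str] = None,  # assumption: None requests the first page
) -> Tuple[List[list], Optional[str], Optional[str]]:
    """Page through a web entity's links until the end or until a call fails.

    Returns (links, failing_token, error_message). The last two values are
    None when pagination finished without an error.
    """
    links: List[list] = []
    token = start_token
    while True:
        # Parameter order copied from the logged query:
        # [webentity_id, source_page_count, token, include_outbound, corpus]
        answer = call(METHOD, [webentity_id, source_page_count, token, False, corpus])
        if answer.get("code") != "success":
            # Keep what was collected and report the token that was rejected.
            return links, token, answer.get("message", "unknown error")
        page = answer["result"]
        links.extend(page.get("links", []))
        next_token = page.get("token")
        if not next_token:  # assumption: a missing or empty token ends pagination
            return links, None, None
        token = next_token
```

With a real transport plugged in, the returned `failing_token` would identify the page of source pages where the traph raises "Node has no left sibling.", which narrows down where to look. It does not by itself allow skipping that page, since the next token is only available from a successful answer.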
@boogheta
Member

boogheta commented Jun 16, 2022

Hello Dale,
That's a first, sorry. We will need to investigate a bit to understand what's happening.
Is your corpus big? Could you try to share its traph data with us? (You should have a traph-data directory either in your hyphe directory or under the DATA_PATH you might have set in your .env file, and it should contain one directory per corpus id.)
An alternative would be to share a dump of your corpus' pages collection from the mongodb container, using mongodump -d "hyphe_CORPUSID" -c pages within the container.
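
As a rough illustration of the lookup described above, the following Python sketch searches for a corpus' traph directory first under DATA_PATH (if the .env file sets it) and then under the hyphe directory. The function names `read_data_path` and `find_traph_dir` are hypothetical, the .env parsing is deliberately simplistic, and the layout `traph-data/<corpus id>` is taken from the description in this comment rather than verified against a Hyphe installation.

```python
from pathlib import Path
from typing import List, Optional


def read_data_path(env_file: Path) -> Optional[str]:
    """Return the DATA_PATH value from a .env file, or None if it is not set."""
    if not env_file.is_file():
        return None
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not KEY=VALUE
        key, _, value = line.partition("=")
        if key.strip() == "DATA_PATH":
            return value.strip().strip("'\"") or None
    return None


def find_traph_dir(hyphe_dir: Path, corpus_id: str) -> Optional[Path]:
    """Look for traph-data/<corpus_id> under DATA_PATH, then under hyphe_dir."""
    bases: List[Path] = []
    data_path = read_data_path(hyphe_dir / ".env")
    if data_path:
        base = Path(data_path)
        # Assumption: a relative DATA_PATH is relative to the hyphe directory.
        bases.append(base if base.is_absolute() else hyphe_dir / base)
    bases.append(hyphe_dir)
    for base in bases:
        candidate = base / "traph-data" / corpus_id
        if candidate.is_dir():
            return candidate
    return None
```

Calling `find_traph_dir(Path("/path/to/hyphe"), "CORPUSID")` with your own directory and corpus id returns the directory to archive and share, or `None` if neither location contains it.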

@dale-wahl
Author

It's not my biggest network... But it's the first time I've seen this! 25k webentities.

Here is the traph-data directory. Let me know if you want the mongo dump as well.

@boogheta
Member

Hello Dale,
The file requires access authorization. I requested it yesterday but haven't received it yet.

@dale-wahl
Author

Hey @boogheta, I accepted your request for access a while ago. Just commenting here in case you didn't see the notification.

@boogheta
Member

Hello Dale, yes, I got it. @Yomguithereal started looking at it, but we don't have many leads yet.
If you're in a hurry, I'm afraid you would probably be better off restarting the corpus from scratch (which is a bit of a pain, I know... :/ )

@dale-wahl
Author

No rush. I'll look at load and see if I can re-collect.

I do wish I could skip the one link (or page) and collect the rest of the network, but am unsure how to do that.

boogheta added the bug label Aug 25, 2023