-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support the --xidmap option in Bulkload* #4917
Comments
Bulk loader's The |
Okay, but what is the purpose? How do we use Both uses very similar flags. If they don't do the same behavior, this should be stated somewhere in the docs. And/Or use a different flag. Also, an example of usage should be documented. |
Should be a simple change. Let's bring the two in-sync. |
Okay guys, after some time trying to identify the contexts. I will share my findings. 1 - In fact --store_xids and --xidmap are totally different things. 2 - As there is no documentation or any specific test for both* (actually there are tests for XIDMAP, not the --store_xids). It is suggested that they would behave in the same way by having similar names. But it's not true. Perhaps the tests have been lost over time, as this feature is very old and is related to RDFs. It is necessary to update the documentation (I can do this) and also to share the two functions between both loaders in my opinion. Both features available in the two tools would be very good for users. 3 - About the tools. --xidmap: A feature that we only have in Live load. It is very useful to create a mutation pipeline using Live load. Where we can reuse the blank nodes (and XIDs) with each new data ingestion without having to use the Upsert Block for example. It is very useful for those who maintain consistent control over the use of Blank nodes naming (and XIDs too, e.g. data coming from RDF triple stores). Liveloader asks the user to enter a path so that it can be saved in posting lists (I guess it is posting lists, but they are files from Badger). That way you can reuse the XIDs with each new data ingestion just by indicating the location of the previously saved XIDs. This functionality does not exist in the Bulk loader as my tests concluded. --store_xids: A feature that exists only in Bulkloader. It takes the XID or blank node and saves it to the same node as a property of it with the predicate name as "xid". e.g.:
This feature is not compatible with --xidmap. This functionality at first does not seem useful. But I'm sure it's related to the approach on external IDs https://docs.dgraph.io/mutations/#external-ids e.g.:
It can be useful in this case and we can use Upsert Block. But it is not useful for those who need to ingest large amounts of data. Only small cases. |
Thanks for the detailed info @MichelDiz! So, the --store_xids option in bulkload is working after all. But the original expected behaviour and result problem, to have bulkload create an xid map the same as live loader, is still open. Should we close this issue and open a new one? |
This issue is for Bulkloader and it is fine (it is already assigned internaly), in this case now to support |
Update
Previously it was believed that the --store_xids flag had the same behave as
--xidmap
. However, it is not true. But the--xidmap
flag is needed in Bulkloader.What version of Dgraph are you using?
Have you tried reproducing the issue with the latest release?
Yes.
Steps to reproduce the issue (command/config used to run Dgraph).
Update: Ignore these steps, go to my last comment below.
Expected behavior and actual result.
It should create an XID folder to be used in later imports via Live loader.
Origin of this issue: https://discuss.dgraph.io/t/bulk-loader-x-option/6115
The text was updated successfully, but these errors were encountered: