[PGA] Pointer redeclared during import "unsafe" (borges-indexer on Windows) #77
Not sure if the borges-indexer build is supported on Windows. Tagging @erizocosmico
Thanks for such a rapid reply! Not a problem, launching it on Linux then :)
Looks like a dependency might have problems building on Windows. To be honest, we have never tried it on Windows, as we don't currently support Windows as an OS for that tool.
@vmarkovtsev I have installed pga. I have downloaded some repos with the .siva extension. Still, from the docs you provide, there are a couple of unclear points:
Generally, I do not see the logic behind the suggestion of these three tools in the PublicGitArchive README (pga, multitool and borges-indexer). Possibly it is unclear only to me. To conclude: I want access to detailed metadata and timestamped source code. What do I do to get it without Docker? :) Thank you
@sakalouski Your choice is jgit-spark-connector:

```python
from pyspark.sql import SparkSession
from sourced.engine import Engine

session = SparkSession.builder \
    .master("local[*]") \
    .appName("test") \
    .config("spark.jars.packages", "tech.sourced:engine:0.7.0") \
    .getOrCreate()
engine = Engine(session, "/path/to/siva")
print(engine.repositories.collect())
```
@vmarkovtsev Thank you for the information!
BTW not sure what you mean by
In my understanding, "detailed metadata" is the commit message, author and date, and "timestamped source code" is
@vmarkovtsev Regarding the metadata, I mean exactly what you have mentioned. Thanks for the hints, having a look!
Oh and there is another option: SQL interface to the underlying repos via gitbase |
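For context, gitbase serves the underlying repositories as SQL tables (repositories, refs, commits, files, and so on) over the MySQL wire protocol. A minimal sketch of the kind of query this enables, assuming the commits schema from the gitbase documentation (exact column names may vary between gitbase versions):

```
-- Recent commits with author metadata; schema assumed from gitbase docs
SELECT repository_id,
       commit_hash,
       commit_author_name,
       commit_author_when,
       commit_message
FROM commits
ORDER BY commit_author_when DESC
LIMIT 10;
```

This covers the "detailed metadata" part of the question directly in SQL, without going through Spark.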
@vmarkovtsev There are so many tools, but all of them are somehow different :/
People get confused and think that it is already working src-d/datasets#77 (comment)
As far as I know, gitbase+Spark integration is not ready yet. But yes, this is the goal. So the only way to run PySpark over siva files at the moment is through jgit-spark-connector (formerly "engine").
@vmarkovtsev Thank you, Vadim, you have saved me lots of time!
@sakalouski We'll keep you posted when the Spark SQL integration is ready; it is at the top of our priority list. cc @vcoisne
I have tried under admin rights as well - no luck.