[PGA] Pointer redeclared during import "unsafe" (borges-indexer on Windows) #77

sakalouski · 2018-09-05T12:11:27Z

I have tried under admin rights as well - no luck.

vmarkovtsev · 2018-09-05T12:40:39Z

Not sure if borges-indexer build is supported on Windows. Tagging @erizocosmico

sakalouski · 2018-09-05T12:42:59Z

Thanks for such a rapid reply! Not a problem - launching it on Linux then :)

erizocosmico · 2018-09-05T12:45:35Z

Looks like a dependency might have problems building on Windows. To be honest, we have never tried it there, as we do not currently support Windows as an OS for that tool.

sakalouski · 2018-09-12T12:58:52Z

@vmarkovtsev I have installed pga and downloaded some repos with the .siva extension.

Still, a couple of points in the docs you provide are unclear:

How do I query these .siva files from PySpark?
How do I get the detailed metadata mentioned in engine, but without engine and without Docker (I am working on a cluster, no root)?

Generally, I do not see the logic behind suggesting these 3 tools in the PublicGitArchive README (pga, multitool and borges-indexer). Possibly it is unclear only to me.

In conclusion: I want access to detailed metadata and timestamped source code. What do I do to get it without Docker? :)

Thank you

vmarkovtsev · 2018-09-13T08:43:58Z

@sakalouski Your choice is jgit-spark-connector.
It is a JAR that plugs into Spark. If your programming language is Python, it is as easy as:

from pyspark.sql import SparkSession
from sourced.engine import Engine
session = (
    SparkSession.builder.master("local[*]")
    .appName("test")
    .config("spark.jars.packages", "tech.sourced:engine:0.7.0")
    .getOrCreate()
)
engine = Engine(session, "/path/to/siva")
print(engine.repositories.collect())

sakalouski · 2018-09-13T08:47:54Z

@vmarkovtsev Thank You for the information!

vmarkovtsev · 2018-09-13T08:47:55Z

BTW not sure what you mean by

detailed metadata and timestamped source code

In my understanding "detailed metadata" is the commit message, author and date and "timestamped source code" is git blame. So your other option is to run Hercules over the siva files without using Spark. If you really need git blame you will have to hack some Go code using the Hercules framework.
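To make "detailed metadata" concrete, here is a minimal sketch that collects the commit hash, author, date and message with plain `git log`. It assumes the repository is available as an ordinary git checkout and that `git` is on the PATH; it does not read .siva files directly. The names `parse_log`, `commit_metadata` and `repo_path` are made up for illustration.

```python
import subprocess
from typing import Dict, List

# Unit separator between fields; unlikely to appear in a commit subject.
FIELD_SEP = "\x1f"
# %H = commit hash, %an = author name, %aI = author date (strict ISO 8601),
# %s = subject line, %x1f = the literal separator byte.
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"


def parse_log(output: str) -> List[Dict[str, str]]:
    """Turn `git log --pretty=format:LOG_FORMAT` output into a list of dicts."""
    records = []
    for line in output.split("\n"):
        if not line:
            continue
        commit_hash, author, date, subject = line.split(FIELD_SEP, 3)
        records.append(
            {"hash": commit_hash, "author": author, "date": date, "message": subject}
        )
    return records


def commit_metadata(repo_path: str) -> List[Dict[str, str]]:
    """Run git in repo_path (hypothetical path to a normal checkout) and parse the log."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_log(result.stdout)
```

Calling `commit_metadata("/path/to/checkout")` would return one dict per commit. This only covers the metadata part; line-level history (git blame) is not handled here.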

sakalouski · 2018-09-13T08:49:39Z

@vmarkovtsev regarding the metadata - I mean exactly what You have mentioned. Thanks for the hints - having a look!

vmarkovtsev · 2018-09-13T08:52:56Z

Oh and there is another option: SQL interface to the underlying repos via gitbase
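As a rough sketch of what that could look like from Python: gitbase speaks the MySQL wire protocol, so a regular MySQL client should be able to query it. Everything specific below is an assumption to check against your gitbase version: the host, port 3306, the `root` user with an empty password, the `gitbase` database name, and the `commits` column names. `pymysql` is a third-party client, and `fetch_commit_metadata` is a made-up helper name.

```python
# Column names follow the gitbase `commits` table as I understand its schema
# docs; verify them against the version you run.
COMMIT_METADATA_QUERY = """
SELECT repository_id, commit_hash, commit_author_name,
       commit_author_when, commit_message
FROM commits
ORDER BY commit_author_when DESC
LIMIT %s
"""


def fetch_commit_metadata(limit=10, host="127.0.0.1", port=3306):
    """Query a running gitbase server; host, port, user and db are assumptions."""
    import pymysql  # third-party; imported lazily so this module loads without it

    conn = pymysql.connect(
        host=host, port=port, user="root", password="", database="gitbase"
    )
    try:
        with conn.cursor() as cur:
            # The client substitutes the limit value for the %s placeholder.
            cur.execute(COMMIT_METADATA_QUERY, (limit,))
            return cur.fetchall()
    finally:
        conn.close()
```

This needs a gitbase server already running over the repositories, so it is only an illustration of the SQL route, not a tested recipe for siva files.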

sakalouski · 2018-09-13T09:00:31Z

@vmarkovtsev There are so many tools, and they all differ somehow :/
So, if I want to make all the data processable in PySpark, would you suggest using Hercules or gitbase?
As I see, gitbase mentions Apache Spark integration - does that mean I could configure it to run Spark SQL?


vmarkovtsev · 2018-09-13T09:19:04Z

As far as I know gitbase+Spark integration is not ready yet. But yep this is the goal.

So the only way to run PySpark over siva atm is through jgit-spark-connector (former "engine").

sakalouski · 2018-09-13T09:23:25Z

@vmarkovtsev Thank You Vadim - You have saved me lots of time!

eiso · 2018-09-28T10:56:31Z

@sakalouski we'll keep you posted when Spark SQL integration is ready; it is at the top of our priority list. cc @vcoisne

vmarkovtsev added a commit to vmarkovtsev/gitbase that referenced this issue Sep 13, 2018

Change the notice about Spark integration

f248498

People get confused and think that it is already working src-d/datasets#77 (comment)

vmarkovtsev mentioned this issue Sep 13, 2018

Change the notice about Spark integration src-d/gitbase#468

Merged

m09 mentioned this issue Sep 13, 2018

Clean up documentation for PublicGitArchive #76

Open

6 tasks

smola changed the title ~~Pointer redeclared during import "unsafe"~~ [PGA] Pointer redeclared during import "unsafe" (borges-indexer on Windows) Oct 2, 2018

smola added the bug and help wanted labels Oct 2, 2018
