This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

codeintel/design: Go to definition of unindexed/uncloned code #13137

Closed
efritz opened this issue Aug 19, 2020 · 5 comments
Labels
estimate/0.5d feature-request spike Time boxed investigation meant to facilitate more granular planning. team/graph Graph Team (previously Code Intel/Language Tools/Language Platform)

Comments

@efritz
Contributor

efritz commented Aug 19, 2020

Code navigation currently fails unless both the source and the target of a go to definition operation are cloned in the Sourcegraph instance.

In situations where RepoA depends on RepoB:

  • if RepoA and RepoB both have LSIF indexes, we get precise go to definition;
  • if RepoA has an LSIF index but RepoB does not, we fall back to search-based code intelligence, which may find the definition in RepoB;
  • if RepoB is not cloned in the instance, we have nothing to target.
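
The three cases above can be sketched as a small decision function. This is only an illustration of the behavior described in this issue, not Sourcegraph's implementation; `RepoState`, `Strategy`, and `definition_strategy` are hypothetical names, and the handling of an unindexed RepoA is an assumption that the list does not cover.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    PRECISE = "precise"            # LSIF-backed go to definition
    SEARCH_BASED = "search-based"  # heuristic fallback, may or may not find the target
    UNAVAILABLE = "unavailable"    # nothing to target


@dataclass(frozen=True)
class RepoState:
    """Hypothetical summary of what the instance knows about a repository."""
    cloned: bool
    has_lsif_index: bool


def definition_strategy(repo_a: RepoState, repo_b: RepoState) -> Strategy:
    """Pick the navigation strategy for a definition in repo_b referenced from repo_a."""
    # Both ends must be cloned, otherwise there is no document to jump to.
    if not (repo_a.cloned and repo_b.cloned):
        return Strategy.UNAVAILABLE
    # Precise navigation needs an LSIF index on both sides.
    if repo_a.has_lsif_index and repo_b.has_lsif_index:
        return Strategy.PRECISE
    # Assumption: any other combination falls back to search-based results.
    return Strategy.SEARCH_BASED
```

The last branch is the gap this issue is about: the uncloned case returns `UNAVAILABLE` today, and the proposals below are ways to give it a target.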

In these circumstances we should be able to match the behavior of VS Code by implementing a TextDocumentContentProvider, which can render the text of a non-local dependency. This may require indexing the text documents of dependencies while indexing RepoA. Alternatively, we could federate code intelligence requests to a public Sourcegraph instance (https://sourcegraph.com), which is likely to contain these third-party dependencies.

This ticket is tracking an RFC effort to propose a solution to this problem.

@efritz efritz added the team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) label Aug 19, 2020
@efritz efritz added this to the 3.20 milestone Aug 19, 2020
@efritz efritz self-assigned this Aug 19, 2020
@efritz efritz added the spike Time boxed investigation meant to facilitate more granular planning. label Aug 19, 2020
@efritz efritz changed the title from "codeintel/design: Jump to definition of third-party code" to "codeintel/design: Go to definition of third-party code" Aug 19, 2020
@aidaeology aidaeology removed this from the 3.20 milestone Aug 19, 2020
@efritz efritz added this to the 3.20 milestone Aug 19, 2020
@efritz efritz changed the title from "codeintel/design: Go to definition of third-party code" to "codeintel/design: Go to definition of unindexed/uncloned code" Aug 19, 2020
@felixfbecker
Contributor

Cross-posting the proposal from #9952 for the case where the target repo is cloned but has no LSIF index itself (specifically for TypeScript, but may be applicable or even easier for other languages):

Could the LSIF indexer use the information from node_modules to determine the target repo, commit, file and position, the way the language server does? I wrote an overview of the algorithm back then here: sourcegraph/sourcegraph-typescript#8 (comment)

The "resolve clone URL to repo URL using raw API" step would probably be done not by the indexer but by the LSIF server, or just in time by the code intel extension (I am not sure how this currently works for LSIF-to-LSIF cross-repository jump-to-definition). The indexer would perform only the steps of the algorithm needed to produce the data for LSIF's cross-repository encoding mechanism (package monikers etc.).
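
As a rough illustration of the indexer-side half, the sketch below derives a clone URL, commit hint, and subdirectory from the parsed `package.json` of a dependency in node_modules. It does not reproduce the algorithm from the linked comment. It assumes npm's documented `repository` field (string shorthand or an object with `url` and optional `directory`) and the `gitHead` field that npm records for some published packages; `locate_dependency_source` is a hypothetical helper name.

```python
import re
from typing import Optional


def locate_dependency_source(package_json: dict) -> Optional[dict]:
    """Best-effort guess at where a dependency's source lives, from its package.json."""
    repo = package_json.get("repository")
    directory = ""
    if isinstance(repo, dict):
        # Object form: {"type": "git", "url": "...", "directory": "packages/x"}
        directory = repo.get("directory", "")
        repo = repo.get("url")
    if not isinstance(repo, str) or not repo:
        return None  # no repository metadata, nothing to resolve

    url = repo
    if url.startswith("github:"):
        url = "https://github.com/" + url[len("github:"):]
    elif re.fullmatch(r"[\w.-]+/[\w.-]+", url):
        # Bare "owner/name" shorthand refers to GitHub.
        url = "https://github.com/" + url
    if url.startswith("git+"):
        url = url[len("git+"):]
    if url.endswith(".git"):
        url = url[: -len(".git")]

    return {
        "clone_url": url,
        "commit": package_json.get("gitHead"),  # may be absent
        "directory": directory,
        "version": package_json.get("version"),
    }
```

Mapping the resulting clone URL to a repository on the Sourcegraph instance is the step that, per the comment above, would more likely live in the LSIF server or the extension.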

@olafurpg
Member

I am excited about this feature! We have a large monorepo with many external dependencies that we currently can't navigate to via Sourcegraph. I agree that cross-repository navigation would be the ideal UX, but I estimate it would require a large effort on our side to keep an updated list of the repositories for our external dependencies and to associate their git commits with their published library versions. For example, git tagging conventions vary from library to library.

The way IDEs support code navigation in the JVM world (Java, Scala, Kotlin, Clojure, ...) is that they index *-sources.jar files that are published by libraries alongside their bytecode *.jar files.

Just thinking out loud, would it be possible to somehow upload *-sources.jar files to Sourcegraph and treat them similarly to repositories? Jar files can be read exactly the same way as *.zip files. For example, try running the commands below:

cd $(mktemp -d)
wget https://repo1.maven.org/maven2/com/google/guava/guava/29.0-jre/guava-29.0-jre-sources.jar
unzip guava-29.0-jre-sources.jar
ls com/google/common/cache/
AbstractCache.java         Cache.java         CacheBuilderSpec.java  CacheStats.java       ForwardingLoadingCache.java  LocalCache.java   LongAddables.java  package-info.java    RemovalCause.java     RemovalListeners.java     Striped64.java
AbstractLoadingCache.java  CacheBuilder.java  CacheLoader.java       ForwardingCache.java  LoadingCache.java            LongAddable.java  LongAdder.java     ReferenceEntry.java  RemovalListener.java  RemovalNotification.java  Weigher.java
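
The same idea can be sketched programmatically: because a jar is a zip archive, Python's standard `zipfile` module can list and read its source files without unpacking to disk. The helper names `list_java_sources` and `read_source` are hypothetical, and this is only a sketch of how an indexer might treat a sources jar as a read-only file tree.

```python
import io
import zipfile
from typing import List


def list_java_sources(jar_bytes: bytes, package_dir: str) -> List[str]:
    """Return the .java file names directly inside package_dir of a sources jar."""
    prefix = package_dir.rstrip("/") + "/"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return sorted(
            name[len(prefix):]
            for name in jar.namelist()
            if name.startswith(prefix)
            and name.endswith(".java")
            and "/" not in name[len(prefix):]  # skip nested packages
        )


def read_source(jar_bytes: bytes, path: str) -> str:
    """Return the text of one source file stored in the jar."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return jar.read(path).decode("utf-8")
```

With the bytes of the Guava sources jar downloaded above, `list_java_sources(data, "com/google/common/cache")` should return the same file names that `ls` prints.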

Alternatively, I wonder if we could register our Artifactory instance (essentially a static file server) as a "repository", since it contains *-sources.jar for all of our external libraries. For example, we could point Sourcegraph at https://repo1.maven.org/maven2/ and it would walk through the file system and index everything (although don't try that on Maven Central, since it's very large and you will likely get rate limited! 😄).
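
Rather than walking the whole file tree, the sources jar for a known dependency can be addressed directly, since Maven repositories use a predictable layout. The sketch below builds that URL from Maven coordinates; it follows the layout visible in the Guava URL above, and `sources_jar_url` is a hypothetical helper, not an existing Sourcegraph or Artifactory API.

```python
def sources_jar_url(
    group_id: str,
    artifact_id: str,
    version: str,
    base_url: str = "https://repo1.maven.org/maven2/",
) -> str:
    """Build the URL of a *-sources.jar in a Maven-layout repository."""
    # The group ID becomes a directory path: com.google.guava -> com/google/guava
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}-sources.jar"
    return f"{base_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/{file_name}"
```

For the Guava example, `sources_jar_url("com.google.guava", "guava", "29.0-jre")` yields the URL passed to `wget` above. Pointing `base_url` at an internal Artifactory instance would work the same way, assuming it serves the standard Maven layout.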

@davejrt
Contributor

davejrt commented Sep 14, 2020

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.20 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly.
When in doubt, reach out!

Thank you

@efritz
Contributor Author

efritz commented Sep 15, 2020

I'm currently drafting an RFC for this feature at https://docs.google.com/document/d/1QigoTPGbc5ztGRzeqBBFVM5sD_uLfT4yOO3qxch2kGE. Feel free to leave comments if you have input while we discuss some possible implementation avenues.

@efritz
Contributor Author

efritz commented Sep 16, 2020

I'm going to close this issue as we will track discussion and later implementation progress in the RFC.

@efritz efritz closed this as completed Sep 16, 2020

5 participants