Sourcegraph architecture overview

This document provides a high-level overview of Sourcegraph's architecture so you can understand how our systems fit together.

Code syncing

At its core, Sourcegraph maintains a persistent cache of all the code that is connected to it. The cache is persistent because this data is critical for Sourcegraph to function, but it remains a cache: the code host is the source of truth, and our copy is eventually consistent.

  • gitserver is the sharded service that stores the code and makes it accessible to other Sourcegraph services.
  • repo-updater is the singleton service that is responsible for ensuring all the code in gitserver is as up-to-date as possible while respecting code host rate limits. It also syncs repository metadata from the code host into the repo table of our Postgres database.
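
To make the idea of a sharded service concrete, here is a minimal sketch of how a repository name could be mapped to one gitserver shard. The hashing scheme, the function name `shard_for_repo`, and the shard addresses are all assumptions for illustration; this document does not specify how gitserver actually assigns repositories to shards.

```python
import hashlib
from typing import Sequence


def shard_for_repo(repo_name: str, shard_addrs: Sequence[str]) -> str:
    """Pick a shard for a repository by hashing its name.

    Hypothetical scheme: the real assignment logic may differ.
    """
    if not shard_addrs:
        raise ValueError("at least one shard address is required")
    # A stable hash keeps the same repository on the same shard across calls.
    digest = hashlib.sha256(repo_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shard_addrs)
    return shard_addrs[index]


# Hypothetical shard addresses, for illustration only.
shards = ["gitserver-0:3178", "gitserver-1:3178", "gitserver-2:3178"]
print(shard_for_repo("github.com/sourcegraph/sourcegraph", shards))
```

A simple modulo scheme like this reshuffles many repositories when the number of shards changes, which is one reason a real system might choose a different strategy.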

If you want to learn more about how code is synchronized, read Life of a repository.

Search

Developers can search across all the code that is connected to their Sourcegraph instance.

By default, Sourcegraph uses zoekt to create a trigram index of the default branch of every repository so that searches are fast. This trigram index is why Sourcegraph search is more powerful and faster than what code hosts usually provide.
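
As a rough illustration of why a trigram index makes search fast, the sketch below builds a toy in-memory index: each three-character substring maps to the set of files that contain it, so a query only needs to verify files that contain every trigram of the query. This is a simplified model with hypothetical names (`TrigramIndex`, `trigrams`), not zoekt's implementation.

```python
from collections import defaultdict
from typing import Dict, Set


def trigrams(text: str) -> Set[str]:
    """Return every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy trigram index over file contents (illustrative only)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)
        self._docs: Dict[str, str] = {}

    def add(self, path: str, content: str) -> None:
        self._docs[path] = content
        for gram in trigrams(content):
            self._postings[gram].add(path)

    def search(self, query: str) -> Set[str]:
        grams = trigrams(query)
        if grams:
            # Only files containing every trigram of the query can match.
            candidates = set.intersection(
                *(self._postings.get(g, set()) for g in grams)
            )
        else:
            # Queries shorter than three characters cannot use the index.
            candidates = set(self._docs)
        # Confirm each candidate, since sharing trigrams does not guarantee
        # that the full query string appears contiguously.
        return {p for p in candidates if query in self._docs[p]}


index = TrigramIndex()
index.add("main.go", "func main() {}")
index.add("app.py", "def main(): pass")
print(sorted(index.search("main")))
```

The index narrows the set of files to check; the final substring test removes candidates that merely share trigrams with the query.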

Sourcegraph also has a fast search path for code that isn't indexed yet, or for code that will never be indexed (for example: code that is not on a default branch). Indexing every branch of every repository isn't a pragmatic use of resources for most customers, so this decision balances optimizing the common case (searching all default branches) with space savings (not indexing everything).

  • searcher implements the non-indexed search.
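
The split between indexed and non-indexed search can be pictured as a routing decision. The sketch below is an assumption-laden simplification: the function name `choose_search_backend` and its inputs are hypothetical, and the real routing logic is not described in this document.

```python
from typing import Dict, Set


def choose_search_backend(
    repo: str,
    revision: str,
    default_branches: Dict[str, str],
    indexed_repos: Set[str],
) -> str:
    """Return which service would handle a search (illustrative only)."""
    is_default_branch = default_branches.get(repo) == revision
    if is_default_branch and repo in indexed_repos:
        # The default branch has a trigram index, so use the indexed path.
        return "zoekt"
    # Not indexed yet, or a revision that is never indexed.
    return "searcher"


default_branches = {"github.com/a/b": "main"}
indexed_repos = {"github.com/a/b"}
print(choose_search_backend("github.com/a/b", "main", default_branches, indexed_repos))
print(choose_search_backend("github.com/a/b", "feature-x", default_branches, indexed_repos))
```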

Syntax highlighting for any code view, including search results, is provided by Syntect server.

If you want to learn more about search:

Code intelligence

Code intelligence surfaces data (for example: doc comments for a symbol) and actions (for example: go to definition, find references) based on our semantic understanding of code (unlike search, which is completely text based).

By default, Sourcegraph provides imprecise search-based code intelligence. This reuses all the architecture that makes search fast, but it can result in false positives (for example: finding two definitions for a symbol, or references that aren't actually references), or false negatives (for example: failing to find the definition or all references). This is the default because it works with no extra configuration and works well enough for many use cases and languages. We support a lot of languages this way because it only requires writing a few regular expressions.
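
The sketch below shows, under simplifying assumptions, how a regular expression can approximate "go to definition" and why it is imprecise. The pattern and the function name `find_definitions` are hypothetical and far cruder than what a real search-based implementation would use.

```python
import re
from typing import Dict, List, Tuple


def find_definitions(symbol: str, files: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (path, line number) pairs that look like definitions of symbol."""
    # Hypothetical pattern: a definition keyword followed by the symbol name.
    pattern = re.compile(
        r"^\s*(?:func|def|class|function)\s+" + re.escape(symbol) + r"\b"
    )
    hits = []
    for path, content in files.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno))
    return hits


files = {
    "a.py": "def parse(x):\n    return x\n",
    "b.py": "import a\n\ndef parse(y):\n    return a.parse(y)\n",
    "s.go": "func (s *Server) parse() {}\n",
}
# Both Python files match, although a call site refers to only one of them
# (a false positive). The Go method is missed because of its receiver
# (a false negative).
print(find_definitions("parse", files))
```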

With some setup, customers can enable precise code intelligence. Repositories add a step to their build pipeline that computes the index for that revision of code and uploads it to Sourcegraph. We have to write language-specific indexers, so adding precise code intelligence support for new languages is a non-trivial task.

If you want to learn more about code intelligence:

Batch Changes

Batch Changes (formerly known as campaigns) creates and manages large-scale code changes across projects, repositories, and code hosts.

To create a batch change, users write a batch spec: a YAML file that specifies the changes to perform and the repositories to perform them on, selected either through a Sourcegraph search or by declaring them directly. src-cli then executes this spec on the user's machine (or in CI, or in some other environment the user controls) and produces changeset specs that are sent to Sourcegraph. Sourcegraph then applies these changeset specs to create one or more changesets per repository. (A changeset is a pull request or merge request, depending on the code host.)

Once created, changesets are monitored by Sourcegraph, and their current review and CI status can be viewed on the batch change's page, which provides a single-pane-of-glass view of all the changesets created as part of the batch change. The batch change can be updated at any time by re-applying the original batch spec: this transparently adds changesets in repositories that now match the original search and removes them from repositories that no longer match.
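
Re-applying a batch spec can be thought of as a set difference between the repositories that already have a changeset and the repositories that currently match the search. The following sketch uses hypothetical names (`ReapplyPlan`, `plan_reapply`) and covers only that comparison. It says nothing about how Sourcegraph actually creates or removes changesets on the code host.

```python
from typing import NamedTuple, Set


class ReapplyPlan(NamedTuple):
    add: Set[str]     # repositories that now match and need a changeset
    keep: Set[str]    # repositories that still match
    remove: Set[str]  # repositories that no longer match


def plan_reapply(existing: Set[str], matching: Set[str]) -> ReapplyPlan:
    """Compare repositories with changesets against current search matches."""
    return ReapplyPlan(
        add=matching - existing,
        keep=matching & existing,
        remove=existing - matching,
    )


plan = plan_reapply(
    existing={"github.com/a/one", "github.com/a/two"},
    matching={"github.com/a/two", "github.com/a/three"},
)
print(plan)
```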

If you want to learn more about batch changes:

Code insights

Code insights surface higher-level, aggregated information to leaders in engineering organizations in dashboards. For example, code insights can track the number of matches of a search query over time, the number of code intelligence diagnostic warnings in a code base, usage of different programming languages, or even data from external services, like test coverage from Codecov. Sample use cases include tracking migrations, usage of libraries across an organization, tech debt, code base health, and much more.

Code insights are currently feature-flagged. To enable them, set "experimentalFeatures": { "codeInsights": true } in your user settings.

Code insights currently work through extensions. A code insight extension can register a view provider that contributes a graph to either the repository/directory page, the search homepage, or the global "Insights" dashboard reachable from the navbar. It is called on demand on the client (the browser) to return the data needed for the chart. How the extension produces the data is up to the extension: it can run search queries, query code intelligence data, or analyze Git data using the Sourcegraph GraphQL API, or it can query an external service using its public API, e.g. Codecov.

To enable a code insight, install one of the code insights extensions. The extension can then be configured in your user settings according to the examples in the extension README. Just like other extensions, it's also possible to install and configure them organization-wide.

Because code insights currently run on demand in the client, their performance is bound to the performance of the underlying data source. For example, search queries are relatively fast as long as the scope doesn't include many repositories, but performance degrades as more repositories are included. We're actively working on removing this limitation.

If you want to learn more about code insights:

Code monitoring

Code monitoring allows users to get notified of changes to their codebase.

Users can view, edit, and create code monitors through the code monitoring UI (/code-monitoring). A code monitor comprises a trigger and one or more actions.

The trigger watches for new data; when new data appears, we call this an event. For now, the only supported trigger is a search query of type:diff or type:commit. The Go backend runs it every five minutes and automatically adds an after: parameter to narrow down the diffs/commits that should be searched. The monitor's configured actions run when this query returns a non-zero number of results.
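
The trigger flow described above can be sketched as follows. This is an illustrative Python model, not the Go backend's code: the function names, the `search` callable, and the timestamp format used for the after: parameter are all assumptions.

```python
from datetime import datetime, timezone
from typing import Callable, List, Sequence


def build_trigger_query(user_query: str, last_run: datetime) -> str:
    """Append an after: filter so only new diffs/commits are searched.

    The ISO 8601 timestamp format is an assumption for illustration.
    """
    return f'{user_query} after:"{last_run.isoformat()}"'


def run_monitor_once(
    user_query: str,
    last_run: datetime,
    search: Callable[[str], List[str]],
    actions: Sequence[Callable[[List[str]], None]],
) -> bool:
    """Run the trigger query once; run every action if it returns results."""
    results = search(build_trigger_query(user_query, last_run))
    if not results:
        return False  # no new data, so no event
    for action in actions:
        action(results)
    return True


# Stand-in search function and action, for illustration only.
notifications: List[List[str]] = []
fired = run_monitor_once(
    "type:diff TODO",
    datetime(2021, 1, 1, tzinfo=timezone.utc),
    search=lambda query: ["commit abc123"],
    actions=[notifications.append],
)
print(fired, notifications)
```

A scheduler would call `run_monitor_once` every five minutes and pass the time of the previous run as `last_run`.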

The actions are run in response to a trigger event. For now, the only supported action is an email notification to the primary email address of the code monitor's owner. In order for this to work, email.address and email.smtp must be configured in site configuration. Code monitoring actions will be extended in the future to support webhooks.

If you want to learn more about code monitoring:

Browser extensions

The Sourcegraph browser extensions bring the features of Sourcegraph directly into the UI of code hosts such as GitHub, GitLab and Bitbucket.

With the Sourcegraph browser extension installed, users get Sourcegraph features (including code intelligence and Sourcegraph extensions) on their code host while browsing code, viewing diffs, or reviewing pull requests.

This lets users get value from Sourcegraph without leaving their existing workflows on their code host, while also giving them a convenient way to jump into Sourcegraph at any time (by using the Open in Sourcegraph button on any repository or file). The browser extension also adds an address bar search shortcut, allowing you to search on Sourcegraph directly from the browser address bar.

If you want to learn more about browser extensions:

Native integrations (for code hosts)

Native integrations bring Sourcegraph features directly into the UI of code hosts, in a similar way to the browser extension.

Instead of requiring a browser extension, native integrations inject a script by extending the code host directly (for example, using the code host's plugin architecture). The advantage is that Sourcegraph can be enabled for all users of a code host instance, without any action required from each user.

If you want to learn more about native integrations:

Sourcegraph extension API

The Sourcegraph extension API allows developers to write extensions that extend the functionality of Sourcegraph.

Extensions that use the API can add elements and interactions to the Sourcegraph UI, such as:

  • adding action buttons in the toolbar
  • decorating specific lines of code in a file
  • contributing hover tooltip information on specific tokens in a file
  • decorating files in directory listings

Some core features of Sourcegraph, like displaying code intelligence hover tooltips, are implemented using the extension API.

If you want to learn more about our extension API:

src-cli

src-cli, or src, is a command line tool that users can run locally to interact with Sourcegraph.

src-cli is written in Go, and distributed as a standalone binary for Windows, macOS, and Linux. Its features include running searches, managing Sourcegraph, and executing batch changes. src-cli is an integral part of the batch changes product.

Note that src-cli is not contained within the Sourcegraph monorepo, and has its own release cadence.

If you want to learn more about src-cli:

Editor extensions

Sourcegraph editor extensions will bring Sourcegraph features like search, code intelligence, and Sourcegraph extensions into your IDE. (Switching between Sourcegraph and an IDE when viewing a file is separately powered by Sourcegraph extensions.)

The editor extension is still in the exploratory phase of determining priority and scope. For more information:

Deployment

Sourcegraph is deployable via three supported methods:

  • Kubernetes is intended for all medium to large scale production deployments that require fault tolerance and high availability. For advanced users only; significant Kubernetes experience is required. This deployment method is developed in deploy-sourcegraph.
  • Docker-Compose is intended to be used for small to medium production deployments, with some customization available. Easy to set up; basic infrastructure and Docker knowledge is required. A variation on this is the pure-Docker option. Both of these deployment methods are developed in deploy-sourcegraph-docker.
  • Server is intended for small environments on a single server. Easiest and quickest to set up, with a single command. Little infrastructure knowledge is required. This deployment method is developed in cmd/server.

The resource estimator can guide you on the requirements for each deployment type.

Observability

Observability encapsulates the monitoring and debugging of Sourcegraph deployments. Sourcegraph is designed with, and ships with, a number of observability tools and capabilities out of the box to enable visibility into the health and state of a Sourcegraph deployment.

Monitoring includes metrics and dashboards, alerting, and health checking capabilities. Learn more about monitoring in the monitoring architecture overview.

  • grafana is the frontend for service metrics, and ships with customized dashboards for Sourcegraph services.
  • prometheus handles scraping of service metrics, and ships with recording rules, alert rules, and alerting capabilities.
  • cadvisor provides per-container performance metrics (scraped by Prometheus) in most Sourcegraph environments.
  • Health checks are provided by each Sourcegraph service.

Debugging includes tracing and logging.

If you want to learn more about observability:

Diagram

You can click on each component to jump to its respective code repository or subtree.

Note that almost every service has a link back to the frontend, from which it gathers configuration updates. These edges are omitted for clarity.

Other resources