Server-side of fetch/pull #307

Open
6 of 16 tasks
Byron opened this issue Jan 22, 2022 · 6 comments
Labels
C-tracking-issue An issue to track the progress of multiple PRs or issues

Comments

@Byron
Owner

Byron commented Jan 22, 2022

What would be needed to allow a server to send a pack?

Tasks

Server fetch/pull (server to client)

  • git-odb

The below is a very early draft - it would be better to study existing implementations first to get a better overview on what (not) to do.
This one starts with the fun part to allow writing tests early and experiment with different diff algorithms and potentially their performance.

  • generate a pack from objects produced by an iterator (see issue)
    • base objects only
    • re-use existing delta objects
    • A mechanism to declare some bases to be 'out of pack' for thin pack support
  • Iterator to feed pack generation efficiently
  • pack creation
  • git-transport

Certainly needs more research, but roughly…

  • Server side accept()

    • http(s)
    • ssh
    • daemon: probably only used in testing, and we might implement it if it's useful for us as well
  • git-protocol

    • Server side chatter to negotiate a pack for
      • protocol V2
      • protocol V1 (probably not worth it, let's see)
  • gix-serve

Probably more like a toy at first merely for testing operation against various git clients.

  • A server able to answer via
    • http(s)
    • file protocol (or remote invocation via SSH)

Notes

  • Could something like gittorrent be built using the plumbing of the server? Is it even desirable? Can there be some differentiation to allow custom transport layers easily?
@Byron Byron added the C-tracking-issue An issue to track the progress of multiple PRs or issues label Jan 22, 2022
@masklinn

masklinn commented Jul 1, 2022

Server side accept()

http(s)
ssh

Just to be clear, this doesn't mean reimplementing things from accept() upwards, but supporting these as inputs using existing libraries, similar to what git-transport currently does on the client side?

@Byron
Owner Author

Byron commented Jul 2, 2022

If I understand the question correctly, the answer is yes. The client of git-transport could be handled with what's provided by accept(), while the actual interaction patterns would be abstracted in git-protocol.

@ghost

ghost commented Nov 4, 2022

This is very desirable to be able to quickly implement your own self-hosted git server (existing ones in other langs require too much memory and therefore are a poor fit for cheap vms).

@vlad-ivanov-name

I think it might be enough to accept AsyncWrite and AsyncRead for transport without worrying too much where those come from. Or, at the very least, the whole HTTP and SSH plumbing, providing alternatives to tools like git-http-backend, git-upload-pack and git-receive-pack, should be separate.
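As a rough illustration of that idea, here is a minimal sketch in which the server logic only sees a reader and a writer. It uses the synchronous `std::io::Read` and `std::io::Write` traits as stand-ins for `AsyncRead` and `AsyncWrite` to stay dependency-free. The function names `write_pkt_line` and `advertise_v2` are hypothetical and not part of gitoxide; the bytes written follow the protocol v2 capability advertisement in pkt-line framing.

```rust
use std::io::{self, Read, Write};

/// Write one pkt-line: a 4-digit hex length (which counts the 4 length
/// bytes themselves) followed by the payload.
fn write_pkt_line<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    write!(out, "{:04x}", payload.len() + 4)?;
    out.write_all(payload)
}

/// Hypothetical entry point: it does not care where the streams come from.
/// Here it only sends a small protocol v2 capability advertisement.
fn advertise_v2<R: Read, W: Write>(_input: &mut R, output: &mut W) -> io::Result<()> {
    write_pkt_line(output, b"version 2\n")?;
    write_pkt_line(output, b"ls-refs\n")?;
    write_pkt_line(output, b"fetch\n")?;
    output.write_all(b"0000")?; // flush-pkt ends the advertisement
    output.flush()
}

fn main() -> io::Result<()> {
    // In-memory buffers stand in for a socket, an HTTP body, or the
    // stdin/stdout pair of a binary invoked over SSH.
    let mut input = io::empty();
    let mut output = Vec::new();
    advertise_v2(&mut input, &mut output)?;
    println!("{}", String::from_utf8_lossy(&output));
    Ok(())
}
```

The same function could be handed `io::stdin()` and `io::stdout()` unchanged, which is the separation between plumbing and protocol that the comment argues for.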

@willstott101
Sponsor Contributor

willstott101 commented Dec 21, 2023

I've been experimenting with writing a server (in a private repo so far). My interest here is in using parts of gix for the protocol, but leaving the storage up to pluggable backends. I'm very curious about git-on-db, and git-on-object-storage, and git-on-kv-store, and mixtures of the three. Step 1 of this is to have a clean working HTTP git server written with gitoxide, using its filesystem access as the only storage backend.

I think I have ls-refs working and I'm starting to investigate fetch. The protocol is very command-based, and in HTTP AsyncRead & AsyncWrite can't really exist at the same time. Regardless we'd want to leave it up to server implementations to authenticate during the connection, find the relevant repo, authorize the parsed command, then hand back to gitoxide to respond.

So for me a sketch might look like this:

  • Server parses HTTP headers & path to verify protocol v2, authenticate
  • async fn read_command(source: AsyncRead) -> Result<Command>
  • Server authorizes the command and writes response headers
  • async fn execute_command(cmd: Command, repo: ..., dest: AsyncWrite) -> Result<()>

It's possible that a sufficiently advanced AsyncWrite could have a buffer and be appended to the HTTP response after writing the headers (in case the command parsing wanted to reply with errors in packet-line format?). I have also spent no time so far learning about the SSH transport. But those are my thoughts so far.
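The two functions in the sketch above could look roughly like the following. This is a minimal, synchronous approximation under stated assumptions: `std::io::Read` and `std::io::Write` replace the async traits, the `Command` and `Packet` types are hypothetical, a plain slice of `(oid, refname)` pairs stands in for the repository, and only `ls-refs` is handled. The `ref-prefix` argument is parsed but not applied.

```rust
use std::io::{self, Read, Write};

/// One framed unit of the pkt-line format.
enum Packet {
    Flush,        // "0000" ends a request
    Delimiter,    // "0001" separates capabilities from arguments
    Data(String), // payload with the trailing newline removed
}

/// Hypothetical request type for the `read_command` step.
#[derive(Debug)]
struct Command {
    name: String,
    capabilities: Vec<String>,
    arguments: Vec<String>,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn read_packet<R: Read>(source: &mut R) -> io::Result<Packet> {
    let mut hex = [0u8; 4];
    source.read_exact(&mut hex)?;
    let len = std::str::from_utf8(&hex)
        .ok()
        .and_then(|s| usize::from_str_radix(s, 16).ok())
        .ok_or_else(|| invalid("bad length prefix"))?;
    match len {
        0 => Ok(Packet::Flush),
        1 => Ok(Packet::Delimiter),
        2 | 3 => Err(invalid("unsupported packet")),
        n => {
            let mut data = vec![0u8; n - 4];
            source.read_exact(&mut data)?;
            let text = String::from_utf8(data).map_err(|_| invalid("not UTF-8"))?;
            Ok(Packet::Data(text.trim_end_matches('\n').to_string()))
        }
    }
}

fn read_command<R: Read>(source: &mut R) -> io::Result<Command> {
    let name = match read_packet(source)? {
        Packet::Data(line) => line
            .strip_prefix("command=")
            .map(str::to_string)
            .ok_or_else(|| invalid("expected command=<name>"))?,
        _ => return Err(invalid("expected a data packet")),
    };
    let mut cmd = Command { name, capabilities: Vec::new(), arguments: Vec::new() };
    let mut after_delimiter = false;
    loop {
        match read_packet(source)? {
            Packet::Flush => return Ok(cmd),
            Packet::Delimiter => after_delimiter = true,
            Packet::Data(line) if after_delimiter => cmd.arguments.push(line),
            Packet::Data(line) => cmd.capabilities.push(line),
        }
    }
}

/// `refs` stands in for the repository; a real server would look refs up there.
fn execute_command<W: Write>(cmd: &Command, refs: &[(&str, &str)], dest: &mut W) -> io::Result<()> {
    if cmd.name != "ls-refs" {
        return Err(invalid("only ls-refs is sketched here"));
    }
    for (oid, name) in refs {
        let line = format!("{} {}\n", oid, name);
        write!(dest, "{:04x}{}", line.len() + 4, line)?;
    }
    dest.write_all(b"0000") // flush-pkt ends the response
}

fn main() -> io::Result<()> {
    let mut source: &[u8] =
        b"0014command=ls-refs\n000fagent=demo\n0001001bref-prefix refs/heads/\n0000";
    let cmd = read_command(&mut source)?;
    println!("{cmd:?}");
    // Authorization and response headers would happen here, before executing.
    let refs = [("1111111111111111111111111111111111111111", "refs/heads/main")];
    let mut response = Vec::new();
    execute_command(&cmd, &refs, &mut response)?;
    io::stdout().write_all(&response)
}
```

Keeping parsing and execution as two calls leaves a natural gap for the server to authorize the command and write HTTP response headers in between.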

I also have some questions about packfile construction. I haven't spent a great deal of time investigating yet but I currently haven't found much in the existing gitoxide codebase. Especially relating to resolving packfiles between two peers, but there must be some logic in gix for this somewhere. I'll keep looking, and if anyone is particularly interested in collaborating let me know and I can un-private the repo, I'm just quite enjoying the messy private sandpit atm.

@Byron
Owner Author

Byron commented Dec 22, 2023

That's great news! Please be sure to let us know here once the repo goes public!

Regarding the sketch, the server would probably also reject V1 requests. But then, read_command() would read the command itself and arguments to it, but I wonder if that's not a liability as it might read more than it has to given that the server might reject the command itself. That probably also depends on what information the server wants to use to reject the command, but I can imagine that a step-wise process would be better. Read the command-name, then read its arguments, but maybe that is implied.
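A step-wise variant of this could be sketched as follows. It is only an illustration with hypothetical names (`read_command_name`, `read_rest`), using synchronous `std::io::Read` instead of an async reader. The first step consumes exactly one pkt-line, so a server that rejects the command never touches the capabilities or arguments that follow.

```rust
use std::io::{self, Read};

enum Packet {
    Flush,        // "0000"
    Delimiter,    // "0001"
    Line(String), // payload with the trailing newline removed
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn read_packet<R: Read>(source: &mut R) -> io::Result<Packet> {
    let mut hex = [0u8; 4];
    source.read_exact(&mut hex)?;
    let len = std::str::from_utf8(&hex)
        .ok()
        .and_then(|s| usize::from_str_radix(s, 16).ok())
        .ok_or_else(|| invalid("bad length prefix"))?;
    match len {
        0 => Ok(Packet::Flush),
        1 => Ok(Packet::Delimiter),
        2 | 3 => Err(invalid("unsupported packet")),
        n => {
            let mut data = vec![0u8; n - 4];
            source.read_exact(&mut data)?;
            let line = String::from_utf8(data).map_err(|_| invalid("not UTF-8"))?;
            Ok(Packet::Line(line.trim_end_matches('\n').to_string()))
        }
    }
}

/// Step 1: consume only the first pkt-line and return the command name.
fn read_command_name<R: Read>(source: &mut R) -> io::Result<String> {
    match read_packet(source)? {
        Packet::Line(line) => line
            .strip_prefix("command=")
            .map(str::to_string)
            .ok_or_else(|| invalid("expected command=<name>")),
        _ => Err(invalid("expected a data packet")),
    }
}

/// Step 2: called only once the server has accepted the command name.
/// Returns (capabilities, arguments).
fn read_rest<R: Read>(source: &mut R) -> io::Result<(Vec<String>, Vec<String>)> {
    let (mut caps, mut args) = (Vec::new(), Vec::new());
    let mut after_delimiter = false;
    loop {
        match read_packet(source)? {
            Packet::Flush => return Ok((caps, args)),
            Packet::Delimiter => after_delimiter = true,
            Packet::Line(line) if after_delimiter => args.push(line),
            Packet::Line(line) => caps.push(line),
        }
    }
}

fn main() -> io::Result<()> {
    let mut source: &[u8] = b"0012command=fetch\n00010009done\n0000";
    let name = read_command_name(&mut source)?;
    // A policy that only allows ls-refs can reject right here, while the
    // rest of the request is still unread.
    if name != "ls-refs" {
        println!("rejected {name} with {} bytes unread", source.len());
        return Ok(());
    }
    let (caps, args) = read_rest(&mut source)?;
    println!("{name}: {caps:?} {args:?}");
    Ok(())
}
```

Whether the split is worth it depends, as noted, on what information the server needs in order to reject a command.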

In general, it's probably OK to just cobble it together and then refactor.

pack creation

Regarding packs, you can try gix free pack create and see from there how the API works. In general, packs can start streaming quickly, but they won't be the most efficient as they don't delta-compress on the fly. But that might even be a beneficial trade-off at first.
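For orientation, the framing of a pack made of base objects only (no deltas) is small enough to sketch by hand. The code below is not gitoxide's API; the names `pack_header`, `Kind` and `entry_header` are made up for illustration. It covers only the 12-byte pack header and the per-object entry header. The zlib-compressed object data that follows each entry header and the trailing checksum are left out, since they would need external crates.

```rust
/// 12-byte pack header: the magic "PACK", version 2, then the object count,
/// both numbers as big-endian u32.
fn pack_header(num_objects: u32) -> [u8; 12] {
    let mut header = [0u8; 12];
    header[..4].copy_from_slice(b"PACK");
    header[4..8].copy_from_slice(&2u32.to_be_bytes());
    header[8..].copy_from_slice(&num_objects.to_be_bytes());
    header
}

/// Type codes of non-delta objects in a pack entry.
#[derive(Clone, Copy)]
enum Kind {
    Commit = 1,
    Tree = 2,
    Blob = 3,
    Tag = 4,
}

/// Entry header for a base object: the first byte holds a continuation bit,
/// 3 bits of type and the low 4 bits of the uncompressed size; each further
/// byte holds a continuation bit and 7 more size bits.
fn entry_header(kind: Kind, size: u64) -> Vec<u8> {
    let mut out = Vec::new();
    let mut byte = ((kind as u8) << 4) | (size & 0x0f) as u8;
    let mut rest = size >> 4;
    while rest != 0 {
        out.push(byte | 0x80); // more size bytes follow
        byte = (rest & 0x7f) as u8;
        rest >>= 7;
    }
    out.push(byte);
    out
}

fn main() {
    println!("{:02x?}", pack_header(3));
    println!("{:02x?}", entry_header(Kind::Blob, 5));
    println!("{:02x?}", entry_header(Kind::Commit, 200));
    println!("{:02x?}", entry_header(Kind::Tree, 0));
    println!("{:02x?}", entry_header(Kind::Tag, 16));
}
```

Note that the header carries the object count, so a streaming writer has to know how many objects it will emit before it writes the first byte.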

Transports

You can check the client side (use gix --trace clone ssh://… to see how ssh is typically invoked). From there you will see that it definitely requires its own binary, but that should then be the easiest implementation as it's the same as gix --trace clone file://local/path. However, getting the server-side SSH server going is probably its own set of problems to solve.
