Use gix pipeline filter instead of manual crlf implementation #9503

Merged: 2 commits, Feb 3, 2024
Changes from 1 commit
192 changes: 192 additions & 0 deletions Cargo.lock


2 changes: 1 addition & 1 deletion helix-vcs/Cargo.toml
@@ -19,7 +19,7 @@ tokio = { version = "1", features = ["rt", "rt-multi-thread", "time", "sync", "p
 parking_lot = "0.12"
 arc-swap = { version = "1.6.0" }

-gix = { version = "0.58.0", default-features = false , optional = true }
+gix = { version = "0.58.0", features = ["attributes"], default-features = false, optional = true }
 imara-diff = "0.1.5"
 anyhow = "1"

37 changes: 15 additions & 22 deletions helix-vcs/src/git.rs
@@ -1,5 +1,7 @@
 use anyhow::{bail, Context, Result};
 use arc_swap::ArcSwap;
+use gix::filter::plumbing::driver::apply::Delay;
+use std::io::Read;
 use std::path::Path;
 use std::sync::Arc;

@@ -76,29 +78,20 @@ impl DiffProvider for Git {
         let file_oid = find_file_in_commit(&repo, &head, file)?;

         let file_object = repo.find_object(file_oid)?;
-        let mut data = file_object.detach().data;
-        // convert LF to CRLF if configured to avoid showing every line as changed
-        if repo
-            .config_snapshot()
-            .boolean("core.autocrlf")
-            .unwrap_or(false)
-        {
-            let mut normalized_file = Vec::with_capacity(data.len());
-            let mut at_cr = false;
-            for &byte in &data {
-                if byte == b'\n' {
-                    // if this is a LF instead of a CRLF (last byte was not a CR)
-                    // insert a new CR to generate a CRLF
-                    if !at_cr {
-                        normalized_file.push(b'\r');
-                    }
-                }
-                at_cr = byte == b'\r';
-                normalized_file.push(byte)
-            }
-            data = normalized_file
+        let data = file_object.detach().data;
+        // Get the actual data that git would make out of the git object.
+        // This will apply the user's git config or attributes like crlf conversions.
+        if let Some(work_dir) = repo.work_dir() {
+            let rela_path = file.strip_prefix(work_dir)?.to_string_lossy();
Member

Doing lossy conversion is incorrect. Gix provides some utilities for proper path -> byte conversion.
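
To illustrate the concern (this is not code from the PR): on Unix a path is an arbitrary byte string, so `to_string_lossy()` silently rewrites any path that is not valid UTF-8, and the result no longer matches the path Git tracks. The sketch below is Unix-only and uses just the standard library; the helper names `lossy_bytes` and `exact_bytes` are hypothetical, and the gix utilities mentioned above are deliberately not shown.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Lossy: every invalid UTF-8 sequence is replaced with U+FFFD,
/// so the resulting bytes can differ from the real path.
fn lossy_bytes(path: &Path) -> Vec<u8> {
    path.to_string_lossy().as_bytes().to_vec()
}

/// Lossless on Unix: take the raw bytes of the path unchanged.
fn exact_bytes(path: &Path) -> Vec<u8> {
    path.as_os_str().as_bytes().to_vec()
}

fn main() {
    // "caf\xe9.txt" is Latin-1 encoded and therefore not valid UTF-8.
    let raw: &[u8] = b"caf\xe9.txt";
    let path = Path::new(OsStr::from_bytes(raw));

    assert_eq!(exact_bytes(path), raw);
    assert_ne!(lossy_bytes(path), raw);
}
```

On Windows the situation differs, because paths are not byte strings there, which is one reason to prefer a library helper over hand-written conversion.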

+            let (mut pipeline, _) = repo.filter_pipeline(None)?;
Contributor

Just as a note: it's a bit sad that the filter pipeline can't be reused under the circumstances. If this should ever become an issue, reuse can be achieved by dropping down to the internal plumbing types.

However, I wouldn't go there, as it would seriously complicate the code. I just wanted you to know that the option exists if you need to shave off some cycles.

+            let mut worktree_outcome =
+                pipeline.convert_to_worktree(&data, rela_path.as_ref().into(), Delay::Forbid)?;
Member

@Byron how does this handle encoding? Git stores everything as UTF-8 internally, IIRC, and only converts back on checkout.

I guess that this function would do that? Could we check out UTF-8 instead somehow? (My memory is very cloudy here, so I may be misremembering.)

Contributor

That's not the case, but it's true that Git has an 'internal' form that it wants to store in the object database. This is merely the form produced by passing data through the filter pipeline, though, which on Unix is most often a no-op.

It's true, though, that this method takes data that is supposed to come directly from the object database and turns it into what would be checked out, applying all necessary transformations.

+            let mut buf = Vec::with_capacity(data.len());
+            worktree_outcome.read_to_end(&mut buf)?;
+            Ok(buf)
+        } else {
+            Ok(data)
         }
-        Ok(data)
     }

     fn get_current_head_name(&self, file: &Path) -> Result<Arc<ArcSwap<Box<str>>>> {
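
For reference, the manual conversion removed by this change can be sketched as a standalone function. This is a minimal restatement of the deleted loop, not code from the PR, and the name `lf_to_crlf` is hypothetical. It inserts a CR before each LF that is not already part of a CRLF pair.

```rust
/// Insert a CR before every LF that is not already preceded by one.
fn lf_to_crlf(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len());
    let mut at_cr = false; // true if the previous byte was a CR
    for &byte in data {
        if byte == b'\n' && !at_cr {
            out.push(b'\r');
        }
        at_cr = byte == b'\r';
        out.push(byte);
    }
    out
}

fn main() {
    // Bare LFs gain a CR; existing CRLF pairs are left alone.
    assert_eq!(lf_to_crlf(b"a\nb\r\nc"), b"a\r\nb\r\nc");
    assert_eq!(lf_to_crlf(b"\n\n"), b"\r\n\r\n");
}
```

The removed code applied this only when `core.autocrlf` parsed as a boolean `true`, so it did not account for other settings such as `.gitattributes` rules. Delegating to the filter pipeline covers those cases, as the new code comment in the diff describes.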