Add glob file type support #8006

Merged (11 commits) on Feb 11, 2024
1 change: 1 addition & 0 deletions Cargo.lock

28 changes: 15 additions & 13 deletions book/src/languages.md
@@ -78,24 +78,26 @@ from the above section. `file-types` is a list of strings or tables, for
example:

```toml
file-types = ["Makefile", "toml", { suffix = ".git/config" }]
file-types = ["toml", { glob = "Makefile" }, { glob = ".git/config" }, { glob = ".github/workflows/*.yaml" } ]
```

When determining a language configuration to use, Helix searches the file-types
with the following priorities:

1. Exact match: if the filename of a file is an exact match of a string in a
`file-types` list, that language wins. In the example above, `"Makefile"`
will match against `Makefile` files.
2. Extension: if there are no exact matches, any `file-types` string that
matches the file extension of a given file wins. In the example above, the
`"toml"` matches files like `Cargo.toml` or `languages.toml`.
3. Suffix: if there are still no matches, any values in `suffix` tables
are checked against the full path of the given file. In the example above,
the `{ suffix = ".git/config" }` would match against any `config` files
in `.git` directories. Note: `/` is used as the directory separator but is
replaced at runtime with the appropriate path separator for the operating
system, so this rule would match against `.git\config` files on Windows.
1. Glob: values in `glob` tables are checked against the full path of the given
file. Globs are standard Unix-style path globs (e.g. the kind you use in a shell)
and can be used to match paths for a specific prefix, suffix, directory, etc.
In the above example, the `{ glob = "Makefile" }` config would match files
with the name `Makefile`, the `{ glob = ".git/config" }` config would match
`config` files in `.git` directories, and the `{ glob = ".github/workflows/*.yaml" }`
config would match any `yaml` files in `.github/workflows` directories. Note
that globs should always use the Unix path separator `/` even on Windows systems;
the matcher will automatically take the machine-specific separators into account.
If the glob isn't an absolute path and doesn't already start with the glob prefix
`*/`, `*/` will automatically be prepended so that it matches in any subdirectory.
2. Extension: if there are no glob matches, any `file-types` string that matches
the file extension of a given file wins. In the example above, the `"toml"`
config matches files like `Cargo.toml` or `languages.toml`.
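
The prefixing rule described in the glob item can be sketched in a few lines of Rust. This is a minimal illustration, assuming the rule is exactly "leave the glob alone if it starts with `/` or `*/`, otherwise prepend `*/`"; `normalize_glob` is a hypothetical helper name used only for this example and is not part of Helix.

```rust
/// Hypothetical helper (not a Helix API) mirroring the documented rule:
/// a glob that is neither absolute nor already prefixed with `*/`
/// gets `*/` prepended so it can match in any directory.
fn normalize_glob(glob: &str) -> String {
    if glob.starts_with('/') || glob.starts_with("*/") {
        // Absolute paths and already-prefixed globs are kept unchanged.
        glob.to_string()
    } else {
        format!("*/{glob}")
    }
}
```

Under this sketch, `normalize_glob("Makefile")` yields `*/Makefile`, while `/etc/hosts` and `*/config` are returned unchanged.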

## Language Server configuration

1 change: 1 addition & 0 deletions helix-core/Cargo.toml
@@ -49,6 +49,7 @@ chrono = { version = "0.4", default-features = false, features = ["alloc", "std"

etcetera = "0.8"
textwrap = "0.16.0"
globset = "0.4.14"

nucleo.workspace = true
parking_lot = "0.12"
45 changes: 40 additions & 5 deletions helix-core/src/config.rs
@@ -1,10 +1,45 @@
/// Syntax configuration loader based on built-in languages.toml.
pub fn default_syntax_loader() -> crate::syntax::Configuration {
use crate::syntax::{Configuration, Loader, LoaderError};

/// Language configuration based on built-in languages.toml.
pub fn default_lang_config() -> Configuration {
helix_loader::config::default_lang_config()
.try_into()
.expect("Could not serialize built-in languages.toml")
.expect("Could not deserialize built-in languages.toml")
}
/// Syntax configuration loader based on user configured languages.toml.
pub fn user_syntax_loader() -> Result<crate::syntax::Configuration, toml::de::Error> {

/// Language configuration loader based on built-in languages.toml.
pub fn default_lang_loader() -> Loader {
Loader::new(default_lang_config()).expect("Could not compile loader for default config")
}

#[derive(Debug)]
pub enum LanguageLoaderError {
DeserializeError(toml::de::Error),
LoaderError(LoaderError),
}

impl std::fmt::Display for LanguageLoaderError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Self::DeserializeError(err) => write!(f, "Failed to parse language config: {err}"),
Self::LoaderError(err) => write!(f, "Failed to compile language config: {err}"),
}
}
}

impl std::error::Error for LanguageLoaderError {}

/// Language configuration based on user configured languages.toml.
pub fn user_lang_config() -> Result<Configuration, toml::de::Error> {
helix_loader::config::user_lang_config()?.try_into()
}

/// Language configuration loader based on user configured languages.toml.
pub fn user_lang_loader() -> Result<Loader, LanguageLoaderError> {
let config: Configuration = helix_loader::config::user_lang_config()
.map_err(LanguageLoaderError::DeserializeError)?
.try_into()
.map_err(LanguageLoaderError::DeserializeError)?;

Loader::new(config).map_err(LanguageLoaderError::LoaderError)
}
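
The two-stage error mapping in `user_lang_loader` (parse the config, then compile the loader) can be illustrated without external crates. The sketch below is an assumption-laden stand-in: `ParseError`, `GlobError`, `parse`, `compile`, and `load` are hypothetical names that replace `toml::de::Error`, `globset::Error`, and the real Helix functions, so only the shape of the `map_err` chain matches the diff.

```rust
use std::fmt;

// Hypothetical stand-ins for `toml::de::Error` and `globset::Error`,
// used so the sketch compiles with the standard library alone.
#[derive(Debug)]
struct ParseError(String);
#[derive(Debug)]
struct GlobError(String);

#[derive(Debug)]
enum LanguageLoaderError {
    DeserializeError(ParseError),
    LoaderError(GlobError),
}

impl fmt::Display for LanguageLoaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::DeserializeError(err) => write!(f, "Failed to parse language config: {}", err.0),
            Self::LoaderError(err) => write!(f, "Failed to compile language config: {}", err.0),
        }
    }
}

impl std::error::Error for LanguageLoaderError {}

// Stage 1 (hypothetical): split a comma-separated list of globs.
fn parse(src: &str) -> Result<Vec<String>, ParseError> {
    if src.trim().is_empty() {
        return Err(ParseError("empty config".into()));
    }
    Ok(src.split(',').map(|s| s.trim().to_string()).collect())
}

// Stage 2 (hypothetical): reject an unclosed character class as a
// stand-in for a real glob compilation error; return the glob count.
fn compile(globs: Vec<String>) -> Result<usize, GlobError> {
    for glob in &globs {
        if glob.contains('[') && !glob.contains(']') {
            return Err(GlobError(format!("unclosed character class in `{glob}`")));
        }
    }
    Ok(globs.len())
}

// Each stage's error is wrapped in its own variant, so callers can tell
// a malformed file apart from a well-formed file with an invalid glob.
fn load(src: &str) -> Result<usize, LanguageLoaderError> {
    let globs = parse(src).map_err(LanguageLoaderError::DeserializeError)?;
    compile(globs).map_err(LanguageLoaderError::LoaderError)
}
```

Keeping the two failure modes in separate variants is what lets the `Display` implementation report "parse" and "compile" failures with different messages.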
150 changes: 96 additions & 54 deletions helix-core/src/syntax.rs
@@ -82,12 +82,6 @@ pub struct Configuration {
pub language_server: HashMap<String, LanguageServerConfiguration>,
}

impl Default for Configuration {
fn default() -> Self {
crate::config::default_syntax_loader()
}
}

// largely based on tree-sitter/cli/src/loader.rs
#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case", deny_unknown_fields)]
@@ -164,9 +158,11 @@ pub enum FileType {
/// The extension of the file, either the `Path::extension` or the full
/// filename if the file does not have an extension.
Extension(String),
/// The suffix of a file. This is compared to a given file's absolute
/// path, so it can be used to detect files based on their directories.
Suffix(String),
/// A Unix-style path glob. This is compared to the file's absolute path, so
/// it can be used to detect files based on their directories. If the glob
/// is not an absolute path and does not already start with `*/`, then `*/`
/// is prepended to it when the configuration is deserialized.
Glob(globset::Glob),
}

impl Serialize for FileType {
@@ -178,9 +174,9 @@ impl Serialize for FileType {

match self {
FileType::Extension(extension) => serializer.serialize_str(extension),
FileType::Suffix(suffix) => {
FileType::Glob(glob) => {
let mut map = serializer.serialize_map(Some(1))?;
map.serialize_entry("suffix", &suffix.replace(std::path::MAIN_SEPARATOR, "/"))?;
map.serialize_entry("glob", glob.glob())?;
map.end()
}
}
@@ -213,9 +209,20 @@ impl<'de> Deserialize<'de> for FileType {
M: serde::de::MapAccess<'de>,
{
match map.next_entry::<String, String>()? {
Some((key, suffix)) if key == "suffix" => Ok(FileType::Suffix({
suffix.replace('/', std::path::MAIN_SEPARATOR_STR)
})),
Some((key, mut glob)) if key == "glob" => {
// If the glob is neither an absolute path nor already prefixed
// with `*/`, prepend `*/` so that it matches the file in any
// directory.
if !glob.starts_with('/') && !glob.starts_with("*/") {
glob.insert_str(0, "*/");
}

globset::Glob::new(glob.as_str())
.map(FileType::Glob)
.map_err(|err| {
serde::de::Error::custom(format!("invalid `glob` pattern: {}", err))
})
}
Some((key, _value)) => Err(serde::de::Error::custom(format!(
"unknown key in `file-types` list: {}",
key
@@ -752,81 +759,113 @@ pub struct SoftWrap {
pub wrap_at_text_width: Option<bool>,
}

#[derive(Debug)]
struct FileTypeGlob {
glob: globset::Glob,
language_id: usize,
}

impl FileTypeGlob {
fn new(glob: globset::Glob, language_id: usize) -> Self {
Self { glob, language_id }
}
}

#[derive(Debug)]
struct FileTypeGlobMatcher {
matcher: globset::GlobSet,
file_types: Vec<FileTypeGlob>,
}

impl FileTypeGlobMatcher {
fn new(file_types: Vec<FileTypeGlob>) -> Result<Self, globset::Error> {
let mut builder = globset::GlobSetBuilder::new();
for file_type in &file_types {
builder.add(file_type.glob.clone());
}

Ok(Self {
matcher: builder.build()?,
file_types,
})
}

fn language_id_for_path(&self, path: &Path) -> Option<&usize> {
self.matcher
.matches(path)
.iter()
.filter_map(|idx| self.file_types.get(*idx))
.max_by_key(|file_type| file_type.glob.glob().len())
.map(|file_type| &file_type.language_id)
}
}

// Expose loader as Lazy<> global since it's always static?

#[derive(Debug)]
pub struct Loader {
// highlight_names ?
language_configs: Vec<Arc<LanguageConfiguration>>,
language_config_ids_by_extension: HashMap<String, usize>, // Vec<usize>
language_config_ids_by_suffix: HashMap<String, usize>,
language_config_ids_glob_matcher: FileTypeGlobMatcher,
language_config_ids_by_shebang: HashMap<String, usize>,

language_server_configs: HashMap<String, LanguageServerConfiguration>,

scopes: ArcSwap<Vec<String>>,
}

pub type LoaderError = globset::Error;

impl Loader {
pub fn new(config: Configuration) -> Self {
let mut loader = Self {
language_configs: Vec::new(),
language_server_configs: config.language_server,
language_config_ids_by_extension: HashMap::new(),
language_config_ids_by_suffix: HashMap::new(),
language_config_ids_by_shebang: HashMap::new(),
scopes: ArcSwap::from_pointee(Vec::new()),
};
pub fn new(config: Configuration) -> Result<Self, LoaderError> {
let mut language_configs = Vec::new();
let mut language_config_ids_by_extension = HashMap::new();
let mut language_config_ids_by_shebang = HashMap::new();
let mut file_type_globs = Vec::new();

for config in config.language {
// get the next id
let language_id = loader.language_configs.len();
let language_id = language_configs.len();

for file_type in &config.file_types {
// entry().or_insert(Vec::new).push(language_id);
match file_type {
FileType::Extension(extension) => loader
.language_config_ids_by_extension
.insert(extension.clone(), language_id),
FileType::Suffix(suffix) => loader
.language_config_ids_by_suffix
.insert(suffix.clone(), language_id),
FileType::Extension(extension) => {
language_config_ids_by_extension.insert(extension.clone(), language_id);
}
FileType::Glob(glob) => {
file_type_globs.push(FileTypeGlob::new(glob.to_owned(), language_id));
}
};
}
for shebang in &config.shebangs {
loader
.language_config_ids_by_shebang
.insert(shebang.clone(), language_id);
language_config_ids_by_shebang.insert(shebang.clone(), language_id);
}

loader.language_configs.push(Arc::new(config));
language_configs.push(Arc::new(config));
}

loader
Ok(Self {
language_configs,
language_config_ids_by_extension,
language_config_ids_glob_matcher: FileTypeGlobMatcher::new(file_type_globs)?,
language_config_ids_by_shebang,
language_server_configs: config.language_server,
scopes: ArcSwap::from_pointee(Vec::new()),
})
}

pub fn language_config_for_file_name(&self, path: &Path) -> Option<Arc<LanguageConfiguration>> {
// Find all the language configurations that match this file name
// or a suffix of the file name.
let configuration_id = path
.file_name()
.and_then(|n| n.to_str())
.and_then(|file_name| self.language_config_ids_by_extension.get(file_name))
let configuration_id = self
.language_config_ids_glob_matcher
.language_id_for_path(path)
.or_else(|| {
path.extension()
.and_then(|extension| extension.to_str())
.and_then(|extension| self.language_config_ids_by_extension.get(extension))
})
.or_else(|| {
self.language_config_ids_by_suffix
.iter()
.find_map(|(file_type, id)| {
if path.to_str()?.ends_with(file_type) {
Some(id)
} else {
None
}
})
});

configuration_id.and_then(|&id| self.language_configs.get(id).cloned())
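
The lookup order in this hunk (the longest matching glob wins, and the file extension is only consulted when no glob matches) can be sketched with the standard library alone. Everything here is an illustrative assumption: `wildcard_match` is a deliberately simplified matcher in which `*` matches any run of characters including `/`, standing in for `globset` rather than reproducing it, and `Lookup` and `sample_lookup` are hypothetical names.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Simplified wildcard matcher (an assumption, not globset's algorithm):
/// `*` matches any run of characters, every other character is literal.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position of the last `*` seen and the text index it currently covers up to.
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            backtrack = Some((star_pi, star_ti + 1));
            pi = star_pi + 1;
            ti = star_ti + 1;
        } else {
            return false;
        }
    }
    // Any pattern characters left over must all be `*`.
    p[pi..].iter().all(|&c| c == '*')
}

/// Hypothetical, pared-down stand-in for the loader's lookup tables.
struct Lookup {
    globs: Vec<(String, usize)>,        // (pattern, language id)
    extensions: HashMap<String, usize>, // extension -> language id
}

impl Lookup {
    fn language_id_for_path(&self, path: &Path) -> Option<usize> {
        let text = path.to_string_lossy();
        self.globs
            .iter()
            .filter(|(pattern, _)| wildcard_match(pattern, &text))
            // Among matching globs, prefer the longest pattern text.
            .max_by_key(|(pattern, _)| pattern.len())
            .map(|(_, id)| *id)
            // Fall back to the extension only when no glob matched.
            .or_else(|| {
                path.extension()
                    .and_then(|ext| ext.to_str())
                    .and_then(|ext| self.extensions.get(ext).copied())
            })
    }
}

/// Sample data for the sketch: a generic YAML glob, a more specific
/// workflow glob, and a plain extension entry.
fn sample_lookup() -> Lookup {
    Lookup {
        globs: vec![
            ("*/*.yaml".to_string(), 0),
            ("*/.github/workflows/*.yaml".to_string(), 1),
        ],
        extensions: HashMap::from([("toml".to_string(), 2)]),
    }
}
```

With the sample data, a workflow file matches both globs and resolves to the language with the longer pattern, while `Cargo.toml` matches no glob and is resolved through the extension table.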
@@ -2592,7 +2631,8 @@ mod test {
let loader = Loader::new(Configuration {
language: vec![],
language_server: HashMap::new(),
});
})
.unwrap();
let language = get_language("rust").unwrap();

let query = Query::new(language, query_str).unwrap();
@@ -2654,7 +2694,8 @@ mod test {
let loader = Loader::new(Configuration {
language: vec![],
language_server: HashMap::new(),
});
})
.unwrap();

let language = get_language("rust").unwrap();
let config = HighlightConfiguration::new(
@@ -2760,7 +2801,8 @@ mod test {
let loader = Loader::new(Configuration {
language: vec![],
language_server: HashMap::new(),
});
})
.unwrap();
let language = get_language(language_name).unwrap();

let config = HighlightConfiguration::new(language, "", "", "").unwrap();
2 changes: 1 addition & 1 deletion helix-core/tests/indent.rs
@@ -186,7 +186,7 @@ fn test_treesitter_indent(
lang_scope: &str,
ignored_lines: Vec<std::ops::Range<usize>>,
) {
let loader = Loader::new(indent_tests_config());
let loader = Loader::new(indent_tests_config()).unwrap();

// set runtime path so we can find the queries
let mut runtime = std::path::PathBuf::from(env!("CARGO_MANIFEST_DIR"));
14 changes: 4 additions & 10 deletions helix-term/src/application.rs
@@ -96,11 +96,7 @@ fn setup_integration_logging() {
}

impl Application {
pub fn new(
args: Args,
config: Config,
syn_loader_conf: syntax::Configuration,
) -> Result<Self, Error> {
pub fn new(args: Args, config: Config, lang_loader: syntax::Loader) -> Result<Self, Error> {
#[cfg(feature = "integration")]
setup_integration_logging();

@@ -126,7 +122,7 @@ impl Application {
})
.unwrap_or_else(|| theme_loader.default_theme(true_color));

let syn_loader = std::sync::Arc::new(syntax::Loader::new(syn_loader_conf));
let syn_loader = std::sync::Arc::new(lang_loader);

#[cfg(not(feature = "integration"))]
let backend = CrosstermBackend::new(stdout(), &config.editor);
@@ -394,10 +390,8 @@ impl Application {

/// refresh language config after config change
fn refresh_language_config(&mut self) -> Result<(), Error> {
let syntax_config = helix_core::config::user_syntax_loader()
.map_err(|err| anyhow::anyhow!("Failed to load language config: {}", err))?;

self.syn_loader = std::sync::Arc::new(syntax::Loader::new(syntax_config));
let lang_loader = helix_core::config::user_lang_loader()?;
self.syn_loader = std::sync::Arc::new(lang_loader);
self.editor.syn_loader = self.syn_loader.clone();
for document in self.editor.documents.values_mut() {
document.detect_language(self.syn_loader.clone());