Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(parse/json): add test for bug where overrides erroneously override special parsing options #3260

Conversation

dyc3
Copy link
Contributor

@dyc3 dyc3 commented Jun 22, 2024

Summary

This one is kinda weird, and this doesn't really feel like the correct fix. I'm unfamiliar with this area of the codebase.

If an override is applied to a json file that is considered "well known" to be parsed with trailing commas and/or comments, then those settings will not be applied because the override is present (not necessarily using the override's value, if the settings are cached).
For example, if we add the following override to this repo's biome.json, running biome check will fail, even though it should be effectively a no-op.

"overrides": [
    {
      "include": ["**/*.json"],
      "formatter": {
        "indentStyle": "space"
      }
    }
  ]

Note that json files that have different default parsing settings, that also match the override must be checked. The following command does not reproduce the issue:

cargo run --bin biome -- check packages/@biomejs/backend-jsonrpc/tsconfig.json

This PR aims to fix this behavior, allowing the settings from JsonFileSource take more precedence if the settings are true. This PR now just adds a disabled unit test for the correct behavior.

I'd like to know more about the purpose of caching the settings. Removing this cache read branch from the following code also fixes the issue: (After testing it again, this change actually doesn't fix it)

if let Ok(readonly_cache) = self.cached_json_parser_options.read() {
if let Some(cached_options) = readonly_cache.as_ref() {
*options = *cached_options;
return;
}
}

blocks: #3246
tangentally related to: #1724
additional context: #3246 (comment)

Test Plan

The following command now runs without emitting diagnostics, with the override applied to biome.json.

pnpm check

Planning to add a specific unit test.

@github-actions github-actions bot added the A-Project Area: project label Jun 22, 2024
@dyc3 dyc3 force-pushed the 06-22-fix_parse_json_don_t_let_overrides_erroneously_override_parsing_options branch from ed73e95 to 8ebe45f Compare June 22, 2024 19:30
@github-actions github-actions bot added the A-CLI Area: CLI label Jun 22, 2024
Comment on lines 196 to 202
// HACK: prevent overrides from affecting how special json files are parsed
if options_file_source.allow_comments {
options.allow_comments = true;
}
if options_file_source.allow_trailing_commas {
options.allow_trailing_commas = true;
}
Copy link
Member

@Sec-ant Sec-ant Jun 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should prioritize "file source" (well-known files) specific options over "overrides" options. For example, if a user wants to forbid comments or trailing commas in tsconfig.json, they should be able to achieve that.

File source specific options should be the default fallback < then normal options < then overrides options

I think we need to fix this problem in .to_override_json_parser_options.

I also think it's acceptable to temporarily disable the cache if it can give us the correct behavior, if maintaining them is error prone.

@Sec-ant
Copy link
Member

Sec-ant commented Jun 23, 2024

The following isn't necessarily related to the issue, but can be a good refactoring that makes the code easier to read and maintain.

I think we should add a parser_options method to this struct (we already have format_options and analyzer_options here):

impl<'a> WorkspaceSettingsHandle<'a> {
/// Resolve the formatting context for the given language
pub(crate) fn format_options<L>(
&self,
path: &BiomePath,
file_source: &DocumentFileSource,
) -> L::FormatOptions
where
L: ServiceLanguage,
{
let settings = self.inner.get_current_settings();
let formatter = settings.map(|s| &s.formatter);
let overrides = settings.map(|s| &s.override_settings);
let editor_settings = settings
.map(|s| L::lookup_settings(&s.languages))
.map(|result| &result.formatter);
L::resolve_format_options(formatter, overrides, editor_settings, path, file_source)
}
pub(crate) fn analyzer_options<L>(
&self,
path: &BiomePath,
file_source: &DocumentFileSource,
) -> AnalyzerOptions
where
L: ServiceLanguage,
{
let settings = self.inner.get_current_settings();
let linter = settings.map(|s| &s.linter);
let overrides = settings.map(|s| &s.override_settings);
let editor_settings = settings
.map(|s| L::lookup_settings(&s.languages))
.map(|result| &result.linter);
L::resolve_analyzer_options(
settings,
linter,
overrides,
editor_settings,
path,
file_source,
)
}
}

And add a new method resolve_parse_options in this trait for each language to implement to support well-known file specific parse settings (we already have resolve_format_options and resolve_analyzer_options here):

pub trait ServiceLanguage: biome_rowan::Language {
/// Formatter settings type for this language
type FormatterSettings: Default;
type LinterSettings: Default;
/// Organize imports settings type for this language
type OrganizeImportsSettings: Default;
/// Fully resolved formatter options type for this language
type FormatOptions: biome_formatter::FormatOptions + Clone + std::fmt::Display;
/// Settings that belong to the parser
type ParserSettings: Default;
/// Settings related to the environment/runtime in which the language is used.
type EnvironmentSettings: Default;
/// Read the settings type for this language from the [LanguageListSettings] map
fn lookup_settings(languages: &LanguageListSettings) -> &LanguageSettings<Self>;
/// Resolve the formatter options from the global (workspace level),
/// per-language and editor provided formatter settings
fn resolve_format_options(
global: Option<&FormatSettings>,
overrides: Option<&OverrideSettings>,
language: Option<&Self::FormatterSettings>,
path: &BiomePath,
file_source: &DocumentFileSource,
) -> Self::FormatOptions;
/// Resolve the linter options from the global (workspace level),
/// per-language and editor provided formatter settings
fn resolve_analyzer_options(
global: Option<&Settings>,
linter: Option<&LinterSettings>,
overrides: Option<&OverrideSettings>,
language: Option<&Self::LinterSettings>,
path: &BiomePath,
file_source: &DocumentFileSource,
) -> AnalyzerOptions;
}

So we can abstract this part of the logic:

let parser = settings.map(|s| &s.languages.json.parser);
let overrides = settings.map(|s| &s.override_settings);
let optional_json_file_source = file_source.to_json_file_source();
let options = JsonParserOptions {
allow_comments: parser.map(|p| p.allow_comments).unwrap_or_default()
|| optional_json_file_source.map_or(false, |x| x.allow_comments()),
allow_trailing_commas: parser.map(|p| p.allow_trailing_commas).unwrap_or_default()
|| optional_json_file_source.map_or(false, |x| x.allow_trailing_commas()),
};
let options = if let Some(overrides) = overrides {
overrides.to_override_json_parser_options(biome_path, options)
} else {
options
};

into this struct (in the resolve_parse_options method implementation):

impl ServiceLanguage for JsonLanguage {

@ematipico
Copy link
Member

I introduced the caching mainly because computing overrides can be very expensive and slow down the CLI, especially if there's an override that changes lint rules.

This was done months ago after Yagiz noticed a degradation in performance for each override added to the configuration.

Unfortunately we don't have benchmarks for our CLI - and we really should - but we could evaluate how the caching is done, still, since it's a hot path, we should add more care to it.

@Sec-ant
Copy link
Member

Sec-ant commented Jun 23, 2024

There're two factors that caused this erroneous behavior:

  1. We use a shared parser cache for all the matched json files as long as one specifies overrides in their configuration and the combination of include and exclude patterns matches one of the file in the project:

let options = if let Some(overrides) = overrides {
overrides.to_override_json_parser_options(biome_path, options)
} else {
options
};

if pattern.include.matches_path(path) && !pattern.exclude.matches_path(path) {
pattern.apply_overrides_to_json_parser_options(&mut options);
}

  1. The JSON parser options are not optional, i.e. they are bool instead of Option<bool>, so even a user doesn't specify any parser options in the overrides, the default values (which is false) will still override the existing options:

options.allow_trailing_commas = json_parser.allow_trailing_commas;
options.allow_comments = json_parser.allow_comments;

#[partial(bpaf(hide))]
/// Allow parsing comments in `.json` files
pub allow_comments: bool,
#[partial(bpaf(hide))]
/// Allow parsing trailing commas in `.json` files
pub allow_trailing_commas: bool,

language_setting.parser.allow_comments = json.parser.allow_comments;
language_setting.parser.allow_trailing_commas = json.parser.allow_trailing_commas;


I think we should keep consistent about using Option<Type> for all the options in the configuration structs so we can know whether an option is unset, and when the option is unset in overrides, we shouldn't apply it. For the caching part, using the file source as the cache key may be a solution. We adopted this approach in javascript formatter options caching:

// Cache
// For js format options, we use the file source as the cache key because
// the file source params will affect how tokens are treated during formatting.
// So we cannot reuse the same format options for all js-family files.
pub(crate) cached_js_format_options: RwLock<FxHashMap<JsFileSource, JsFormatOptions>>,


Edit:

For the caching part, using the file source as the cache key may be a solution. We adopted this approach in javascript formatter options caching:

Actually, this is not reliable, file source cannot reliably distinguish different format options specified in overrides. So when different options are specified in different override entries for files with the same file source type, only one of them will win.

We should use the matched patterns as the cache key. If two files match the same set of overrides patterns, and they have the same file source type, they should share the settings.

@dyc3
Copy link
Contributor Author

dyc3 commented Jun 23, 2024

This appears to be a much larger issue than I originally thought. It's clear that the current caching approach is not semantically correct. Would it be acceptable to disable caching parser options for json files to unblock #3246, and open an issue to continue discussion?

@Sec-ant
Copy link
Member

Sec-ant commented Jun 23, 2024

My 2 cents are holding that PR for a bit before we really understand how this bug is triggered. For example, I seem to not understand why cargo run --bin biome -- check packages/@biomejs/backend-jsonrpc/tsconfig.json is fine? Because if overrides are applied, then allow_trailing_commas and allow_comments should have been reset to false because of this line, no matter if we cache or not?

options.allow_trailing_commas = json_parser.allow_trailing_commas;
options.allow_comments = json_parser.allow_comments;

@dyc3
Copy link
Contributor Author

dyc3 commented Jun 23, 2024

I must have made a mistake in my previous testing. cargo run --bin biome -- check packages/@biomejs/backend-jsonrpc/tsconfig.json actually does emit diagnostics with the override present.

@Sec-ant
Copy link
Member

Sec-ant commented Jun 23, 2024

I must have made a mistake in my previous testing. cargo run --bin biome -- check packages/@biomejs/backend-jsonrpc/tsconfig.json actually does emit diagnostics with the override present.

In that case, only disabling caching will not solve the problem. So I guess we should address this first:

I think we should keep consistent about using Option<Type> for all the options in the configuration structs so we can know whether an option is unset, and when the option is unset in overrides, we shouldn't apply it.

I'll need some confirmation and thoughts from @ematipico before heading towards this direction. But I also would like to know if you want to take up this refactoring task if that's the decision? :) @dyc3

@dyc3
Copy link
Contributor Author

dyc3 commented Jun 23, 2024

Yeah, I can take a shot at it. I can change this PR to just have the test (disabled) since it's already here.

@dyc3 dyc3 force-pushed the 06-22-fix_parse_json_don_t_let_overrides_erroneously_override_parsing_options branch from 8ebe45f to 37db33d Compare June 23, 2024 16:00
@dyc3 dyc3 changed the title fix(parse/json): don't let overrides erroneously override parsing options test(parse/json): add test for bug where overrides erroneously override special parsing options Jun 23, 2024
@github-actions github-actions bot removed the A-Project Area: project label Jun 23, 2024
@ematipico
Copy link
Member

@Sec-ant

Yes, I think we made a small mistake when implementing some options, some of which are T instead of Option<T>, e.g. bool vs Option<bool>. That's because we rely on the default value of some primitives, like bool, but this is incorrect because, in reality, we must have three values: None, the default of T and the value provided for T.

And also, yes, I think it's time to have a method called parser_options function and a resolve_parser_options inside the ServiceLanguage trait.

Sorry @dyc3 for this mess, I know you had great intentions for the .editorconfig feature

@dyc3 dyc3 marked this pull request as ready for review June 24, 2024 09:57
@dyc3
Copy link
Contributor Author

dyc3 commented Jun 24, 2024

I've posted the Option refactor in #3272, and marked this ready for review.

@ematipico ematipico merged commit 0ff8de4 into biomejs:main Jun 24, 2024
12 checks passed
@dyc3 dyc3 deleted the 06-22-fix_parse_json_don_t_let_overrides_erroneously_override_parsing_options branch June 24, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Changelog Area: changelog A-CLI Area: CLI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants