Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract parts of parse_next_token into new method #144

Merged
merged 1 commit into from
Apr 20, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
150 changes: 81 additions & 69 deletions src/parsing/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -169,88 +169,27 @@ impl ParseState {
// println!("token at {} on {}", start, line.trim_right());

let mut min_start = usize::MAX;
let mut match_from_with_proto = false;
let mut cur_match: Option<RegexMatch> = None;
let mut match_from_with_proto = false;

for (from_with_proto, ctx, captures) in context_chain {
for (pat_context_ptr, pat_index) in context_iter(ctx) {
let mut pat_context = pat_context_ptr.borrow_mut();
let match_pat = pat_context.match_at_mut(pat_index);
// println!("{} - {:?} - {:?}", match_pat.regex_str, match_pat.has_captures, cur_level.captures.is_some());
let match_ptr = match_pat as *const MatchPattern;

// Avoid matching the same pattern twice in the same place, causing an infinite loop
if matched.contains(&match_ptr) {
continue;
}

if let Some(maybe_region) =
search_cache.get(&match_ptr) {
let mut valid_entry = true;
if let Some(ref region) = *maybe_region {
let match_start = region.pos(0).unwrap().0;
if match_start < *start {
valid_entry = false;
} else if match_start < min_start {
// print!("match {} at {} on {}", match_pat.regex_str, match_start, line);
min_start = match_start;
match_from_with_proto = from_with_proto;
cur_match = Some(RegexMatch {
regions: region.clone(),
context: pat_context_ptr.clone(),
pat_index: pat_index,
});
}
}
if valid_entry {
continue;
}
}

match_pat.ensure_compiled_if_possible();
let refs_regex = if match_pat.has_captures && captures.is_some() {
let &(ref region, ref s) = captures.unwrap();
Some(match_pat.compile_with_refs(region, s))
} else {
None
};
let regex = if let Some(ref rgx) = refs_regex {
rgx
} else {
match_pat.regex.as_ref().unwrap()
};
let matched = regex.search_with_param(line,
*start,
line.len(),
SearchOptions::SEARCH_OPTION_NONE,
Some(regions),
MatchParam::default());

// If there's an error during search, treat it as non-matching.
// For example, in case of catastrophic backtracking, onig should
// fail with a "retry-limit-in-match over" error eventually.
if let Ok(Some(match_start)) = matched {
let match_end = regions.pos(0).unwrap().1;
// this is necessary to avoid infinite looping on dumb patterns
let does_something = match match_pat.operation {
MatchOperation::None => match_start != match_end,
_ => true,
};
if refs_regex.is_none() && does_something {
search_cache.insert(match_pat, Some(regions.clone()));
}
if match_start < min_start && does_something {
// print!("catch {} at {} on {}", match_pat.regex_str, match_start, line);
if let Some((match_start, match_region)) = self.search(
line, *start, match_pat, captures, search_cache, matched, regions
) {
if match_start < min_start {
// Match is earlier in text, use that
min_start = match_start;
match_from_with_proto = from_with_proto;
cur_match = Some(RegexMatch {
regions: regions.clone(),
regions: match_region,
context: pat_context_ptr.clone(),
pat_index: pat_index,
});
match_from_with_proto = from_with_proto;
}
} else if refs_regex.is_none() {
search_cache.insert(match_pat, None);
}
}
}
Expand All @@ -275,6 +214,79 @@ impl ParseState {
}
}

fn search(&self,
line: &str,
start: usize,
match_pat: &mut MatchPattern,
captures: Option<&(Region, String)>,
search_cache: &mut SearchCache,
matched: &mut MatchedPatterns,
regions: &mut Region)
-> Option<(usize, Region)> {
// println!("{} - {:?} - {:?}", match_pat.regex_str, match_pat.has_captures, cur_level.captures.is_some());
let match_ptr = match_pat as *const MatchPattern;

// Avoid matching the same pattern twice in the same place, causing an infinite loop
if matched.contains(&match_ptr) {
return None;
}

if let Some(maybe_region) = search_cache.get(&match_ptr) {
if let Some(ref region) = *maybe_region {
let match_start = region.pos(0).unwrap().0;
if match_start >= start {
// Cached match is valid, return it. Otherwise do another
// search below.
return Some((match_start, region.clone()));
}
} else {
// Didn't find a match earlier, so no point trying to match it again
return None;
}
}

match_pat.ensure_compiled_if_possible();
let refs_regex = if match_pat.has_captures && captures.is_some() {
let &(ref region, ref s) = captures.unwrap();
Some(match_pat.compile_with_refs(region, s))
} else {
None
};
let regex = if let Some(ref rgx) = refs_regex {
rgx
} else {
match_pat.regex.as_ref().unwrap()
};
let matched = regex.search_with_param(line,
start,
line.len(),
SearchOptions::SEARCH_OPTION_NONE,
Some(regions),
MatchParam::default());

// If there's an error during search, treat it as non-matching.
// For example, in case of catastrophic backtracking, onig should
// fail with a "retry-limit-in-match over" error eventually.
if let Ok(Some(match_start)) = matched {
let match_end = regions.pos(0).unwrap().1;
// this is necessary to avoid infinite looping on dumb patterns
let does_something = match match_pat.operation {
MatchOperation::None => match_start != match_end,
_ => true,
};
if refs_regex.is_none() && does_something {
search_cache.insert(match_pat, Some(regions.clone()));
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for only inserting it into the cache if does_something is true? I kept the current logic, but I'm not sure I understand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that sometimes people write syntax definitions with silly rules like a regex that can match nothing (and doesn't change the context stack):

- match: a*
  scope: some.scope

and if it cached places where it matched nothing, it'd never find the places where it matches something, i.e. it needs to repeat the match at a different position to find a real non-zero length match.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense not to use that match, but caching it should be ok. The way it works is that if there's a cached match, it is only used if the start position of it is >= the current position where we're trying to search. Otherwise, we search again using that pattern.

Just now, I tried removing && does_something from the condition and both tests and syntest still pass. But maybe there's no test for this case (with caching)?

Copy link
Owner

@trishume trishume Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the real reason I wrote it like that may have been just caution and being extra safe, but I think I might have come up with a rare case it is necessary.

The cache hit code doesn't have the logic for the does_something check, so if a pattern uses a lookahead to match an empty string a few characters ahead, then next time around we hit it in the cache and it is the closest, we'll choose it, then get stuck in a loop choosing it again and again.

Edit: NVM not an infinite loop since there's extra protections against that, but maybe at least extra pushes and pops, although maybe those are fine.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second look maybe the does_something check is unnecessary now. I think I might have added it before the more general infinite loop protection, and Sublime may just have the more general protection.

Copy link
Collaborator Author

@robinst robinst Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. But I think how this works might change with the fix for #127 anyway (maybe). So let's defer making this change for now.

if does_something {
// print!("catch {} at {} on {}", match_pat.regex_str, match_start, line);
return Some((match_start, regions.clone()));
}
} else if refs_regex.is_none() {
search_cache.insert(match_pat, None);
}
return None;
}

/// Returns true if the stack was changed
fn exec_pattern(&mut self,
line: &str,
Expand Down