Implement replace and extract #43

Open
NPCRUS opened this issue Jan 4, 2025 · 11 comments

Comments

@NPCRUS

NPCRUS commented Jan 4, 2025

It would be nice to be able to match and extract all of the occurrences of a regexp within a string

@propensive
Owner

Yes, this would be useful, though not in pattern matching (in case that's what you were suggesting with the /g flag), but as a simple method on strings.

For example,

text.search(r"f..b..")

which would return a List or maybe a LazyList of matches.

I think there would need to be an option to allow overlapping matches. For example if text is "fofbfbobar", we would want to get matches of "fofbfb", "fbfbob" and "fbobar". Without overlapping, we would expect just a single match.
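The difference between the two modes can be sketched with `java.util.regex` from the standard library. This is only an illustration: `findMatches` and its `overlapping` parameter are hypothetical names, not part of Kaleidoscope or Gossamer.

```scala
import java.util.regex.Pattern
import scala.collection.mutable.ListBuffer

// Hypothetical helper for illustration; not the Kaleidoscope/Gossamer API.
def findMatches(text: String, regex: String, overlapping: Boolean): List[String] = {
  val matcher = Pattern.compile(regex).matcher(text)
  val found = ListBuffer[String]()
  var from = 0
  // Matcher.find(int) resets the matcher and searches from the given index.
  while (from <= text.length && matcher.find(from)) {
    found += matcher.group()
    // Overlapping: resume one character after the start of this match.
    // Non-overlapping: resume after its end (at least one character forward).
    from =
      if (overlapping) matcher.start() + 1
      else math.max(matcher.end(), matcher.start() + 1)
  }
  found.toList
}
```

With `"fofbfbobar"` and `"f..b.."`, the overlapping mode yields the three matches listed above, and the non-overlapping mode yields only `"fofbfb"`.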

@propensive propensive self-assigned this Jan 5, 2025
@propensive propensive added api Relates to the API design beginner A good introduction for newcomers core Describes core required functionality labels Jan 5, 2025
@github-project-automation github-project-automation bot moved this to Todo in Soundness Jan 5, 2025
@propensive
Owner

Note that it might be more appropriate to implement this in Gossamer.

@propensive
Owner

I've implemented two methods, seek (to find the first match) and search (all matches), both in Kaleidoscope and Gossamer. The Gossamer version is the one you probably want to use, as the Kaleidoscope version is designed for low-level use: it doesn't return the actual strings, only their start/end indexes.

The Gossamer implementations map these to the substring that matches, which is probably what you want. This has the added advantage that it also works with other Textual types, such as Escapade's rich terminal strings.
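The layering described here (a low-level search that yields index ranges, and a higher-level one that maps them to substrings) might look roughly like the following. The names `searchIndices` and `searchText` are assumptions made for this sketch, and it uses `scala.util.matching.Regex` rather than the actual Kaleidoscope or Gossamer types.

```scala
import scala.util.matching.Regex

// Hypothetical low-level search: only start/end indexes, no strings.
def searchIndices(text: String, regex: Regex): List[(Int, Int)] =
  regex.findAllMatchIn(text).map(m => (m.start, m.end)).toList

// Hypothetical higher-level search: maps each index range to its substring.
def searchText(text: String, regex: Regex): List[String] =
  searchIndices(text, regex).map { case (start, end) => text.substring(start, end) }
```

Keeping the index-based version separate means the mapping step could in principle be written for any text-like type that supports slicing, not just `String`.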

@github-project-automation github-project-automation bot moved this from Todo to Done in Soundness Jan 5, 2025
@NPCRUS
Author

NPCRUS commented Jan 7, 2025

what I want is this:

string.withGlobalFlag {
  case r"url\($url(.*?)\)" => ??? // where url is Array[String]
}

and the input string might be a huge minified css file for example, so I want to extract all of those urls, but I also want to have groups

@propensive propensive reopened this Jan 7, 2025
@propensive
Owner

That makes a lot of sense, and it would be useful. I think we could be more direct about the name, and call it replace. Unfortunately, I think it might be a bit trickier to implement, because we will need to switch from using entire-string matching to find. And then we will need to repeat the operation on the remainder of the string to replace the subsequent occurrences.

But it's all possible, and worthwhile.

Incidentally, my idea for implementing it would naturally allow multiple patterns to be defined in the partial function, but the behavior would be to only try the second (or third) patterns in the cases where the first is not found in the remainder of the string.

For example,

t"foobarfoobar".replace:
  case r"foo" => t"baz"
  case r"bar" => t"quux"

would result in t"bazbarbazquux". I'm not sure if that's the most desirable outcome, though.
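To make that fallback behavior concrete, here is a minimal sketch using the standard library's `Regex`. The function `replaceWithFallback` is a hypothetical name and only models the semantics described above; it is not how the library implements `replace`.

```scala
import scala.util.matching.Regex

// Hypothetical model of the described semantics; not the library's code.
def replaceWithFallback(text: String, rules: List[(Regex, String)]): String = {
  val out = new StringBuilder
  var rest = text
  var done = false
  while (!done) {
    // Use the first rule, in declaration order, that matches anywhere in the
    // remainder; later rules are tried only if earlier ones are not found.
    val hit = rules.iterator
      .flatMap { case (regex, replacement) =>
        regex.findFirstMatchIn(rest).map(m => (m, replacement))
      }
      .nextOption()
    hit match {
      case Some((m, replacement)) if m.end > 0 =>
        // Keep the text before the match, substitute, continue after it.
        out.append(rest.substring(0, m.start)).append(replacement)
        rest = rest.substring(m.end)
      case _ =>
        // No rule matched (or an empty match at position 0): stop here.
        out.append(rest)
        done = true
    }
  }
  out.toString
}
```

Calling it with `"foobarfoobar"` and the rules `foo -> baz`, `bar -> quux` leaves the first `bar` untouched, because `foo` is still found later in the remainder at that point.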

@NPCRUS
Author

NPCRUS commented Jan 7, 2025

if you can do replace by groups, that would be even crazier,
but I think in my example I need something like this:

r"$capture(pattern)$capture2(pattern2)".findAll { (result: Seq[(String, String)]) =>
  ??? // process the result
}

it kinda resembles the standard library findAllIn, but I guess can be typed better
I think both of these functionalities can coexist

@propensive
Owner

I think I probably didn't give an interesting enough example in my last message. The intention is to be able to write:

t"foobarfuebar".replace:
  case r"f$vowels(..)" => t"g$vowels"

and get t"goobarguebar".

Does that do what you need?
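For comparison, the same group-based replacement can be approximated today with the standard library's `Regex.replaceAllIn`. This is a sketch of the intended result only; it uses a numbered group where the proposed syntax binds the group to the name `vowels`.

```scala
import scala.util.matching.Regex

// Standard-library approximation of the proposed `replace` example.
val pattern: Regex = "f(..)".r

// quoteReplacement guards against `$` or `\` in the captured text being
// interpreted as replacement syntax.
val result: String =
  pattern.replaceAllIn("foobarfuebar", m => Regex.quoteReplacement("g" + m.group(1)))
```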

@NPCRUS
Author

NPCRUS commented Jan 8, 2025

@propensive this would be cool for other use cases, but not really. In this issue I just want to find all matches and extract capture groups

@propensive
Owner

Ah, I see. So replace is not quite right because it only allows you to use the captured groups by putting them back into the string as replacements. And search isn't quite right because it only returns entire matches.

So I think you want something closer to an optimised search followed by map, i.e. find all the occurrences of the pattern, then map across them, extracting the capturing groups. That also sounds useful.
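As a rough standard-library equivalent of "search followed by map", using the CSS `url(...)` case from earlier in the thread (the function name `extractUrls` is made up for this sketch):

```scala
import scala.util.matching.Regex

// Hypothetical helper: find every url(...) occurrence and keep only the
// contents of the capturing group.
def extractUrls(css: String): List[String] = {
  val urlPattern: Regex = """url\((.*?)\)""".r
  urlPattern.findAllMatchIn(css).map(_.group(1)).toList
}
```

`findAllMatchIn` returns an `Iterator`, so for a very large input the final `.toList` could be dropped to process matches lazily.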

@propensive
Owner

Great. Now I understand it. I think we can call it extract (which is the word you used originally)!

@propensive propensive changed the title from "Support /g flag" to "Implement replace and extract" Jan 8, 2025
@propensive
Owner

We have a basic implementation of extract, but it has some issues. In particular, multi-case partial functions do not work. My first attempt to fix this relied on an implicit visitor-pattern to carry state across multiple invocations of the partial function. But it turned out to be unsafe and far too complicated, so we need a new solution.

I believe the best solution will be to break the partial function down into separate cases, and to pattern match on each one in turn, choosing which one to return based on which of them has the earliest result.
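The "earliest result wins" idea could be sketched as follows, with each case modelled as a pair of a `Regex` and a handler function. This is an assumption-laden illustration using the standard library; `extractEarliest` and the pair representation are hypothetical, and the real implementation would work on the cases of a partial function.

```scala
import scala.util.matching.Regex

// Hypothetical sketch: each "case" is a (pattern, handler) pair.
def extractEarliest[A](text: String, cases: List[(Regex, Regex.Match => A)]): List[A] = {
  val results = List.newBuilder[A]
  var from = 0
  var done = false
  while (!done) {
    val remainder = text.substring(from)
    // Try every case against the remainder of the string.
    val candidates = cases.flatMap { case (regex, handler) =>
      regex.findFirstMatchIn(remainder).map(m => (m, handler))
    }
    // Pick the match that starts earliest; on a tie the earlier case wins.
    candidates.minByOption(_._1.start) match {
      case Some((m, handler)) =>
        results += handler(m)
        // Continue after the match, always advancing at least one character.
        from += math.max(m.end, 1)
        if (from > text.length) done = true
      case None =>
        done = true
    }
  }
  results.result()
}
```

Unlike the sequential fallback discussed earlier for replace, the order of the cases here only matters when two patterns match at the same position.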
