Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add replace(io, str, patterns...) #48625

Merged
merged 9 commits into from
Jul 3, 2023
Merged

add replace(io, str, patterns...) #48625

merged 9 commits into from
Jul 3, 2023

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Feb 10, 2023

This PR adds an optional io::IO argument to replace(str, patterns...) so that the output can be written to an IO stream rather than returning a String. This requires almost no new code, only some refactoring, since internally replace was already implemented by writing to an IOBuffer.

As @jakobnissen has pointed out, we have a shortage of in-place operations for strings. replace was an egregious example because it was virtually impossible to do in-place without copy-pasting large swathes of code from Base. Now you can simply write to a pre-allocated, re-usable IOBuffer.

This PR was also motivated by a discourse discussion about performing regex substitutions on a file. For large files that cannot simply be read into memory, I realized that this is unnecessarily difficult (especially if you want to use SubstitutionString since it requires a completely undocumented internal _replace API, or multiple replacement patterns). With this PR, it should be possible to simply do:

using Mmap, StringViews
s = StringView(mmap("input_file"))
open(out -> replace(out, s, pat=>repl), "output_file", "w")

to get something very efficient (after corresponding hooks are added to StringViews.jl to prevent s from being copied to a String).

@stevengj stevengj added the strings "Strings!" label Feb 10, 2023
@stevengj stevengj force-pushed the replaceio branch 2 times, most recently from 84874c9 to 246f2d4 Compare February 10, 2023 04:40
@stevengj stevengj added the needs tests Unit tests are required for this change label Feb 10, 2023
@stevengj stevengj mentioned this pull request Feb 10, 2023
9 tasks
@stevengj stevengj removed the needs tests Unit tests are required for this change label Feb 11, 2023
@jariji
Copy link
Contributor

jariji commented Feb 12, 2023

Could someone clarify the naming convention -- why isn't this replace!?

@stevengj
Copy link
Member Author

@jariji, because we don't use ! anywhere in Julia for functions that write to a stream. (e.g. it's write, not write!.) Even though this modifies the stream.

@aplavin
Copy link
Contributor

aplavin commented Feb 16, 2023

Would be great to have efficient string functions like this in Julia!

But potential consistency and readability is somewhat concerning. My first guess from reading the signature in this PR title was that it performs replacements on the content of io somehow. It's also a reasonable piece of functionality, to read from istream, perform replacements, write to ostream, so users/readers may expect something similar happening.
Maybe, some naming/ordering pattern can be designed for this and similar functions that may appear in the future?

@stevengj
Copy link
Member Author

stevengj commented Feb 17, 2023

I guess we could call it copyreplace in analogy to #48273?

(Another advantage of copyreplace is that it opens the door to eventually implement copyreplace(out::IO, in::IO, …), which in principle is possible via streaming with a limited-size buffer using PCRE partial matches. It's nontrivial, though, so definitely not in this PR!)

Or, since this is not copying from one stream to another, maybe writereplace?

The tradeoff here is that it pollutes the namespace to have a bunch of copyfoo or writefoo functions, and is maybe harder to find, compared to just adopting the convention that stringfoo(io, ...) writes the output to io, similar to join (io, …). Do we really want this?

@stevengj stevengj added the triage This should be discussed on a triage call label Mar 7, 2023
@stevengj
Copy link
Member Author

stevengj commented Mar 7, 2023

Marking as triage since it would be good to get a decision here on both the functionality and the naming.

@gbaraldi
Copy link
Member

Triage thinks the name(replace) is fine, it follows the convention set by write and read.

@gbaraldi gbaraldi removed the triage This should be discussed on a triage call label Mar 16, 2023
@mkitti
Copy link
Contributor

mkitti commented Mar 16, 2023

The convention being that objects involving IO do not contain a !.

@stevengj
Copy link
Member Author

Rebased. Merge?

@stevengj
Copy link
Member Author

CI failures seem unrelated? The build seems pretty broken at the moment. 😢

@jakobnissen
Copy link
Contributor

Yes, only 32-bit windows failed, as it currently does most of the time.

@giordano
Copy link
Contributor

giordano commented Jun 21, 2023

I restarted the build job a few hours after JuliaCI/julia-buildkite#308 was merged, I thought that would have made that work.

Edit: I was wrong, simply restarting the job doesn't pick up the new CI configuration.

@giordano
Copy link
Contributor

All green now 🙂

@stevengj stevengj added the merge me PR is reviewed. Merge when all tests are passing label Jun 29, 2023
@stevengj
Copy link
Member Author

stevengj commented Jul 3, 2023

Unfortunately, looks like this didn't get merged in time for the 1.10 feature freeze. Oh well, it can wait until 1.11 — rebased.

@IanButterworth IanButterworth merged commit ce1b420 into master Jul 3, 2023
@IanButterworth IanButterworth deleted the replaceio branch July 3, 2023 18:27
@IanButterworth IanButterworth added backport 1.10 Change should be backported to the 1.10 release and removed merge me PR is reviewed. Merge when all tests are passing labels Jul 3, 2023
@IanButterworth
Copy link
Member

Seems fair to backport given merge me and green before the feature freeze

KristofferC pushed a commit that referenced this pull request Jul 18, 2023
KristofferC added a commit that referenced this pull request Oct 2, 2023
Backported PRs:
- [x] #48625 <!-- add replace(io, str, patterns...) -->
- [x] #48387 <!-- Improve documentation of sort-related functions -->
- [x] #48363 <!-- Revise sort.md and docstrings in sort.jl -->
- [x] #48977 <!-- Update SparseArrays.jl stdlib for SuiteSparse 7 -->
- [x] #50719 <!-- fix `CyclePadding(::DataType)` -->
- [x] #50694 <!-- inference: permit recursive type traits -->
- [x] #50860 <!-- Add `Base.get_extension` to docs/API -->
- [x] #50594 <!-- Disallow non-index Integer types in isassigned -->
- [x] #50802 <!-- Makes IntrusiveLinkedListSynchronized mutable to avoid
allocation on poptask -->
- [x] #50858 <!-- Add a `threadpool` parameter to `Channel` constructor
-->
- [x] #50874 <!-- Restrict COFF to a single thread when symbol count is
high -->
- [x] #50822 <!-- Add default method for setmodifiers! -->
- [x] #50730 <!-- Fix integer overflow in `isapprox` -->
- [x] #50850 <!-- Remove weird Rational dispatch and add pi functions to
list -->
- [x] #50809 <!-- Limit type-printing in MethodError -->
- [x] #50915 <!-- Add note the `Task` about sticky bit -->
- [x] #50929 <!-- when widening tuple types in tmerge, only widen the
complex parts -->
- [x] #50928 <!-- Bump JuliaSyntax to 0.4.6 -->
- [x] #50959 <!-- Update libssh2 patches -->
- [x] #50823 <!-- Make ranges more robust with unsigned indexes. -->
- [x] #48542 <!-- Add docs on task-specific buffering using
multithreading -->
- [x] #50912 <!-- Separate foreign threads into a :foreign threadpool
-->
- [x] #51010 <!-- Add ORIGIN to SuiteSparse rpath on Linux/FreeBSD -->
- [x] #50753 <!-- cat: remove unused promote_eltype methods that confuse
inference -->
- [x] #51027 <!-- Implement realloc accounting correctly -->
- [x] #51019 <!-- fix a case of potentially use of undefined variable
when handling error in distributed message processing -->
- [x] #51039 <!-- Revert "Optimize findall(f, ::AbstractArray{Bool})
(#42202)" -->
- [x] #51036 <!-- add missing invoke edge for nospecialize targets -->
- [x] #51042 <!-- inference: fix return_type_tfunc modeling of concrete
functions -->
- [x] #51114 <!-- Workaround upstream FreeBSD issue #272992 -->
- [x] #50892 <!-- Add `JL_DLLIMPORT` to `small_typeof` declaration -->
- [x] #51154 <!-- broadcast: use recursion rather than ntuple to map
over a tuple -->
- [x] #51153 <!-- fix debug typo in "add missing invoke edge for
nospecialize targets (#51036)" -->
- [x] #51222 <!-- Check again if the tty is open inside the IO lock -->
- [x] #51236 <!-- Add lock around uv_unref during init -->
- [x] #51243 <!-- GMP: Gracefully handle more overflows. -->
- [x] #51254 <!-- Ryu: make sure adding zeros does not overwrite
trailing dot -->
- [x] #51175 <!-- shorten stale_age for cachefile lock -->
- [x] #51300 <!-- fix method definition error for bad vararg -->
- [x] #51307 <!-- fix force-throw ctrl-C on Windows -->
- [x] #51303 <!-- ensure revising structs is safe -->
- [x] #51393 
- [x] #51403 

Need manual backport:
- [x] #51009 <!-- fix #50562, regression in `in` of tuple of Symbols -->
- [x] #51053 <!-- Bump Statistics stdlib -->
- [x] #51013 <!-- fix #50709, type bound error due to free typevar in
sparam env -->
- [x] #51305 <!-- reduce test time for rounding and floatfuncs -->

Contains multiple commits, manual intervention needed:
- [ ] #50663 <!-- Fix Expr(:loopinfo) codegen -->
- [ ] #51035 <!-- refactor GC scanning code to reflect jl_binding_t are
now first class -->
- [ ] #51092 <!-- inference: fix bad effects for recursion -->
- [x] #51247 <!-- Check if malloc has succeeded before incrementing gc
stats -->
- [x] #51294 <!-- use LibGit2_jll for LibGit2 library -->

Non-merged PRs with backport label:
- [ ] #51132 <!-- Handle `AbstractQ` in concatenation -->
- [x] #51029 <!-- testdefs: make sure that if a test set changes the
active project, they change it back when they're done -->
- [ ] #50919 <!-- Code loading: do the "skipping mtime check for stdlib"
check regardless of the value of `ispath(f)` -->
- [ ] #50824 <!-- Add some aliasing warnings to docstrings for mutating
functions -->
- [x] #50385 <!-- Precompile pidlocks: add to NEWS and docs -->
- [ ] #49805 <!-- Limit TimeType subtraction to AbstractDateTime -->
@KristofferC KristofferC removed the backport 1.10 Change should be backported to the 1.10 release label Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants