Introduce `splice_buffer!` abstraction for IOBuffer in the REPL #6429

mbauman · 2014-04-05T16:04:41Z

~~Removes all text between from and to, keeping the position stable with the text.~~ Behaves just like splice! for arrays, except that it accepts 0-indexed positions and keeps the cursor position stable with the text. Minor behavior change to kill_line -- when killing just a newline, the newline char itself goes into the buffer.

Takes the abstraction from #6427 that @carlobaldassi found useful.

kmsquire · 2014-04-05T17:45:52Z

For consistency, it might be nice to have this be called splice (and match the splice interface).

mbauman · 2014-04-05T18:05:18Z

Ah, yes, that's exactly what this functionality is. The name didn't feel quite right to me, either. In that case, would it make sense to define it as a true method of splice! in iobuffer.jl? It's not really a typical "IO" method, but it'd really simplify a lot of these editing commands.

carlobaldassi · 2014-04-05T18:09:56Z

Thanks!
I'd also prefer another name, either a less generic one (to clarify it's internal code, even though there is no cut in Base at the moment) or splice! as @kmsquire suggested (but that requires extending the interface; and then it could go in iobuffer.jl I suppose).

kmsquire · 2014-04-05T18:28:38Z

After thinking about it, I'm more ambivalent about naming it splice!, in that I'm not sure we really want/need that as part of IOBuffer. But the behavior does kind of match. Whatever works for you.

mbauman · 2014-04-05T18:43:28Z

Agreed. I'm currently mocking up a splice_buffer! function in the REPL module. I think it makes sense to echo the behavior and interface of splice!, but it really doesn't make sense to export it.

carlobaldassi · 2014-04-05T19:04:14Z

Sounds like a good plan.

mbauman · 2014-04-07T20:20:31Z

@carlobaldassi - Finally got a chance to get back to this today. Best part about this PR is that it now completely removes all memmove calls and makes all of the editing methods much simpler. It still is playing with the internal data structures of IOBuffer, but it's not nearly as egregious now. Thanks for the suggestion to use splice!, Kevin.

StefanKarpinski · 2014-04-07T21:11:41Z

It would be really nice to separate the IOBuffer guts from how one programs the REPL. This is not an efficiency critical API so I think that copying strings around would be perfectly acceptable. No amount of human mashing on the keyboard is going to tax a modern computer.

carlobaldassi · 2014-04-08T11:42:27Z

Looks good to me, and significantly cleaner. I'll leave it around for others to review as well (cc @loladiro, @nolta, ...).
@StefanKarpinski: this seems a step in that direction, in that it further isolates code where IOBuffer data structures are directly accessed instead of using abstractions like position. After this change, there are still a few places where this happens now in the REPL code. It probably wouldn't be too hard to abstract those away in an extra layer between IOBuffer and REPL code, if that's what you had in mind.

StefanKarpinski · 2014-04-08T16:04:08Z

I basically had in mind that all line editing states would simply be represented by (UTF8String,Int) pairs where the string is the content of the line and the integer is the index into that where the cursor is. Each hook would simply be a function combining the input key with such a state and producing a new (UTF8String,Int) pair. This would be less efficient but much simpler to write and understand and since efficiency is essentially irrelevant for REPL code (we would be hard pressed to be as laggy as readline), this would be a good tradeoff.

kmsquire · 2014-04-08T18:51:52Z

This sounds like a place where RopeStrings might actually be useful?

On Tuesday, April 8, 2014, Stefan Karpinski [email protected]
wrote:

I basically had in mind that all line editing states would simply be
represented by (UTF8String,Int) pairs where the string is the content of
the line and the integer is the index into that where the cursor is. Each
hook would simply be a function combining the input key with such a state
and producing a new (UTF8String,Int) pair. This would be less efficient
but much simpler to write and understand and since efficiency is
essentially irrelevant for REPL code (we would be hard pressed to be as
laggy as readline), this would be a good tradeoff.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/6429#issuecomment-39867014
.

StefanKarpinski · 2014-04-08T18:59:33Z

Eh, doesn't really matter. You can type at the REPL until you're blue in the face and the CPU wouldn't even break a sweat.

On Apr 8, 2014, at 2:51 PM, Kevin Squire [email protected] wrote:

This sounds like a place where RopeStrings might actually be useful?

On Tuesday, April 8, 2014, Stefan Karpinski [email protected]
wrote:

I basically had in mind that all line editing states would simply be
represented by (UTF8String,Int) pairs where the string is the content of
the line and the integer is the index into that where the cursor is. Each
hook would simply be a function combining the input key with such a state
and producing a new (UTF8String,Int) pair. This would be less efficient
but much simpler to write and understand and since efficiency is
essentially irrelevant for REPL code (we would be hard pressed to be as
laggy as readline), this would be a good tradeoff.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/6429#issuecomment-39867014
.

—
Reply to this email directly or view it on GitHub.

carlobaldassi · 2014-04-09T21:50:18Z

I basically had in mind that all line editing states would simply be represented by (UTF8String,Int) pairs where the string is the content of the line and the integer is the index into that where the cursor is

I see. I actually wrote a stub for this using an InputLine type with those two fields, without even seriously trying to make it correct, just to see how it would come along. It would surely be a little better in terms of code, but not that much after all. Basically, you end up writing a lot of small methods for that type, and most of those are duplicates of what's already in place using IOBuffer. Plus, you still need to be extra careful with indexing, since many of the subtleties which are in IOBuffer manipulations arise from the fact that the cursor position is between characters, so it ranges from 0 to length(line), and you need to take into account multibyte chars on top of that.
I'm not saying it's not worth doing it, just that it wouldn't provide as much magical simplification as one would hope for. Or maybe I'm just saying that I wasn't able to design a good interface for that abstraction ;)
Also, it's not a small task, as it requires some extensive changes spread out in various files and functions, and what's worse we don't have tests for the REPL. I suggest that before doing that we should have tests.

Anyway, I don't think this discussion should block this PR, which simplifies and rationalizes a number of things, no? @mbauman, I can't merge right now because of conflicts, but if you rebase I think we can merge it.

StefanKarpinski · 2014-04-09T22:03:58Z

I certainly don't think it will simplify any of the complex logic involving Unicode characters and cursor position but that's not the point – the point is that's the only thing the code will have to worry about. It doesn't have to know anything at all about IOBuffers or anything else involved with I/O. It completely separates line editing logic from I/O logic – and that's a good thing.

Behaves just like `splice!` for arrays, except that it accepts 0-indexed positions and keeps the cursor position stable with the text. Minor behavior change to kill_line -- when killing just a newline, that goes into the buffer.

mbauman · 2014-04-09T22:21:09Z

Rebased and ready to go again.

I like the idea of eventually moving to UTF8Strings, too. While it doesn't simplify the logic around unicode, it does allow us to abstract it and put it where it belongs — in utf8.jl. It'd be great to use next… and why not start playing with a reverse iteration protocol and define prev there, too. Many of the movement-by-word functions could punt to search and rsearch, too.

Positions would be consistent (1-indexed), and we could still use a function like splice (albeit returning a new string instead of modifying it).

carlobaldassi · 2014-04-09T22:34:39Z

@StefanKarpinski: I agree, that's why I originally thought about an abstraction layer between the REPL code and the IOBuffer code. I still think that could be a good idea as an intermediate step: 1) isolate the code which deals with the IOBuffer internals 2) create a new simpler type which implements the exact same interface 3) possibly simplify code, which at this point can be done in small steps, testing everything carefully.

carlobaldassi · 2014-04-09T22:35:27Z

Travis failure is unrelated (arpack).

Introduce `splice_buffer!` abstraction for IOBuffer in the REPL

carlobaldassi · 2014-04-09T22:47:39Z

Positions would be consistent (1-indexed)

Well, but positions are actually consistent already, just with the notion that the cursor is between characters rather than on top of one char. Basically, it's just the difference between position(buf) and buf.ptr: using one representation or the other simplifies some code and complicates some other code, and in any case you still need to special-case operations at the extremes. Using next is not that much different than using read, testing for the end of the string is not different than using eof, etc. (But again, don't take me wrong, I'm all in favour of doing this, I'm just reporting on some quick experiment I did).

Keno · 2014-04-09T23:02:35Z

While I'm fine with changing it to UTF8Strings with a separate index, I agree with @carlobaldassi that it would most likely not change the code very much at all and is therefore probably not all that big a priority. The nice thing about IOBuffers is that you can use mutation which is slightly more natural than creating new immutable strings. In any case, the real trick to make UTF8 work correctly is the line refresh algorithm anyway (which I think is pretty much correct at this point, but probably needs to be changed for windowed prompts (within the terminal e.g. ncurses style), since it assumes that newline wrap.

StefanKarpinski · 2014-04-12T13:39:50Z

Here's another idea – make all the REPL logic work with UTF32Strings instead of UTF8Strings. That gets rid of all the problems of dealing with variable width encoding. You just have to transcode on input and output. Bam, solved.

Keno · 2014-04-17T02:17:06Z

Close, but no cigar. Currently we don't handle this, but consider

t⃗ = [1,2,3]

That's two unicode characters.

StefanKarpinski · 2014-04-17T14:26:15Z

What if you NFC normalize it?

mbauman · 2014-04-17T14:42:32Z

No dice. There are lots of these non-spacing, combining marks, and only a handful of combinations with them have 1-character equivalents.

As a more concrete example, check out the grave accent. It can be normalized and combine with vowels but not consonants:

julia> s = "à" # Bah, looks like Github or my browser is normalizing this to one-char, so copying it won't reproduce this.
"à"

julia> length(s), strwidth(s)
(2,1)

julia> length(normalize_string(s,:NFC))
1

julia> s = "b̀" # but it's not possible to normalize this to one char
"b̀"

julia> length(s), strwidth(s)
(2,1)

julia> length(normalize_string(s,:NFC))
2

julia> length(normalize_string(s,:NFKC))
2

julia> length(normalize_string(s,:NFKD))
2

julia> length(normalize_string(s,:NFD))
2

StefanKarpinski · 2014-04-17T15:27:25Z

I knew it couldn't possibly be that easy. Seems like UTF-32 isn't really much better than UTF-8. I wonder if it makes sense to have another layer of abstraction over normal strings where the unit of data is the code point, but it's not even clear what the correct unit of is at the higher level. This is essentially the same issue as the reverse problem (#6165).

stevengj · 2014-12-04T17:39:36Z

What you want is graphemes, I think. utf8proc/libmojibake will break up a string into graphemes for you.

(Probably we should expose this as an iterator: for g in graphemes(s) where g iterates over substrings.)

stevengj · 2014-12-07T03:17:13Z

@StefanKarpinski, see #9261 for a grapheme iterator that adds precisely this layer of abstraction.

StefanKarpinski · 2014-12-07T03:46:35Z

Awesome. This is great to have.

stevengj · 2014-12-17T21:08:03Z

Now that we have a graphemes iterator and (at a lower level) a UTF8Proc.isgraphemebreak(Char,Char) function to detect extended grapheme breaks, couldn't the REPL logic be updated to work with graphemes?

mbauman changed the title ~~Introduce cut abstraction for IOBuffer~~ Introduce cut abstraction for IOBuffer in the REPL Apr 5, 2014

mbauman changed the title ~~Introduce cut abstraction for IOBuffer in the REPL~~ Introduce splice_buffer! abstraction for IOBuffer in the REPL Apr 7, 2014

Introduce splice_buffer! abstraction for IOBuffer

ed9570e

Behaves just like `splice!` for arrays, except that it accepts 0-indexed positions and keeps the cursor position stable with the text. Minor behavior change to kill_line -- when killing just a newline, that goes into the buffer.

carlobaldassi added a commit that referenced this pull request Apr 9, 2014

Merge pull request #6429 from mbauman/cut-iobuffer

37508ca

Introduce `splice_buffer!` abstraction for IOBuffer in the REPL

carlobaldassi merged commit 37508ca into JuliaLang:master Apr 9, 2014

mbauman deleted the cut-iobuffer branch April 9, 2014 22:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce `splice_buffer!` abstraction for IOBuffer in the REPL #6429

Introduce `splice_buffer!` abstraction for IOBuffer in the REPL #6429

mbauman commented Apr 5, 2014

kmsquire commented Apr 5, 2014

mbauman commented Apr 5, 2014

carlobaldassi commented Apr 5, 2014

kmsquire commented Apr 5, 2014

mbauman commented Apr 5, 2014

carlobaldassi commented Apr 5, 2014

mbauman commented Apr 7, 2014

StefanKarpinski commented Apr 7, 2014

carlobaldassi commented Apr 8, 2014

StefanKarpinski commented Apr 8, 2014

kmsquire commented Apr 8, 2014

StefanKarpinski commented Apr 8, 2014

carlobaldassi commented Apr 9, 2014

StefanKarpinski commented Apr 9, 2014

mbauman commented Apr 9, 2014

carlobaldassi commented Apr 9, 2014

carlobaldassi commented Apr 9, 2014

carlobaldassi commented Apr 9, 2014

Keno commented Apr 9, 2014

StefanKarpinski commented Apr 12, 2014

Keno commented Apr 17, 2014

StefanKarpinski commented Apr 17, 2014

mbauman commented Apr 17, 2014

StefanKarpinski commented Apr 17, 2014

stevengj commented Dec 4, 2014

stevengj commented Dec 7, 2014

StefanKarpinski commented Dec 7, 2014

stevengj commented Dec 17, 2014

Introduce splice_buffer! abstraction for IOBuffer in the REPL #6429

Introduce splice_buffer! abstraction for IOBuffer in the REPL #6429

Conversation

mbauman commented Apr 5, 2014

kmsquire commented Apr 5, 2014

mbauman commented Apr 5, 2014

carlobaldassi commented Apr 5, 2014

kmsquire commented Apr 5, 2014

mbauman commented Apr 5, 2014

carlobaldassi commented Apr 5, 2014

mbauman commented Apr 7, 2014

StefanKarpinski commented Apr 7, 2014

carlobaldassi commented Apr 8, 2014

StefanKarpinski commented Apr 8, 2014

kmsquire commented Apr 8, 2014

StefanKarpinski commented Apr 8, 2014

carlobaldassi commented Apr 9, 2014

StefanKarpinski commented Apr 9, 2014

mbauman commented Apr 9, 2014

carlobaldassi commented Apr 9, 2014

carlobaldassi commented Apr 9, 2014

carlobaldassi commented Apr 9, 2014

Keno commented Apr 9, 2014

StefanKarpinski commented Apr 12, 2014

Keno commented Apr 17, 2014

StefanKarpinski commented Apr 17, 2014

mbauman commented Apr 17, 2014

StefanKarpinski commented Apr 17, 2014

stevengj commented Dec 4, 2014

stevengj commented Dec 7, 2014

StefanKarpinski commented Dec 7, 2014

stevengj commented Dec 17, 2014

Introduce `splice_buffer!` abstraction for IOBuffer in the REPL #6429

Introduce `splice_buffer!` abstraction for IOBuffer in the REPL #6429