Consistency of read() and write() functions (per #14608) #14660

samoconnor · 2016-01-12T23:28:53Z

As proposed in #14608.

The discussion in #14608 includes ideas for better ways to do things in the future.
This PR is limited to making a few tweaks to the existing interface to make it a little more consistent:

Deprecate readbytes and readall, replaced by read and readstring.
Accept filename::AbstractString as 1st arg for write, read, read!, readuntil, readline and readlines.
write(to::IO, from::IO) per write(to::IO, from::IO) #14628

tkelman · 2016-01-12T23:40:17Z

base/io.jl

+    while !eof(from)
+        write(to, readavailable(from))
+    end
+end


needs-test label is for all the new methods here and above

edit: and docs

tkelman · 2016-01-13T04:28:30Z

contrib/BBEditTextWrangler-julia.plist

@@ -843,10 +843,9 @@
              <string>rationalize</string>
              <string>read</string>
              <string>read!</string>
-              <string>readall</string>
+              <string>readstring</string>


should be sorted, presumably?

... and will presumably be re-sorted next time the file is generated .
For now I think its better for the diff to be minimal so it is clear that a was renamed to b.

See #8298 - "It was more of a hack and slash effort than a script. I got a list of names by calling names() in the REPL and copy/paste -ing. Perhaps I'll do an automated version one day..."

Every other renaming and refactoring that has touched this file has kept it sorted. That's easier to figure out from a diff than ignoring review and adding work to clean it up on master. Ref #9468 (comment) where this has happened before.

nalimilan · 2016-01-13T09:32:48Z

Given the ongoing discussions at #14608, I think you'd better keep your second point (filename argument) for another PR if you want this one to be merged soon.

tkelman · 2016-01-13T10:00:56Z

base/docs/helpdb/Base.jl


-Read the entire contents of an I/O stream as a string.
+Read the entire contents of an I/O stream or a file as a string.


closes files when done, but not ::IO inputs

No.
A filename has no concept of open or close.
The doc says that it reads the entire contents of a file. That is what it does, open/close is hidden implementation detail. A stream may have a concept of open/close but if it does that is beyond the scope of this function, all this function does is read as a string.

tkelman · 2016-01-13T12:30:49Z

@nalimilan has a point. Separate the rename from the new APIs into different PR's.

JeffBezanson · 2016-01-14T19:16:34Z

read! vs readbytes! is a shame. They really ought to be merged. I'd vote for keeping the behavior of returning the number of bytes read, since there's no other way to know that. You already have a reference to the array it's reading into so returning that isn't as useful.

JeffBezanson · 2016-01-14T19:30:43Z

+1 I generally like this a lot. I only wonder if readline(filename) makes sense --- a method just for reading the first line of a file? Same argument applies to read(filename, T). It might make sense to only accept a filename argument if the whole file is involved. That seems intuitive to me.

samoconnor · 2016-01-14T21:57:41Z

... a method just for reading the first line of a file? ...

Yeah, I went back and forth on that myself. I considered likely use cases of the methods vs orthogonality consistency.

In general the point of making APIs orthogonal is that the API designer does not have to predict every use case and the user does not have to read the manual as much. i.e. the argument is that if read* accepts a filename most places it should accept a filename everywhere, otherwise some user ends up being frustrated by a no method error and has to stop thinking about their job and spend time reading the Julia manual.
The goal of being as good at gluing programs together as the shell implies using Julia for writing very short "shell scripts". In shell scripts files are almost always dealt with by filename.
In particular there are two partial read functions in this PR readuntil(f, c) and readline(f).
At least one use case comes to mind for readline(filename): Reading 1st line of a CSV file to get the column headers before reading whole file; or reading column headers from a whole set of CSV files to figure out common column set before merging.
The more general readuntil(filename, c) has more use cases around reading file headers. e.g. There might be an ascii text header up to the first '\0', then a hugh binary file. In the system I work on, the 0x1A EOF character is used to delimit plaintext header from binary data. A bunch of file formats use RFC 2822 style headers delimited by \r\n\r\n. e.g. `header = readuntil("email.txt", "\r\n\r\n").

samoconnor · 2016-01-14T21:58:17Z

base/pkg/dir.jl

-        open(joinpath(path(),"META_BRANCH")) do io
-          chomp(readuntil(io, "/n"))
-        end
+        chomp(readline(joinpath(path(),"META_BRANCH")))


@JeffBezanson at least one use of readline(filename) already exists.

...and good heavens, is that a /n instead of \n on the line above? Seems like it worked because the data it wants is actually the whole file (usually?). But wow.

JeffBezanson · 2016-01-14T22:13:57Z

Ok, I guess the extra methods are pretty harmless. In that case this just needs some tests, a rebase and a passing CI run then we can merge.

tkelman · 2016-01-14T22:36:56Z

new read! and write(io, io) methods are still undocumented, and the TextWrangler plist (edit: and exports list) may as well be kept sorted. I still think doing the rename separately from any new methods and refactoring is better practice, easier to review and less likely to introduce new bugs.

samoconnor · 2016-01-14T22:52:50Z

Thx for your efforts reviewing this @tkelman, I've updated doc for read!.
write(::IO,::IO) is already covered by the existing doc:

    write(stream or filename, x)

Write the canonical binary representation of a value to the given stream or file.
...

samoconnor · 2016-01-14T22:56:54Z

just needs some tests

I think all the places where I've updated other tests to use the new methods provides reasonableish coverage...

samoconnor · 2016-01-14T23:18:08Z

Asside re:

read! vs readbytes! is a shame.

It is really hard to find out how many places readbytes! is used in Julia code on github because they don't index the bang !.
!!!! !!!! !! ! :(

tkelman · 2016-01-14T23:28:12Z

readlines and readuntil with a filename are not covered yet. Neither is write(::IO, ::IO) for IO types that are not subtypes of AbstractIOBuffer

edit: neither is eachline of a filename

samoconnor · 2016-01-15T03:15:48Z

read! vs readbytes! is a shame. They really ought to be merged

I've made an attempt at that on a different branch...
samoconnor@e757e5c

I'd rather get this PR merged and defer readbytes! for now.

samoconnor · 2016-01-15T11:39:57Z

Hi @nalimilan, my grep-foo is quite strong ;)
My gripe is with trying to search all git repositories on GitHub (without having to clone them all first).

samoconnor · 2016-01-16T03:18:37Z

Could we change directly readall(x) to read(x, UTF8String) instead of renaming it to readstring?

@nalimilan This should work (but I'd rather leave it for a seperate PR) ...

read(s, ::Type{UTF8String}) = UTF8String(read(s))
read(s, ::Type{ASCIIString}) = ASCIIString(read(s))

However, I think I'd still want to keep readstring.

samoconnor · 2016-01-17T21:30:04Z

Note, I've added more read*() tests over here: https://github.com/JuliaLang/julia/pull/14699/files#diff-435745fdfa02a0dc4f3b747ba977af11R1

tkelman · 2016-01-17T21:43:31Z

I'd still want to keep readstring.

Why? Defeats the point of API cleanups and renames to leave the old version in place without a deprecation. If you want the type unstable version, can make that read(io, ByteString)

This PR is getting too large and doing too many things at once.

samoconnor · 2016-01-17T22:05:01Z

From PR description:

The discussion in #14608 includes ideas for better ways to do things in the future.
This PR is limited to making a few tweaks to the existing interface to make it a little more consistent

This PR is not proposing any changes to the existing read(s, ::Type) methods. Could we please move the discussion about that back to #14608.

Note that appveyor seems to be confused at the moment. There is a passing run for the most recent commit here : https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.12613

tkelman · 2016-01-17T22:12:01Z

There's no sense doing all the renaming to readstring here if we'd rather do read(io, ::Type) for this operation.

JeffBezanson · 2016-01-17T23:47:25Z

I suspect we will still want readstring(f) as a shorthand since this is such a basic operation.

tkelman · 2016-01-18T01:13:39Z

base/exports.jl

@@ -1165,10 +1165,9 @@ export
    RawFD,
    read,
    read!,
-    readall,
+    readstring,


this list is sorted

samoconnor · 2016-01-20T22:38:16Z

Bump.
I see that the needs-tests Label is still applied to this PR.

I addressed testing to some extent in this comment.

In #14699 I've built a much more comprehensive read*() test.
However #14699 finds (and fixes) a number of bugs that are unrelated to this PR. So rather than merging that test (and all fixes required for it to pass) into this PR I'd rather keep it seperate.

I hope that this PR can be merged soon so that I can rebase #14699 and update its tests for the revised API conventions.

See also #14747 re possible readbytes(::PipeEndpoint) bug.

tkelman · 2016-01-20T22:40:17Z

There's quite a bit of unaddressed review here still. When I'm at a bigger keyboard I'll collect into a checklist. Since #14699 fixes bugs and is more completely tested, I think it is in better shape to be merged first, assuming vtjnash and others have checked it for correctness.

JeffBezanson · 2016-01-20T23:06:19Z

I think this is ready to merge. I agree we should suggest readstring as the replacement for readall in the docs but that's a minor change.

tkelman · 2016-01-20T23:23:56Z

@nalimilan and I both disliked the docstring format here. I'll fix it myself if need be.

JeffBezanson · 2016-01-20T23:26:07Z

Ok, then I think it would be best to merge this and you can fix the docstrings on master.

Consistency of read() and write() functions (per #14608)

…thods

nalimilan · 2016-01-25T17:37:49Z

Ok, then I think it would be best to merge this and you can fix the docstrings on master.

@samoconnor, could you make a PR for this? Thanks!

stevengj · 2016-02-09T18:41:16Z

It is really hard to find out how many places readbytes! is used in Julia code on github because they don't index the bang !.

readbytes! is used in: Bio BufferedStreams CRC Compat DataFrames Expect FileIO FoundationDB GraphViz InfoZIP LibArchive LibGit2 Libz MbedTLS PyCall Requests StatefulIterators VideoIO Zlib

samoconnor · 2016-02-09T22:02:28Z

thx @stevengj,

I wonder how many of these could be replaced by a block-wise iterator (similar to eachline):
for chunk in eachblock(io, block_size) ...
e.g. like this one in CRC.jl

tkelman added the needs tests Unit tests are required for this change label Jan 12, 2016

tkelman reviewed Jan 12, 2016
View reviewed changes

tkelman added the needs docs Documentation for this change is required label Jan 12, 2016

tkelman mentioned this pull request Jan 12, 2016

write(filename::AbstractString, data) #14546

Closed

samoconnor mentioned this pull request Jan 12, 2016

write(to::IO, from::IO) #14628

Closed

tkelman reviewed Jan 13, 2016
View reviewed changes

nalimilan mentioned this pull request Jan 13, 2016

Consistency of read() and write() functions. #14608

Closed

tkelman reviewed Jan 13, 2016
View reviewed changes

samoconnor force-pushed the issue_14608 branch from 4e52945 to 55f635c Compare January 13, 2016 12:55

samoconnor reviewed Jan 14, 2016
View reviewed changes

samoconnor force-pushed the issue_14608 branch from 8ab65ee to fd6b27f Compare January 14, 2016 23:05

tweaks per @nalimilan's comments

a6309d4

samoconnor force-pushed the issue_14608 branch from 5508108 to a6309d4 Compare January 17, 2016 21:31

samoconnor mentioned this pull request Jan 17, 2016

make 'read!(::Base.LibuvStream, ::Array{UInt8, 1})' indicate the number of bytes actually received on EOFError #14630

Merged

tkelman reviewed Jan 18, 2016
View reviewed changes

base/exports.jl

@@ -1165,10 +1165,9 @@ export

RawFD,

read,

read!,

readall,

readstring,

Copy link

Contributor

tkelman Jan 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list is sorted

JeffBezanson added a commit that referenced this pull request Jan 20, 2016

Merge pull request #14660 from samoconnor/issue_14608

fccce37

Consistency of read() and write() functions (per #14608)

JeffBezanson merged commit fccce37 into JuliaLang:master Jan 20, 2016

samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 21, 2016

update per JuliaLang#14660

0cbf1c7

samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 21, 2016

update per JuliaLang#14660, add tests for filename-as-1st-arg read me…

d1820c4

…thods

tkelman removed the needs tests Unit tests are required for this change label Jan 21, 2016

samoconnor mentioned this pull request Jan 21, 2016

Convenience functions for dealing with .gz files BioJulia/Libz.jl#12

Merged

nalimilan mentioned this pull request Jan 21, 2016

gzreadall(filename) and gzwrite(filename, data) JuliaIO/GZip.jl#45

Closed

samoconnor added a commit to samoconnor/Compat.jl that referenced this pull request Jan 21, 2016

https://github.com/JuliaLang/julia/pull/14660

62a6bfe

samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 22, 2016

update per JuliaLang#14660, add tests for filename-as-1st-arg read me…

f4a6b36

…thods

samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 22, 2016

update per JuliaLang#14660, add tests for filename-as-1st-arg read me…

0ce7a52

…thods


		Read the entire contents of an I/O stream as a string.
		Read the entire contents of an I/O stream or a file as a string.

Consistency of read() and write() functions (per #14608) #14660

Consistency of read() and write() functions (per #14608) #14660

Conversation

samoconnor commented Jan 12, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan commented Jan 13, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tkelman commented Jan 13, 2016

JeffBezanson commented Jan 14, 2016

JeffBezanson commented Jan 14, 2016

samoconnor commented Jan 14, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JeffBezanson commented Jan 14, 2016

tkelman commented Jan 14, 2016

samoconnor commented Jan 14, 2016

samoconnor commented Jan 14, 2016

samoconnor commented Jan 14, 2016

tkelman commented Jan 14, 2016

samoconnor commented Jan 15, 2016

samoconnor commented Jan 15, 2016

samoconnor commented Jan 16, 2016

samoconnor commented Jan 17, 2016

tkelman commented Jan 17, 2016

samoconnor commented Jan 17, 2016

tkelman commented Jan 17, 2016

JeffBezanson commented Jan 17, 2016

Choose a reason for hiding this comment

samoconnor commented Jan 20, 2016

tkelman commented Jan 20, 2016

JeffBezanson commented Jan 20, 2016

tkelman commented Jan 20, 2016

JeffBezanson commented Jan 20, 2016

nalimilan commented Jan 25, 2016

stevengj commented Feb 9, 2016

samoconnor commented Feb 9, 2016