Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistency of read() and write() functions (per #14608) #14660

Merged
merged 3 commits into from
Jan 20, 2016

Conversation

samoconnor
Copy link
Contributor

As proposed in #14608.

The discussion in #14608 includes ideas for better ways to do things in the future.
This PR is limited to making a few tweaks to the existing interface to make it a little more consistent:

  • Deprecate readbytes and readall, replaced by read and readstring.
  • Accept filename::AbstractString as 1st arg for write, read, read!, readuntil, readline and readlines.
  • write(to::IO, from::IO) per write(to::IO, from::IO) #14628

@tkelman tkelman added the needs tests Unit tests are required for this change label Jan 12, 2016
while !eof(from)
write(to, readavailable(from))
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs-test label is for all the new methods here and above

edit: and docs

@tkelman tkelman added the needs docs Documentation for this change is required label Jan 12, 2016
@@ -843,10 +843,9 @@
<string>rationalize</string>
<string>read</string>
<string>read!</string>
<string>readall</string>
<string>readstring</string>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be sorted, presumably?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and will presumably be re-sorted next time the file is generated <!-- these generated from names(Base) -->.
For now I think its better for the diff to be minimal so it is clear that a was renamed to b.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #8298 - "It was more of a hack and slash effort than a script. I got a list of names by calling names() in the REPL and copy/paste -ing. Perhaps I'll do an automated version one day..."

Every other renaming and refactoring that has touched this file has kept it sorted. That's easier to figure out from a diff than ignoring review and adding work to clean it up on master. Ref #9468 (comment) where this has happened before.

@nalimilan
Copy link
Member

Given the ongoing discussions at #14608, I think you'd better keep your second point (filename argument) for another PR if you want this one to be merged soon.


Read the entire contents of an I/O stream as a string.
Read the entire contents of an I/O stream or a file as a string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closes files when done, but not ::IO inputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.
A filename has no concept of open or close.
The doc says that it reads the entire contents of a file. That is what it does, open/close is hidden implementation detail. A stream may have a concept of open/close but if it does that is beyond the scope of this function, all this function does is read as a string.

@tkelman
Copy link
Contributor

tkelman commented Jan 13, 2016

@nalimilan has a point. Separate the rename from the new APIs into different PR's.

@JeffBezanson
Copy link
Sponsor Member

read! vs readbytes! is a shame. They really ought to be merged. I'd vote for keeping the behavior of returning the number of bytes read, since there's no other way to know that. You already have a reference to the array it's reading into so returning that isn't as useful.

@JeffBezanson
Copy link
Sponsor Member

+1 I generally like this a lot. I only wonder if readline(filename) makes sense --- a method just for reading the first line of a file? Same argument applies to read(filename, T). It might make sense to only accept a filename argument if the whole file is involved. That seems intuitive to me.

@samoconnor
Copy link
Contributor Author

... a method just for reading the first line of a file? ...

Yeah, I went back and forth on that myself. I considered likely use cases of the methods vs orthogonality consistency.

  • In general the point of making APIs orthogonal is that the API designer does not have to predict every use case and the user does not have to read the manual as much. i.e. the argument is that if read* accepts a filename most places it should accept a filename everywhere, otherwise some user ends up being frustrated by a no method error and has to stop thinking about their job and spend time reading the Julia manual.
  • The goal of being as good at gluing programs together as the shell implies using Julia for writing very short "shell scripts". In shell scripts files are almost always dealt with by filename.
  • In particular there are two partial read functions in this PR readuntil(f, c) and readline(f).
  • At least one use case comes to mind for readline(filename): Reading 1st line of a CSV file to get the column headers before reading whole file; or reading column headers from a whole set of CSV files to figure out common column set before merging.
  • The more general readuntil(filename, c) has more use cases around reading file headers. e.g. There might be an ascii text header up to the first '\0', then a hugh binary file. In the system I work on, the 0x1A EOF character is used to delimit plaintext header from binary data. A bunch of file formats use RFC 2822 style headers delimited by \r\n\r\n. e.g. `header = readuntil("email.txt", "\r\n\r\n").

open(joinpath(path(),"META_BRANCH")) do io
chomp(readuntil(io, "/n"))
end
chomp(readline(joinpath(path(),"META_BRANCH")))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeffBezanson at least one use of readline(filename) already exists.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and good heavens, is that a /n instead of \n on the line above? Seems like it worked because the data it wants is actually the whole file (usually?). But wow.

@JeffBezanson
Copy link
Sponsor Member

Ok, I guess the extra methods are pretty harmless. In that case this just needs some tests, a rebase and a passing CI run then we can merge.

@tkelman
Copy link
Contributor

tkelman commented Jan 14, 2016

new read! and write(io, io) methods are still undocumented, and the TextWrangler plist (edit: and exports list) may as well be kept sorted. I still think doing the rename separately from any new methods and refactoring is better practice, easier to review and less likely to introduce new bugs.

@samoconnor
Copy link
Contributor Author

Thx for your efforts reviewing this @tkelman, I've updated doc for read!.
write(::IO,::IO) is already covered by the existing doc:

    write(stream or filename, x)

Write the canonical binary representation of a value to the given stream or file.
...

@samoconnor
Copy link
Contributor Author

just needs some tests

I think all the places where I've updated other tests to use the new methods provides reasonableish coverage...

@samoconnor
Copy link
Contributor Author

Asside re:

read! vs readbytes! is a shame.

It is really hard to find out how many places readbytes! is used in Julia code on github because they don't index the bang !.
!!!! !!!! !! ! :(

@tkelman
Copy link
Contributor

tkelman commented Jan 14, 2016

readlines and readuntil with a filename are not covered yet. Neither is write(::IO, ::IO) for IO types that are not subtypes of AbstractIOBuffer

edit: neither is eachline of a filename

@samoconnor
Copy link
Contributor Author

read! vs readbytes! is a shame. They really ought to be merged

I've made an attempt at that on a different branch...
samoconnor@e757e5c

I'd rather get this PR merged and defer readbytes! for now.

@samoconnor
Copy link
Contributor Author

Hi @nalimilan, my grep-foo is quite strong ;)
My gripe is with trying to search all git repositories on GitHub (without having to clone them all first).

@samoconnor
Copy link
Contributor Author

Could we change directly readall(x) to read(x, UTF8String) instead of renaming it to readstring?

@nalimilan This should work (but I'd rather leave it for a seperate PR) ...

read(s, ::Type{UTF8String}) = UTF8String(read(s))
read(s, ::Type{ASCIIString}) = ASCIIString(read(s))

However, I think I'd still want to keep readstring.

@samoconnor
Copy link
Contributor Author

@tkelman
Copy link
Contributor

tkelman commented Jan 17, 2016

I'd still want to keep readstring.

Why? Defeats the point of API cleanups and renames to leave the old version in place without a deprecation. If you want the type unstable version, can make that read(io, ByteString)

This PR is getting too large and doing too many things at once.

@samoconnor
Copy link
Contributor Author

From PR description:

The discussion in #14608 includes ideas for better ways to do things in the future.
This PR is limited to making a few tweaks to the existing interface to make it a little more consistent

This PR is not proposing any changes to the existing read(s, ::Type) methods. Could we please move the discussion about that back to #14608.

Note that appveyor seems to be confused at the moment. There is a passing run for the most recent commit here : https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.12613

@tkelman
Copy link
Contributor

tkelman commented Jan 17, 2016

There's no sense doing all the renaming to readstring here if we'd rather do read(io, ::Type) for this operation.

@JeffBezanson
Copy link
Sponsor Member

I suspect we will still want readstring(f) as a shorthand since this is such a basic operation.

@@ -1165,10 +1165,9 @@ export
RawFD,
read,
read!,
readall,
readstring,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list is sorted

@samoconnor
Copy link
Contributor Author

Bump.
I see that the needs-tests Label is still applied to this PR.

I addressed testing to some extent in this comment.

In #14699 I've built a much more comprehensive read*() test.
However #14699 finds (and fixes) a number of bugs that are unrelated to this PR. So rather than merging that test (and all fixes required for it to pass) into this PR I'd rather keep it seperate.

I hope that this PR can be merged soon so that I can rebase #14699 and update its tests for the revised API conventions.

See also #14747 re possible readbytes(::PipeEndpoint) bug.

@tkelman
Copy link
Contributor

tkelman commented Jan 20, 2016

There's quite a bit of unaddressed review here still. When I'm at a bigger keyboard I'll collect into a checklist. Since #14699 fixes bugs and is more completely tested, I think it is in better shape to be merged first, assuming vtjnash and others have checked it for correctness.

@JeffBezanson
Copy link
Sponsor Member

I think this is ready to merge. I agree we should suggest readstring as the replacement for readall in the docs but that's a minor change.

@tkelman
Copy link
Contributor

tkelman commented Jan 20, 2016

@nalimilan and I both disliked the docstring format here. I'll fix it myself if need be.

@JeffBezanson
Copy link
Sponsor Member

Ok, then I think it would be best to merge this and you can fix the docstrings on master.

JeffBezanson added a commit that referenced this pull request Jan 20, 2016
Consistency of read() and write() functions (per #14608)
@JeffBezanson JeffBezanson merged commit fccce37 into JuliaLang:master Jan 20, 2016
samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 21, 2016
samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 21, 2016
@tkelman tkelman removed the needs tests Unit tests are required for this change label Jan 21, 2016
samoconnor added a commit to samoconnor/Compat.jl that referenced this pull request Jan 21, 2016
samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 22, 2016
samoconnor added a commit to samoconnor/julia that referenced this pull request Jan 22, 2016
@nalimilan
Copy link
Member

Ok, then I think it would be best to merge this and you can fix the docstrings on master.

@samoconnor, could you make a PR for this? Thanks!

@stevengj
Copy link
Member

stevengj commented Feb 9, 2016

It is really hard to find out how many places readbytes! is used in Julia code on github because they don't index the bang !.

readbytes! is used in: Bio BufferedStreams CRC Compat DataFrames Expect FileIO FoundationDB GraphViz InfoZIP LibArchive LibGit2 Libz MbedTLS PyCall Requests StatefulIterators VideoIO Zlib

@samoconnor
Copy link
Contributor Author

thx @stevengj,

I wonder how many of these could be replaced by a block-wise iterator (similar to eachline):
for chunk in eachblock(io, block_size) ...
e.g. like this one in CRC.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants