Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-26175: Fix SpooledTemporaryFile IOBase abstract #3249

Closed

Conversation

GFernie
Copy link
Contributor

@GFernie GFernie commented Aug 30, 2017

One would assume that this class implements the IOBase abstract. As the
underlying file-like object is either io.BytesIO, io.StringIO, or a true
file object, this is a reasonable abstract expect and to implement.

Regardless, the behaviour of this class does not change much in the case
of the attribute being missing from the underlying file-like; an
AttributeError is still raised, albeit from one additional frame on the
stack trace.

https://bugs.python.org/issue26175

One would assume that this class implements the IOBase abstract. As the
underlying file-like object is either io.BytesIO, io.StringIO, or a true
file object, this is a reasonable abstract expect and to implement.

Regardless, the behaviour of this class does not change much in the case
of the attribute being missing from the underlying file-like; an
AttributeError is still raised, albeit from one additional frame on the
stack trace.
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

Copy link
Member

@merwok merwok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch looks good! Note that the bug contains a stronger motivation than the first mssage here: «This was discovered when seeking a SpooledTemporaryFile-backed lzma file.»

Lib/test/test_tempfile.py Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I didn't expect the Spanish Inquisition!. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@GFernie
Copy link
Contributor Author

GFernie commented Sep 30, 2017

Thanks for the feedback @merwok

I've made some changes to more accurately test how SpooledTemporaryFile implements IOBase. So, there are now two separate tests:

  • Assert that an instance of SpooledTemporaryFile is an instance of IOBase. Because IOBase already implements the interface (nearly) this makes it redundant to check for the individual attributes that SpooledTemporaryFile should implement.
  • Assert that the attributes which not declared in IOBase but should be implemented by subclasses are implemented by SpooledTemporaryFile.

From the IOBase docs:

Even though IOBase does not declare read(), readinto(), or write() because their signatures will vary, implementations and clients should consider those methods part of the interface.

I didn't expect the Spanish Inquisition!

@bedevere-bot
Copy link

Nobody expects the Spanish Inquisition!

@merwok: please review the changes made to this pull request.

@embe
Copy link

embe commented Feb 19, 2018

There is at least one more incompatibility with IOBase: seek should return the current position instead of None.

@@ -685,6 +685,9 @@ def __exit__(self, exc, value, tb):
def __iter__(self):
return self._file.__iter__()

def __del__(self):
Copy link

@pppery pppery Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be added: deleting the SpooledTemporaryFile will null out the reference to the underline file, and therefor call its __del__

Copy link
Member

@vadmium vadmium Jun 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you inherit the default IOBase.__del__ implementation, it will call close and defeat any ResourceWarning that might otherwise be emitted. Perhaps it is better to make __del__ do nothing, or set it to object.__del__. [Seems that last option doesn’t exist.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: the underlying file should not be explicitly deleted as this is not expected behaviour. I can imagine a situation where someone is deliberately holding a reference to the underlying file and they would not expect/want it to be closed until their own reference has fallen out of scope. I've changed the method to do nothing and added a docstring to reflect this.

Thanks for your feedback

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: the underlying file should not be explicitly deleted as this is not expected behaviour

The doc says: """This function operates exactly as TemporaryFile() does, except [irrelevant differences]."""

And then, about TemporaryFile: """On completion of the context or destruction of the file object the temporary file will be removed from the filesystem."""

So it seems the underlying file should be deleted when the file object disappears.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that IOBase implements a __del__ which has some side-effects. Those side effects are not desirable here. Any underlying buffers being wrapped, e.g. the TemporaryFile or the BytesIO, will be gc'd and handled as they should.

I think this implementation is correct.

@terryjreedy terryjreedy requested a review from vadmium June 8, 2018 17:51
We don't want to delete the underlying file explicitly as the expected
behaviour is for the file to be deleted *after* it falls out of scope.
@GFernie
Copy link
Contributor Author

GFernie commented Sep 9, 2018

You are right, @embe; it's part of the interface:
https://docs.python.org/library/io.html#io.IOBase.seek

The method now returns the value returned from self._file.seek(), delegating the responsibility of returning the current new absolute file position to the underlying file.

Thanks for pointing this out

@GFernie
Copy link
Contributor Author

GFernie commented Sep 9, 2018

Re seek: I've also done the same with truncate, as per Martin's recommendations on the Python bug tracker: https://bugs.python.org/issue26175#msg319145

This is also part of the file interface:
https://docs.python.org/library/io.html#io.IOBase.truncate

@GFernie GFernie requested a review from pitrou November 23, 2019 16:22
@GFernie
Copy link
Contributor Author

GFernie commented Nov 23, 2019

@pitrou I've considered your comments on what we should expect when calling __del__. There's three options which have been discussed:

  • Do nothing. The underlying buffer or temp file will be deleted when the object is finalised anyway. If the object is deleted with del, this would happen immediately. However, if __del__ is called then this would delay the deletion of the underlying buffer or temp file until the object is deleted with del or is garbage collected, so possibly never. Here, we are in a situation where we called __del__, expected the resources to be freed, but not even a warning was raised.
  • Inherit the io.IOBase.__del__ implementation. The underlying buffer or temp file will be deleted when __del__ is called, the object is deleted with del, or the object is garbage collected. This is the expected behaviour, but base implementation will explicitly call close, suppressing the ResourceWarning which should be raised when the object is deleted before closing the file.
  • Delegate the __del__ call to the underlying file. If it is a buffer then then the buffer will be freed and if it's rolled over to disk then the temp file will be deleted. If the temp file was not closed before the object was deleted then a ResourceWarning will be raised, just like TemporaryFile. This ensures that resources are freed exactly when we expect them to be and also that we are raising a warning when resources are not being managed properly by the programmer.

I've picked the third option, as you suggested. Thanks for your review.

I have made the requested changes; please review again.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@merwok, @pitrou: please review the changes made to this pull request.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for posting this. Generally speaking, calling __del__ directly is incorrect. Only the interpreter should do that. From a user's point of view, what's important is whether an object is still reachable or not.

@@ -116,6 +116,7 @@ The module defines the following user-callable items:

.. versionchanged:: 3.8
Added *errors* parameter.
Implemented :class:`io.IOBase` inteface.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 3.8.0 has been released, this should be moved to a new versionchanged:: 3.9 block.

@@ -679,6 +679,9 @@ def __exit__(self, exc, value, tb):
def __iter__(self):
return self._file.__iter__()

def __del__(self):
return self._file.__del__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, calling the __del__ method directly like that is not correct. __del__ is only to be called when all external references are gone. Instead I would suggest something like the following:

    def __del__(self):
        if not self.closed:
            import warnings
            warnings.warn('unclosed file %r' % (self,), ResourceWarning,
                          stacklevel=2, source=self)
            self.close()

(this is adapted from FileIO.__del__ in Lib/pyio.py)

# ResourceWarning since the file was not explicitly closed.
f = self.do_create(max_size=2)
f.write(b'foo')
filename = f.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually work? Here I get something like:

>>> f = tempfile.SpooledTemporaryFile()
>>> f.name
>>> f.rollover()
>>> f.name
3

f.write(b'foo')
filename = f.name
self.assertTrue(os.path.exists(filename))
with self.assertWarns(ResourceWarning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also expect a ResourceWarning to be raised when the file is not rolled over? If yes, there should be a test for it. If not, then the __del__ implementation can be removed.

filename = f.name
self.assertTrue(os.path.exists(filename))
with self.assertWarns(ResourceWarning):
f.__del__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not correct to call __del__ directly. Instead, use del f or f = None.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@csabella
Copy link
Contributor

@GFernie, thank you for your last comments and for addressing the code review. It looks like @pitrou had some additional notes after that, so it would be great if you could look at this again. Thank you!

@@ -743,18 +755,21 @@ def tell(self):

def truncate(self, size=None):
if size is None:
self._file.truncate()
return self._file.truncate()
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else is not needed, PEP warning, might as well fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary to remove the else. It does no harm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never said it was necessary, just bad style as it unnecessarily increased the code depth, take my comment as an FYI

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to confirm @merwok - nothing needs to be changed here for approval?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I advise to not change the else line.

@danieldjewell
Copy link

Not sure if @GFernie is still around -- are the outstanding changes/questions still those raised by @pitrou? @merwok?

@merwok
Copy link
Member

merwok commented May 13, 2020

This PR needs to merge master and address Antoine’s latest review.

@csabella
Copy link
Contributor

@danieldjewell, you're right. Would you like to create a new PR based on this one? If so, please follow this section of the devguide to give GFernie the credit as the original author. Thanks!

@carlos-jenkins
Copy link

Hello,

For anyone that got here looking for the error 'SpooledTemporaryFile' object has no attribute 'seekable'

In my case it was trying to pass an SpooledTemporaryFile to an ZipFile to decompress it (Starlette's UploadFile class if anyone cares). But there are other cases. Until the issue referenced in this PR is fixed, an option is to monkey patch the class. Sorry, ugly hack follows.

In one of your modules:

from tempfile import SpooledTemporaryFile

def _seekable(self):
    """
    Monkey patched seekable() method for the SpooledTemporaryFile class.

    Sadly, we cannot send an instance of the SpooledTemporaryFile to the
    ZipFile object.

    This is because the SpooledTemporaryFile doesn't inherit / implement the
    IOBase class.

    This bug is reported in CPython:

        https://bugs.python.org/issue26175

    And an unmerged PR is located here:

        https://github.com/python/cpython/pull/3249/files

    In Python 3.8, attempting to pass the tmpfd fails with:

        AttributeError:
            'SpooledTemporaryFile' object has no attribute 'seekable'

    So, the workaround to avoid copying the file is to monkey patch the class 
    and add the missing method. Sorry.
    """
    return self._file.seekable()


SpooledTemporaryFile.seekable = _seekable

@Molaire
Copy link

Molaire commented Jan 18, 2021

To Google people: this also breaks pandas.read_csv() and this monkey patch can be used for property readable writable and seekable to fix it.

Hello,

For anyone that got here looking for the error 'SpooledTemporaryFile' object has no attribute 'seekable'

In my case it was trying to pass an SpooledTemporaryFile to an ZipFile to decompress it (Starlette's UploadFile class if anyone cares). But there are other cases. Until the issue referenced in this PR is fixed, an option is to monkey patch the class. Sorry, ugly hack follows.

In one of your modules:

from tempfile import SpooledTemporaryFile

def _seekable(self):
    """
    Monkey patched seekable() method for the SpooledTemporaryFile class.

    Sadly, we cannot send an instance of the SpooledTemporaryFile to the
    ZipFile object.

    This is because the SpooledTemporaryFile doesn't inherit / implement the
    IOBase class.

    This bug is reported in CPython:

        https://bugs.python.org/issue26175

    And an unmerged PR is located here:

        https://github.com/python/cpython/pull/3249/files

    In Python 3.8, attempting to pass the tmpfd fails with:

        AttributeError:
            'SpooledTemporaryFile' object has no attribute 'seekable'

    So, the workaround to avoid copying the file is to monkey patch the class 
    and add the missing method. Sorry.
    """
    return self._file.seekable()


SpooledTemporaryFile.seekable = _seekable

@MikeEcho
Copy link

Python dev here from the future (2021 - save yourselves while you can). Still getting SpooledTemporaryFile errors while reading CSVs (via tempfiles) in. Kindly asking if requested changes will be addressed @GFernie ?

@pR0Ps
Copy link
Contributor

pR0Ps commented Dec 31, 2021

@MikeEcho since it seemed like @GFernie had moved on I submitted an updated and rebased PR for this fix in #29560 with all of the code review comments addressed. Just waiting on a code review.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Feb 21, 2022
@methane
Copy link
Member

methane commented May 3, 2022

Close this because I merged #29560.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.