Skip to content

Conversation

@bjornjorgensen
Copy link
Contributor

@bjornjorgensen bjornjorgensen commented Mar 8, 2022

What changes were proposed in this pull request?

To change from mktemp to mkstemp.

Why are the changes needed?

Pandas API on Spark use mktemp in test.
mktemp "THIS FUNCTION IS UNSAFE AND SHOULD NOT BE USED. The file name may
refer to a file that did not exist at some point, but by the time
you get around to creating it, someone else may have beaten you to
the punch."

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Got the green light.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-38355]Upgrade mktemp to mkstemp [SPARK-38355][PYSPARK][TESTS] Use mkstemp instead of mktemp Mar 8, 2022
@dongjoon-hyun
Copy link
Member

cc @HyukjinKwon

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-38355][PYSPARK][TESTS] Use mkstemp instead of mktemp [SPARK-38355][PYTHON][TESTS] Use mkstemp instead of mktemp Mar 9, 2022
@itholic
Copy link
Contributor

itholic commented Mar 9, 2022

Looks pretty good, but would you mind elaborating the PR description why mkstemp is safer than mktemp ??

For example, "Unlike mktemp , mkstemp is actually guaranteed to create a unique file that cannot possibly clash with any other program trying to create a temporary file" ?? (refer to https://www.gnu.org/software/libc/manual/html_node/Temporary-Files.html)

def temp_file(self):
with self.temp_dir() as tmp:
yield tempfile.mktemp(dir=tmp)
yield tempfile.mkstemp(dir=tmp)[1]
Copy link
Contributor

@itholic itholic Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: Does mkstemp always guarantees returning the (fd, name pair) ?? Maybe don't we need to check the length or something to make sure mkstemp returning the proper value ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it dose:

mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order.

@HyukjinKwon
Copy link
Member

Merged to master.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request Mar 10, 2022
### What changes were proposed in this pull request?
To change from `mktemp` to `mkstemp`.

### Why are the changes needed?
Pandas API on Spark use `mktemp` in test.
`mktemp` "THIS FUNCTION IS UNSAFE AND SHOULD NOT BE USED.  The file name may
refer to a file that did not exist at some point, but by the time
you get around to creating it, someone else may have beaten you to
the punch."

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Got the green light.

Closes apache#35775 from bjornjorgensen/Upgrade-mktemp-to-mkstemp.

Authored-by: bjornjorgensen <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@bjornjorgensen
Copy link
Contributor Author

@itholic @HyukjinKwon and @LuciferYang

mktemp is deprecated since Python 2.3 witch was released July 29, 2003.

This is the text that I got from Sonarqubes :

Creating temporary files using insecure methods exposes the application to race conditions on filenames: a malicious user can try to create a file with a predictable name before the application does. A successful attack can result in other files being accessed, modified, corrupted or deleted. This risk is even higher if the application run with elevated permissions.

In the past, it has led to the following vulnerabilities

CVE-2014-1858

CVE-2014-1932

I did have some issues implementing this function. What to do with fd

So I open a post at SO upgrade-from-mktemp-to-mkstemp

@LuciferYang
Copy link
Contributor

LGTM +1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants