ARROW-3777: [C++] Add Slow input streams and slow filesystem #5439
```diff
@@ -25,7 +25,6 @@
#include <vector>

#include "arrow/status.h"
#include "arrow/util/compression.h"
#include "arrow/util/visibility.h"

// The Windows API defines macros from *File resolving to either
@@ -44,6 +43,7 @@ namespace arrow {
namespace io {

class InputStream;
class LatencyGenerator;
class OutputStream;
class RandomAccessFile;

@@ -265,5 +265,47 @@ class ARROW_EXPORT SubTreeFileSystem : public FileSystem {
  Status FixStats(FileStats* st) const;
};

/// \brief EXPERIMENTAL: a FileSystem implementation that delegates to another
/// implementation but inserts latencies at various points.
class ARROW_EXPORT SlowFileSystem : public FileSystem {
```
Review thread on `class ARROW_EXPORT SlowFileSystem`:

Member: I've brought this topic up before, but what do you think about putting the entire implementation of SlowFS in a .cc file and returning a `std::shared_ptr` from the function that creates it? This can always be done later, so the refactor need not happen now. It's probably better to use factory methods for instantiating most FS classes anyway, unless you anticipate adding additional methods to the class.

Member: The same comments hold for the slow stream classes.

Member (Author): That's a good question. I don't think our conventions should vary too much; if some classes are exposed and others hidden, it feels a bit weird. But no strong opinion from me.

Member: My preference would be to hide the implementations in all cases except where we expose additional methods, if only because it offers more freedom to refactor.

Member (Author): OK, let's discuss this in a separate JIRA. Will merge.
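The reviewer's suggestion (expose only a factory returning a `std::shared_ptr` to the base type, keep the concrete class in the `.cc` file) can be sketched in a single self-contained file, here applied to a slow stream since the thread says the same reasoning covers those classes. All names (`ByteStream`, `StringStream`, `SlowStream`, `MakeSlowStream`, `MakeStringStream`) are hypothetical and are not Arrow APIs; the fixed per-read delay is a simplification of the random latencies used in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// ---- "header" part: only the interface and the factories are public ----
class ByteStream {
 public:
  virtual ~ByteStream() = default;
  // Reads up to nbytes into *out and returns the number of bytes read.
  virtual int64_t Read(int64_t nbytes, std::string* out) = 0;
};

// Hypothetical factory: wraps `base` so that every Read() is delayed.
std::shared_ptr<ByteStream> MakeSlowStream(std::shared_ptr<ByteStream> base,
                                           std::chrono::microseconds delay);

// Hypothetical factory for a stream over an in-memory string.
std::shared_ptr<ByteStream> MakeStringStream(std::string data);

// ---- ".cc" part: concrete classes are invisible outside this file ----
namespace {

class StringStream : public ByteStream {
 public:
  explicit StringStream(std::string data) : data_(std::move(data)) {}

  int64_t Read(int64_t nbytes, std::string* out) override {
    if (nbytes < 0) nbytes = 0;
    const size_t n = std::min(static_cast<size_t>(nbytes), data_.size() - pos_);
    *out = data_.substr(pos_, n);
    pos_ += n;
    return static_cast<int64_t>(n);
  }

 private:
  std::string data_;
  size_t pos_ = 0;
};

class SlowStream : public ByteStream {
 public:
  SlowStream(std::shared_ptr<ByteStream> base, std::chrono::microseconds delay)
      : base_(std::move(base)), delay_(delay) {}

  int64_t Read(int64_t nbytes, std::string* out) override {
    std::this_thread::sleep_for(delay_);  // inject latency, then delegate
    return base_->Read(nbytes, out);
  }

 private:
  std::shared_ptr<ByteStream> base_;
  std::chrono::microseconds delay_;
};

}  // namespace

std::shared_ptr<ByteStream> MakeStringStream(std::string data) {
  return std::make_shared<StringStream>(std::move(data));
}

std::shared_ptr<ByteStream> MakeSlowStream(std::shared_ptr<ByteStream> base,
                                           std::chrono::microseconds delay) {
  assert(base != nullptr);
  return std::make_shared<SlowStream>(std::move(base), delay);
}
```

Callers only ever hold a `std::shared_ptr<ByteStream>`, so `SlowStream` could be renamed, split, or given new members without touching the public surface. The cost, as the author points out, is a less uniform convention if other classes remain exposed in headers.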
```diff
 public:
  SlowFileSystem(std::shared_ptr<FileSystem> base_fs,
                 std::shared_ptr<io::LatencyGenerator> latencies);
  SlowFileSystem(std::shared_ptr<FileSystem> base_fs, double average_latency);
  SlowFileSystem(std::shared_ptr<FileSystem> base_fs, double average_latency,
                 int32_t seed);

  using FileSystem::GetTargetStats;
  Status GetTargetStats(const std::string& path, FileStats* out) override;
  Status GetTargetStats(const Selector& select, std::vector<FileStats>* out) override;

  Status CreateDir(const std::string& path, bool recursive = true) override;

  Status DeleteDir(const std::string& path) override;
  Status DeleteDirContents(const std::string& path) override;

  Status DeleteFile(const std::string& path) override;

  Status Move(const std::string& src, const std::string& dest) override;

  Status CopyFile(const std::string& src, const std::string& dest) override;

  Status OpenInputStream(const std::string& path,
                         std::shared_ptr<io::InputStream>* out) override;

  Status OpenInputFile(const std::string& path,
                       std::shared_ptr<io::RandomAccessFile>* out) override;

  Status OpenOutputStream(const std::string& path,
                          std::shared_ptr<io::OutputStream>* out) override;

  Status OpenAppendStream(const std::string& path,
                          std::shared_ptr<io::OutputStream>* out) override;

 protected:
  std::shared_ptr<FileSystem> base_fs_;
  std::shared_ptr<io::LatencyGenerator> latencies_;
};

} // namespace fs
} // namespace arrow
```
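The `SlowFileSystem` declaration follows a decorator pattern: each override can wait for a generated latency and then forward to `base_fs_`. The following minimal, self-contained sketch illustrates that idea without Arrow. `MiniFileSystem`, `InMemoryFileSystem`, `SimpleLatencyGenerator` and `SlowMiniFileSystem` are hypothetical stand-ins, not Arrow classes. The latency distribution (normal, mean `average_latency`, standard deviation 10% of the mean, clamped at zero) is an assumption of the sketch and not necessarily what `io::LatencyGenerator` does.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <random>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a filesystem interface (not arrow::fs::FileSystem).
class MiniFileSystem {
 public:
  virtual ~MiniFileSystem() = default;
  virtual bool Put(const std::string& path, const std::string& data) = 0;
  virtual bool Get(const std::string& path, std::string* out) = 0;
};

// Trivial in-memory backend playing the role of the wrapped ("base") filesystem.
class InMemoryFileSystem : public MiniFileSystem {
 public:
  bool Put(const std::string& path, const std::string& data) override {
    files_[path] = data;
    return true;
  }
  bool Get(const std::string& path, std::string* out) override {
    auto it = files_.find(path);
    if (it == files_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> files_;
};

// Hypothetical latency source. Delays (in seconds) are drawn from a normal
// distribution with mean `average_latency` and standard deviation 10% of the
// mean, clamped at zero; this distribution is an assumption of the sketch.
// The same seed always yields the same sequence of delays.
class SimpleLatencyGenerator {
 public:
  SimpleLatencyGenerator(double average_latency, int32_t seed)
      : average_(average_latency), rng_(static_cast<uint32_t>(seed)) {}

  double NextLatency() {
    if (average_ <= 0.0) return 0.0;  // normal_distribution needs stddev > 0
    std::lock_guard<std::mutex> lock(mutex_);  // may be shared across threads
    std::normal_distribution<double> dist(average_, average_ * 0.1);
    return std::max(0.0, dist(rng_));
  }

  void Sleep() {
    std::this_thread::sleep_for(std::chrono::duration<double>(NextLatency()));
  }

 private:
  double average_;
  std::mutex mutex_;
  std::mt19937 rng_;
};

// Decorator in the spirit of SlowFileSystem: sleep first, then delegate.
class SlowMiniFileSystem : public MiniFileSystem {
 public:
  SlowMiniFileSystem(std::shared_ptr<MiniFileSystem> base_fs,
                     std::shared_ptr<SimpleLatencyGenerator> latencies)
      : base_fs_(std::move(base_fs)), latencies_(std::move(latencies)) {
    assert(base_fs_ && latencies_);
  }

  bool Put(const std::string& path, const std::string& data) override {
    latencies_->Sleep();
    return base_fs_->Put(path, data);
  }
  bool Get(const std::string& path, std::string* out) override {
    latencies_->Sleep();
    return base_fs_->Get(path, out);
  }

 protected:
  std::shared_ptr<MiniFileSystem> base_fs_;
  std::shared_ptr<SimpleLatencyGenerator> latencies_;
};
```

A seeded generator makes the injected delays reproducible across test runs, which is presumably the purpose of the `seed` constructor overload in the declaration.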
This is unrelated to this PR, but I finally got annoyed enough by the slow format-check that I spent some time figuring out what happened. The bottom line is that each call to `_check_one_file` in a child process was serializing and transmitting the results for all files. Now it transmits only the result for the single file being checked, which is much faster. @fsaintjacques @bkietz
+1