Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 5592: Colon (:) in in object_store::path::{Path} is not handled on Windows #5830

Merged
merged 17 commits into from
Jul 13, 2024

Conversation

hesampakdaman
Copy link
Contributor

Which issue does this PR close?

Closes #5592.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label Jun 1, 2024
@alamb alamb changed the title Fix 5592 Fix 5592: Colon (:) in in object_store::path::{Path} is not handled on Windows Jun 3, 2024
@thomasfrederikhoeck
Copy link

@hesampakdaman is this ready for review? :-)

@hesampakdaman
Copy link
Contributor Author

hesampakdaman commented Jun 9, 2024

@thomasfrederikhoeck I believe so, although I haven't been able to run an integration test (don't have Windows accessible).

@hesampakdaman hesampakdaman marked this pull request as ready for review June 9, 2024 13:22
@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

Can someone please verify that this works on Windows? I don't have access to a windows machine at the moment

@thomasfrederikhoeck
Copy link

I don't think this fixes it. If I run the following on Windows with the fix

use object_store::path::{Path};

fn main() {
    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\file.parquet".to_string();
    let path_from: Path = Path::from(path);
    println!( "{:?}", path_from);

    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\file.parquet".to_string();
    let path_parse: Result<Path, object_store::path::Error> = Path::parse(path);
    println!( "{:?}", path_parse);
}

I still see:

Path { raw: "C:%5Ctable%5Ctime=2021-01-02 03:04:06.000003%5Cfile.parquet" }
Ok(Path { raw: "C:\\table\\time=2021-01-02 03:04:06.000003\\file.parquet" })

where I would have expected:

Path { raw: "C:%5Ctable%5Ctime=2021-01-02 03%3A04%3A06.000003%5Cfile.parquet" }
Ok(Path { raw: "C:\\table\\time=2021-01-02 03:04:06.000003\\file.parquet" })

But I might be assuming something wrong? @alamb @hesampakdaman

@hesampakdaman
Copy link
Contributor Author

@thomasfrederikhoeck You're not thinking wrong, my initial suggestion was to change how Path behaves. In the discussions we had in the issue, @tustvold mentioned that that would be a breaking change and affect all paths for all stores.

So instead we agreed to change how LocalFileSystem behaves when using it from Windows. If you use the put/get methods it should convert the : to %3A. Although, I'm a little uncertain if I'm correctly handling the leading C:\.

At any rate, if you could test this that would be awesome! I would have done it myself but I don't have access to Windows unfortunately.

@thomasfrederikhoeck
Copy link

@hesampakdaman I just tested with below and it doesn't appear like it is handeling the : in the drive correctly which also are changes but shouldn't. This results in the file being created in a relative position with the name as C:\projects\delta\pathtester\C%3A\Users\tfh\AppData\Local\Temp\.tmpwqJ9tp\file%3Aname.parquet

image

The script I ran was

use object_store::path::{Path};
use tempfile::{TempDir};
use object_store::local::LocalFileSystem;
use object_store::ObjectStore;
use bytes::Bytes;

#[tokio::main]
async fn main() {
    let root = TempDir::new().unwrap();
    let integration = LocalFileSystem::new_with_prefix(root.path()).unwrap();
    let path = Path::parse("file%3Aname.parquet").unwrap();
    let location = Path::parse("file:name.parquet").unwrap();

    integration.put(&location, "test".into()).await.unwrap();

    let result = integration
        .get(&location)
        .await
        .unwrap()
        .bytes()
        .await
        .unwrap();
    assert_eq!(result, Bytes::from("test"));
}

@thomasfrederikhoeck
Copy link

I can also add that if I run the following it creates a file at C:\projects\delta\pathtester\C%3A\file%3Aname.parquet

use object_store::path::{Path};
use object_store::local::LocalFileSystem;
use object_store::ObjectStore;
use bytes::Bytes;

#[tokio::main]
async fn main() {
    let integration = LocalFileSystem::new();
    let location = Path::parse("C:\\file:name.parquet").unwrap();
    println!("{}",location);
    integration.put(&location, "test".into()).await.unwrap();

    let result = integration
        .get(&location)
        .await
        .unwrap()
        .bytes()
        .await
        .unwrap();
    assert_eq!(result, Bytes::from("test"));
}

@thomasfrederikhoeck
Copy link

FYI the code I run is running with C:\projects\delta\pathtester as cwd :-)

@hesampakdaman hesampakdaman marked this pull request as draft June 25, 2024 19:22
@hesampakdaman
Copy link
Contributor Author

@thomasfrederikhoeck Thank you for testing, it helps a lot! Yes, as suspected I did not handle the leading colon from the drive. I made a change and hopefully it will work now.

@alamb I marked this as draft until it is tested again by @thomasfrederikhoeck.

@thomasfrederikhoeck
Copy link

@hesampakdaman I can confirm it gives the expected results for these two :-D :-D :

use object_store::path::{Path};
use object_store::local::LocalFileSystem;
use object_store::ObjectStore;
use bytes::Bytes;
use tempfile::{NamedTempFile, TempDir};

#[tokio::main]
async fn main() {
    let root_temp = TempDir::new().unwrap();

    // full path
    let integration_full = LocalFileSystem::new();
    let location_full = Path::parse(r"C:\Users\tfh\file:name-full.parquet").unwrap();
    println!("{}",location_full);
    integration_full.put(&location_full, "test".into()).await.unwrap();

    let result = integration_full
        .get(&location_full)
        .await
        .unwrap()
        .bytes()
        .await
        .unwrap();
    assert_eq!(result, Bytes::from("test"));


    let integration_prefefix = LocalFileSystem::new_with_prefix(r"C:\Users\tfh\").unwrap();
    let location_prefefix = Path::parse(r"file:name-prefix.parquet").unwrap();
    println!("{}",location_prefefix);
    integration_prefefix.put(&location_prefefix, "test".into()).await.unwrap();
    let result = integration_prefefix
        .get(&location_prefefix)
        .await
        .unwrap()
        .bytes()
        .await
        .unwrap();
    assert_eq!(result, Bytes::from("test"));

}

@alamb
Copy link
Contributor

alamb commented Jun 26, 2024

Thanks for the check @thomasfrederikhoeck

Would someone be willing to setup a github runner to test object_store on windows as part of CI? That would ensure we don't introduce regressions going forward.

The CI jobs are defined here: https://github.com/apache/arrow-rs/blob/master/.github/workflows/object_store.yml

Here is an example of testing on windows from DataFusion: https://github.com/apache/datafusion/blob/dd56dbe67c9c61fbe13fefda367fe50503921351/.github/workflows/rust.yml#L325

@hesampakdaman
Copy link
Contributor Author

@alamb I'll take a look later in the week and will add it in this PR!

@alamb
Copy link
Contributor

alamb commented Jun 27, 2024

Thank you @hesampakdaman

@hesampakdaman
Copy link
Contributor Author

@alamb I finally added a CI job. However, I noticed that some tests failed if I ran them all under windows. Some of these were outside of local.rs. I did two things to address (ignore) this:

  1. I run only the tests in local.rs
  2. I added a target = "unix" on two tests that were failing under windows.

I will try to verify if the tests are failing without my changes.

@hesampakdaman hesampakdaman marked this pull request as ready for review July 2, 2024 09:51
@hesampakdaman
Copy link
Contributor Author

I re-ran the CI tests for windows and removed my changes. It seems that they just fail under windows.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again @hesampakdaman and @thomasfrederikhoeck for this contribution -- this looks good to me. I also appreciate the effort to make a test.

It looks like this PR has a conflict to resolve but then it should be good to go.

@@ -198,3 +198,16 @@ jobs:
run: cargo build --target wasm32-unknown-unknown
- name: Build wasm32-wasi
run: cargo build --target wasm32-wasi

windows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@hesampakdaman
Copy link
Contributor Author

@alamb Done 🚀

@alamb
Copy link
Contributor

alamb commented Jul 13, 2024

Thanks again everyone!

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Colon (:) in in object_store::path::{Path} is not handled on Windows
3 participants