@alinaliBQ alinaliBQ commented Oct 9, 2025

Rationale for this change

Enable ODBC Unicode support so that the ODBC driver can handle wide characters.

What changes are included in this PR?

  • Enable ODBC Unicode support by setting -DUNICODE to 1
  • The DSN is read and retrieved in Unicode format, then converted to regular strings for the driver
  • Fix a string-reading bug in GetAttributeSQLWCHAR

Are these changes tested?

The build was tested locally.

Are there any user-facing changes?

No

alinaliBQ and others added 2 commits October 9, 2025 10:29
* Let compiler append `W` to ODBC APIs where applicable.
* pending fixes for changing namespaces
Co-authored-by: rscales <[email protected]>

Length should be divided by number of characters in a wide string
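
The length fix named in this commit title can be illustrated with a minimal sketch. It assumes the bug was a byte length being used where a character count was expected, which the conversation does not spell out. `WideChar`, `BytesToCharCount` and `ReadWideAttribute` are hypothetical names for illustration, not code from the PR.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the ODBC SQLWCHAR type (assumed 16-bit here).
using WideChar = char16_t;

// Convert a length reported in bytes into a count of wide characters.
constexpr std::size_t BytesToCharCount(std::size_t byte_length) {
  return byte_length / sizeof(WideChar);
}

// Build a string from a wide-character buffer whose length is given in bytes.
// Passing byte_length directly as the character count would read past the
// end of the data, which is the kind of bug assumed here.
std::u16string ReadWideAttribute(const WideChar* buf, std::size_t byte_length) {
  return std::u16string(buf, BytesToCharCount(byte_length));
}
```

With a 16-bit `WideChar`, a 6-byte attribute holds 3 characters, so only those 3 are copied.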
github-actions bot commented Oct 9, 2025

⚠️ GitHub issue #47726 has been automatically assigned in GitHub to PR creator.

@alinaliBQ alinaliBQ marked this pull request as ready for review October 9, 2025 20:25
@alinaliBQ alinaliBQ requested a review from lidavidm as a code owner October 9, 2025 20:25
Member

@lidavidm lidavidm left a comment

Would this make future porting to other platforms more difficult?

@github-actions github-actions bot added awaiting review, awaiting changes and removed awaiting review labels Oct 10, 2025
Contributor Author

@alinaliBQ alinaliBQ left a comment

Addressed David's comments

@github-actions github-actions bot added awaiting review, awaiting committer review and removed awaiting review, awaiting changes labels Oct 10, 2025
Contributor Author

alinaliBQ commented Oct 10, 2025

> Would this make future porting to other platforms more difficult?

@lidavidm Are you referring to porting the unicode support to macOS/Linux platforms? Could you please elaborate on this point?

@lidavidm
Member

> > Would this make future porting to other platforms more difficult?
>
> @lidavidm Are you referring to porting the unicode support to macOS/Linux platforms? Could you please elaborate on this point?

Yes, would std::wstring cause issues on Linux/macOS? (I suppose the type is supported, but might be annoying since most OS APIs don't use wchar there...)

Member

@lidavidm lidavidm left a comment

Now that I look at it there's a lot of ValueOr that I ignored earlier...I would expect the errors need to be propagated?

const std::string& dflt = "") {
#define BUFFER_SIZE (1024)
std::vector<char> buf(BUFFER_SIZE);
std::wstring wdsn = arrow::util::UTF8ToWideString(dsn).ValueOr(L"");
Member

Why are we ignoring errors here?

I suppose the return type doesn't allow for errors. But I'd expect that we update the return type to Result<std::string>.

Contributor Author

You raise a good point; these errors should not be ignored. I have added throws for the Result errors, and the caller will catch the exceptions thrown.
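
The difference between swallowing a conversion error with `ValueOr` and surfacing it with a throw can be sketched as follows. `MiniResult` is a hypothetical, minimal stand-in for `arrow::Result` rather than the real class, and `FakeConvert` and `ConvertOrThrow` are illustrative helpers that do not appear in the PR.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, minimal stand-in for arrow::Result<T>; not the real class.
template <typename T>
class MiniResult {
 public:
  static MiniResult Ok(T value) {
    MiniResult r;
    r.value_ = std::move(value);
    return r;
  }
  static MiniResult Error(std::string message) {
    MiniResult r;
    r.error_ = std::move(message);
    return r;
  }
  bool ok() const { return value_.has_value(); }
  const T& value() const { return *value_; }
  const std::string& error() const { return error_; }
  // Returns the fallback on failure, so the error message is silently lost.
  T ValueOr(T fallback) const {
    if (ok()) return *value_;
    return fallback;
  }

 private:
  MiniResult() = default;
  std::optional<T> value_;
  std::string error_;
};

// Hypothetical conversion that fails on the byte 0xFF, which is never valid
// in UTF-8. It only models "a conversion that can fail".
MiniResult<std::string> FakeConvert(const std::string& input) {
  if (input.find('\xFF') != std::string::npos) {
    return MiniResult<std::string>::Error("invalid UTF-8 input");
  }
  return MiniResult<std::string>::Ok(input);
}

// Throwing variant: the failure reaches the caller instead of becoming "".
std::string ConvertOrThrow(const std::string& input) {
  MiniResult<std::string> result = FakeConvert(input);
  if (!result.ok()) throw std::runtime_error(result.error());
  return result.value();
}
```

With `ValueOr("")`, a failed conversion is indistinguishable from an empty string. The throwing variant keeps the `std::string` return type and leaves handling to a caller that catches the exception. Returning the result type itself, as the reviewer suggested, would be the non-throwing alternative.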


return std::string(buf.data(), ret);
std::wstring wresult = std::wstring(buf.data(), ret);
std::string result = arrow::util::WideStringToUTF8(wresult).ValueOr("");
Member

Same here.

Contributor Author

Same as the comment above: I have added throws.

@github-actions github-actions bot added awaiting review, awaiting changes and removed awaiting review, awaiting committer review labels Oct 13, 2025
Throw error if conversion fails

Remove DSN window

Leave enabling DSN window to apacheGH-46574
Contributor Author

@alinaliBQ alinaliBQ left a comment

Addressing David's comments

@github-actions github-actions bot added awaiting review, awaiting committer review and removed awaiting review, awaiting changes labels Oct 15, 2025
@alinaliBQ
Contributor Author

> Yes, would std::wstring cause issues on Linux/macOS? (I suppose the type is supported, but might be annoying since most OS APIs don't use wchar there...)

I will look into this and get back to you.

Continue adding places where exceptions need to be thrown or caught
@alinaliBQ alinaliBQ force-pushed the gh-47726-unicode-support branch from b378900 to 08bf224 Compare October 15, 2025 00:13
@github-actions github-actions bot added awaiting merge and removed awaiting committer review labels Oct 15, 2025
@alinaliBQ
Contributor Author

@lidavidm To follow up on the usage of std::wstring on macOS/Linux: we will use std::wstring to hold UTF-32 data on Linux/macOS and UTF-16 data on Windows. The ODBC driver uses wchar / std::wstring when communicating with ODBC APIs to support Unicode, and converts to std::string internally when communicating with any OS API that uses char. I think it should be okay.
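
The platform difference described in this comment can be sketched with a small helper. It assumes `wchar_t` is 16 bits on Windows and 32 bits on typical Linux/macOS toolchains. `AppendCodePoint` is a hypothetical name for illustration and does not come from the PR or from Arrow.

```cpp
#include <string>

// Append one Unicode code point to a std::wstring.
// Where wchar_t is 16 bits, code points above U+FFFF are written as a UTF-16
// surrogate pair. Where it is 32 bits, the code point is stored directly as
// a single UTF-32 unit.
void AppendCodePoint(std::wstring& out, char32_t cp) {
  if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
    const char32_t v = cp - 0x10000;
    out.push_back(static_cast<wchar_t>(0xD800 + (v >> 10)));    // high surrogate
    out.push_back(static_cast<wchar_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
  } else {
    out.push_back(static_cast<wchar_t>(cp));
  }
}
```

The same code point can therefore occupy one or two `wchar_t` units depending on the platform, so a std::wstring's size() is not a portable character count.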

@lidavidm lidavidm merged commit fc5fd48 into apache:main Oct 16, 2025
44 checks passed
@lidavidm lidavidm removed the awaiting merge label Oct 16, 2025
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit fc5fd48.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@alinaliBQ alinaliBQ deleted the gh-47726-unicode-support branch October 27, 2025 20:28