Skip to content

Commit 599b550

Browse files
authored
Remove TranslateUnicodeToOem and all related code (#14745)
The overarching intention of this PR is to improve our Unicode support. Most of our APIs still don't support anything beyond UCS-2 and DBCS sequences. This commit doesn't fix the UTF-16 support (by supporting surrogate pairs), but it does improve support for UTF-8 by allowing longer `char` sequences. It does so by removing `TranslateUnicodeToOem` which seems to have had an almost viral effect on code quality wherever it was used. It made the assumption that _all_ narrow glyphs encode to 1 `char` and most wide glyphs to 2 `char`s. It also didn't bother to check whether `WideCharToMultiByte` failed or returned a different amount of `char`s. So up until now it was easily possible to read uninitialized stack memory from conhost. Any code that used this function was forced to do the same "measurement" of narrow/wide glyphs, because _of course_ it didn't had any way to indicate to the caller how much memory it needs to store the result. Instead all callers were forced to sorta replicate how it worked to calculate the required storage ahead of time. Unsurprisingly, none of the callers used the same algorithm... Without it the code is much leaner and easier to understand now. The best example is `COOKED_READ_DATA::_handlePostCharInputLoop` which used to contain 3 blocks of _almost_ identical code, but with ever so subtle differences. After reading the old code for hours I still don't know if they were relevant or not. It used to be 200 lines of code lacking any documentation and it's now 50 lines with descriptive function names. I hope this doesn't break anything, but to be honest I can't imagine anyone having relied on this mess in the first place. I needed some helpers to handle byte slices (`std::span<char>`), which is why a new `til/bytes.h` header was added. Initially I wrote a `buf_writer` class but felt like such a wrapper around a slice/span was annoying to use. As such I've opted for freestanding functions which take slices as mutable references and "advance" them (offset the start) whenever they're read from or written to. I'm not particularly happy with the design but they do the job. Related to #8000 Fixes #4551 Fixes #7589 Fixes #8663 ## Validation Steps Performed * Unit and feature tests ✅ * Far Manager ✅ * Fixes test cases in #4551, #7589 and #8663
1 parent cf87590 commit 599b550

23 files changed

+603
-1224
lines changed

src/host/dbcs.cpp

-74
Original file line numberDiff line numberDiff line change
@@ -177,77 +177,3 @@ BOOL IsAvailableEastAsianCodePage(const UINT uiCodePage)
177177
return false;
178178
}
179179
}
180-
181-
_Ret_range_(0, cbAnsi)
182-
ULONG TranslateUnicodeToOem(_In_reads_(cchUnicode) PCWCHAR pwchUnicode,
183-
const ULONG cchUnicode,
184-
_Out_writes_bytes_(cbAnsi) PCHAR pchAnsi,
185-
const ULONG cbAnsi,
186-
_Out_ std::unique_ptr<IInputEvent>& partialEvent)
187-
{
188-
const auto& gci = ServiceLocator::LocateGlobals().getConsoleInformation();
189-
const auto TmpUni = new (std::nothrow) WCHAR[cchUnicode];
190-
if (TmpUni == nullptr)
191-
{
192-
return 0;
193-
}
194-
195-
memcpy(TmpUni, pwchUnicode, cchUnicode * sizeof(WCHAR));
196-
197-
CHAR AsciiDbcs[2];
198-
AsciiDbcs[1] = 0;
199-
200-
ULONG i, j;
201-
for (i = 0, j = 0; i < cchUnicode && j < cbAnsi; i++, j++)
202-
{
203-
if (IsGlyphFullWidth(TmpUni[i]))
204-
{
205-
const auto NumBytes = sizeof(AsciiDbcs);
206-
ConvertToOem(gci.CP, &TmpUni[i], 1, (LPSTR)&AsciiDbcs[0], NumBytes);
207-
if (IsDBCSLeadByteConsole(AsciiDbcs[0], &gci.CPInfo))
208-
{
209-
if (j < cbAnsi - 1)
210-
{ // -1 is safe DBCS in buffer
211-
pchAnsi[j] = AsciiDbcs[0];
212-
j++;
213-
pchAnsi[j] = AsciiDbcs[1];
214-
AsciiDbcs[1] = 0;
215-
}
216-
else
217-
{
218-
pchAnsi[j] = AsciiDbcs[0];
219-
break;
220-
}
221-
}
222-
else
223-
{
224-
pchAnsi[j] = AsciiDbcs[0];
225-
AsciiDbcs[1] = 0;
226-
}
227-
}
228-
else
229-
{
230-
ConvertToOem(gci.CP, &TmpUni[i], 1, &pchAnsi[j], 1);
231-
}
232-
}
233-
234-
if (AsciiDbcs[1])
235-
{
236-
try
237-
{
238-
auto keyEvent = std::make_unique<KeyEvent>();
239-
if (keyEvent.get())
240-
{
241-
keyEvent->SetCharData(AsciiDbcs[1]);
242-
partialEvent.reset(static_cast<IInputEvent* const>(keyEvent.release()));
243-
}
244-
}
245-
catch (...)
246-
{
247-
LOG_HR(wil::ResultFromCaughtException());
248-
}
249-
}
250-
251-
delete[] TmpUni;
252-
return j;
253-
}

src/host/dbcs.h

-7
Original file line numberDiff line numberDiff line change
@@ -29,10 +29,3 @@ bool IsDBCSLeadByteConsole(const CHAR ch, const CPINFO* const pCPInfo);
2929
BYTE CodePageToCharSet(const UINT uiCodePage);
3030

3131
BOOL IsAvailableEastAsianCodePage(const UINT uiCodePage);
32-
33-
_Ret_range_(0, cbAnsi)
34-
ULONG TranslateUnicodeToOem(_In_reads_(cchUnicode) PCWCHAR pwchUnicode,
35-
const ULONG cchUnicode,
36-
_Out_writes_bytes_(cbAnsi) PCHAR pchAnsi,
37-
const ULONG cbAnsi,
38-
_Out_ std::unique_ptr<IInputEvent>& partialEvent);

src/host/directio.cpp

+7-61
Original file line numberDiff line numberDiff line change
@@ -169,75 +169,21 @@ void EventsToUnicode(_Inout_ std::deque<std::unique_ptr<IInputEvent>>& inEvents,
169169
LockConsole();
170170
auto Unlock = wil::scope_exit([&] { UnlockConsole(); });
171171

172-
std::deque<std::unique_ptr<IInputEvent>> partialEvents;
173-
if (!IsUnicode)
174-
{
175-
if (inputBuffer.IsReadPartialByteSequenceAvailable())
176-
{
177-
partialEvents.push_back(inputBuffer.FetchReadPartialByteSequence(IsPeek));
178-
}
179-
}
180-
181-
size_t amountToRead;
182-
if (FAILED(SizeTSub(eventReadCount, partialEvents.size(), &amountToRead)))
183-
{
184-
return STATUS_INTEGER_OVERFLOW;
185-
}
186-
std::deque<std::unique_ptr<IInputEvent>> readEvents;
187-
auto Status = inputBuffer.Read(readEvents,
188-
amountToRead,
189-
IsPeek,
190-
true,
191-
IsUnicode,
192-
false);
172+
const auto Status = inputBuffer.Read(outEvents,
173+
eventReadCount,
174+
IsPeek,
175+
true,
176+
IsUnicode,
177+
false);
193178

194179
if (CONSOLE_STATUS_WAIT == Status)
195180
{
196-
FAIL_FAST_IF(!(readEvents.empty()));
197181
// If we're told to wait until later, move all of our context
198182
// to the read data object and send it back up to the server.
199183
waiter = std::make_unique<DirectReadData>(&inputBuffer,
200184
&readHandleState,
201185
eventReadCount,
202-
std::move(partialEvents));
203-
}
204-
else if (SUCCEEDED_NTSTATUS(Status))
205-
{
206-
// split key events to oem chars if necessary
207-
if (!IsUnicode)
208-
{
209-
try
210-
{
211-
SplitToOem(readEvents);
212-
}
213-
CATCH_LOG();
214-
}
215-
216-
// combine partial and readEvents
217-
while (!partialEvents.empty())
218-
{
219-
readEvents.push_front(std::move(partialEvents.back()));
220-
partialEvents.pop_back();
221-
}
222-
223-
// move events over
224-
for (size_t i = 0; i < eventReadCount; ++i)
225-
{
226-
if (readEvents.empty())
227-
{
228-
break;
229-
}
230-
outEvents.push_back(std::move(readEvents.front()));
231-
readEvents.pop_front();
232-
}
233-
234-
// store partial event if necessary
235-
if (!readEvents.empty())
236-
{
237-
inputBuffer.StoreReadPartialByteSequence(std::move(readEvents.front()));
238-
readEvents.pop_front();
239-
FAIL_FAST_IF(!(readEvents.empty()));
240-
}
186+
std::move(outEvents));
241187
}
242188
return Status;
243189
}

0 commit comments

Comments
 (0)