PCRE.exec error during REPL startup error with non-Unicode locales #27239

Closed
alkorang opened this issue May 23, 2018 · 25 comments
Labels
system:windows Affects only Windows

Comments

@alkorang
Contributor

With 32bit version on Windows 10:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.5161 (2018-05-22 16:48 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit c9583e76d5* (1 day old master)
|__/                   |  i686-w64-mingw32

ERROR: PCRE.exec error: UTF-8 error: isolated byte with 0x80 bit set
Stacktrace:
 [1] error at .\error.jl:33 [inlined]
 [2] exec at .\pcre.jl:137 [inlined]
 [3] match(::Regex, ::String, ::Int32, ::UInt32) at .\regex.jl:197
 [4] match at .\regex.jl:195 [inlined]
 [5] match at .\regex.jl:210 [inlined]
 [6] hist_from_file(::REPL.REPLHistoryProvider, ::IOStream, ::String) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:407
 [7] setup_interface(::REPL.LineEditREPL, ::Bool, ::Array{Dict{Any,Any},1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:845
 [8] #setup_interface#49(::Bool, ::Array{Dict{Any,Any},1}, ::Function, ::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [9] setup_interface(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [10] (::getfield(Pkg3, Symbol("##1#2")))(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\Pkg3\src\Pkg3.jl:58
 [11] __atreplinit(::REPL.LineEditREPL) at .\client.jl:309
 [12] (::getfield(Base, Symbol("#inner#2")){Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},typeof(Base.__atreplinit),Tuple{REPL.LineEditREPL}})() at .\essentials.jl:667
 [13] #invokelatest#1 at .\essentials.jl:668 [inlined]
 [14] invokelatest at .\essentials.jl:667 [inlined]
 [15] _atreplinit at .\client.jl:316 [inlined]
 [16] (::getfield(Base, Symbol("##875#877")){Bool,Bool,Bool,Bool})(::Module) at .\client.jl:352
 [17] (::getfield(Base, Symbol("#inner#2")){Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},getfield(Base, Symbol("##875#877")){Bool,Bool,Bool,Bool},Tuple{Module}})() at .\essentials.jl:667
 [18] #invokelatest#1 at .\essentials.jl:668 [inlined]
 [19] invokelatest at .\essentials.jl:667 [inlined]
 [20] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at .\client.jl:337
 [21] exec_options(::Base.JLOptions) at .\client.jl:275
 [22] _start() at .\client.jl:424

[ Info: Disabling history file for this session
julia> versioninfo()
Julia Version 0.7.0-DEV.5161
Commit c9583e76d5* (2018-05-22 16:48 UTC)
Platform Info:
  OS: Windows (i686-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  WORD_SIZE: 32
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)

julia>

With 64bit version on Windows 10:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.5170 (2018-05-22 22:52 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 6945f24d53* (0 days old master)
|__/                   |  x86_64-w64-mingw32

ERROR: PCRE.exec error: UTF-8 error: isolated byte with 0x80 bit set
Stacktrace:
 [1] error at .\error.jl:33 [inlined]
 [2] exec at .\pcre.jl:137 [inlined]
 [3] match(::Regex, ::String, ::Int64, ::UInt32) at .\regex.jl:197
 [4] match at .\regex.jl:195 [inlined]
 [5] match at .\regex.jl:210 [inlined]
 [6] hist_from_file(::REPL.REPLHistoryProvider, ::IOStream, ::String) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:407
 [7] setup_interface(::REPL.LineEditREPL, ::Bool, ::Array{Dict{Any,Any},1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:845
 [8] #setup_interface#49(::Bool, ::Array{Dict{Any,Any},1}, ::Function, ::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [9] setup_interface(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [10] (::getfield(Pkg3, Symbol("##1#2")))(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Pkg3\src\Pkg3.jl:58
 [11] __atreplinit(::REPL.LineEditREPL) at .\client.jl:309
 [12] (::getfield(Base, Symbol("#inner#2")){Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},typeof(Base.__atreplinit),Tuple{REPL.LineEditREPL}})() at .\essentials.jl:667
 [13] #invokelatest#1 at .\essentials.jl:668 [inlined]
 [14] invokelatest at .\essentials.jl:667 [inlined]
 [15] _atreplinit at .\client.jl:316 [inlined]
 [16] (::getfield(Base, Symbol("##878#880")){Bool,Bool,Bool,Bool})(::Module) at .\client.jl:352
 [17] (::getfield(Base, Symbol("#inner#2")){Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},getfield(Base, Symbol("##878#880")){Bool,Bool,Bool,Bool},Tuple{Module}})() at .\essentials.jl:667
 [18] #invokelatest#1 at .\essentials.jl:668 [inlined]
 [19] invokelatest at .\essentials.jl:667 [inlined]
 [20] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at .\client.jl:337
 [21] exec_options(::Base.JLOptions) at .\client.jl:275
 [22] _start() at .\client.jl:424

[ Info: Disabling history file for this session
julia> versioninfo()
Julia Version 0.7.0-DEV.5170
Commit 6945f24d53* (2018-05-22 22:52 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)

julia>

I deleted %USERPROFILE%\.julia_history, yet the same error occurs.

@honghuzi

Same error. Deleting %USERPROFILE%\.julia helps, but after installing any package, you get the same error again.

@fredrikekre
Member

In v0.7 the history file is located at ~/.julia/logs/repl_history.jl. What happens if you remove that file instead?

@fredrikekre
Member

Perhaps one of the commits in #27189 is responsible @Keno ?

@alkorang
Contributor Author

alkorang commented May 25, 2018

In v0.7 the history file is located at ~/.julia/logs/repl_history.jl. What happens if you remove that file instead?

When I deleted %USERPROFILE%\.julia\logs\repl_history.jl, it worked fine. The problem occurs when the file is not empty, for example:

# time: 2018-05-25 10:46:14 대한민국 표준시
# mode: julia
	versioninfo()

The file contains the non-ASCII characters 대한민국 표준시, and the default encoding on Windows 10 K is EUC-KR. When I re-saved the file as UTF-8 using Notepad, no error occurred. But another problem is that when Julia writes to repl_history.jl, it does not convert the string to UTF-8:

# time: 2018-05-25 10:46:14 대한민국 표준시
# mode: julia
	versioninfo()
# time: 2018-05-25 10:52:27 իȑڎѹ ǥ�ރ
# mode: julia
	a =  1
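
A quick way to find which lines of a history file carry non-UTF-8 bytes is to scan it line by line. This is only a minimal sketch; the helper name invalid_utf8_lines is hypothetical and is not part of the REPL code.

```julia
# Hypothetical helper: return the line numbers whose bytes are not valid UTF-8.
function invalid_utf8_lines(path::AbstractString)
    bad = Int[]
    for (i, line) in enumerate(eachline(path))
        # isvalid(::String) checks that the string is well-formed UTF-8
        isvalid(line) || push!(bad, i)
    end
    return bad
end

# Possible usage with the history path mentioned in this thread:
# invalid_utf8_lines(joinpath(homedir(), ".julia", "logs", "repl_history.jl"))
```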

When I converted the file to UTF-16, a different error occurs.

UTF-16 LE:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.5182 (2018-05-23 18:29 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit db158b85b7* (1 day old master)
|__/                   |  i686-w64-mingw32

ERROR: Invalid history file (C:\Users\alkorang\.julia\logs\repl_history.jl) format:
If you have a history file left over from an older version of Julia,
try renaming or deleting it.
Invalid character: '\xff' at line 1
Stacktrace:
 [1] error(::String, ::String, ::String, ::Int32) at .\error.jl:42
 [2] hist_from_file(::REPL.REPLHistoryProvider, ::IOStream, ::String) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:404
 [3] setup_interface(::REPL.LineEditREPL, ::Bool, ::Array{Dict{Any,Any},1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:845
 [4] #setup_interface#49(::Bool, ::Array{Dict{Any,Any},1}, ::Function, ::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [5] setup_interface(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [6] (::getfield(Pkg3, Symbol("##1#2")))(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\Pkg3\src\Pkg3.jl:58
 [7] __atreplinit(::REPL.LineEditREPL) at .\client.jl:309
 [8] #invokelatest#1 at .\essentials.jl:667 [inlined]
 [9] invokelatest at .\essentials.jl:666 [inlined]
 [10] _atreplinit at .\client.jl:316 [inlined]
 [11] (::getfield(Base, Symbol("##878#880")){Bool,Bool,Bool,Bool})(::Module) at .\client.jl:352
 [12] #invokelatest#1 at .\essentials.jl:667 [inlined]
 [13] invokelatest at .\essentials.jl:666 [inlined]
 [14] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at .\client.jl:337
 [15] exec_options(::Base.JLOptions) at .\client.jl:275
 [16] _start() at .\client.jl:424

[ Info: Disabling history file for this session
julia>

UTF-16 BE:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.5182 (2018-05-23 18:29 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit db158b85b7* (1 day old master)
|__/                   |  i686-w64-mingw32

ERROR: Invalid history file (C:\Users\alkorang\.julia\logs\repl_history.jl) format:
If you have a history file left over from an older version of Julia,
try renaming or deleting it.
Invalid character: '\xfe' at line 1
Stacktrace:
 [1] error(::String, ::String, ::String, ::Int32) at .\error.jl:42
 [2] hist_from_file(::REPL.REPLHistoryProvider, ::IOStream, ::String) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:404
 [3] setup_interface(::REPL.LineEditREPL, ::Bool, ::Array{Dict{Any,Any},1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:845
 [4] #setup_interface#49(::Bool, ::Array{Dict{Any,Any},1}, ::Function, ::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [5] setup_interface(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:756
 [6] (::getfield(Pkg3, Symbol("##1#2")))(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win32\build\usr\share\julia\stdlib\v0.7\Pkg3\src\Pkg3.jl:58
 [7] __atreplinit(::REPL.LineEditREPL) at .\client.jl:309
 [8] #invokelatest#1 at .\essentials.jl:667 [inlined]
 [9] invokelatest at .\essentials.jl:666 [inlined]
 [10] _atreplinit at .\client.jl:316 [inlined]
 [11] (::getfield(Base, Symbol("##878#880")){Bool,Bool,Bool,Bool})(::Module) at .\client.jl:352
 [12] #invokelatest#1 at .\essentials.jl:667 [inlined]
 [13] invokelatest at .\essentials.jl:666 [inlined]
 [14] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at .\client.jl:337
 [15] exec_options(::Base.JLOptions) at .\client.jl:275
 [16] _start() at .\client.jl:424

[ Info: Disabling history file for this session
julia>
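
The '\xff' and '\xfe' reported at line 1 are consistent with the first byte of a UTF-16 byte order mark (FF FE for little-endian, FE FF for big-endian), assuming the editor wrote a BOM when saving. A rough sketch of sniffing that, with a hypothetical helper name and BOM-only detection (it does not distinguish UTF-32):

```julia
# Hypothetical helper: guess an encoding from a leading byte order mark only.
function sniff_bom(bytes::AbstractVector{UInt8})
    if length(bytes) >= 3 && bytes[1:3] == [0xef, 0xbb, 0xbf]
        return :utf8_bom
    elseif length(bytes) >= 2 && bytes[1:2] == [0xff, 0xfe]
        return :utf16le
    elseif length(bytes) >= 2 && bytes[1:2] == [0xfe, 0xff]
        return :utf16be
    end
    return :unknown   # no BOM found; says nothing about the actual encoding
end
```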

In summary, Julia assumes UTF-8 and does not handle any other encoding, so it breaks under the non-UTF-8 locales that are the default on Windows.

@Keno
Member

Keno commented May 25, 2018

We clearly need to save the history file as UTF-8.

@alkorang
Contributor Author

alkorang commented May 25, 2018

I just realized that this problem does not happen with older versions. The reason is that Julia simply ignored the first two lines of each history entry.

δ = 1

When I run the code above in 0.6.2 and open .julia_history as EUC-KR:

# time: 2018-05-25 11:22:29 대한민국 표준시
# mode: julia
	灌 = 1

When I open it as UTF-8:

# time: 2018-05-25 11:22:29 իȑڎѹ ǥ�ރ
# mode: julia
	δ = 1

@StefanKarpinski
Sponsor Member

The history file should only ever be UTF-8. We only write it as such. Did you open it in an editor that might have changed the encoding?

@Keno
Member

Keno commented May 25, 2018

From the second example, it looks like the timezone code is getting saved as EUC-KR. Where are we getting that from?

@alkorang
Contributor Author

alkorang commented May 25, 2018

The history file should only ever be UTF-8. We only write it as such. Did you open it in an editor that might have changed the encoding?

@StefanKarpinski I used Notepad and Notepad++, and they do not change encoding automatically. Using Notepad++ I can specify the encoding to interpret. I guess the problem is saving EUC-KR encoded bytes directly into the file opened as UTF-8.

From the second example, it looks like the timezone code is getting saved as EUC-KR. Where are we getting that from?

@Keno

# time: $(Libc.strftime("%Y-%m-%d %H:%M:%S %Z", time()))

Edit:

julia> Libc.strftime("%Z", time())
"\xb4\xeb\xc7����\xb9 ��\xc1��\xc3"

When I call Libc.strftime("%Z", time()), it returns EUC-KR encoded 대한민국 표준시, which is an invalid UTF-8 byte stream.
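
To make the failure concrete, here is a small sketch using the EUC-KR bytes of 대한민국 표준시, reconstructed from the escaped output and the mojibake shown in this thread (the byte values are my reading of that output, not something printed verbatim). It checks that the sequence is not valid UTF-8 and that it begins with a lone continuation byte, which matches the "isolated byte with 0x80 bit set" message.

```julia
# EUC-KR (CP949) bytes for "대한민국 표준시", reconstructed from the output in this thread
bytes = UInt8[0xb4, 0xeb, 0xc7, 0xd1, 0xb9, 0xce, 0xb1, 0xb9, 0x20,
              0xc7, 0xa5, 0xc1, 0xd8, 0xbd, 0xc3]

# The sequence is not well-formed UTF-8
valid = isvalid(String, bytes)

# 0xb4 has the bit pattern 10xxxxxx: a continuation byte with no lead byte before it
starts_with_continuation = (bytes[1] & 0xc0) == 0x80
```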

I tried "%z" (lower case z) to avoid non-ASCII characters based on http://man7.org/linux/man-pages/man3/strftime.3.html

   %z     The +hhmm or -hhmm numeric timezone (that is, the hour and
          minute offset from UTC). (SU)

   %Z     The timezone name or abbreviation.

But on Windows, "%Z" and "%z" are treated the same:
https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx

%z, %Z
Either the time-zone name or time zone abbreviation, depending on registry settings; no characters if time zone is unknown
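
Since "%z" cannot be relied on for a numeric offset on Windows, one conservative option is to leave the time zone field out entirely so the header stays ASCII. This is only a sketch of a possible workaround, not necessarily what the project adopted; ascii_timestamp is a hypothetical name.

```julia
# Hypothetical workaround: omit the locale-dependent %Z field.
# The remaining fields are numeric, so the result should be plain ASCII.
ascii_timestamp(t::Real = time()) = Libc.strftime("%Y-%m-%d %H:%M:%S", t)

stamp = ascii_timestamp()
```

The obvious cost is that the history entry no longer records which zone the time refers to.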

@appleparan

appleparan commented May 25, 2018

As @alkorang said, the main cause of this is "%Z".

I ran the following code snippet.

tz = "$(Libc.strftime("%Z", time()))"

f = open("print_STRFTIME.txt", "w")
print(f, "Test 테스트 $tz")
close(f)

f = open("write_STRFTIME.txt", "w")
write(f, "Test 테스트 $tz")
close(f)

Result:

  • without TZ variable : Test 테스트
  • with TZ variable (any file opened with UTF-8 encoding) : Test 테스트 ���ѹα� ǥ�ؽ�
  • with TZ variable (any file opened with EUC-KR encoding) : Test �뀒�뒪�듃 대한민국 표준시

You can also check the difference between print and write:
https://github.com/JuliaLang/julia/blob/master/base/strings/io.jl#L143
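
As far as I can tell, for a String argument both functions emit the same raw bytes and neither transcodes, which would explain why the two files above look identical. A small sketch with in-memory buffers, using two raw EUC-KR bytes as a stand-in for the strftime output:

```julia
# Valid UTF-8 text followed by two raw non-UTF-8 bytes (stand-in for the %Z output)
s = "Test 테스트 " * String(UInt8[0xb4, 0xeb])

printed = IOBuffer()
print(printed, s)

written = IOBuffer()
write(written, s)

# Compare the bytes that each call produced
same_bytes = take!(printed) == take!(written)
```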

I think it could be a Windows bug or a Visual C++ bug. It makes no sense that merely calling a function unrelated to String changes the encoding (it is supposed to return UTF-16). Users of English Windows may not be affected by this issue.

I wanted to test on a different OS, but I'm using English macOS, and it works fine there (result: "Test 테스트 KST").

Let's see whether other languages have this issue and how to handle it without an encoding conversion step because, as far as I know, Julia doesn't have a conversion function between EUC-KR and UTF-8.

@StefanKarpinski
Sponsor Member

Environment variables controlling program behavior in unpredictable ways is an unfortunate but widespread "bug by design". It definitely shouldn't work that way but it does and we should handle it better in any case on both ends—on output and on input.

@appleparan

appleparan commented May 25, 2018

I was wrong. Libc.strftime calls the C strftime described in this docs, and on Windows the C runtime uses the system's code page encoding if VC doesn't detect a Unicode signature (check this SO answer). That's why the string comes back as EUC-KR (or CP949).

julia/base/libc.jl

Lines 173 to 179 in deaefef

function strftime(fmt::AbstractString, tm::TmStruct)
    timestr = Base.StringVector(128)
    n = ccall(:strftime, Int, (Ptr{UInt8}, Int, Cstring, Ref{TmStruct}),
              timestr, length(timestr), fmt, tm)
    n == 0 && return ""
    return String(resize!(timestr,n))
end

  1. Change strftime to wcsftime on Windows (with Cwchar_t).
  2. Use transcode for Cwstring or Ptr{Cwchar_t}.

Could this be a solution?

@alkorang
Contributor Author

alkorang commented May 25, 2018

I tried calling wcsftime() with Vector{Cwchar_t} and transcode(), and found another problem with transcode().

julia> transcode(String, transcode(Cwchar_t, "E"))
"E"

julia> transcode(String, transcode(Cwchar_t, "한"))
"���"

Edit:
The problem was display() function, not transcode().

julia> println(transcode(String, transcode(Cwchar_t, "한")))
한
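
To take display out of the picture, the round trip can be checked with == instead of looking at what the REPL prints. A minimal sketch (roundtrip is a hypothetical helper name):

```julia
# Encode to the platform's wchar_t representation and back to a UTF-8 String
roundtrip(s::String) = transcode(String, transcode(Cwchar_t, s))

# Compare values rather than relying on how the REPL displays them
ok = roundtrip("한") == "한" && roundtrip("E") == "E"
```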

@appleparan

appleparan commented May 25, 2018

By the way, we need wchar_t *, not wchar_t, so please check the documentation. It says:

For wchar_t* arguments, the Julia type should be Cwstring (if the C routine expects a NUL-terminated string) or Ptr{Cwchar_t} otherwise. Note also that UTF-8 string data in Julia is internally NUL-terminated, so it can be passed to C functions expecting NUL-terminated data without making a copy (but using the Cwstring type will cause an error to be thrown if the string itself contains NUL characters).

Anyway, I'm also confused about transcode and Cwstring. Here is the transcode documentation. transcode accepts String and Vector{UIntXX}, but Cwstring is not a Vector. Cwstring seems to be just a bitcast of a pointer.

julia/base/c.jl

Lines 126 to 134 in 7144b6b

Cstring(p::Union{Ptr{Int8},Ptr{UInt8},Ptr{Cvoid}}) = bitcast(Cstring, p)
Cwstring(p::Union{Ptr{Cwchar_t},Ptr{Cvoid}}) = bitcast(Cwstring, p)
(::Type{Ptr{T}})(p::Cstring) where {T<:Union{Int8,UInt8,Cvoid}} = bitcast(Ptr{T}, p)
(::Type{Ptr{T}})(p::Cwstring) where {T<:Union{Cwchar_t,Cvoid}} = bitcast(Ptr{Cwchar_t}, p)
convert(::Type{Cstring}, p::Union{Ptr{Int8},Ptr{UInt8},Ptr{Cvoid}}) = Cstring(p)
convert(::Type{Cwstring}, p::Union{Ptr{Cwchar_t},Ptr{Cvoid}}) = Cwstring(p)
convert(::Type{Ptr{T}}, p::Cstring) where {T<:Union{Int8,UInt8,Cvoid}} = Ptr{T}(p)
convert(::Type{Ptr{T}}, p::Cwstring) where {T<:Union{Cwchar_t,Cvoid}} = Ptr{T}(p)

The problem is that I don't know how to convert Cwstring to Vector{UInt16} in Julia. (As far as I know, wchar_t * is UTF-16.) Maybe I should start with how to convert Cwchar_t to UInt16.

@alkorang
Contributor Author

alkorang commented May 25, 2018

The problem is that I don't know how to convert Cwstring to Vector{UInt16} in Julia. (As far as I know, wchar_t * is UTF-16.) Maybe I should start with how to convert Cwchar_t to UInt16.

@appleparan That's simple: declare the argument type as Ptr{Cwchar_t} and pass a Vector{Cwchar_t}; ccall will then automatically convert it to Ptr{Cwchar_t}.

I wrote a wcsftime() function, but it sometimes returns garbage values at the end.

function wcsftime(fmt::AbstractString, tm::Libc.TmStruct)
	wcfmt = transcode(Cwchar_t, fmt)
	wctimestr = zeros(Cwchar_t, 128)
	n = ccall(:wcsftime, Csize_t, (Ptr{Cwchar_t}, Csize_t, Ptr{Cwchar_t}, Ref{Libc.TmStruct}),
	   wctimestr, length(wctimestr), wcfmt, tm)
	n == 0 && return ""
	return transcode(String, wctimestr)
end
julia> println(wcsftime("%Z", Libc.TmStruct(time())))
대한민국 표준시ᶘྰᶸྰ韐࿙뻠࿇᷄ྰ瞠ခᶰྰ᷐ྰ韠࿙뼰࿇ᷜྰ矀ခ᷈ྰᷨྰ㑐༄

julia> println(wcsftime("%Z", Libc.TmStruct(time())))
대한민국 표준시�

julia> println(wcsftime("%Z", Libc.TmStruct(time())))
대한민국 표준시�

julia> println(wcsftime("%Z", Libc.TmStruct(time())))
대한민국 표준시﹀࿣鹠ྰ��ဖ�

julia> println(wcsftime("%Z", Libc.TmStruct(time())))
대한민국 표준시�

@appleparan

appleparan commented May 25, 2018

Could this be related to NUL termination? Also, you don't need to wrap time() with Libc.TmStruct. Add

wcsftime(fmt::AbstractString, t::Real) = wcsftime(fmt, Libc.TmStruct(t))

like strftime.

@alkorang
Contributor Author

alkorang commented May 26, 2018

Could this be related to NUL termination?

@appleparan

function wcsftime(fmt::AbstractString, tm::Libc.TmStruct)
	wcfmt = push!(transcode(Cwchar_t, fmt), Cwchar_t(0))
	wctimestr = zeros(Cwchar_t, 128)
	n = ccall(:wcsftime, Csize_t, (Ptr{Cwchar_t}, Csize_t, Ptr{Cwchar_t}, Ref{Libc.TmStruct}),
	   wctimestr, length(wctimestr), wcfmt, tm)
	n == 0 && return ""
	return transcode(String, wctimestr[1:n])
end

I NUL-terminated the format string with wcfmt = push!(transcode(Cwchar_t, fmt), Cwchar_t(0)) and it works.
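
The other change in this version is that only the first n elements of the buffer are transcoded, where the earlier version transcoded all 128. The effect can be sketched without any ccall by filling a small buffer with placeholder values; the '?' fill and the "KST" text are stand-ins I chose for illustration.

```julia
# Stand-in for the output buffer; '?' plays the role of leftover contents
buf = fill(Cwchar_t('?'), 16)

# Pretend the C function wrote 3 characters and returned n = 3
src = transcode(Cwchar_t, "KST")
copyto!(buf, src)
n = length(src)

whole   = transcode(String, buf)        # includes everything after the written part
trimmed = transcode(String, buf[1:n])   # only what was actually written
```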

@appleparan

We should use Cwstring for NUL-terminated strings. See the note in the documentation. I'm trying to understand the relationship between Cwstring and Array{UInt16, 1}.

For wchar_t* arguments, the Julia type should be Cwstring (if the C routine expects a NUL-terminated string) or Ptr{Cwchar_t} otherwise.

@alkorang
Contributor Author

julia/base/file.jl

Lines 103 to 111 in b5c0cb0

function mkdir(path::AbstractString; mode::Integer = 0o777)
    @static if Sys.iswindows()
        ret = ccall(:_wmkdir, Int32, (Cwstring,), path)
    else
        ret = ccall(:mkdir, Int32, (Cstring, UInt32), path, checkmode(mode))
    end
    systemerror(:mkdir, ret != 0; extrainfo=path)
    path
end

@appleparan
Yes, I figured out how to use Cwstring: just pass an AbstractString value and ccall will handle the conversion.

function wcsftime(fmt::AbstractString, tm::Libc.TmStruct)
    wctimestr = Vector{Cwchar_t}(undef, 128)
    n = ccall(:wcsftime, Csize_t, (Ptr{Cwchar_t}, Csize_t, Cwstring, Ref{Libc.TmStruct}),
       wctimestr, length(wctimestr), fmt, tm)
    n == 0 && return ""
    return transcode(String, wctimestr[1:n])
end

@appleparan

appleparan commented May 26, 2018

I tweaked your code to use the Cwstring type.

wcsftime(t) = wcsftime("%c", t)
wcsftime(fmt::AbstractString, t::Real) = wcsftime(fmt, Libc.TmStruct(t))

function wcsftime(fmt::AbstractString, tm::Libc.TmStruct)
	wcfmt = Base.cwstring(fmt)
	wctimestr = Base.cwstring(repeat("0", 128))

	n = ccall(:wcsftime, Csize_t, (Ptr{Cwchar_t}, Csize_t, Ptr{Cwchar_t}, Ref{Libc.TmStruct}),
        wctimestr, length(wctimestr), wcfmt, tm)
	n == 0 && return ""
	return transcode(String, wctimestr[1:n])
end

We need to create a pull request. I think we should change the function name:

  1. This code should be Windows-only; Base.cwstring is only for Windows.
  2. However, wcsftime is also available on Linux and other platforms.

We need to discuss this, and then please make the PR, since your code was posted first. @alkorang

@appleparan

appleparan commented May 26, 2018

Oh, you finished. I think your code is better, so use it. I should not have used Cwstring for wctimestr.

@alkorang
Contributor Author

We need to create a pull request. I think we should change the function name

This code should be Windows-only; Base.cwstring is only for Windows.
However, wcsftime is also available on Linux and other platforms.

@appleparan
Thank you for your information!
In C, wchar.h provides wchar_t and related functions, including wcsftime. So I guess that's why Julia also has Cwchar_t and Cwstring on Linux.

Oh you finished. Your code is better. Use them.

Okay. Again, thank you for your help!

@kshyatt added the system:windows (Affects only Windows) label on May 28, 2018
@alkorang changed the title from "PCRE.exec error during REPL startup error on Windows" to "PCRE.exec error during REPL startup error with non-Unicode locales" on May 28, 2018
@alkorang
Contributor Author

https://discourse.julialang.org/t/strftime-strptime-bug-27239-is-present-on-all-platforms-not-just-windows/11191/3
I changed the title because, according to @ScottPJones, this problem can happen on other platforms with non-UTF-8 locales.

Example on the Mac from the link:

julia> setlocale(lc) = unsafe_string(ccall((:setlocale,"libc"), Cstring,(Cint,Cstring),0,lc))
setlocale (generic function with 1 method)
julia> setlocale("ko_KR.UTF-8")
"ko_KR.UTF-8"

julia> Libc.strftime("%Y-%m-%d %A %H:%M:%S %Z", time())
"2018-05-28 월요일 02:46:27 EDT"

julia> setlocale("ko_KR.CP949")
"ko_KR.CP949"

julia> Libc.strftime("%Y-%m-%d %A %H:%M:%S %Z", time())
"2018-05-28 \xbf\xf9\xbf\xe4\xc0\xcf 02:46:36 EDT"

@kshyatt Could you please remove the windows tag?

@m-j-w
Contributor

m-j-w commented Jun 28, 2018

I think I'm having the same issue on Windows 10 with the Julia beta using a German locale. Any idea yet how to fix this?

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-beta.0 (2018-06-24 01:32 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

ERROR: PCRE.exec error: UTF-8 error: byte 2 top bits not 0x80
Stacktrace:
 [1] error at .\error.jl:33 [inlined]
 [2] exec at .\pcre.jl:137 [inlined]
 [3] match(::Regex, ::String, ::Int64, ::UInt32) at .\regex.jl:197
 [4] match at .\regex.jl:195 [inlined]
 [5] match at .\regex.jl:210 [inlined]
 [6] hist_from_file(::REPL.REPLHistoryProvider, ::IOStream, ::String) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:408
 [7] setup_interface(::REPL.LineEditREPL, ::Bool, ::Array{Dict{Any,Any},1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:846
 [8] #setup_interface#49(::Bool, ::Array{Dict{Any,Any},1}, ::Function, ::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:757
 [9] setup_interface(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\REPL\src\REPL.jl:757
 [10] (::getfield(Pkg, Symbol("##1#2")))(::REPL.LineEditREPL) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\Pkg\src\Pkg.jl:62
 [11] __atreplinit(::REPL.LineEditREPL) at .\client.jl:312
 [12] #invokelatest#1 at .\essentials.jl:670 [inlined]
 [13] invokelatest at .\essentials.jl:669 [inlined]
 [14] _atreplinit at .\client.jl:319 [inlined]
 [15] (::getfield(Base, Symbol("##843#845")){Bool,Bool,Bool,Bool})(::Module) at .\client.jl:354
 [16] #invokelatest#1 at .\essentials.jl:670 [inlined]
 [17] invokelatest at .\essentials.jl:669 [inlined]
 [18] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at .\client.jl:339
 [19] exec_options(::Base.JLOptions) at .\client.jl:272
 [20] _start() at .\client.jl:427

[ Info: Disabling history file for this session
julia>

@StefanKarpinski
Sponsor Member

The PR that fixes this issue should help: #27273

8 participants