add parse(Complex{T}, string) #22250

Closed
stevengj opened this issue Jun 6, 2017 · 6 comments · Fixed by #24713
Labels
complex Complex numbers strings "Strings!"

Comments

@stevengj
Member

stevengj commented Jun 6, 2017

As @ChrisRackauckas points out on discourse, this shouldn't really be done in a package because Base.parse and all of the argument types are "owned" by Base.

Would also be nice not to have to use horribly inefficient techniques involving eval; see also #21935.
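
For context, the eval-based workaround mentioned above looks roughly like the following. This is only a sketch in Julia 1.x spelling (`Meta.parse`); in 2017 it was written `eval(parse(s))`. The variable names are illustrative.

```julia
# Parse the text as Julia source code, then evaluate the resulting expression.
s = "3 + 4im"
ex = Meta.parse(s)   # an Expr representing the call 3 + 4im
z = eval(ex)         # runs the expression as code to produce the number

# The same two steps will happily run any code found in the input,
# which is why this is unsuitable for untrusted data files.
```

Besides being slow, this goes through the compiler for every string, so a dedicated `parse` method avoids both the cost and the code-execution risk.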

Should be straightforward to implement. The main question in my mind is how permissive it should be:

  • Should it allow arbitrary whitespace between the real and imaginary parts? e.g. should both "3+4im" and "3 + 4im" be allowed?
  • Should it require im, or also accept the common variants i, j, and I?
  • Should it require 3+4im, or also allow 3+4*im?
  • Should it support the Fortran style (real,imag)? (Probably not, since we don't support Fortran-style real literals 1.0D+00 either.)
@stevengj stevengj added the strings "Strings!" label Jun 6, 2017
@ararslan
Member

ararslan commented Jun 6, 2017

IMO:

  • Allow arbitrary whitespace
  • Require im, since we're parsing out a Julia value
  • Don't allow 4*im (we don't allow + for example in parse(Int, "3+4"))
  • No Fortran style
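
The four rules above could be sketched as a small regex-based parser. This is an illustration only, written for Julia 1.x (where `tryparse` returns `nothing` on failure); the names `STRICT_COMPLEX` and `strict_tryparse_complex` are hypothetical and do not come from the thread or from Base.

```julia
# Hypothetical pattern: real part, optional whitespace, a sign, optional
# whitespace, imaginary magnitude, then a mandatory "im" suffix.
const STRICT_COMPLEX =
    r"^\s*([+-]?[0-9.]+(?:[eE][+-]?[0-9]+)?)\s*([+-])\s*([0-9.]+(?:[eE][+-]?[0-9]+)?)im\s*$"

# Hypothetical helper: returns a Complex{T}, or nothing if the string does not
# follow the strict rules (so "3+4*im", "3+4i" and "(3,4)" are all rejected).
function strict_tryparse_complex(::Type{T}, s::AbstractString) where {T<:Real}
    m = match(STRICT_COMPLEX, s)
    m === nothing && return nothing
    re_part = tryparse(T, m.captures[1])
    im_part = tryparse(T, m.captures[3])
    (re_part === nothing || im_part === nothing) && return nothing
    return Complex{T}(re_part, m.captures[2] == "-" ? -im_part : im_part)
end
```

A regex is used here only because it makes the accepted grammar easy to read; it is not meant as the efficient implementation the issue asks for.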

@ararslan ararslan added the complex Complex numbers label Jun 6, 2017
@ttparker

ttparker commented Jun 6, 2017

I agree with @ararslan on all points, although I don't feel strongly about whether * should be allowed before im. Another question is whether the real part should be required to come first, or e.g. 3im - 2 should be allowed. I think either order should be allowed, but I don't feel strongly.

If arbitrary whitespace doesn't end up allowed between the real and imaginary parts, then spaces should be required around the + or - sign (e.g. 2 + 3im rather than 2+3im), since that's Julia's default output format.
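
To illustrate the point about the default output format: Julia prints spaces around the sign, so a round trip through `string` only works if that spacing is accepted. The snippet below is a sketch that assumes a Julia version where `parse(Complex{T}, str)` already exists (that is, one that includes the fix for this issue).

```julia
z = 2.0 - 3.0im
str = string(z)                    # printed with spaces around the sign
w = parse(Complex{Float64}, str)   # assumes the method requested in this issue
```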

@stevengj
Member Author

stevengj commented Jun 6, 2017

I'm inclined to support i and j, at least, since those are common in text datafiles, e.g. in Matlab and Python csv files, and supporting these variations should have almost no cost in code or performance.
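
One low-cost way to picture the i/j support is a normalization step in front of a parser that only understands `im`. The helper below is a hypothetical sketch for Julia 1.x, not part of the proposal; the name `normalize_imag_suffix` is made up for illustration.

```julia
# Hypothetical helper: rewrite a trailing "i" or "j" (Matlab/Python style)
# to Julia's "im"; strings without such a suffix are returned stripped.
function normalize_imag_suffix(s::AbstractString)
    t = strip(s)
    endswith(t, "im") && return String(t)       # already Julia style
    if endswith(t, ('i', 'j'))
        return string(chop(t), "im")            # drop the last char, append "im"
    end
    return String(t)                            # purely real or unrecognized
end
```

A real implementation would more likely check the suffix in place, as the implementation posted later in this thread does, rather than allocate a new string.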

@stevengj
Member Author

stevengj commented Jun 6, 2017

I'm reluctant to support 4im+3, but Matlab supports purely imaginary numbers 7i and it might be worth supporting this too, since it should certainly support purely real values.
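
Handling a single term (purely real such as `3`, or purely imaginary such as `7i`) could look like the sketch below. It assumes Julia 1.x, and the name `tryparse_single_term` is hypothetical. It deliberately does not handle sums such as `3+4im`.

```julia
# Hypothetical helper: parse a string containing exactly one term.
# A trailing im/i/j marks a purely imaginary value; otherwise it is real.
function tryparse_single_term(::Type{T}, s::AbstractString) where {T<:Real}
    t = strip(s)
    for suffix in ("im", "i", "j")
        if endswith(t, suffix)
            v = tryparse(T, chop(t, tail=length(suffix)))
            return v === nothing ? nothing : Complex{T}(zero(T), v)
        end
    end
    v = tryparse(T, t)
    return v === nothing ? nothing : Complex{T}(v, zero(T))
end
```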

@stevengj
Member Author

stevengj commented Jun 6, 2017

Here's a possible implementation that supports whitespace, purely real values, and i/j/im suffixes. It seems to be about 1000x faster than eval(parse(s)):

import Base: tryparse, parse

function tryparse(::Type{Complex{T}}, s::String) where {T<:Real}
    # skip initial whitespace
    i = start(s)
    e = endof(s)
    while i <= e && isspace(s[i])
        i = nextind(s, i)
    end
    i > e && return Nullable{Complex{T}}()
    
    # find index of ± separating real/imaginary parts (if any)
    i₊ = search(s, ('+','-'), i)
    if i₊ == i # leading ± sign
        i₊ = search(s, ('+','-'), i₊+1)
    end
    if i₊ != 0 && s[i₊-1] in ('e','E','f') # exponent sign
        i₊ = search(s, ('+','-'), i₊+1)
    end
    if i₊ == 0 # purely real value
        return Nullable{Complex{T}}(tryparse(T, s))
    end
    
    # find trailing im/i/j
    iᵢ = rsearch(s, ('m','i','j'), e)
    iᵢ < i₊ && return Nullable{Complex{T}}()
    if s[iᵢ] == 'm' # im
        iᵢ -= 1
        s[iᵢ] == 'i' || return Nullable{Complex{T}}()
    end
    isdigit(s[iᵢ-1]) || return Nullable{Complex{T}}()
    
    # parse real part
    re = tryparse(T, SubString(s, i, i₊-1))
    isnull(re) && return Nullable{Complex{T}}()
    
    # parse imaginary part
    im = tryparse(T, SubString(s, i₊+1, iᵢ-1))
    isnull(im) && return Nullable{Complex{T}}()
    
    return Nullable{Complex{T}}(Complex{T}(get(re), s[i₊]=='-' ? -get(im) : get(im)))
end

# the ±1 indexing above for ascii chars is specific to String, so convert:
tryparse(T::Type{<:Complex}, s::AbstractString) = tryparse(T, String(s))

# can be merged with parse(::Type{<:AbstractFloat}, s::AbstractString):
function parse(::Type{T}, s::AbstractString) where T<:Complex
    result = tryparse(T, s)
    if isnull(result)
        throw(ArgumentError("cannot parse $(repr(s)) as $T"))
    end
    return unsafe_get(result)
end
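
The snippet above targets the Julia of mid-2017: `start`, `endof`, `search`, `rsearch` and `Nullable` have since been removed or renamed. For readers on Julia 1.x, here is a rough port of the same logic. It is a sketch, not the code that was eventually merged into Base; the name `tryparse_complex` is hypothetical (chosen to avoid redefining Base methods), and failure is signalled with `nothing` instead of a null `Nullable`. It keeps the limitations of the original: purely imaginary input is not handled, and text after the suffix is not checked.

```julia
function tryparse_complex(::Type{T}, s::String) where {T<:Real}
    # skip initial whitespace
    i = firstindex(s)
    e = lastindex(s)
    while i <= e && isspace(s[i])
        i = nextind(s, i)
    end
    i > e && return nothing

    # find index of the ± separating real/imaginary parts (if any)
    issign = in(('+', '-'))
    isep = findnext(issign, s, i)
    if isep == i                                   # leading ± sign
        isep = findnext(issign, s, isep + 1)
    end
    if isep !== nothing && s[prevind(s, isep)] in ('e', 'E', 'f')  # exponent sign
        isep = findnext(issign, s, isep + 1)
    end
    if isep === nothing                            # purely real value
        re_part = tryparse(T, s)
        return re_part === nothing ? nothing : Complex{T}(re_part)
    end

    # find trailing im/i/j
    isuf = findprev(in(('m', 'i', 'j')), s, e)
    (isuf === nothing || isuf < isep) && return nothing
    if s[isuf] == 'm'                              # "im": step back to the 'i'
        isuf = prevind(s, isuf)
        s[isuf] == 'i' || return nothing
    end
    isdigit(s[prevind(s, isuf)]) || return nothing

    # parse real and imaginary parts
    re_part = tryparse(T, SubString(s, i, prevind(s, isep)))
    re_part === nothing && return nothing
    im_part = tryparse(T, SubString(s, isep + 1, prevind(s, isuf)))
    im_part === nothing && return nothing

    return Complex{T}(re_part, s[isep] == '-' ? -im_part : im_part)
end
```

The port uses `prevind` where the original subtracts 1 from an index, so that a non-ASCII character next to a sign or suffix leads to a failed parse rather than an indexing error.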

@stevengj
Member Author

stevengj commented Jun 6, 2017

(Could be a bit faster if we implement specialized search and rsearch methods for a tuple of ASCII chars, since in this case we can search bytes rather than chars. But the main speedup will come when things like the SubString object can be stack-allocated.)
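
The byte-search idea can be sketched as follows, assuming Julia 1.x. It relies on the fact that in UTF-8 an ASCII byte never occurs inside a multi-byte character, so scanning code units gives the same index as scanning characters. The function name `find_ascii_byte` is hypothetical.

```julia
# Hypothetical helper: index of the first code unit at or after `start`
# that equals one of the given ASCII bytes, or nothing if there is none.
function find_ascii_byte(s::String, bytes::NTuple{N,UInt8}, start::Int=1) where {N}
    cu = codeunits(s)                  # byte view of the string, no copy
    for k in start:length(cu)
        cu[k] in bytes && return k
    end
    return nothing
end

# The two sign characters as bytes.
const SIGN_BYTES = (UInt8('+'), UInt8('-'))
```

Called as `find_ascii_byte(s, SIGN_BYTES, i)`, this would play the role of `search(s, ('+','-'), i)` in the implementation above.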

3 participants