add format converter #96

Open
hhaensel opened this issue Jun 14, 2023 · 13 comments

Comments

@hhaensel
Contributor

Currently, only the SMILES and SDF formats are supported for molecule generation. For export, only v2 SDF is in place.

I submitted #95 to generate SDF files and molecules from InChI strings.

Furthermore, it would be fantastic to include OpenBabel, perhaps via a new package OpenBabel.jl and package extensions?
@mojaie Tell me what you think.

@Boxylmer
Contributor

This would be an incredible addition to the library. Consider this a +1 for a separate package plus compatibility with it in MolecularGraph. Since OpenBabel is written in C++ and (I think) has an associated DLL/SO, could this be hosted as an artifact for the potential OpenBabel.jl? Or do you intend to make installing the program a prerequisite?

@timholy
Contributor

timholy commented Jun 14, 2023

It's already available with https://github.com/JuliaBinaryWrappers/openbabel_jll.jl. That's a low-level wrapper; someone might want to create OpenBabel.jl to give it a more "Julian" interface. But having the JLL already built is a huge head start.

@hhaensel
Contributor Author

Thanks for the hint. At first glance, I can only use the executables that ship with openbabel_jll.
With openbabel_jll installed, I can do

using openbabel_jll

ENV["BABEL_LIBDIR"] = readdir(joinpath(dirname(dirname(openbabel_jll.obabel_path)), "lib", "openbabel"), join = true)[1]
babel_cmd = openbabel_jll.obabel();

rstrip(read(`$babel_cmd "-:CCO" -ismi -omol --gen2D`, String))

However, the conversion command takes about twice as long as the same command run with the files simply extracted from the binary OpenBabel distribution:

babel_cmd = raw"C:\Temp\Openbabel\obabel"
rstrip(read(`$babel_cmd "-:CCO" -ismi -omol --gen2D`, String))

That's a bit strange.

Unfortunately, I don't have a clue how to access libopenbabel directly. Would I need to use Cxx.jl or CxxWrap.jl, since OpenBabel is written in C++? Up to now, I have only made ccalls into well-documented C libraries.
Any hint is welcome.
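
To compare the two obabel invocations above on equal terms, the command construction and the timing can be separated. The following is only a sketch: `obabel_convert` and `min_runtime` are hypothetical helper names, and the only obabel flags used are the ones already shown in this comment (`-:`, `-i`, `-o`, `--gen2D`).

```julia
# Hypothetical helper: builds the same obabel command line as above.
# `babel_cmd` may be a Cmd (e.g. the result of openbabel_jll.obabel())
# or a plain path string to a local obabel executable.
function obabel_convert(babel_cmd, input::AbstractString;
                        informat = "smi", outformat = "mol", gen2d = true)
    args = ["-:" * input, "-i" * informat, "-o" * outformat]
    gen2d && push!(args, "--gen2D")
    # Interpolating a vector into backticks passes each element as one argument.
    rstrip(read(`$babel_cmd $args`, String))
end

# Hypothetical helper: minimum wall-clock time over `n` calls of `f`,
# so that one-off costs of the first call do not dominate the comparison.
function min_runtime(f; n = 5)
    minimum([@elapsed(f()) for _ in 1:n])
end
```

Calling `min_runtime(() -> obabel_convert(babel_cmd, "CCO"))` once with the JLL command and once with the extracted binary would then give two comparable numbers.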

@timholy
Contributor

timholy commented Jun 15, 2023

I've not written a high-level wrapper for a JLL before, so you should read docs or ask others. But my understanding is that you won't need to use backticks anywhere, that everything will be a direct call. This highlighted line suggests some of the things you should be able to do: https://github.com/JuliaBinaryWrappers/openbabel_jll.jl/blob/ad1a8e44175f44bd378afdb43f1896ff05161bda/src/wrappers/x86_64-linux-gnu-cxx11.jl#L2

@hhaensel
Contributor Author

Unfortunately, although all of these functions can be called from Julia, they do nothing more than return a command (built with setenv) that bundles the binary executable with a suitable environment.
I guess one has to use a higher-level library generated with SWIG, or rewrite small pieces of code with Cxx.jl.
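
What "returning a setenv" means can be illustrated with Base alone. This is a stand-in, not the actual JLL wrapper code: the command line and the `BABEL_LIBDIR` value below are placeholders chosen for illustration.

```julia
# Stand-in for what an executable wrapper hands back: a Cmd with an
# environment attached. Nothing is executed until the Cmd is passed to
# run or read, and no library function becomes callable through it.
cmd = setenv(`obabel -ismi -omol`, "BABEL_LIBDIR" => "/path/to/lib/openbabel")

cmd isa Cmd   # true
cmd.exec      # ["obabel", "-ismi", "-omol"]
cmd.env       # ["BABEL_LIBDIR=/path/to/lib/openbabel"]
```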

@timholy
Contributor

timholy commented Jun 15, 2023

I'd strongly recommend checking in with the #binarybuilder channel on Slack before going to any effort.

@hhaensel
Contributor Author

hhaensel commented Jun 15, 2023

Meanwhile I think the most effective solution is to use a Python wrapper, e.g.

using PyCall

ob = pyimport("openbabel.openbabel")

function babel(input, informat = "smi", outformat = "mol")
    conv = ob.OBConversion()
    conv.SetInAndOutFormats(informat, outformat)
    
    mol = ob.OBMol()
    conv.ReadString(mol, input)
    
    if ! mol.Has2D() && ! mol.Has3D()
        pgen = ob.OBOp.FindType("gen2D")
        pgen !== nothing && pgen.Do(mol)
    end

    conv.WriteString(mol)
end
julia> @time babel("CCO") |> println

 OpenBabel06162301242D

  3  2  0  0  0  0  0  0  0  0999 V2000
    1.7321   -0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8660    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END

  0.005649 seconds (62 allocations: 2.906 KiB)

@mojaie
Owner

mojaie commented Jun 15, 2023

Thank you for the great suggestions.
It would be nice to create a new library, OpenBabel.jl, with a Julian interface. It would be easy to keep it compatible with MolecularGraph.jl.
FYI, coordgenlibs_jll, which is used in MolecularGraph.jl, is built from a fork of coordgenlibs (written in C++) that adds a C interface, which enables its use via Julia's ccall.
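
The reason a C interface matters is that ccall looks symbols up by their plain C names. A minimal sketch of that mechanism, using `strlen` from the C standard library rather than any coordgenlibs or OpenBabel function (`c_strlen` is a hypothetical name):

```julia
# strlen is exported under its plain C name, so ccall can resolve it
# directly. A C++ method would first need an extern "C" wrapper that
# exposes it under such a name.
c_strlen(s::AbstractString) = Int(ccall(:strlen, Csize_t, (Cstring,), s))

c_strlen("CCO")
```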

@hhaensel
Contributor Author

I'll have a look; that sounds interesting.
Did you have a look at the PR #95? That should work right away, although full functionality would only come with an upgrade of libinchi.
What do you think about the

@mojaie
Owner

mojaie commented Jun 16, 2023

Yes, it looks great. It will be merged soon.
Also, thank you for fixing the Windows issue. I was wondering why CI had been failing only on Windows!

@timoleistner

timoleistner commented Nov 16, 2023

There is also the RDKitMinimalLib wrapper, which of course has a different mol object.
A function that translates these mol objects into MolecularGraph mol objects would solve #72 and #67 and would be an awesome addition to this package.

Edit:
The simplest solution would be to use something like SMILES as a common language.
E.g. use the get_mol() and get_smiles() functions from RDKitMinimalLib to obtain a SMILES string from, say, a molblock, and then use smilestomol() from MolecularGraph to obtain the mol object.

smilestomol(get_smiles(get_mol(molblock)))

But this would lose spatial information from the molblock...

Edit2: Instead of SMILES, one should at least use InChI, as SMILES is not an open standard and several different SMILES specifications exist.

@timoleistner

timoleistner commented Feb 5, 2024

As mentioned in #23 (comment), openbabel_jll provides an ExecutableProduct.
It also provides two LibraryProducts, libinchi and libopenbabel.
All available products can be listed with ? openbabel_jll, which returns the following:

  Products
  ========

  The code bindings within this package are autogenerated from the following Products:

    •  LibraryProduct: libinchi

    •  LibraryProduct: libopenbabel

    •  ExecutableProduct: obabel

    •  ExecutableProduct: obconformer

    •  ExecutableProduct: obdistgen

    •  ExecutableProduct: obenergy

    •  ExecutableProduct: obfit

    •  ExecutableProduct: obfitall

    •  ExecutableProduct: obgen

    •  ExecutableProduct: obgrep

    •  ExecutableProduct: obminimize

    •  ExecutableProduct: obmm

    •  ExecutableProduct: obprobe

    •  ExecutableProduct: obprop

    •  ExecutableProduct: obrms

    •  ExecutableProduct: obrotamer

    •  ExecutableProduct: obrotate

    •  ExecutableProduct: obspectrophore

    •  ExecutableProduct: obsym

    •  ExecutableProduct: obtautomer

    •  ExecutableProduct: obthermo

    •  ExecutableProduct: roundtrip

Therefore, we would be able to call functions from libopenbabel using ccall(), if we knew the function names...
With readelf, one can inspect shared library files and look for functions:
readelf -sW ~/.julia/artifacts/f1eb34813a945111198356e33b5f2034cc7990ab/lib/libopenbabel.so
The output is still pretty messy, and I didn't find proper documentation for this library.

Edit: OpenBabel API Docs
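
Part of the messiness is that C++ compilers export methods under mangled names, so a plain method name from the API docs is generally not what appears in the symbol table. Whether a given name is actually exported can be checked from Julia with the Libdl standard library. This is a sketch only; `has_symbol` is a hypothetical helper, and the library path is passed in rather than assumed.

```julia
using Libdl

# Hypothetical helper: true if `name` is an exported symbol of the shared
# library at `libpath`. ccall can only reach names for which this is true.
function has_symbol(libpath::AbstractString, name::AbstractString)
    handle = Libdl.dlopen(libpath)
    try
        # With throw_error = false, dlsym returns nothing for unknown symbols.
        return Libdl.dlsym(handle, name; throw_error = false) !== nothing
    finally
        Libdl.dlclose(handle)
    end
end
```

Passing the path of the libopenbabel.so inspected with readelf above, together with a name copied from the readelf output, would confirm that the name can be used in a ccall.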

@hhaensel
Contributor Author

In multithreaded apps, the PyCall solution above crashed. I then came up with

using PythonCall
using ThreadPools

macro pythread(expr)
    quote
        fetch(@tspawnat 1 begin
            $(esc(expr))
        end)
    end
end

function babel_convert(input, informat = "cdxml", outformat = "mol")
    @pythread begin
        ob = pyimport("openbabel.openbabel")
        conv = ob.OBConversion()
        conv.SetInAndOutFormats(informat, outformat)
        
        mol = ob.OBMol()
        conv.ReadString(mol, input)
        
        if ! Bool(mol.Has2D()) && ! Bool(mol.Has3D())
            pgen = ob.OBOp.FindType("gen2D")
            pgen !== nothing && pgen.Do(mol)
        end

        pyconvert(String, conv.WriteString(mol))
    end
end

I also posted this on Discourse and on JuliaPy/PythonCall.jl#201
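
The @pythread macro above pins work to thread 1 through ThreadPools. A dependency-free variant of the same idea, sketched here as an alternative and not as what this thread used, funnels all calls through one task created with @async, which stays on the thread that created it. `SerialWorker` and `run_on` are hypothetical names, and whether this is sufficient for PythonCall depends on which thread constructs the worker.

```julia
# Hypothetical helper: every submitted function runs on one dedicated task.
struct SerialWorker
    jobs::Channel{Tuple{Any,Channel{Any}}}
end

function SerialWorker()
    jobs = Channel{Tuple{Any,Channel{Any}}}(Inf)
    # @async tasks are sticky: they keep running on the creating thread.
    @async for (f, reply) in jobs
        outcome = try
            (true, f())
        catch err
            (false, err)
        end
        put!(reply, outcome)
    end
    SerialWorker(jobs)
end

# Submit `f` from any thread and block until the worker has run it.
function run_on(worker::SerialWorker, f)
    reply = Channel{Any}(1)
    put!(worker.jobs, (f, reply))
    ok, value = take!(reply)
    ok || throw(value)   # rethrow errors raised inside `f`
    value
end
```

With a worker created once at startup, the body of babel_convert could be submitted as `run_on(worker, () -> ...)` from any thread.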


5 participants