Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic identification of rings in sisl.Geometry() #688

Open
ialcon opened this issue Feb 22, 2024 · 24 comments
Open

Automatic identification of rings in sisl.Geometry() #688

ialcon opened this issue Feb 22, 2024 · 24 comments

Comments

@ialcon
Copy link

ialcon commented Feb 22, 2024

The idea of this feature would be to have an automated search of rings of a particular number of atoms (that is, squares, pentagons, hexagons, etc) within a sisl.Geometry() object. Such a feature would be extremely useful to further analyze/categorize the inner fragments of a given geometry. For example, say you have a nanoporous graphene structure, such as in this open-access paper (Fig. 1 in https://onlinelibrary.wiley.com/doi/full/10.1002/adfm.202104031), which is formed as a 2D array of (1D) graphene nanoribbons (GNRs). By getting the hexagons within the NPG, one could probably find a relatively simple (and general) manner to classify the atoms within the NPG as belonging to one GNR or the other, which would be something much harder to do (in a general way) without having that prior classification of atoms as within hexagons.

The way that I implemented this thing (outside sisl, of course) was focused exclusively on hexagonal rings in organic systems (i.e. benzene or aryl rings). I find that using the connections (or bonds) information is the most general way to do this. I attach a snipped of that implementation I made, where this is done within a polymer class that I made myself, so the "self" is a polymer instance.
rings_snipped.py.txt

As you will see, this is just 7 for loops over all atomic indices, and for each index we loop over the bonds of that atom (via "for id_3 in self.bonds[id_2]", for example) and we keep this going on until we find a closed 6-atoms ring. Of course, this is not a very efficient implementation, but for small unit cells of a few hundreds of atoms this takes absolutely nothing in a regular laptop. Of course, such implementation would be more problematic for large-scale structures, I guess, but for regular periodic or molecular structures this works fine.

Another important question is how to treat the hexagon (or pentagon, or whatever) structure (i.e. atomic indices): that is, if you treat them as a list of atoms, or as a geometry object in itself, or, if I understand it correctly, as a geometry.category - obviously, the more things we could do with that object the better, however, the complexity of the thing could explode quite easily. But that might be subject for another issue/discussion.

@zerothi
Copy link
Owner

zerothi commented Feb 22, 2024

Thanks for this, indeed we should think about this, and your proposals incorporate ideas from #687 #202 and other friends.

One thing to think about in such a case is whether an atom in a ring, can be part of another ring. But perhaps that can be refined in a post-processing of the categories.

My idea here is to use categories, imagine it could also be used to discover defects 7-5-7 etc.Hmmm. lets keep this in mind with #687

@pfebrer
Copy link
Contributor

pfebrer commented Feb 23, 2024

I have often come up with cases when I don't want a category to tell me if an atom is from that category or not, but instead I want to classify atoms based on some property. E.g. on the number of neighbours they have or, in this case, on the ring they belong to.

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

I have often come up with cases when I don't want a category to tell me if an atom is from that category or not, but instead I want to classify atoms based on some property. E.g. on the number of neighbours they have or, in this case, on the ring they belong to.

Could you elaborate? Couldn't categories be used for this as well, e.g. the neighbour category can tell which atoms has a specific neighbor environment.

@pfebrer
Copy link
Contributor

pfebrer commented Feb 23, 2024

Yes, but what if I have three different environments, and I want an array telling me which environment each atom belongs to.

With the current situation I can do it, but I always have to write some extra code to apply each category separately and then merge the categorizations.

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

That is true.

cat1 = Category(...)
cat2 = Category(...)
(cat1 or cat2).categorize(geometry) # returns only cat1 or cat2
cat1.categorize(geometry) and cat2.categorize(geometry) # returns and or category

the ambiguity of the categories is also that they return them-selves. Which is a bit of a head-ache, you never know if you are using a result category, or a just created category.

A possible solution would be if categorize had an extra argument all=False which ensures that everything gets evaluated.

Then I can foresee that (cat1 | cat2).categorize(...) should return a cat1&cat2 | cat1 | cat2. Or categories should just always deep locate or categories.

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

@pfebrer what do you need from the categories, are they too unflexible, what should we do to make them simpler and, more importantly, intuiitive to use.

@ialcon
Copy link
Author

ialcon commented Feb 23, 2024

First time seeing these categories/categorize in action, so I am bit lost. Just let me mention that for the hexagons example I made earlier, one should be able to have atoms in different hexagons, whatever the way they are classified, because in a GNR or NPG, hexagons share atoms with their neighboring hexagons, so...

besides, is it somewhere explained what these lines do (or are supposed to do)?

cat1 = Category(...)
cat1.categorize(geometry)

Just to understand what all the Category philosophy is about, then probably I may provide better ideas

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

currently, we haven't really publicised them too much...

The idea is that one can categorize an atom based on some criteria.

The actual code can be found here.

For instance one can get all carbon atoms via:

cat_C = sisl.geom.AtomZ(6)

then one can categorize atoms based on that criteria:

result = cat_C.categorize(geometry)
for ia, cat in enumerate(result):
     print(f"atom {ia} is a Carbon atom")

The true gain is when one starts combining categories:

from sisl.geom import *
cat_C = AtomZ(6) # match all Z == 6
neighbor = AtomNeighbor(3, R=(1, 1.44)) # match all atoms with 3 atoms within the [1; 1.44] distance
C_neighbor = cat_C & neighbor
C_neighbor.categorize(geometry)

in this way one can combine pretty complicated requirements to match a certain atom (there are many other options as well).
The above is the basic idea, but they were sort of toys, but they start to be very useful. And hence streamlining how to use them might be good now. They weren't fully exposed since it wasn't clear how their interface would be best implemented.

@ialcon
Copy link
Author

ialcon commented Mar 1, 2024

Just to elaborate a bit more on the ideas we briefly discussed yesterday in Discord. If fragment categories were possible, one could easily do the atom classification that @pfebrer was talking about in #689. For example, and staying within the topic of this issue, we could have a Ring category that could do the following:

from sisl.geom import *
cat_arylRing = Ring('C', 6) # match all 6-membered rings made of carbon atoms
aryl_rings = cat_arylRing.categorize(geometry) # list of all aryl rings in the geometry

# now we may loop through all aryl rings and get associated info from 
# each category result (i.e. each aryl_ring instance)
for aryl in aryl_rings :
    print(aryl.atom_ids) # e.g.: 0, 4, 5, 8, 1, 6
    print(aryl.id)       # e.g.: 4
    print(aryl.dihedral) # e.g.: 55.7 degs

Here of course I've gone a bit too far with the possible attributes that the category results from Ring could have, but just to display a bit the idea and potential that this could have, and extending the idea from #687. Note that within the Ring category we could obviously use AtomCategorys such as AtomZ, AtomNeighbors or any other - these would just enormously help simplifying the internal format of any fragmentCategory. Finally, also note that virtually any fragment category would be possible - one could have a metal_surface category, a phthalocyanine category, a GNR, NPG, etc etc. It is true though that this could become quite complex very rapidly, but since fragmentCategory would not affect at all the more fundamental AtomCategory, possible dependency issues arising from such complexity would uniquely affect fragmentCategorys.

I perfectly understand if you think this is a bit too much/far for now. But I felt this could be a possible way to allow categories to go a bit crazier and do significantly more complex things than simply identifying atoms with a particular characteristic.

@zerothi
Copy link
Owner

zerothi commented Apr 13, 2024

A note on the implementation here, my guess is that this would be much faster and simpler by using Voronoi diagrams of coordinates to extract probable positions of rings.
For instance see here:

import sisl as si
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d

graphene = si.geom.graphene() * (5, 5, 1)

vor = Voronoi(graphene.xyz[:, :2])
fig = voronoi_plot_2d(vor)
plt.show()

this might be much simpler to parse, since with the Voronoi points one can use a single R and close to figure out how many neighbours, then their relative position to the center of the Voronoi point would be easy.

Note, that this can easily be generalized for N neighbors, triangles, square, pentagons, hexagons, ...

@ialcon
Copy link
Author

ialcon commented Apr 22, 2024

hNPG_str.fdf.txt
This certainly looks like a potential approach. However, I've found some issues when testing it with a hybrid-NPG structure, like the one I am attaching (these issues may also apply to other NPGs or carbon nanostructures). If you try your script but reading the NPG structure, you'll see that it creates Voronoi points to locations where there are no aryl rings (e.g. in the pores of the NPG, or even at edges where hydrogen atoms form a "semi-circle" with carbon atoms). Even, it seems that multple Voronoi points may end up very close, which I guess it is because scipy finds multiple rings of points (of increasing size) centered at the same point (this is just a guess, maybe there is another reason..). Some of these issues could be circumvented by, for instance, providing the coordinates of the atoms that are relevant for finding Rings, such as carbon or nitrogen atoms (hydrogen atoms would be omitted). But even then some issues would remain, such as getting Voronoi points at the NPG pores, or having multiple Voronoi points in the same coordinate, or very close to each other.

I am not sure if these issues could be handled by having specific inputs to the Voronoi function (e.g. through qhull_options key) or via some postprocessing of Voronoi points within sisl, to retain only those Voronoi points associated to rings of a certain radius, and disregard the other...

@zerothi
Copy link
Owner

zerothi commented Apr 22, 2024

thanks for checking this out @ialcon !

My approach would be something like this:

  1. create the inital Voronoi points
  2. join Voronoi points that are close to each other (e.g. ith slightly shifted coordinates you'll get >1 Voronoi points in an aryl ring, as you describe), there has to be some atol=0.2 (or so) that joins them, then I guess the most natural thing would be to take the average position of them to form a new Voronoi point
  3. Once Voronoi points are joined, I would re-search all of them for neighbours. Here as you say a radius here the points are located is necessary. How exactly that should be done isn't fully clear to me, ideally one should search in the space that is defined between the single Voronoi point and the neighboring Voronoi points. If one uses the other Voronoi points, then possibly you don't need a radius parameter.
  4. Lastly, once the atoms are grouped, one should likely assert that the selection of atoms is indeed an aryl ring, and not something weird. I.e. consider the case with a bilayer structure, then you might find 12 atoms in step 3, but here you would narrow it down to the rings in the plane, hmm it might be that Voronoi also solves this problem... Hmm...

The above I think could be pretty generic.

@ialcon
Copy link
Author

ialcon commented Apr 22, 2024

I find some issues though... First, as you may see with the NPG example, one also gets points of "rings" that are not really aryl rings - the nanopore in NPG is the most clear example of that. Therefore, there should be a way to discard the Voronoi points that do not belong to the targeted Rings (in my case it would be aryl rings). I believe the most effective way is via an input radius (or similarly named) keyword that would defined a threshold distance from the Voronoi coordinate below which all atoms should sit. So, if for an aryl ring this is ca. 1.4 A, looping through all Voronoi points in your point 3 would automatically discard the nanopores in NPG (e.g. if we set number_neighbours=6 and radius=1.4). Besides, the radius keyword could also accept two distances, similarly as the Hamiltonian.construct option - though this might not be really useful here... I don't know..

So, I think that providing a radius is more desirable than simply comparing the coordinates between Voronoi points (as you suggested in point 3) - because the latter would not allow to distinguish between, say, aryl rings and nanopores (to disregard the latter, in my case).

With regards to the bi-layer, some questions here:
a) May we use this methodology to find Rings by providing 3D structures? Note that this would be required for the bi-layer, as only providing xy coordinates as in vor = Voronoi(graphene.xyz[:, :2]) would not work (both layers would be superimposed..). So, either we provide also the z coordinate via vor = Voronoi(graphene.xyz) or we do this procedure per-layer:

bilayerGr = geom.bilayer(..., bottom_atoms=C, top_atoms=C)
top_layer = bilayerGr.top
bottom_layer = bilayerGr.bottom
# GET RINGS
vor_top = Voronoi(top_layer.xyz[:, :2])
vor_bottom = Voronoi(bottom_layer.xyz[:, :2])
...

b) Realize that if we provide the 3D-coordinates, we will get Voronoi points between the two graphene layers. So these out-of-plane Voronoi points should also be discarded somehow. This could be done via a fine-tuned radius, or via searching for Rings separately for different parts of the structure, as done with vor_top and vor_bottom.

So, to me, I find that some inputs should be provided by the user because, otherwise, it would be hard for sisl to know which are the "rings" that are relevant and which are not. Of course, sisl could provide everything and then the user filter what he/she was really looking for. However, it might also be helpful that the user may provide some inputs to the sisl.ring_finder so that it directly gets the filtered targeted Rings from sisl.

@pfebrer
Copy link
Contributor

pfebrer commented Apr 22, 2024

Another approach to find regular polygons, which I think would be more straightforward but maybe slower would be to:

  • Find all neighbors with the neighbor finder.
  • Build a graph.
  • Use a established algorithm to find loops of length X (e.g. X=6 for hexagons) like this one by networkx.
  • Find out if those loops are really a regular polygon.

@zerothi
Copy link
Owner

zerothi commented Apr 22, 2024

I find some issues though... First, as you may see with the NPG example, one also gets points of "rings" that are not really aryl rings - the nanopore in NPG is the most clear example of that. Therefore, there should be a way to discard the Voronoi points that do not belong to the targeted Rings (in my case it would be aryl rings). I believe the most effective way is via an input radius (or similarly named) keyword that would defined a threshold distance from the Voronoi coordinate below which all atoms should sit. So, if for an aryl ring this is ca. 1.4 A, looping through all Voronoi points in your point 3 would automatically discard the nanopores in NPG (e.g. if we set number_neighbours=6 and radius=1.4). Besides, the radius keyword could also accept two distances, similarly as the Hamiltonian.construct option - though this might not be really useful here... I don't know..

Well, there are multiple use-cases. :)

I think the Ring classe should basically do as we have suggested, i.e. find the ring. The hole is a ring ;) It is just not an Aryl ring. :)
I think it being generic here has a huge benifit, being able to find the atoms in the pores is very nice! :)

What I think is important, is if this method finds all aryl rings and other rings as well. In this case I think it does exactly what it is supposed to. :)

That is why I suggest to first use the method to create a base-class that encapsulates the ring property (this includes pores etc.) Then later, more specialized, ring finders can use the atoms on the rings to figure out if they are rings or not, and discard them if not.

For instance, a Pore class would remove all rings that are not pores ;)
Both things would be nice to have IMHO ;)
And to me it seems that all functionality is covered by this Voronoi tessellation.

So, I think that providing a radius is more desirable than simply comparing the coordinates between Voronoi points (as you suggested in point 3) - because the latter would not allow to distinguish between, say, aryl rings and nanopores (to disregard the latter, in my case).

With regards to the bi-layer, some questions here: a) May we use this methodology to find Rings by providing 3D structures? Note that this would be required for the bi-layer, as only providing xy coordinates as in vor = Voronoi(graphene.xyz[:, :2]) would not work (both layers would be superimposed..). So, either we provide also the z coordinate via vor = Voronoi(graphene.xyz) or we do this procedure per-layer:

bilayerGr = geom.bilayer(..., bottom_atoms=C, top_atoms=C)
top_layer = bilayerGr.top
bottom_layer = bilayerGr.bottom
# GET RINGS
vor_top = Voronoi(top_layer.xyz[:, :2])
vor_bottom = Voronoi(bottom_layer.xyz[:, :2])
...

The voronoi algorithm still works for 3d coordinates, simply pass xyz.

And it would actually separate the layers by using my method described above. So this might be an even simpler method to separate layers. E.g. a tilted bi-layer, a double walled nanotube, etc.

b) Realize that if we provide the 3D-coordinates, we will get Voronoi points between the two graphene layers. So these out-of-plane Voronoi points should also be discarded somehow. This could be done via a fine-tuned radius, or via searching for Rings separately for different parts of the structure, as done with vor_top and vor_bottom.

So, to me, I find that some inputs should be provided by the user because, otherwise, it would be hard for sisl to know which are the "rings" that are relevant and which are not. Of course, sisl could provide everything and then the user filter what he/she was really looking for. However, it might also be helpful that the user may provide some inputs to the sisl.ring_finder so that it directly gets the filtered targeted Rings from sisl.

Well, I am pretty convinced that Voronoi can fill the gap to a large extend, then finetuning the results can narrow the search.
Nice you have already played with it!

Another approach to find regular polygons, which I think would be more straightforward but maybe slower would be to:

* Find all neighbors with the neighbor finder.

* Build a graph.

* Use a established algorithm to find loops of length X (e.g. X=6 for hexagons) like [this one by networkx](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cycles.simple_cycles.html#networkx.algorithms.cycles.simple_cycles).

* Find out if those loops are really a regular polygon.

I think the Voronoi seems extremely generic and it will also be very efficient in terms of searching things since we can narrow the search space by the cells. I think creating the graph here isn't really necessary. One would always have to post-process the coordinates found by any algorithm to ensure the group of ring atoms forms an aryl ring.

@ialcon
Copy link
Author

ialcon commented Apr 23, 2024

And it would actually separate the layers by using my method described above. So this might be an even simpler method to separate layers. E.g. a tilted bi-layer, a double walled nanotube, etc.

How would the Voronoi points allow you to separate the layers? By analyzing the distance of atoms to the Voronoi points? Or distance between Voronoi points?

What I do not have fully clear as of now is your points 3 and 4. Regarding 3, how do you assign atoms that are sitting between two Voronoi points? This might be particularly tricky when you have rings of different radius (e.g. a meta-aryl-ring and the pore close to it). Also, note that pores do not have a regular circular "radius". How would you point 3 select only those atoms that define the nanopore?

Intuitively I would say that @pfebrer's approach should be more stable (though maybe less efficient computationally) because there you are already working with the graph obtained from the atomic connections (the "neighborhood"). Your Voronoi approach, on the contrary, starts without a clue about the connections, and only looks at the atomic distribution of atoms. This makes it more general, but I think it may complicate the post-processing and more easily generate errors.

It might as well be that I need to play more with the Voronoi function to get around it...

@zerothi
Copy link
Owner

zerothi commented Apr 25, 2024

And it would actually separate the layers by using my method described above. So this might be an even simpler method to separate layers. E.g. a tilted bi-layer, a double walled nanotube, etc.

How would the Voronoi points allow you to separate the layers? By analyzing the distance of atoms to the Voronoi points? Or distance between Voronoi points?

Hmm, I need to test this more rigorously.

What I do not have fully clear as of now is your points 3 and 4. Regarding 3, how do you assign atoms that are sitting between two Voronoi points? This might be particularly tricky when you have rings of different radius (e.g. a meta-aryl-ring and the pore close to it). Also, note that pores do not have a regular circular "radius". How would you point 3 select only those atoms that define the nanopore?

My idea is not to only use Voronoi. Voronoi has a good classification of regions. But we still require to post-analyse the regions as per the regular approach. So Voronoi is the initial segregator of regions of space, subsequent things needs to be done on the same rigourous way. The nice thing about Voronoi is that is separates parts, regardless of bond-lengths etc. So it is un-biased for elements etc. This is my main motivation for it, it is general in this sense.

So, point 3 would do it as @pfebrer suggests. And the Ring class should define whether it is a circular shape, or any pore. So probably the hierachy should be:

Pore | Hole
  |
  -> Ring
       |
       -> ArylRing

Intuitively I would say that @pfebrer's approach should be more stable (though maybe less efficient computationally) because there you are already working with the graph obtained from the atomic connections (the "neighborhood"). Your Voronoi approach, on the contrary, starts without a clue about the connections, and only looks at the atomic distribution of atoms. This makes it more general, but I think it may complicate the post-processing and more easily generate errors.

@pfebrer neighbor requires a correct radius input, and say an MD simulation where the radius' are slightly out of bounds would result in problems. At least here one could define them based on some other measures rather than aryl ring radius. Well, it of course needs to be explored how it should be done...

@pfebrer
Copy link
Contributor

pfebrer commented Apr 25, 2024

I also was thinking that if one can convert the structure to an image (where for example bonds are drawn in white over a black background, then one can apply typical algorithms to find contours (e.g. skimage's find_contours ). I don't know what's simpler, Voronoi or this. This approach I think would only work for 2D structures, and only if you find the plane. I don't know if the voronoi approach as you proposed extends to 3D (I haven't read the details). So probably not worth it, it is just fun to think of new approaches 😄

In any case, handling periodicity is not trivial, no? Would you expand to the auxiliary cell to run the algorithm?

@ialcon
Copy link
Author

ialcon commented Apr 25, 2024

In any case, handling periodicity is not trivial, no? Would you expand to the auxiliary cell to run the algorithm?

This is actually a very good point which, by the way, would be solved if we used graphs instead of Voronoi points (because periodicity is already accounted for in the neighbours list). Though, as @pfebrer suggests, maybe one could expand to the auxiliary cell to include inter-cell Voronoi points (i.e. inter-cell Rings).

@pfebrer neighbor requires a correct radius input, and say an MD simulation where the radius' are slightly out of bounds would result in problems. At least here one could define them based on some other measures rather than aryl ring radius. Well, it of course needs to be explored how it should be done...

I guess the question then is what is more robust (to, for e.g., a molecular dynamics): bond-lengths or ring-radius?

@pfebrer
Copy link
Contributor

pfebrer commented Apr 25, 2024

I think handling periodicity in the graph case is also not trivial. But maybe it is simpler, I don't know :)

@pfebrer
Copy link
Contributor

pfebrer commented Apr 25, 2024

By the way, your original goal (separating two ribbons in NPG) I think could be very easily solved by building the graph and then applying spectral clustering to separate the two parts (which you could do with sklearn: https://scikit-learn.org/stable/modules/clustering.html). You don't need to detect rings 😅

@ialcon
Copy link
Author

ialcon commented Apr 25, 2024

Well, the GNR separation in NPGs would be one usage for Rings - but there are many other cases where having a list of Rings becomes quite handy, like working with 1D/2D conjugated polymers, all sorts of organic compounds, and so on... So, a natural field where only aryl rings could have great use is organic chemistry, which is not a small area of chemistry 😛

That being said, the approach of @zerothi is even more general than that (detection of any type of Ring - not only aryl rings) being potentially applicable also to many inorganic materials. So yeah, I think this is a feature whose utility goes well beyond NPGs... 🙂

@ialcon
Copy link
Author

ialcon commented Apr 25, 2024

I think handling periodicity in the graph case is also not trivial. But maybe it is simpler, I don't know :)

But inter-cell bonds will be present in the Neighbourhood (or connections: list of bonds) that would be passed to the graph... wouldn't that allow the graph to spot inter-cell Rings?

@pfebrer
Copy link
Contributor

pfebrer commented Apr 26, 2024

There should probably be some differentiation between nodes of different cells, so that for example the searching algorithms don't see it as the same node. I think in the end one would need to create the graph for the auxiliary cell as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants