Faster custom metric kmeans #482

Closed
cugurm opened this issue Jan 4, 2019 · 4 comments

cugurm commented Jan 4, 2019

First of all, great job! Your library is awesome.

Is there any way to use the C++ core (ccore) with a custom metric in your kmeans implementation?

When I specify my custom metric for your kmeans, it's too slow!
I can't use numpy:

    if self.__metric.get_type() != type_metric.USER_DEFINED:
        self.__metric.enable_numpy_usage()
    else:
        self.__metric.disable_numpy_usage()

And I also can't use the C++ core:

    self.__ccore = ccore and self.__metric.get_type() != type_metric.USER_DEFINED
    if self.__ccore is True:
        self.__process_by_ccore()
    else:
        self.__process_by_python()

But __process_by_python() is really slow for my task.

Thanks in advance,
Milan
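
For context, a rough sketch of why a user-defined metric tends to land on the slow path: each point-to-center distance becomes a separate Python function call. This is not pyclustering's code. `assign_clusters` and `my_metric` are hypothetical names, and the loop only illustrates a single k-means assignment step in pure Python.

```python
def my_metric(a, b):
    # Hypothetical user-defined metric (assumption): plain Manhattan distance.
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_clusters(points, centers, metric):
    """Return, for each point, the index of its nearest center.

    The metric is an arbitrary Python callable, so it is invoked
    len(points) * len(centers) times per assignment step.
    """
    labels = []
    for point in points:
        distances = [metric(point, center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels


points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
labels = assign_clusters(points, centers, my_metric)
```

With n points and k centers, the callable runs n * k times per iteration, which is the part a vectorized or compiled metric would avoid.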

Owner

annoviko commented Jan 6, 2019

Hello, @MilanCugur,
Thank you for your report. It means it is time to raise the priority of issue #422: on some platforms there is a crash when the C++ implementation calls a user-defined metric (Python code). I will investigate the problem and look for a solution. I also have one more question: which metric do you use in your case? As a workaround I have tried to provide the most well-known metrics, but I have probably missed something.

annoviko self-assigned this Jan 6, 2019
annoviko added the labels Enhancement (tasks related to enhancement and development) and Investigation (tasks related to investigation of found issues) Jan 6, 2019
Author

cugurm commented Jan 6, 2019

Hello @annoviko, thank you for your fast reply!

Your list of metrics is extensive and covers all the main ones, but I would like to try a chi-square metric for my clustering.
The kind of clustering I want to implement is similar to the one described at http://www.btluke.com/clusdis.html.

There are also a couple of scientific papers published on this topic.
When can I expect the update? Can I help in some way?

Owner

annoviko commented Jan 8, 2019

@MilanCugur,
I have looked at the article briefly, and it looks like the Canberra distance and the chi-square distance are calculated between two points and do not require additional parameters. Therefore, supporting them in both Python and C++ would be the easiest first step. As a second step, it would be nice to resolve the problem with the user's callback in the C++ implementation.

The first part (introducing the additional metrics): I will be able to provide it during this week.
The second part requires some digging, and I am not ready to provide estimates. If you have experience with Python callbacks in libraries, you can help to understand what is wrong.
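
To illustrate the point that both distances depend only on the two points, here is a minimal pure-Python sketch. It is an assumption about the formulas, not the library's implementation: the chi-square distance has several definitions in the literature (some include a factor of 1/2), and this version uses the sum of (x - y)^2 / (|x| + |y|), skipping terms where the denominator is zero.

```python
def canberra_distance(a, b):
    """Canberra distance: sum of |x - y| / (|x| + |y|) over coordinates."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom == 0.0:
            continue  # both coordinates are zero; treat the 0/0 term as 0
        total += abs(x - y) / denom
    return total


def chi_square_distance(a, b):
    """Chi-square distance (assumed form): sum of (x - y)^2 / (|x| + |y|)."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom == 0.0:
            continue  # both coordinates are zero; treat the 0/0 term as 0
        total += (x - y) ** 2 / denom
    return total


p, q = [1.0, 2.0, 3.0], [1.0, 4.0, 0.0]
canberra = canberra_distance(p, q)      # 0 + 2/6 + 3/3
chi_square = chi_square_distance(p, q)  # 0 + 4/6 + 9/3
```

Neither function takes anything beyond the two points, which is what makes them straightforward to port to a compiled core.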

annoviko added a commit that referenced this issue Jan 10, 2019
annoviko added a commit that referenced this issue Jan 10, 2019
annoviko added a commit that referenced this issue Jan 11, 2019
Owner

annoviko commented Jan 13, 2019

@MilanCugur, I have added the Canberra and chi-square distances to the library. They are available on the master branch and will be included in the next release, 0.9.0. You need to build the library's core (the C++ part of the library) from sources; there are instructions on how to do that: https://github.com/annoviko/pyclustering/wiki/Core-of-the-PyClustering .

If you have any questions or troubles related to pyclustering, do not hesitate to ask them :-) .
