Performance Issue - OPTICS #521

Closed
swetha0613 opened this issue Jun 18, 2019 · 19 comments
@swetha0613

I am running the OPTICS algorithm on 50k data points. Since the data is text, it has around 5k features. The run time seems huge. I tried using ccore, but it doesn't seem to improve things. Is there any way I could improve performance?

@annoviko
Owner

Hello, @mallika0613,

Are you sure that the core is used? Which version do you use? Is it possible to see the input data?

@annoviko added the Question label on Jun 18, 2019
@swetha0613
Author

swetha0613 commented Jun 18, 2019

I am using Python 3.6. How do I check whether the core is used?

@annoviko
Owner

@mallika0613 , I mean the pyclustering version: which pyclustering version do you use? Have you seen a warning message like "The pyclustering ccore is not supported for platform..." or something similar?

You can start a debugging session and check which method is used for processing inside the process() method: __process_by_ccore or __process_by_python.
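
A quicker check that does not need a debugger is to ask the wrapper module whether the compiled core can be loaded. This is only a sketch: it assumes that pyclustering.core.wrapper exposes ccore_library.workable(), which should be verified against the installed release, and ccore_status is a hypothetical helper name.

```python
def ccore_status():
    """Report whether pyclustering can load its compiled core (ccore)."""
    try:
        # Assumption: this module and class exist in the installed release.
        from pyclustering.core.wrapper import ccore_library
    except ImportError:
        return "unavailable"  # pyclustering (or its wrapper) cannot be imported

    workable = getattr(ccore_library, "workable", None)
    if workable is None:
        return "unknown"  # this release does not expose workable()

    # True: the shared library loaded. False: the Python fallback will run.
    return "ccore" if workable() else "python"


print(ccore_status())
```

If this prints "python", the algorithms fall back to the pure Python implementation even when ccore=True is passed.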

@swetha0613
Author

I think you are right, the core is not being used. But I also don't see the "ccore is not supported" message.
I am using version 0.8.1.

@annoviko
Owner

@mallika0613 ,

How did you install the library?
What kind of operating system do you have? If your operating system is macOS, then you need to install version 0.9.0, where the core is supported for macOS.

pip3 install pyclustering

@swetha0613
Author

I am running it on an AWS instance. I used the pip command to install the library.

@annoviko
Owner

@mallika0613 , is there any information about the hardware platform and operating system?

@swetha0613
Author

It runs Linux with 488 GB of memory and 64 CPUs.

@annoviko
Owner

@mallika0613 , what is the CPU architecture (for example, x86 or x86_64)?
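
One way to collect this information from the same Python interpreter that runs pyclustering is the standard platform module. A minimal sketch; platform_summary is a hypothetical helper name:

```python
import platform


def platform_summary():
    """Collect OS and CPU details relevant to choosing a ccore build."""
    return {
        "system": platform.system(),           # e.g. "Linux", "Darwin", "Windows"
        "machine": platform.machine(),         # e.g. "x86_64"
        "bits": platform.architecture()[0],    # bitness of the Python executable
        "python": platform.python_version(),
    }


for key, value in platform_summary().items():
    print(f"{key}: {value}")
```

The "bits" entry describes the interpreter, not the CPU, which matters because a 64-bit library cannot be loaded by a 32-bit Python.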

@swetha0613
Author

It is x86_64

@annoviko
Owner

annoviko commented Jun 19, 2019

x86_64 is supported. OK, you can try to rebuild the core manually:

$ cd pyclustering/ccore
$ make ccore_x64

And, please, check that ccore is used instead of python after that.

@swetha0613
Author

When I try to build it with
make ccore
it tries to build for 32-bit, and the 64-bit build seems to fail.
[Screenshot 2019-06-19 at 2 58 22 PM]

@annoviko
Owner

@mallika0613 , make ccore tries to build the core for both x86 (32-bit) and x86_64. In your case there is no need to build the 32-bit version, which is why I wrote make ccore_x64. It looks like the 64-bit version was built successfully, so everything is OK.

@swetha0613
Author

OK, then I think the installation is successful.
But I still don't see any improvement in performance.

@annoviko
Owner

@mallika0613 , just to be sure, could you please do the following:

$ make clean
$ make ccore_x64

@swetha0613
Author

I followed the steps, but I don't think it is improving the performance.
Also, a quick observation: for 40k data points it takes around 11 hrs, and for 50k it has been running for more than 24 hrs. I am not sure whether it is still running or stuck.
Is it because of the huge number of features?
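
As a rough sanity check on these numbers: if the run time is dominated by pairwise distance computations, it should grow roughly with the square of the number of points. That quadratic model is an assumption, not something measured in this thread, and quadratic_estimate is a hypothetical helper name.

```python
def quadratic_estimate(hours_known, n_known, n_target):
    """Extrapolate run time assuming cost grows with the square of N."""
    return hours_known * (n_target / n_known) ** 2


# 11 hours were observed for 40k points; extrapolate to 50k points.
estimate = quadratic_estimate(11.0, 40_000, 50_000)
print(f"{estimate:.1f} h")  # prints "17.2 h"
```

Under that assumption 50k points would need about 17 hours, so a run still going after 24 hours is slower than quadratic scaling alone would predict.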

@annoviko
Owner

annoviko commented Jun 21, 2019

@mallika0613 , clustering speed can be affected by data complexity, that's true. I will investigate the performance issues, but for now I recommend trying other algorithms or other libraries, such as scikit-learn or ELKI.

@swetha0613
Author

Sure, thank you. Also a quick check, is it possible to extract important features from the model?

@annoviko added the Investigation and Optimization labels and removed the Question label on Jul 15, 2019
@annoviko
Owner

@mallika0613 , I have reduced the algorithmic complexity, which should help. There is also an additional issue, #379, that should improve performance further once it is done.

Well-scattered and well-separated clusters (10 clusters)

r = 1.0, eps = 3
N               Optimized       Old implementation
1000            0.00778         0.00671
10000           0.542           0.51
20000           2.05            2.05
30000           4.62            4.58

r = 0.1, eps = 3
N               Optimized       Old implementation
30000           4.59            4.65

r = 0.01, eps = 3
N               Optimized       Old implementation
30000           4.61            4.63
50000           12.91           12.94

Other samples   Optimized       Old implementation
Engy Time       0.0388          0.0442
Atom            0.0405          0.0419
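
Timings of this kind can be collected with a small harness like the sketch below. It is not the benchmark used for the tables above: the random 2-D samples, the helper names time_clustering and placeholder, and the stand-in workload are all assumptions, and the harness reports seconds even though the thread does not state the units of the tables.

```python
import random
import time


def time_clustering(cluster_fn, sizes, dims=2, seed=0):
    """Time cluster_fn on random samples of the given sizes (in seconds)."""
    rng = random.Random(seed)
    timings = {}
    for n in sizes:
        # Uniform random points; real benchmarks should use realistic data.
        sample = [[rng.random() for _ in range(dims)] for _ in range(n)]
        start = time.perf_counter()
        cluster_fn(sample)
        timings[n] = time.perf_counter() - start
    return timings


def placeholder(sample):
    # Stand-in workload; replace with the clustering call being measured.
    return sorted(sample)


for n, seconds in time_clustering(placeholder, [1000, 10000]).items():
    print(f"{n:<16}{seconds:.5f}")
```

Running the same harness once with the compiled core enabled and once with it disabled gives a direct comparison on identical input.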

@annoviko annoviko self-assigned this Jul 30, 2019