Performance Issue - OPTICS #521

swetha0613 · 2019-06-18T15:15:34Z

I am running the OPTICS algorithm on 50k data points. Since the data is text, it has around 5k features. The run time seems huge. I tried using ccore, but it doesn't seem to help. Is there any way I could improve performance?

annoviko · 2019-06-18T16:15:38Z

Hello, @mallika0613,

Are you sure that the core is used? Which version do you use? Is it possible to see the input data?

swetha0613 · 2019-06-18T16:28:30Z

I am using Python 3.6. How do I check whether the core is used?

annoviko · 2019-06-18T16:42:16Z

@mallika0613, I mean the pyclustering version: which pyclustering version do you use? Have you seen a warning message such as "The pyclustering ccore is not supported for platform..." or something similar?

You can start a debugging session and check which method process() uses for processing: __process_by_ccore or __process_by_python.
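As a complementary check outside the debugger, the sketch below tests whether a compiled shared library can be loaded at all. It assumes the core is an ordinary shared library loaded through ctypes. The function name `can_load_shared_library` and the path in the usage line are hypothetical, so substitute the actual location of the built library in your installation.

```python
import ctypes


def can_load_shared_library(path: str) -> bool:
    """Return True if the shared library at `path` can be loaded by ctypes."""
    try:
        ctypes.CDLL(path)
    except OSError:
        # Raised when the file is missing, built for another architecture,
        # or depends on libraries that cannot be resolved.
        return False
    return True


# Hypothetical path, for illustration only.
print(can_load_shared_library("/path/to/compiled/core/library.so"))
```

A `False` result only says that loading failed. It does not say why, so the architecture and the library's dependencies are the next things to check.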

swetha0613 · 2019-06-18T19:58:06Z

I think you are right, the core is not being used. But I also don't see the "ccore is not supported" message.
I am using version 0.8.1.

annoviko · 2019-06-19T09:00:37Z

@mallika0613 ,

How did you install the library?
What kind of operating system do you have? If your operating system is macOS, then you need to install version 0.9.0, where the core is supported for macOS.

pip3 install pyclustering

swetha0613 · 2019-06-19T12:00:36Z

I am running it on an AWS instance. I used the pip command to install the library.

annoviko · 2019-06-19T12:52:00Z

@mallika0613, is there any information about the hardware platform and operating system?

swetha0613 · 2019-06-19T13:04:50Z

It runs Linux with 488 GB of memory and 64 CPUs.

annoviko · 2019-06-19T13:21:34Z

@mallika0613, what is the CPU architecture (for example, x86 or x86_64)?
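One way to collect this information from inside Python is the standard `platform` module, sketched below. The helper name `describe_platform` is made up for this example. The interpreter's own bitness is included because a 32-bit interpreter cannot load a 64-bit shared library, even on a 64-bit machine.

```python
import platform


def describe_platform() -> dict:
    """Collect the OS name, CPU architecture, and interpreter bitness."""
    return {
        "system": platform.system(),                  # e.g. "Linux"
        "machine": platform.machine(),                # e.g. "x86_64"
        "python_bits": platform.architecture()[0],    # e.g. "64bit"
    }


for key, value in describe_platform().items():
    print(f"{key}: {value}")
```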

swetha0613 · 2019-06-19T13:23:32Z

It is x86_64

annoviko · 2019-06-19T13:27:01Z

x86_64 is supported. OK, you can try to rebuild the core manually:

$ cd pyclustering/ccore
$ make ccore_x64

And please check that ccore is used instead of Python after that.

swetha0613 · 2019-06-19T14:00:47Z

When I try to build it with
make ccore
it tries to build for 32-bit, and the 64-bit build seems to fail.

annoviko · 2019-06-19T14:05:55Z

@mallika0613, make ccore tries to build the core for both x86 (32-bit) and x86_64. In your case there is no need to build the 32-bit version, which is why I wrote make ccore_x64. It looks like the 64-bit version was built successfully, so everything is OK.

swetha0613 · 2019-06-19T15:49:03Z

OK, then I think the installation was successful.
But I still don't see any improvement in performance.

annoviko · 2019-06-19T16:04:39Z

@mallika0613, just to be sure, could you please do the following:

$ make clean
$ make ccore_x64

swetha0613 · 2019-06-20T16:25:34Z

I followed the steps, but I don't think it improves the performance.
Also, a quick observation: for 40k data points it takes around 11 hours, and for 50k it has been running for more than 24 hours. I am not sure whether it is still running or stuck.
Is it because of the huge number of features?
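A rough back-of-the-envelope estimate can help judge whether the 50k run is plausibly still working. The sketch below assumes that run time grows quadratically with the number of points, which is typical for OPTICS when neighbor searches cannot benefit from a spatial index. That is an assumption about scaling, not a measurement of pyclustering. Both function names are made up for this example.

```python
def extrapolate_quadratic(known_n: int, known_hours: float, target_n: int) -> float:
    """Scale a measured run time to another input size, assuming O(n^2) growth."""
    return known_hours * (target_n / known_n) ** 2


def pairwise_feature_ops(n_points: int, n_features: int) -> int:
    """Count per-feature operations needed to compare every point with every point."""
    return n_points * n_points * n_features


# Figures reported in this thread: 40k points took about 11 hours.
estimate = extrapolate_quadratic(40_000, 11.0, 50_000)
print(f"estimated hours for 50k points: {estimate:.2f}")

# 50k points with about 5k features each.
print(f"per-feature operations: {pairwise_feature_ops(50_000, 5_000):.3e}")
```

Under this assumption the 50k run would be expected to take roughly 17 hours, so a run that exceeds 24 hours is slower than quadratic scaling alone would predict. The operation count also shows that the number of features multiplies the cost of every distance computation.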

annoviko · 2019-06-21T08:51:28Z

@mallika0613, clustering speed can be affected by data complexity, that's true. I will investigate the performance issues, but for now I recommend trying other algorithms or other libraries, such as scikit-learn or ELKI.

swetha0613 · 2019-06-21T13:09:28Z

Sure, thank you. Also, a quick question: is it possible to extract important features from the model?

annoviko · 2019-07-19T10:39:46Z

@mallika0613, I have reduced the algorithmic complexity, which should help. There is also an additional issue, #379, that should improve performance further once it is done.

Well-scattered clusters and well-separated 10 clusters

r = 1.0, eps = 3
N               Optimized       Old Implementation
1000            0.00778         0.00671
10000           0.542           0.51
20000           2.05            2.05
30000           4.62            4.58

r = 0.1, eps = 3
N               Optimized       Old Implementation
30000           4.59            4.65

r = 0.01, eps = 3
N               Optimized       Old Implementation
30000           4.61            4.63
50000           12.91           12.94

Other Samples   Optimized       Old Implementation
Engy Time:      0.0388          0.0442
Atom:           0.0405          0.0419
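To read these timings as relative changes, the sketch below computes (optimized - old) / old for a few rows copied from the table. The units are not stated in the thread, and the helper name `relative_change` is made up for this example. A negative value means the optimized version was faster.

```python
def relative_change(optimized: float, old: float) -> float:
    """Relative difference of the optimized timing versus the old one."""
    return (optimized - old) / old


# (r, N, optimized, old) rows copied from the table above.
rows = [
    (1.0, 1000, 0.00778, 0.00671),
    (1.0, 10000, 0.542, 0.51),
    (1.0, 20000, 2.05, 2.05),
    (1.0, 30000, 4.62, 4.58),
    (0.1, 30000, 4.59, 4.65),
    (0.01, 30000, 4.61, 4.63),
    (0.01, 50000, 12.91, 12.94),
]

for r, n, optimized, old in rows:
    change = relative_change(optimized, old)
    print(f"r={r:<5} N={n:<6} change={change:+.2%}")
```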

annoviko added the "Question" label (tasks that are questions from users) Jun 18, 2019

annoviko added the "Investigation" (tasks related to investigation of found issues) and "Optimization" (tasks related to code optimization) labels and removed the "Question" label Jul 15, 2019

annoviko added a commit that referenced this issue Jul 16, 2019

#521: OPTICS optimization - reduce algorithmic complexity.

e5e01a9

annoviko added a commit that referenced this issue Jul 16, 2019

#521: OPTICS optimization - reduce algorithmic complexity.

56bc237

annoviko added a commit that referenced this issue Jul 16, 2019

#521: OPTICS optimization - update file CHANGES.

7538f59

annoviko added a commit that referenced this issue Jul 16, 2019

#521: OPTICS optimization - corrections for static analyser remarks.

414be02

annoviko added a commit that referenced this issue Jul 19, 2019

#521: Performance tests are added. [no-build].

3d52b4a

annoviko closed this as completed Jul 30, 2019

annoviko self-assigned this Jul 30, 2019
