I've investigated pyclustering for time series clustering with a custom metric, namely DTW (dynamic time warping). See below for the custom metric and the code. The result is that pyclustering gets stuck at a local optimum, even though it is able to use the custom metric. That is because pyclustering does not know how to update the centers based on the custom metric: `kmeans._kmeans__update_centers` does not take the custom metric into account and uses `centers[index] = cluster_points.mean(axis=0)`.
A possible solution is to let the user supply a custom cluster-center update strategy.
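For concreteness, one way such an update strategy could look (a sketch only; pyclustering has no such hook today, and `medoid_update` is a hypothetical name): instead of the coordinate-wise mean, pick the cluster member with the smallest summed distance to the rest under the custom metric, i.e. a k-medoids-style step that stays valid under DTW.

```python
import numpy as np

def medoid_update(cluster_points, metric):
    """Metric-aware replacement for `centers[index] = cluster_points.mean(axis=0)`.

    Returns the member of `cluster_points` with the smallest summed distance
    to all other members under `metric` (a k-medoids-style update), so the
    new center is always a real series and the custom metric is respected.
    """
    totals = [sum(metric(p, q) for q in cluster_points) for p in cluster_points]
    return cluster_points[int(np.argmin(totals))]

# Example with a plain metric standing in for DTW:
sq_euclidean = lambda a, b: float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))
series = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([10.0, 10.0])]
center = medoid_update(series, sq_euclidean)  # -> array([1., 1.])
```

A mean-based update implicitly assumes a Euclidean geometry; a medoid step only ever evaluates the metric itself, which is why it works for DTW at the cost of O(k·n²) distance evaluations per iteration.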
In the meantime I'm using another package, but I love pyclustering's idea of letting the user supply a custom metric, which is well suited to research use.
```python
import array
import math

import numpy as np
from numba import jit


@jit(nopython=True)
def jit_dtw_distance(s1, s2):
    """Dynamic Time Warping.

    jitted version of:
    https://github.com/wannesm/dtaidistance/blob/master/dtaidistance/dtw.py#L186
    @5ebdd25b820a02dde7f25f05b4cf7c0faad9840d

    Improved from 53.7 ms ± 4.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    to 127 µs ± 15.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each).

    This function keeps a compact matrix, not the full warping paths matrix.
    Uses dynamic programming to compute:

        wps[i, j] = (s1[i]-s2[j])**2 + min(
            wps[i-1, j  ] + penalty,  # vertical   / insertion / expansion
            wps[i  , j-1] + penalty,  # horizontal / deletion  / compression
            wps[i-1, j-1])            # diagonal   / match

        dtw = sqrt(wps[-1, -1])

    :param s1: First sequence
    :param s2: Second sequence
    :param window: Only allow for maximal shifts from the two diagonals smaller
        than this number. It includes the diagonal, meaning that an Euclidean
        distance is obtained by setting window=1.
    :param max_dist: Stop if the returned values will be larger than this value
    :param max_step: Do not allow steps larger than this value
    :param max_length_diff: Return infinity if length of two series is larger
    :param penalty: Penalty to add if compression or expansion is applied
    :param psi: Psi relaxation parameter (ignore start and end of matching).
        Useful for cyclical series.
    :param use_c: Use fast pure c compiled functions
    :param use_pruning: Prune values based on Euclidean distance. This is the
        same as passing ub_euclidean() to max_dist
    :param only_ub: Only compute the upper bound (Euclidean).

    Returns: DTW distance
    """
    r, c = len(s1), len(s2)
    window = max(r, c)
    max_step = np.inf
    max_dist = np.inf
    penalty = 0
    psi_1b, psi_1e, psi_2b, psi_2e = (0, 0, 0, 0)
    length = min(c + 1, abs(r - c) + 2 * (window - 1) + 1 + 1 + 1)
    dtw = np.inf * np.ones(2 * length)
    sc = 0
    ec = 0
    ec_next = 0
    smaller_found = False
    for i in range(psi_2b + 1):
        dtw[i] = 0
    skip = 0
    i0 = 1
    i1 = 0
    psi_shortest = np.inf
    for i in range(r):
        skipp = skip
        skip = max(0, i - max(0, r - c) - window + 1)
        i0 = 1 - i0
        i1 = 1 - i1
        for ii in range(i1 * length, i1 * length + length):
            dtw[ii] = np.inf
        j_start = max(0, i - max(0, r - c) - window + 1)
        j_end = min(c, i + max(0, c - r) + window)
        if sc > j_start:
            j_start = sc
        smaller_found = False
        ec_next = i
        if length == c + 1:
            skip = 0
        if psi_1b != 0 and j_start == 0 and i < psi_1b:
            dtw[i1 * length] = 0
        for j in range(j_start, j_end):
            d = (s1[i] - s2[j]) ** 2
            if d > max_step:
                continue
            assert j + 1 - skip >= 0
            assert j - skipp >= 0
            assert j + 1 - skipp >= 0
            assert j - skip >= 0
            dtw[i1 * length + j + 1 - skip] = d + min(dtw[i0 * length + j - skipp],
                                                      dtw[i0 * length + j + 1 - skipp] + penalty,
                                                      dtw[i1 * length + j - skip] + penalty)
            if dtw[i1 * length + j + 1 - skip] > max_dist:
                if not smaller_found:
                    sc = j + 1
                if j >= ec:
                    break
            else:
                smaller_found = True
                ec_next = j + 1
        ec = ec_next
        if psi_1e != 0 and j_end == len(s2) and len(s1) - 1 - i <= psi_1e:
            psi_shortest = min(psi_shortest, dtw[i1 * length + j_end - skip])
    if psi_1e == 0 and psi_2e == 0:
        d = dtw[i1 * length + min(c, c + window - 1) - skip]
    else:
        ic = min(c, c + window - 1) - skip
        if psi_2e != 0:
            vc = dtw[i1 * length + ic - psi_2e:i1 * length + ic + 1]
            d = min(np.min(vc), psi_shortest)
        else:
            d = min(dtw[i1 * length + min(c, c + window - 1) - skip], psi_shortest)
    if max_dist and d > max_dist:
        d = np.inf
    return d
```
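Note the compact version above returns the accumulated cost without a final square root, despite the `dtw = sqrt(wps[-1, -1])` line in its docstring. For a quick sanity check, the docstring's recurrence can be run directly on a full warping-path matrix (a sketch only; `full_matrix_dtw` is a hypothetical name, not part of dtaidistance or pyclustering) and compared against `jit_dtw_distance` under the defaults hard-coded above (full window, zero penalty, psi disabled):

```python
import numpy as np

def full_matrix_dtw(s1, s2):
    # Direct implementation of the docstring recurrence with no window,
    # penalty, or psi relaxation:
    #   wps[i, j] = (s1[i]-s2[j])**2 + min(wps[i-1, j], wps[i, j-1], wps[i-1, j-1])
    r, c = len(s1), len(s2)
    wps = np.full((r + 1, c + 1), np.inf)
    wps[0, 0] = 0.0
    for i in range(r):
        for j in range(c):
            d = (s1[i] - s2[j]) ** 2
            wps[i + 1, j + 1] = d + min(wps[i, j + 1], wps[i + 1, j], wps[i, j])
    # Return the accumulated squared cost, matching the compact version's
    # return value (no final sqrt).
    return wps[r, c]
```

For example, `full_matrix_dtw([0.0, 1.0], [0.0, 2.0])` gives `1.0`: the warping path aligns 1 with 2 at squared cost 1, and everything else matches exactly.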