Commit 64f3175
[SPARK-4611][MLlib] Implement the efficient vector norm
The vector norm in Breeze is implemented with `activeIterator`, which is known to be very slow.
This PR implements an efficient vector norm. With this API, `Normalizer` and
`k-means` both see significant performance improvements.
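To illustrate the idea, here is a minimal sketch of a p-norm computed with an index-based `while` loop over a vector's stored values instead of an iterator. It is not the code from this commit: the object name `NormSketch` and the plain `Array[Double]` input are assumptions made to keep the example self-contained. For a sparse vector, the norm depends only on the stored non-zero values, so the same loop would apply to them.

```scala
// Hypothetical sketch, not the MLlib implementation from this commit.
object NormSketch {
  /** p-norm over a vector's stored values (all entries if dense, non-zeros if sparse). */
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be >= 1.0, got $p")
    val n = values.length
    var i = 0
    if (p == 1.0) {
      // L1 norm: sum of absolute values
      var sum = 0.0
      while (i < n) { sum += math.abs(values(i)); i += 1 }
      sum
    } else if (p == 2.0) {
      // L2 norm: square root of the sum of squares
      var sum = 0.0
      while (i < n) { sum += values(i) * values(i); i += 1 }
      math.sqrt(sum)
    } else if (p == Double.PositiveInfinity) {
      // L-infinity norm: largest absolute value
      var max = 0.0
      while (i < n) {
        val v = math.abs(values(i))
        if (v > max) max = v
        i += 1
      }
      max
    } else {
      // General case: (sum |x_i|^p)^(1/p)
      var sum = 0.0
      while (i < n) { sum += math.pow(math.abs(values(i)), p); i += 1 }
      math.pow(sum, 1.0 / p)
    }
  }
}
```

The special cases for p = 1, p = 2, and p = infinity avoid calling `math.pow` per element, which is a common reason to branch on `p` once, outside the loop.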
Here are the benchmark results on the mnist8m dataset.
a) `Normalizer`
Before
DenseVector: 68.25secs
SparseVector: 17.01secs
With this PR
DenseVector: 12.71secs
SparseVector: 2.73secs
b) `k-means`
Before
DenseVector: 83.46secs
SparseVector: 61.60secs
With this PR
DenseVector: 70.04secs
SparseVector: 59.05secs
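The speedups implied by these timings can be worked out with a short calculation. The sketch below is only arithmetic on the numbers reported above; the names `SpeedupSketch` and `speedup` are hypothetical.

```scala
// Hypothetical helper: computes before/after ratios from the timings listed above.
object SpeedupSketch {
  /** Ratio of the old running time to the new one; values above 1 mean faster. */
  def speedup(beforeSecs: Double, afterSecs: Double): Double = beforeSecs / afterSecs

  def main(args: Array[String]): Unit = {
    // (benchmark, seconds before, seconds with this PR), copied from the commit message
    val runs = Seq(
      ("Normalizer, DenseVector", 68.25, 12.71),
      ("Normalizer, SparseVector", 17.01, 2.73),
      ("k-means, DenseVector", 83.46, 70.04),
      ("k-means, SparseVector", 61.60, 59.05)
    )
    runs.foreach { case (name, before, after) =>
      println(f"$name%-26s ${speedup(before, after)}%.2fx")
    }
  }
}
```

By this calculation `Normalizer` runs roughly 5.4x (dense) and 6.2x (sparse) faster, while `k-means` improves by about 1.2x (dense) and 1.04x (sparse).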
Author: DB Tsai <[email protected]>
Closes #3462 from dbtsai/norm and squashes the following commits:
63c7165 [DB Tsai] typo
0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back
6fa616c [DB Tsai] address feedback
9b7cb56 [DB Tsai] move norm to static method
0b632e6 [DB Tsai] kmeans
dbed124 [DB Tsai] style
c1a877c [DB Tsai] first commit
1 parent b0a46d8, commit 64f3175
File tree
4 files changed, +79 -6 lines changed
- mllib/src
  - main/scala/org/apache/spark/mllib
    - clustering
    - feature
    - linalg
  - test/scala/org/apache/spark/mllib/linalg
Lines changed: 3 additions & 3 deletions
(diff body not captured; original lines 22, 128, and 428 were each replaced)
Lines changed: 1 addition & 3 deletions
(diff body not captured; original lines 20-21 were removed and original line 50 was replaced by new line 48)
Lines changed: 51 additions & 0 deletions
(diff body not captured; new lines 264-314 were added)
Lines changed: 24 additions & 0 deletions
(diff body not captured; new line 24 and new lines 201-223 were added)