-
Notifications
You must be signed in to change notification settings - Fork 1
/
stepfitting_library.py
1751 lines (1564 loc) · 78.3 KB
/
stepfitting_library.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
#!/home/boulgakov/anaconda3/bin/python
"""
Library for for fitting step functions to peptide photometries.
Characterizing the photometry of labeled peptides is one of the pillars of
fluorosequencing. Discrete steps in fluorescence are a frequently-encountered
fluor behavior. They may, for example, occur during photobleaching or may be
used as a fluor-counting strategy. This behavior is an idealization of
fluorophore physics: we are hoping that fluors are either on or off, with their
emission being a constant, with some noise around this mean. With this in mind,
we try to fit "step functions" to these steps to infer the status of labeled
peptides' fluors.
The name "step functions" is due to their resemblence to steps when plotted in
Cartesian coordinates. In Cartesian nomenclature, frames of a field of view
(representing time) comprise the horizontal axis, and Spot luminosity across
the frames the vertical axis. Hence, the steps are a series of horizontal lines
interpolating the regions of luminosity that are constant (within noise),
connected by vertical steps. Such a series of steps is also sometimes called a
"step train".
For convenience, instead of discussing fitting steps to the data, we will
discuss the equivalent task of fitting a consecutive series of horizontal
"plateaus" -- i.e. the horizontal lines -- across frames. Fitting a plateau
across a series of frames means that these frames' deviations in luminosity
from the fitted plateau are minimized, and that we are thus inferring that
during these frames the fluorescence of the Spot is constant within noise.
These plateaus can be fully characterized by three numbers: (starting frame,
ending frame, height). "height" is luminosity. Note that the plateau defined by
this tuple covers both the starting and ending frame, inclusively (i.e. it does
not follow the Python list syntax of the last number in foo[a:b] being member #
b - 1). A step fit is a sequence of such plateaus as represented by a list:
[(plateau_0_start, plateau_0_stop, plateau_0_height),
(plateau_1_start, plateau_1_stop, plateau_1_height),
(plateau_2_start, plateau_2_stop, plateau_2_height),
...
(plateau_n_start, plateau_n_stop, plateau_n_height)]
This library provides algorithms to fit such plateaus to a sequence of
numbers -- ostensibly representing a peptide's luminosity recorded through
time.
"""
import logging
import datetime
import time
import numpy as np
import itertools
from operator import itemgetter
from scipy.signal import medfilt
from scipy.stats import ttest_ind, linregress
import math
#if the calling application does not have logging setup, stepfitting_library
#will log to NullHandler by default
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
def _plateau_squared_residuals(luminosities, plateau):
"""
Calculate squared residuals for a sequence of points deviating from a
horizontal line fit.
Arguments:
luminosities: List of luminosities of a Spot through frames.
plateau: Plateau to compare residuals against.
Returns:
Sum of squares of distances of points and the plateau.
"""
start, stop, height = plateau
return sum([(lum - height)**2 for lum in luminosities[start:stop + 1]])
def _plateaus_squared_residuals(luminosities, plateaus):
"""
Sums _plateau_squared_residuals for multiple plateaus.
"""
return sum(_plateau_squared_residuals(luminosities, plateau)
for plateau in plateaus)
def _fit_plateau(luminosities, starting_frame, stopping_frame):
"""
Get the best-fit plateau to a sequence of luminosities that minimizes
the sum of their squared residuals.
Arguments:
luminosities: List of luminosities of a Spot through frames.
starting_frame and stopping_frame: The line should fit all points
starting with starting_frame and ending with stopping_frame.
Returns:
Plateau at the height of the best-fitting horizontal line.
"""
if not 0 <= starting_frame <= stopping_frame < len(luminosities):
raise ValueError("Invalid (starting_frame, stopping_frame): " +
str((starting_frame, stopping_frame)) +
" with len(luminosities) = " +
str(len(luminosities)))
return (starting_frame, stopping_frame,
np.mean(luminosities[starting_frame: stopping_frame + 1]))
def _split_plateau(luminosities, plateau, forbidden_splits=None,
min_step_magnitude=5000):
"""
Split a plateau into two plateaus, such that the split yields the smallest
possible sum of squares of residues.
Arguments:
luminosities: List of luminosities of the Spot through frames.
plateau: Plateau to split.
forbidden_splits: If not None, this is a list of splits that are not
allowed. Each split is represented by a tuple (stop, start), where
stop and start are the consecutive frames between which a split is
not allowed.
min_step_magnitude: Minimum size of allowed fitted step, i.e. the
difference in height of the two resulting plateaus.
Returns:
(left_plateau, left_plateau_residuals,
right_plateau, right_plateau_residuals,
total_residuals)
left_plateau and right_plateau are returned as None if a possible split
is not found within the constraints given.
"""
logger = logging.getLogger()
logging.debug("stepfitting_library._split_plateau locals() = " +
str(locals()))
start, stop, height = plateau
if not 0 <= start <= stop < len(luminosities):
raise ValueError("plateau start and stop does not fit within " +
"luminosities; locals() = " + str(locals()))
#initialize forbidden_splits as a set
if forbidden_splits is None:
forbidden_splits = set()
else:
forbidden_splits = set(forbidden_splits)
#stores best split found so far
#(left_plateau, left_residuals,
# right_plateau, right_residuals,
# total_residuals)
best_split = (None,
len(luminosities) *
(np.amax(luminosities) - np.amin(luminosities))**2,
None,
len(luminosities) *
(np.amax(luminosities) - np.amin(luminosities))**2,
2 * len(luminosities) *
(np.amax(luminosities) - np.amin(luminosities))**2)
for s in range(start, stop):
if (s, s + 1) in forbidden_splits:
continue
left_plateau = _fit_plateau(luminosities, start, s)
right_plateau = _fit_plateau(luminosities, s + 1, stop)
if abs(left_plateau[2] - right_plateau[2]) < min_step_magnitude:
continue
left_residuals = _plateau_squared_residuals(luminosities,
left_plateau)
right_residuals = _plateau_squared_residuals(luminosities,
right_plateau)
total_residuals = left_residuals + right_residuals
if total_residuals <= best_split[4]: #<= instead of < for flat case
best_split = (left_plateau, left_residuals,
right_plateau, right_residuals,
total_residuals)
logger.debug("stepfitting_library._split_plateau: returning best_split " +
"= " + str(best_split))
return best_split
def _best_split(luminosities, plateaus, bestfit_plateaus=None,
min_step_length=2, min_step_magnitude=5000):
"""
Find the plateau split amongst plateaus that minimizes the total sum of
squared residuals across all plateaus.
Arguments:
luminosities: List of luminosities of the Spot through frames.
plateaus: Existing sequence of plateaus to try splitting.
bestfit_plateaus: If not None, indicates that we're performing a
counter-fit and thus must constrain any new split not to align with
any boundary of any plateau in bestfit_plateaus. Furthermore, this
constrains any split so that there cannot be more than one counter-
fit plateau split per bestfit plateau. This option is designed
specifically for chi_squared_step_fitter.
min_step_length: Minimum length of fitted steps.
min_step_magnitude: Minimum size of allowed fitted step.
Returns:
The sequence of plataus given, with one of them split into two such
that this split has the lowest sum of squared residues possible.
Returns None if plateaus cannot be split (due to e.g. bestfit_plateaus
blocking all possible splits, or all plateaus already being
min_step_length, etc.).
"""
logger = logging.getLogger()
logger.debug("stepfitting_library._best_split locals() = " + str(locals()))
forbidden_splits = []
#append to forbidden_splits the bestfit steps: counterfits cannot share
#these splits
if bestfit_plateaus is not None:
for p, (start, stop, height) in enumerate(bestfit_plateaus[:-1]):
next_start, next_stop, next_height = bestfit_plateaus[p + 1]
forbidden_splits.append((stop, next_start))
#append to forbidden_splits counterfit plateaus' invalid splits that
#would result in more than one counterfit step per bestfit step
if bestfit_plateaus is not None:
all_counterfit_starts = [start
for (start, stop, height) in plateaus]
all_counterfit_stops = [stop
for (start, stop, height) in plateaus]
for p, (start, stop, height) in enumerate(bestfit_plateaus):
for f in range(start, stop + 1):
if f in all_counterfit_starts:
assert f == 0 or f - 1 in all_counterfit_stops
forbidden_splits += [(u, u + 1)
for u in range(start, stop)]
#append to forbidden splits plateaus that are already shorter than
#min_step_length
for p, (start, stop, height) in enumerate(plateaus):
if stop - start < min_step_length:
forbidden_splits += [(u, u + 1) for u in range(start, stop)]
#append to forbidden splits those cases that would result in a plateau
#shorter than min_step_length
for p, (start, stop, height) in enumerate(plateaus):
for u in range(start, stop):
if u - start < min_step_length or stop - u < min_step_length:
forbidden_splits.append((u, u + 1))
best_split_index = None
best_split_residuals = len(luminosities) * (np.amax(luminosities) -
np.amin(luminosities))**2
best_split_results = None
for p, plateau in enumerate(plateaus):
start, stop, height = plateau
(left_plateau, left_residuals,
right_plateau, right_residuals,
total_residuals) = \
_split_plateau(luminosities=luminosities,
plateau=plateau,
forbidden_splits=forbidden_splits,
min_step_magnitude=min_step_magnitude)
if (left_plateau is not None and
right_plateau is not None and
total_residuals < best_split_residuals):
best_split_index, best_split_residuals = p, total_residuals
best_split_results = (left_plateau, left_residuals,
right_plateau, right_residuals,
total_residuals)
if best_split_index is not None:
assert best_split_results is not None
(best_left_plateau, best_left_residuals, best_right_plateau,
best_right_residuals, best_total_residuals) = best_split_results
return_result = (plateaus[:best_split_index] +
[best_left_plateau, best_right_plateau] +
plateaus[best_split_index + 1:])
else:
return_result = None
logger.debug("stepfitting_plateau._best_split: returning result = " +
str(return_result))
return return_result
def _fit_steps(luminosities, num_plateaus, bestfit_plateaus=None,
existing_fit=None, min_step_length=2,
min_step_magnitude=5000):
"""
Utility function for chi_squared_step_fitter containing the fitting iteration loop.
Arguments:
luminosities: List of luminosities of the Spot through frames.
num_plateaus: Number of plateaus to fit to the luminosities. Cannot be
greater than the number of luminosities.
bestfit_plateaus: If not None, this list of plateaus is taken as the
set of best-fitting plateaus against which this function is to make
a counter-fit. (See chi_squared_step_fitter for details about counter-fits.) If
bestfit_plateaus is not None, then num_plateaus must be
len(bestfit_plateaus) + 1.
existing_fit: If not None, initializes algorithm with these existing
plateaus. Thus, _fit_steps tries to find the best fit that is some
combination of plateau splits performed on those in existing_fit.
If this is given, num_plateaus must be less than or equal to
len(existing_fit). This function does not check whether
bestfit_plateaus and existing_fit contradict each other if both are
given.
min_step_length: Minimum length of fitted steps.
min_step_magnitude: Minimum size of allowed fitted steps.
Returns:
If bestfit_plateaus is None, _fit_steps returns a best-fit on
luminosities as a list of num_plateaus number of plateaus. If
bestfit_plateaus is not None, then _fit_steps returns the counter-fit
on luminosities, given bestfit_plateaus.
"""
logger = logging.getLogger()
if len(luminosities) < num_plateaus:
raise ValueError("num_plateaus = " + str(num_plateaus) +
" is greater than len(luminosities) = " +
str(len(luminosities)))
if (bestfit_plateaus is not None and
len(bestfit_plateaus) + 1 != num_plateaus):
raise ValueError("len(bestfit_plateaus) + 1 = " +
str(len(bestfit_plateaus) + 1) +
" != num_plateaus = " + str(num_plateaus))
if existing_fit is not None and num_plateaus < len(existing_fit):
raise ValueError("num_plateaus = " + str(num_plateaus) + " but " +
"len(existing_fit) = " + str(len(existing_fit)))
#Initialize the algorithm
if existing_fit is None:
initial_plateau = _fit_plateau(luminosities, 0, len(luminosities) - 1)
plateaus = [initial_plateau]
else:
plateaus = existing_fit
while len(plateaus) < num_plateaus:
new_plateaus = _best_split(luminosities=luminosities,
plateaus=plateaus,
bestfit_plateaus=bestfit_plateaus,
min_step_length=min_step_length,
min_step_magnitude=min_step_magnitude)
if new_plateaus is None:
logger.debug("stepfitting_library._fit_steps: _best_split could " +
"not split plateaus further; locals() = " +
str(locals()))
break #no further splits could be done by _best_split
else:
plateaus = new_plateaus
logger.debug("stepfitting_library._fit_steps: returning plateaus = " +
str(plateaus))
return plateaus
def chi_squared_step_fitter(luminosity_sequence, num_steps_multiplier=1,
num_steps=None, min_step_length=2,
min_step_magnitude=0.0, ignore_counterfits=False):
"""
A step-fitting algorithm for Spot luminosity.
Taken from "Assembly dynamics of microtubules at molecular resolution,"
by Kerssemakers et al. doi:10.1038/nature04928
We strongly recommend reading through Supplementary Methods 3 of the paper
for a clear graphical illustration of the algorithm.
Input to the algorithm is a sequence of a Spot's luminosity as it varies
with time across frames. The algorithm tries to find step-wise increases or
decreases in the Spot's luminosity. The decreases, for example, may be due
to fluor photobleaching, fluor removal via Edman, etc.
We hope that the behavior of the Spots due to fluors is (as in the paper)
"a step train with with steps of varying size and duration, hidden in
Gaussian noise with RMS amplitude sigma."
Note that the plateaus cannot have horizontal gaps between them: all frames
must be covered.
The quality of a series of plateau fits across a sequence of frames is
evaluated by the total sum of squares of all luminosity deviations for all
frames from their respective fitted plateau. We are searching for the
sequence of best-fitting plateaus, with the lowest sum of squared
deviations, but at the same time not over-fit.
In summary, the plateau-fitting algorithm works as follows:
1. Initializes by fitting one plateau across all frames.
2. For subsequent iterations, the algorithm repeatedly splits one
fitted plateau into two segments, re-fitting each of the segments
to their respective frames. This is equivalent to adding a step. At
each iteration, the plateau that the algorithm chooses to split is
the one whose split results in the largest vertical distance
between the newly-created segments.
3. Iterative splitting continues until either there are no more
plateaus left to split (i.e. there is one plateau per frame) or
some specified number of plateaus is reached.
4. At some point, with enough plateaus fitted, it is probable that the
algorithm starts to overfit the data. Following Kerssemakers et
al., each the number of plateaus increases, we perform what they
call a "counter-fit" ("counterfit"?), and compare the quality of
the counter-fit with the best fit. A counter-fit is composed of
plateaus whose connecting steps occur in the middle of the best-
fitted plateaus, with one step per best-fitting plateau. The
counter-fitted plateaus are also best-fitted to their respective
intervals, however these intervals are constrained as just
described. Just as for the best-fitting plateau sequence, we
compute a total sum of squared deviations for the counter-fit
plateaus. The ratio of this sum from the counter-fit, divided by
the same sum from the best-fit is defined as the "step-indicator"
S. Kerssemakers et al.: "When the number of steps in the best fit
is very close to the real number of steps in the data, the value of
S will be large... If however the data are severley under- or
overfitted, or when the data consist of gradual non-stepped growth,
the value for S will be close to 1." Thus, we perform counterfits
for the increasingly refined series of best-fits, and compute their
S's. The number of best-fit steps with the highest S is chosen as
the correct number of plateaus/steps.
Arguments:
luminosity_sequence: Sequence of a Spot's photometries through the
frames as a list.
num_steps_multiplier: Keep adding steps to the fit until their number
equals or is greater than num_steps_multiplier *
len(luminosity_sequence). Must be in the open interval (0, 1). The
algorithm will attempt to fit at most len(luminosity_sequence) - 2
number of steps. (The -2 is to allow for a counter-fit.)
num_steps: If not None, overrides num_steps_multiplier. Must be between
1 and len(luminosity_sequence) - 1, inclusive.
min_step_length: Minimum length of fitted steps.
min_step_magnitude: Minimum size of allowed fitted steps.
ignore_counterfit: If True, return the largest number of plateaus
fitted, regardless of counterfits.
Returns:
List of horizontal plateaus fit to the luminosity sequence. Each
plateau is specified by (start_frame, stop_frame, height).
[( start_frame_1, stop_frame_1, height_1),
( stop_frame_1, stop_frame_2, height_2),
...
(stop_frame_n-1, stop_frame_n, height_n)
]
"""
logger = logging.getLogger()
if not 0 < num_steps_multiplier <= 1:
raise ValueError("num_steps_multiplier has an invalid value of " +
str(num_steps_multiplier))
if (num_steps is not None and
not 0 < num_steps < len(luminosity_sequence)):
raise ValueError("num_steps has an invalid value of " +
str(num_steps) +
" vs len(luminosity_sequence) = " +
str(len(luminosity_sequence)))
#if not explicitly given, compute the number of steps to (over)fit to
if num_steps is None:
num_steps = min(int(np.ceil(num_steps_multiplier *
len(luminosity_sequence))),
len(luminosity_sequence) - 2)
#num_steps implies this num_plateaus to fit
num_plateaus = num_steps + 1
logger.debug("stepfitting_library.chi_squared_step_fitter locals() = " +
str(locals()))
#plateau_fits stores the best-fit, the counter-fit, and S for each of
#the increasing number of plateaus.
#
#Each member is (best_fit, counter_fit, S).
#
#Each of the best_fit's or counter_fit's is a list of plateaus in the
#same list format as in this function's docstring's "Returns" section.
plateau_fits = []
for p in range(1, num_plateaus + 1):
logger.debug("stepfitting_library.chi_squared_step_fitter: at " +
"plateau fit p = " + str(p) + " at time " +
str(datetime.datetime.fromtimestamp(time.time())))
if len(plateau_fits) > 0:
existing_fit = plateau_fits[-1][0]
else:
existing_fit = None
best_fit = _fit_steps(luminosities=luminosity_sequence,
num_plateaus=p,
bestfit_plateaus=None,
existing_fit=existing_fit,
min_step_length=min_step_length,
min_step_magnitude=min_step_magnitude)
#check to see if we can't increase number of best-fits due to some
#constraint
if (len(plateau_fits) > 0 and
len(best_fit) == len(plateau_fits[-1][0])):
break
bestfit_residuals = \
_plateaus_squared_residuals(luminosities=luminosity_sequence,
plateaus=best_fit)
counter_fit = _fit_steps(luminosities=luminosity_sequence,
num_plateaus=p + 1,
bestfit_plateaus=best_fit,
existing_fit=None,
min_step_length=0,#counterfits must allow
#length 0
min_step_magnitude=min_step_magnitude)
counterfit_residuals = \
_plateaus_squared_residuals(luminosities=luminosity_sequence,
plateaus=counter_fit)
if float(bestfit_residuals) != 0:
S = float(counterfit_residuals) / float(bestfit_residuals)
else:
logger.debug("stepfitting_library.chi_squared_step_fitter: " +
"bestfit_residuals = 0, locals() = " + str(locals()))
S = 10**10
plateau_fits.append((best_fit, counter_fit, S))
if ignore_counterfits:
return_fit = sorted(plateau_fits, key=lambda x:len(x[0]),
reverse=True)[0][0]
else:
return_fit = sorted(plateau_fits, key=lambda x:x[2], reverse=True)[0][0]
logger.debug("stepfitting_library.chi_squared_step_fitter: returning " +
"fit = " + str(return_fit))
return return_fit
def plateau_value(plateaus, frame):
"""
Get value of fitted plateaus at a frame.
Arguments:
plateaus: List of fitted plateaus.
frame: Frame number to get value for.
Returns:
Height of the plateau at frame.
"""
logger = logging.getLogger()
logger.debug("stepfitting_library.plateau_value locals() = " +
str(locals()))
for (start, stop, height) in plateaus:
if start <= frame <= stop:
value = height
break
else:
raise ValueError("frame " + str(frame) + " is outside of " +
"plateaus " + str(plateaus))
return value
def mean_filter(luminosities, rank):
raise DeprecationWarning("This function was made, but not used. I'm not "
"sure it handles edges the way I want it to "
"right now.")
if rank < 1:
raise ValueError("Rank cannot be less than one.")
padded_luminosities = ([luminosities[0] for r in range(rank)] +
luminosities +
[luminosities[-1] for r in range(rank)])
return [np.mean(padded_luminosities[L:L + 2 * rank + 1])
for L, lum in enumerate(luminosities)]
def _pairwise(iterable):
"""
Produces an iterable that yields "s -> (s0, s1), (s1, s2), (s2, s3), ..."
From Python itertools recipies.
e.g.
a = _pairwise([5, 7, 11, 4, 5])
for v, w in a:
print [v, w]
will produce
[5, 7]
[7, 11]
[11, 4]
[4, 5]
The name of this function reminds me of Pennywise.
"""
a, b = itertools.tee(iterable)
next(b, None)
return itertools.izip(a, b)
def _triplewise(iterable):
"""
Produces an iterable that yields "s -> (s0, s1, s2), (s1, s2, s3),
(s2, s3, s4), ..."
Variant of _pairwise.
a = _triple([5, 7, 11, 4, 5])
for u, v, w in a:
print [u, v, w]
will produce
[5, 7, 11]
[7, 11, 4]
[11, 4, 5]
"""
a, b, c = itertools.tee(iterable, 3)
next(b, None)
next(c, None)
next(c, None)
return itertools.izip(a, b, c)
def plateaus_to_steps(plateaus):
"""
Converts step-fitted plateaus as to a list of steps. Steps only happen
between frames.
Arguments:
plateaus: List of fitted plateaus.
Returns:
List of steps in the form
[(pre_step1_frame_#, post_step1_frame_#, step1_magnitude),
(pre_step2_frame_#, post_step2_frame_#, step2_magnitude),
...
(pre_stepN_frame_#, post_stepN_frame_#, stepN_magnitude)]
Each step's tuple contains three values. pre_stepI_frame_# is the frame
index of the frame directly preceeding the I'th step,
post_stepI_frame_# is the frame index of the frame directly following
the I'th step. stepI_magnitude is the size of the step, with steps
increasing in luminosity being positive.
"""
steps = []
#uses _pairwise to generate pairs of plateaus
for (plateau_A, plateau_B) in _pairwise(plateaus):
start_A, stop_A, height_A = plateau_A
start_B, stop_B, height_B = plateau_B
steps.append((stop_A, start_B, height_B - height_A))
return steps
def last_step_info(steps, frame):
"""
Get information about the last step that took place before the frame.
Arguments:
steps: Steps representing the output of plateaus_to_steps.
frame: Frame # for which to get the last preceeding step's information.
Returns:
(last_step_num, last_step_position, last_step_magnitude)
where last_step_num is the (0-indexed) number of the last step,
last_step_position is the (0-indexed) frame # after which the step took
place (i.e. the step is between this frame # and # + 1), and
last_step_magnitude is the magnitude of the step (increase in
luminosity is positive).
Returns (None, None, None) if there was not step before frame.
"""
if frame < 0:
raise ValueError("frame must be a positive integer.")
return_values = None, None, None
for s, (step_A, step_B) in enumerate(_pairwise(steps)):
pre_frame_A, post_frame_A, magnitude_A = step_A
pre_frame_B, post_frame_B, magnitude_B = step_B
if post_frame_A <= frame <= pre_frame_B:
return_values = (s, pre_frame_A, magnitude_A)
break
else:
if len(steps) == 0:
return_values = None, None, None
else:
last_pre_frame, last_post_frame, last_magnitude = steps[-1]
if frame >= last_pre_frame:
return_values = (len(steps) - 1, last_pre_frame,
last_magnitude)
return return_values
def frame_plateau(plateaus, frame):
"""
Get the plateau containing frame, and the plateau's index in plateaus.
"""
return_plateau = None, None, None
return_index = None
for p, plateau in enumerate(plateaus):
start, stop, height = plateau
if start <= frame <= stop:
return_plateau = start, stop, height
return_index = p
break
return return_plateau, return_index
def _consecutive_integers(integers):
"""
Given a list of integers, returns sub-lists of consecutive integers.
From http://stackoverflow.com/questions/2361945/detecting-consecutive\
-integers-in-a-list
e.g. given
integers = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
this function will return
[[1], [4, 5, 6], [10], [15, 16, 17, 18], [22], [25, 26, 27, 28]]
"""
consecutive_integers = []
for k, g in itertools.groupby(enumerate(integers),
lambda (i, x):i - x):
consecutive_integers.append(map(itemgetter(1), g))
return consecutive_integers
def _merge_plateaus(luminosities, plateau_a, plateau_b):
"""
Given two consecutive plateaus, return a single plateau fitted to
luminosities spanned by both.
"""
start_a, stop_a, height_a = plateau_a
start_b, stop_b, height_b = plateau_b
if stop_a + 1 != start_b:
raise ValueError("Merged plateaus must be consecutive. locals() = " +
str(locals()))
return _fit_plateau(luminosities, start_a, stop_b)
#Not sure how to do this yet...
#def _repeat_filter(target_filter):
# """
# To be used as a decorator for applying filters to a plateaus repeatedly.
# Specifically, applies target_filter (len(plateaus) - 1) number of times.
#
# plateaus must be the second argument passed to target_filter.
# """
# def target_filter_wrap(*args, **kwargs):
# plateaus = kwargs['plateaus']
# filtered_plateaus = target_filter(*args, **kwargs)
# for n in range(len(plateaus) - 1):
# filtered_plateaus = target_filter(plateaus=filtered_plateaus,
# *args, **kwargs)
# return target_filter_wrap
def _filter_upsteps_singlepass(luminosities, plateaus):
"""
Utility function for filter_upsteps. Will merge at least one existing
upstep every time it is called. To guarantee all upsteps are eliminated,
this function must be applied at least as many times as there are plateaus,
minus one.
Arguments:
luminosities: List of luminosities of a Spot through frames.
plateaus: Plateaus to filter.
Returns:
List of plateaus with at least one upstep removed via plateau merging.
"""
if len(plateaus) < 2:
downstep_filtered = plateaus
else:
downstep_filtered = []
for a, b in _pairwise(plateaus):
start_a, stop_a, height_a = a
start_b, stop_b, height_b = b
#Check to see if a was merged in the prior iteration.
if len(downstep_filtered) > 0:
last_start, last_stop, last_height = downstep_filtered[-1]
if stop_a == last_stop:
continue
else:
assert last_stop + 1 == start_a
if height_b > height_a:
merged = _merge_plateaus(luminosities, a, b)
downstep_filtered.append(merged)
else:
downstep_filtered.append(a)
#If the last two plateaus were not merged, append the last b.
last_start, last_stop, last_height = plateaus[-1]
last_flt_start, last_flt_stop, last_flt_height = downstep_filtered[-1]
if last_stop != last_flt_stop:
downstep_filtered.append(plateaus[-1])
return downstep_filtered
def filter_upsteps(luminosities, plateaus):
"""
Merges plateaus until no upsteps remain.
Upsteps are defined as a step from a plateau with a lower height to a
plateau with a higher height.
Note that this function does not do any kind of global fit optimization. It
merely iteratively merges plateaus until no upsteps are left. The algorithm
immediately merges an upstep as soon as it finds one. It does not check
whether this new merge produces a new upstep or not. Every successive merge
to remove an upstep may generate a new one. It is hence quite possible to
feed in a sequence of plateaus that will be merged until there is only one
left.
Arguments:
luminosities: List of luminosities of a Spot through frames.
plateaus: Plateaus to filter.
Returns:
List of plateaus with upsteps removed via plateau merging.
"""
filtered_plateaus = plateaus
for n in range(len(plateaus) - 1):
filtered_plateaus = _filter_upsteps_singlepass(luminosities,
filtered_plateaus)
return filtered_plateaus
def _filter_small_steps_singlepass(luminosities, plateaus, min_magnitude=None,
min_noise_ratio=None):
"""
Utility function for filter_small_steps. Will merge at least one existing
upstep every time it is called. To guarantee all small steps are
eliminated, this function must be applied at least as many times as there
are plateaus, minus one.
Arguments:
luminosities: List of luminosities of a Spot through frames.
plateaus: Plateaus to filter.
min_magnitude: Remove any steps smaller than this size.
min_noise_ratio: Remove any steps that are smaller than
min_noise_ratio * max(sqrt(plateau A squared residuals),
sqrt(plateau B squared residuals))
where plateau A and plateau B are the two plateaus around the step.
Note that min_magnitude and min_noise criteria are applied in an OR
fashion: i.e. both can be provided, and a step is removed if either is
satisfied. If the parameter is None, it is not applied.
Consequentially, if both criteria are None, no steps are filtered.
Returns:
List of plateaus with at least one small step removed via plateau
merging.
"""
if min_magnitude is not None and min_magnitude < 0:
raise ValueError("min_magnitude < 0 makes no sense. locals() = " +
str(locals()))
if min_noise_ratio is not None and min_noise_ratio < 0:
raise ValueError("min_noise_ratio < 0 makes no sense. locals() = " +
str(locals()))
if len(plateaus) < 2:
magnitude_filtered = plateaus
else:
magnitude_filtered = []
for a, b in _pairwise(plateaus):
start_a, stop_a, height_a = a
start_b, stop_b, height_b = b
#Check to see if a was merged in the prior iteration.
if len(magnitude_filtered) > 0:
last_start, last_stop, last_height = magnitude_filtered[-1]
if stop_a == last_stop:
continue
else:
assert last_stop + 1 == start_a
#step size to be tested against the criteria
step_size = abs(height_a - height_b)
#booleans to track whether merging is necessary
noise_ratio_merge, magnitude_merge = False, False
#if min_noise_ratio was given, compute max noise for plateaus and
#test
if min_noise_ratio is not None:
max_noise = \
max(math.sqrt(_plateau_squared_residuals(luminosities, a)),
math.sqrt(_plateau_squared_residuals(luminosities, b)))
if step_size < max_noise * min_noise_ratio:
noise_ratio_merge = True
#if min_magnitude given, test it
if (min_magnitude is not None and
step_size < min_magnitude):
magnitude_merge = True
#if either of the criteria indicates a merge needs to take place,
#do it
if noise_ratio_merge or magnitude_merge:
merged = _merge_plateaus(luminosities, a, b)
magnitude_filtered.append(merged)
else:
magnitude_filtered.append(a)
#If the last two plateaus were not merged, append the last b.
last_start, last_stop, last_height = plateaus[-1]
last_flt_start, last_flt_stop, last_flt_height = magnitude_filtered[-1]
if last_stop != last_flt_stop:
magnitude_filtered.append(plateaus[-1])
return magnitude_filtered
def filter_small_steps(luminosities, plateaus, min_magnitude=None,
min_noise_ratio=None):
"""
Merges plateaus until no steps smaller than a specified size remain.
Note that this function does not do any kind of global fit optimization. It
merely iteratively merges plateaus until no small steps are left. The
algorithm immediately merges a small step as soon as it finds one. It does
not check whether this new merge produces a new small step or not. Every
successive merge to remove a small step may generate a new one. It is hence
quite possible to feed in a sequence of plateaus that will be merged until
there is only one left.
Arguments:
luminosities: List of luminosities of a Spot through frames.
plateaus: Plateaus to filter.
min_magnitude: Remove any steps smaller than this size.
min_noise_ratio: Remove any steps that are smaller than
min_noise_ratio * max(sqrt(plateau A squared residuals),
sqrt(plateau B squared residuals))
where plateau A and plateau B are the two plateaus around the step.
Note that min_magnitude and min_noise criteria are applied in an OR
fashion: i.e. both can be provided, and a step is removed if either is
satisfied. If the parameter is None, it is not applied.
Consequentially, if both criteria are None, no steps are filtered.
Returns:
List of plateaus with small steps removed via plateau merging.
"""
if min_magnitude is not None and min_magnitude < 0:
raise ValueError("min_step_magnitude < 0 makes no sense. " +
"locals() = " + str(locals()))
if min_noise_ratio is not None and min_noise_ratio < 0:
raise ValueError("min_step_noise_ratio < 0 makes no sense. " +
"locals() = " + str(locals()))
filtered_plateaus = plateaus
for n in range(len(plateaus) - 1):
filtered_plateaus = \
_filter_small_steps_singlepass(luminosities,
filtered_plateaus,
min_magnitude=min_magnitude,
min_noise_ratio=min_noise_ratio)
return filtered_plateaus
def sliding_t_fitter(luminosity_sequence, window_radius=20, p_threshold=0.001,
median_filter_size=None, downsteps_only=False,
min_step_magnitude=None):
"""
Fits steps to a luminosity sequence using a sliding-window two-tailed
Welch's t-test.
Between every frame, this algorithm computes a two-tailed Welch's t-test,
with frame luminosities to its left being compared to frame luminosities on
its right. If the two-tailed p-value is below a specified threshold, we
reject the null hypothesis of equal average luminosities for frames to the
left and right, and claim that a step occurs.
Currently, this uses the default scipy.stats.ttest_ind function to compute
t and p values. For those inter-frame positions whose p values are above a
threshold, their t values are compared to adjacent frames and those frames
with a local t maxima are chosen as steps.
Once the locations of steps is found, each plateau between each two steps
is fitted as the mean of the luminosities in that interval.
An alternative t-test variant that is currently not implemented but may be
be useful later is described below:
===BELOW NOT (YET?) IMPLEMENTED===
The method below is taken from "A Comparison of Step-Detection Methods: How
Well Can You Do?" by Carter et al. doi: 10.1529/biophysj.107.110601 and
"Mechanics of the kinesin step" by Carter & Cross doi: 10.1038/nature03528.
This method is not the one currently implemented.
Between every frame, this algorithm computes a two-tailed Welch's t-test,
with frame luminosities to its left being compared to frame luminosities on
its right:
(mean(left_frames) - mean(right_frames))
t = ----------------------------------------
sqrt(var(left_frames)/num_left_frames +
var(right_frames)/num_right_frames)
with degrees of freedom determined by
(var(left_frames) + var(right_frames))
--------------------------------------
max(num_left_frames, num_right_frames)
df = ---------------------------------------------------------
(var(left_frames)**2 + var(right_frames)**2)
-------------------------------------------------
(max(num_left_frames, num_right_frames) *
(max(num_left_frames, num_right_frames) - 1))
===ABOVE NOT (YET?) IMPLEMENTED===
Arguments:
luminosity_sequence: Sequence of a Spot's photometries through the
frames as a list.
window_radius: Radius to use for t-test window, i.e. a window_radius
number of frames on each side will be included in the test.
p_threshold: Threshold of p-value to use as a cutoff for the t-test.
median_filter_size: If not None, first apply median filter of this
size.
downsteps_only: If True, will only fit downward steps. Any upward
trends are ignored.
min_step_magnitude: If not None, will remove steps that are smaller in
magnitude than this number.
Returns:
List of horizontal plateaus fit to the luminosity sequence. Each
plateau is specified by (start_frame, stop_frame, height).
[( start_frame_1, stop_frame_1, height_1),
( stop_frame_1, stop_frame_2, height_2),