
Conversation

@christinahedges (Contributor) commented Mar 30, 2022

This PR is for developing a new PerturbationMatrix class, which will ultimately factor out a lot of the headaches we're having inside machine.py.

To Do

  • Documentation (docstrings!)
  • Merge into machine.py
  • Write some tests for Kepler/K2/TESS data

@christinahedges christinahedges added the enhancement label Mar 30, 2022
@christinahedges (Contributor, Author) commented
Roughly how to use:

If flux and flux_err are arrays with dimensions (ntimes x npixels)

P = PerturbationMatrix3D(time=time, dx=dx, dy=dy)
P.fit(flux, flux_err)
model = P.model()

model will then also have shape (ntimes x npixels).
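
For reference, here is a minimal end-to-end sketch with synthetic inputs (the import path and treating dx/dy as per-pixel centroid offsets are my assumptions, not confirmed by this PR):

import numpy as np
# NOTE: import path is a guess at where the class will live
from psfmachine.perturbationmatrix import PerturbationMatrix3D

ntimes, npixels = 100, 25
time = np.linspace(0, 10, ntimes)
dx = np.random.normal(0, 0.5, npixels)              # assumed: per-pixel column offsets
dy = np.random.normal(0, 0.5, npixels)              # assumed: per-pixel row offsets
flux = np.random.normal(100, 1, (ntimes, npixels))  # (ntimes x npixels)
flux_err = np.ones((ntimes, npixels))

P = PerturbationMatrix3D(time=time, dx=dx, dy=dy)
P.fit(flux, flux_err)
model = P.model()                                   # (ntimes x npixels), same as flux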

@jorgemarpa (Contributor) left a comment


This is great! I think the basic functionalities are implemented here.

Something that would be good to have is a pixel_mask argument in PerturbationMatrix3D, so we can create the matrix with the source_mask pixels (for fitting) and the uncontaminated_source_mask pixels (for building) in one perturbation object.
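
A hypothetical sketch of what that could look like (the pixel_mask argument doesn't exist yet; the name and behavior here are illustrative only):

# Hypothetical, not the current API: select which pixels the object uses
P = PerturbationMatrix3D(time=time, dx=dx, dy=dy,
                         pixel_mask=source_mask)  # e.g. fit on source_mask pixels
P.fit(flux, flux_err)
# ...and the same object could then build the model with
# uncontaminated_source_mask, rather than constructing two matrices.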

I did a couple of "performance" tests with TPF data, and it turned out that the self.model method runs really slowly, about 1.7 s per iteration. We need to find out why this is so slow.

@jorgemarpa jorgemarpa mentioned this pull request Apr 6, 2022
@christinahedges christinahedges changed the title [WIP] ✨ Developing PerturbationMatrix class ✨ Developing PerturbationMatrix class Apr 8, 2022
@christinahedges (Contributor, Author) commented Apr 11, 2022

I added a PCA method so you can do something like:

y = [whitened_timeseries, whitened_timeseries]
p = PerturbationMatrix(...)
p.pca(y, ncomponents=10)
p.fit(y)
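
For a fuller, self-contained version (synthetic data; the whitening step and the constructor call are schematic, with arguments beyond time omitted):

import numpy as np

ntimes, npixels = 100, 25
time = np.linspace(0, 10, ntimes)
raw = np.random.normal(100, 1, (ntimes, npixels))
y = raw / raw.mean(axis=0) - 1        # crude whitening: remove each pixel's mean level

p = PerturbationMatrix(time=time)     # other constructor arguments omitted here
p.pca(y, ncomponents=10)              # updates p.vectors with 10 principal components
p.fit(y)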

@jorgemarpa (Contributor) left a comment


@christinahedges I found the bug with the inconsistent shape when doing self.model(). The problem is that, when using PerturbationMatrix through PerturbationMatrix3D, some attributes of the latter are not updated to reflect the addition of PCA components in self.vectors. See my comments for more details.

@christinahedges (Contributor, Author) commented Apr 12, 2022

Any vectors that aren't the time polynomial are now set to zero mean in each "segment" during the _clean_vectors method. The pca method can now be repeated: it will just update the PCA components rather than appending them ad infinitum.
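
For illustration, the per-segment zero-mean step amounts to something like this standalone numpy sketch (not the actual implementation):

import numpy as np

def zero_mean_per_segment(vectors, breaks):
    # vectors: (ntimes, nvec) array; breaks: time indices where the segments split
    out = vectors.copy()
    for seg in np.split(np.arange(vectors.shape[0]), breaks):
        out[seg] -= out[seg].mean(axis=0)  # each vector has zero mean within the segment
    return out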

Co-authored-by: Jorge Martínez-Palomera <[email protected]>
@jorgemarpa (Contributor) left a comment


This looks great! I tested a handful of component combinations and other parameters to check that the matrices look OK.
It fulfills what we always had in mind for the perturbation matrix API.

I just added a couple of comments regarding docstrings. Besides those, I think this is ready to be merged.


return func

def pca(self, y, ncomponents=5, smooth_time_scale=0):

To be consistent with the rest:

Suggested change
- def pca(self, y, ncomponents=5, smooth_time_scale=0):
+ def pca(self, y, ncomponents=3, smooth_time_scale=0):

Comment on lines 301 to 302
Will add two time scales of principal components, definied by `long_time_scale`
and `med_time_scale`.

Suggested change (delete these lines):
- Will add two time scales of principal components, definied by `long_time_scale`
- and `med_time_scale`.

Comment on lines 310 to 316
long_time_scale: float
The time scale where variability is considered "long" term. Should be
in the same units as `self.time`.
med_time_scale: float
The time scale where variability is considered "medium" term. Should be
in the same units as `self.time`. Variability longer than `long_time_scale`,
or shorter than `med_time_scale`, will be removed before building components

What is the scale of smooth_time_scale?

Suggested change
  long_time_scale: float
      The time scale where variability is considered "long" term. Should be
      in the same units as `self.time`.
  med_time_scale: float
      The time scale where variability is considered "medium" term. Should be
      in the same units as `self.time`. Variability longer than `long_time_scale`,
      or shorter than `med_time_scale`, will be removed before building components
+ smooth_time_scale: float
+     Amount to smooth the components, using a spline in time.
+     If 0, the components will not be smoothed.
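
For context, the spline smoothing described by smooth_time_scale could look something like this sketch (illustrative only, not the PR's implementation):

import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def smooth_in_time(time, component, smooth_time_scale):
    # Least-squares cubic spline with interior knots every smooth_time_scale time units
    if smooth_time_scale == 0:
        return component  # no smoothing, per the docstring above
    knots = np.arange(time[0] + smooth_time_scale, time[-1], smooth_time_scale)
    return LSQUnivariateSpline(time, component, t=knots, k=3)(time)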

Comment on lines +136 to +154
# def _clean_vectors(self):
# """Remove time polynomial from other vectors"""
# nvec = self.poly_order + 1
# if self.focus:
# nvec += 1
# if self.segments:
# s = nvec * (len(self.breaks) + 1)
# else:
# s = nvec
#
# if s != self.vectors.shape[1]:
# X = self.vectors[:, :s]
# w = np.linalg.solve(X.T.dot(X), X.T.dot(self.vectors[:, s:]))
# self.vectors[:, s:] -= X.dot(w)
# # Each segment has mean zero
# self.vectors[:, s:] -= np.asarray(
# [v[v != 0].mean() * (v != 0) for v in self.vectors[:, s:].T]
# ).T
# return

Maybe we can still keep this function in case we want to remove the temporal trend from other components. But it isn't necessary with the current API.

Labels: enhancement, ready-to-merge