-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Spectrum as a read-only class obtainable from an imageset #201
Conversation
Details: - Add Spectrum C++ object that implements methods get_energies (eV), get_weights (unitless), and get_weighted_wavelenth (angstroms). Spectrum can only be set from constructor. - Add get_spectrum method to Format and FormatMultiImage - Add get_spectrum method to ImageSet. Does not cache it like it does for detector, beam, etc. - Implement get_spectrum in NeXus. Specifically implements nexusformat/definitions#717, using optional variants to specify a calibrated wavelength and a spectrum.
This is a draft PR that re-implements #158. Note that @dagewa's comment that suggests adding experiment.spectrum doesn't work as then the spectrum would have to be serialized, which defeats the whole point of this new version. TODO:
|
Example of use on a JF16M dataset with per-shot spectrum and a calibrated per-shot wavelength that differs from the weighted average of the specta:
Output
|
Fixed |
Ok, ready for comments, including re: the race condition. |
Comments ☝️ are not a review per se more a set of observations as I looked through the code, not tested anything. Two questions:
|
BTW, FWIW I am much less worried about this PR than the previous one, since this does not include serialization so if we need to modify things in the future that remains a possibility. The previous iteration - once merged - would have set in stone our model of multiwavelength beam data. With this iteration any subsequent changes need only be fixed up here, in The polymorphism question is real though, as presumably at some stage you'll want to use these new data? When doing so e.g. in prediction code (if in dials) considering extension to (i) rotation data and (ii) Gaussian pink beam would be welcome. |
* @param energies The spectrum energies (eV) | ||
* @param energies The spectrum weights (unitless) | ||
*/ | ||
Spectrum(vecd energies, vecd weights) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strikes me while defining this as a type that adding convenience methods to give minimum / maximum energy for prediction purposes could be helpful i.e. where the weights are measurably above 0 (I appreciate that there is some judgement here)
Motivation: for purposes of computing predictions means we have a window to predict through, e.g. if the spectrum in question is very narrow, rather than min(energies) -> max(energies)
which could be a lot wider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I'd need an algorithm laid out before I implemented something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, algorithm -
- compute CDF from spectrum
- identify the bin below the 1% point
- identify the bin above the 99% point
There you go - a pretty plausible min, max window with nothing more than floating adds and comparisons (i.e. super cheap in C++)
If we add a Gaussian constructor can use the same logic to define 3σ window or something, but gives a consistent and reasonable definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model/spectrum.h
Outdated
return weights_; | ||
} | ||
|
||
double get_weighted_wavelength() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we also had get_weighted_wavelength_variance
then one could almost model a Gaussian beam with this 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please can has get_weighted_energy_eV
and it's variance
equivalent as part of the API - we may not use this but we should probably have it.
This could be a simple refactor as keeping most of the code in this current method as get_weighted_energy_eV()
and then converting the result to a wavelength before returning, for consistency.
model/spectrum.h
Outdated
summed_weights += weights_[i]; | ||
} | ||
DXTBX_ASSERT(weighted_sum > 0 && summed_weights > 0); | ||
return 12398.4187 / (weighted_sum / summed_weights); // eV per Å conversion factor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this magic number worth defining somewhere central as a constant value? (see e.g. deg_as_rad etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used this factor hardcoded all over the place. Would love it as a constant somewhere. Maybe a separate issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess what? scitbx::constants::factor_ev_angstrom
🙂
Though this evaluates to a slightly different constant value to your one -
//! Factor for keV <-> Angstrom conversion.
/*!
http://physics.nist.gov/PhysRefData/codata86/table2.html
h = Plank's Constant = 6.6260755e-34 J s
c = speed of light = 2.99792458e+8 m/s
1 keV = 1.e+3 * 1.60217733e-19 J
1 A = Angstrom = 1.e-10 m
E = (h * c) / lamda;
Exponents: (-34 + 8) - (3 - 19 - 10) = 0
*/
static const double
factor_kev_angstrom = 6.6260755 * 2.99792458 / 1.60217733;
static const double
factor_ev_angstrom = 6626.0755 * 2.99792458 / 1.60217733;
=>
>>> 6626.0755 * 2.99792458 / 1.60217733
12398.424468024265
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scitbx is out of date. The Planck constant has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question was asked whether we can expose those numbers in Python. They are readily available (and maintained) in
>>> import scipy.constants
>>> scipy.constants.Planck
6.62607015e-34
>>> scipy.constants.elementary_charge
1.602176634e-19
amongst many others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that scipy is universally a dependency of dxtbx? If so, works for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, but we can just add it. There is precedence in other packages.
/** A class to represent a 2D spectrum. */ | ||
class Spectrum { | ||
public: | ||
/** Default constructor: initialise all to zero */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're talking constructors, one which takes a minimum and maximum energy window, a central energy and a width to compute a Gaussian representation (see comments elsewhere) would probably be enough to stop me nagging on the topic of Polymorphism 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation of Spectrum uses energy channels, not bandpass. I'm interested in this idea, but I think I need more detail and perhaps a more complete API spec. It sounds like an expansion of the Spectrum class, not necessarily a polymorph, which could be done with a new constructor as you say. Could that be separate work from this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More detail and API spec I can do; and as slight expansion of this class is doable => indeed no need for polymorphism. I would like to give it a chance to get the changes in here as I think in the long game it will make it more useful in one go. Min, max energy window at the very least seems like something useful to have as part of this PR, since this is what we will be using to decide what to predict on the images.
BTW, re: min / max - was thinking the narrowest interval which contains say 98% of the total flux so it is well defined for measured and Gaussian input spectra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, proposed change pushed in 3269bf5 with test
double get_weighted_wavelength() const { | ||
if (energies_.size() == 0) | ||
return 0; | ||
double weighted_sum = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a static data table, worth computing these in the constructor and saving the values? That way if (just sayin') someone were to pass in nominal values into some theoretical constructor then they could be recovered without having to create a data array to recompute the values from. Would also mean calling get_weighted_wavelength
could be treated compute wise as get_wavelength()
without excessive compute overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, in our use case we aren't even using get_weighted_wavelength at all, it's just a convenience function. We are using the channels and energies directly. I would rather not compute the weighted sums if they won't be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna keep this discussion open, I think there is a lot of merit in computing some metrics in the constructor which would be real useful down the line when it comes to treating the data in other ways (e.g. pink beam)
My comment might not have been clear as it was not a proposal to serialise the spectrum as |
I think that is not incompatible with what is happening here - maybe it is just a slightly different perspective. |
@phyy-nx I've made a lot of comments on here, I acknowledge. I do however feel like this is potentially a lot closer to something which covers the bases. I'd be very happy to have a chat about these topics / comments etc. to understand your feelings on the suggestions. |
Wondering about the appearance of "new Spectrum(" in the spectrum.cc code.
"new" must be balanced with "delete" or else a memory leak ensues. Unless
the "new Spectrum" is used to construct a shared pointer, which does not
seem to be the case. Aaron, could you stress-test spectrum.cc to check for
memory? -- Nick
…On Tue, Jul 28, 2020 at 8:55 AM Aaron S. Brewster ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In model/boost_python/spectrum.cc
<#201 (comment)>:
> + return true;
+ }
+ };
+
+ template <>
+ boost::python::dict to_dict<Spectrum>(const Spectrum &obj) {
+ boost::python::dict result;
+ result["energies"] = obj.get_energies();
+ result["weights"] = obj.get_weights();
+ return result;
+ }
+
+ template <>
+ Spectrum *from_dict<Spectrum>(boost::python::dict obj) {
+ Spectrum *s = new Spectrum(
+ boost::python::extract<vecd>(obj["energies"]),
Yah, I grepped cctbx for this and found an instance I liked. I think the
clarity is nice.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#201 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADQ24VTFEGHHHR5MCQZAS23R53YGLANCNFSM4PJJ2NFQ>
.
|
Aaron,
My mistake, I guess the "manage_new_object" bit takes care of the memory
leak?
Nick
…On Tue, Jul 28, 2020 at 9:04 AM Nicholas Sauter ***@***.***> wrote:
Wondering about the appearance of "new Spectrum(" in the spectrum.cc code.
"new" must be balanced with "delete" or else a memory leak ensues. Unless
the "new Spectrum" is used to construct a shared pointer, which does not
seem to be the case. Aaron, could you stress-test spectrum.cc to check for
memory? -- Nick
On Tue, Jul 28, 2020 at 8:55 AM Aaron S. Brewster <
***@***.***> wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In model/boost_python/spectrum.cc
> <#201 (comment)>:
>
> > + return true;
> + }
> + };
> +
> + template <>
> + boost::python::dict to_dict<Spectrum>(const Spectrum &obj) {
> + boost::python::dict result;
> + result["energies"] = obj.get_energies();
> + result["weights"] = obj.get_weights();
> + return result;
> + }
> +
> + template <>
> + Spectrum *from_dict<Spectrum>(boost::python::dict obj) {
> + Spectrum *s = new Spectrum(
> + boost::python::extract<vecd>(obj["energies"]),
>
> Yah, I grepped cctbx for this and found an instance I liked. I think the
> clarity is nice.
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <#201 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ADQ24VTFEGHHHR5MCQZAS23R53YGLANCNFSM4PJJ2NFQ>
> .
>
|
- API changes: get_energies->get_energies_eV, add get_weighted_energy_eV - Attempt to eliminate race condition by adding read_models method. - Tidy - Test more wavelengths in test_spectrum.py - Remove unused headers
Right. This code came from beam.cc. To be sure, I ran a quick stress test just now and saw no leak. Done w/ code review for now. @graeme-winter can you review my responses? I've resolved conversations where I think I addressed your feedback, feel free to unresolve them and respond if needed. |
The def get_thing(self): function should really be a @property
def thing(self): property. Naturally, the introduction of |
@phyy-nx your 5pm deadline for me to implement all proposed changes is proving impossible because I have a workshop tomorrow which has written off ~ 5 working hours. Am working on it but I would be very disappointed if me missing the deadline means this gets merged as is. |
Computes: emin_ev = 1% on CDF and emax_ev = 99% on CDF so 98% or more of the spectrum is within the range. Added a test for this, which has a computed approximation to a real SwissFEL spectrum (so could be useful in other tests) Also ran clang-format on the C++ files I touched
I'll wager a chocolate bar or so that this will make zero measurable difference to the run time performance. Doing this as a prelude to computing variance
Recover the weighted variance around the weighted mean. Add test which makes a clean Gaussian and recovers the same parameters.
OK, sitrep - am not there yet but I have enough of the API in here that I wanted that all I am really short of is a different constructor, which I can add later on (i.e. construct with known Gaussian-like beam parameters and be able to recover these rather than deriving them as in the example here) In my modified version it would not probably have a table for the full spectrum but that would not matter, since it would return cached values anyways. I have made some changes which pre-compute a couple of values in the constructors however these calculations are:
So I am very confident that in real life you will never see the changes. The search for the bandwidth region could be cached from constructor rather than at first run but I am not sure this is worthwhile. I have made some assumptions in these calculations that the spectrum is background subtracted. I don't know if that is a good assumption to make. Anyhow, I think the API this has right now is what I was hoping for, and I can see it being useful. Cleaning up the correct physical constants would be nice. |
Would welcome a wider code review now to make sure we all have got our sums right and people are happy with the implied API changes in here. |
@phyy-nx ev -> eV 👍 |
Changes all look good to me. I'd like to fix the constants but it's not a stopper. They need to be fixed all over. Seems like a good candidate for a separate issue. Since we are under time pressure here I'm going to merge this this morning. Thanks for the changes. |
OK, since these have all gone through really quick, I consider this to be an alpha API which can and will change - I don't want to see it baked in in places which can never be fixed or revised - @phyy-nx please could you agree to this? |
Yup. Alpha API. Subject to change as needed. |
We have a mechanism now in the dials bootstrap to explicitly specify branches to use for individual repositories. That sounds like it’d be worth you adding to the cctbx bootstrap so changes like this don’t need to be rushed through without proper consideration or discussion. |
While I appreciate the need for stability in the code and having good review cycles, we need to be able to work on master, have bootstrap point to master, and move things through quickly on occasion. Just the way things go sometimes. Hopefully not often though. |
I don't understand this response. Branches are a first-class feature of git. Working this way is a deliberate choice, not a technical limitation, as is forcing this through on one of the busiest weeks we've had in over six months. Even ignoring the ability to just check out a different branch (as far as I know all your deployments use git still), the dials bootstrap I mentioned would work like this, for instance:
Unfortunately I think this came across more patronising than you intended, but personally I feel that this mode of operation classes as "frequent" rather than "not often". |
Details: