
Feature: tensorize the interpolation codes (more general) #251

Merged
merged 25 commits into master from update/newInterpCodes on Sep 19, 2018

Conversation

@kratsg (Contributor) commented Sep 9, 2018

Description

The overarching theme for this PR is to get the interpolation codes to be tensorizable. This is one of the small pieces needed for #231 and we'll be doing this through a series of smaller PRs to break this up a bit more.

Related: #231, #246, #219.

A notebook is added describing the process of tensorizing these functions to aid with future understanding.

Additionally, I've added a (first draft of a) neat decorator in utils.py that tensorizes the inputs to the interpolator functions if they're not tensorized.
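A minimal sketch of the idea behind such a decorator (hypothetical code: the name tensorize_inputs and the use of plain numpy in place of the backend are illustrative, not the actual utils.py implementation):

import functools

import numpy as np

def tensorize_inputs(func):
    """Hypothetical sketch: convert list/tuple arguments to arrays before calling func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = [np.asarray(arg) if isinstance(arg, (list, tuple)) else arg
                for arg in args]
        return func(*args, **kwargs)
    return wrapper

@tensorize_inputs
def _some_interpolator(histogramssets, alphasets):
    # both arguments arrive here as arrays even if plain lists were passed in
    return histogramssets.shape, alphasets.shape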

Checklist Before Requesting Approval

  • Tests are passing
  • "WIP" removed from the title of the pull request

@kratsg mentioned this pull request Sep 9, 2018
@matthewfeickert added the feat/enhancement (New feature or request) label Sep 9, 2018
@coveralls commented Sep 9, 2018

Coverage Status

Coverage decreased (-0.09%) to 97.345% when pulling 5b56bdd on update/newInterpCodes into 725aa63 on master.

@lukasheinrich (Contributor) commented Sep 9, 2018

Just dumping here what we ultimately want to do. (These shapes can be more jagged, but for now we can assume they're as nice as we want them to be; e.g. MBJ is very regular.)

(Writing this in a jagged way reminded me again of awkward-array; maybe this is a good/interesting test case for a-a itself (if not, please ignore :-p @jpivarski). Would the kitchen_sink function be easily implementable in a-a?)

def kitchen_sink(histogramssets,alphasets):
    all_results = []
    for histoset, alphaset in zip(histogramssets,alphasets):
        all_results.append([])
        set_result = all_results[-1]
        for histo in histoset:
            set_result.append([])
            histo_result = set_result[-1]
            for alpha in alphaset:
                alpha_result = []
                for down,nom,up in zip(histo[0],histo[1],histo[2]):
                    delta_up = up - nom
                    delta_down = nom - down
                    if alpha > 0:
                        delta =  delta_up*alpha
                    else:
                        delta =  delta_down*alpha
                    v = nom + delta
                    alpha_result.append(v)
                histo_result.append(alpha_result)
    return all_results


histogramssets = [ 
    [#all histos affected by syst1 
        [#sample 1, syst1
            [10,11,12],
            [15,16,19],
            [19,20,25]
        ],
        [#sample 2, syst1
            [10,11],
            [15,16],
            [19,20]
        ]
    ],
    [#all histos affected by syst2
        [#sample 1, syst2
            [10,11,12,13],
            [15,16,19,20],
            [19,20,25,26]
        ],
    ]
]
alphasets = [
    [-1,0,1], #set of alphas for which to interpolate syst1 histos
    [0,2] #set of alphas for which to interpolate syst2 histos
]
kitchen_sink(histogramssets,alphasets)
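For this example the call should give the following nested result (worked out from the rule in kitchen_sink: alpha = -1 reproduces the down histogram, alpha = 0 the nominal, alpha = +1 the up histogram, and alpha = 2 for syst2 extrapolates linearly beyond up):

[[[[10, 11, 12], [15, 16, 19], [19, 20, 25]],
  [[10, 11], [15, 16], [19, 20]]],
 [[[15, 16, 19, 20], [23, 24, 31, 32]]]]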

@matthewfeickert (Member) commented Sep 12, 2018

As @kratsg has pointed out, MXNet doesn't currently support einsum, though there is a PR out for it. Until that PR is in, I would suggest that we disable all MXNet support by commenting out or removing the invocations.

Please don't remove the actual files as it would be nice to easily add MXNet back in once support is added by the community.

This is now Issue #256.

@matthewfeickert (Member) commented Sep 12, 2018

To get in now a request/whinge that will otherwise come up later: can we please rename the _kitchensink_<X> functions? That name doesn't really convey any helpful information.

To be slightly more helpful than just complaining, examples of names (not saying they should be used) would be _kitchensink_looper → _interpolate_systematics, _kitchensink_code0 → _interp_code0_linear, _kitchensink_code1 → _interp_code1_exponential. To be clear, these are just suggestions that I came up with after scanning the code. A more careful reading may reveal these to be bad suggestions.

@kratsg (Contributor, Author) commented Sep 14, 2018

To be slightly more helpful than just complaining, examples of names (not saying they should be used) would be _kitchensink_looper → _interpolate_systematics, _kitchensink_code0 → _interp_code0_linear, _kitchensink_code1 → _interp_code1_exponential. To be clear, these are just suggestions that I came up with after scanning the code. A more careful reading may reveal these to be bad suggestions.

Note that the kitchensink codes are the unoptimized versions of the actual interpcodes, which I added from @lukasheinrich and which are more optimized. The idea is that the slow version (loop manually and calculate) is the one that will always be the correct calculation, and we just want the tensorized calculations to match it (which makes testing significantly easier).

@lukasheinrich (Contributor) commented Sep 14, 2018

My latest stab at it.

These should be equivalent; tested with a notebook that generates random histograms with variable shapes (padded with NaN).

def kitchen_sink(histogramssets,alphasets):
    all_results = []
    for histoset, alphaset in zip(histogramssets,alphasets):
        all_results.append([])
        set_result = all_results[-1]
        for histo in histoset:
            set_result.append([])
            histo_result = set_result[-1]
            for alpha in alphaset:
                alpha_result = []
                for down,nom,up in zip(histo[0],histo[1],histo[2]):
                    delta_up = up - nom
                    delta_down = nom - down
                    if alpha > 0:
                        delta =  delta_up*alpha
                    else:
                        delta =  delta_down*alpha
                    v = nom + delta
                    alpha_result.append(v)
                histo_result.append(alpha_result)
    return all_results

import numpy as np

def new_kitchen_sink(histogramssets,alphasets):
    histogramssets = np.asarray(histogramssets)
    alphasets      = np.asarray(alphasets)

    allset_all_histo_deltas_up = histogramssets[:,:,2] - histogramssets[:,:,1]
    allset_all_histo_deltas_dn = histogramssets[:,:,1] - histogramssets[:,:,0]
    allset_all_histo_nom = histogramssets[:,:,1]
    
    allsets_all_histos_alphas_times_deltas_up = np.einsum('sa,s...u->s...au',alphasets,allset_all_histo_deltas_up)
    allsets_all_histos_alphas_times_deltas_dn = np.einsum('sa,s...u->s...au',alphasets,allset_all_histo_deltas_dn)
    allsets_all_histos_masks = np.einsum('sa,s...u->s...au',alphasets > 0,np.ones_like(allset_all_histo_deltas_dn))
    
    allsets_all_histos_deltas = np.where(allsets_all_histos_masks,allsets_all_histos_alphas_times_deltas_up, allsets_all_histos_alphas_times_deltas_dn)
    allsets_all_histos_noms_repeated = np.einsum('sa,shn->shan',np.ones_like(alphasets),allset_all_histo_nom)
    set_results = allsets_all_histos_deltas + allsets_all_histos_noms_repeated
    return set_results

better.zip
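A minimal sketch of the kind of equivalence check this enables (hypothetical test; it assumes the kitchen_sink and new_kitchen_sink defined above and uses rectangular inputs so no NaN padding is involved):

import numpy as np

def test_new_kitchen_sink_matches_reference():
    histogramssets = [[[[10, 11, 12], [15, 16, 19], [19, 20, 25]]]]
    alphasets = [[-1.0, 0.0, 1.0, 2.0]]
    slow = kitchen_sink(histogramssets, alphasets)
    fast = new_kitchen_sink(histogramssets, alphasets)
    # both results are indexed as [set][histogram][alpha][bin]
    assert np.allclose(np.asarray(slow), fast)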

@lukasheinrich (Contributor) commented

this is without ellipses

import numpy as np

def new_kitchen_sink(histogramssets,alphasets):
    histogramssets = np.asarray(histogramssets)
    alphasets      = np.asarray(alphasets)

    allset_all_histo_deltas_up = histogramssets[:,:,2] - histogramssets[:,:,1]
    allset_all_histo_deltas_dn = histogramssets[:,:,1] - histogramssets[:,:,0]
    allset_all_histo_nom = histogramssets[:,:,1]
    
    #x is dummy index
    
    allsets_all_histos_alphas_times_deltas_up = np.einsum('sa,sxu->sxau',alphasets,allset_all_histo_deltas_up)
    allsets_all_histos_alphas_times_deltas_dn = np.einsum('sa,sxu->sxau',alphasets,allset_all_histo_deltas_dn)
    allsets_all_histos_masks = np.einsum('sa,sxu->sxau',alphasets > 0,np.ones_like(allset_all_histo_deltas_dn))
    
    allsets_all_histos_deltas = np.where(allsets_all_histos_masks,allsets_all_histos_alphas_times_deltas_up, allsets_all_histos_alphas_times_deltas_dn)
    allsets_all_histos_noms_repeated = np.einsum('sa,sxu->sxau',np.ones_like(alphasets),allset_all_histo_nom)
    set_results = allsets_all_histos_deltas + allsets_all_histos_noms_repeated
    return set_results
histogramsets = [
    [
        [
            [0.5],
            [1.0],
            [2.0]
        ]
    ]
]

alphasets = [
    [-2,-1,0,1,2]
]

new_kitchen_sink(histogramsets,alphasets)[0]

gives

array([[[0. ],
        [0.5],
        [1. ],
        [2. ],
        [3. ]]])
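As an aside, for backends without einsum (cf. the MXNet note and #256 above), each 'sa,sxu->sxau' product is just an outer product over the alpha and bin axes, so the same result can be written with plain broadcasting. A minimal sketch of that equivalence (illustrative only, not what this PR does):

import numpy as np

def broadcast_kitchen_sink(histogramssets, alphasets):
    h = np.asarray(histogramssets, dtype=float)   # shape (s, x, 3, u)
    a = np.asarray(alphasets, dtype=float)        # shape (s, a)
    deltas_up = h[:, :, 2] - h[:, :, 1]           # shape (s, x, u)
    deltas_dn = h[:, :, 1] - h[:, :, 0]
    noms = h[:, :, 1]
    alphas = a[:, None, :, None]                  # shape (s, 1, a, 1)
    up = alphas * deltas_up[:, :, None, :]        # broadcasts to (s, x, a, u)
    dn = alphas * deltas_dn[:, :, None, :]
    deltas = np.where(alphas > 0, up, dn)
    return noms[:, :, None, :] + deltas

For the single-bin example above, broadcast_kitchen_sink(histogramsets, alphasets)[0] gives the same values as new_kitchen_sink.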

@lukasheinrich (Contributor) commented

For the record, a notebook on how we arrived at this highly vectorized code:
StepByStep.ipynb.zip

pyhf/interpolate.py (outdated review thread, resolved)
@matthewfeickert (Member) commented Sep 14, 2018

Note that the kitchensink codes are the unoptimized versions of the actual interpcodes, which I added from @lukasheinrich and which are more optimized. The idea is that the slow version (loop manually and calculate) is the one that will always be the correct calculation, and we just want the tensorized calculations to match it (which makes testing significantly easier).

Yup, that was clear from the presence of a do_optimal bool.

My point is that the name "kitchensink" conveys that "this does everything", but that conveys no useful information about what they do. If the idea is that the _kitchensink<X> functions are slow checks, then why not name them something along the lines of _slowcheck<X> instead? Or am I missing some reason why this should be informative naming?

To be clear, this isn't pressing. This is just something I have unnecessarily strong and verbose opinions on (that I'd like to think are loosely held enough to be swayed). :P

@kratsg (Contributor, Author) commented Sep 14, 2018

My point is that the name "kitchensink" conveys that "this does everything", but that conveys no useful information about what they do. If the idea is that the _kitchensink<X> functions are slow checks, then why not name them something along the lines of _slowcheck<X> instead? Or am I missing some reason why this should be informative naming?

To be clear, this isn't pressing. This is just something I have unnecessarily strong and verbose opinions on (that I'd like to think are loosely held enough to be swayed). :P

Yeah, I'll rename these functions more clearly in a bit.

@matthewfeickert (Member) commented

From a quick scan over the notebook, it is going to be super helpful. 👍 Super great! :)

@kratsg changed the title from "[WIP] Feature: tensorize the interpolation codes (more general)" to "Feature: tensorize the interpolation codes (more general)" on Sep 17, 2018
@kratsg (Contributor, Author) commented Sep 17, 2018

Currently, the tests that fail on pytorch/tensorflow are due to issues with single float precision versus double float precision versus what python does. Interpcode1 is fine (non-linear, a**b), but interpcode0 shows issues which @lukasheinrich and I tracked down to subtractions like nom - up and so on.
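A toy illustration of the kind of single- vs double-precision discrepancy meant here (assumed numbers, not values from the actual tests):

import numpy as np

nom, up = 1.2, 1.3
delta64 = np.float64(up) - np.float64(nom)   # 0.10000000000000009
delta32 = np.float32(up) - np.float32(nom)   # ~0.09999990
print(abs(np.float64(delta32) - delta64))    # ~1e-7, far larger than float64 round-off

A difference at the 1e-7 level is enough to trip a tolerance tuned for double precision.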

@matthewfeickert (Member) commented

Wow, this is both a massive and impressive PR! Super job to both of you guys! 👍

I don't think I have any changes to request but

Currently, the tests that fail on pytorch/tensorflow are due to issues with single float precision versus double float precision versus what python does

what do we do with regards to that^?

self.at_plus_one[channel['name']][sample['name']]
]
]
]), tensorlib.astensor([tensorlib.tolist(pars)])
@lukasheinrich (Contributor) commented Sep 18, 2018

Why do we call astensor(tolist(..)) here? This should be a no-op (if it's not, it's an issue we should fix).

@kratsg (Contributor, Author) replied

This gets factored out soon enough. It's a bandage for this inconsistency:

pars = [1.0]
np.array([pars]) == [[1.0]]
tf.Tensor([pars]) == [[1.0]]
torch([pars]) == [1.0]

self.at_plus_one[channel['name']][sample['name']]
]
]
]), tensorlib.astensor([tensorlib.tolist(pars)])
(Contributor) commented:

ditto

@kratsg (Contributor, Author) replied with the same explanation as above.

@kratsg (Contributor, Author) commented Sep 18, 2018

Currently, the tests that fail on pytorch/tensorflow are due to issues with single float precision versus double float precision versus what python does

what do we do with regards to that^?

There are a couple of ways we can go:

  • relax the approximation checks for tensorflow/pytorch since numpy passes at a stricter level. This works because we already check the consistency of the backends with each other in a different set of tests, so we should be reasonably confident that it will be fine here
  • force numpy to use single float precision (this requires casting everything to float32 dtype and making that configurable in the numpy backend)
  • force the slow interpolator to do calculations on 32-bit numbers for pytorch/tensorflow and 64-bit for numpy.

The main issue we're seeing is that when we calculate the results of the slow interpolation (in vanilla python) and compare against the backend, we're comparing python's double/arbitrary float precision against the single precision of the pytorch/tensorflow backends.

@lukasheinrich (Contributor) commented Sep 18, 2018

Once we get this and #269 merged, we can define CombinedInterpolators. I'm attaching a notebook that adheres to the signature defined in #269 and uses the interpolators in this PR.

IntegrateNoCube.ipynb.zip

It would be very good though if we could get to the bottom of the discrepancy first.

I'd be fine if we at least know a setting in which the two are very close together (as @kratsg said, e.g. by forcing single precision in numpy), so that we are confident that we understand where it is coming from.

This is the timing: a 12-15x speed-up.

(screenshot of the timing comparison)

The interpolation is now the fastest part; the slowness is in the extraction of the results into the python structures, which are not as deeply nested and can be optimized later.

(tested on MBJ)

@kratsg (Contributor, Author) commented Sep 18, 2018

Currently, the tests that fail on pytorch/tensorflow are due to issues with single float precision versus double float precision versus what python does

what do we do with regards to that^?

There are a couple of ways we can go:

  • relax the approximation checks for tensorflow/pytorch since numpy passes at a stricter level. This works because we already check the consistency of the backends with each other in a different set of tests, so we should be reasonably confident that it will be fine here
  • force numpy to use single float precision (this requires casting everything to float32 dtype and making that configurable in the numpy backend)
  • force the slow interpolator to do calculations on 32-bit numbers for pytorch/tensorflow and 64-bit for numpy.

The main issue we're seeing is that when we calculate the results of the slow interpolation (in vanilla python) and compare against the backend, we're comparing python's double/arbitrary float precision against the single precision of the pytorch/tensorflow backends.

@matthewfeickert @lukasheinrich The last commit here, 4e23366, represents my perspective on this:

  • numpy can calculate using double float precision which mirrors how python normally does the calculations
  • tensorflow/pytorch can only calculate using single float precision which does not mirror how python normally does the calculations

The goal of the slow interpolation code (effectively python-only loops) is to make sure that the fast interpolation code (tensorized form) returns the same results. Therefore, the slow interpolation code should be reduced to single-float precision in situations where we only have single-float precision.

This will make the tests pass. We can file an issue later for a feature to configure the numpy backend for single-float precision, which simply means passing dtype=np.float32 in all the functions of the numpy backend. For now, I chose not to do this, since I think that's not what people will expect -- especially because ROOT does the calculations in double-float precision, and if we need to validate against it, we should use NumPy, which has double-float precision.
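A minimal sketch of what reducing the slow reference to single precision could look like (hypothetical; 4e23366 may implement this differently), mirroring kitchen_sink above but casting every value to float32 so the arithmetic itself happens in single precision:

import numpy as np

def slow_interp_single_precision(histogramssets, alphasets):
    all_results = []
    for histoset, alphaset in zip(histogramssets, alphasets):
        set_result = []
        for histo in histoset:
            histo_result = []
            for alpha in alphaset:
                alpha = np.float32(alpha)
                alpha_result = []
                for down, nom, up in zip(histo[0], histo[1], histo[2]):
                    down, nom, up = np.float32(down), np.float32(nom), np.float32(up)
                    # same linear rule as kitchen_sink, evaluated in float32
                    delta = (up - nom) * alpha if alpha > 0 else (nom - down) * alpha
                    alpha_result.append(float(nom + delta))
                histo_result.append(alpha_result)
            set_result.append(histo_result)
        all_results.append(set_result)
    return all_results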

@matthewfeickert (Member) left a review comment

Given the description here I'm in agreement. So LGTM.

@lukasheinrich (Contributor) commented

I see @kratsg. So do I understand correctly that we did not need to loosen any tolerances, and that forcing the slow implementation to use single precision is sufficient to make the tests pass?

@kratsg (Contributor, Author) commented Sep 18, 2018

I see @kratsg. So do I understand correctly that we did not need to loosen any tolerances, and that forcing the slow implementation to use single precision is sufficient to make the tests pass?

Yes!

@kratsg merged commit 421ce3f into master Sep 19, 2018
@kratsg deleted the update/newInterpCodes branch September 19, 2018 00:18
@lukasheinrich mentioned this pull request Sep 19, 2018