
Naming of statistical outputs #125

Closed
tyarkoni opened this issue Mar 21, 2019 · 24 comments · Fixed by #131
Comments

@tyarkoni (Collaborator)

Currently we're using the nistats names for the output images produced by model fitting: ['z_score', 'stat', 'p_value', 'effect_size', 'effect_variance']. Since fitlins is theoretically capable of supporting multiple backends, there's no particular reason we need to adhere to these; we should probably standardize on a set of names that we can reuse everywhere. My take on the names:

  • z_score and p_value are clear and self-explanatory.
  • stat: From a quick glance, it looks like this is either t or F, depending on the analysis. I think this is problematic, because you can't tell just from the filename what it represents. I suggest renaming maps to t_stat or F_stat as appropriate.
  • effect_size: this is technically accurate, but I dislike it because people tend to use the term "effect size" when talking about standardized effects, so there's a risk of confusion. I suggest beta, par_est, parameter_estimate, or just effect.
  • effect_variance: I'm ambivalent about this one; I think just variance would probably be fine. It's possible that there will be other things we want to call variance, but I think the surrounding context should always make the semantics relatively clear (e.g., an estimate of between-subject variance at a higher level wouldn't have any subject or run id attached).

Tagging @nicholst, @bthirion, @cmaumet, @jbpoline for potential input.

@tyarkoni (Collaborator, Author)

I guess a related question is what happens to the underscores when files are saved. I assume we want these to be BIDS-compliant, so are we just turning p_value into, e.g., pvalue? If so, maybe we should just drop the underscores everywhere.
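If the underscores do get dropped for BIDS compliance, the renaming could be as simple as the following sketch (the mapping table and helper name are hypothetical, not fitlins code; names are taken from the list above):

```python
# Hypothetical mapping from nistats output names to underscore-free,
# BIDS-friendly suffixes (sketch only; the final names are under discussion).
NISTATS_TO_BIDS = {
    "z_score": "zscore",
    "p_value": "pvalue",
    "stat": "stat",            # would become tstat/Fstat per the proposal above
    "effect_size": "effect",
    "effect_variance": "variance",
}

def bids_suffix(nistats_name: str) -> str:
    """Return the underscore-free suffix for a nistats output name.

    Falls back to stripping underscores for names not in the table.
    """
    return NISTATS_TO_BIDS.get(nistats_name, nistats_name.replace("_", ""))

print(bids_suffix("p_value"))  # pvalue
```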

@effigies (Collaborator)

I'd suggest:

  • effect
  • variance
  • zscore
  • tstat/fstat (I would prefer to keep capitals out)
  • pval or pvalue

I'm disinclined toward beta simply because we don't really distinguish between parameter estimates and contrasts of parameter estimates anymore. Betas are just identity contrasts.

@tyarkoni (Collaborator, Author)

Agree with all of the above. We could probably further simplify and go with z, t, and F (I would also prefer not to use caps, but the [strong] convention is to use F for F-tests, and I think this has the potential to create confusion if we use f).

@effigies (Collaborator)

I could go that way, but it might be good to run this one by the larger BIDS community before staking claim to 3 single-character suffixes.

@bthirion

Sorry, but I'd rather avoid 'z', 't', 'F': in my experience, explicit naming should be preferred.
I also find beta a bit of statistician jargon.

The reason to keep 'effect_variance' rather than 'variance' is that people may confuse the residual's variance with that of the contrast; this being said, I'm fine with changing, especially if this becomes a community standard.

@jbpoline commented Mar 21, 2019

We could take those from STATO:

for "white spaces" or "underscore" I am not too sure of what are the possible options, if we cannot use '-' or '_' then Camel case ?

@nicholst

I'm with @bthirion for avoiding beta for the sake of clarity.

Also, I'm not familiar with fitlins... do you need to separately store regression coefficients and contrast estimates?

"effect size" is fraught because (in my experience) psychologists read "effect size" as "standardised effect size". Whatever we do we should be prepared to define:

  • R2 - coefficient of determination
  • d - Cohen's d - standardised effect size

"variance" to me is also ambiguous... Is it residual error variance or variance of a parameter estimate?

@jbpoline commented Mar 21, 2019

The advantage of taking the STATO terminology is that you get the definition for free:

I could not find something for "standardised effect size" - I suppose it could be added. There are things like

but that would need to be generalized ...

@tyarkoni (Collaborator, Author)

Okay so it sounds like we have convergence on pval, zscore, tstat, and Fstat. We have near-consensus on effect (JB, for the reason both @nicholst and I identified above, I don't think "effect size" is ideal). Regarding variance, I don't think I see much potential for confusion between model residual error and variance of a PE. The latter will presumably have the effect name it corresponds to, whereas the former will either have nothing, or something like "model" or "residual".

In general, I don't think we need to be very concerned about cases where people in a rush accidentally pick out the wrong map based on the filename and then catch themselves later (lord knows this already happens enough with other packages); the concern is more about someone who carefully reads the filename being able to understand roughly or exactly what the map represents.

@nicholst I believe (but @effigies can correct me) that we don't maintain any distinction between PE maps and contrast maps. The only outputs at each level of analysis are those defined as contrasts. If you don't define contrasts, you get nothing.

@tyarkoni (Collaborator, Author)

@jbpoline I'm not sure the STATO terminology works for us, if only because it seems very verbose. Do we really want things like ses-movie_task-movie_run-1_bold_contrast-faceConfidence_StandardizedMeanDifference.nii.gz? I'm not saying we don't, necessarily, it just feels quite user- (and developer) unfriendly.

@jbpoline commented Mar 22, 2019

I actually did not suggest StandardizedMeanDifference :)
but tstat or tstatistic and the others should be OK; I'm not sure a few characters make a big difference here.
I agree also that effect size is not ideal, but at least if we have a place where it is defined, that alleviates some of the issue. I like parameter estimate (again, a bit long? but I'm not sure we should care); STATO even has contrast estimate.
I think whatever we choose, giving it a link to a proper definition would be good!

@adelavega (Collaborator)

I think a 1:1 shorthand for STATO terms might work. I have a strong preference for shorter filenames.

@jbpoline

I also prefer shorter names, but we already have quite a bit of information coded in the filename (sub-XX_ses-XX_...), so I'm not sure this will make a big difference. But yes, shorter is better if we have the 1:1 mapping.

@effigies (Collaborator)

How do people feel about adding a test-<t|F|chi2|...> type of entity? A generic stat suffix would replace fstat and tstat (or F and t, depending on what our final decision was...) and this might also make the zscore and other suffixes more informative as individual filenames. As we've mapped this out, you would kind of need to reference the associated statistical filename to quickly grasp what kind of effect map you're looking at.

I want to rope in @Gilles86 and @JohnGriffiths, as there was some discussion at the recent computational models meeting about statistical outputs, to see if they have any thoughts on this or the discussion as a whole.

Obviously nothing's set in stone, but identifying paths that are going to be problematic when we start looking outside our narrow use case seems prudent.

@tyarkoni (Collaborator, Author) commented Apr 19, 2019

I think a stat suffix is a good idea, but I think that should subsume all existing statistic suffixes (e.g., _zscore, _pvalue...), not just t and F stats. By the same token, I would want to have the entity named stat rather than test, as the latter is pretty limiting. I can see an argument for wanting to separately encode the procedure that generated a statistic from the statistic itself (e.g., a p-value can come from a t-test, F-test, or any number of other things), but in practice I think that's best handled via the desc field, or another new freeform field like proc or model, as the space of common procedures is just too large to reasonably encode in a limited vocabulary.

Also, in most cases the interpretation of the statistic itself doesn't really depend on knowing the procedure. E.g., one-sample and two-sample t-tests can be thought of as different procedures (or as two minor variants of a GLM analysis), but the resulting/derived t, z, p, and stderr maps have essentially the same meaning regardless.

I guess what I'm proposing is something like sub-01_task-nback_run-1_contrast-parametricload_desc-onesampttest_stat-p_statmap.nii.gz, which would be read as "a map of p-values obtained from a one-sample t-test on the parametric load contrast for subject 1, run 1, during the n-back task."

EDIT: And just to be clear, on this proposal, the standardized names proposed above would all move from suffix to the stat entity—i.e., *_stat-zscore_statmap.nii.gz rather than *_zscore.nii.gz.

EDIT2: One other reason I prefer this approach is that I think we should work to keep the number of valid suffixes created BIDS-wide to a minimum. IMO extensions that need to introduce a whole bunch of new distinctions that all have some common logical grouping (like statistic type) should get one suffix and a new entity rather than a bunch of suffixes and no new entities.
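The entity-based layout proposed above lends itself to generic parsing; a minimal sketch follows (illustrative only, not fitlins code; real tools would use pybids rather than string splitting):

```python
def parse_bids_name(filename: str):
    """Split a BIDS-style derivative filename into its entities and suffix.

    Illustrative sketch only: assumes well-formed key-value entities
    separated by underscores, with the suffix as the final component.
    """
    stem = filename
    for ext in (".nii.gz", ".nii"):
        if stem.endswith(ext):
            stem = stem[: -len(ext)]
            break
    parts = stem.split("_")
    suffix = parts[-1]                              # e.g. "statmap"
    entities = dict(p.split("-", 1) for p in parts[:-1])
    return entities, suffix

entities, suffix = parse_bids_name(
    "sub-01_task-nback_run-1_contrast-parametricload_desc-onesampttest_stat-p_statmap.nii.gz"
)
print(suffix)             # statmap
print(entities["stat"])   # p
```

With a single statmap suffix, a consuming tool only needs to inspect the stat entity to know what the values in the map are.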

@cmaumet commented Apr 19, 2019

Sorry for jumping in just now... But I'm a little bit confused as to what is the goal here.

Are we defining:

  1. the names of the output files for fitlins (in which case I think we have a lot of freedom) -or-
  2. the BIDS specification for statistical outputs (in which case the discussion is much broader and we should spend some time thinking how it fits w/ other standards (NIDM-Results) but also bring the discussion up to the BIDS mailing list / BEP discussion channel)

?

I think this is an important distinction as it has consequences on how much we should think about it and more practically on how/where to have this discussion.

@effigies (Collaborator)

@cmaumet Mostly 1, but it seems like a good idea to do 1 with 2 in the back of one's mind, to improve the odds that moving to the standardization stage will not require massive rearchitecting. I'm taking as a model that we made a lot of pragmatic decisions in fMRIPrep that later informed the derivatives spec meeting, but were not binding on that group. It turned out that some changes were needed, but they were pretty minor.

One of the results of the computational model meeting was the notion that we should have a statistical outputs BEP, but I do not believe that the drafting has begun. You can see the sketch of a few of the sorts of outputs we were discussing in the document I linked above.

@effigies (Collaborator)

@tyarkoni I get that p-values and z-scores are kind of test-agnostic, but still having the context of the statistic they're describing seems useful, and your expansion seems overly verbose for what's needed here. What about:

sub-01_task-nback_run-1_contrast-parametricload_stat-t_estimate.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-t_variance.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-t_stat.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-t_pvalue.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-t_zscore.nii.gz

Or what would be your proposal for the five? I've seen them for p and z.

@tyarkoni (Collaborator, Author)

@effigies I think this relates to @cmaumet's question. If we see this is as just a fitlins convention, and we don't expect there to ever be anything more than these five suffixes, that would be fine. But going forward I can certainly see a use case for wanting other maps (e.g., standard errors, standardized effect size measures like d or r, etc.). The thing I think we desperately want to avoid is having to have a (large) controlled vocabulary of statistical suffixes. Whereas having a new uncontrolled entity like stat is not a big deal.

I'm less concerned about verbosity (well, within reason) than about complexity and clarity for the spec and tools that implement the spec. Having a single statmap suffix is nice because it immediately orients the receiving tool that it needs to speak stats in order to handle the file (and can be ignored if desired). Developers, similarly, don't have to know the spec well to decide that _statmap is not something they care about (whereas _estimate and _variance are generic enough that it's not hard to imagine semantic collisions or confusions across different parts of the ecosystem).

FWIW, I find your examples actually a bit misleading, in that:

  • stat-t immediately makes me think that the value of the statistic in the map is a t score (which it isn't), and not that the statistic came from a t test (renaming to test-t would be an improvement, but as soon as you add other test types, the confusion returns; e.g., would people understand test-r or test-pearson?)
  • I think it's quite confusing to have a generic _stat suffix whose meaning varies based on the originating procedure, and doesn't actually tell you what the value is. I.e., *_test-t_stat.nii.gz contains t scores and *_test-F_stat.nii.gz contains F scores, but *_test-t_pvalue.nii.gz and *_test-F_pvalue.nii.gz both contain p-values. Separating into a _statmap suffix and a stat entity is unambiguous in that the suffix tells you the map contains statistical quantities, the stat entity tells you what those values are, and desc or some other field (or associated metadata) tells you where they came from.

@effigies (Collaborator)

Fair enough. I'm still unclear on what you want to see. This?

sub-01_task-nback_run-1_contrast-parametricload_stat-estimate_statmap.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-variance_statmap.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-t_statmap.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-p_statmap.nii.gz
sub-01_task-nback_run-1_contrast-parametricload_stat-z_statmap.nii.gz
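Under this scheme, the five maps for a contrast differ only in the stat entity, so generating them is a one-liner; a sketch (the base entities and stat values mirror the examples above; not fitlins code):

```python
# Sketch: generate the five proposed output names for one contrast.
# The entity order and values mirror the examples above.
BASE = "sub-01_task-nback_run-1_contrast-parametricload"
STATS = ["estimate", "variance", "t", "p", "z"]

filenames = [f"{BASE}_stat-{stat}_statmap.nii.gz" for stat in STATS]
for name in filenames:
    print(name)
```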

@tyarkoni (Collaborator, Author)

Yeah, that looks good to me. If we're allowing that this is just provisional fitlins nomenclature, and may or may not make it into a stats spec later, then I'm fine adding a test entity that currently only takes values t and F. But I'm also fine putting that in desc if we're not using it for anything else (and that might be better actually, because I think desc-Ftest reads a bit better than test-F).

@effigies (Collaborator)

In the case of an F-test, it would be:

sub-01_task-nback_run-1_contrast-parametricload_stat-F_statmap.nii.gz

I'm kind of ignoring desc for right now, though if it's important we can stipulate that it will be added/argued over before merge, as it will apply to all of the files or none, and I want to get something working before tweaking it.

@bthirion commented Apr 19, 2019 via email

@jbpoline commented Apr 21, 2019 via email
