-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: specification versioning #194
Comments
Following on to this, if we want to automate the versioning then we could add tags such as SAM:v1.4, CRAM:v3.0 etc as above and then use It's not perfect as the number of edits since the last tag doesn't apply to just that document, but its monotonically increasing and clearly demonstrates one revision as newer than another; something we cannot do with a comittish alone. |
What the SAM spec needs is an appendix describing the functionality introduced by each |
Summarising the meeting.
My proposal is that the latter two points are solved at the same time. A manually curated summary of significant changes (ie not punctuation or minor text clarifications - they're already visible in git logs) at the start of the document would also act to define the current version number. Eg for SAM it shows just how much has changed between 1.4 and 1.5, without all the commits discussing TeX formatting tweaks or tidying up of wording and footnotes. 1.5 (current)
1.4 (to March 2012).
1.3 (to April 2011)It's quite clear from all of these that the actual spec version number is only loosely descriptive of what you can expect to work for the minute details, but major things like supplementary reads are the cause of spec version bumping. The above is also culled from the first 20k of git log for SAMv1.tex, so a summary most definitely is useful to have and we don't simply want a ChangeLog file with the appropriate bits of git commit messages automatically pasted in. |
Bump on this. We have choices to discuss at the conference call.
|
I would be fine with that granularity level, but would prefer it was in the appendix instead of at the beginning of the document. If that model was accepted, we must ensure that substantial changes are added to the appendix before merging. If the PDF was also updated before merging, wouldn't that help to make the commit hash unnecessary? |
IMHO what the SAM spec needs re format version is an extra paragraph in §1 to the effect of
(We can argue about whether This along with an appendix describing the extant historical versions and their significant differences. (I am in agreement with @cyenyxe that this information is less important than the description of the format so should be in an appendix rather than at the start of the document.) I am still hunting out my branch with a draft of that paragraph and appendix; below are my notes from constructing that text. There are fewer extant version numbers than you might guess; the appendix is in reverse chronological order, viz 1.5 (23 May 2013)Added SUPPLEMENTARY flag bit; added SA:Z tag; PNEXT/RNEXT points to next read, not segment 1.4 (12 April 2011)Added "B" arrays. 1.3 (July 2010)Added CIGAR N means intron; [etc]. 1.0 (2009)Initial edition. 2009 1.0 as in 0.1.2-draft pages doc July 2010 VERSION VN:1.3; First commit of v1.3 SAMv1.tex [b351164] Nov 2010 Adding BAM description Dec 2010 CIGAR N means intron 2 March 2011 Re-add CC:Z, CP:i 2 April 2011 Add @RG-FO,KS, FZ:, apply IUPAC in MD,SEQ 29 Aug 2011 Change FZ:H to FZ:B:S 7 Sept 2011 Add BC:Z; s/fragment/segment/g Sep/Oct 2011 Added the padding guide 27 Sept 2011 Max chr length from 2^29 to 2^31; (clarify MD5) 16 Mar 2012 Peter Cock adds "Annotation dummy reads" recommendations 23 May 2013 VERSION VN:1.5; added SUPPLEMENTARY flag bit [80bc68f] 26 July 2013 JM removed v1.5-r49 from the title, added git describe instead 18 Oct 2013 Update PNEXT/TLEN from 2^29 to 2^31 (fixed Sept 2011 oops) 5 Dec 2013 Added @PG-DS 19 Dec 2013 Describe BGZF trailer blocks 28 Feb 2014 Added MC:Z mate cigar 12 Sep 2014 Add links to CSI etc 21 Nov 2014 Merge BAI stats pseudobins and n_no_coor (from April :-() 2 March 2015 Reverse flag is meaningful for unmapped reads 4 March 2015 Merge SAM 'i' range limit removal (from November :-() |
@cyenyxe wrote:
I'm not sure what you mean. The point of the commit hash on the title page is so that someone who just has a printout of the PDF can go to a git repository and find out exactly which commits have contributed to the document that they're looking at. Are you saying that if the PDF is updated on the branch, you don't need to update it again post-merge as the embedded on-the-PR-branch commit hash will be close enough? Unfortunately this fails when other changes to that spec have been made on or merged to master in the meantime. Because of that, to get a meaningful embedded commit hash, PDF updates can only take place as non-merge commits on the mainline. (This means that updating the PDF during a PR, like in some of the commits on #200, is mostly pointless and only bloats the repo.) |
There's some interesting discrepancy between when changes are made and which version they occur in. Clearly the I'm working on it anyway. |
Most of the tag changes are immaterial and could be noted in passing, if at all. For example, The significant changes that need to be listed in this Format Version History section are the ones that do affect compatibility. At a first stab, this is limited to: 1.5 introduced the SUPPLEMENTARY flag. If your file contains supplementary reads, it must identify itself as 1.5 or later; files with version numbers <= 1.4 can be assumed not to contain supplementary reads. 1.4 introduced the B array tag type. If your file contains B-arrays, it must identify itself as 1.4 or later; files with version numbers <= 1.3 can be assumed not to contain B-arrays. etc. What else is there? |
The key changes are SUPPLEMENTARY flag (1.5), B array (1.4), IUPAC sequence (1.4), maybe QNAME "*" (1.4) and FLAG no longer supporting letters (1.3). My change adds these in bold to indicate they are the primary causes of the version number being bumped. However all the other things are still important and need to go somewhere. I'm not after a full and gory "git log", but a summary saying when we introduced N as a cigar OP for example (half way through a version release) or added the ability to store padded sequence assemblies in SAM/BAM (early 1.4) is still useful information. Some of it is fluff though, so we can decide if it needs culling. |
In addition to the above, I state in my change "The key changes that caused the version number to change are shown in bold". However maybe it should be phrased as "Changes that are likely to cause compatibility issues are shown in bold". This should include the CIGAR "N" operator as old code that didn't support it would almost certainly choke or give the wrong alignments, yet it appeared half way through a version without a bump in version number so we apparently didn't consider it vital enough to update the version. More debatable, FZ:H to FZ:B may break some tools, but most programs don't decode this tag anyway so it would be very isolated. |
[Since this issue was filed, the SAM specification has gained an appendix listing historical significant changes and a clear statement in the introduction noting the version that the document describes. There remains the question of whether adding tags in this repository would be useful in general (not just for SAM) — see also related discussion in #316.] To my mind, there isn't much to be gained from tagging commits in this repository. As noted in the original issue comment, the repository contains a disparate collection of unrelated documents, so a tag would not pertain to the repository as a whole. A convention involving What information would a version tag provide for the CRAM or VCF specifications? These have separate documents for each extant major/minor format version, so presumably the best document to look at for e.g. VCF 4.2 is the master HEAD edition of VCFv4.2.tex/.pdf. So a purely version tag like Conversely, the SAM/SAMtags, htsget, and refget specifications have a single document apiece, reflecting the most up-to-date version of the format or protocol. These documents contain appendices listing the significant differences between historical format/protocol versions. The current version of SAM is 1.6. On the face of it, it would be useful to have a tag pointing at e.g. SAM version 1.4, but this would have different meanings for different audiences. General readers would want to be pointed at the best document for SAM 1.4, which would be the last commit before it was bumped to VN:1.5. Maintainers would want to record the initial revision that can be considered to be SAM 1.4, which would be approximately the commit in which it was bumped to VN:1.4. The situation is similar for htsget and refget, though they are so new that they do not have multiple extant versions or much history as yet. IMHO most needs are best served by the history appendix in the master HEAD edition of the various documents. So version tags are unnecessary. Anyone wanting to see details of the exact historical e.g. SAM v1.4 specification will be spelunking around in the Git repository history anyway. |
The versioning is text based. It's a big step forward, but a change log is not the same thing as a version control system. If I want to actually see a spec at a specific version number then currently it is a complex task. Git could make this a cinch with "git checkout SAM_1.5; make" and voila. It's not a biggy, but IMO you're conflating two issues together; documentation and tagging. |
I have two main points:
In your initial comment starting this issue, you said:
For example, from b1ae9f9: Bump SAM version to VN:1.6 due to >65535 CIGAR strings (PR#307). IMHO adhering to such a convention for these commit messages is all we really need. |
Agreed, point 2 fixes things going forwards. It doesn't for historical commits which have been bad at this (it was a surprising challenge to write that history section), but you're right there's not a huge need so I'm happy. |
Discussed again at September 20th meeting. All comments on issue and in meeting equate to “happy with the status quo”. No further points raised subsequently. |
Following on from discussion here: #40 (comment). This is SAM specific discussion, but the general thought processes apply to all specs. (Although SAM appears unique here in that it has multiple versions in one document instead of multiple documents?)
The SAM specification no longer contains an explicit string explaining the version number. We have examples that contain HD lines, but there is more than one example in the specification and historically they haven't always agreed with each other on the version number. Further more, a SAM file with
@HD VN:1.4
is still a valid file within the 1.5 specification so it shouldn't necessarily be interpreted that such an example is implying the entire rest of the document is for 1.4 only.Commit 321f786 removed the version number string from the specification. I propose adding this back. It used to also have
-r49
etc attached. That's hard to keep up to date manually and also tricky for git describe as we have lots of commits in other specifications (and no tags). However maybe it's still beneficial. I find a commit hash of limited use as by itself it doesn't inform readers which of two documents are newest.I would also like a standard procedure for bumping major or minor version number including commit messages. Commit 80bc68f changed SAM spec from 1.4 to 1.5, but there is no mention of this in the commit message. It took some digging and binary searching with git blame to figure out when this version changed. It would be wrong to go back and correct these historically, but for future we could require something like "Updated version to X.Y" somewhere in the commit message.
Therefore I'd also think about adding tags. These aren't ideal for hts-specs as we have many formats, but it's possible to add say SAM:v1.4, SAM:v1.5, CRAM:v2.1, CRAM:v3.0 etc. Yes a CRAM tag will be meaningless for the SAM spec and vice versa, but it does at least mean we can checkout the SAM v1.4 spec without lots of git blaming.
Comments?
The text was updated successfully, but these errors were encountered: