Select Git revision
htslib
James Bonfield authored
See samtools/samtools#717 for discussion.
The NM tag is ambiguous and infact differs in implementation between
htsjdk and htslib. Specifically N in both ref and seq is considered
to be a mismatch for samtools and a match in picard.
If we detect this case we now also store NM and MD verbatim, along
with the suspect case of falling off the end of the reference (who
knows what people write to these fields in that ill-defined case).
This makes it more likely that a round-trip from SAM -> CRAM -> SAM
will work even when the input SAM was produced via htsjdk.
To be ultra careful, we also add store_md and store_nm options to
always store this data verbatim. When combined with decode_md (note
this implicitly also implies decode_nm) this means it is possible to
round-trip while keeping these fields perfect even when they are set
to complete hogwash that neither picard nor samtools accepts, and also
to distinguish between the case of some reads having these fields
while others do not.
For example:
samtools view -O cram,store_md=1,store_nm=1 in.sam -o out.cram
samtools view -I decode_md=0 out.cram -o out.cram.sam
I thought about having a join store_md field that covers NM too, but
there are reasons why we'd want to store NM verbatim and not MD such
as NM being tiny in comparison to MD and MD being more tightly defined
in the spec.| Name | Last commit | Last update |
|---|