-
- Downloads
Improved round-trip support for NM and MD tags in CRAM.
See samtools/samtools#717 for discussion. The NM tag is ambiguous and infact differs in implementation between htsjdk and htslib. Specifically N in both ref and seq is considered to be a mismatch for samtools and a match in picard. If we detect this case we now also store NM and MD verbatim, along with the suspect case of falling off the end of the reference (who knows what people write to these fields in that ill-defined case). This makes it more likely that a round-trip from SAM -> CRAM -> SAM will work even when the input SAM was produced via htsjdk. To be ultra careful, we also add store_md and store_nm options to always store this data verbatim. When combined with decode_md (note this implicitly also implies decode_nm) this means it is possible to round-trip while keeping these fields perfect even when they are set to complete hogwash that neither picard nor samtools accepts, and also to distinguish between the case of some reads having these fields while others do not. For example: samtools view -O cram,store_md=1,store_nm=1 in.sam -o out.cram samtools view -I decode_md=0 out.cram -o out.cram.sam I thought about having a join store_md field that covers NM too, but there are reasons why we'd want to store NM verbatim and not MD such as NM being tiny in comparison to MD and MD being more tightly defined in the spec.
Showing
- NEWS 7 additions, 0 deletionsNEWS
- cram/cram_encode.c 34 additions, 30 deletionscram/cram_encode.c
- cram/cram_io.c 10 additions, 0 deletionscram/cram_io.c
- cram/cram_structs.h 2 additions, 0 deletionscram/cram_structs.h
- hts.c 8 additions, 0 deletionshts.c
- htslib/hts.h 2 additions, 0 deletionshtslib/hts.h
Loading
Please register or sign in to comment