1. 24 May, 2018 7 commits
  2. 18 May, 2018 1 commit
    • Rob Davies's avatar
      Fix gcc 8.1 warnings · 55d7bef5
      Rob Davies authored
      Change kputsn() and kputsn_() to take size_t instead ot int to
      fix possible integer overflow in kputs.  They are both static inline
      so we can do this without breaking the ABI.  Guards on integer
      wrap-around should catch attempts to add over- or negatively- long
      strings.
      
      Possible buffer overflow in cram_populate_ref().
      
      Possible use of uninitialised value in bcf_sr_regions_next().  In
      that function it is possible that `from` and `to` may not be set if
      either `ifrom` or `ito` is negative as that could cause
      _regions_parse_line() to skip setting them.
      
      String truncation warning in bam_hdr_write() when compiled with
      no optimisation.  There was no need to copy the string anyway,
      it can just be written out directly.
      
      Not caught by gcc: Possible overflow in expand_cache_path().
      
      Silence false positive uninitialised value warning in cram_encode.c
      process_one_read() when using optimisation -Og or -Os.  gcc fails
      to spot that `new = 1` prevents `k` from being used if not set.
      55d7bef5
  3. 16 May, 2018 1 commit
    • daviesrob's avatar
      New interfaces to add or update bam integer, float and array aux tags (#694) · bceff25a
      daviesrob authored
      * Pull repeated code to expand bam data to its own function
      
      Adds some missing overflow checks and fixes a few places where
      l_data was incremented before trying to expand the data buffer
      so it would no longer be valid on failure.
      
      * Add bam_aux_update_int() interface
      
      Makes adding or changing the values of integer tags much easier.
      Updated tags will grow in size if needed, including moving
      any following data.  They will not shrink - if the new data fits
      in the old space the size will remain unchanged even if it is
      bigger than stricly necessary.
      
      * Add bam_aux_update_float() interface
      
      * Add bam_aux_update_array() interface
      bceff25a
  4. 30 Apr, 2018 2 commits
  5. 27 Apr, 2018 1 commit
  6. 26 Apr, 2018 2 commits
  7. 24 Apr, 2018 1 commit
  8. 19 Apr, 2018 1 commit
    • James Bonfield's avatar
      Replace tabs with spaces to the next tab stop (multiples of 8). · f1000743
      James Bonfield authored
      Plus also a few manual tweaks to indentation levels, mostly caused by
      failure to keep indentation correct after old search and replaces.
      
      See also e770a1b3 for the earlier commit
      to change these elsewhere in htslib.  Use git blame -w to compare beyond
      these commits.
      
      For reference, the command used to do this was perl:
      
          perl -e 'while(<>){while(s/^(.*?)\t/"$1".(" "x(8-length($1)%8))/e){};s/\s*$/\n/;print}'
      
      However GNU "expand | sed 's/ *$//'" also works if you have it
      installed.
      f1000743
  9. 18 Apr, 2018 1 commit
  10. 11 Apr, 2018 1 commit
  11. 05 Apr, 2018 1 commit
  12. 04 Apr, 2018 2 commits
  13. 03 Apr, 2018 4 commits
    • Rob Davies's avatar
      5b60dd3f
    • Rob Davies's avatar
      Release 1.8 · be22a2a1
      Rob Davies authored
      be22a2a1
    • James Bonfield's avatar
      More tweaks to cram threading. · 107e7d17
      James Bonfield authored
      The previous commit, while valid, also revealed more woes of multi-slice
      containers, specifically when cancelling the current read by seeking
      (draining the read-ahead decode queue).
      107e7d17
    • rpetrovski's avatar
      Memory leak iterating multiple queries over cram · 651a936b
      rpetrovski authored
      The following code consumes memory indefinitely. Memory leak is gone once the change is applied.
      Steps:
      1. build htslib
      2. compile test.c with:
      gcc -O0  -ggdb -I htslib/install/include test.c -L htslib/install/lib/ -l:libhts.a -lz -lpthread -llzma -lbz2
      3. run ./a.out some.cram chr1
      4. watch virtual memory going up in top
      5. apply patch, rebuild test.c, notice virtual memory does not change
      
      test.c:
      ```c++
      
      int main(int argc,char** argv)
      {
              hts_itr_t *iter=NULL;
              hts_idx_t *idx=NULL;
              samFile *in = NULL;
              bam1_t *b= NULL;
              bam_hdr_t *header = NULL;
              if(argc!=3) return -1;
              in = sam_open(argv[1], "r");
      
              if(in==NULL) return -1;
              if ((header = sam_hdr_read(in)) == 0) return -1;
      
              idx = sam_index_load(in,  argv[1]);
              if(idx==NULL) return -1;
      
              b = bam_init1();
              fputs("reading\n",stdout);
              do
              {
                      if (iter) hts_itr_destroy(iter);
                      iter = sam_itr_querys(idx, header, argv[2]);
                      if(!iter) return -1;
      //              fputs("DO STUFF\n",stdout);
              }
              while (sam_itr_next(in, iter, b) >= 0);
      
              fputs("done reading\n",stdout);
      
              hts_itr_destroy(iter);
              bam_destroy1(b);
              hts_idx_destroy(idx);
              bam_hdr_destroy(header);
              sam_close(in);
              return 0;
      }
      ```
      651a936b
  14. 29 Mar, 2018 2 commits
  15. 28 Mar, 2018 2 commits
  16. 26 Mar, 2018 1 commit
    • James Bonfield's avatar
      Improved round-trip support for NM and MD tags in CRAM. · a5dc7e00
      James Bonfield authored
      See samtools/samtools#717 for discussion.
      
      The NM tag is ambiguous and infact differs in implementation between
      htsjdk and htslib.  Specifically N in both ref and seq is considered
      to be a mismatch for samtools and a match in picard.
      
      If we detect this case we now also store NM and MD verbatim, along
      with the suspect case of falling off the end of the reference (who
      knows what people write to these fields in that ill-defined case).
      
      This makes it more likely that a round-trip from SAM -> CRAM -> SAM
      will work even when the input SAM was produced via htsjdk.
      
      To be ultra careful, we also add store_md and store_nm options to
      always store this data verbatim.  When combined with decode_md (note
      this implicitly also implies decode_nm) this means it is possible to
      round-trip while keeping these fields perfect even when they are set
      to complete hogwash that neither picard nor samtools accepts, and also
      to distinguish between the case of some reads having these fields
      while others do not.
      
      For example:
      
          samtools view -O cram,store_md=1,store_nm=1 in.sam -o out.cram
          samtools view -I decode_md=0 out.cram -o out.cram.sam
      
      I thought about having a join store_md field that covers NM too, but
      there are reasons why we'd want to store NM verbatim and not MD such
      as NM being tiny in comparison to MD and MD being more tightly defined
      in the spec.
      a5dc7e00
  17. 21 Mar, 2018 1 commit
  18. 15 Mar, 2018 9 commits
    • Rob Davies's avatar
      Suppress unused function warnings in kseq · 53241915
      Rob Davies authored
      Left over work from commit 5ffc4a20
      53241915
    • Rob Davies's avatar
      Avoid unintended macro expansion in KSORT_INIT_GENERIC · b49bb2d7
      Rob Davies authored
      Netbsd's libc #defines uint16_t to __uint16_t (and similarly for
      other stdint types).  This was expanded by KSORT_INIT_GENERIC()
      resulting in functions being defined with slightly different names
      compared to the ones produced by using KSORT_INIT() directly.
      The names also no longer matched the results of expanding
      ks_mergesort() and friends.
      
      Fix this by adjusting where the underscore gets pasted into the
      names.  This means KSORT_INIT_GENERIC can use the argument
      in a token pasting operation, which prevents it from being
      expanded.
      b49bb2d7
    • Rob Davies's avatar
      Fix use of (possibly) signed char with ctype functions · 738c16d2
      Rob Davies authored
      For portability with platforms that still implement isspace() etc.
      as an array.
      
      cram/cram_io.c, vcf.c and hfile_libcurl.c use the wrappers in
      textutils_internal.h
      
      knetfile.c and kstring.c use casts so we can push the changes
      upstream if we want to.
      738c16d2
    • Rob Davies's avatar
      Turn off format warnings in appveyor for mingw-w64 · 5fef1be2
      Rob Davies authored
      It appears -Wformat is broken in mingw-w64's gcc at present.
      For example, see discussion at:
      https://github.com/facebook/rocksdb/pull/2052#discussion_r108785499
      
      As there is no easy fix, turn off format warnings in appveyor
      configuration to reduce clutter in the output.
      5fef1be2
    • Rob Davies's avatar
      Change some thread workers to use return instead of pthread_exit · fc6a5f93
      Rob Davies authored
      The effect is the same and it fixes some warnings in MinGW's
      gcc.
      fc6a5f93
    • Rob Davies's avatar
      Fix SunStudio compiler warnings · 2796b437
      Rob Davies authored
      Mainly unreachable code, a couple of integer issues and anonymous
      unions.  Suppressed an anonymous union warning in cram_codecs.h
      as it would need a lot of changes to fix.  A bogus 'end of loop not
      reached' error is suppressed in knetfile.c (the compiler does
      not like macros of the form `do { return; } while (0)`).
      
      Replaced use of `diff -q` with `cmp` in test.pl as the -q option
      is not supported in Solaris-derivatives.
      2796b437
    • James Bonfield's avatar
      More general CRAM multi-threading fixes (not related to multi-slice). · 1c4acb41
      James Bonfield authored
      These were spotted by gcc -fsanitize=thread.
      
      1. Decoding: fd->no_ref is set while decoding the compression header
      and used in subsequent slice decodes, but we may then decode the next
      container compression header before the previous slices have finished
      decoding.
      
      2. Decoding: avoid race when limiting data via required_fields.
      
      fd->decode_md is now copied to s->decode_md, so the slice can disable
      this itself if required (such as when the user asked for MD tag to be
      filled out while also asking not to return any auxiliary blocks).
      
      Although technically fixing a threading violation, the practical
      implementations means this is just a tidyup rather than any real
      behavior changes.
      
      3. Encoding: Cram_encode_aux needed an extra guard surrounding the
      fd->tags_used field.  This is used to hold tag types seen so far in
      the file, within any container or block, so we can keep track of the
      compression methods that work best for a any specific tag type.
      1c4acb41
    • James Bonfield's avatar
      Fixed trivial memory leak in test harness · 18931338
      James Bonfield authored
      18931338
    • James Bonfield's avatar
      Bug fixes to multi-slice containers (mostly threading related). · 6b6c8a3f
      James Bonfield authored
      1) When skipping past slices to find one that overlaps the start of
      our region, don't free container here (affects non-threading also).
      Fixes a crash reported by xiaofeng.liu@sentieon.com.
      
      2) Don't cache cram_get_block_by_id values in the codec as multiple
      slices may be decoding in parallel using the same codec.  This also
      removes the need for the reset function.
      
      Instead we use the already existing per-slice lookup array, but
      improved so it works (mostly without linear scan) on large ID aux
      blocks too.  We could conceivably go the whole hog of using a hash
      table, but I think it's overkill and this is minimum code.
      
      3) We now distinguish between fd->ctr, c->curr_slice (being consumed
      by get_seq calls) and fd->ctr_mt, c->curr_slice_mt (the read-ahead
      for dispatching thread tasks).  Similarly for EOF / OOC (out of
      container) parameters.
      
      4) Cram multi-threaded flush now does the freeing of containers
      better.
      
      5) Added a larger input file of 1000 reads and a test using
      multi-slice containers.
      
      Also added the ability to debug the test harness with e.g. valgrind
      
      Set the TEST_PRECMD first.  For example:
      
          TEST_PRECMD="valgrind --leak-check=full" make check
      6b6c8a3f