- May 24, 2018
-
-
Rob Davies authored
Eliminates a memcpy from the uncompressed to compressed buffer.
-
Rob Davies authored
-
Rob Davies authored
When the buffer is empty and the request won't fit, go straight to hwrite2() without trying to copy data into the buffer first. Avoids a memcpy and an extra call to fp->backend->write().
-
Rob Davies authored
-
Rob Davies authored
Reduces the number of times the output consumer will be woken up, only to be disappointed.
-
Rob Davies authored
Compiler warning was correct.
-
James Bonfield authored
Silence a clang 5.0.0 warning caused by the recent gcc 8.1.0 warning fixes.
-
- May 18, 2018
-
-
Rob Davies authored
Change kputsn() and kputsn_() to take size_t instead ot int to fix possible integer overflow in kputs. They are both static inline so we can do this without breaking the ABI. Guards on integer wrap-around should catch attempts to add over- or negatively- long strings. Possible buffer overflow in cram_populate_ref(). Possible use of uninitialised value in bcf_sr_regions_next(). In that function it is possible that `from` and `to` may not be set if either `ifrom` or `ito` is negative as that could cause _regions_parse_line() to skip setting them. String truncation warning in bam_hdr_write() when compiled with no optimisation. There was no need to copy the string anyway, it can just be written out directly. Not caught by gcc: Possible overflow in expand_cache_path(). Silence false positive uninitialised value warning in cram_encode.c process_one_read() when using optimisation -Og or -Os. gcc fails to spot that `new = 1` prevents `k` from being used if not set.
-
- May 16, 2018
-
-
daviesrob authored
* Pull repeated code to expand bam data to its own function Adds some missing overflow checks and fixes a few places where l_data was incremented before trying to expand the data buffer so it would no longer be valid on failure. * Add bam_aux_update_int() interface Makes adding or changing the values of integer tags much easier. Updated tags will grow in size if needed, including moving any following data. They will not shrink - if the new data fits in the old space the size will remain unchanged even if it is bigger than stricly necessary. * Add bam_aux_update_float() interface * Add bam_aux_update_array() interface
-
- Apr 30, 2018
-
-
James Bonfield authored
-
James Bonfield authored
With slices that use the multi-ref mode (ref id -2 and tid encoded using RI data series) we need to decode the slice, but only enough to get rname, pos and alignment end. We use the required fields parameter to prevent wasting time decompressing quality values and auxiliary tags.
-
- Apr 27, 2018
-
-
Valeriu Ohan authored
caused some reads to be wrongly skipped when reading a BAM file.
-
- Apr 26, 2018
-
-
Petr Danecek authored
When alleles are updated also bcf1_t.rlen needs to be udpated. In absence of INFO/END tag, the length of the REF allele is used, otherwise rlen is calculated from INFO/END. However, by mistake the END coordinate was used instead of the allele length.
-
Petr Danecek authored
-
- Apr 24, 2018
-
-
Petr Danecek authored
Fixes #691
-
- Apr 19, 2018
-
-
James Bonfield authored
Plus also a few manual tweaks to indentation levels, mostly caused by failure to keep indentation correct after old search and replaces. See also e770a1b3 for the earlier commit to change these elsewhere in htslib. Use git blame -w to compare beyond these commits. For reference, the command used to do this was perl: perl -e 'while(<>){while(s/^(.*?)\t/"$1".(" "x(8-length($1)%8))/e){};s/\s*$/\n/;print}' However GNU "expand | sed 's/ *$//'" also works if you have it installed.
-
- Apr 18, 2018
-
-
Valeriu Ohan authored
Fixes `cram_ptell` function by using the correct indicator of maximum records in a container. Minor code improvements.
-
- Apr 11, 2018
-
-
Valeriu Ohan authored
-
- Apr 05, 2018
-
-
jrayner authored
-
- Apr 04, 2018
- Apr 03, 2018
-
-
Rob Davies authored
-
Rob Davies authored
-
James Bonfield authored
The previous commit, while valid, also revealed more woes of multi-slice containers, specifically when cancelling the current read by seeking (draining the read-ahead decode queue).
-
rpetrovski authored
The following code consumes memory indefinitely. Memory leak is gone once the change is applied. Steps: 1. build htslib 2. compile test.c with: gcc -O0 -ggdb -I htslib/install/include test.c -L htslib/install/lib/ -l:libhts.a -lz -lpthread -llzma -lbz2 3. run ./a.out some.cram chr1 4. watch virtual memory going up in top 5. apply patch, rebuild test.c, notice virtual memory does not change test.c: ```c++ int main(int argc,char** argv) { hts_itr_t *iter=NULL; hts_idx_t *idx=NULL; samFile *in = NULL; bam1_t *b= NULL; bam_hdr_t *header = NULL; if(argc!=3) return -1; in = sam_open(argv[1], "r"); if(in==NULL) return -1; if ((header = sam_hdr_read(in)) == 0) return -1; idx = sam_index_load(in, argv[1]); if(idx==NULL) return -1; b = bam_init1(); fputs("reading\n",stdout); do { if (iter) hts_itr_destroy(iter); iter = sam_itr_querys(idx, header, argv[2]); if(!iter) return -1; // fputs("DO STUFF\n",stdout); } while (sam_itr_next(in, iter, b) >= 0); fputs("done reading\n",stdout); hts_itr_destroy(iter); bam_destroy1(b); hts_idx_destroy(idx); bam_hdr_destroy(header); sam_close(in); return 0; } ```
-
- Mar 29, 2018
-
-
Rob Davies authored
-
Rob Davies authored
Avoids fiddling with the internals of the BGZF struct.
-
- Mar 28, 2018
-
-
Rob Davies authored
Also minor tweak to bgzip usage.
-
Nathan T. Weeks authored
-
- Mar 26, 2018
-
-
James Bonfield authored
See samtools/samtools#717 for discussion. The NM tag is ambiguous and infact differs in implementation between htsjdk and htslib. Specifically N in both ref and seq is considered to be a mismatch for samtools and a match in picard. If we detect this case we now also store NM and MD verbatim, along with the suspect case of falling off the end of the reference (who knows what people write to these fields in that ill-defined case). This makes it more likely that a round-trip from SAM -> CRAM -> SAM will work even when the input SAM was produced via htsjdk. To be ultra careful, we also add store_md and store_nm options to always store this data verbatim. When combined with decode_md (note this implicitly also implies decode_nm) this means it is possible to round-trip while keeping these fields perfect even when they are set to complete hogwash that neither picard nor samtools accepts, and also to distinguish between the case of some reads having these fields while others do not. For example: samtools view -O cram,store_md=1,store_nm=1 in.sam -o out.cram samtools view -I decode_md=0 out.cram -o out.cram.sam I thought about having a join store_md field that covers NM too, but there are reasons why we'd want to store NM verbatim and not MD such as NM being tiny in comparison to MD and MD being more tightly defined in the spec.
-
- Mar 21, 2018
-
-
Andrew Whitwham authored
-
- Mar 15, 2018
-
-
Rob Davies authored
Left over work from commit 5ffc4a20
-
Rob Davies authored
Netbsd's libc #defines uint16_t to __uint16_t (and similarly for other stdint types). This was expanded by KSORT_INIT_GENERIC() resulting in functions being defined with slightly different names compared to the ones produced by using KSORT_INIT() directly. The names also no longer matched the results of expanding ks_mergesort() and friends. Fix this by adjusting where the underscore gets pasted into the names. This means KSORT_INIT_GENERIC can use the argument in a token pasting operation, which prevents it from being expanded.
-
Rob Davies authored
For portability with platforms that still implement isspace() etc. as an array. cram/cram_io.c, vcf.c and hfile_libcurl.c use the wrappers in textutils_internal.h knetfile.c and kstring.c use casts so we can push the changes upstream if we want to.
-
Rob Davies authored
It appears -Wformat is broken in mingw-w64's gcc at present. For example, see discussion at: https://github.com/facebook/rocksdb/pull/2052#discussion_r108785499 As there is no easy fix, turn off format warnings in appveyor configuration to reduce clutter in the output.
-
Rob Davies authored
The effect is the same and it fixes some warnings in MinGW's gcc.
-
Rob Davies authored
Mainly unreachable code, a couple of integer issues and anonymous unions. Suppressed an anonymous union warning in cram_codecs.h as it would need a lot of changes to fix. A bogus 'end of loop not reached' error is suppressed in knetfile.c (the compiler does not like macros of the form `do { return; } while (0)`). Replaced use of `diff -q` with `cmp` in test.pl as the -q option is not supported in Solaris-derivatives.
-
James Bonfield authored
These were spotted by gcc -fsanitize=thread. 1. Decoding: fd->no_ref is set while decoding the compression header and used in subsequent slice decodes, but we may then decode the next container compression header before the previous slices have finished decoding. 2. Decoding: avoid race when limiting data via required_fields. fd->decode_md is now copied to s->decode_md, so the slice can disable this itself if required (such as when the user asked for MD tag to be filled out while also asking not to return any auxiliary blocks). Although technically fixing a threading violation, the practical implementations means this is just a tidyup rather than any real behavior changes. 3. Encoding: Cram_encode_aux needed an extra guard surrounding the fd->tags_used field. This is used to hold tag types seen so far in the file, within any container or block, so we can keep track of the compression methods that work best for a any specific tag type.
-
James Bonfield authored
-
James Bonfield authored
1) When skipping past slices to find one that overlaps the start of our region, don't free container here (affects non-threading also). Fixes a crash reported by xiaofeng.liu@sentieon.com. 2) Don't cache cram_get_block_by_id values in the codec as multiple slices may be decoding in parallel using the same codec. This also removes the need for the reset function. Instead we use the already existing per-slice lookup array, but improved so it works (mostly without linear scan) on large ID aux blocks too. We could conceivably go the whole hog of using a hash table, but I think it's overkill and this is minimum code. 3) We now distinguish between fd->ctr, c->curr_slice (being consumed by get_seq calls) and fd->ctr_mt, c->curr_slice_mt (the read-ahead for dispatching thread tasks). Similarly for EOF / OOC (out of container) parameters. 4) Cram multi-threaded flush now does the freeing of containers better. 5) Added a larger input file of 1000 reads and a test using multi-slice containers. Also added the ability to debug the test harness with e.g. valgrind Set the TEST_PRECMD first. For example: TEST_PRECMD="valgrind --leak-check=full" make check
-