About

ReadCounter is a tool that counts reads mapping to genes/features using .sam or .bam files as input.

Requirements:

How to use the script:

Pass each value by joining it to its argument name with "=", e.g. strandSpecific=true

Example command line with arguments: java -jar -Xmx4G readcounter.jar inputFile=input.sam annotation=genome.gtf strandSpecific=true disregardOverlapSize=true beSmart=false

Arguments (the curly braces are not part of the argument):

The results consist of two files.

This tool gives results similar to those of other counting tools. To obtain exactly the results that HTSeq produces under its default settings, include the arguments "disregardOverlapSize=true" and "beSmart=false". ReadCounter uses the following strategies to increase accuracy, which can cause discrepancies with other tools:

How does this tool work?

ReadCounter first creates a new, unambiguous file from the GTF file that describes the region of each gene and its exons and introns. Next, ReadCounter puts each gene into a bin that represents a region of the genome and contains all genes in that region. The name of the bin is derived from its position, allowing the tool to find the candidate genes for a read immediately. For each read, or read pair in the case of paired-end reads, the size of the overlap with each gene in the bin is determined, and the read is counted toward the gene with the largest overlap. The same process is used to count exon-specific and intron-specific reads. The CIGAR string is taken into account when determining the size of the overlap.
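The binning and largest-overlap steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not ReadCounter's actual code: the bin width BIN_SIZE, the function names, the 1-based inclusive coordinates, and the tie-breaking rule are all assumptions made for the example. It handles a single read; for a read pair, the aligned blocks of both mates could be combined before the overlap is summed.

```python
import re
from collections import defaultdict

BIN_SIZE = 100_000  # hypothetical bin width in bases; the real value is not documented here


def build_bins(genes):
    """Index genes by every bin their interval touches.

    genes maps a gene name to (chrom, start, end), 1-based inclusive.
    """
    bins = defaultdict(list)
    for name, (chrom, start, end) in genes.items():
        for b in range(start // BIN_SIZE, end // BIN_SIZE + 1):
            bins[(chrom, b)].append(name)
    return bins


def aligned_blocks(pos, cigar):
    """Convert a SAM start position and CIGAR string into reference blocks.

    M, = and X consume the reference and count as aligned bases.
    D and N consume the reference but carry no read bases, so they are skipped.
    I, S, H and P do not consume the reference.
    """
    blocks = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "M=X":
            blocks.append((pos, pos + length - 1))
            pos += length
        elif op in "DN":
            pos += length
    return blocks


def assign_read(chrom, pos, cigar, genes, bins):
    """Return the gene with the largest overlap, or None if nothing overlaps."""
    blocks = aligned_blocks(pos, cigar)
    if not blocks:
        return None
    # Look up only the bins the read spans instead of scanning every gene.
    candidates = set()
    for b in range(blocks[0][0] // BIN_SIZE, blocks[-1][1] // BIN_SIZE + 1):
        candidates.update(bins.get((chrom, b), []))
    best, best_overlap = None, 0
    # Assumption: ties go to the alphabetically first gene; the real tool may differ.
    for name in sorted(candidates):
        _, g_start, g_end = genes[name]
        overlap = sum(
            max(0, min(end, g_end) - max(start, g_start) + 1)
            for start, end in blocks
        )
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


genes = {"geneA": ("chr1", 1000, 2000), "geneB": ("chr1", 1950, 3000)}
bins = build_bins(genes)
print(assign_read("chr1", 1940, "50M", genes, bins))
print(assign_read("chr1", 1990, "20M1000N30M", genes, bins))
```

In the first call the read covers 50 bases of geneA and 40 of geneB, so it goes to geneA. In the second call the spliced read covers 11 bases of geneA and 20 of geneB once the skipped region is excluded, so it goes to geneB.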

TPM value calculation:
Read counts are normalized using the following formula:
TPM = Rg x 10^6 / (FLg x T)
Rg = Reads per gene
FLg = Length of gene
T = SUM(Rg/FLg) of all genes
For exon read counts, only the length of the exonic regions is considered. For intronic read counts, only the length of the intronic regions is considered.
To calculate FLg, the full length of the intron or exon is used rather than only the size of the overlapping region. For these reasons, the ambiguous TPM values may be biased.
To calculate T, all reads that map to introns or exons, or that map ambiguously, are included.
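
As a worked illustration of the formula above, here is a minimal Python sketch. The function name tpm and the dictionary inputs are assumptions for the example and are not part of ReadCounter. The caller is assumed to pass every feature that should contribute to T.

```python
def tpm(read_counts, lengths):
    """Compute TPM = Rg * 10^6 / (FLg * T), where T = sum(Rg / FLg) over all features.

    read_counts maps a feature name to Rg; lengths maps it to FLg in bases.
    """
    # Length-normalized rate Rg / FLg for each feature.
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())  # T in the formula
    if total == 0:
        # No reads anywhere: report zero rather than dividing by zero.
        return {name: 0.0 for name in read_counts}
    return {name: rate * 1e6 / total for name, rate in rates.items()}


counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 1500}
for name, value in tpm(counts, lengths).items():
    print(name, round(value, 2))
```

Here geneA has a rate of 100/1000 = 0.1 and geneB a rate of 300/1500 = 0.2, so T = 0.3 and the two genes receive about one third and two thirds of the 10^6 total respectively.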