import_bams.Rd
Import and process BAM files.
import_bams(
  experiment,
  paired,
  sample_sheet = NULL,
  soft_remove = 3,
  proper_pair = NULL,
  remove_secondary = TRUE,
  remove_duplicate = FALSE
)
Argument | Description |
---|---|
experiment | TSRexploreR object. |
paired | Whether the BAMs are paired (TRUE) or unpaired (FALSE). |
sample_sheet | A sample sheet data.frame or tab-delimited file. Must have the columns 'sample_name', 'file_1', and 'file_2'. Additional metadata columns with sample information, such as condition and batch, can be added. |
soft_remove | Remove a read if more than this number of soft-clipped bases are present at its 5'-most end. |
proper_pair | Whether to remove reads that lack the proper-pair flag. TRUE by default when the data is paired-end. |
remove_secondary | If TRUE, remove secondary alignments. |
remove_duplicate | If TRUE, remove reads flagged as duplicates (paired-end only). |
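As an illustration of the 'sample_sheet' argument, the following sketch builds a sheet with the three required columns plus two optional metadata columns, writes it as a tab-delimited file, and checks the column names. The sample names, BAM file names, and metadata values are hypothetical, and only base R is used; setting 'file_2' to NA simply mirrors the example on this page.

```r
# Hypothetical sample names, BAM paths, and metadata values (illustration only).
samples <- data.frame(
  sample_name = c("ctrl_rep1", "ctrl_rep2", "treat_rep1"),
  file_1 = c("ctrl_rep1.bam", "ctrl_rep2.bam", "treat_rep1.bam"),
  file_2 = NA,                                      # mirrors the example on this page
  condition = c("control", "control", "treated"),   # optional metadata column
  batch = c(1, 1, 2),                               # optional metadata column
  stringsAsFactors = FALSE
)

# Write the sheet as a tab-delimited file, then read it back.
sheet_path <- tempfile(fileext = ".tsv")
write.table(samples, sheet_path, sep = "\t", quote = FALSE, row.names = FALSE)
sheet <- read.delim(sheet_path, stringsAsFactors = FALSE)

# Confirm that the three required columns are present.
required <- c("sample_name", "file_1", "file_2")
stopifnot(all(required %in% colnames(sheet)))
```

Either the data.frame ('samples') or the file path ('sheet_path') could then be supplied as the sample sheet.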
TSRexploreR object with BAM GRanges and soft-clipped base information.
Import BAMs using the information from the sample sheet. If the BAMs are from paired-end data, 'proper_pair' allows removal of reads that lack the proper-pair SAM flag. Additionally, 'remove_secondary' and 'remove_duplicate' remove reads with the secondary-alignment and duplicate flags set, respectively.
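The flag-based filtering described above can be pictured with plain SAM flag arithmetic. The sketch below is not the package's internal code; 'keep_read' is a hypothetical helper written in base R, and only the bit values (0x2 proper pair, 0x100 secondary alignment, 0x400 duplicate) come from the SAM specification.

```r
# SAM flag bits, as defined in the SAM specification.
FLAG_PROPER_PAIR <- 0x2L    # each segment properly aligned
FLAG_SECONDARY   <- 0x100L  # secondary alignment
FLAG_DUPLICATE   <- 0x400L  # PCR or optical duplicate

# Hypothetical helper (not part of TSRexploreR): TRUE for reads that pass.
keep_read <- function(flag, proper_pair = TRUE, remove_secondary = TRUE,
                      remove_duplicate = FALSE) {
  has <- function(bit) bitwAnd(flag, bit) != 0
  keep <- rep(TRUE, length(flag))
  if (proper_pair) keep <- keep & has(FLAG_PROPER_PAIR)
  if (remove_secondary) keep <- keep & !has(FLAG_SECONDARY)
  if (remove_duplicate) keep <- keep & !has(FLAG_DUPLICATE)
  keep
}

# 99: proper pair; 65: paired but not proper;
# 355: proper pair + secondary; 1123: proper pair + duplicate.
flags <- c(99L, 65L, 355L, 1123L)
default_keep <- keep_read(flags)
dedup_keep <- keep_read(flags, remove_duplicate = TRUE)
```

With the defaults shown, the duplicate-flagged read (1123) is retained; it is dropped only when 'remove_duplicate' is TRUE.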
Most TSS mapping methodologies tend to add at least one non-templated base at the 5' end of the read. Furthermore, template-switching reverse transcription (TSRT)-based methods such as STRIPE-seq or nanoCAGE can add up to 3 or 4 non-templated 5' bases. We therefore recommend setting `soft_remove` to at least 3. A read is removed only if its number of 5' soft-clipped bases exceeds this value.
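To make the `soft_remove` threshold concrete, the sketch below counts 5' soft-clipped bases from CIGAR strings and applies the "remove if exceeded" rule. It is an assumption-laden illustration, not the package's implementation: `n_soft_5prime` is a hypothetical helper, the CIGAR strings are made up, hard clips are ignored, and the 5' end is taken as the leading CIGAR operation for plus-strand reads and the trailing one for minus-strand reads.

```r
# Hypothetical helper (not part of TSRexploreR): number of soft-clipped
# bases at the 5'-most end of each read. Hard clips are not handled.
n_soft_5prime <- function(cigar, strand) {
  # Leading soft clip, e.g. "3S47M" -> "3"
  lead <- ifelse(grepl("^[0-9]+S", cigar),
                 sub("^([0-9]+)S.*$", "\\1", cigar), "0")
  # Trailing soft clip, e.g. "47M3S" -> "3"
  trail <- ifelse(grepl("[0-9]+S$", cigar),
                  sub("^.*?([0-9]+)S$", "\\1", cigar, perl = TRUE), "0")
  # CIGAR is written in reference orientation, so the 5' end of a
  # minus-strand read corresponds to the last CIGAR operation.
  as.integer(ifelse(strand == "+", lead, trail))
}

# Made-up alignments for illustration.
cigar  <- c("3S47M", "47M3S", "4S46M", "50M", "2S45M3S")
strand <- c("+",     "-",     "+",     "-",   "-")

soft_remove <- 3
n_soft <- n_soft_5prime(cigar, strand)
keep <- n_soft <= soft_remove  # a read is removed only if the threshold is exceeded
```

Here the third read, with four soft-clipped 5' bases, is the only one that would be removed at `soft_remove = 3`.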
bam_file <- system.file("extdata", "S288C.bam", package="TSRexploreR")
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")
samples <- data.frame(sample_name="S288C", file_1=bam_file, file_2=NA)
exp <- tsr_explorer(sample_sheet=samples, genome_assembly=assembly)
import_bams(exp, paired=TRUE)
#> Warning: NAs introduced by coercion
#> An object of class "tsr_explorer"
#> Slot "experiment":
#> $TSSs
#> $TSSs$S288C
#> GRanges object with 12425 ranges and 2 metadata columns:
#>           seqnames    ranges strand |    seq_soft    n_soft
#>              <Rle> <IRanges>  <Rle> | <character> <numeric>
#>       [1]        I     32391      - |        none         0
#>       [2]        I     32684      - |        none         0
#>       [3]        I     32708      - |        none         0
#>       [4]        I     33359      + |           G         1
#>       [5]        I     36525      + |           G         1
#>       ...      ...       ...    ... .         ...       ...
#>   [12421]      XVI    919321      + |           G         1
#>   [12422]      XVI    924152      + |           G         1
#>   [12423]      XVI    928403      + |        none         0
#>   [12424]      XVI    931146      - |        none         0
#>   [12425]      XVI    942767      + |           G         1
#>   -------
#>   seqinfo: 17 sequences from an unspecified genome; no seqlengths
#>
#>
#> $TSRs
#> [1] NA
#>
#>
#> Slot "counts":
#> $TSSs
#> $TSSs$raw
#> list()
#>
#>
#> $TSRs
#> $TSRs$raw
#> list()
#>
#>
#>
#> Slot "correlation":
#> list()
#>
#> Slot "diff_features":
#> $TSSs
#> $TSSs$results
#> list()
#>
#>
#> $TSRs
#> $TSRs$results
#> list()
#>
#>
#>
#> Slot "shifting":
#> $results
#> list()
#>
#>
#> Slot "settings":
#> list()
#>
#> Slot "meta_data":
#> $sample_sheet
#>    sample_name
#> 1:       S288C
#>                                                                     file_1
#> 1: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRexploreR/extdata/S288C.bam
#>    file_2
#> 1:     NA
#>
#> $genome_annotation
#> NULL
#>
#> $genome_assembly
#> class: FaFile
#> path: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRexplor.../S288C_Assembly.fasta
#> index: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRe.../S288C_Assembly.fasta.fai
#> gzindex: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TS.../S288C_Assembly.fasta.gzi
#> isOpen: FALSE
#> yieldSize: NA
#>
#>