Import and process BAM files.

import_bams(
  experiment,
  paired,
  sample_sheet = NULL,
  soft_remove = 3,
  proper_pair = NULL,
  remove_secondary = TRUE,
  remove_duplicate = FALSE
)

Arguments

experiment

TSRexploreR object.

paired

Whether the BAMs contain paired-end (TRUE) or single-end (FALSE) reads.

sample_sheet

A sample sheet data.frame or tab-delimited file. Must have the columns 'sample_name', 'file_1', and 'file_2'. Additional metadata columns with sample information, such as condition and batch, can be added.
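As an illustration of the layout described above, the following is a minimal sketch of a sample sheet built as a data.frame. The sample names, BAM paths, and the 'condition' and 'batch' values are hypothetical placeholders, and leaving 'file_2' as NA simply mirrors the example at the end of this page.

```r
# Hypothetical sample sheet: the three required columns plus two optional
# metadata columns. All names and paths below are placeholders.
samples <- data.frame(
  sample_name = c("control_rep1", "treated_rep1"),
  file_1      = c("control_rep1.bam", "treated_rep1.bam"),
  file_2      = NA,
  condition   = c("control", "treated"),
  batch       = c(1, 1),
  stringsAsFactors = FALSE
)

# Check that the required columns are present before using the sheet.
required <- c("sample_name", "file_1", "file_2")
missing_cols <- setdiff(required, colnames(samples))
```

If 'missing_cols' is empty, the sheet has the columns this argument requires.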

soft_remove

Remove a read if more than this number of soft-clipped bases are present at its 5'-most end.

proper_pair

Remove reads that lack the proper-pair flag. TRUE by default when the data is paired-end.

remove_secondary

If TRUE (the default), remove secondary alignments.

remove_duplicate

If TRUE, remove reads flagged as duplicates (paired-end only). Defaults to FALSE.

Value

TSRexploreR object with BAM GRanges and soft-clipped base information.

Details

Imports BAMs using the information in the sample sheet. If the BAMs are from paired-end data, 'proper_pair' allows removal of reads that lack the proper-pair SAM flag. Additionally, 'remove_secondary' and 'remove_duplicate' remove reads with the secondary-alignment and duplicate flags set, respectively.
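To make the flag-based filtering concrete, here is a minimal base-R sketch of how these three SAM flag bits can be tested. It is not the code used by import_bams; the flag values and variable names are made up for illustration. The bit values themselves (0x2 proper pair, 0x100 secondary alignment, 0x400 duplicate) come from the SAM specification.

```r
# Illustrative SAM flag values (not taken from a real BAM file):
#   99   paired, proper pair, mate reverse, first in pair
#   355  as 99, plus secondary alignment (0x100)
#   1123 as 99, plus duplicate (0x400)
#   65   paired, first in pair, but NOT a proper pair
flags <- c(99L, 355L, 1123L, 65L)

# Test the individual bits with bitwise AND.
is_proper    <- bitwAnd(flags, 2L)    != 0  # 0x2
is_secondary <- bitwAnd(flags, 256L)  != 0  # 0x100
is_duplicate <- bitwAnd(flags, 1024L) != 0  # 0x400

# Mimic proper_pair = TRUE, remove_secondary = TRUE, remove_duplicate = FALSE:
# keep reads that are properly paired and are not secondary alignments.
keep <- is_proper & !is_secondary
```

With these settings the duplicate-flagged read is kept, because 'remove_duplicate' is FALSE in this sketch.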

Most TSS mapping methodologies tend to add at least one non-templated base at the 5' end of the read. Furthermore, template-switching reverse transcription (TSRT)-based methods such as STRIPE-seq or nanoCAGE can add up to 3 or 4 non-templated 5' bases. For this reason, we recommend setting `soft_remove` to at least 3. A read is removed if its number of 5' soft-clipped bases exceeds this value.
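The threshold logic can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation used by import_bams: the helper 'five_prime_softclip' and the CIGAR strings are hypothetical, and the helper ignores hard clips (H) that may precede a soft clip in a real CIGAR string.

```r
# Hypothetical helper: count soft-clipped bases at the 5' end of a read.
# For plus-strand reads the 5' end is the start of the CIGAR string;
# for minus-strand reads it is the end of the CIGAR string.
five_prime_softclip <- function(cigar, strand = "+") {
  pattern <- if (strand == "+") "^[0-9]+S" else "[0-9]+S$"
  hit <- regmatches(cigar, regexpr(pattern, cigar))
  if (length(hit) == 0) return(0L)  # no soft clip at the 5' end
  as.integer(sub("S", "", hit))
}

# Made-up alignments for illustration.
cigars  <- c("50M", "1S49M", "4S46M", "47M3S", "46M4S")
strands <- c("+", "+", "+", "-", "-")

n_soft <- mapply(five_prime_softclip, cigars, strands, USE.NAMES = FALSE)

# With a threshold of 3, a read is dropped only if it has MORE than
# 3 soft-clipped bases at its 5' end.
soft_remove <- 3
keep <- n_soft <= soft_remove
```

Here the reads with exactly 3 or fewer 5' soft-clipped bases are retained, and the two reads with 4 are dropped.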

Examples

bam_file <- system.file("extdata", "S288C.bam", package="TSRexploreR")
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")
samples <- data.frame(sample_name="S288C", file_1=bam_file, file_2=NA)
exp <- tsr_explorer(sample_sheet=samples, genome_assembly=assembly)
import_bams(exp, paired=TRUE)
#> Warning: NAs introduced by coercion
#> An object of class "tsr_explorer"
#> Slot "experiment":
#> $TSSs
#> $TSSs$S288C
#> GRanges object with 12425 ranges and 2 metadata columns:
#>           seqnames    ranges strand |    seq_soft    n_soft
#>              <Rle> <IRanges>  <Rle> | <character> <numeric>
#>       [1]        I     32391      - |        none         0
#>       [2]        I     32684      - |        none         0
#>       [3]        I     32708      - |        none         0
#>       [4]        I     33359      + |           G         1
#>       [5]        I     36525      + |           G         1
#>       ...      ...       ...    ... .         ...       ...
#>   [12421]      XVI    919321      + |           G         1
#>   [12422]      XVI    924152      + |           G         1
#>   [12423]      XVI    928403      + |        none         0
#>   [12424]      XVI    931146      - |        none         0
#>   [12425]      XVI    942767      + |           G         1
#>   -------
#>   seqinfo: 17 sequences from an unspecified genome; no seqlengths
#>
#>
#> $TSRs
#> [1] NA
#>
#>
#> Slot "counts":
#> $TSSs
#> $TSSs$raw
#> list()
#>
#>
#> $TSRs
#> $TSRs$raw
#> list()
#>
#>
#>
#> Slot "correlation":
#> list()
#>
#> Slot "diff_features":
#> $TSSs
#> $TSSs$results
#> list()
#>
#>
#> $TSRs
#> $TSRs$results
#> list()
#>
#>
#>
#> Slot "shifting":
#> $results
#> list()
#>
#>
#> Slot "settings":
#> list()
#>
#> Slot "meta_data":
#> $sample_sheet
#>    sample_name
#> 1:       S288C
#>                                                                     file_1
#> 1: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRexploreR/extdata/S288C.bam
#>    file_2
#> 1:     NA
#>
#> $genome_annotation
#> NULL
#>
#> $genome_assembly
#> class: FaFile
#> path: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRexplor.../S288C_Assembly.fasta
#> index: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRe.../S288C_Assembly.fasta.fai
#> gzindex: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TS.../S288C_Assembly.fasta.gzi
#> isOpen: FALSE
#> yieldSize: NA
#>