Import BAMs — import_bams • TSRexploreR

Import and process BAM files.

import_bams(
  experiment,
  paired,
  sample_sheet = NULL,
  soft_remove = 3,
  proper_pair = NULL,
  remove_secondary = TRUE,
  remove_duplicate = FALSE
)

Arguments

experiment	TSRexploreR object.
paired	Whether the BAMs are paired (TRUE) or unpaired (FALSE).
sample_sheet	A sample sheet data.frame or tab-delimited file. Must have the columns 'sample_name', 'file_1', and 'file_2'. Additional metadata columns with sample information, such as condition and batch, can be added.
soft_remove	Remove a read if more than this number of soft-clipped bases are present at its 5'-most end.
proper_pair	Remove reads that lack the proper-pair SAM flag. TRUE by default when data is paired-end.
remove_secondary	If TRUE (the default), remove secondary alignments.
remove_duplicate	If TRUE, remove duplicate reads (paired-end only). Defaults to FALSE.
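
As an illustration of the 'sample_sheet' argument, the sketch below builds a sheet with the three required columns plus two metadata columns and writes it out as a tab-delimited file. The sample names, BAM paths, and metadata values are hypothetical placeholders, and 'file_2' is left as NA as in the Examples section.

```r
# Hypothetical samples and BAM paths; replace with real files.
samples <- data.frame(
  sample_name = c("ctrl_1", "treat_1"),
  file_1 = c("bams/ctrl_1.bam", "bams/treat_1.bam"),
  file_2 = NA,
  condition = c("control", "treated"),  # optional metadata column
  batch = c(1, 1),                      # optional metadata column
  stringsAsFactors = FALSE
)

# Optionally store the sheet as a tab-delimited file.
sheet_path <- tempfile(fileext = ".tsv")
write.table(samples, sheet_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Either the data.frame or the path to the tab-delimited file can then be supplied as the sample sheet.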

Value

TSRexploreR object with BAM GRanges and soft-clipped base information.

Details

Import BAMs using the information from the sample sheet. If the BAMs are from paired-end data, 'proper_pair' allows removal of reads without a proper-pair SAM flag. Additionally, 'remove_secondary' and 'remove_duplicate' remove reads with the secondary-alignment or duplicate flag set, respectively.
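
The flag-based filters described above can be sketched in base R. This is only an illustration of how the SAM flag bits work, not the package's implementation; the helper names `has_flag` and `keep_read` are hypothetical. The bit values (2 for proper pair, 256 for secondary alignment, 1024 for duplicate) come from the SAM specification.

```r
# SAM flag bits (from the SAM specification).
FLAG_PROPER_PAIR <- 2L     # 0x2: read mapped in a proper pair
FLAG_SECONDARY   <- 256L   # 0x100: secondary alignment
FLAG_DUPLICATE   <- 1024L  # 0x400: PCR or optical duplicate

# Hypothetical helper: TRUE where the given bit is set in the flag.
has_flag <- function(flag, bit) bitwAnd(flag, bit) != 0L

# Hypothetical helper mirroring the three filter arguments.
keep_read <- function(flag, proper_pair = TRUE,
                      remove_secondary = TRUE, remove_duplicate = FALSE) {
  keep <- rep(TRUE, length(flag))
  if (proper_pair)      keep <- keep & has_flag(flag, FLAG_PROPER_PAIR)
  if (remove_secondary) keep <- keep & !has_flag(flag, FLAG_SECONDARY)
  if (remove_duplicate) keep <- keep & !has_flag(flag, FLAG_DUPLICATE)
  keep
}

# 99: proper pair; 355: proper pair + secondary;
# 1123: proper pair + duplicate; 65: paired but not a proper pair.
flags <- c(99L, 355L, 1123L, 65L)
kept_default <- keep_read(flags)
kept_dedup   <- keep_read(flags, remove_duplicate = TRUE)
```

With the defaults, only the reads flagged 99 and 1123 are kept; enabling `remove_duplicate` also drops the read flagged 1123.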

Most TSS mapping methodologies tend to add at least one non-templated base at the 5' end of the read. Furthermore, template-switching reverse transcription (TSRT)-based methods such as STRIPE-seq or nanoCAGE can add up to 3 or 4 non-templated 5' bases. For this reason, we recommend setting `soft_remove` to at least 3; a read is removed when its number of 5' soft-clipped bases exceeds this value.
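
To make the soft-clip threshold concrete, the following base R sketch counts soft-clipped bases at the 5' end of a read from its CIGAR string and applies a `soft_remove`-style cutoff. It is a simplified illustration, not the package's implementation: the helper names `extract_n` and `count_5prime_soft` are hypothetical, hard clips are ignored, and the 5' end is assumed to be the start of the CIGAR for plus-strand reads and the end of the CIGAR for minus-strand reads.

```r
# Hypothetical helper: pull the captured number out of each matching CIGAR,
# returning 0 where the pattern does not match.
extract_n <- function(cigar, pattern) {
  hit <- grepl(pattern, cigar)
  n <- integer(length(cigar))
  n[hit] <- as.integer(sub(pattern, "\\1", cigar[hit]))
  n
}

# Hypothetical helper: number of soft-clipped bases at the 5' end of each read.
count_5prime_soft <- function(cigar, strand) {
  lead  <- extract_n(cigar, "^([0-9]+)S.*$")        # leading soft clip
  trail <- extract_n(cigar, "^.*[A-Z=]([0-9]+)S$")  # trailing soft clip
  ifelse(strand == "-", trail, lead)
}

cigar  <- c("50M", "1S49M", "4S46M", "47M3S", "46M4S")
strand <- c("+", "+", "+", "-", "-")

n_soft <- count_5prime_soft(cigar, strand)

# Keep reads whose 5' soft-clip count does not exceed the threshold.
soft_remove <- 3
keep <- n_soft <= soft_remove
```

Here the third and fifth reads each carry four soft-clipped 5' bases and would be removed, while reads with three or fewer are retained.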

Examples

bam_file <- system.file("extdata", "S288C.bam", package="TSRexploreR")
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")
samples <- data.frame(sample_name="S288C", file_1=bam_file, file_2=NA)

exp <- tsr_explorer(sample_sheet=samples, genome_assembly=assembly)
import_bams(exp, paired=TRUE)
#> Warning: NAs introduced by coercion
#> An object of class "tsr_explorer"
#> Slot "experiment":
#> $TSSs
#> $TSSs$S288C
#> GRanges object with 12425 ranges and 2 metadata columns:
#>           seqnames    ranges strand |    seq_soft    n_soft
#>              <Rle> <IRanges>  <Rle> | <character> <numeric>
#>       [1]        I     32391      - |        none         0
#>       [2]        I     32684      - |        none         0
#>       [3]        I     32708      - |        none         0
#>       [4]        I     33359      + |           G         1
#>       [5]        I     36525      + |           G         1
#>       ...      ...       ...    ... .         ...       ...
#>   [12421]      XVI    919321      + |           G         1
#>   [12422]      XVI    924152      + |           G         1
#>   [12423]      XVI    928403      + |        none         0
#>   [12424]      XVI    931146      - |        none         0
#>   [12425]      XVI    942767      + |           G         1
#>   -------
#>   seqinfo: 17 sequences from an unspecified genome; no seqlengths
#> 
#> 
#> $TSRs
#> [1] NA
#> 
#> 
#> Slot "counts":
#> $TSSs
#> $TSSs$raw
#> list()
#> 
#> 
#> $TSRs
#> $TSRs$raw
#> list()
#> 
#> 
#> 
#> Slot "correlation":
#> list()
#> 
#> Slot "diff_features":
#> $TSSs
#> $TSSs$results
#> list()
#> 
#> 
#> $TSRs
#> $TSRs$results
#> list()
#> 
#> 
#> 
#> Slot "shifting":
#> $results
#> list()
#> 
#> 
#> Slot "settings":
#> list()
#> 
#> Slot "meta_data":
#> $sample_sheet
#>    sample_name
#> 1:       S288C
#>                                                                    file_1
#> 1: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRexploreR/extdata/S288C.bam
#>    file_2
#> 1:     NA
#> 
#> $genome_annotation
#> NULL
#> 
#> $genome_assembly
#> class: FaFile 
#> path: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRexplor.../S288C_Assembly.fasta
#> index: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TSRe.../S288C_Assembly.fasta.fai
#> gzindex: /tmp/RtmpTsVxFl/temp_libpath66d16d6b7269/TS.../S288C_Assembly.fasta.gzi
#> isOpen: FALSE 
#> yieldSize: NA 
#> 
#>