Generate Sequence Logo

Create a sequence logo for the sequences around TSSs.

plot_sequence_logo(
  experiment,
  samples = "all",
  genome_assembly = NULL,
  threshold = NULL,
  use_normalized = FALSE,
  distance = 10,
  dominant = FALSE,
  data_conditions = NULL,
  ncol = 1,
  font_size = 6,
  base_colors = c(A = "#109649", C = "#255C99", G = "#F7B32C", T = "#D62839"),
  ...
)

Arguments

experiment	TSRexploreR object.
samples	A vector of sample names to analyze.
genome_assembly	Genome assembly in FASTA or BSgenome format.
threshold	TSSs or TSRs with a score below this value will not be considered.
use_normalized	Whether to use the normalized (TRUE) or raw (FALSE) counts.
distance	Bases to add on each side of each TSS.
dominant	If TRUE, will only consider the highest-scoring TSS per gene, transcript, or TSR or highest-scoring TSR per gene or transcript.
data_conditions	Apply advanced conditions to the data.
ncol	Integer specifying the number of columns to arrange multiple plots.
font_size	Font size for plots.
base_colors	Colors for each base.
...	Arguments passed to ggseqlogo.

Value

ggplot2 object with sequence logo.

Details

This plotting function uses the ggseqlogo library to make sequence logos from the sequences retrieved by the 'tss_sequences' function. Sequence logos illustrate positional biases for certain bases at specific positions in a set of centered sequences. This is particularly important for TSS analysis since literature has shown strong base preferences spanning TSSs and surrounding sequences.

'genome_assembly' must be a valid genome assembly in either fasta or BSgenome format. fasta formatted genome assemblies should have the file extension '.fasta' or '.fa'. BSgenome assemblies are precompiled Bioconductor libraries for common organisms.

'distance' controls the length upstream and downstream of the TSS for which the sequence will be retrieved.

The color of each base is set using the 'base_colors' argument. The argument input should be a named vector, with the base as the name and the desired color of the base as the vector element.

A set of functions to control data structure for plotting are included. 'threshold' will define the minimum number of reads a TSS or TSR must have to be considered. 'dominant' specifies whether only the dominant TSS or TSR is considered from the 'mark_dominant' function. For TSSs this can be either dominant per TSR or gene, and for TSRs it is just the dominant TSR per gene. 'data_conditions' allows for the advanced filtering, ordering, and grouping of data.

Examples

data(TSSs_reduced)
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")

exp <- TSSs_reduced %>%
  tsr_explorer(genome_assembly=assembly) %>%
  format_counts(data_type="tss")

p <- plot_sequence_logo(exp, distance=5)

Arguments

Value

Details

See also

Examples