COVID-19 SpliceDB

About

CASA is a new database that helps users identify alternative splicing (AS) events at whole-transcriptome scale in COVID-19 and related diseases, and select candidates for further enquiry. Alternative splicing is a regulated process that occurs in nearly all multi-exon genes and plays important roles in many viral infections. In CASA, we integrated bulk RNA-seq datasets and related information to identify COVID-19-related AS events in different biological states or treatment conditions, incorporating 2,159 samples from 58 projects across 15 body sites. In total, five types of AS events across 13 body sites were deposited, including 5,340,848 SE, 407,556 A5SS, 634,455 A3SS, 953,142 MXE and 320,844 RI events. For each dataset (an individual library), we provide a genome-wide survey, gene ontology (GO) enrichment, KEGG pathway enrichment, prediction of potential regulatory RBPs and drug discovery for AS events.
CASA supports Google Chrome and Microsoft Edge browsers.

In CASA, users can:

Browse or search AS events across different body sites (including lung, colon, etc.) and different disease stages (including healthy, convalescent, moderate, severe, etc.);
Browse or search AS-related functional annotations between groups in each independent dataset;
Explore potential drugs for genes with differential alternative splicing levels;
Explore potential regulators associated with dynamic AS events;
Perform alternative splicing analysis by submitting new transcriptome data (FASTQ or BAM formatted files);
Download all results and figures for further research.

Citation:

CASA: a database for genome-wide identification of COVID-19-related alternative splicing events

URL: http://www.splicedb.net/casa

News

The CASA database was released in Jul. 2022.

Development of the CASA project started in Jan. 2022.

Requirements analysis and functional design for CASA began in Oct. 2021.

Bioinformatic analysis and computation on sample data began in Sep. 2021.

The CASA project was launched and began collecting COVID-19-related sample data in Jun. 2021.

Help

CASA is an alternative splicing (AS) database that helps users identify COVID-19-related AS events at whole-transcriptome scale. CASA was constructed by applying rMATS to RNA-seq samples from 15 human body sites, covering 2,159 samples and 27 tissues, 16 organoids and 22 cell lines. In total, 5,340,848 SE, 407,556 A5SS, 634,455 A3SS, 953,142 MXE and 320,844 RI events were identified. CASA provides search and visualization functions for identifying differences in splicing patterns between biological states or treatment conditions.

In CASA, users can:

Browse or search AS events across different body sites (including lung, colon, etc.) and different disease stages (including healthy, convalescent, moderate, severe, etc.);
Browse or search AS-related functional annotations between groups in each independent dataset;
Explore potential drugs for genes with differential alternative splicing levels;
Explore potential regulators associated with dynamic AS events;
Perform alternative splicing analysis by submitting new transcriptome data (FASTQ or BAM formatted files);
Download all results and figures for further research.

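A splicing event is a change in splicing patterns in a multi-exon gene among samples. In CASA, five different types of splicing events are detected: skipped exon (SE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), mutually exclusive exons (MXE) and retained intron (RI).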

How can a splicing event be quantified?

All alternative splicing events were identified and quantified at whole-transcriptome scale using the rMATS paired/unpaired model. For each sample, every possible splicing event (e.g., an exon skipping event) was assigned an estimated ψ value, which is similar to percent spliced in (PSI). For a skipped exon, the exon inclusion level ψ is estimated with the formula in Figure A. The exon inclusion reads are the reads from the upstream splice junction, the alternative exon itself and the downstream splice junction. The exon skipping reads are the reads from the skipping splice junction that directly connects the upstream exon to the downstream exon (Figure B).
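As a rough illustration of the ψ estimate described above, the following sketch computes an inclusion level from read counts. It assumes the length-normalized form used by rMATS, ψ = (I/lI) / (I/lI + S/lS), where I and S are the inclusion and skipping read counts and lI and lS are the effective lengths of the inclusion and skipping isoforms. The function name, argument names and example numbers are hypothetical; the formula in Figure A remains the reference definition.

```python
def inclusion_level(inc_reads, skip_reads, inc_len, skip_len):
    """Estimate psi for one event in one sample (illustrative sketch).

    inc_reads / skip_reads: read counts supporting inclusion / skipping.
    inc_len / skip_len: effective lengths of the two isoforms (assumed > 0).
    Returns None when no read supports either isoform.
    """
    if inc_len <= 0 or skip_len <= 0:
        raise ValueError("effective lengths must be positive")
    inc_density = inc_reads / inc_len      # length-normalized inclusion signal
    skip_density = skip_reads / skip_len   # length-normalized skipping signal
    total = inc_density + skip_density
    if total == 0:
        return None                        # psi is undefined without reads
    return inc_density / total


# Hypothetical example: 80 inclusion reads, 20 skipping reads.
psi = inclusion_level(80, 20, inc_len=200, skip_len=100)
print(round(psi, 3))
```

Normalizing by effective length matters because the inclusion isoform offers more positions for reads to map to than the single skipping junction, so raw counts alone would overstate inclusion.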

For each exon, we estimated the mean exon inclusion levels of groups 1 and 2 (ψi1 and ψi2) as fixed effects. Using a likelihood-ratio test, we test whether the difference in mean inclusion level between the two sample groups exceeds a user-defined threshold c, against the null hypothesis |Δψi| = |ψi1 − ψi2| ≤ c.
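The quantity being tested can be sketched as follows. This is only the point-estimate comparison of group means against the threshold c, not the likelihood-ratio test that rMATS performs; the function names and the example values of ψ and c are hypothetical.

```python
from statistics import mean


def delta_psi(group1_psi, group2_psi):
    """Difference between the mean inclusion levels of two sample groups."""
    return mean(group1_psi) - mean(group2_psi)


def exceeds_threshold(group1_psi, group2_psi, c):
    """True when |delta psi| is larger than the user-defined threshold c.

    Illustrative only: a real analysis would rely on the p-value of the
    likelihood-ratio test rather than on this point estimate.
    """
    return abs(delta_psi(group1_psi, group2_psi)) > c


# Hypothetical per-sample psi values for one exon in two groups.
severe = [0.80, 0.90]
healthy = [0.40, 0.50]
print(exceeds_threshold(severe, healthy, c=0.05))
```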

CASA provides a user-friendly web interface that allows users to query the database by multiple criteria, such as sample type, sample name, microbe, gene symbol or dataset (Figure A). Once one or more datasets are selected, result tables are shown and the splicing events can be visualized (Figure B). The detail page has several sections. First, the filter criteria for AS events, such as AS type and the statistical criteria, are shown at the top of the page. Then the detailed information on AS events, such as location, project ID and related clinical information, is shown in the middle of the page. The basic information on an AS event, for example the exon inclusion level of each sample, can be obtained by clicking 'Database ID' in the result table, and the distribution of per-sample exon inclusion levels is compared between the two groups. Finally, a splice graph of the splicing event is shown at the bottom of the page, providing the read distributions across the different exons.

Currently, CASA includes a total of 2,159 human samples from 58 independent datasets, covering 15 body sites, 11 microbes and 15 countries/regions (Figure). Most of these samples (64.47%, 1392/2159) were generated from tissues, and the second largest share (28.35%, 612/2159) came from cell lines. The top three body sites are blood, lung and colon, and the top three countries are the USA (56.14%, 1212/2159), France (19.92%, 430/2159) and Spain (4.72%, 102/2159). Samples from tissues, organoids or cell lines were collected under different conditions, including disease severity (mild, moderate or severe COVID-19), infection time and therapy. This phenotype/clinical information helps us infer condition-specific splicing patterns, as well as their potential regulators.

The pie chart shows the sample sizes and sample types of 13 body sites (Figure A). The histograms show the sample sizes and the percentages of the five AS event types in each tissue (Figure B), organoid (Figure C) and cell line (Figure D), respectively.

Currently, five toolkits are embedded in CASA: GO/KEGG enrichment analysis, RBP prediction, map of AS events, drug discovery and submission. Tutorials for these toolkits follow.

6.1 GO/KEGG Enrichment Analysis

The GO/KEGG enrichment analysis helps discover and identify the biological functions of genes with significantly differential alternative splicing events in a specific physiological or pathological condition. Once the user chooses a specific dataset, the results are displayed as a bar plot, which can be freely downloaded and saved as a high-resolution PDF file. Taking GO enrichment analysis as an example, the query page and the result page are shown below.
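Enrichment of this kind is commonly computed as an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is a standard choice but is not confirmed here as the method CASA uses; the function name, argument names and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background, term_genes, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap) (illustrative sketch).

    background: number of genes in the background set
    term_genes: background genes annotated with the GO/KEGG term
    selected:   genes with significant differential AS events
    overlap:    selected genes that carry the term
    """
    upper = min(term_genes, selected)
    total = comb(background, selected)  # all ways to draw the selected set
    tail = sum(
        comb(term_genes, k) * comb(background - term_genes, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 12 of 300 differentially spliced genes carry a term
# that annotates 150 of 20,000 background genes.
p = enrichment_p_value(20000, 150, 300, 12)
print(f"{p:.3g}")
```

In practice the p-values for all tested terms would also be corrected for multiple testing before the top terms are plotted.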

6.2 RBP Prediction

RNA-binding proteins (RBPs) play a critical role in the regulation of alternative splicing (AS). Given a set of differentially regulated alternative exons in a specific dataset, we apply the rMAPS web server (http://rmaps.cecsresearch.org/) with default parameters to perform motif analyses of RBPs in the vicinity of alternatively spliced exons and to plot RNA maps that depict the spatial patterns of RBP motifs. All RBP prediction results for alternative splicing events in CASA were curated for users to retrieve.

6.3 Map of AS events

This tool is used to query and browse the chromosomal regions of differentially regulated alternative exons in a specific dataset. Once the user chooses a dataset, the distribution of AS events on chromosomes is shown (Figure B). The details of all AS events in the dataset are also shown in tabular form, and every AS event can be visualized (Figure C).

6.4 Drug Discovery

Hundreds or even thousands of significantly differential AS events can be identified between cases and their controls, but it is very challenging to identify potential drugs and actionable targets from the genes carrying these AS events. The "Drug Discovery" tool was therefore developed to discover and prioritize potential drugs and actionable targets against human diseases, such as COVID-19 and its related diseases, at whole-transcriptome scale. Users can enter genes of interest as gene symbols or Ensembl IDs, and all query results are presented in tabular form. The query results contain two tables: the drug-target-pathway interactions, and the numbers of drug-target and target-pathway interactions (see Figure B).

6.5 Submission

Users are encouraged to submit new human RNA-seq data from SARS-CoV-2 and other related infections to detect AS events. Users should fill in the form, which consists of basic information and important parameter settings. Default values are used for the other parameters in the workflow, and the detailed pipeline can be downloaded from the document page. The "*" symbol indicates required fields.

Download

Thousands of RNA-seq datasets generated from COVID-19 and COVID-19-related specimens were included in CASA. All relevant clinical/phenotype information was manually collected and integrated, including disease severity (e.g., mild, moderate or severe COVID-19), cell type and infection time.