ISIB Program sponsored by the National Heart Lung and Blood Institute (NHLBI), Grant # HL161716-01. Faculty Mentor: Patrick Breheny
Metastasis: process by which cancer cells break from their original cluster, spread to other parts of the body, and form a new cluster there
break off → circulatory system → invasion
Most cancers are treatable when cells don’t spread; become fatal once they spread throughout the body (e.g. prostate cancer spreading to the lungs or bone)
unique to cancer cells; healthy cells cannot detach and function in another organ/tissue
Cancer cells possibly primed for metastasis through fluid shear stress, specifically from blood circulation
Fluid shear stress on cancer cells may cause them to express certain genes differently, and this gene expression could help prepare them better for metastasis
Previous studies suggest that cancer cells metastasize less when placed in a new location without stress from the circulatory system
Researchers (under PI Michael Henry) exposed cancer cells to fluid shear stress and compared gene expression between sheared and static cells
Measured gene expression using RNA-sequencing at 3 hours, 12 hours, and 24 hours, with 3 replicates (A, B, C) per condition at each time point
sample size of \(n = 18\) for each gene (2 conditions × 3 time points × 3 replicates)
RNA-sequencing: method of turning RNA material from tissues or cells into readable genomic data
Widely used in genetics research to quantify gene expression within a specific sample
How do the forces of blood flow affect the gene expression of cancer cells?
Analyzing 58735 genes, which take on roles like helping to make proteins, cell regulation, and non-coding material
Previous work suggests that cancer cells exposed to fluid shear stress tend to be better suited to metastasize (Leeuw et al. 2016)
Test whether certain genes are expressed significantly more or less under the stresses of blood flow, and whether the difference in gene expression might prepare cells for metastasis
How do we model & test this question?
Limma is an R package for fitting linear models to RNA-Seq data
For fitting a linear model, the current data has some poor behavior.
Many genes have zero (or near zero) expression counts
There is a clear trend in the variance
Small sample sizes mean there is little data on which to base gene-wise variability estimates
Example of a gene filtered out:
| Gene | Static3A | Static3B | Static3C | Static12A | Static12B | Static12C | Static24A | Static24B | Static24C | Sheared3A | Sheared3B | Sheared3C | Sheared12A | Sheared12B | Sheared12C | Sheared24A | Sheared24B | Sheared24C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000278267 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Example of a gene kept in:
| Gene | Static3A | Static3B | Static3C | Static12A | Static12B | Static12C | Static24A | Static24B | Static24C | Sheared3A | Sheared3B | Sheared3C | Sheared12A | Sheared12B | Sheared12C | Sheared24A | Sheared24B | Sheared24C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000268903 | 62 | 72 | 51 | 103 | 102 | 120 | 95 | 138 | 260 | 53 | 43 | 61 | 94 | 108 | 98 | 81 | 104 | 106 |
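A minimal sketch of how a low-count filter could separate the two example genes above. The poster does not state the actual filtering criterion, so the rule here (at least `MIN_SAMPLES` samples with at least `MIN_COUNT` reads) and both threshold values are assumptions chosen for illustration only.

```python
# Counts copied from the two example rows above (18 samples each).
counts = {
    "ENSG00000278267": [0, 2, 0, 0, 0, 0, 0, 1, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0],
    "ENSG00000268903": [62, 72, 51, 103, 102, 120, 95, 138, 260,
                        53, 43, 61, 94, 108, 98, 81, 104, 106],
}

# Hypothetical thresholds (not the poster's actual criterion).
MIN_COUNT = 10
MIN_SAMPLES = 3


def keep_gene(row, min_count=MIN_COUNT, min_samples=MIN_SAMPLES):
    """Return True when enough samples reach the minimum read count."""
    return sum(c >= min_count for c in row) >= min_samples


kept = [gene for gene, row in counts.items() if keep_gene(row)]
print(kept)
```

Under these assumed thresholds the first gene is dropped because no sample reaches 10 reads, while the second is retained.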
Use the voom function within limma
Transforms count data to \(\log_2\text{CPM}\) and estimates the mean-variance relationship
voom also computes weights for each observation based on precision
No variance trend after accounting for the weights from voom
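To make the \(\log_2\text{CPM}\) transformation concrete, here is a small sketch of the per-observation calculation, using the offsets of 0.5 on the count and 1 on the library size that voom applies. The library size below is hypothetical, since the poster does not report sequencing depth, and the precision weights are not reproduced here.

```python
import math


def log2_cpm(count, lib_size):
    """log2 counts per million, with small offsets so zero counts stay finite."""
    return math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)


# Hypothetical library size (total reads in one sample); not from the poster.
lib_size = 20_000_000

for count in (0, 62, 260):
    print(count, round(log2_cpm(count, lib_size), 3))
```

The offsets keep the transform defined for zero counts, which matters because many genes have some samples with no reads.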
Few observations mean there is little data on which to base variability estimates
Use an Empirical Bayes method to pool information across genes
This gives a sense of how variability is distributed in the total gene population
The additional info allows for more accurate standard error estimates in model parameters
Interested in gene expression differences between static and sheared groups on average
Different genes may be expressed more or less at different time points
We fit the following linear model to each gene:
\[(\log_2\text{CPM})_i = \beta_{1,1} + \beta_{1,2}\text{(12H)}_i + \beta_{1,3}\text{(24H)}_i + \bigg(\beta_{2,1} + \beta_{2,2}\text{(12H)}_i + \beta_{2,3}\text{(24H)}_i\bigg)(\text{Sheared})_i + \varepsilon_i\]
\[\begin{align*} \text{3 Hours}&: \quad \log_2(\text{FC}) = \beta_{2,1}\\ \text{12 Hours}&: \quad \log_2(\text{FC}) = \beta_{2,1} + \beta_{2,2}\\ \text{24 Hours}&: \quad \log_2(\text{FC}) = \beta_{2,1} + \beta_{2,3} \end{align*}\]
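The model above can be sketched as a design matrix with one row per sample, from which the fold change at each time point falls out as the difference between the sheared and static rows. The function names and the coefficient values below are hypothetical and serve only to illustrate the algebra.

```python
from itertools import product

TIMES = (3, 12, 24)


def design_row(sheared, hours):
    """Design-matrix row ordered as (b11, b12, b13, b21, b22, b23)."""
    t12 = 1 if hours == 12 else 0
    t24 = 1 if hours == 24 else 0
    s = 1 if sheared else 0
    return [1, t12, t24, s, s * t12, s * t24]


# 2 conditions x 3 time points x 3 replicates = 18 rows,
# in the same order as the sample columns of the count tables.
design = [design_row(sheared, hours)
          for sheared, hours, _rep in product((False, True), TIMES, range(3))]


def log2_fc(beta, hours):
    """Sheared-minus-static difference in expected log2 CPM at one time point."""
    sheared = design_row(True, hours)
    static = design_row(False, hours)
    return sum(b * (x1 - x0) for b, x1, x0 in zip(beta, sheared, static))


# Hypothetical coefficients for a single gene, for illustration only.
beta = [5.0, 0.4, 0.6, 1.0, -0.3, 0.5]
for hours in TIMES:
    print(hours, log2_fc(beta, hours))
```

Because the static and sheared rows differ only in the last three columns, the time main effects cancel and only \(\beta_{2,1}\), \(\beta_{2,2}\), and \(\beta_{2,3}\) contribute to each fold change.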
The decideTests function of limma tests each gene using a moderated t statistic of the form
\[\frac{\log_{2}(\text{FC})}{se(\log_2(\text{FC}))}\]
Can be interpreted the same as a standard t statistic, a difference divided by the standard error of the difference
The standard error here uses the information pooled across genes by the Empirical Bayes step
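As a sketch of how the pooled information enters the statistic, the standard limma formulation can be written as below. The symbols \(d_0\), \(s_0^2\), \(d_g\), \(s_g^2\), and \(v_g\) are introduced here for illustration and do not appear on the poster.

\[\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}, \qquad \tilde{t}_g = \frac{\widehat{\log_2(\text{FC})}_g}{\tilde{s}_g\sqrt{v_g}}\]

Here \(s_g^2\) is the residual variance of gene \(g\) on \(d_g\) residual degrees of freedom (18 samples minus 6 coefficients gives \(d_g = 12\), assuming all samples are used), \(s_0^2\) and \(d_0\) are the prior variance and prior degrees of freedom estimated from all genes, and \(v_g\) is the unscaled variance factor of the estimated log fold change. Under the null hypothesis the moderated statistic is compared to a t distribution with \(d_0 + d_g\) degrees of freedom, so the pooling both stabilizes the variance estimate and adds degrees of freedom.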
Significance is not determined from raw (unadjusted) p-values
Adjusted p-values based on false discovery rate account for multiple hypothesis tests
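A minimal sketch of a false discovery rate adjustment, assuming the Benjamini-Hochberg procedure (the default in limma's decideTests; the poster does not name the method). The p-values and the 0.05 cutoff below are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes, for illustration only.
pvals = [0.001, 0.041, 0.04, 0.2, 0.01]
adj = bh_adjust(pvals)

# Hypothetical FDR cutoff; the poster does not state the level used.
significant = [p <= 0.05 for p in adj]
print(adj)
print(significant)
```

With thousands of genes tested at once, this kind of adjustment controls the expected share of false positives among the genes called significant, rather than the error rate of each test taken alone.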
There were 84 significant genes.
There were 290 significant genes.
There were 254 significant genes.
High-throughput data allows for reliable variability estimates even with small sample sizes
We were able to narrow down which gene pathways were significantly affected by fluid shear stress
Time had a noticeable effect on gene expression
Although the RNA-seq data provides a good foundation, biological expertise is necessary to conduct deeper analyses on each gene pathway