
ECCB conference notes

sosal 2016. 12. 4. 02:12

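I wrote these up while listening to the sessions.. and several months have already flown by since I came back.


Honestly, I can't even remember what I was thinking while taking these notes,

but I feel like I'll forget everything at this rate, so I'm just leaving a trace on the blog.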


A survey of best practices for RNA-seq data analysis - Genome Biology

Perturbation biology nominates upstream–downstream drug combinations in RAF inhibitor resistant melanoma cells

http://www.intogen.org : Cancer Drivers Database - compare this against the network I'm building

http://CancerGenomeInterpreter.org/

Maybe the Jaccard coefficient could be used to check mutual exclusivity between genes that are mutated and genes that are not.

 - When drawing the network with GS, draw the co-mutation network and the mutual exclusivity network separately (the latter with Jaccard)

 - Then, if a hub gene has no mutation, it is very likely to be a target

By mutual exclusivity, the genes with GS < 0.3 ..
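A quick sketch of the Jaccard idea above, to try later. Everything here is an assumption for illustration: the gene names, the toy sample sets and the 0.5 / 0.3 cut-offs are made up, and a Jaccard score near 0 is only a hint of mutual exclusivity, not a statistical test.

```python
from itertools import combinations

# Hypothetical toy data: gene -> set of samples in which the gene is mutated.
mutated_samples = {
    "GENE_A": {"s1", "s2", "s3", "s4"},
    "GENE_B": {"s1", "s2", "s3"},
    "GENE_C": {"s5", "s6"},
}

def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sample sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_jaccard(samples_by_gene):
    """Jaccard coefficient for every unordered gene pair."""
    return {
        (g1, g2): jaccard(samples_by_gene[g1], samples_by_gene[g2])
        for g1, g2 in combinations(sorted(samples_by_gene), 2)
    }

scores = pairwise_jaccard(mutated_samples)
# Arbitrary cut-offs: high score -> co-mutation edge, low score -> exclusivity candidate.
co_mutation_edges = [pair for pair, s in scores.items() if s >= 0.5]
exclusive_edges = [pair for pair, s in scores.items() if s < 0.3]
```

With the toy data, GENE_A and GENE_B end up as a co-mutation edge (score 0.75), and both pairs involving GENE_C come out as exclusivity candidates (score 0.0).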

 

- Background

Mammalian systems constitute over 200 cell types, each specialized to perform a distinct function, and yet all cell types share the same genome.

This cell type specificity is achieved by a context-specific interpretation of the DNA sequence to produce a cell type specific transcription signature.

Advances in sequencing techniques have accelerated the characterization of transcription landscapes across many normal and malignant cell types.

The challenge now is to integrate these data to understand transcriptional control at a systems level.

Over the years, powerful machine learning algorithms have been developed for inferring transcriptional networks from expression data,

thereby revealing new aspects of complex biological systems.

- Aims and scope

This one day SIG session will bring together experts from computational biology and machine learning to present recent advances in the development and

application of gene regulatory network inference methods, as well as novel emerging single-cell and epigenomics data types suitable for network inference.

The SIG will be split into two half day sessions. The first half will focus entirely on novel network inference methods,

while the second half will focus on opportunities and challenges arising from new data types.

Each session will feature an invited speaker and three short talks.


####################################################################

Keynote talk 1: Prof. Sushmita Roy (University of Wisconsin-Madison)

####################################################################


Regulatory network dynamics on developmental and evolutionary lineages.

computational method to look at regulatory network


- Dynamics of regulatory networks on lineages.

1. Cell lineage, 2.Species phylogeny. (Time scale)


What determines a regulatory edge?

Regulator -> Target

(1. Transcription factors, 2. Long-range regulation, 3. Chromatin, 4. Signaling networks)


Computational tools to dissect regulatory networks.

- Network inference (MERLIN (Roy et al., PLoS Comput Biol 2013), MERLIN-P)

Who regulates whom? (Several regulators -> y)

Experimental:  1. ChIP-chip and ChIP-seq,

  2. Factor/regulator knockout followed by genome-wide transcriptome measurements.

  3. DNase I/ATAC-seq + motif finding

Computational: 1. Supervised network inference

  2. Unsupervised network inference

Expression-based network inference

-Can potentially recover genome-wide regulatory networks.

Regulatory gene expression modules

A module: set of genes that are co-expressed across conditions.


Two popular paradigms to network inference

1. Reconstruction per GENE: Learn precise models of regulation for each gene.

2. Reconstruction per MODULE: Reveals modular organization of regulatory networks. -> Easily interpreted


Per-module (LeMoNe) vs Per-gene (CLR) method

MERLIN: A network reconstruction method to predict regulators of genes and modules

Key idea: Favor regulators that are already associated with a module.

MERLIN uses a Bayesian formulation of network inference. D (matrix) -> Algorithm -> G (graph) : P(G|D) ∝ P(D|G)P(G)

# Could you explain once more, in detail, how an interesting module is defined?


Chasman et al., PLOS Computational Biology 2016

- Regulatory networks on evolutionary lineages (Arboretum, MRTLE)

Short-term change: Development

Long-term change: Evolution

- Integrating chromatin and expression in cell fate (CMINT)

Reprogramming factors - MEF (Mouse Embryonic Fibroblasts) -> iPS cells (Induced Pluripotent Stem cells)

Somatic cells -> Intermediate cells (Transient population) -> iPS cells

Partially reprogrammed cells (Stable cell line)


A regulatory module: set of genes with similar regulatory state


- Predicting enhancer-promoter interactions (RIPPLE, Arboretum-Hi-C)

Long-range gene regulation by distal elements

RIPPLE: A machine learning approach to predict enhancer-promoter interactions


############################################################

Talk 1: Van Anh Huynh-Thu (University of Liege)

Combining tree-based and dynamical systems for the inference of gene regulatory networks.

############################################################


There are two main families of methods.

1. Score-based: Compute statistical dependencies between pairs of expression profiles.

: Fast, but cannot make predictions


2. Model-based: learn a model capturing the dynamics of the network (e.g. differential equations)

:Realistic but are limited to small networks.


Hybrid approach: Jump3 - Model for gene expression, Tree-based method for network reconstruction

 -> On/Off model of gene expression

 What are the kinetic parameters?


I have no idea what this is about.

http://bioinformatics.oxfordjournals.org/content/31/10/1614.long

It was published in a good journal.. but I still have no idea what it is about.

Time series data.


############################################################

Talk 2: Dragan Bosnacki (Eindhoven University of Technology)

REGENT: Logarithms, Hubs and Thresholded Transitive Reduction for Enhanced Scalable Network Inference

############################################################


Goal: Network reconstruction from perturbation experiments.

Experiment on genes 'knockout'

Perturbing one gene and measuring activities of others.

Actually, there was research on perturbation networks in our laboratory.


Han, Hyun Wook, et al. "Yin and Yang of disease genes and death genes between reciprocally scale-free biological networks." Nucleic acids research (2013): gkt683.


if --

a -(10)> b -(6)> c, 

a -----(4)-----> c

This algorithm will delete the edge a--(4)--c



Non-existing direct influence is inferred!
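To make the toy example above concrete, here is a minimal sketch of a thresholded transitive reduction. The removal rule and the threshold of 5 are my own assumptions for illustration, not necessarily the rule REGENT actually uses.

```python
def thresholded_transitive_reduction(edges, threshold=5.0):
    """Drop a direct edge u->w when a two-step path u->v->w exists whose
    weakest link is at least `threshold` and stronger than the direct edge.
    (Assumed rule, for illustration only.)"""
    kept = dict(edges)
    for (u, w), direct in edges.items():
        for (x, v), first in edges.items():
            if x != u or v == w:
                continue  # only consider edges u->v with v different from w
            second = edges.get((v, w))
            if second is None:
                continue  # no second step v->w
            weakest = min(first, second)
            if weakest >= threshold and weakest > direct:
                kept.pop((u, w), None)
                break
    return kept

# The example from the notes: a -(10)> b -(6)> c and a -(4)> c.
edges = {("a", "b"): 10, ("b", "c"): 6, ("a", "c"): 4}
reduced = thresholded_transitive_reduction(edges)
```

Here the weak direct edge a -> c is removed because the indirect path a -> b -> c is stronger at every step, and the other two edges stay.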


############################################################

Talk 4: Hervé Isambert (Institut Curie)

3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics

############################################################


Can one infer causality from mere correlation data?

Two-point Correlation does not imply causation

Hygrometer? <- typhoon (clouds?) -> weather?


Three-or-more-point Correlation might imply causation

Common cause principle: [E1 <- C -> E2] (Hans Reichenbach, 1891-1953, The Direction of Time)

Independent cause principle: [C1 -> E <- C2] (Judea Pearl, Causality)


Ancestral Graphs

V-structure: X -> Z <- Y

L -> z     y <- L

(Latent variables L) x -> z <-> y <- w
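A small sanity check of the v-structure idea, as I understood it: for a collider X -> Z <- Y, X and Y look independent on their own but become dependent once Z is known, so the 3-point information I(X;Y;Z) = I(X;Y) - I(X;Y|Z) comes out negative. The XOR toy distribution is my own example, not from the talk, and this is not the 3off2 implementation.

```python
from itertools import product
from math import log2

def entropy(joint, keep):
    """Shannon entropy (bits) of the marginal over the variable positions in `keep`."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def mutual_information(joint, i, j):
    """2-point information I(i;j)."""
    return entropy(joint, [i]) + entropy(joint, [j]) - entropy(joint, [i, j])

def conditional_mutual_information(joint, i, j, k):
    """Conditional 2-point information I(i;j|k)."""
    return (entropy(joint, [i, k]) + entropy(joint, [j, k])
            - entropy(joint, [i, j, k]) - entropy(joint, [k]))

# Toy collider X -> Z <- Y: X and Y are independent fair coins, Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}

i_xy = mutual_information(joint, 0, 1)
i_xy_given_z = conditional_mutual_information(joint, 0, 1, 2)
i_xyz = i_xy - i_xy_given_z  # 3-point information
```

For this toy case the 2-point information is 0 bits, the conditional one is 1 bit, and the 3-point information is -1 bit.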

http://ohnologs.curie.fr/ PLoS Comp Biol 2014




############################################################

Talk 6: Anthony Mathelier (University of Oslo).

Getting new data from old data: extracting information from TF ChIP-seq data sets. 

############################################################


Identifying diversity in transcriptional regulation from high-throughput sequencing experiments.

*Understand architecture of non-coding DNA

Current approach: average analysis (Histone occupancy with CTCF occupancy, / Distance from CTCF binding site)

Don't throw away data-points just because they don't "fit"


No data left behind -> gene promoters.

What is a promoter? region(~100bp) near the transcription start site (TSS)

Promoter elements: Heterogeneous: no "single" architecture or rule. (BRE^u, TATA, BRE^d, Inr, MTE..)

Lots of TSS data: 5' CAGE, RACE, etc. Quantification of TSSs at single-nucleotide resolution


.. TATA .. CGA .. Promoter.

How do we find the rules? (Promoter)

We do not know:

identity of elements

number of elements that are part of an architecture

number of distinct architectures knockout

Probabilistic model:

arg max P(O | M, X)


Gibbs sampling to learn best model

Cortes et al. 2013 report TSSs in M.tuberculosis (Mtb) under exponential growth and starvation

They identify two classes: those with and those without the -10 motif TANNNT
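For my own reference, the TANNNT pattern is easy to scan for with a regular expression. This is only a sketch: the upstream sequence is a made-up string, not data from Cortes et al., and a real analysis would also restrict the search to the expected distance from the TSS.

```python
import re

# -10 motif TANNNT from the note; N stands for any nucleotide.
# The lookahead lets overlapping occurrences be reported as well.
MINUS_10 = re.compile(r"(?=(TA[ACGT]{3}T))")

def find_minus10(sequence):
    """Return (0-based start, matched hexamer) for every TANNNT occurrence."""
    return [(m.start(), m.group(1)) for m in MINUS_10.finditer(sequence.upper())]

# Hypothetical upstream sequence for illustration.
upstream = "GGCTATAATGCCTACGGTAA"
hits = find_minus10(upstream)
```

Sequences with an empty hit list would fall into the "without the -10 motif" class in this simplified view.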


No promoter left behind

nplb.ncl.res.in


No data left behind -> Protein-DNA interaction





############################################################

Talk 7: Anil Korkut (Memorial Sloan Kettering Cancer Center)

Perturbation biology: Inferring signaling networks in cellular systems.

############################################################


Perturbations and cellular response.

Cell: A collection of molecules

Evolved as a robust system under bombardment of perturbations.


Perturbation biology. Proteins & Phosphoproteins, Drug combinations.. -> Data generation: Perturbation experiments, phosphoproteomic profiles

1. Design

2. Response profiles

3. Model

4. Network inference 

5. Predict & Test


Model inference: Deriving models of a (biological) system.


Upstream node,

Interaction strength

Perturbation strength


PERA Algorithm

Pathway extraction and reduction algorithm


genes or protein list

extract neighborhood maps of each entity from Pathway Commons using BioPAX/Paxtools

Compute the shortest pair wise distance between nodes of ..
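The "shortest pairwise distance" step can be sketched with a plain breadth-first search. This assumes an unweighted, undirected neighborhood graph; the node names are made up, and nothing here touches Pathway Commons or Paxtools.

```python
from collections import deque

def shortest_distances(graph, start):
    """Breadth-first search: number of edges from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def pairwise_distances(graph, nodes):
    """Shortest distance for every ordered pair of query nodes (None if unreachable)."""
    return {(a, b): shortest_distances(graph, a).get(b)
            for a in nodes for b in nodes if a != b}

# Hypothetical neighborhood graph: g1, g2, g3 are query genes, x is a linker node.
graph = {
    "g1": ["x"],
    "x": ["g1", "g2"],
    "g2": ["x", "g3"],
    "g3": ["g2"],
}
distances = pairwise_distances(graph, ["g1", "g2", "g3"])
```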


Korkut a focus: pathway analysis SKCM TCGA


Postdocs wanted

Network pharmacology


================================================

R2 Tutorials, Release 3.0.0 / The R2 support team

================================================


R2 genomics analysis and visualization platform

- Web (cloud) based (http://r2.amc.nl/)

- public collection datasets (uniform normalization)

- Private datasets (Shielded access group / user)

- Analysis/Visualization tools


Intended users

- Biomedical researchers

- Wetlab biologists


mRNA, microRNA, SNP, CGH, Methylation, ChIP..


http://www.pnas.org/content/102/36/12837.full

Method: Significance analysis of time course microarray experiments


================================================

Talk 7: DisGeNET

A discovery platform to support translational research on human diseases / Janet Pinero and Laura I. Furlong

================================================


Questions:

What are the diseases associated with the gene SIRT1?

What are the genes associated with Alzheimer's disease?

What are the genes shared by comorbid diseases?

What are the genetic variants associated with obesity?

What are the druggable proteins associated with schizophrenia?

Which pathways are perturbed in Lafora disease?


High throughput genomic technologies are helping to find disease genes and pathogenic variants.

A typical whole exome sequencing experiment produces 30,000-100,000 variants relative to the reference genome

Approximately 10,000 of these variants will have a consequence for protein function

Only one or few may be causative


Main difference with OMIM, GAD, Clinvar, CTD, MalaCards

DisGeNET: integrates all of these datasets, so the information from all of these databases is available in one place.


- Knowledge platform on human diseases and their genes

- Covers all disease therapeutic areas

- Integrates information from expert-curated resources and from the literature.

- GDA, and supporting evidence.


(Gene-disease associations)

MEDLINE -> DisGeNET <- Biomedical databases


Genotype: Gene, protein, SNPs, NCBI ID, HUGO, Uniprot, dbSNP..

Phenotype:

Genotype <-> Phenotype


Signs, symptoms and diseases in DisGeNET

: Phenotypes, signs, symptoms, diseases, disease class..


DisGeNET score = S(curated) + S(predicted) + S(literature)


- Disease Specificity Index (DSI): DSI = Log2(Nd/NT) / log2(1/NT)

Indicates how specific is a gene with respect to diseases.

- Disease Pleiotropy Index (DPI): DPI = Ndc/Ntc
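The two indices above written out as functions, mostly so I remember how the numbers behave. My reading of the symbols is an assumption from context: Nd is the number of diseases associated with the gene, NT the total number of diseases, Ndc the number of disease classes the gene's diseases fall into, and Ntc the total number of disease classes. The counts used below are made up.

```python
from math import log2

def disease_specificity_index(n_diseases_for_gene, n_diseases_total):
    """DSI = log2(Nd / NT) / log2(1 / NT).
    1 for a gene linked to a single disease, 0 for a gene linked to all diseases."""
    if not 0 < n_diseases_for_gene <= n_diseases_total or n_diseases_total < 2:
        raise ValueError("need 0 < Nd <= NT and NT >= 2")
    return log2(n_diseases_for_gene / n_diseases_total) / log2(1 / n_diseases_total)

def disease_pleiotropy_index(n_classes_for_gene, n_classes_total):
    """DPI = Ndc / Ntc, the fraction of disease classes the gene is linked to."""
    return n_classes_for_gene / n_classes_total

# Hypothetical counts, for illustration only.
dsi_specific = disease_specificity_index(1, 1024)   # gene linked to one disease
dsi_broad = disease_specificity_index(32, 1024)     # gene linked to 32 diseases
dpi = disease_pleiotropy_index(5, 20)
```

So a higher DSI means a more disease-specific gene, and a higher DPI means a gene spread over more disease classes.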


# Cytoscape App, R Package, Web Interface, Semantic WEb, Custom Scripts



1. import style (XML file)

http://disgenet.org/web/DisGeNET/menu/app/#style

DisGeNET Cytoscape Style - Download the DisGeNET Cytoscape Style

2. Cytoscape -> Control Panel [Style] -> DisGeNETstyle -> Apps install anyway.

3. Apps -> DisGeNET -> Configure DisGeNET Database [Download]

4. Start DisGeNET


Gene Disease Network

Disease Projections

 -> Disease-Disease interaction (sharing common genes)

Gene Projections

 -> Gene-Gene Interaction ( sharing common disease)


What are the perturbed pathways in Lafora disease?

Which proteins associated with Aarskog syndrome are potential drug targets?

What..


RDF: Resource Description Framework

 - Captures logical structure of the data

 - Graph representation

 - SPARQL: RDF query language


Queralt-Rosinach, Núria, et al. "DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases." Bioinformatics (2016): btw214.


http://rdf.disgenet.org/sparql/?default-graph-uri=&query=

SELECT+DISTINCT+%3Fdisease+%0D%0A++++%3Fdiseaselabel+%0D%0A++++%3Fdiseasename+%0D%0AWHERE+%7B++%0D%0A++++%3Fpanther+rdfs%3AsubClassOf+sio%3ASIO_000275+%3B%0D%0A++++++++dcterms%3Atitle+%3Fpanthername+.+%0D%0A++++FILTER+regex%28%3Fpanthername%2C+%22transporter%22%29+%0D%0A++++%3Fgene+sio%3ASIO_000095+%3Fpanther+.%0D%0A++++%3Fgda+sio%3ASIO_000628+%3Fgene%2C+%3Fdisease+.+%0D%0A++++FILTER+regex%28STR%28%3Fdisease%29%2C+%22umls%2Fid%22%29%0D%0A++++%3Fdisease+dcterms%3Atitle+%3Fdiseasename+.%0D%0A++++%3Fdisease+rdfs%3Alabel+%3Fdiseaselabel%0D%0A%7D+%0D%0ALIMIT+100&format=text%2Fhtml&timeout=0&debug=on
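The query string above is hard to read in its URL-encoded form. A small sketch to decode it with the standard library; only the first part of the encoded string is pasted here for brevity, the rest decodes the same way.

```python
from urllib.parse import unquote_plus

# First part of the encoded query string from the URL above (truncated for brevity).
encoded = (
    "SELECT+DISTINCT+%3Fdisease+%0D%0A++++%3Fdiseaselabel+%0D%0A++++%3Fdiseasename+%0D%0A"
    "WHERE+%7B++%0D%0A++++%3Fpanther+rdfs%3AsubClassOf+sio%3ASIO_000275+%3B"
)

# '+' becomes a space and %XX escapes become the original characters.
decoded = unquote_plus(encoded)
lines = [line.rstrip() for line in decoded.splitlines()]
print("\n".join(lines))
```

The decoded text starts with `SELECT DISTINCT ?disease` and shows the SPARQL structure (variables, `WHERE {`, triple patterns) one line at a time.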


Jaccard Coefficient

https://bitbucket.org/ibi_group/disgenet2r/src/aeaf5d7c0f79de5c74959c004788b8f2c2d0a95d/vignettes/generalOverview.pdf?at=master&fileviewer=file-view-default

https://en.wikipedia.org/wiki/Jaccard_index




================================================

Monday_ Keynote speaker

Dynamic cell states and their genomic regulation

================================================


1 genome sequence

100 trillion (single) cells in our body.

How many distinct programs?

How many cell "types"?

......................

scRNA analysis: single-cell analysis (phenomenology)

Clusters can be visualized in 2D space.

Projection is extremely subjective: depending on features and metrics, you get what you assumed a priori...

Network graph - Cardiomyocytes, Endothelial precursors, Primitive Erythrocytes.


A decision tree? Waddington Canalization?


1. Tree: Defined, discrete progenitor states

2. Canalization: Programmed trajectories / http://plato.stanford.edu/entries/innate-acquired/figure4A.jpg

3. Mini-golf models: Escape, explore, converge


Topological domain(TAD), TADs(and chromosome architecture)

.A new way to look at Hi-C contact maps.


================================================

Mississippi session1-1.

Genome-wide predictions of miRNA regulation by transcription factors.

================================================


Gene regulation. Which transcription factors bind to DNA to regulate expression?

Which TFs regulate each miRNA?: lots of work on predicting gene targets of TFs, gene targets of miRNAs, but not much on TF regulation of miRNAs themselves.

Follow-up problem: If a miRNA is genic (intronic), is it regulated by the same transcription factors that regulate the gene, or independently?

TF-miRNA interactions from TransmiR database.

semi-supervised learning.

Network-based label imputation.


Label propagation: network -> Network smoothing.

compute vector F that is smooth over the network and accounts for prior knowledge Y about each node.

Extend previous work on network propagation/smoothing for semi-supervised label detection.


What network?: k-NN network over data points (here, k=25)
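A minimal sketch of the smoothing idea, using the common iteration F <- alpha * S * F + (1 - alpha) * Y with a symmetrically normalized adjacency S. This particular update rule, alpha = 0.8 and the 5-node path graph are my assumptions for illustration; the talk's exact formulation may differ, and the real network was a k-NN graph with k = 25.

```python
from math import sqrt

def propagate_labels(adjacency, prior, alpha=0.8, iterations=200):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y, with S = D^-1/2 W D^-1/2."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    s = [[adjacency[i][j] / sqrt(degree[i] * degree[j]) if adjacency[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(prior)
    for _ in range(iterations):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * prior[i]
             for i in range(n)]
    return f

# Hypothetical 5-node path graph 0-1-2-3-4; only node 0 carries a known label.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
prior = [1.0, 0.0, 0.0, 0.0, 0.0]
scores = propagate_labels(adjacency, prior)
```

The label leaks from node 0 to its neighbors and fades with distance, which is the "smooth over the network" behaviour: unlabeled nodes close to labeled ones get higher scores.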

Gene/miRNA bicluster enriched for cancer-related GO terms, plotted with...



================================================

Mississippi session1-2.

Cell systems. Cross-tissue regulatory gene networks in coronary artery disease.

Multi-organ expression profiling

================================================

Systems genetics: A network view of complex diseases.

 - Genetic risk loci identified by GWAS are mainly located in intergenic regions and explain a small proportion of disease risk.

 Presumably play a gene regulatory role via differential TF binding.

 Gene regulatory networks affected by multiple genetic and environmental perturbations can be reconstructed from multi-omics data. / Albert and Kruglyak, Nat Rev Genet (2016)


The Stockholm Atherosclerosis Gene Expression (STAGE) study.

Multi-organ expression profiling of patients undergoing coronary artery bypass grafting surgery at Karolinska university hospital.

STAGE has distinct tissue profiles. (AAW, IMA, Liv, SM, SF, VF, WB)

STAGE informs on tissue-specific effects of GWAS loci.

-> I once studied perturbation sensitivity networks on the 1000 Genomes data; couldn't I then compute perturbation sensitivity across multiple organs?


Cross-tissue coexpression analysis identifies multi-tissue processes.

left axis: genes, bottom axis: genes; along the top and right, a separate matrix for each of AAW, IMA, LIV, SM, SF, VF, WB.

WGCNA identified 171 co-expression clusters (94 tissue-specific / 77 cross-tissue) ****

12 clusters conserved in mouse with same phenotype association in same tissue.

Key drivers of athero-causal bayesian gene networks validated by siRNA targeting in THP-1 foam cells.


Assessment of causality using eQTLs and GWAS.

- correlation with phenotype does not imply causation.

- Genetic variation is causally upstream of gene expression and phenotypic variation.

Test enrichment of module-eQTLs for risk variants from CAD GWAS. [Lamparter et al, PLOS CB (2016)]


* Highlight.

Bayesian gene network reconstruction.

Bayesian network: Model causal relations between gene expression traits by a directed acyclic graph g and joint distribution.


Inference: Made..

Causal inference using eQTLs as instrumental variables disentangles correlated traits.


================================================

Mississippi session1-3.

Logical model specification aided by model checking: application to the mammalian cell cycle regulation.

================================================


Logical modeling framework for systems biology.

Model building: Biological system -> regulatory network -> Abstract representation (Boolean model)

-> Differential model (continuous?)

Mammalian cell cycle [Fauré, Naldi, Chaouiya, Bioinformatics 2006]

Cyclins: CycA, CycB .. CycD,

Activator E2F..


Synchronous dynamics

Asynchronous dynamics
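To remind myself of the difference between the two update schemes, a tiny Boolean network sketch. The two-node mutual-inhibition rules are a made-up toy, not the Fauré et al. cell-cycle model.

```python
# Hypothetical toy rules: A and B inhibit each other.
RULES = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}

def synchronous_step(state):
    """Update every node at once from the current state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}

def asynchronous_successors(state):
    """Update one node at a time; every node whose rule disagrees
    with its current value yields one possible successor state."""
    successors = []
    for node, rule in RULES.items():
        new_value = bool(rule(state))
        if new_value != state[node]:
            successors.append({**state, node: new_value})
    return successors

state = {"A": False, "B": False}
sync_next = synchronous_step(state)
async_next = asynchronous_successors(state)
```

Starting from both nodes OFF, the synchronous scheme flips both to ON (and then back, an oscillation), while the asynchronous scheme branches into A-only or B-only, each of which is a fixed point. Same rules, different dynamics.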

Not sure ;; It seems to be a methods paper that looks at the expression (or something like that) of the cyclins along the cell cycle


================================================

A landscape of pharmacogenomic interactions in cancer [Resource: cell]

================================================


GDSC1000: Genomics of Drug Sensitivity in Cancer.

High confidence cancer driver genes.

Chromosomal regions of recurrent focal deletion, focal amplification

Informative CpG islands.


LOBICO model..

CancerRxGene.org

Computational cancer biology (ccb.nki.nl)

Question: they found the CFEs in TCGA and then looked at them in cell lines; what about doing it the other way round? Have they tried?


================================================

Gene-set association test

================================================


Introduction

- rare variant associations

- gene & Gene-set level set

Proposed methods

- Gene-level: Quadratic Test (QT-test)


Issue for rare variants: Collapsing method

Region can be a gene or other biological grouping unit.

Collapsing multiple rare variants (burden-type test) can increase power.


Bi-directional effects.

When there are only deleterious causal variants, collapsing increases power, especially if there are only a few non-causal variants.

When deleterious and protective causal variants occur together, the collapsing method has low power.

Another type of test (non-burden test) is required in this case.


Gene-set-level test for rare variants?

Gene set analysis for microarray data -> gene set analysis for GWAS -> [Gene set analysis for rare variants.] This is what this group did

Focus on gene sets rather than on individual genes or variants.


QTest1: Burden type test(Inverse variance weighting method)

- Multiple regression framework.

QTest2: Non-burden type test.

They explained the statistical formulas, but I couldn't follow any of it

QTest3: Optimal quadratic test.


KARE(Korean Association REsource)

MSigDB

Sample: 900 Korean individuals , KOBRA??;;


T2D-GENES Consortium.




================================================

Session 2-3

================================================

Challenges

Networks can be large and rich in data.

- Graph databases provide the necessary performance.


Graph databases

neo4j: Integration (full fledged DB server)

  Persistence: easy to launch

  Performance: Traversal is fast; searching via index.

  Query & Analytics: Cypher plugins

  Multiple Graphs: Simple to handle and transfer.


Network library

NetLib Primer : Creates and annotates nodes.

NetLib Edger : Parses data sources and imports information as edges.

NetLib Curator : modifies an existing network

NetLib Scribe : Extracts statistics or subnetwork

cyNeo4j : Cytoscape app to connect to a Neo4j database


NetLib Primer

Tab-delimited file: miRBase & aliases.

DisGeNET diseases.

GmtIds


Which miRNA targets ~~?

visualizing the (sub)network

match (h:hypertrophy)-[s:interacts_with]->(n)<-.. from.. where..

They seem to have built and provide some query system of their own, something like RDF.




================================================

Session 4-1

Edge-based sensitivity analysis of signaling networks by using boolean dynamics

================================================

University of Ulsan

Which edges have strong influence on the network dynamics?

-> Boolean network model.

V = {A, B, C, D}

E = {(A,B,+), (A,C,-)....}


Node-based mutations.


*Database?

Shuffled networks from WANG network

BioCarta, Cancer Cell Map


Degree (DEG), BEW.. they use abbreviations like these, but what do they mean?

Most of the highly sensitive edges are located at the centre of the signaling network.

LCC proportion?

Genes forming the highly sensitive interactions are more central than the other genes.


Application to edgetic drug discovery: p53 signaling pathway.


================================================

Session 4-2

On cross-conditional and fluctuation correlations in competitive RNA networks.

================================================


Biological Background

Consider 2 mRNAs targeted by a miRNA.

Correlations between mRNAs can emerge for different reasons.

- Stochastic fluctuation (SF) correlation: the Pearson coefficient of these 2 variables in a simulation (e.g. Gillespie). Relevant in single-cell experiments.

- Cross-conditional (CC) correlation: Pearson coefficient of two variables at the steady state of the mean-field behaviour.

In a real experiment CC correlations can mean: Tissue samples(cancer or not), different type of cells.


SF correlations and CC correlations tend to arise under different conditions.


Network topology II: influence on miRNA-mRNA balance.

Network topology has a very important effect on the correlations.



================================================

Inferring causal molecular networks: empirical assessment through a community-based effort

================================================


Molecular networks represent causal relationships between nodes.

A causal edge predicts that intervention on the parent node (A) leads to a change in the child node (B).

Causal edges may represent indirect effects via unmeasured intermediate nodes.


Inference of causal molecular networks.

Given data on a set of molecular variables, learn a directed network that describes causal relationships between them.

Causal network inference is challenging!

No guarantee that a method will work well in a given setting.

* Empirical assessment of causal validity of inferred networks is essential.


Empirical assessment

Assessment often performed on simulated data.

- True causal network is known.

Lack of "gold standard network" in most applications.

Need an approach to empirically assess causal network inference methods within the setting of interest.

HPN-DREAM network inference challenge.

Assess ability of computational methods to learn causal molecular networks in a complex mammalian setting.

Proposed a causal assessment approach that uses held-out test data..

HPN-DREAM network inference challenge.

Inference of causal protein signalling networks in breast cancer.

RPPA data from cancer cell lines (with Spellman lab, OHSU; Mills lab, MDACC)

Infer a network for each of 32 biological contexts (cell line x stimulus)

data for each context...

test data consists of:

Time courses for the 32 contexts.

Under an inhibitor not contained in training data (mTOR inhibitor)


Inhibition of mTOR reveals descendants of mTOR in the true causal network for a given context, k.

DMSO

mTORi, q-value = 3.2 * 10^-5..


Scoring procedure

Set of predicted edge..


y: in silico data task - AUROC

X: experimental data task - mean AUROC

r = 0.35

p = 0.011
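As far as I followed, the score is an AUROC: predicted descendants of mTOR are compared against the nodes that actually changed under the held-out mTOR inhibitor. Below is a rank-based AUROC sketch. The node names P1..P5, the confidence values and the changed/unchanged labels are all hypothetical, and the challenge's real scoring pipeline surely has more steps than this.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical example: predicted confidence that each node is a descendant of mTOR,
# and whether the node changed under the held-out mTOR inhibitor.
predicted = {"P1": 0.9, "P2": 0.8, "P3": 0.6, "P4": 0.3, "P5": 0.1}
changed = {"P1": True, "P2": True, "P3": False, "P4": True, "P5": False}

nodes = sorted(predicted)
score = auroc([predicted[n] for n in nodes], [changed[n] for n in nodes])
```

A score of 0.5 would mean the predicted descendants are no better than random ordering, and 1.0 means every truly affected node is ranked above every unaffected one.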


================================================

Keynote speaker. (Monday, 5 p.m.)

Using single-cell transcriptomics to understand cellular heterogeneity

John Marioni. Senior Group Leader, CRUK Cambridge Institute, University of Cambridge. Associate Faculty, WT Sanger Institute.

Research group leader: EMBL-EBI.

================================================


Most studies have focused on examining molecular measurements in large populations of cells.

However, some biological processes require the study of variation at the single-cell level.


- Current technologies.

scRNA-seq - tens of thousands of cells in a single experiment (Drop-Seq)

Genome Sequencing - still a smaller number of cells but beginning to scale up.

Epigenetic sequencing - bisulfite sequencing is most mature.

Chromatin structure - ATAC-Seq, HiC

- Large volumes of data, especially for scRNA-seq, with new computational and storage challenges.


Single-cell RNA-seq.


Identifying highly variable genes.

RNA extracted from single cell -> Process libraries and sequence <- RNA from a spike in pool (RNA pool)

Current approaches first estimate technical error and plug into downstream analysis - motivated development of BASiCS

 - Cell-specific "sequencing depth / technical" normalization parameter is inferred using spike-in and endogenous genes.

 - Cell-specific mRNA content parameter is inferred using only endogenous genes.

 

* Jointly modelling and inferring normalization and gene specific parameters (such as the mean and dispersion) leads to more stable results


Vallejos, Richardson & Marioni, Genome Biol, 2016



================================================

Keynote speaker: Núria López-Bigas (Pompeu Fabra Univ.) / Tuesday, 9 a.m.

Somatic mutational processes and cancer vulnerabilities

================================================

Tumor genomes

> understanding mutational processes.

- Figure. Mike Stratton, EMBO Molecular Medicine (2013) (she said she likes it.. a very wide figure, drawn as a semicircle) - http://www.google.co.kr/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=&url=http%3A%2F%2Fwww.jeantet.ch%2Fe%2Fwinners%2Fimages%2Fclip_image002.png&psig=AFQjCNG7BXwelMlA1J3lV3jm9HTKdS30Gg&ust=1473232293100137&cad=rjt

> Finding drivers of cancer.

> Precision cancer medicine



>Tumor genomes


The large amount of tumor whole-genomes provides a unique opportunity to understand mutational and repair processes.


Mutational burden and mutational signatures.

- Figure. Lawrence et al. Nature 2013

Alexandrov et al. Nature 2013: C>T mutation is highly..


Mutation rate correlates with chromatin features at megabase scale.

Melanoma C>T mutation density

Melanocyte DNaseI accessibility index (reverse scale)


Increased mutation rate in active TFBS in melanomas.

A plot of DHS, noDHS, DHS-background, noDHS-background.

http://www.nature.com/nature/journal/v532/n7598/fig_tab/nature17661_F1.html


Nucleotide Excision Repair (NER): Global and Transcription-coupled repair

- Lans et al. Epigenetics & Chromatin 2012 5:4

- Hu et al., G & D, 2015 (Sancar lab) - XR-seq (eXcision Repair Sequencing)


Mutation/repair rate in TFBS correlate with TF binding strength


Is transcription coupled NER also affected by TFs binding to DNA?

Decreased NER activity at active TFBS in transcribed regions.


Is this a pathogenic effect?

Is mutation rate in TFBS also increased in other tumor types?

Increased mutation rate at active TFBS in lung cancer.

High mutation rate in TFBSs caused by impaired access of the repair machinery.



> Finding drivers of cancer



Tumor development follows a Darwinian process.

Variation -> Selection.

Drivers confer selective advantage to the cell.

Clonal expansion.


Signals of positive selection (methods)

OncodriveFM:   Identifies genes with a bias towards high functional mutations (FM bias)

OncodriveCLUST: Identifies genes with a significant regional clustering of mutations.

MutSigCV: Identifies genes mutated more frequently than the background mutation rate.

==> 459 cancer driver genes (156 BLAD, 75 GBM, 184 BRCA ...)

(Constructing a background model according to mutational processes)


Complementary signals of positive selection.

Rubio-Perez, Tamborero et al., Cancer Cell 2015.

- Cell cycle, DNA damage, Angiogenesis, Cell adhesion, Proliferation, Cytoskeleton, Apoptosis, Chromatin regulatory region...


http://www.intogen.org : Cancer Drivers Database


Drivers in Noncoding Regions?

Challenges

Tumours have thousands of mutations

Unknown function of most of noncoding regions


Mularoni et al. Genome Biology 2016. OncodriveFML

https://www.researchgate.net/publication/304027382_OncodriveFML_A_general_framework_to_identify_coding_and_non-coding_regions_with_cancer_driver_mutations

- Measuring functional impact.


Combined scores: eg. Combined Annotation Dependent Depletion (CADD)

Fitness Consequence Scores (FitCons)

QQ-plot: Y: Observed p-values (-log10), X: Expected p-values (-log10)

Identifies promoters with driver mutations



> Precision cancer medicine


Drugs targeting cancer drivers

BRAF(V600E) <- Vemurafenib

How many patients could benefit from current and future targeted drugs against cancer drivers?

TCGA tumor samples..

Rubio-Perez, Tamborero et al., Cancer Cell 2015. This is the figure I cited when I presented on DGIdb, lol. I presented it without understanding any of it.


From cohorts to individual tumors.

- Cancer Genome Interpreter


OncodriveMUT


Not all mutations in driver genes are driver mutations! (Passenger mutation)

We need one more step: separating driver mutations from passenger mutations within driver genes.


Predicted TP53 driver mutations by OncodriveMUT exhibit lower biological activity.

Experimental validation of rare mutations in oncogenes.

(Kim et al., Cancer Discovery 2016)

Cancer bioMarkers-db

* http://CancerGenomeInterpreter.org/


======================================================================================

Network-based approaches for drug response prediction and drug combinations in cancer.

======================================================================================


Atlas of cancer signalling network: navigating cancer biology with GoogleMaps (Oncogenesis, 2015)

NaviCell Web Service for Network-based Data Visualization

Drug Driven Synthetic Lethality: bypassing tumor cell genetics with a combination..


Cancer: complex system.

Signalling pathway / Signalling network.

Atlas of Cancer Signalling Networks: Resource of knowledge on molecular mechanisms and analytical tool

Ongoing: signaling maps in construction.

 - immune, regulated cell death, telomere maintenance, Angiogenesis, Centrosome regulation..

 

 Regulated cell death map: many modes to die (ongoing)

 Immune response and tumor microenvironment map: Communication .. (ongoing)

 A web tool for navigation, curation and data analysis in the context of signaling networks.

- NaviCell Web Service: NAR (2015)

- NaviCell: BMC Systems Biology (2013)


'Vertical' map navigation: highlighting neighbors of an entity...


Visualization of omics data on networks using NaviCom

Distribution of oncogenes frequently mutated in cancer..


ACSN 'staining' for module activity: stratification, mRNA expression data.

Basal-like, HER2+, Luminal A, Luminal B..


Treatment approaches in cancer: Targeted drugs: synthetic lethality paradigm.

BRCA/PARP Synthetic lethal pair.

Improve synthetic lethality paradigm for cancer treatment: from pair to set.

*** Synthetic lethal gene set = intervention gene set (Proliferation)


Regulators of DNA repair steps are potential targets to restore sensitivity to genotoxic treatment.

DNA repair network, -> state transition graph retrieval -> State transition graph and all regulations on DNA repair map.

-> Organization of regulators for each step in DNA repair.

E1: Regulators of first level, E2: second level, E3: third level...


OCSANA: Optimal Combinations of Interventions from Network Analysis (Bioinformatics, 2013).

: to reveal minimal cut sets.

Identifying points of fragility in the network: gene combinations to interfere with a phenotype.
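To remind myself what a minimal cut set means here, a toy sketch: enumerate every path from an input node to the phenotype node, then find the smallest sets of intermediate nodes that hit all of those paths. This is a brute-force illustration on a hypothetical graph, not the heuristic or scoring the tool noted above actually uses; the function names and node names are made up.

```python
from itertools import combinations

def simple_paths(graph, source, target, path=None):
    """Yield every simple (cycle-free) path from source to target."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, []):
        if nxt not in path:
            yield from simple_paths(graph, nxt, target, path)

def minimal_cut_sets(graph, source, target):
    """Brute-force the minimal node sets that intersect every source-to-target path."""
    # Only intermediate nodes are treated as candidate interventions.
    paths = [set(p[1:-1]) for p in simple_paths(graph, source, target)]
    candidates = sorted(set().union(*paths)) if paths else []
    found = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            chosen = set(combo)
            # A superset of an already-found cut set is not minimal, so skip it.
            if any(prev <= chosen for prev in found):
                continue
            if all(chosen & p for p in paths):
                found.append(chosen)
    return found

# Hypothetical toy signalling graph: edges point downstream.
toy = {
    "signal": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["phenotype"],
    "D": ["phenotype"],
}
cut_sets = minimal_cut_sets(toy, "signal", "phenotype")
```

In this toy graph no single node blocks every route, but three pairs do, which is the "from pair to set" idea: the intervention is a combination, not one gene. Brute force only works for tiny graphs; real signalling maps need something smarter.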

*** Enrichment of SL from shRNA screen lines. (DECIPHER project)


Explaining synergistic effect of combined treatment: DNA repair inhibitors Dbait and PARPi

Survival of triple-negative breast cancer cell lines to Dbait or Olaparib

Drug driven synthetic lethality: bypassing tumor cell genetics with a combination of Dbait and PARP inhibitors. (Clinical cancer research 2016)


Unique genes robustly correlated with resistance to Dbait or Olaparib in triple-negative breast cancer cell lines.

ACSN modules enrichment in triple negative breast cancer cell lines.



Biological network modelling and precision medicine in oncology.

The shortest path is not the one you know: application of biological network resources in precision oncology research.

Network-based approaches for drug response prediction...



======================================================================================

A weighted Exact Test for Mutually Exclusive Mutations in Cancer.

======================================================================================


Driver mutations cause cancer.

Driver mutations target pathways.


Different combinations of driver mutations in different patients.

*** Driver mutations are often mutually exclusive.

*** 1. Usually no fitness advantage for multiple mutations in the same pathway.

*** 2. Few driver mutations, which are distributed across multiple pathways.

*** Approximately, one driver mutation per mutated pathway per patient.


* Mutual exclusivity scores: combinatorial..

* Combinatorial scores

* Muex, Mutex, CoMEt

* Mutual exclusivity scores: Statistical, Row-exclusivity test (R-exclusivity): one-sided Fisher's exact test for independence for pairs of genes.

* Highly mutated patients are more likely to have many passenger mutations.

* Top 5 R-exclusivity triples from endometrial carcinomas [TCGA, Nature 2014] using CoMEt

* Long genes(more than 11,000 nucleotides)

* Hypermutators(patients with more than 500 mutated genes)
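For the pairwise case in the list above, the one-sided Fisher's exact test can be sketched with nothing but the standard library. This assumes each gene is a 0/1 vector over the same patients and that "exclusive" means fewer co-mutated patients than independence would predict; the function name `exclusivity_pvalue` is hypothetical.

```python
from math import comb

def exclusivity_pvalue(gene1, gene2):
    """One-sided Fisher's exact test: P(co-mutated patients <= observed) under independence."""
    n = len(gene1)
    y1, y2 = sum(gene1), sum(gene2)          # mutation counts per gene (fixed margins)
    both = sum(1 for g1, g2 in zip(gene1, gene2) if g1 and g2)
    lowest = max(0, y1 + y2 - n)             # smallest overlap the margins allow
    # Hypergeometric lower tail: choose k of gene2's patients from gene1's patients.
    tail = sum(comb(y1, k) * comb(n - y1, y2 - k) for k in range(lowest, both + 1))
    return tail / comb(n, y2)

# Hypothetical 8-patient example: two genes that never co-occur.
p = exclusivity_pvalue([1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1])
```

Because the test conditions only on how often each gene is mutated, it treats every patient as equally likely to carry a mutation. That is exactly why long genes and hypermutators can produce misleading results, and why the weighted tests further down exist.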


MEMo: Ciriello et al., Genome Res 2012

MEMCover: Kim et al., ISMB 2015

WeSME: Kim et al., Bioinformatics 2016


Row-column-exclusivity test (RC-exclusivity)

10^4 samples for MEMo and MEMCover; more samples possible for WeSME


WExT: Weighted Exclusivity Test.

Weighted-row-exclusivity (WR-exclusivity) test.

For set M of k genes,

Φ_WR(M) = Pr(T_M ≥ t_M | Y_1 = y_1, ..., Y_k = y_k),

where t_M = observed number of ....
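A rough Monte Carlo sketch of the conditional probability above, for the unweighted case only. It assumes T_M counts patients mutated in exactly one gene of the set (my reading of the slide, not something these notes spell out) and that conditioning on Y_i = y_i means each gene keeps its own mutation count. Function names are hypothetical.

```python
import random

def exclusive_count(rows):
    """T_M: number of patients (columns) mutated in exactly one gene of the set."""
    return sum(1 for column in zip(*rows) if sum(column) == 1)

def row_exclusivity_pvalue(rows, n_perm=2000, seed=0):
    """Monte Carlo estimate of Pr(T_M >= t_M | Y_i = y_i) with uniform patient weights."""
    rng = random.Random(seed)
    t_obs = exclusive_count(rows)
    hits = 0
    for _ in range(n_perm):
        # Shuffling a row keeps its mutation count y_i but randomizes which patients carry it.
        shuffled = [rng.sample(row, len(row)) for row in rows]
        if exclusive_count(shuffled) >= t_obs:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

The weighted version would replace the uniform shuffle with per-gene, per-patient mutation probabilities, so that a mutation in a hypermutated patient counts for less. This sketch does not do that.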


Estimate mutation probabilities W by permuting mutation matrix A.

Approximate the RC-exclusivity test.

using weight matrix W-RC*.
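A sketch of how the weights could be estimated by permutation, assuming A is a binary genes-by-patients matrix and that "permuting" means shuffling it while keeping every row sum (mutations per gene) and column sum (mutations per patient) fixed. The checkerboard-swap scheme, the function names and the default parameters are my own assumptions, not necessarily what the authors use.

```python
import random

def swap_permute(matrix, n_swaps, rng):
    """Randomize a binary matrix with checkerboard swaps that keep all row and column sums."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Only a 2x2 block shaped [[1,0],[0,1]] or [[0,1],[1,0]] can flip without changing margins.
        if m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1] and m[r1][c1] != m[r1][c2]:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m

def estimate_weights(matrix, n_perm=200, n_swaps=100, seed=0):
    """W[i][j]: fraction of permuted matrices in which gene i is mutated in patient j."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    totals = [[0] * n_cols for _ in range(n_rows)]
    current = matrix
    for _ in range(n_perm):
        # Continue the swap chain from the previous permuted matrix.
        current = swap_permute(current, n_swaps, rng)
        for i in range(n_rows):
            for j in range(n_cols):
                totals[i][j] += current[i][j]
    return [[t / n_perm for t in row] for row in totals]
```

Since every permuted matrix has the same margins as A, each row of W still sums to that gene's mutation count, and patients with many mutations end up with higher weights across all genes.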

Damn, this is insanely hard, but it is exactly the research I want to do!!


WR-exclusivity test approximates RC-exclusivity test.


WExT results on endometrial cancer.

6/7 predictions are known cancer genes (CTNNB1, RPL22, TP53, KRAS, MLL4, CTCF...)