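ECCB conference notes
I put these together while sitting in the sessions.. and somehow several months have already flown by since I got back.
Honestly, I don't even remember what I was thinking while taking these notes,
but it feels like I'll forget all of it at this rate, so I'm just leaving a trace on the blog.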
A survey of best practices for RNA-seq data analysis - Genome Biology
Perturbation biology nominates upstream–downstream drug combinations in RAF inhibitor resistant melanoma cells
http://www.intogen.org : Cancer Drivers Database - let's compare this against the network I'm building
http://CancerGenomeInterpreter.org/
Maybe the Jaccard coefficient could be used to check mutual exclusivity between genes that are mutated and genes that are not.
- When drawing the network with GS, draw the co-mutation network and the mutual exclusivity network separately (the latter with Jaccard)
- Then, if a hub gene carries no mutation, it is very likely to be a target
By mutual exclusivity, the genes with GS < 0.3 ...
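A minimal sketch of the Jaccard idea above, under assumptions: each gene is represented by the set of samples in which it is mutated, and the gene names, sample IDs and the 0.5 cutoff are all made up for illustration. A high Jaccard index hints at co-mutation, while an index of 0 between two frequently mutated genes is only a candidate for mutual exclusivity, not a statistical test.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(mutated_in):
    """Jaccard index for every unordered pair of genes."""
    genes = sorted(mutated_in)
    return {
        (g, h): jaccard(mutated_in[g], mutated_in[h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


# Hypothetical toy data: gene -> set of sample IDs in which the gene is mutated.
mutated_in = {
    "GENE_A": {"s1", "s2", "s3", "s4"},
    "GENE_B": {"s5", "s6", "s7"},
    "GENE_C": {"s1", "s2", "s3", "s8"},
}

scores = pairwise_jaccard(mutated_in)
# Arbitrary illustrative cutoffs: >= 0.5 for co-mutation, exactly 0 for exclusivity candidates.
co_mutation_edges = [pair for pair, s in scores.items() if s >= 0.5]
exclusive_candidates = [pair for pair, s in scores.items() if s == 0.0]
```

Here `co_mutation_edges` and `exclusive_candidates` would become the edge lists of the two separate networks.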
- Background
Mammalian systems constitute over 200 cell types, each specialized to perform a distinct function, and yet all cell types share the same genome.
This cell type specificity is achieved by a context-specific interpretation of the DNA sequence to produce a cell type specific transcription signature.
Advances in sequencing techniques have accelerated the characterization of transcription landscapes across many normal and malignant cell types.
The challenge now is to integrate these data to understand transcriptional control at a systems level.
Over the years, powerful machine learning algorithms have been developed for inferring transcriptional networks from expression data,
thereby revealing new aspects of complex biological systems.
- Aims and scope
This one day SIG session will bring together experts from computational biology and machine learning to present recent advances in the development and
application of gene regulatory network inference methods, as well as novel emerging single-cell and epigenomics data types suitable for network inference.
The SIG will be split into two half day sessions. The first half will focus entirely on novel network inference methods,
while the second half will focus on opportunities and challenges arising from new data types.
Each session will feature an invited speaker and three short talks.
####################################################################
Keynote talk 1: Prof. Sushmita Roy (University of Wisconsin-Madison)
####################################################################
Regulatory network dynamics on developmental and evolutionary lineages.
computational method to look at regulatory network
- Dynamics of regulatory networks on lineages.
1. Cell lineage, 2.Species phylogeny. (Time scale)
What determines a regulatory edge?
Regulator -> Target
(1. Transcription factors, 2. Long-range regulation, 3. Chromatin, 4. Signaling networks)
Computational tools to dissect regulatory networks.
- Network inference (MERLIN (Roy et al., PLoS Comput Biol 2013), MERLIN-P)
Who regulates whom? (Several regulators -> y)
Experimental: 1. ChIP-chip and ChIP-seq,
2. Factor/regulator knockout followed by genome-wide transcriptome measurements.
3. DNase I/ATAC-seq + motif finding
Computational: 1. Supervised network inference
2. Unsupervised network inference
Expression-based network inference
-Can potentially recover genome-wide regulatory networks.
Regulatory gene expression modules
A module: set of genes that are co-expressed across conditions.
Two popular paradigms to network inference
1. Reconstruction per GENE: Learn precise models of regulation for each gene.
2. Reconstruction per MODULE: Reveals modular organization of regulatory networks. -> Easily interpreted
Per-module (LeMoNe) vs Per-gene (CLR) method
MERLIN: A network reconstruction method to predict regulators of genes and modules
Key idea: Favor regulators that are already associated with a module.
MERLIN uses a Bayesian formulation of network inference. D (data matrix) -> Algorithm -> G (graph): P(G|D) ∝ P(D|G) P(G)
#Could you tell me one more time about the way to define interesting module in detail?
Chasman et al., PLOS Computational Biology 2016
- Regulatory networks on evolutionary lineages (Arboretum, MRTLE)
Short-term change: Development
Long-term change: Evolution
- Integrating chromatin and expression in cell fate (CMINT)
Reprogramming factors - MEF (Mouse Embryonic Fibroblasts) -> iPS cells (Induced Pluripotent Stem cells)
Somatic cells -> Intermediate cells (Transient population) -> iPS cells
Partially reprogrammed cells (Stable cell line)
A regulatory module: set of genes with similar regulatory state
- Predicting enhancer-promoter interactions (RIPPLE, Arboretum-Hi-C)
Long-range gene regulation by distal elements
RIPPLE: A machine learning approach to predict enhancer-promoter interactions
############################################################
Talk 1: Van Anh Huynh-Thu (University of Liege)
Combining tree-based and dynamical systems for the inference of gene regulatory networks.
############################################################
There are two main families of methods.
1. Score-based: Compute statistical dependencies between pairs of expression profiles.
:Fast, but cannot make predictions
2. Model-based: learn a model capturing the dynamics of the network (e.g. differential equations)
:Realistic but are limited to small networks.
Hybrid approach: Jump3 - Model for gene expression, Tree-based method for network reconstruction
-> On/Off model of gene expression
What are kinetic parameters?
I have no idea what this is saying
http://bioinformatics.oxfordjournals.org/content/31/10/1614.long
It did get into a good journal.. but I have no idea what it is saying
Time series data.
############################################################
Talk 2: Dragan Bosnacki (Eindhoven University of Technology)
REGENT: Logarithms, Hubs and Thresholded Transitive Reduction for Enhanced Scalable Network Inference
############################################################
Goal: Network reconstruction from perturbation experiments.
Gene 'knockout' experiments
Perturbing one gene and measuring activities of others.
Actually, there was a study on perturbation networks in our laboratory.
Han, Hyun Wook, et al. "Yin and Yang of disease genes and death genes between reciprocally scale-free biological networks." Nucleic acids research (2013): gkt683.
if --
a -(10)> b -(6)> c,
a -----(4)-----> c
This algorithm will delete the edge a--(4)--c
Non-existing direct influence is inferred!
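To make the a/b/c example above concrete, here is a small sketch of a thresholded transitive reduction. The rule used here is an assumption for illustration (REGENT's actual rule is not recorded in these notes): a direct edge is dropped when a two-step path exists whose edges both pass the threshold and are both stronger than the direct edge.

```python
def thresholded_transitive_reduction(edges, threshold):
    """Drop direct edges that a stronger two-step path can explain.

    edges: dict mapping (source, target) -> weight.
    Assumed rule (illustrative only): remove u -> t if some v has u -> v and
    v -> t, both with weight >= threshold and both heavier than u -> t.
    """
    strong = {edge for edge, weight in edges.items() if weight >= threshold}
    successors = {}
    for source, target in strong:
        successors.setdefault(source, set()).add(target)

    kept = {}
    for (u, t), weight in edges.items():
        explained = any(
            (v, t) in strong and edges[(u, v)] > weight and edges[(v, t)] > weight
            for v in successors.get(u, ())
            if v not in (u, t)
        )
        if not explained:
            kept[(u, t)] = weight
    return kept


# The example from the notes: a -(10)> b -(6)> c and a -(4)> c.
edges = {("a", "b"): 10, ("b", "c"): 6, ("a", "c"): 4}
reduced = thresholded_transitive_reduction(edges, threshold=5)
```

With a threshold of 5 (an arbitrary choice), only the weak direct edge a -> c is removed, on the view that it is an indirect effect mistaken for a direct one.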
############################################################
Talk 4: Hervé Isambert (Institut Curie)
3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics
############################################################
Can one infer causality from mere correlation data?
Two-point Correlation does not imply causation
Hygrometer? <- Typhoon (clouds?) -> Weather?
Three-or-more-point Correlation might imply causation
Common cause principle: [E1 <- C -> E2] (Hans Reichenbach 1891-1953, The Direction of Time)
Independent cause principle: [C1 -> E <- C2] (Judea Pearl, Causality)
Ancestral Graphs
V-structure: X -> Z <- Y
L -> z y <- L
(Latent variables L) x -> z <-> y <- w
http://ohnologs.curie.fr/ PLoS Comp Biol 2014
############################################################
Talk 6: Anthony Mathelier (University of Oslo).
Getting new data from old data: extracting information from TF ChIP-seq data sets.
############################################################
Identifying diversity in transcriptional regulation from high-throughput sequencing experiments.
*Understand architecture of non-coding DNA
Current approach: average analysis (Histone occupancy with CTCF occupancy, / Distance from CTCF binding site)
Don't throw away data-points just because they don't "fit"
No data left behind -> gene promoters.
What is a promoter? region(~100bp) near the transcription start site (TSS)
Promoter elements: Heterogeneous: no "single" architecture or rule. (BRE^u, TATA, BRE^d, Inr, MTE..)
Lots of TSS data: 5' CAGE, RACE, etc. quantify TSSs at single-nucleotide resolution
.. TATA .. CGA .. Promoter.
How do we find the rules? (Promoter)
We do not know:
identity of elements
number of elements that are part of an architecture
number of distinct architectures knockout
Probabilistic model:
arg max P(O | M, X)
Gibbs sampling to learn best model
Cortes et al. 2013 report TSSs in M.tuberculosis (Mtb) under exponential growth and starvation
They identify two classes: those with and those without the -10 motif TANNNT
No promoter left behind
nplb.ncl.res.in
No data left behind -> Protein-DNA interaction
############################################################
Talk 7: Anil Korkut (Memorial Sloan Kettering Cancer Center)
Perturbation biology: Inferring signaling networks in cellular systems.
############################################################
Perturbations and cellular response.
Cell: A collection of molecules
Evolved as a robust system under bombardment of perturbations.
Perturbation biology. Proteins & Phosphoproteins, Drug combinations.. -> Data generation: Perturbation experiments, phosphoproteomic profiles
1. Design
2. Response profiles
3. Model
4. Network inference
5. Predict & Test
Model inference: Deriving models of a (biological) system.
Upstream node,
Interaction strength
Perturbation strength
PERA Algorithm
Pathway extraction and reduction algorithm
genes or protein list
extract neighborhood maps of each entity from Pathway Commons using BioPAX/Paxtools
Compute the shortest pair wise distance between nodes of ..
Korkut a focus: pathway analysis SKCM TCGA
Postdocs wanted
Network pharmacology
================================================
R2 Tutorials, Release 3.0.0 / The R2 support team
================================================
R2 genomics analysis and visualization platform
- Web (cloud) based (http://r2.amc.nl/)
- public collection datasets (uniform normalization)
- Private datasets (Shielded access group / user)
- Analysis/Visualization tools
Intended users
- Biomedical researchers
- Wetlab biologists
mRNA, microRNA, SNP, CGH, Methylation, ChIP..
http://www.pnas.org/content/102/36/12837.full
Method: Significance analysis of time course microarray experiments
================================================
Talk 7: DisGeNET
A discovery platform to support translational research on human diseases / Janet Pinero and Laura I. Furlong
================================================
Questions:
What are the diseases associated with the gene SIRT1?
What are the genes associated with Alzheimer's disease?
What are the genes shared by comorbid diseases?
What are the genetic variants associated with obesity?
What are the druggable proteins associated with schizophrenia?
Which pathways are perturbed in Lafora disease?
High throughput genomic technologies are helping to find disease genes and pathogenic variants.
A typical whole exome sequencing experiment produces 30,000-100,000 variants relative to the reference genome
Approximately 10,000 of these variants will have a consequence on protein function
Only one or few may be causative
Main difference with OMIM, GAD, Clinvar, CTD, MalaCards
DisGeNET integrates all of these datasets, so the information from all of these databases is available in one place.
- Knowledge platform on human diseases and their genes
- Covers all disease therapeutic areas
- Integrates information from expert-curated resources and from the literature.
- GDA, and supporting evidence.
(Gene-disease associations)
MEDLINE -> DisGeNET <- Biomedical databases
Genotype: Gene, protein, SNPs, NCBI ID, HUGO, Uniprot, dbSNP..
Phenotype:
Genotype <-> Phenotype
Signs, symptoms and diseases in DisGeNET
: Phenotypes, signs, symptoms, diseases, disease class..
DisGeNET score = S(curated) + S(predicted) + S(literature)
- Disease Specificity Index (DSI): DSI = log2(Nd/NT) / log2(1/NT)
Indicates how specific a gene is with respect to diseases.
- Disease Pleiotropy Index (DPI): DPI = Ndc/Ntc
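A quick sketch of the two indices as written above. The notes do not define the symbols, so the readings here are assumptions: Nd = number of diseases associated with the gene, NT = total number of diseases in the database, Ndc = number of disease classes covered by the gene's diseases, Ntc = total number of disease classes. The function names are made up for illustration.

```python
import math


def disease_specificity_index(n_d, n_total):
    """DSI = log2(Nd / NT) / log2(1 / NT), following the formula in the notes.

    Assumed meaning: n_d = diseases associated with the gene,
    n_total = all diseases in the database.
    """
    if n_d < 1 or n_total < 2 or n_d > n_total:
        raise ValueError("need 1 <= n_d <= n_total and n_total >= 2")
    return math.log2(n_d / n_total) / math.log2(1 / n_total)


def disease_pleiotropy_index(n_dc, n_tc):
    """DPI = Ndc / Ntc, following the formula in the notes.

    Assumed meaning: n_dc = disease classes hit by the gene,
    n_tc = all disease classes.
    """
    if n_tc < 1 or not 0 <= n_dc <= n_tc:
        raise ValueError("need 0 <= n_dc <= n_tc and n_tc >= 1")
    return n_dc / n_tc


# Toy numbers only: a gene linked to 1 of 1024 diseases is maximally specific,
# while a gene linked to all of them gets a DSI of 0.
dsi_specific = disease_specificity_index(1, 1024)
dsi_broad = disease_specificity_index(1024, 1024)
dsi_middle = disease_specificity_index(32, 1024)
dpi_example = disease_pleiotropy_index(5, 20)
```

So a DSI near 1 means the gene is tied to very few diseases, and a DPI near 1 means its diseases are spread over almost all disease classes.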
# Cytoscape App, R Package, Web Interface, Semantic Web, Custom Scripts
1. import style (XML file)
http://disgenet.org/web/DisGeNET/menu/app/#style
DisGeNET Cytoscape Style - Download the DisGeNET Cytoscape Style
2. Cytoscape -> Control Panel [Style] -> DisGeNETstyle -> Apps install anyway.
3. Apps -> DisGeNET -> Configure DisGeNET Database [Download]
4. Start DisGeNET
Gene Disease Network
Disease Projections
-> Disease-Disease interaction (sharing common genes)
Gene Projections
-> Gene-Gene Interaction ( sharing common disease)
What are the perturbed pathways in Lafora disease?
What proteins associated with Aarskog syndrome are potential drug targets?
What..
RDF: Resource Description Framework
- Captures logical structure of the data
- Graph representation
- SPARQL: RDF query language
Queralt-Rosinach, Núria, et al. "DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases." Bioinformatics (2016): btw214.
http://rdf.disgenet.org/sparql/?default-graph-uri=&query=
SELECT+DISTINCT+%3Fdisease+%0D%0A++++%3Fdiseaselabel+%0D%0A++++%3Fdiseasename+%0D%0AWHERE+%7B++%0D%0A++++%3Fpanther+rdfs%3AsubClassOf+sio%3ASIO_000275+%3B%0D%0A++++++++dcterms%3Atitle+%3Fpanthername+.+%0D%0A++++FILTER+regex%28%3Fpanthername%2C+%22transporter%22%29+%0D%0A++++%3Fgene+sio%3ASIO_000095+%3Fpanther+.%0D%0A++++%3Fgda+sio%3ASIO_000628+%3Fgene%2C+%3Fdisease+.+%0D%0A++++FILTER+regex%28STR%28%3Fdisease%29%2C+%22umls%2Fid%22%29%0D%0A++++%3Fdisease+dcterms%3Atitle+%3Fdiseasename+.%0D%0A++++%3Fdisease+rdfs%3Alabel+%3Fdiseaselabel%0D%0A%7D+%0D%0ALIMIT+100&format=text%2Fhtml&timeout=0&debug=on
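The query string above is URL-encoded, which makes the SPARQL hard to read. A minimal sketch for decoding it with Python's standard library; to keep it short, only the opening of the query and the `format` parameter are pasted in, and the variable names are just for illustration.

```python
from urllib.parse import parse_qs

# Shortened excerpt of the query string above (everything after the "?").
encoded = (
    "query=SELECT+DISTINCT+%3Fdisease+%0D%0A++++%3Fdiseaselabel+%0D%0A"
    "++++%3Fdiseasename+%0D%0AWHERE+%7B&format=text%2Fhtml"
)

# parse_qs splits on "&", turns "+" into spaces and decodes %XX escapes.
params = parse_qs(encoded)
sparql = params["query"][0]
result_format = params["format"][0]

# Print the decoded SPARQL one line at a time (the URL uses CRLF line breaks).
for line in sparql.splitlines():
    print(line)
```

Running the same call on the full string shows that the query selects diseases linked, through gene-disease associations, to genes whose PANTHER class title matches "transporter", with LIMIT 100.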
Jaccard Coefficient
https://bitbucket.org/ibi_group/disgenet2r/src/aeaf5d7c0f79de5c74959c004788b8f2c2d0a95d/vignettes/generalOverview.pdf?at=master&fileviewer=file-view-default
https://en.wikipedia.org/wiki/Jaccard_index
================================================
Monday_ Keynote speaker
Dynamic cell states and their genomic regulation
================================================
1 genome sequence
100 trillion (single) cells in our body.
How many distinct programs?
How many cell "types"?
......................
scRNA analysis: single-cell analysis (phenomenology)
Clusters can be visualized in 2D space.
Projection is extremely subjective: depending on the features and metrics, you get what you assumed a priori...
Network graph - Cardiomyocytes, Endothelial precursors, Primitive Erythrocytes.
A decision tree? Waddington Canalization?
1. Tree: Defined, discrete progenitor states
2. Canalization: Programmed trajectories / http://plato.stanford.edu/entries/innate-acquired/figure4A.jpg
3. Mini-golf models: Escape, explore, converge
Topological domain(TAD), TADs(and chromosome architecture)
.A new way to look at Hi-C contact maps.
================================================
Mississippi session1-1.
Genome-wide predictions of miRNA regulation by transcription factors.
================================================
Gene Regulation. Which transcription factor bind to DNA to regulate expression?
Which TFs regulate each miRNA?: lots of work on predicting gene targets of TFs, gene targets of miRNAs, but not much on TF regulation of miRNAs themselves.
Follow-up problem: If a miRNA is genic (intronic), is it regulated by the same transcription factors that regulate the gene, or independently?
TF-miRNA interactions from TransmiR database.
semi-supervised learning.
Network-based label imputation.
Label propagation: network -> Network smoothing.
Compute a vector F that is smooth over the network and accounts for prior knowledge Y about each node.
Extend previous work on network propagation/smoothing for semi-supervised label detection.
What network?: k-NN network over data points (here, k=25)
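A rough sketch of the "smooth F that still respects the prior Y" idea. It assumes one common formulation of network smoothing, iterating F <- alpha * S * F + (1 - alpha) * Y with S the symmetrically normalized adjacency matrix; the talk's exact update rule is not in these notes, and the tiny path graph below stands in for the real k-NN network (k=25).

```python
import math


def propagate_labels(adjacency, prior, alpha=0.8, n_iter=200):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y.

    adjacency: symmetric 0/1 matrix as a list of lists.
    prior: list Y with prior knowledge per node (1 = known positive, 0 = unknown).
    S = D^(-1/2) A D^(-1/2) is the normalized adjacency (an assumed choice).
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    s = [
        [
            adjacency[i][j] / math.sqrt(degree[i] * degree[j]) if adjacency[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    f = list(prior)
    for _ in range(n_iter):
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * prior[i]
            for i in range(n)
        ]
    return f


# Toy network: a path 0 - 1 - 2 - 3, with only node 0 labeled.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
prior = [1.0, 0.0, 0.0, 0.0]
smoothed = propagate_labels(adjacency, prior)
```

The score decays with distance from the labeled node, which is the smoothing behaviour; `alpha` trades off smoothness against staying close to Y.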
Gene/miRNA bicluster enriched for cancer-related GO terms, plotted with...
================================================
Mississippi session1-2.
Cell systems. Cross-tissue regulatory gene networks in coronary artery disease.
[Multi-organ expression profiling
================================================
Systems genetics: A network view of complex diseases.
- Genetic risk loci identified by GWAS are mainly located in intergenic regions and explain a small proportion of disease risk.
Presumably play a gene regulatory role via differential TF binding.
Gene regulatory networks affected by multiple genetic and environmental perturbations can be reconstructed from multi-omics data. / Albert and Kruglyak, Nat Rev Genet (2016)
The Stockholm Atherosclerosis Gene Expression (STAGE) study.
Multi-organ expression profiling of patients undergoing coronary artery bypass grafting surgery at Karolinska university hospital.
STAGE has distinct tissue profiles. (AAW, IMA, Liv, SM, SF, VF, WB)
STAGE informs on tissue-specific effects of GWAS loci.
-> We studied a perturbation sensitivity network on 1000 Genomes before, so couldn't we compute perturbation sensitivity across multiple organs too?
Cross-tissue coexpression analysis identifies multi-tissue processes.
left axis: genes, bottom axis: genes; along the top and right, a separate matrix for each of AAW, IMA, LIV, SM, SF, VF, WB.
WGCNA identified 171 co-expression clusters (94 tissue-specific / 77 cross-tissue) ****
12 clusters conserved in mouse with same phenotype association in same tissue.
Key drivers of athero-causal bayesian gene networks validated by siRNA targeting in THP-1 foam cells.
Assessment of causality using eQTLs and GWAS.
- correlation with phenotype does not imply causation.
- Genetic variation is causally upstream of gene expression and phenotypic variation.
Test enrichment of module-eQTLs for risk variants from CAD GWAS. [Lamparter et al, PLOS CB (2016)]
* Highlight.
Bayesian gene network reconstruction.
Bayesian network: Model causal relations between gene expression traits by a directed acyclic graph g and joint distribution.
Inference: Made..
Causal inference using eQTLs as instrumental variables disentangles correlated traits.
================================================
Mississippi session1-3.
Logical model specification aided by model checking: application to the mammalian cell cycle regulation.
================================================
Logical modeling framework for systems biology.
Model building: Biological system -> regulatory network -> Abstract representation (Boolean model)
-> Differential model (continuous?)
Mammalian cell cycle [Faure, Naldi, Chaouiya, Bioinformatics 2006]
Cyclins: CycA, CycB.. CycD,
Activator E2F..
Synchronous dynamics
Asynchronous dynamics
No idea ;; it seems to be a methods paper that looks at the expression(?) or something of the cyclins over the cell cycle
================================================
A landscape of pharmacogenomic interactions in cancer [Resource: cell]
================================================
GDSC1000: Genomics of Drug Sensitivity in Cancer.
High confidence cancer driver genes.
Chromosomal regions of recurrent focal deletion, focal amplification
Informative CpG islands.
LOBICO model..
CancerRxGene.org
Computational cancer biology (ccb.nki.nl)
Question: they found the CFEs in TCGA and then looked at them in cell lines; what about doing it the other way around? Have they tried?
================================================
Gene-set association test
================================================
Introduction
- rare variant associations
- gene & Gene-set level set
Proposed methods
- Gene-level: Quadratic Test (QT-test)
Issue for rare variants: Collapsing method
Region can be a gene or other biological grouping unit.
Collapsing multiple rare variants (Burden-type test) can increase power.
Bi-directional effects.
When there are only deleterious causal variants, collapsing increases power, especially if there are few non-causal variants.
When deleterious and protective causal variants occur together, the collapsing method has low power.
Another type of test (Non-burden test) is required in this case.
Gene-set-level test for rare variants?
Gene set analysis for microarray data -> gene set analysis for GWAS -> [Gene set analysis for rare variants.] this is what this group did
Focus on gene sets rather than on individual genes or variants.
QTest1: Burden type test(Inverse variance weighting method)
- Multiple regression framework.
QTest2: Non-burden type test.
They explained the statistical formulas, but I couldn't follow any of it
QTest3: Optimal quadratic test.
KARE(Korean Association REsource)
MSigDB
Sample: 900 Korean individuals , KOBRA??;;
T2D-GENES Consortium.
================================================
Session 2-3
================================================
Challenges
Networks can be large and rich on data.
- Graph databases provide the necessary performance.
Graph databases
neo4j: Integration (full fledged DB server)
Persistence: easy to launch
Performance: Traversing is fast; searching via index.
Query & Analytics: Cypher plugins
Multiple graphs: Simple to handle and transfer.
Network library
NetLib Primer : Creates and annotates nodes.
NetLib Edger : Parses data sources and imports information as edges.
NetLib Curator : modifies an existing network
NetLib Scribe : Extracts statistics or subnetwork
cyNeo4j : Cytoscape app to connect to a Neo4j database
NetLib Primer
Tab-delimited file: miRBase & aliases.
DisGeNET diseases.
GmtIds
Which miRNA targets ~~?
visualizing the (sub)network
match (h:hypertrophy)-[s:interacts_with]->(n)<-.. from.. where..
Looks like they built and provide some query system of their own, something like RDF.
================================================
Session 4-1
Edge-based sensitivity analysis of signaling networks by using boolean dynamics
================================================
University of Ulsan
Which edges have strong influence on the network dynamics?
-> Boolean network model.
V = {A, B, C, D}
E = {(A,B,+), (A,C,-), ...}
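To get a feel for edge-based sensitivity on a signed Boolean network like the one above, here is a toy sketch. Everything specific is an assumption for illustration: the two extra edges into D, the update rule (a node turns ON if its active activators outnumber its active inhibitors, OFF in the opposite case, unchanged on a tie), synchronous updating, and measuring sensitivity as the fraction of initial states whose state after 10 steps changes when one edge is deleted. The talk's actual definitions are not in these notes.

```python
from itertools import product

NODES = ["A", "B", "C", "D"]
# (source, target, sign): the first two edges come from the notes, the rest are made up.
EDGES = [("A", "B", +1), ("A", "C", -1), ("B", "D", +1), ("C", "D", -1)]


def step(state, edges):
    """One synchronous update under the assumed majority-style rule."""
    new_state = {}
    for node in state:
        total = sum(sign for src, dst, sign in edges if dst == node and state[src])
        if total > 0:
            new_state[node] = True
        elif total < 0:
            new_state[node] = False
        else:
            new_state[node] = state[node]  # tie or no active input: keep value
    return new_state


def run(state, edges, n_steps=10):
    """Apply n_steps synchronous updates and return the final state."""
    for _ in range(n_steps):
        state = step(state, edges)
    return state


def edge_sensitivity(nodes, edges, removed_edge):
    """Fraction of initial states whose final state changes without removed_edge."""
    pruned = [edge for edge in edges if edge != removed_edge]
    changed = 0
    total = 0
    for bits in product([False, True], repeat=len(nodes)):
        initial = dict(zip(nodes, bits))
        total += 1
        if run(initial, edges) != run(initial, pruned):
            changed += 1
    return changed / total


sensitivities = {edge: edge_sensitivity(NODES, EDGES, edge) for edge in EDGES}
```

Ranking the edges by `sensitivities` gives the kind of "which edges matter most" list the talk was after, though on a real signaling network one would sample initial states rather than enumerate all 2^n of them.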
Node-based mutations.
*Database?
Shuffled networks from WANG network
BioCarta, Cancer Cell Map
Degree (DEG), BEW.. they use abbreviations like these; what do they mean?
Most of the highly sensitive edges are located at the centre of the signaling network.
LCC proportion?
Genes forming the highly sensitive interactions are more central than the other genes.
Application to edgetic drug discovery: p53 signaling pathway.
================================================
Session 4-2
On cross-conditional and fluctuation correlations in competitive RNA networks.
================================================
Biological Background
Consider 2 mRNAs targeted by a miRNA.
Correlations between mRNA can emerge for different reasons.
- Stochastic fluctuation (SF) correlation: in a simulation (e.g. Gillespie), the Pearson coefficient of these 2 variables. Relevant in single-cell experiments.
- Cross-conditional (CC) correlation: Pearson coefficient of the two variables at the steady state of the mean-field behaviour.
In a real experiment, the conditions behind CC correlations can be: tissue samples (cancer or not), different types of cells.
SF correlations and CC correlations tend to arise under different conditions.
Network topology II: influence on miRNA-mRNA balance.
Network topology has a very important effect on the correlations.
================================================
Inferring causal molecular networks: empirical assessment through a community-based effort
================================================
Molecular networks represent causal relationships between nodes.
A causal edge predicts that intervention on the parent node (A) leads to a change in the child node (B).
Causal edges may represent indirect effects via unmeasured intermediate nodes.
Inference of causal molecular networks.
Given data on set of molecular variables, learn directed network that describes causal relationships between them.
Causal network inference is challenging!
No guarantee that a method will work well in a given setting.
* Empirical assessment of causal validity of inferred networks is essential.
Empirical assessment
Assessment is often performed on simulated data.
-True causal network is known.
Lack of "gold standard network" in most applications.
Need an approach to empirically assess causal network inference methods within the setting of interest.
HPN-DREAM network inference challenge.
Assess the ability of computational methods to learn causal molecular networks in a complex mammalian setting.
Proposed a causal assessment approach that uses held-out test data..
HPN-DREAM network inference challenge.
Inference of causal protein signalling networks in breast cancer.
RPPA data from cancer cell lines (with Spellman lab, OHSU; Mills lab, MDACC)
Infer a network for each of 32 biological contexts (cell line x stimulus)
data for each context...
test data consists of:
Time courses for the 32 contexts.
Under an inhibitor not contained in training data (mTOR inhibitor)
Inhibition of mTOR reveals descendants of mTOR in the true causal network for a given context, k.
DMSO
mTORi, q-value = 3.2 * 10^-5..
Scoring procedure
Set of predicted edge..
y: in silico data task - AUROC
X: experimental data task - mean AUROC
r = 0.35
p = 0.011
================================================
Keynote speaker. (Monday, 5 PM)
Using single-cell transcriptomics to understand cellular heterogeneity
John Marioni. Senior Group Leader, CRUK Cambridge Institute, University of Cambridge. Associate Faculty, WT Sanger Institute.
Research group leader: EMBL-EBI.
================================================
Most studies have focused on examining molecular measurements in large populations of cells.
However, some biological processes require the study of variation at the single-cell level.
- Current technologies.
scRNA-seq - tens of thousands of cells in a single experiment (Drop-Seq)
Genome Sequencing - still a smaller number of cells but beginning to scale up.
Epigenetic sequencing - bisulfite sequencing is most mature.
Chromatin structure - ATAC-Seq, HiC
- Large volumes of data, especially for scRNA-seq, with new computational and storage challenges.
Single-cell RNA-seq.
Identifying highly variable genes.
RNA extracted from single cell -> Process libraries and sequence <- RNA from a spike in pool (RNA pool)
Current approaches first estimate technical error and plug into downstream analysis - motivated development of BASiCS
- Cell-specific "sequencing depth / technical" normalization parameter is inferred using spike-in and endogenous genes.
- Cell-specific mRNA content parameter is inferred using only endogenous genes.
* Jointly modelling and inferring normalization and gene specific parameters (such as the mean and dispersion) leads to more stable results
Vallejos, Richardson & Marioni, Genome Biol, 2016
================================================
Keynote speaker: Nuria Lopez-Bigas (Pompeu Fabra Univ) / Tuesday, 9 AM
Somatic mutational processes and cancer vulnerabilities
================================================
Tumor genomes
> Understanding mutational processes.
- Figure. Mike Stratton, EMBO Molecular Medicine (2013) (she said she likes it.. a very wide figure, drawn as a semicircle) - http://www.google.co.kr/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=&url=http%3A%2F%2Fwww.jeantet.ch%2Fe%2Fwinners%2Fimages%2Fclip_image002.png&psig=AFQjCNG7BXwelMlA1J3lV3jm9HTKdS30Gg&ust=1473232293100137&cad=rjt
> Finding drivers of cancer.
> Precision cancer medicine
>Tumor genomes
The large amount of tumor whole-genomes provides a unique opportunity to understand mutational and repair processes.
Mutational burden and mutational signatures.
- Figure. Lawrence et al. Nature 2013
Alexandrov et al. Nature 2013: C>T mutation is highly..
Mutation rate correlates with chromatin features at megabase scale.
Melanoma C>T mutation density
Melanocyte DNaseI accessibility index (reverse scale)
Increased mutation rate in active TFBS in melanomas.
Plot of DHS, noDHS, DHS-background, noDHS-background.
http://www.nature.com/nature/journal/v532/n7598/fig_tab/nature17661_F1.html
Nucleotide Excision Repair (NER): Global and Transcription-coupled repair
- Lans et al. Epigenetics & Chromatin 2012 5:4
- Hu et al., G & D, 2015 (Sancar lab) - XR-seq (eXcision Repair Sequencing)
Mutation/repair rate in TFBS correlate with TF binding strength
Is transcription coupled NER also affected by TFs binding to DNA?
Decreased NER activity at active TFBS in transcribed regions.
Is this a pathogenic effect?
Is mutation rate in TFBS also increased in other tumor types?
Increased mutation rate at active TFBS in lung cancer.
High mutation rate in TFBSs caused by impaired access of the repair machinery.
> Finding drivers of cancer
Tumor development follows a Darwinian process.
Variation -> Selection.
Drivers confer selective advantage to the cell.
Clonal expansion.
Signals of positive selection (methods)
OncodriveFM: Identifies genes with a bias towards high functional mutations (FM bias)
OncodriveCLUST: Identifies genes with a significant regional clustering of mutations.
MutSig-CV: Identify genes mutated more frequently than background mutation rate.
==> 459 cancer driver genes (156 BLAD, 75 GBM, 184 BRCA ...)
(Constructing a background model according to mutational processes)
Complementary signals of positive selection.
Rubio-Perez, Tamborero et al., Cancer Cell 2015.
- Cell cycle, DNA damage, Angiogenesis, Cell adhesion, Proliferation, Cytoskeleton, Apoptosis, Chromatin regulatory region...
http://www.intogen.org : Cancer Drivers Database
Drivers in Noncoding Regions?
Challenges
Tumours have thousands of mutations
Unknown function of most of noncoding regions
Mularoni et al. Genome Biology 2016. OncodriveFML
https://www.researchgate.net/publication/304027382_OncodriveFML_A_general_framework_to_identify_coding_and_non-coding_regions_with_cancer_driver_mutations
- Measuring functional impact.
Combined scores: eg. Combined Annotation Dependent Depletion (CADD)
Fitness Consequence Scores (FitCons)
QQ-plot: Y: Observed pvalues (-log10), X: Expected pvalues (-log10)
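For reference, a small sketch of how the points of such a QQ-plot can be computed. It assumes the usual convention that the i-th smallest of n p-values is compared with the expected quantile i / (n + 1) under the null; the talk's exact convention is not in these notes, and the p-values below are made up.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) pairs on the -log10 scale.

    The i-th smallest p-value (1-based) is paired with i / (n + 1),
    its expected value under a uniform null (an assumed convention).
    """
    n = len(pvalues)
    points = []
    for rank, p in enumerate(sorted(pvalues), start=1):
        expected = rank / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Hypothetical p-values for three genomic elements.
points = qq_points([0.5, 0.001, 0.25])
for expected, observed in points:
    print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points far above the diagonal (observed much larger than expected) are the candidate driver elements; a well-calibrated background model keeps the rest on the diagonal.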
Identifies promoters with driver mutations
> Precision cancer medicine
Drugs targeting cancer drivers
BRAF(V600E) <- Vemurafenib
How many patients could benefit from current and future targeted drugs against cancer drivers?
TCGA tumor samples..
Rubio-Perez, Tamborero et al., Cancer Cell 2015. This is the figure I cited when I presented DGIdb, lol. I presented it without understanding any of it.
From cohorts to individual tumors.
- Cancer Genome Interpreter
OncodriveMUT
Not all mutations in driver genes are driver mutations! (Passenger mutation)
We need one more step: separating driver mutations from passenger mutations within driver genes.
Predicted TP53 driver mutations by OncodriveMUT exhibit lower biological activity.
Experimental validation of rare mutations in oncogenes.
(Kim et al., Cancer Discovery 2016)
Cancer bioMarkers-db
* http://CancerGenomeInterpreter.org/
======================================================================================
Network-based approaches for drug response prediction and drug combinations in cancer.
======================================================================================
Atlas of cancer signalling network: navigating cancer biology with GoogleMaps (Oncogenesis, 2015)
NaviCell Web Service for Network-based Data Visualization
Drug Driven Synthetic Lethality: bypassing tumor cell genetics with a combination..
Cancer: complex system.
Signalling pathway / Signalling network.
Atlas of Cancer Signalling Networks: Resource of knowledge on molecular mechanisms and analytical tool
Ongoing: signaling maps in construction.
- immune, regulated cell death, telomere maintenance, Angiogenesis, Centrosome regulation..
Regulated cell death map: many modes to die (ongoing)
Immune response and tumor microenvironment map: Communication .. (ongoing)
A web tool for navigation, curation and data analysis in the context of signaling networks.
- NaviCell Web service:NAR (2015),
- NaviCell: BMC Systems Biology (2013)
'Vertical' map navigation: highlighting neighbors of an entity...
Visualization of omics data on networks using NaviCom
Distribution of oncogenes frequently mutated in cancer..
ACSN 'staining' for module activity: stratification, mRNA expression data.
Basal-like, HER2+, Luminal A, Luminal B..
Treatment approaches in cancer: Targeted drugs: synthetic lethality paradigm.
BRCA/PARP Synthetic lethal pair.
Improve synthetic lethality paradigm for cancer treatment: from pair to set.
*** Synthetic lethal gene set = intervention gene set (Proliferation)
Regulators of DNA repair steps are potential targets to restore sensitivity to genotoxic treatment.
DNA repair network, -> state transition graph retrieval -> State transition graph and all regulations on DNA repair map.
-> Organization of regulators for each step in DNA repair.
E1: Regulators of first level, E2: second level, E3: third level...
OCSANA: Optimal Combinations of Interventions from Network Analysis: Bioinformatics 2013.
: to reveal minimal cut sets.
Identifying points of fragility in the network, gene combinations to interfere with a phenotype.
*** Enrichment of SL from shRNA screen lines. (DECIPHER project)
Explaining the synergistic effect of combined treatment: DNA repair inhibitors Dbait and PARPi
Survival of triple-negative breast cancer cell lines to Dbait or Olaparib
Drug driven synthetic lethality: bypassing tumor cell genetics with a combination of Dbait and PARP inhibitors. (Clinical cancer research 2016)
Unique genes robustly correlated with resistance to Dbait or Olaparib in triple-negative breast cancer cell lines.
ACSN modules enrichment in triple negative breast cancer cell lines.
Biological network modelling and precision medicine in oncology.
The shortest path is not the one you know: application of biological network resources in precision oncology research.
Network-based approaches for drug response prediction...
======================================================================================
A weighted Exact Test for Mutually Exclusive Mutations in Cancer.
======================================================================================
Driver mutations cause cancer.
Driver mutations target pathways.
Different combinations of driver mutations in different patients.
*** Driver mutations are often mutually exclusive.
*** 1. usually no fitness advantage for multiple mutations in the same pathway.
*** 2. few driver mutations, which are distributed across multiple pathways.
*** Approximately, one driver mutation per mutated pathway per patient.
* Mutual exclusivity scores: combinatorial..
* Combinatorial scores
* Muex, Mutex, CoMEt
* Mutual exclusivity scores: Statistical, Row-exclusivity test (R-exclusivity): One-sided Fisher's exact test for independence for pairs of genes.
* Highly mutated patients are more likely to have many passenger mutations.
* Top 5 R-exclusivity triples from endometrial carcinomas [TCGA, Nature 2014] using CoMEt
* Long genes(more than 11,000 nucleotides)
* Hypermutators(patients with more than 500 mutated genes)
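For a single pair of genes, the one-sided Fisher's exact test mentioned above boils down to a hypergeometric tail. A minimal sketch with the standard library, assuming the question is "how likely are this few or fewer co-mutated samples, given how often each gene is mutated?"; the function name and the toy counts are made up.

```python
from math import comb


def exclusivity_pvalue(n_samples, n_a, n_b, n_both):
    """Lower-tail hypergeometric p-value for a pair of genes.

    n_samples: total number of samples
    n_a, n_b:  samples with a mutation in gene A / gene B
    n_both:    samples with mutations in both genes
    Returns P(X <= n_both) when B's mutations are placed at random among the
    samples, i.e. a small value means fewer co-mutations than expected.
    """
    lowest = max(0, n_a + n_b - n_samples)
    favourable = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    return favourable / comb(n_samples, n_b)


# Toy cohort: 10 samples, each gene mutated in 5, never together.
p_exclusive = exclusivity_pvalue(10, 5, 5, 0)
# Same marginals but fully overlapping: no evidence of exclusivity.
p_overlap = exclusivity_pvalue(10, 5, 5, 5)
```

This treats every sample as equally likely to be mutated, which is exactly what the points about hypermutators and long genes warn against, and what the weighted tests below try to fix.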
MEMo: Ciriello et al Genome Res, 2012,
MEMCover: Kim et al., ISMB 2015
WeSME: Kim et al., Bioinformatics 2016
Row-column-exclusivity test (RC-exclusivity)
10^4 samples for MEMo, MEMCover, more than samples possible for WeSME
WExT: Weighted Exclusivity Test.
Weighted-row-exclusivity (WR-exclusivity) test.
For set M of k genes,
Phi_WR(M) = Pr(T_M >= t_M | Y_1 = y_1, ..., Y_k = y_k),
where t_M = observed number of ....
Estimate mutation probabilities W by permuting mutation matrix A.
Approximate the RC-exclusivity test.
using weight matrix W-RC*.
Damn, this is insanely hard, but it's exactly the research I want to do!!
WR-exclusivity test approximates RC-exclusivity test.
WExT results on endometrial cancer.
6/7 predictions are known cancer genes (CTNNB1, RPL22, TP53, KRAS, MLL4, CTCF...)