FONDAZIONE FILARETE: Genomics & Bioinformatics

The goal of the Genomics Platform is to analyze and dissect, from a genetic point of view, complex phenotypes, e.g. diseases such as hypertension or dementia, and traits of clinical interest such as drug response (pharmacogenetics). The Platform uses latest-generation technologies for genotyping, gene expression profiling and genome sequencing which, thanks to an automated system for managing these processes, produce high-density genetic and genomic data at high speed.

Data generation is supported by data analysis, using methods that range from statistical genetics and bioinformatics to in silico reconstruction of molecular pathways and the definition of disease models.

The Platform's directors have long experience in the genetic analysis of complex traits, which has led to the identification of genes involved in the etiology of schizophrenia, Alzheimer's disease and hypertension and, more recently, to predictive models of response to antipsychotic and antihypertensive drugs and to improving the efficacy of erythropoietin in patients on dialysis. These approaches can be generalized to any disease or trait with a genetic component, including traits in species other than humans.

The Genomics Platform also works in the animal field, helping to improve the quality, quantity and safety of food through the genetic selection of mammals (for example pigs and cattle).

Main Research Activities:

The main research areas of the Genomics & Bioinformatics Platform are:

Identification of genetic variants influencing complex diseases in humans. Complex traits are phenotypically and genetically heterogeneous and therefore require a global approach to understand their etiology and pathogenesis.
Identification of genetic variants influencing qualitative and quantitative traits for genomic selection in livestock.

The first category includes, for example, the following projects:

1. HYPERGENES: European Network for Genetic Epidemiological studies: building a method to dissect complex genetic traits, using essential hypertension as a disease model. - Support: European Commission - FP7

Detection of new SNPs associated with the phenotype of interest, with end-organ damage and with response to specific treatments: to identify common genetic variants relevant to the pathogenesis of essential hypertension (EH) and hypertension-associated target organ damage (TOD), between July 2008 and June 2009 we performed a whole-genome association (WGA) study with the Illumina 1M SNP chip on 2000 hypertensive patients and 2000 normotensive controls recruited from historical European cohorts (discovery phase). The 1M chip (HumanHap 1M) includes 250k SNPs in genes, guaranteeing coverage of >99% of RefSeq genes, together with SNPs in ADME-related genes and in the MHC region. The cohorts are well characterized for EH and TOD as well as for environmental risk factors and/or confounders such as ethnicity, diet and smoking habits. The ongoing statistical analysis of the genotyping data has identified a number of SNPs significantly associated with EH and/or TOD and with other relevant endophenotypes. In addition, two rigorous pharmacogenomic studies are available, one with losartan (421 patients) and one with hydrochlorothiazide (530 patients).
Validation of the findings: the significantly associated SNPs have been assembled on a custom chip, together with candidate SNPs from the literature, for a total of about 15,000 SNPs. The custom chip has been used for a replication study on 8,000 subjects (replication step).
Targeted sequencing of the most relevant findings: in parallel with the replication step, we are performing targeted sequencing of the most interesting associated regions on the Illumina Genome Analyzer platform in a sample of cases and controls, in order to identify the true causal variants and further validate the significant associations found in the discovery phase.
Construction of a lab-on-chip: once the most relevant findings of the validation and sequencing phases have been confirmed, the information is passed to STMicroelectronics, which will develop a custom lab-on-chip (LOC) for hypertension to be tested on a large part of the Hypergenes sample at the Filarete Foundation. The LOC will eventually be patented and will be a deliverable of the project.
Post-genomics: a post-genomic approach is applied in interaction with other platforms within the Filarete Foundation. The final validation of the findings of our association study will be the demonstration of a functional role for the significantly associated polymorphisms. A proteomic approach aimed at characterizing proteins differentially expressed in a cell model according to the genotype at a specific SNP could help to elucidate a possible functional role of that SNP; a collaboration with the Proteomic Platform could therefore be necessary for the post-genomic phase of the Hypergenes project. From the GWA study on hypertension and hypertension-associated TOD we expect to find association signals in genes coding for renal ion transporters, so a collaboration with the CNS cell Platform, which has expertise in calcium and sodium imaging and in biochemical assays, will be helpful. This collaboration will also include setting up cell models (MDCK cells) expressing the gene variants found to be significantly associated with hypertension or TOD.
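The text does not specify which association test was used in the discovery phase. As a hedged illustration only, a 1-degree-of-freedom allelic chi-square test is a common first pass for case-control data such as the 2000 hypertensives and 2000 controls described above. The sketch assumes genotype counts (AA, AB, BB) per group; the function name and the example counts are hypothetical.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """1-df allelic chi-square test for a single SNP.

    Each argument is a tuple of genotype counts (AA, AB, BB).
    Returns (chi2, p_value, odds_ratio), where the odds ratio is for allele B
    in cases versus controls.
    """
    def allele_counts(counts):
        aa, ab, bb = counts
        return 2 * aa + ab, 2 * bb + ab  # (A alleles, B alleles)

    a_case, b_case = allele_counts(case_counts)
    a_ctrl, b_ctrl = allele_counts(control_counts)
    table = [[a_case, b_case], [a_ctrl, b_ctrl]]
    total = a_case + b_case + a_ctrl + b_ctrl
    row_sums = [sum(row) for row in table]
    col_sums = [a_case + a_ctrl, b_case + b_ctrl]

    # Pearson chi-square over the 2x2 allele table.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (b_case * a_ctrl) / (a_case * b_ctrl)
    return chi2, p_value, odds_ratio


# Hypothetical counts for one SNP in 2000 cases and 2000 controls.
chi2, p, odds = allelic_chi_square((400, 1000, 600), (600, 1000, 400))
print(f"chi2={chi2:.1f} p={p:.2e} OR={odds:.2f}")
```

In a genome-wide scan the resulting p-values are usually judged against a stringent threshold such as 5 × 10^-8 to account for the very large number of tests.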

2. POCEMON: Point-of-Care MONitoring and diagnostics for autoimmune diseases. - Support: European Commission - FP7

Identification of common genetic variants relevant to the pathogenesis of autoimmune diseases: a WGA study (discovery phase) was performed with the Illumina 370K SNP chip on 800 subjects affected by rheumatoid arthritis (RA) and 800 controls. The association analysis identified a set of 100 significantly associated SNPs.
Validation of the findings: the SNPs identified in the discovery phase are being tested together with candidate SNPs, for a total of 768 SNPs on a custom Illumina chip, on DNA from 900 RA and 900 multiple sclerosis (MS) patients and 1000 controls for each disease. Based on the SNPs found significantly associated in the discovery and confirmation phases, we will develop a predictive algorithm of disease susceptibility, which will be tested on 1700 MS patients with 100 controls and 1700 RA patients with 100 controls.
Construction of a lab-on-chip (LOC): one of the major aims of this project is the development of a diagnostic LOC platform based on genomic microarrays for HLA typing. This goal will be achieved in collaboration with the Bruno Kessler Foundation (FBK).
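The text does not describe the predictive algorithm planned for POCEMON. One simple, commonly used form, shown here purely as an assumption, is a weighted genetic risk score (risk-allele counts weighted by log odds ratios from the discovery phase), whose ability to separate patients from controls can be summarized by the area under the ROC curve (AUC). Function names, weights and genotypes below are hypothetical.

```python
import math


def risk_score(risk_allele_counts, log_odds_ratios):
    """Weighted genetic risk score: sum over SNPs of (0/1/2 risk alleles x log OR)."""
    return sum(g * w for g, w in zip(risk_allele_counts, log_odds_ratios))


def auc(case_scores, control_scores):
    """AUC via the Mann-Whitney statistic: the probability that a random case
    scores higher than a random control (ties count as one half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-SNP odds ratios and genotypes (risk-allele counts).
weights = [math.log(1.3), math.log(1.1), math.log(1.5)]
cases = [[2, 1, 1], [1, 2, 2], [2, 0, 1]]
controls = [[0, 1, 0], [1, 0, 1], [0, 2, 0]]
case_scores = [risk_score(g, weights) for g in cases]
control_scores = [risk_score(g, weights) for g in controls]
print(f"AUC = {auc(case_scores, control_scores):.2f}")
```

In practice the weights would be estimated in the discovery sample and the AUC measured in an independent sample, such as the 1700-patient test sets mentioned above, to avoid an optimistic bias.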

3. SCHIZOPHRENIA: A family-based study of genetic loci associated with schizophrenia.
We analyzed two family samples on HumanCNV-370 BeadArrays (Illumina, San Diego, USA): a sample of Arab-Israeli origin, termed TKT, and a Jewish Israeli (JS) sample. The TKT sample consisted of 198 genotyped individuals and 16 dummy parents, grouped in 58 nuclear families with 99 affected individuals: 36 trio families and 22 families with multiple offspring (11 with two offspring, 6 with three, 2 with four, 2 with five and 1 with seven). Four multigenerational families were broken down into single nuclear family units.
The JS sample consisted of 354 genotyped individuals, to which 23 dummy parents were added to complete the pedigree structure. For the JS families we analyzed 107 nuclear families with 145 affected offspring. The JS sample comprised 138 Ashkenazi and 239 non-Ashkenazi individuals. The Ashkenazi individuals were grouped into 35 trio families and 8 families with two offspring, while the non-Ashkenazi individuals were grouped into 34 trio families and 31 families with more than one offspring (19 with two offspring, 10 with three, 1 with four and 1 with five).
The affected individuals in both populations were diagnosed with schizophrenia or chronic schizoaffective disorder on the basis of interviews with the Schedule for Affective Disorders and Schizophrenia-Lifetime Version (SADS-L) or the Structured Clinical Interview for DSM-III-R Disorders (SCID-II, Fyer et al., 1985).
A new wave of genetic investigations is now focusing on the role of rare and common copy number variants (CNVs) in schizophrenia (SCZ). Four recent genome-wide screens for large CNVs (>100 kb) suggest that rare (inherited or de novo), highly penetrant variants may be relevant to SCZ (Walsh T et al., 2008; Stone JL et al., 2008; Xu B et al., 2008; Stefansson H et al., 2008; Need et al., 2009), in particular those disrupting genes involved in brain development, although the real contribution of common and rare CNVs to SCZ has not yet been clarified. In summary, pathogenic mutations or “causal” genetic variants have not yet been identified, although many of the genes identified may play a pathophysiological role in predisposition to SCZ. Intriguingly, several of these genes are involved in brain development, in line with the neurodevelopmental model, which has been a dominant explanatory theory of SCZ for more than two decades (Rapoport et al., 2000; Harrison et al., 2003). We carried out a whole-genome CNV analysis (using Illumina HumanCNV370 BeadChips) in the Arab-Israeli sample to assess whether it also showed an excess of rare de novo CNVs contributing to the genetic component of SCZ.

4. COPY NUMBER VARIANTS
Copy number variants (CNVs) are DNA segments of 1 kb or larger that are present at a variable copy number compared with a reference genome. CNVs include deletions, insertions and duplications ranging from 1 kb to several megabases. Together with SNPs, CNVs are a major source of genetic variation between individuals: in terms of the total number of base pairs involved, CNVs account for more differences between individuals than all SNPs together.
The aim of the project is to identify CNVs associated with complex disorders such as schizophrenia and essential hypertension. For the CNV analysis we used the SNP data from the GWA studies on schizophrenia and hypertension. CNV analysis combines two parameters derived from the intensity data of each SNP: 1) a normalized intensity measurement, the Log R ratio, which is the base-2 logarithm of the observed normalized intensity (R) of a SNP divided by the expected normalized R value calculated from a reference cluster file; 2) an allelic composition measurement, the B allele frequency, which estimates the proportion of a SNP's signal due to the B allele (about 0 for AA, 0.5 for AB and 1 for BB genotypes). These parameters are analyzed for each genotyped SNP with Nexus Copy Number software v4.
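The two parameters defined above can be computed per SNP once a reference cluster file gives, for each genotype cluster, a canonical allelic angle (theta) and normalized intensity (R). The sketch below assumes that layout (the `Cluster` fields are hypothetical, not the actual cluster-file format) and uses piecewise-linear interpolation between clusters, a commonly described scheme; Nexus and the Illumina software may differ in detail.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster-file entry for one SNP: canonical theta and R per genotype."""
    theta_aa: float
    theta_ab: float
    theta_bb: float
    r_aa: float
    r_ab: float
    r_bb: float


def _interp(x, x0, x1, y0, y1):
    """Linear interpolation of y at x between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def expected_r(theta, c):
    """Expected normalized intensity at this theta, interpolated between clusters."""
    t = min(max(theta, c.theta_aa), c.theta_bb)  # stay within the cluster range
    if t <= c.theta_ab:
        return _interp(t, c.theta_aa, c.theta_ab, c.r_aa, c.r_ab)
    return _interp(t, c.theta_ab, c.theta_bb, c.r_ab, c.r_bb)


def log_r_ratio(r_obs, theta, c):
    """Log R ratio = log2(observed R / expected R)."""
    return math.log2(r_obs / expected_r(theta, c))


def b_allele_freq(theta, c):
    """B allele frequency: 0 at the AA cluster, 0.5 at AB, 1 at BB, linear in between."""
    if theta <= c.theta_aa:
        return 0.0
    if theta >= c.theta_bb:
        return 1.0
    if theta <= c.theta_ab:
        return 0.5 * (theta - c.theta_aa) / (c.theta_ab - c.theta_aa)
    return 0.5 + 0.5 * (theta - c.theta_ab) / (c.theta_bb - c.theta_ab)


# Hypothetical cluster positions for one SNP and one sample's observed values.
snp = Cluster(theta_aa=0.02, theta_ab=0.5, theta_bb=0.98, r_aa=1.1, r_ab=1.0, r_bb=1.05)
print(log_r_ratio(r_obs=0.55, theta=0.03, c=snp), b_allele_freq(0.03, snp))
```

Read together, a hemizygous deletion would in principle give a Log R ratio near log2(1/2) = -1 (observed values are usually attenuated) and a B allele frequency with no heterozygous band around 0.5.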
The algorithm on which Nexus is based segments the genome into regions of uniform Log R ratio. It is recursive: it keeps dividing the genome into smaller segments until no region can be segmented further, and it has a single parameter, the Significance Threshold, which controls whether a region is segmented out. To assess the quality of the Log R ratio, a robust variance sample QC analysis is performed in Nexus: the probe-to-probe variation is calculated and the 3% most extreme values, which are expected to be due to copy number breakpoints, are removed in each sample. After CNV identification, the possible involvement of the CNVs in the disease was tested in a case-control study comparing CNV frequencies between cases and controls. The aim is to find CNVs enriched in cases and to develop CNV patterns predictive of the disease under study.
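Nexus is commercial software whose implementation is only summarized above, so the following is a hedged, generic stand-in rather than the Nexus algorithm: a recursive binary segmentation of Log R ratios that splits a region where the difference between left and right means is largest and stops when that standardized difference falls below a threshold (the hypothetical `threshold` argument plays the role of the Significance Threshold). The QC function estimates probe noise from probe-to-probe differences after discarding the largest 3%, mirroring the robust variance QC described above. Function names and defaults are assumptions.

```python
import math
import statistics


def robust_probe_sd(lrr, trim=0.03):
    """Probe-noise estimate from consecutive LRR differences, after dropping the
    largest `trim` fraction (assumed to span copy-number breakpoints)."""
    diffs = sorted(abs(b - a) for a, b in zip(lrr, lrr[1:]))
    kept = diffs[: len(diffs) - int(len(diffs) * trim)]
    # A difference of two independent probes has twice the per-probe variance.
    return math.sqrt(statistics.fmean([d * d for d in kept]) / 2)


def best_split(x, lo, hi, min_size):
    """Split point in x[lo:hi] maximising |mean(left) - mean(right)| / sqrt(1/nl + 1/nr)."""
    prefix = [0.0]
    for v in x[lo:hi]:
        prefix.append(prefix[-1] + v)
    total, n = prefix[-1], hi - lo
    best_stat, best_index = 0.0, None
    for k in range(min_size, n - min_size + 1):  # k = size of the left part
        diff = prefix[k] / k - (total - prefix[k]) / (n - k)
        stat = abs(diff) / math.sqrt(1 / k + 1 / (n - k))
        if stat > best_stat:
            best_stat, best_index = stat, lo + k
    return best_stat, best_index


def segment(x, sigma, threshold, lo=0, hi=None, min_size=3):
    """Recursively segment x (sigma > 0); returns (start, end, mean) with end exclusive."""
    if hi is None:
        hi = len(x)
    if hi - lo >= 2 * min_size:
        stat, i = best_split(x, lo, hi, min_size)
        # Split only if the standardized mean difference exceeds the threshold.
        if i is not None and stat / sigma > threshold:
            return (segment(x, sigma, threshold, lo, i, min_size)
                    + segment(x, sigma, threshold, i, hi, min_size))
    return [(lo, hi, sum(x[lo:hi]) / (hi - lo))]


# Noise-free toy data with one deletion; sigma is fixed because the robust
# estimate would be zero for perfectly flat segments.
lrr = [0.0] * 20 + [-1.0] * 10 + [0.0] * 20
print(segment(lrr, sigma=0.1, threshold=5.0))
# [(0, 20, 0.0), (20, 30, -1.0), (30, 50, 0.0)]
```

Raising `threshold` yields fewer, larger segments. On real data `sigma` would come from `robust_probe_sd`, and segments whose mean Log R ratio departs clearly from zero become candidate CNVs for the case-control comparison.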
For CNV analysis in family-based studies we have developed an analysis pipeline that starts with CNV detection using Hidden Markov Model-based software (PennCNV), which can analyze familial data and produce a combined CNV list.
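PennCNV's own model combines Log R ratio and B allele frequency with population B-allele frequencies and offers family-aware calling modes; none of that is reproduced here. As a hedged illustration of the underlying idea only, the sketch below decodes a three-state Hidden Markov Model (deletion, normal, duplication) over Log R ratios with the Viterbi algorithm. The state means, standard deviation and transition probability are hypothetical illustrative values, not PennCNV parameters.

```python
import math

STATES = ("deletion", "normal", "duplication")
# Hypothetical LRR emission model per copy-number state (illustrative values only).
MEANS = {"deletion": -0.5, "normal": 0.0, "duplication": 0.3}
SD = 0.15


def log_gauss(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(lrr, p_stay=0.99):
    """Most likely copy-number state for each probe (Viterbi decoding)."""
    log_stay = math.log(p_stay)
    log_move = math.log((1 - p_stay) / (len(STATES) - 1))

    def log_trans(a, b):
        return log_stay if a == b else log_move

    # Uniform initial state distribution (an assumption of this sketch).
    score = {s: math.log(1 / len(STATES)) + log_gauss(lrr[0], MEANS[s], SD) for s in STATES}
    backpointers = []
    for x in lrr[1:]:
        prev, score, pointer = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log_trans(r, s))
            score[s] = prev[best] + log_trans(best, s) + log_gauss(x, MEANS[s], SD)
            pointer[s] = best
        backpointers.append(pointer)

    # Trace the best path back from the most likely final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


def cnv_calls(path):
    """Collapse a state path into (start, end, state) runs, skipping normal runs."""
    calls, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != "normal":
                calls.append((start, i, path[start]))
            start = i
    return calls


lrr = [0.0] * 30 + [-0.5] * 10 + [0.0] * 30 + [0.3] * 10 + [0.0] * 20
print(cnv_calls(viterbi(lrr)))
```

In a family design, per-individual calls like these would then be compared across parents and offspring, for example to flag CNVs present in a child but in neither parent as candidate de novo events.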

The second category includes the following project:
QUANTOMICS: From Sequence to Consequence - Tools for the Exploitation of Livestock Genomes - Support: European Commission - FP7
The sequencing of the genomes of the major livestock species places animal agriculture on the threshold of a new era in which challenges in sustainability can be tackled much more effectively than previously. The development of genomic tools has provided the ability to map genes associated with welfare, quality and production traits. However, the application of these findings in breeding programmes has been hampered by two factors:

The problem of finding markers sufficiently closely associated with the causative polymorphism to be fully effective in selection.
The challenge of identifying the causative polymorphisms themselves to provide an optimally informative and portable tool for selection, breeding and trait dissection.

The strategy of this project is to develop a set of complementary technologies and tools that will assist the exploitation of sequenced livestock genomes by (i) enabling the use of dense marker information in breeding while simultaneously (ii) radically enhancing our capability to identify and use the most important causative polymorphisms.
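The project text does not name a statistical model for using dense markers in breeding. One widely used approach, shown here only as an assumed example, is ridge-regression SNP-BLUP: all marker effects are estimated jointly with a shrinkage parameter λ and then summed into a genomic estimated breeding value (GEBV). The function names, the genotype coding (0/1/2, centred per marker) and the data are hypothetical; a real analysis would use a dedicated linear-algebra library and derive λ from variance components.

```python
def solve(a, b):
    """Solve a @ x = b for square a (Gauss-Jordan elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def snp_blup(genotypes, phenotypes, lam):
    """Ridge-regression marker effects b solving (Z'Z + lam*I) b = Z'(y - mean(y)),
    where Z holds 0/1/2 genotypes centred per marker. Returns (marker means, b)."""
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    z = [[g[j] - means[j] for j in range(p)] for g in genotypes]
    y_mean = sum(phenotypes) / n
    yc = [y - y_mean for y in phenotypes]
    lhs = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(z[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return means, solve(lhs, rhs)


def gebv(genotype, means, effects):
    """Genomic estimated breeding value: sum of centred genotype x marker effect."""
    return sum((g - m) * b for g, m, b in zip(genotype, means, effects))


# Hypothetical training data: 5 animals x 2 markers, phenotypes in arbitrary units.
train_geno = [[0, 0], [1, 0], [2, 1], [0, 2], [1, 1]]
train_pheno = [0.1, 1.2, 1.4, -0.9, 0.6]
means, effects = snp_blup(train_geno, train_pheno, lam=1.0)
print(effects, gebv([2, 0], means, effects))
```

Larger λ shrinks all marker effects towards zero; this joint shrinkage is what lets many more markers than animals be fitted at once.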

The major activity of the Genomics Platform in QUANTOMICS is massive SNP genotyping: using the most recent chip sets available on Illumina platforms, genotyping will be performed centrally for cattle (2000 individuals) and chicken (2000 individuals).
The main populations will represent the two main breeding designs in use in animal genetics today: pure-line breeding, as exemplified by dairy cattle, and crossbreeding, as exemplified by broiler chickens. Thus, the populations will consist of 1 to 3 purebred dairy breeds in cattle and a three-way broiler cross in chicken. The target traits in cattle are associated with mastitis, measured directly and indirectly through the somatic cell count in milk (SCC). SCC is widely accepted as a surrogate biomarker of mastitis resistance and has been used for years in cattle selection programs. The target trait in chicken is disease caused by avian pathogenic E. coli (APEC), which is responsible for large economic losses in broiler production. Control traits are various production and functional traits in cattle and body weight in chicken.
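The text does not say how SCC enters the analyses. Because raw counts are strongly right-skewed, a common convention in dairy cattle breeding (assumed here, not stated in the project description) is to analyze the linear somatic cell score, SCS = log2(SCC / 100,000) + 3, with SCC in cells per mL, so that each doubling of SCC adds one unit. The function name is hypothetical.

```python
import math


def somatic_cell_score(scc_cells_per_ml):
    """Linear somatic cell score: SCS = log2(SCC / 100,000) + 3 (SCC in cells/mL)."""
    if scc_cells_per_ml <= 0:
        raise ValueError("SCC must be positive")
    return math.log2(scc_cells_per_ml / 100_000) + 3


for scc in (50_000, 100_000, 400_000):
    print(scc, somatic_cell_score(scc))
```

The log transform brings the distribution closer to normal, which suits the linear models typically used for genetic evaluation.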

Technologies

The following technologies are available at the Genomics & Bioinformatics Platform:

the Illumina technology (iScan and BeadScan) for high-density SNP genotyping
the Illumina Genome Analyzer for deep sequencing, targeted sequencing of regions of interest, transcriptome (mRNA) analysis and ChIP-Seq, to analyze protein interactions with DNA and identify the binding sites of DNA-associated proteins
an IT infrastructure including many compute servers, more than 200 TB of disk storage, a tape library and many personal computers
a bioinformatics platform, built by internal biostatistics and bioinformatics experts in partnership with several companies and institutions. This platform, which includes several internally developed tools and predictive algorithms, is routinely used for genomic and post-genomic data analysis.

REFERENT SCIENTIST

Daniele Cusi

Professor of Nephrology,
University of Milan
