Genome-wide approaches towards identification of susceptibility genes in complex diseases
Bastienne Wentzel

1 December 2008, Interface

For an increasing number of disorders it is becoming clear that many genetic variants are involved. Finding these variants and the affected disease genes makes looking for a needle in a haystack seem like a walk in the park. Bioinformatician Lude Franke devised new statistical methods to identify such variants.

Although Franke claims the various types of research he conducts differ widely, the title of his thesis, and especially the word 'genome-wide' in it, covers much of the contents. Most of his studies have followed genome-wide approaches. "Only a few years ago, genome-wide DNA oligonucleotide arrays became available. With these chips, we are able to assess single nucleotide polymorphisms (SNPs), deletions and duplications for hundreds of thousands of loci at once. Before, we could only assess a subset of these variants using rather time-consuming PCRs and assays," says Lude Franke.

During his PhD research at the Complex Genetics department of the UMC Utrecht, Franke worked predominantly on data from these chips. Clinical researchers from various hospitals provided him with data from thousands of patients with coeliac disease and amyotrophic lateral sclerosis (ALS). Franke used DNA chips that compare patients with controls for no fewer than 300,000 SNPs. "Such studies have proven to be very valuable to identify associated SNPs. But additional information can be extracted from these chips," says Franke. For example, he used them to investigate copy number variation. "Deletions and duplications have turned out to be much more common than we had initially thought. We developed a method to genotype small but common deletions with considerable accuracy. I applied the method to my own DNA, resulting in the identification of many deletions. The results of this analysis are printed on the sides of the pages of my thesis," he explains.

Common deletions
The approach Franke took to identify these deletions differs from other approaches. When no deletion is present, three genotypes are possible for a SNP, but in the presence of a common deletion at the position of the SNP three additional genotypes emerge. When plotted in a graph, these genotypes form six clusters that overlap, because these DNA chips had not been specifically designed to assess deletions. To overcome this, Franke reasoned that nearby SNPs provide information about the likely genotype of the SNP under investigation. He used this 'linkage disequilibrium' to improve genotyping. "Through resequencing we corroborated that our method indeed can accurately assign deletion genotypes to these SNPs."
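
The count of three versus six genotypes can be checked with a few lines of Python. This is only an illustrative sketch: the allele labels A and B, and the choice to write the deletion as a third 'null' allele "-", are assumptions made here, not notation from the thesis.

```python
from itertools import combinations_with_replacement

def possible_genotypes(alleles):
    """Return every unordered pair of alleles, i.e. every diploid genotype."""
    return ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]

# Two SNP alleles, labelled A and B purely for illustration.
without_deletion = possible_genotypes(["A", "B"])        # ['AA', 'AB', 'BB']

# Treat the deletion as a third, 'null' allele written as "-".
with_deletion = possible_genotypes(["A", "B", "-"])

# The genotypes that only exist when the deletion segregates at this position.
extra = [g for g in with_deletion if g not in without_deletion]

print(without_deletion)
print(with_deletion)
print(extra)                                             # ['A-', 'B-', '--']
```

Each of the six genotypes would correspond to one of the six clusters in the intensity plot; the sketch says nothing about how overlapping clusters are resolved.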

Not only sick people carry deletions, as testified by Franke's genome. Franke thinks that paralogs are one of the reasons that those deletions do not necessarily cause disease. "A paralog is an evolutionary duplicate of a gene and often has a function that is comparable to the original. If the original gene is not present because of a homozygous deletion, the paralog can sometimes take over its function. We also found, by assessing the number of biological interactions these genes have, that genes that are often deleted generally have a less important biological function."

These gene interactions also play a role in a different part of Franke's thesis. He has assembled predicted functional gene interactions into a network called GeneNetwork.nl. Interacting genes may encode different parts of the same protein complex. Or one gene may code for an enzyme that metabolizes a protein from another gene. The third mode of interaction is a gene coding for a protein that influences the expression of another gene. Franke used the network to prioritize potential disease genes in loci that had been identified in linkage studies. A program called Prioritizer analyses these loci and determines whether some of the genes are functionally more closely related than expected. Soon after Franke's publication, a paper appeared that had employed Prioritizer to identify potential Type 2 Diabetes genes. Another paper found genetic association for one of these genes. "This provides some validation that our hypothesis might hold truth," the researcher says.
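
The idea of ranking positional candidates by their functional relatedness can be sketched as follows. This is not Prioritizer's actual algorithm, which is not described in this article; the scoring rule (count the other loci that contain a direct interaction partner), the function name `rank_candidates` and all gene and locus names are hypothetical.

```python
def rank_candidates(loci, interactions):
    """Score each gene by the number of other loci holding an interaction partner."""
    # Build an undirected adjacency map from the interaction pairs.
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    scores = {}
    for locus, genes in loci.items():
        for gene in genes:
            partners = neighbours.get(gene, set())
            # Count the other loci that contain at least one partner of this gene.
            scores[gene] = sum(
                1
                for other, other_genes in loci.items()
                if other != locus and partners & set(other_genes)
            )
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Made-up toy data: three loci with two positional candidate genes each.
loci = {
    "locus1": ["G1", "G2"],
    "locus2": ["G3", "G4"],
    "locus3": ["G5", "G6"],
}
interactions = [("G1", "G3"), ("G1", "G5"), ("G3", "G5"), ("G2", "G6")]

ranking = rank_candidates(loci, interactions)
for gene, score in ranking:
    print(gene, score)
```

In this toy example G1, G3 and G5 come out on top because each interacts with a gene in both other loci. Deciding whether such a score is higher 'than expected' would additionally require a comparison against randomly chosen loci, which the sketch omits.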

Genetical genomics
"While GeneNetwork.nl can provide evidence that genes are interacting, the data used to predict these interactions are not perfect. One way to improve the predictions is by using genetical genomics," says Franke. "The expression of many genes is heritable. What if a SNP in one gene influences the expression of another? That would imply a biological relationship between the two genes." By using genotypes and expression levels from over a hundred samples, Franke could assess this.
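
A minimal sketch of such a test, assuming genotypes are coded as the number of copies of one allele (0, 1 or 2) and that a simple Pearson correlation stands in for whatever statistic was actually used. The six samples and their values are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented data: allele counts at a SNP in one gene ...
genotype = [0, 0, 1, 1, 2, 2]
# ... and expression levels of a different gene in the same six samples.
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

r = pearson(genotype, expression)
print(round(r, 3))
```

A correlation near 1, as in this toy data, would suggest that the SNP influences the expression of the other gene. A real analysis would repeat the test for every SNP and gene pair and correct for the resulting multiple testing.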

Unfortunately, he did not find strong evidence for this in a human dataset. "This was somewhat disappointing. We were unable to find SNPs that strongly influence the expression of genes that map elsewhere." Franke thinks that the sample sizes were too small. For many diseases, thousands of individuals were required to identify causative variants. Presumably the same holds for SNPs that influence expression of genes that map elsewhere.

Franke found a way around this problem. "Many genes are co-expressed," Franke explains. "A relationship between the expression of gene X and gene Y indicates that these genes are potentially biologically related, as they are likely to be controlled by another upstream gene. However, when the expression of gene X or gene Y is strongly influenced by an underlying genetic variation, this relationship gets disturbed." He therefore devised a method to remove this genotypic effect. "We then observed a considerable increase in co-expressing genes, enabling us to identify more biological relationships, highlighting the relevance of genetical genomics. I am currently following this up as a post-doc at the Genetics department of the UMC Groningen and at Queen Mary, University of London."
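
The effect can be illustrated with a toy calculation. The article does not describe Franke's actual method; the sketch below assumes the genotypic effect is removed by ordinary least-squares regression of expression on allele count, and all numbers are constructed so that gene X is driven both by a shared upstream factor and by a SNP.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def residualize(values, genotypes):
    """Remove the linear effect of genotype from values (least-squares residuals)."""
    n = len(values)
    mean_v, mean_g = sum(values) / n, sum(genotypes) / n
    slope = sum((g - mean_g) * (v - mean_v) for g, v in zip(genotypes, values)) \
        / sum((g - mean_g) ** 2 for g in genotypes)
    return [(v - mean_v) - slope * (g - mean_g) for g, v in zip(genotypes, values)]

# Constructed data: a shared upstream factor drives both genes ...
upstream = [-1, 1, -1, 1, -1, 1]
genotype = [0, 0, 1, 1, 2, 2]          # allele count at a SNP affecting gene X
# ... but gene X also carries a strong genotypic effect, gene Y does not.
expr_x = [u + 3 * g for u, g in zip(upstream, genotype)]
expr_y = list(upstream)

raw = pearson(expr_x, expr_y)
adjusted = pearson(residualize(expr_x, genotype), expr_y)
print(round(raw, 3), round(adjusted, 3))
```

Before the correction the two genes look only weakly co-expressed, because the SNP effect dominates the variation in gene X. Once that effect is regressed out, the shared upstream signal becomes visible again.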

Meta-analysis of a PhD period
Apart from being a researcher, Franke is also a graphic designer; among other things, he designed the NBIC corporate identity. His thesis, naturally designed by himself and his colleagues at Clever - Franke, is packed with graphical gadgets. The dust cover is in fact a poster which not only cites a rather disturbing statement from a personalized genetic testing company (implying their results should be regarded as worthless), but also depicts a word-by-word statistical analysis of the entire thesis. Inside, Franke's period as a PhD student is graphically depicted through a network analysis, an analysis of how scientific papers develop and an analysis of his e-mail correspondence. "I like generating images from data," is his simple explanation. "As there are no photos in my thesis, I wrote some software in the hope of creating a few interesting and visually appealing illustrations."

This article appeared in the English-language Interface no. 2, 2008.