Copy Number Variation (CNV)
Genetic Variability

Author: Simona Perga
Date: 07/07/2011

Description

DEFINITION

Copy number variations (CNVs), a form of structural variation, are alterations of the DNA of a genome that result in the cell having an abnormal number of copies of one or more sections of the DNA. CNVs correspond to relatively large regions of the genome that have been deleted (fewer than the normal number of copies) or duplicated (more than the normal number) on certain chromosomes. For example, a chromosome that normally has sections in the order A-B-C-D might instead have sections A-B-C-C-D (a duplication of "C") or A-B-D (a deletion of "C").
This variation accounts for roughly 12% of human genomic DNA, and each variation may range from about one kilobase (1,000 nucleotide bases) to several megabases in size. CNVs contrast with single-nucleotide polymorphisms (SNPs), which affect only a single nucleotide base.
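
As a minimal sketch of the A-B-C-D example above, the following Python snippet counts how often each section occurs in a sample chromosome and compares it with a reference. The function name `copy_number_changes` and the representation of a chromosome as a list of section labels are illustrative assumptions, not part of any CNV tool.

```python
from collections import Counter

def copy_number_changes(reference, sample):
    """Return {section: (reference_count, sample_count)} for every
    section whose number of copies differs between the two lists."""
    ref_counts = Counter(reference)
    sample_counts = Counter(sample)
    changes = {}
    # A Counter returns 0 for a missing section, which models a deletion.
    for section in ref_counts.keys() | sample_counts.keys():
        if ref_counts[section] != sample_counts[section]:
            changes[section] = (ref_counts[section], sample_counts[section])
    return changes

reference = ["A", "B", "C", "D"]
duplication = ["A", "B", "C", "C", "D"]   # "C" is present twice
deletion = ["A", "B", "D"]                # "C" is missing

print(copy_number_changes(reference, duplication))  # {'C': (1, 2)}
print(copy_number_changes(reference, deletion))     # {'C': (1, 0)}
```

Real CNVs are defined on genomic coordinates rather than on labelled sections, so this only illustrates the idea of comparing copy counts against a reference.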

Discovering Copy Number Variants

In 2002, Charles Lee was trying to genotype patients, but his experiments were repeatedly unsuccessful. He was finding that healthy control individuals showed major variations in their genetic sequences, with some having more copies of specific genes than others. Lee began to collaborate with Stephen Scherer, who had made similar observations, and together their labs used array-based comparative genomic hybridization approaches to measure the occurrence of these copy variants across the genome. Meanwhile, Michael Wigler was also observing differences in copy numbers in healthy individuals using a complementary microarray technique involving representational oligonucleotide probes to detect amplifications and deletions in the genome. Thus, in 2004, both sets of researchers published findings indicating that large-scale variations in copy number were common and occurred in hundreds of places in the human genome, including areas coding for disease-related genes (Figure 1).
These differences were named copy number variants (CNVs), and the researchers described such a variant as a segment of DNA that is 1 kilobase or larger and is present at a variable copy number in comparison with a reference genome. Copy number variants are mutations and can include deletions, insertions, and duplications. Sometimes a copy number variant may even be so large that half a million nucleotides are affected.

Figure 1 Different types of Copy Number Variations

What is copy number variation?

The human genome comprises 6 billion chemical bases (or nucleotides) of DNA packaged into two sets of 23 chromosomes, one set inherited from each parent. The DNA encodes 30,000 genes. It was generally thought that genes were almost always present in two copies in a genome. However, recent discoveries have revealed that large segments of DNA, ranging in size from thousands to millions of DNA bases, can vary in copy number. Such copy number variations can encompass genes, leading to dosage imbalances. For example, genes that were thought to always occur in two copies per genome have now been found to sometimes be present in one, three, or more than three copies. In a few rare instances the genes are missing altogether (see Figure 1).

The importance of CNVs

Differences in the DNA sequence of our genomes contribute to our uniqueness. These changes influence most traits, including susceptibility to disease. It was thought that single nucleotide changes (called SNPs) in DNA were the most prevalent and important form of genetic variation, responsible for most physiological and phenotypic variation. Current studies reveal that CNVs comprise at least three times the total nucleotide content of SNPs, making them the most important source of genetic and phenotypic variation. In particular, it was startling to discover that 12% of the human genome was copy number variable in the 270 DNA samples tested. About 2900 genes, or 10% of those known, are encompassed by these CNVs. Some CNVs found in the general population can be millions of bases in size and affect numerous genes, yet they have no observable consequence. To date, approximately 2000 CNVs have been described, and 1447 of them are from the current study. There could be thousands more CNVs in the human population. About 100 CNVs were detected in each genome examined, with an average size of 250,000 bases (an average gene is 60,000 bases). Additional CNVs will be discovered as technologies for detection improve and more DNA samples from worldwide populations are examined.

Figure 2 Distribution of large-scale copy-number variations in the human genome

Figure 2. Circles to the right of each chromosome ideogram show the number of individuals with copy gains (blue) and losses (red) for each clone among 39 unrelated, healthy control individuals. Green circles to the left indicate known genome sequence gaps within 100 kb of the clone, or segmental duplications known to overlap the clone, as compared to the Human Recent Segmental Duplication Browser. Cytogenetic band positions are shown to the left.

What types of genes are found to be copy number variable?

Most CNVs are benign variants that will not directly cause disease. However, there are several instances in which CNVs encompass genes and influence gene expression, phenotypic variation and adaptation by disrupting genes or altering gene dosage, playing an important role both in human disease and in drug response. Understanding the mechanisms of CNV formation may also help us better understand human genome evolution.
In particular, it has been shown that genes involved in the immune system and in brain development and activity – two functions that have evolved rapidly in humans – tend to be enriched in CNVs. By contrast, genes that play a role in early development and some genes involved in cell division – both critical to fundamental biology – tend to be spared.

Copy Number Variants in healthy individuals

As previously mentioned, copy number variants do not necessarily have a negative effect on health. One example is the gene for the chemokine CCL3L1, which can potently suppress human immunodeficiency virus 1 (HIV-1). Gonzalez and his colleagues found that individuals who carried fewer copies of the CCL3L1 gene than average were significantly more susceptible to HIV and acquired immunodeficiency syndrome (AIDS). This suggests that bearing extra copies of CCL3L1 can help protect an individual against contracting HIV and developing AIDS.
Similarly, other copy number variants carried by healthy individuals that seem to have no function might actually be evolutionarily retained in populations if they provide a selective advantage.
Current research aims to identify the functional mechanisms by which copy number variation causes diseases. Although preliminary findings suggest that the presence of copy number variants might be associated with certain disease phenotypes, the variants are not necessarily the causes of these diseases. As studies relating copy number variation to diseases expand, our understanding of human diversity, the causes and development of complex diseases, and disease resistance will grow accordingly, which will allow the development of improved diagnostic and treatment strategies.

Copy Number Variants and disease mechanisms

Upon learning of copy number variants, scientists immediately began to speculate that they might underlie genetic diversity and susceptibility to certain diseases, including neurological disorders and leukemia. For example, after studying 270 individuals, one research team discovered that copy number variants covered approximately 12% of the human genome; another research team determined that there is an average of 12 copy number variants per individual. Given these values, it seems that the sheer scale of copy number variants in our genome might profoundly affect our health. Indeed, there is growing interest in the influence of this variation upon complex disease phenotypes because approximately half of the copy number variants detected so far overlap with protein-coding regions.
As noted above, most copy number variants exist in healthy individuals; however, these variants are hypothesized to cause diseases through several mechanisms, as shown in Figure 3. First, copy number variants can directly influence gene dosage through insertions or deletions, which can result in altered gene expression and potentially cause genetic diseases. Gene dosage describes the number of copies of a gene in a cell, and gene expression can be influenced by higher and lower gene dosages. For example, by removing a gene entirely, a deletion can result in a lower gene dosage or copy number than is normally present (Figure 3a). Deletions can also result in the unmasking of a recessive allele that would normally not be expressed (Figure 3b). Structural variants that overlap a gene can reduce or prevent the expression of the gene through inversions, deletions, or translocations (Figure 3b). Variants can also affect a gene's expression indirectly by interacting with regulatory elements. For instance, if a regulatory element is deleted, a dosage-sensitive gene might have lower or higher expression than normal (Figure 3c).
Sometimes, the combination of two or more copy number variants can produce a complex disease, whereas individually the changes produce no effect (Figure 3d). Some variants are flanked by homologous repeats, which can make genes within the copy number variant susceptible to nonallelic homologous recombination and can predispose individuals or their descendants to a disease. Additionally, complex diseases might occur when copy number variants are combined with other genetic and environmental factors.
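
To make the distinction between removing a gene entirely and only partially overlapping it more concrete, here is a minimal Python sketch that compares a CNV interval with a gene interval. The `Region` class, the function names, and all coordinates are hypothetical and chosen only for illustration; they do not describe real genomic positions or any particular annotation tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    chrom: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    name: str = ""

def overlaps(a, b):
    """True if the two regions share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def contains(outer, inner):
    """True if `inner` lies completely inside `outer`."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)

def classify(cnv, gene):
    """Describe how a CNV relates to a gene (illustrative labels only)."""
    if contains(cnv, gene):
        return "whole gene inside CNV (gene dosage changes)"
    if overlaps(cnv, gene):
        return "gene partially overlapped (gene may be disrupted)"
    return "no overlap with the gene"

# Hypothetical coordinates, not real annotations.
gene = Region("chr1", 10_000, 70_000, "GENE_X")
whole_deletion = Region("chr1", 5_000, 255_000, "cnv_1")
partial_deletion = Region("chr1", 50_000, 90_000, "cnv_2")
distant_cnv = Region("chr1", 500_000, 600_000, "cnv_3")

for cnv in (whole_deletion, partial_deletion, distant_cnv):
    print(cnv.name, "->", classify(cnv, gene))
```

A CNV that does not overlap a gene can still matter if it removes a regulatory element, so the "no overlap" label here should not be read as "no effect".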

Figure 3. Influence of structural variation on phenotype

For example, certain breast cancers are associated with overexpression of the ERBB2 gene, which codes for human epidermal growth factor receptor 2. Copy number variations and other types of mutations can cause this overexpression. A high ERBB2 copy number is associated with aggressive forms of breast cancer, and the receptor is a major target of treatment. Therefore, measuring the ERBB2 copy number can provide a diagnostic tool for breast cancer and other cancers. Similarly, copy number variations were identified on chromosome 22 in regions involved with spinal muscular atrophy and DiGeorge syndrome, as well as in the imprinted chromosome 15 region associated with Prader-Willi syndrome and Angelman syndrome (Redon et al., 2006). These diseases might be caused by copy number variants due to inversions and deletions in critical genes. Copy number variants were also detected in genetic regions associated with complex neurological diseases, such as Alzheimer's disease and schizophrenia.

Another important example is copy number variation of cytochrome P450 2D6 (CYP2D6), which has been shown to influence the metabolism of 15–25% of clinical drugs. Similarly, CNVs have been described for many other members of the cytochrome P450 family, influencing drug metabolism and response.

Genome-wide analysis of DNA copy number changes using cDNA microarrays, 1999

By using DNA microarrays, scientists are now screening patients with genetic diseases and comparing them to unaffected control individuals to examine which copy number variants are truly associated with disease states and which are common in the population. Information about these copy number variants might allow for the identification of specific disease-related genes that were previously unknown.

CNV detection and analysis

Copy number analysis usually refers to the process of analyzing data produced by a test for DNA copy number variation in a patient's sample. Such analysis helps detect chromosomal copy number variation that may cause or increase the risk of various critical disorders. Copy number variation can be detected with various types of tests, such as:

fluorescence in situ hybridization (FISH),
comparative genomic hybridization (CGH),
high-resolution array-based tests based on array comparative genomic hybridization (aCGH),
SNP array technologies,
quantitative PCR-based techniques (TaqMan assay) for analysis and/or validation of known CNVs.

Array-based methods have been accepted as the most efficient in terms of their resolution and high-throughput nature, and they are also referred to as virtual karyotyping. Data analysis for an array-based DNA copy number test can be very challenging, though, because of the very high volume of data that comes out of an array platform.
BAC (Bacterial Artificial Chromosome) arrays were historically the first microarray platform to be used for DNA copy number analysis. This platform is used to identify gross deletions or amplifications in DNA. Such anomalies are common in cancer, for example, and can be used for the diagnosis of many developmental disorders. PerkinElmer and BlueGnome are the leading providers of BAC-based aCGH platforms. Data produced by such platforms are usually low to medium resolution in terms of genome coverage. Usually, this technology produces log-ratio measurements that represent the deviation of a patient's copy number state from normal. These measurements are then studied, and those that differ significantly from zero are reported as representing a part of a chromosome with an anomaly (an abnormal copy number state). Positive log-ratios indicate a region of DNA copy number gain, and negative log-ratios mark a region of DNA copy number loss. In BAC arrays, even a single data point can be declared an indication of a copy number gain or a copy number loss.
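
The log-ratio reading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base-2 logarithm, the threshold of 0.3, the clone names, and the signal intensities are all hypothetical choices, not values prescribed by any array platform.

```python
import math

def log2_ratio(patient_signal, reference_signal):
    """Log-ratio of patient to reference signal (base 2 is an assumption)."""
    return math.log2(patient_signal / reference_signal)

def call_state(log_ratio, threshold=0.3):
    """Label a log-ratio as gain, loss, or normal.
    The threshold is a hypothetical cut-off chosen for illustration."""
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "normal"

# Hypothetical (patient, reference) signal intensities per clone.
clones = {
    "clone_1": (1000.0, 1000.0),  # equal signal, ratio 0
    "clone_2": (1500.0, 1000.0),  # three copies versus two
    "clone_3": (500.0, 1000.0),   # one copy versus two
}

for name, (patient, reference) in clones.items():
    ratio = log2_ratio(patient, reference)
    print(f"{name}: log2 ratio = {ratio:+.3f} -> {call_state(ratio)}")
```

Under these assumptions, a single-copy loss in a diploid region gives a log2 ratio of -1 and a single-copy gain gives about +0.585; measured values are typically noisier, which is why a threshold rather than an exact comparison with zero is used in this sketch.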

What is next?

The next generation of DNA microarray-based technologies will allow equal detection of large and small CNVs. Also on the horizon are new DNA sequencing technologies enabling rapid (and ultimately inexpensive) personalized genome sequencing projects. Coupled together, these technologies will capture almost all the variation in a genome.

DATABASES

For CNV studies:
Database of Genomic Variants (DGV)
DECIPHER
CNV Control Database

In these CNV databases, users can search by keywords, chromosome location, genes, sequence, accession number, cytobands and more.

Database of genomic variants
The objective of this database is to provide a comprehensive summary of structural variation in the human genome. The content of the database is only representing structural variation identified in healthy control samples. It provides a useful catalog of control data for studies aiming to correlate genomic variation with phenotypic data. The database is continuously updated with new data from peer reviewed research studies.
Example: access by gene name: EGFR gene variations

DECIPHER (DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources)
The DECIPHER database of submicroscopic chromosomal imbalance collects clinical information about chromosomal microdeletions/duplications/insertions, translocations and inversions and displays this information on the human genome map with the aims of:
• Increasing medical and scientific knowledge about chromosomal microdeletions/duplications
• Improving medical care and genetic advice for individuals/families with submicroscopic chromosomal imbalance
• Facilitating research into the study of genes which affect human development and health
DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources

CNV Control Database
This copy number variation database (CNV DB) is a repository system constructed to provide permanent data management and information sharing for CNV data. CNV DB contains CNV region data. Currently, CNV DB contains CNV results from several research laboratories.

For protein pathway studies:
KEGG PATHWAY
Reactome

Examples of use of CNVs Databases

DECIPHER Database: Case Study (DECIPHER 00000128)—A Rare Microdeletion Syndrome
An 8-year-old boy with complex cyanotic congenital heart disease, including an atrioventricular septal defect, reflux nephropathy, behavioural problems (impulsivity and hyperkinetic conduct disorder), and mild learning disability was seen in the genetics clinic. His first cardiac surgery was undertaken at the age of 4 months, and he had a stormy post-operative course, during which he spent several weeks in intensive care. He was reviewed by a number of pediatricians and a pediatric psychiatrist, and it was uncertain to what extent his perioperative complications were the cause of his learning and behavior problems.
An array-CGH study revealed a small deletion of approximately 4 Mb in size on chromosome 8p23.1. From DECIPHER it was immediately apparent that this deletion corresponds to a rare syndrome, the “8p23.1 deletion syndrome,” and that the deleted region includes the gene GATA4 (MIM 600576), which encodes a transcription factor involved in heart formation. The 8p23.1 deletion syndrome is characterized by “developmental delay and a characteristic behavior profile with hyperactivity and impulsiveness,” which explains many aspects of this child’s phenotype, including his behavioral and developmental problems.

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources

Rules

Genetic Background

MeSH

Genome

Comments

2011-07-07T13:51:26 - Simona Perga

CNV DATABASES

As a result of discussions surrounding the representation of structural variants at the recent ISCA meeting, groups at DGV, NCBI and DECIPHER have decided to standardize colour schemes for gains and losses. Moving forward, deletions/losses will be displayed as red, gains/duplications will be displayed as blue. Regions where both gains and losses occur at the same locus will be represented as brown, and we will continue to represent inversions as purple. In addition to ensuring the colour schemes are consistent across databases, changes have also been implemented to ensure ease of use for individuals with red-green colour blindness.
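
The colour convention described above can be written down as a small lookup, sketched here in Python. The key names and the helper `colour_for` are assumptions made for illustration; only the four colour assignments come from the text.

```python
# Colour assignments as described in the text; key names are illustrative.
VARIANT_COLOURS = {
    "loss": "red",          # deletions/losses
    "gain": "blue",         # gains/duplications
    "gain+loss": "brown",   # both gains and losses at the same locus
    "inversion": "purple",
}

def colour_for(has_gain=False, has_loss=False, is_inversion=False):
    """Return the display colour for a locus, or None if no variant is given."""
    if is_inversion:
        return VARIANT_COLOURS["inversion"]
    if has_gain and has_loss:
        return VARIANT_COLOURS["gain+loss"]
    if has_gain:
        return VARIANT_COLOURS["gain"]
    if has_loss:
        return VARIANT_COLOURS["loss"]
    return None

print(colour_for(has_loss=True))                 # red
print(colour_for(has_gain=True))                 # blue
print(colour_for(has_gain=True, has_loss=True))  # brown
print(colour_for(is_inversion=True))             # purple
```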

In the CNV databases listed above, users can search by keywords, chromosome location, genes, sequence, cytobands and more.
