There are fewer protein-coding genes than expected: up to 20% of the genes in the human genome that are annotated as coding may in fact be "useless" for that purpose. This portion would consist of sequences that have fallen into disuse or carry no meaning. The finding comes from the Spanish National Cancer Research Center (CNIO) and could have a major practical impact, since it would influence research on genetic diseases and tumors, and in turn the therapies developed for them.
In 2003, the map of human DNA was completed. The map included 20,000 coding genes, that is, genes involved in the production of proteins. The CNIO decided to verify the reliability of this number. To do so, its researchers compared the data contained in the three most important databases in the world: GENCODE / Ensembl, RefSeq and UniProtKB.
Taken together, the three databases list 22,210 coding genes, but only 19,446 are present in all three. The researchers then examined the remaining 2,764, comparing the data with experimental evidence and with annotations from other researchers. It turned out that almost all of them were non-coding genes or pseudogenes. The researchers then examined the other genes as well and found that 1,470 of those considered coding may not be.
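The counts above can be checked with a few lines of arithmetic. The Python sketch below uses only the figures reported in the article; the variable names are illustrative, and it assumes, as an upper bound, that all 2,764 disputed genes turn out to be non-coding (the article says "almost all").

```python
# Counts as reported in the article.
union_total = 22_210       # coding genes listed in at least one of the three databases
in_all_three = 19_446      # genes annotated as coding in all three databases
doubtful_in_core = 1_470   # genes from the agreed set that may not be coding

# Genes that are not annotated as coding in all three databases.
disputed = union_total - in_all_three

# Upper-bound assumption: every disputed gene is non-coding.
possibly_non_coding = disputed + doubtful_in_core
share = possibly_non_coding / union_total

print(disputed)             # 2764
print(possibly_non_coding)  # 4234
print(f"{share:.1%}")       # 19.1%
```

Under that assumption the share comes to about 19%, in line with the figure of roughly 20% quoted for the study.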
If so, about 20% of the genes counted as coding would in fact be non-coding, more than was expected in 2003. For the time being, the researchers have analyzed only part of the genome in detail. To date, 300 genes have been "downgraded", some of which have been widely studied. About 100 publications concern these genes and treat it as certain that they are coding. That is a clue to how important this discovery could be for medical research on genetic diseases and cancers.