Putting The Squeeze On DNA

Main Category: Genetics
Also Included In: IT / Internet / E-mail
Article Date: 12 Nov 2009

Researchers in Egypt have developed a technique to compress DNA sequences of the kind used in medical research so that they take up a lot less space in a computer database but without loss of information. The approach is described in detail in a forthcoming issue of the International Journal of Bioinformatics Research and Applications.

Molecular sequence databases, such as those at EMBL, GenBank, and Entrez, contain millions of DNA sequences filling many thousands of gigabytes of computer storage. With almost every new scientific publication in genetics and related sciences, a new sequence is added, and the rate at which the data is accumulating is on the rise.

These sequences play a vital role in medical research, disease diagnosis, and the design and development of new drugs. However,

DNA sequences are composed of just four different bases labelled A, C, G, and T. Each base can be represented in computer code by a two-digit binary number, in other words two bits: A (00), C (01), G (10), and T (11). At first glance, one might imagine that this is the most efficient way to store DNA sequences.
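As a rough illustration of the two-bit representation described above, the following Python sketch packs four bases into each byte using the A (00), C (01), G (10), T (11) mapping. The function names pack and unpack are hypothetical and chosen only for this example; the sketch also assumes the sequence contains only the four letters A, C, G, and T and that the sequence length is stored separately.

```python
# Two-bit codes as given in the article: A=00, C=01, G=10, T=11.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index in this string equals the two-bit code


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte (hypothetical helper)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so bases are always read from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes (hypothetical helper)."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    # Drop the padding bases that were added to fill the last byte.
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack("GATTACA")
    print(len("GATTACA"), "bases stored in", len(packed), "bytes")
    print(unpack(packed, 7))
```

Stored as plain text, each base would occupy a full byte, so this fixed two-bit packing already cuts the size to a quarter before any pattern-based compression is applied.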

DNA sequences, however, are not random: they contain repeating sections, palindromes, and other features that could be represented by fewer bits than are required to spell out the complete sequence in binary. A repeat pattern could be abbreviated to, say, the binary equivalent of "six times G", which would be a few bits shorter than explicitly writing "GGGGGG" in binary. Similarly, palindromes could be abbreviated in code relative to their complementary pattern in the DNA sequence.
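The two ideas in this paragraph can be sketched in Python. This is only an illustration of the general principle, not the researchers' algorithm: run_length_encode turns "GGGGGG" into a single (base, count) pair, and reverse_complement computes the complementary pattern, assuming that "palindrome" here means a stretch that reads the same as its reverse complement (such as GAATTC). All function names are hypothetical.

```python
from itertools import groupby

# Watson-Crick pairing: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def run_length_encode(seq):
    """Collapse each run of identical bases into a (base, count) pair."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def run_length_decode(runs):
    """Expand (base, count) pairs back into the original string."""
    return "".join(base * count for base, count in runs)


def reverse_complement(seq):
    """Return the complementary strand read in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    runs = run_length_encode("GGGGGGAT")
    print(runs)
    print(run_length_decode(runs))
    print(is_dna_palindrome("GAATTC"))
```

Run-length coding only pays off when runs are long enough that the count costs fewer bits than the repeated bases, which is why practical compressors mix several strategies rather than relying on one.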

Many computer users are familiar with compression software that can remove "redundant" code from a music file - to produce an mp3 - or an image - to make a jpg. However, these compression methods lose information. Less familiar to many users are lossless compression methods such as FLAC for sound files, TIFF for images, and the "zip" format for documents and other files. Lossless compression exploits the repeats, palindromes and patterns present in the digital data to reduce the overall size of the file in question.
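To make the meaning of "lossless" concrete, the sketch below uses Python's standard zlib module, which implements the DEFLATE method commonly used in zip files. It is a general-purpose compressor, not a DNA-specific one, and the helper name roundtrip and the artificial repetitive test sequence are assumptions made for this example.

```python
import zlib


def roundtrip(seq):
    """Compress and decompress a sequence; return (original size, compressed size, restored text)."""
    raw = seq.encode("ascii")
    compressed = zlib.compress(raw, 9)  # level 9 = strongest compression
    restored = zlib.decompress(compressed).decode("ascii")
    return len(raw), len(compressed), restored


if __name__ == "__main__":
    # A deliberately repetitive, made-up sequence; real DNA is far less regular.
    sequence = "ACGTTGCA" * 125
    original_size, compressed_size, restored = roundtrip(sequence)
    print("original bytes:  ", original_size)
    print("compressed bytes:", compressed_size)
    print("identical after decompression:", restored == sequence)
```

The decompressed text matches the input exactly, which is the property that distinguishes lossless methods from mp3 or jpg compression.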

Now, Taysir Soliman of the Faculty of Computer and Information at Assiut University, and colleagues Tarek Gharib, Alshaimaa Abo-Alian, and M.A. El Sharkawy of the Faculty of Computer and Information Sciences at Ain Shams University, have developed a Lossless Compression Algorithm (LCA) that works with digitized DNA sequences to reduce the amount of computer storage needed for each sequence.

LCA achieves a better compression ratio than existing compression algorithms for DNA, such as GenCompress, DNACompress, and DNAPack, the team says. The same approach could also be used for protein sequences.

The compression algorithm may also have direct application in DNA research, the team suggests. They are now investigating ways in which the results of the compression might be used to differentiate between sections of a DNA sequence that code for proteins and those in the sequence that do not, so-called non-coding regions.

Source: Inderscience

Original article posted on Medical News Today.
Articles not to be reproduced without permission of Medical News Today