Digitisation and the role of museums in modern science
Digitisation of museum collections, taxonomy, and evolutionary research is a new and growing field, and museums such as the Western Australian Museum are in the process of creating digital counterparts for all their physical objects.
Museums are expanding their tissue collections, collecting digital genetic data, and making this data accessible through online genetic data repositories, which have become crucial in advancing scientific research and understanding the vast diversity of life.
WA Museum’s contribution to global DNA repositories
The WA Museum has been sequencing DNA from preserved specimens for many years and currently holds an estimated 60,000 DNA sequences. In the past six months, its Genetic Resources department has uploaded more than 10,000 of these sequences to GenBank, a major global genetic data repository managed by the National Center for Biotechnology Information (NCBI).
The evolution of genetic data repositories
Modern DNA sequencing and the creation of genetic databases began in the 1970s. Since the 1990s, the development of automated and more advanced genetic sequencing technologies, along with their decreasing cost, greater accessibility, and growing data storage capacities, has led to the creation of large repositories to store and share the enormous amounts of new data. Widely used online public repositories for storing and sharing genetic data include:
- GenBank (NCBI)
- The European Nucleotide Archive (European Molecular Biology Laboratory)
- The DNA Data Bank of Japan.
These repositories hold vast amounts of genetic sequence data, that is, the order of the DNA building blocks that make up an organism’s genetic code: the nucleotides adenine (A), guanine (G), cytosine (C) and thymine (T). They collaborate closely with each other, sharing this data in a way that ensures global accessibility. All three are part of the International Nucleotide Sequence Database Collaboration and exchange data daily to ensure accuracy and consistency.
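To make the idea of sequence data concrete, the short Python sketch below treats a sequence as a string of the four nucleotide letters and counts how often each one appears. The example fragment and the function name `count_nucleotides` are made up for illustration and are not taken from any repository record.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"  # adenine, cytosine, guanine, thymine


def count_nucleotides(sequence: str) -> dict[str, int]:
    """Count each nucleotide in a DNA sequence, rejecting unexpected letters."""
    sequence = sequence.upper()
    unexpected = set(sequence) - set(NUCLEOTIDES)
    if unexpected:
        raise ValueError(f"Unexpected characters: {sorted(unexpected)}")
    counts = Counter(sequence)
    # Report all four bases, including any that never occur (count 0).
    return {base: counts[base] for base in NUCLEOTIDES}


# Hypothetical 12-letter fragment, not a real record.
example = "ATGCGTACGTTA"
print(count_nucleotides(example))  # {'A': 3, 'C': 2, 'G': 3, 'T': 4}
```

Real records can also contain ambiguity codes such as N for an unresolved base; this sketch deliberately rejects them to keep the example small.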
Challenges facing online DNA databases
However, the fast growth of online databases has not come without hurdles. It wasn’t until the early 2000s, with the rise of genetic barcoding (identifying an organism by a short section of its genetic code), that the challenges with species identification and data quality became more apparent.
Issues include:
- Species may not be formally described or correctly identified.
- Sequences may only be assigned species codes rather than scientific names.
- An estimated fewer than four in 10 uploaded invertebrate sequences have been identified to species level, meaning that the rest are uploaded at genus or even family level.
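As a rough illustration of how the share of species-level identifications could be measured, the Python sketch below checks whether an organism label looks like a two-part "Genus species" name and reports the fraction that do. The rule is a simplified heuristic assumed for this example, not a repository standard, and the labels use the placeholder names "Aus bus" and "Cus dus" rather than real taxa.

```python
import re

# Markers suggesting a record was not identified to species.
# This list is an assumption made for illustration only.
OPEN_NOMENCLATURE = {"sp", "sp.", "spp.", "cf.", "aff.", "gen."}


def is_species_level(label: str) -> bool:
    """Heuristic: True if the label looks like a 'Genus species' binomial."""
    words = label.split()
    if len(words) < 2:
        return False  # a single word is a genus, family or other higher rank
    genus, epithet = words[0], words[1]
    if epithet.lower() in OPEN_NOMENCLATURE:
        return False  # e.g. 'Aus sp. WAM-001' is a species code, not a name
    genus_ok = re.fullmatch(r"[A-Z][a-z]+", genus) is not None
    epithet_ok = re.fullmatch(r"[a-z-]+", epithet) is not None
    return genus_ok and epithet_ok


def species_level_fraction(labels: list[str]) -> float:
    """Fraction of labels that pass the species-level heuristic."""
    if not labels:
        return 0.0
    return sum(is_species_level(label) for label in labels) / len(labels)


# Hypothetical labels built from placeholder names.
labels = ["Aus bus", "Aus sp. WAM-001", "Aus", "Ausidae", "Aus cf. bus", "Cus dus"]
print(f"{species_level_fraction(labels):.2f} of labels are at species level")
```

A production check would rely on a curated taxonomy rather than pattern matching, since a label that merely looks like a binomial can still be wrong.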
Understanding the causes of these limitations
These limitations stem in part from:
- Rapid, bulk data uploads without the capacity for thorough validation
- Difficulty in species identification
- Lack of standardised metadata, and
- The volume of data outpacing available resources for verification.
How museums are helping fill the gaps
In recent years, there has been a concerted effort to address these issues, with museums and research organisations working to provide high-quality, verified sequences to help fill the gaps in genetic data.
By contributing expertly identified sequences linked to voucher specimens (the actual animals the sequences came from), the WA Museum is helping to improve data reliability, address the issue of unidentified sequences online, and support more accurate species identification and biodiversity research.
By combining decades of carefully curated specimen collections with modern DNA sequencing technologies, organisations like the WA Museum are also playing a key role in enriching these databases and supporting future research and conservation efforts.
This Legacy Collection Project is funded by the Foundation for the WA Museum as part of the strategic initiative Improving Accessibility to the WA Museum State Collection for All.