ChemSpider: a Chemistry Centric Hub to Navigate the Internet
The Royal Society of Chemistry (RSC) is the largest organization in Europe with the specific mission of advancing the chemical sciences. As part of this mission, RSC provides the ChemSpider database to the community so that chemists can source data, participate in annotating and curating existing data, and expose and publish their own chemistry data in a publicly accessible database.
RSC intends for ChemSpider to become the world’s foremost free-access chemistry database and a central hub connecting chemistry in its various forms: experimental and predicted data, publications, patents and more. While size is one common measure of a chemistry database's success and importance, data quality remains paramount, and stringent efforts using both algorithmic approaches and crowd-based curation are brought to bear to raise data quality above that commonly found in online databases. There is, of course, more value in finding the correct answer to a question than in finding just an answer, and just an answer is all too often what internet-based queries deliver!
The ChemSpider database now contains over 28 million structures aggregated from over 400 data sources and grows daily. It has been built from contributions and depositions by chemical vendors, commercial database vendors, government databases, publishers, members of the Open Notebook Science community and individual scientists. One of these sources is RSC's own content databases, together with data associated with articles published by RSC. This ensures that new science published with RSC is linked up and made available via a freely accessible compound database. ChemSpider provides flexible querying, including structure and substructure searching (especially important to chemists!), alphanumeric text searching of chemical names, and searching on both intrinsic and predicted molecular properties. Specialized searches have been added to cater to particular user personae, for example mass spectrometrists and medicinal chemists, making ChemSpider very flexible in its applications.
The top of the ChemSpider record for domoic acid (http://www.chemspider.com/4445428). The full record spans multiple pages, including links to patents and publications, pre-calculated and experimental properties, and links to many external data sources and informational websites.
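Structure and substructure searching requires a cheminformatics toolkit and is beyond a short example, but the way name and property criteria combine in the searches described earlier can be sketched simply. The Python below is a minimal, hypothetical illustration over an in-memory list; the `CompoundRecord` class and `search` function are invented for this example and are not part of ChemSpider or its programming interface. Masses are average molecular masses in g/mol.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    # Hypothetical, pared-down record; a real ChemSpider record holds far more fields.
    name: str
    formula: str
    avg_mass: float  # average molecular mass, g/mol
    synonyms: list = field(default_factory=list)


def search(records, text=None, mass_range=None):
    """Return records whose name or a synonym contains `text` (case-insensitive)
    and whose average mass lies inside `mass_range` (inclusive). Either
    criterion may be omitted; omitted criteria do not filter anything."""
    hits = []
    for rec in records:
        if text is not None:
            query = text.casefold()
            names = [rec.name, *rec.synonyms]
            if not any(query in n.casefold() for n in names):
                continue
        if mass_range is not None:
            low, high = mass_range
            if not low <= rec.avg_mass <= high:
                continue
        hits.append(rec)
    return hits


records = [
    CompoundRecord("Aspirin", "C9H8O4", 180.16, ["acetylsalicylic acid"]),
    CompoundRecord("Caffeine", "C8H10N4O2", 194.19, ["1,3,7-trimethylxanthine"]),
    CompoundRecord("Domoic acid", "C15H21NO6", 311.33),
]

# Text and property criteria narrow the result set together.
print([r.name for r in search(records, text="acid", mass_range=(300, 320))])
```

Treating each criterion as an independent filter is what lets a text query be combined freely with property ranges; a production system would use database indexes rather than a linear scan.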
ChemSpider also has a special role in the chemical sciences as a platform for chemists to participate. With the growing prevalence of social-network-style engagement, in which members of the community contribute directly (consider Wikipedia, one of the most successful of these social participation experiments!), the database can be expanded with an individual’s own data. As a result, ChemSpider now holds a few thousand analytical spectra (primarily NMR), as well as detailed procedures for making chemicals in the ChemSpider Synthetic Pages (http://cssp.chemspider.com). Deposited data can be checked to a certain extent at deposition, but curation remains challenging, so game-based approaches are being used to check the quality of the spectral data. For example, the Spectral Game (http://www.spectralgame.com) is a website where users learn to interpret NMR spectra in a game-based environment; their results are monitored and have been used to expose errors in the data, which can then be addressed. Game-based curation of this kind lowers the barrier to participation: asking the community to “Please inspect these spectra one at a time and validate them” would certainly attract fewer participants than “Learn how to interpret NMR spectra with this game.” We continue to consider such creative approaches to improving data quality.
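How game results are turned into error reports is not described in detail here, but one plausible heuristic can be sketched under stated assumptions. The sketch assumes each game round records which structure a player picked for a spectrum, and it flags spectra whose deposited structure is rejected by most players once enough answers have accumulated. The function name `flag_suspect_spectra`, its thresholds and the data layout are hypothetical and do not describe the Spectral Game's actual implementation.

```python
from collections import Counter, defaultdict


def flag_suspect_spectra(answers, recorded, min_answers=20, threshold=0.6):
    """Hypothetical curation heuristic.

    answers:  iterable of (spectrum_id, structure_chosen_by_player)
    recorded: dict mapping spectrum_id -> structure deposited with the spectrum
    Returns (spectrum_id, disagreement_rate, most_popular_alternative) tuples,
    most suspicious first.
    """
    votes = defaultdict(Counter)
    for spectrum_id, chosen in answers:
        votes[spectrum_id][chosen] += 1

    flagged = []
    for spectrum_id, counter in votes.items():
        total = sum(counter.values())
        if total < min_answers:
            continue  # too few answers to judge this spectrum fairly
        deposited = recorded[spectrum_id]
        disagreement = 1 - counter[deposited] / total
        if disagreement >= threshold:
            # The structure players prefer instead hints at what the data may really show.
            alternatives = [s for s, _ in counter.most_common() if s != deposited]
            best_alt = alternatives[0] if alternatives else None
            flagged.append((spectrum_id, round(disagreement, 2), best_alt))

    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Made-up example: S1 looks mislabelled, S2 looks fine, S3 has too few answers.
answers = (
    [("S1", "ethanol")] * 5 + [("S1", "methanol")] * 20
    + [("S2", "toluene")] * 22 + [("S2", "benzene")] * 3
    + [("S3", "acetone")] * 10
)
recorded = {"S1": "ethanol", "S2": "toluene", "S3": "propanal"}
print(flag_suspect_spectra(answers, recorded))
```

Requiring a minimum number of answers keeps a handful of wrong guesses from flagging a good spectrum; in a setup like this, a flagged record would go to a human curator rather than being changed automatically.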
As a rich resource of chemistry-related data, ChemSpider is in demand not only among chemists but also among analytical instrument vendors, who have integrated their instrumentation software directly with its programming interface. In this way, mass spectrometry vendors such as Waters, Bruker, Agilent and Thermo have plugged ChemSpider into their software packages, which can send a list of mass spectral peaks to query the database and return a set of compounds consistent with the data. Such integrations are becoming increasingly popular, and ChemSpider underpins large-scale life-science projects, including the Open PHACTS project (http://www.openphacts.org), which will deliver a semantic web solution integrating chemistry and biology data, primarily across public domain databases.
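The peak-list queries described above rest on matching measured masses against candidate molecular formulas within a tolerance. The sketch below shows that idea locally rather than calling ChemSpider's interface; the in-memory `candidates` table, the helper names, the assumption that every peak is a singly protonated [M+H]+ ion and the 5 ppm tolerance are all illustrative choices, not details of any vendor integration. The formula parser handles only simple formulas without parentheses or charges.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207100,
}
PROTON = 1.00727646  # proton mass (Da), added to M to form an [M+H]+ ion


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def match_peaks(peaks, candidates, ppm=5.0):
    """Map each observed [M+H]+ m/z to (name, error_ppm) pairs whose
    neutral monoisotopic mass agrees within the given ppm tolerance."""
    masses = {name: monoisotopic_mass(f) for name, f in candidates.items()}
    results = {}
    for mz in peaks:
        neutral = mz - PROTON  # strip the proton to recover the neutral mass
        hits = []
        for name, mass in masses.items():
            error_ppm = (neutral - mass) / mass * 1e6
            if abs(error_ppm) <= ppm:
                hits.append((name, round(error_ppm, 2)))
        results[mz] = hits
    return results


# Hypothetical candidate table standing in for a database query.
candidates = {
    "caffeine": "C8H10N4O2",
    "aspirin": "C9H8O4",
    "glucose": "C6H12O6",
    "domoic acid": "C15H21NO6",
}
matches = match_peaks([195.0877, 181.0495, 312.1442], candidates)
for mz, hits in matches.items():
    print(mz, hits)
```

Aspirin and glucose share a nominal mass of 180 but differ by about 0.02 Da, which is why an accurate-mass tolerance in ppm separates them. Isomers with identical formulas cannot be separated this way, so a mass match yields compounds "consistent with the data" rather than a single identification.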
ChemSpider started as a structure-centric hub attempting to link together structure-based data sources on the web. In the five years since its inception, the number of websites containing chemistry data has exploded. New web technologies have emerged and existing approaches have changed dramatically, altering what is feasible in terms of integration, visualization and programming interfaces. The rise of mobile technologies has created a need for mobile-optimized websites and for “apps” that run on phones and tablets. ChemSpider has already delivered both mobile-optimized sites and apps running on the iOS operating system (for example, on the iPhone and iPad). So, what is the future for the ChemSpider platform?
Chemistry is far more complicated than the small molecules that can be hosted in an online database. Polymers, gels and allotropes, for example, have properties and can be characterized with data, but cannot currently be represented in ChemSpider; our future data model will encompass them. Data are the foundation of predictive models, and we are already considering hosting existing models and rebuilding models on the fly as new data arrive from new publications or from members of the chemistry community. What can be done to further integrate chemistry data with publications, patents, open notebook science data and the increasing flow of data to the web? We have no shortage of ideas, and we encourage users of ChemSpider to give feedback on the existing service, challenge us to provide solutions for their needs and, in general, help us extend the platform's capabilities to serve the community. By combining innovative approaches to providing an integrated hub for chemistry with support for our growing community of users, we are likely to retain our role as one of the underpinning resources for science on the internet.