An integral part of doing research is collecting data. Data analysis, published in the form of findings, discussion and conclusions, is a key aspect of the research article, which is the basis of science communication. In the past, data sets were rarely included in the final publication shared with the public. Some researchers indicated that data sets for their research were available upon request, but more often they were locked away on hard drives and in file cabinets. As servers are upgraded and research staff move on to new institutions, much of that data, and specifically the information about the data, is lost.
Sharing data openly with the research community encourages collaboration and reproducibility (Kim & Burns 2016). Openly shared data itself encourages scientific investigation. One study analyzed 1,200 articles, published in over 250 journals, which cited data shared in the Gene Expression Omnibus (GEO) (Xia & Liu 2013). This study shows that researchers can publish highly cited articles in reputable journals by reusing data from other sources.
One reason that sharing data files hasn’t always been part of the scholarly publication process is that the size and organization of data files are not a good fit for traditional print journals. Online-only publications have more flexibility in how they present and link to additional sources, but they often lack the storage space. An increasing number of data repositories now connect the data to the published research, to the benefit of the entire scientific community.
Data repositories are collection hubs that store and organize data sets created by research labs and institutions. Data repositories also make the data searchable by indexing it for the web. New funding guidelines often require that all data generated by a research project be permanently housed in a data repository.
Data repositories are often organized by discipline. You can find a home for your data sets, or find data related to your research project, by searching the site re3data.org. re3data is a global registry of data repositories started in 2012 and funded by the German Research Foundation. Several organizations and institutions help run re3data, including Purdue Libraries. Many journal publishers are beginning to partner with sites like re3data to connect data sets to the published research.
UIC has its own data repository, the newly upgraded Indigo. Using Indigo, UIC researchers can permanently store data sets. If made public, those data sets can also be easily found by internet searches. The Scholarly Communications Librarian, Sandy DeGroote, can help you with setting up accounts and uploading files.
Look at UIC’s Quick Guide to Data Repositories to find out more about some of the larger sites that accept work in all disciplines. UIC Library also provides a list of repositories by discipline. We regularly offer classes and webinars about data management and storage. If you would like more information, please email me, Cathy Lantz (email@example.com).
1. Kim, Y., & Burns, C. S. (2016). Norms of data sharing in biological sciences: The roles of metadata, data repository, and journal and funding requirements. Journal of Information Science, 42(2), 230-245.
2. Xia, J., & Liu, Y. (2013). Usage patterns of open genomic data. College & Research Libraries, 74(2), 195-207.