10 tips to preserve data for the long haul
InfoWorld, December 24, 2008
Given the popularity of electronic communication these days, it isn’t surprising that digital storage has become a problem. According to IDC, more than 452 exabytes of information were created and replicated in 2008, an amount greater than the world’s available storage capacity. So what gives? Why can’t many of us distinguish what needs to be stored from what should be purged?
“Not all data should be preserved, but efforts to save important information are being stymied by many factors: complacency, fear that the problem of long-term digital access and preservation is too big to take on, inadequate funding, confusion, and lack of alignment among stakeholders, a new report says.”
So what can IT departments do to better prioritize their information and improve their data preservation plans? Fran Berman, director of the San Diego Supercomputer Center (SDSC) at the University of California, San Diego, has 10 tips that can help:
- Make a detailed plan for the stewardship and preservation of your data, from its inception to the end of its lifetime.
- Be aware of data costs including hardware, software, support and time, and include them in your overall IT budget. Determine whether it is more cost-effective to regenerate some of your information rather than preserve it over a long period.
- Associate metadata with your data. Identify relevant standards for data and metadata content and format, and follow them to make sure the data can be used by others.
- Make multiple copies of valuable data. Store some copies off-site and in different systems.
- Plan ahead of time for the transition of digital data to new storage media. Plan budgets for new storage and software technologies, file-format migrations, and time. Move data to new technologies before your storage media become obsolete.
- Plan for transitions in data stewardship. If the data eventually will be turned over to a formal repository, institution or other custodial environment, make sure it meets the requirements of the new environment and that the new steward indeed agrees to take it on.
- Determine the level of “trust” required when choosing how to archive data. Are the resources of the U.S. National Archives and Records Administration necessary, or will Google do?
- Tailor plans for preservation and access to the specific needs of users. Gene-sequence data used daily by hundreds of thousands of researchers worldwide may need a preservation and access infrastructure that’s different from the infrastructure needed, for example, for digital photos viewed occasionally by family members.
- Pay attention to security. Be aware of what you must do to maintain the integrity of your data.
- Know the regulations. Know whether copyright, the Health Insurance Portability and Accountability Act of 1996, the Sarbanes-Oxley Act of 2002, the U.S. National Institutes of Health publishing expectations, or other policies or regulations are relevant to your data. That way, you can make sure your approach to stewardship and publication is compliant.
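Two of the tips above, keeping multiple copies and maintaining data integrity, can be combined in practice by recording a checksum for every file and re-checking each copy against that record. The sketch below is one possible way to do this in Python, not something taken from Berman's list: the function names (`sha256_of`, `build_manifest`, `verify_copy`) are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or altered in a copy."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

A manifest built once from the master copy could be saved alongside the data as part of its metadata, then passed to `verify_copy` for each off-site copy on a regular schedule. An empty list means every file in the manifest is present and unchanged in that copy.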