Curation Core Service Specification
- The Libraries commit to maintaining this service, at minimum as defined in this specification, for 10 years (2011-2021). The service assumes a hosted instance of HUBzero as its primary infrastructure.
- All publicly accessible data objects will be "curated data". This service applies only to curated data, which will be stewarded by the Libraries and may be a subset of the data objects resident in the dataHUB.
- All curated data objects will be assigned a DataCite DOI that resolves to a representation of the object in the dataHUB.
- A metadata record for each curated data object will be exposed via an OAI-PMH web service so that it can be harvested. At minimum, this will be an unqualified Dublin Core record containing the object's title, date, and description, with the object's DOI exposed as its identifier. If additional metadata formats are available, these will also be exposed via OAI-PMH.
- Data objects in the dataHUB will be arranged into collections by subject or by department/center; these collections will be the primary (default) user interface for browsing.
- Metadata describing data objects will be indexed and searchable by keyword, at minimum with Boolean operators, and by field where possible.
- A librarian will be identified for each subject collection in the hub and will be its primary point of contact. Librarians will have access to perform collection management for their collections in the dataHUB. The AskALibrarian widget will be embedded on all pages that relate to data collections.
- The subject specialist librarian for each corresponding subject or department will have the authority and responsibility for the management of the collection and will work with data creators and users, guided by data management plans, user needs, and library practice and policies, in the long-term curation of the collection.
- The Libraries will ensure the bit-level preservation of all curated objects in the dataHUB. The fixity of data objects must be established when they become curated, and the dataHUB must support auditing and reporting for preservation purposes.
- The dataHUB must allow on-demand export or replication of all curated objects, their metadata, and semantics in an interoperable manner, so that the collection can be adequately represented and this service continued in an environment other than HUBzero.
- All data will be submitted by their owners, who will consent to a nonexclusive, perpetual license granting Purdue the right to manage the data. This license, along with related data management plans, preservation requirements, and other policies that apply to the objects, must be expressed and archived with the objects themselves, preferably in a machine-actionable format.
- Data objects can be embargoed for a period of time or have access limited. The lifting of embargoes can be automated (for example, after a period of time elapses).
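The minimal OAI-PMH metadata requirement above (an unqualified Dublin Core record with title, date, and description, and the DOI as identifier) can be sketched with Python's standard library as follows. This is a sketch under stated assumptions, not the dataHUB implementation: the function name build_oai_dc is hypothetical, and writing the identifier with a "doi:" prefix is an assumed convention. The namespaces and schema location are the ones defined for the oai_dc format in OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the OAI-PMH 2.0 oai_dc format and Dublin Core 1.1.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_LOCATION = (
    "http://www.openarchives.org/OAI/2.0/oai_dc/ "
    "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
)

# Register prefixes so the serialized XML uses oai_dc:, dc:, and xsi:.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)


def build_oai_dc(title: str, date: str, description: str, doi: str) -> str:
    """Return an <oai_dc:dc> metadata payload for one curated object.

    Hypothetical helper; the "doi:" identifier prefix is an assumption.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    root.set(f"{{{XSI_NS}}}schemaLocation", SCHEMA_LOCATION)
    # The minimal fields required by the specification, in a fixed order.
    for name, value in (
        ("title", title),
        ("date", date),
        ("description", description),
        ("identifier", f"doi:{doi}"),
    ):
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, build_oai_dc("Sample dataset", "2010-11-29", "An example.", "10.1234/example") returns a record that an OAI-PMH GetRecord or ListRecords response could wrap in its metadata element; the DOI shown is a placeholder.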
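Embargoes and limited access, with automatic lifting once a date passes, could be modeled along these lines. This is a hypothetical sketch: the CuratedObject class, its embargo_until and restricted_groups fields, and the two functions are illustrative assumptions, not part of HUBzero. An object is treated as embargoed strictly before its embargo date, and a periodic job clears embargoes whose date has arrived.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass
class CuratedObject:
    """Hypothetical record for a curated object's access settings."""
    doi: str
    embargo_until: Optional[date] = None  # None means not embargoed
    restricted_groups: FrozenSet[str] = frozenset()  # empty means public


def is_accessible(obj: CuratedObject, today: date,
                  user_groups: Iterable[str] = ()) -> bool:
    """Return True if a user in user_groups may access obj on the given day."""
    if obj.embargo_until is not None and today < obj.embargo_until:
        return False  # embargo still in force
    if obj.restricted_groups and obj.restricted_groups.isdisjoint(user_groups):
        return False  # access limited to groups the user is not in
    return True


def lift_expired_embargoes(objects: List[CuratedObject], today: date) -> List[str]:
    """Clear embargoes whose date has arrived; return the DOIs that were lifted."""
    lifted = []
    for obj in objects:
        if obj.embargo_until is not None and obj.embargo_until <= today:
            obj.embargo_until = None
            lifted.append(obj.doi)
    return lifted
```

Running lift_expired_embargoes from a daily scheduled job is one way to automate lifting; is_accessible already honors the date on its own, so the job mainly keeps stored state and reports consistent.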
Software Development Gap
Minimal software development is needed to extend the HUBzero platform to support these functions:
- Capability to promote data objects to (and demote them from) "curated" status, and to distinguish curated objects from other data objects.
- OAI-PMH implementation (from previous project)
- DataCite DOI integration (Handle integration exists from previous project)
- Embed QWidget in web interface.
- Capability to support multiple metadata formats?
- Capability to establish fixity and audit/report object existence and integrity.
- Capability to export or replicate dataHUB content on demand to another environment.
- Embargo functionality.
- Per-collection access control level for librarians.
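The fixity capability (establishing checksums when an object becomes curated, then auditing existence and integrity) could look roughly like this. It is a minimal sketch that assumes SHA-256 as the checksum algorithm and a simple path-to-digest manifest; the function names are hypothetical and the specification does not prescribe an algorithm or manifest format.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large data objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def establish_fixity(paths: List[Path]) -> Dict[str, str]:
    """Record a checksum for each file at the moment it becomes curated."""
    return {str(p): sha256_of(p) for p in paths}


def audit(manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare current files against the stored manifest and report the results."""
    report: Dict[str, List[str]] = {"ok": [], "missing": [], "altered": []}
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file():
            report["missing"].append(name)  # object no longer exists
        elif sha256_of(path) != expected:
            report["altered"].append(name)  # content changed since curation
        else:
            report["ok"].append(name)
    return report
```

Keeping the manifest alongside each object fits the requirement that preservation information be archived with the objects themselves, and the report dictionary can feed the auditing and reporting the dataHUB must support.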
11/29/10 Michael Witt
