This repository was archived by the owner on Jun 18, 2024. It is now read-only.

Guidance on defining collections to group datasets released as a series or in fragments #258

Closed
@philipashlock


Data.gov has had the notion of a "collection" that can be used to group multiple datasets that would logically be considered a single dataset but have been released in separate parts. The most common scenario is a series of releases over time. In some cases a dataset may be published in monthly or yearly releases, but if the only thing that distinguishes them is the date, they should really be packaged as a single dataset. This also makes browsing simpler: it prevents many similar datasets from crowding out more unique ones. Some datasets might also be published by location, such as data relating to each state being released as a separate file. These should also be grouped together to appear as a single dataset.

Ideally, agencies should package these together as a single file or release before publishing; e.g., one file that is continuously updated is preferable to separate releases over time. At the very least, though, there should be a way to define this kind of grouping at the metadata level, as is currently the case on data.gov.

Data.gov handles this by treating the collection as a normal dataset entry that refers to many child entries. Something similar could be done with the data.json schema, but we would need to establish a convention for defining that parent/child relationship between entries.
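To make the parent/child idea concrete, here is a minimal sketch of one possible convention, not an agreed one: each child entry carries a key pointing at its parent's `identifier`. The key name `isPartOf`, the sample identifiers, and the helper functions are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical data.json-style entries. The "isPartOf" key is an assumed
# convention for pointing a child entry at its parent's identifier.
catalog = [
    {"identifier": "tiger-2010-blocks", "title": "2010 Census Blocks (series)"},
    {"identifier": "tiger-2010-blocks-al", "title": "2010 Census Blocks, Alabama",
     "isPartOf": "tiger-2010-blocks"},
    {"identifier": "tiger-2010-blocks-ak", "title": "2010 Census Blocks, Alaska",
     "isPartOf": "tiger-2010-blocks"},
    {"identifier": "unrelated-dataset", "title": "Some standalone dataset"},
]


def group_collections(entries, parent_key="isPartOf"):
    """Map each parent identifier to the identifiers of its child entries."""
    children = defaultdict(list)
    for entry in entries:
        parent_id = entry.get(parent_key)
        if parent_id is not None:
            children[parent_id].append(entry["identifier"])
    return dict(children)


def top_level(entries, parent_key="isPartOf"):
    """Return identifiers of entries that are not children of a collection."""
    return [e["identifier"] for e in entries if parent_key not in e]


if __name__ == "__main__":
    print(group_collections(catalog))
    print(top_level(catalog))
```

With a convention like this, a catalog browser could list only the top-level entries and show the children when a collection is opened, which is roughly how the data.gov example below behaves.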

Here's a current example of a collection on data.gov:

View of the collection "parent" metadata:
http://catalog.data.gov/dataset/tiger-line-shapefile-2010-series-information-file-for-the-2010-census-block-state-ba

View of all its "child" datasets:
http://catalog.data.gov/dataset?collection_package_id=2a8b7f0b-1ae5-453c-ba56-996547266a63
