DATALHUB: a digital data repository and an alternative to CKAN

This user guide covers using DATALHUB’s web interface to organize, publish and find datasets and digital assets.

Datasets and resources

DATALHUB is organized around two concepts: the Dataset and the Datum.
A Dataset is a (digital) data container: a structure that describes a set of digital assets and provides metadata about them. For example, the budget of a country can be held in a Dataset. The Dataset provides general information (metadata) on the provenance of the information, its licensing and its authorship. Digital assets (datums) such as the factual budget and the approved budget are related to the Dataset and made available for download. Each digital asset has additional technical metadata.
A Datum entity is a digital asset or, in plain language, a file.
Datum or a Digital Asset in DATALHUB

Using DATALHUB

Please refer to the Installation guide for instructions on creating the first user in DATALHUB.

Administration Section

Upon successful login, a mini fly-out administration section becomes available behind a Gear Icon on the right side of the screen.

Datalhub - Administration menu behind the Gear Icon

Clicking the Gear Icon brings up a simple menu that points to the groups of elements you are able to alter with the permissions you have been granted.

Generally these elements are:

  • Dataset
  • Organisation
  • Category
  • License
  • Partners
  • (User) Profile

Adding a new dataset

Clicking Dataset in the Administration menu leads to a page listing all existing datasets. Selecting a dataset from the list opens a form for editing it.

The same form is used to add a new dataset, and it is self-explanatory. An important feature is the ability to ingest digital assets in parallel: through this form, users of the website can upload a large number of digital assets at once. Technical metadata for these assets is created on the fly.

DATALHUB will ask for the following information about each dataset.

  • Title – A title describing the dataset.
  • Description – A free form section describing the dataset.
  • Organisation – Relates a dataset to an organisation. Normally this will point to the organisation that issued or contributed to the creation of the digital dataset.
  • Author – Indicates the creator of the dataset in the DATALHUB.
  • Maintainer – Refers to a person or organisation that maintains and takes care of the dataset. This is a free text field.
  • Created & Revision – Date references that place each operation on the dataset at a specific point in time.
  • Lineage – A free-text field that should be used to point to prior datasets that are extended by the current dataset. This is a classical provenance reference.
  • Category – Pins the current dataset to a specific category or topic of interest. The category is selected from a drop-down list populated with the categories defined in the Categories section.
  • Update Frequency – Will probably be rarely used, or filled with “never” in most cases.
  • Tags – Keywords that best describe the dataset being uploaded, for example “health” or “North Albania”.
  • License – It is important to include license information so that people know how they can use the data. This field is a drop-down box.

Adding Organisations, Categories, Licenses

A Dataset contains references to lists of predefined entities: organisations, categories and licenses. These predefined entities can be amended at any time through a similar modification menu. Clicking the link for each of these entities in the Administration fly-out menu leads to the respective section, where you can create, edit or remove (if not referenced) instances of that entity.

Search

Search is based on an AngularJS faceted search. It behaves well up to roughly a dozen thousand dataset records. A faceted search based on a document index, which would have no problem handling millions of records, is planned, although work on it has not started.
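
As a rough illustration of what a faceted search does, the sketch below filters a small in-memory list of records by the selected facet values and counts the remaining values of another facet. It is written in Python rather than AngularJS and is not DATALHUB's actual search code; the function names, field names and sample records are hypothetical.

```python
from collections import Counter


def filter_records(records, selected):
    """Keep only the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(record[facet] for record in records if facet in record)


# Hypothetical sample records, loosely modelled on the dataset form fields.
records = [
    {"title": "Approved budget", "category": "Finance", "license": "CC-BY"},
    {"title": "Factual budget", "category": "Finance", "license": "CC0"},
    {"title": "Hospital list", "category": "Health", "license": "CC-BY"},
]

# Narrow the list to one category, then count licenses among the matches.
finance = filter_records(records, {"category": "Finance"})
licenses = facet_counts(finance, "license")
```

Because every record is scanned in memory on each selection, an approach like this is comfortable for thousands of records but not for millions, which is where a document index would come in.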

API

This section documents DATALHUB’s API for developers who want to write code that interacts with DATALHUB sites and their data. The following operations are available through the API.
Get a JSON-formatted list of a site’s datasets:

http://$DOMAIN/api/dataset
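
A minimal Python sketch of this call, using only the standard library, might look as follows. The function names are illustrative, the domain you pass in stands for your own site, and the shape of the decoded JSON is not documented here, so the result is returned as-is.

```python
import json
from urllib.request import urlopen


def dataset_list_url(domain: str) -> str:
    """Build the dataset-list URL for a DATALHUB site."""
    return f"http://{domain}/api/dataset"


def fetch_datasets(domain: str, timeout: float = 10.0):
    """Fetch the dataset list and decode the JSON response.

    Assumption: the response body is UTF-8 encoded JSON.
    """
    with urlopen(dataset_list_url(domain), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_datasets("data.example.org")` (a placeholder domain) would issue the GET request and return whatever structure the site sends back.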

Get the JSON/XML-formatted record of a specific dataset:

http://$DOMAIN/api/dataset/$datasetID

The default response is JSON, but you can easily switch to XML by appending .xml to the end of the call:

http://$DOMAIN/api/dataset.xml
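
The URL patterns above can be collected in one small helper. This is a sketch only: the helper name and its parameters are hypothetical, and appending .xml after a record ID is an assumption based on the statement that the suffix goes at the end of the call.

```python
from typing import Optional, Union


def api_url(
    domain: str,
    resource: str,
    record_id: Optional[Union[int, str]] = None,
    fmt: str = "json",
) -> str:
    """Build a DATALHUB API URL for a resource such as "dataset" or "datum"."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    path = f"/api/{resource}"
    if record_id is not None:
        path += f"/{record_id}"
    if fmt == "xml":
        # JSON is the default; XML is requested by appending ".xml".
        # Assumption: the suffix also works after a record ID.
        path += ".xml"
    return f"http://{domain}{path}"
```

For example, `api_url("data.example.org", "dataset", fmt="xml")` yields the XML variant of the dataset list for the placeholder domain.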

Get JSON/XML-formatted metadata for a specific digital asset:

http://$DOMAIN/api/datum/$datumID

The list of digital assets is provided when a dataset is retrieved. The get datum call returns the metadata for a single digital asset. The download path for a specific digital asset is:

http://$DOMAIN/download/download?datumId=$datumID&datasetId=$datasetID
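
The download path can be assembled and used from Python roughly as shown below. This is a sketch under stated assumptions: the function names and the target path are hypothetical, and the endpoint is assumed to return the raw file bytes without requiring a login.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen


def download_url(domain: str, datum_id, dataset_id) -> str:
    """Build the download URL for one digital asset of a dataset."""
    query = urlencode({"datumId": datum_id, "datasetId": dataset_id})
    return f"http://{domain}/download/download?{query}"


def download_asset(domain, datum_id, dataset_id, target_path, timeout=30.0):
    """Stream the digital asset to a local file.

    Assumption: the endpoint responds with the file content directly.
    """
    url = download_url(domain, datum_id, dataset_id)
    with urlopen(url, timeout=timeout) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

The datum and dataset IDs would normally be taken from a previously retrieved dataset record, since that record lists its digital assets.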

API calls for creating, uploading and updating Datasets and Digital Assets are not active at this stage. If you would like to use this application and need these calls, drop me a line.
