DATALHUB: a digital data repository and an alternative to CKAN

This user guide covers using DATALHUB’s web interface to organize, publish and find datasets and digital assets.

Datasets and resources

DATALHUB is built around two concepts: the Dataset and the Datum.
A Dataset is a (digital) data container: a structure that describes a group of digital assets and provides metadata about them. For example, the budget of a country is published as a Dataset. The Dataset carries general information (metadata) on the provenance of the information, its licensing and its authorship. Digital assets (Datum entities) such as the factual budget and the approved budget are attached to the Dataset and made available for download. Each digital asset has additional technical metadata.
A Datum is a digital asset or, in simple terms, a file.
[Figure: Datum or a Digital Asset in DATALHUB]

Using DATALHUB

Please refer to the Installation guide for the creation of the first user in DATALHUB.

Administration Section

After a successful login, a small fly-out administration section is available behind a Gear Icon on the right side of the screen.

[Figure: DATALHUB administration menu behind the Gear Icon]

Clicking the Gear Icon opens a simple menu listing the groups of elements that you can alter with the permissions you have been granted.

Generally these elements are:

  • Dataset
  • Organisation
  • Category
  • License
  • Partners
  • (User) Profile

Adding a new dataset

Clicking Dataset in the Administration menu leads to a page listing all existing datasets. Selecting a dataset from the list opens a form for editing it.

The same form is used for adding a new dataset and is largely self-explanatory. An important feature is the ability to ingest digital assets in parallel: through this form, users of the website can upload a large number of digital assets at once. Technical metadata for these assets is created on the fly.

DATALHUB will ask for the following information about each dataset.

  • Title – A title describing the dataset.
  • Description – A free form section describing the dataset.
  • Organization – Relates a dataset to an organisation. Normally this will point to the organisation that issued or contributed to the creation of the digital dataset.
  • Author – Indicates the creator of the dataset in the DATALHUB.
  • Maintainer – Refers to a person or organisation that maintains and takes care of the dataset. This is a free text field.
  • Created & Revision – Dates recording when the dataset was created and when it was last revised.
  • Lineage – A free text field for pointing to prior datasets that the current dataset extends. This is a classical provenance reference.
  • Category – Pins the dataset to a specific category or topic of interest. The category is selected from a drop-down populated with the categories defined in the Categories section.
  • Update Frequency – How often the dataset is updated. In most cases this will be left unused or set to “never”.
  • Tags – Keywords that best describe the dataset being uploaded, for example “health” or “North Albania”.
  • License – It is important to include license information so that people know how they can use the data. This field is a drop-down box.

Adding Organisations, Categories, Licenses

A Dataset references lists of predefined entities: organisations, categories and licenses. These predefined entities can be amended at any time through a similar modification menu. Clicking the link for each of these entities in the Administration fly-out menu leads to the respective section, where you can create, edit or remove (if not referenced) instances of that entity.

Search

Search is based on an AngularJS faceted search, which behaves well up to about a dozen thousand dataset records. A faceted search backed by a document index, which would handle millions of records without problems, is planned, although work on it has not started.

API

This section documents DATALHUB’s API for developers who want to write code that interacts with DATALHUB sites and their data. The following operations are available through the API.

Get a JSON-formatted list of a site’s datasets:

http://$DOMAIN/api/dataset

Get the JSON/XML-formatted record of a specific dataset:

http://$DOMAIN/api/dataset/$datasetID

The default response is JSON, but you can switch to XML by appending .xml to the end of the call:

http://$DOMAIN/api/dataset.xml

Get the JSON/XML-formatted metadata for a specific digital asset:

http://$DOMAIN/api/datum/$datumID

The list of a dataset’s digital assets is returned when the dataset is retrieved. The datum call returns the metadata for a single digital asset. The download path for a specific digital asset is:

http://$DOMAIN/download/download?datumId=$datumID&datasetId=$datasetID
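As a rough illustration of the read-only calls above, the URLs can be assembled with a few small Groovy helpers. This is only a sketch: apiUrl, downloadUrl and fetchDatasets are hypothetical names, not part of DATALHUB, and it assumes the .xml suffix works for single records as well as for lists.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper (not part of DATALHUB): builds the read-only API URLs listed above.
// entity is 'dataset' or 'datum'; id is optional; xml = true appends the .xml suffix.
String apiUrl(String domain, String entity, Object id = null, boolean xml = false) {
    String url = 'http://' + domain + '/api/' + entity
    if (id != null) {
        url = url + '/' + id
    }
    return xml ? url + '.xml' : url
}

// Hypothetical helper: builds the download path for one digital asset.
String downloadUrl(String domain, Object datumId, Object datasetId) {
    return 'http://' + domain + '/download/download?datumId=' + datumId + '&datasetId=' + datasetId
}

// Fetches and parses the JSON dataset list; needs a reachable DATALHUB site.
def fetchDatasets(String domain) {
    return new JsonSlurper().parse(new URL(apiUrl(domain, 'dataset')))
}
```

Calling fetchDatasets with the domain of a running DATALHUB site should return the parsed JSON structure; its exact shape depends on the site and is not shown here.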

API calls for the creation, upload and update of Datasets and Digital Assets are not active at this stage. If you would like to use this application and need these calls, drop me a line.

Groovy/Grails Recursive Function/Closure

Since I keep wasting time on recursive functions (and forget what I developed a few months back), here is a piece of code for a recursive function in Groovy.

def getAllChildren(entityId) {
    // Container for the results
    def results = []
    // Retrieve the current element from somewhere
    def entity = entityService.getEntity(entityId)
    if (entity) {
        results.add([entity.id, entity.label])
        // Recurse with the child's id, since this function expects an id
        entity.children?.each { child ->
            results.addAll(getAllChildren(child.id))
        }
    }
    return results
}
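The same traversal can also be written as a closure, which is the variant I tend to forget. The catch is that a closure cannot refer to itself while it is being defined, so the variable has to be declared first and assigned afterwards. The sketch below runs as a plain Groovy script; the sample tree and the name walk are made up for illustration and stand in for whatever entityService returns.

```groovy
// Hypothetical sample tree standing in for entities loaded through entityService
def tree = [id: 1, label: 'root', children: [
    [id: 2, label: 'a', children: [
        [id: 4, label: 'a1', children: []]
    ]],
    [id: 3, label: 'b', children: []]
]]

// Declare the variable first so that the closure body can call itself
def walk
walk = { node ->
    // Start with the current node, then append everything below it (depth-first)
    def results = [[node.id, node.label]]
    node.children?.each { child ->
        results.addAll(walk(child))
    }
    return results
}

def flat = walk(tree)
println flat
```

Here the closure receives the node itself rather than an id, so no lookup is needed inside the recursion.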

 

Grails – Language prefix in URL mappings – Language in URL as subdirectory

Starting a new project in Grails often brings the need to support different languages. Grails supports this natively through the ?lang=locale request parameter, but if you want an SEO-friendly approach, you will need to tweak your solution.
First of all, Google has published a set of recommendations on how to support multi-regional and multilingual sites. When it comes to URLs, the suggested practices include using a different country-specific domain for each language, using subdomains, or using subdirectories.

Google Recommendation

URL structures
Consider using a URL structure that makes it easy to geotarget parts of your site to different regions. The following table outlines your options:

URL structure: Country-specific
Example: example.al
Pros:
  • Clear geotargeting
  • Server location irrelevant
  • Easy separation of sites
Cons:
  • Expensive (can have limited availability)
  • Requires more infrastructure
  • Strict ccTLD requirements (sometimes)

URL structure: Subdomains with gTLDs
Example: de.example.com
Pros:
  • Easy to set up
  • Can use Webmaster Tools geotargeting
  • Allows different server locations
  • Easy separation of sites
Cons:
  • Users might not recognize geotargeting from the URL alone (is “de” the language or country?)

URL structure: Subdirectories with gTLDs
Example: example.com/de/
Pros:
  • Easy to set up
  • Can use Webmaster Tools geotargeting
  • Low maintenance (same host)
Cons:
  • Users might not recognize geotargeting from the URL alone
  • Single server location
  • Separation of sites harder

URL structure: URL parameters
Example: site.com?loc=de
Pros:
  • Not recommended.
Cons:
  • URL-based segmentation difficult
  • Users might not recognize geotargeting from the URL alone
  • Geotargeting in Webmaster Tools is not possible

URL Mapping in Grails

Grails has a very neat way of mapping resources to URLs through the URL Mapping.

The default URL mapping is:

"/$controller/$action?/$id?(.$format)?"{
    constraints {
        // apply constraints here
    }
}

Adding support for language subdirectories is as simple as adding a $lang parameter:

"/$lang/$controller/$action?/$id?(.$format)?"{
    constraints {
        // apply constraints here
    }
}

Requesting /de/controllername/action will then automatically be served in the new language.

The challenge is in defining a default language, so that /controllername/action can be mapped without the $lang parameter.
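One possible way to approach the default-language problem is to keep both mappings, constrain $lang to the supported language codes, and assign a fixed lang value in the unprefixed mapping. The following is only a sketch that I have not verified against a specific Grails version: the language list (de, en) and the default (en) are assumptions, and how Grails ranks the two mappings against each other should be checked in your own project.

```groovy
// grails-app/controllers/UrlMappings.groovy (sketch, untested)
class UrlMappings {
    static mappings = {
        // Language-prefixed URLs; the constraint is meant to keep a controller
        // name from being read as a language code
        "/$lang/$controller/$action?/$id?(.$format)?" {
            constraints {
                lang(matches: /de|en/)   // assumed list of supported languages
            }
        }
        // Unprefixed URLs fall back to an assumed default language
        "/$controller/$action?/$id?(.$format)?" {
            lang = "en"
        }
    }
}
```

With such a setup, /de/controllername/action would carry lang=de from the URL, while /controllername/action would carry the assumed default.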

Handling Exceptions in Grails

Improve error handling – Exceptions

This document is a brief walk-through of exception handling in Grails.

"Grails controllers support a simple mechanism for declarative exception handling. If a controller declares a method that accepts a single argument and the argument type is java.lang.Exception or some subclass of java.lang.Exception, that method will be invoked any time an action in that controller throws an exception of that type. See the following example":

class ElloController {
    def index() {
        def message = "Resource was not found"
        throw new NotFoundException(message)
    }

    // Invoked whenever an action in this controller throws a NotFoundException
    def handleNotFoundException(NotFoundException e) {
        response.status = 404
        render("error found")
    }
}

In the previous example, invoking the controller shows a plain blank page with the message “error found” and a 404 status.
Important:

  • The exception handler method names can be any valid method name. The name is not what makes the method an exception handler, the Exception argument type is the important part.
  • The exception handler methods can do anything that a controller action can do including invoking render, redirect, returning a model, etc.

We want to avoid repeating the same handler methods in every controller. Groovy traits, which Grails supports, offer a clean way to mix methods into a class. For exceptions, we create a trait in a Groovy file containing the necessary exception handlers. Example:

package com.xpo6.exception

trait NotFoundExceptionHandler {
    // response and render are provided by the controller that implements this trait
    def handleNotFoundException(NotFoundException e) {
        response.status = 404
        render("error found")
    }
}

This trait can be included in any controller through an implements clause. Our old controller thus becomes:

import com.xpo6.exception.NotFoundExceptionHandler
import com.xpo6.exception.NotFoundException

class ElloController implements NotFoundExceptionHandler {
    def index() { 
        throw new NotFoundException("Resource was not found");
    }
}

As can be seen, the handler methods are no longer in the controller: they have moved into the trait and are pulled in through the implements clause.

Exceptions can just as easily be thrown from services. Controllers consuming such services should either implement the trait or declare their own method to handle the exceptions.

package test
import com.xpo6.exception.NotFoundException
class ElloService {
    def serviceMethod() {
        throw new NotFoundException("Resource was not found - thrown from service");
    }
}
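The snippets above import com.xpo6.exception.NotFoundException without showing it. A minimal sketch of what such a class might look like is given below, together with a plain-Groovy stand-in for the service/controller interplay so that it runs outside Grails. The class body and the callAndHandle helper are assumptions for illustration; the package declaration is left out so the block runs as a single script.

```groovy
// Assumed definition; the original only imports this class.
// Extending RuntimeException keeps it unchecked.
class NotFoundException extends RuntimeException {
    NotFoundException(String message) {
        super(message)
    }
}

// Plain-Groovy stand-in for the service method above
def serviceMethod() {
    throw new NotFoundException("Resource was not found - thrown from service")
}

// Stand-in for what a controller's handler does with the exception:
// map it to a 404 status and a message
def callAndHandle() {
    try {
        serviceMethod()
        return [status: 200, body: "ok"]
    } catch (NotFoundException e) {
        return [status: 404, body: e.message]
    }
}

println callAndHandle()
```

In a real Grails application the try/catch is not needed in the controller, since the declarative handler from the trait takes over that role.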
