Session A3
A DDI Tools Session: Examples and Application Challenges
Track
- Data Developers and Tools
Venue
- Lambertus
10:30-12:30
Chair/Moderator
- Thomas Bosch
GESIS – Leibniz Institute for the Social Sciences
The Data Documentation Initiative (DDI) is an acknowledged international standard for the documentation and management of data from the social, behavioral, and economic sciences. The DDI Tools Catalogue provides detailed information about available open-source and commercial tools related to the various DDI versions. In this session, presenters will give detailed descriptions as well as live demonstrations of newly developed, powerful, and representative DDI tools. The purpose of the Microdata Information System (MISSY) is to document European studies at the study and variable levels. Software applications can reuse the same common data model and the freely available software components of the MISSY project. The software utility suite DataForge facilitates reading and writing data across packages, produces various flavors of DDI metadata, and performs other useful operations on statistical datasets to support data management, dissemination, and analysis activities. Colectica for Excel is a free Excel add-in for documenting spreadsheets using DDI. Colectica and Nesstar can be integrated: datasets and studies documented in Colectica using DDI Lifecycle can be published to a Nesstar server using the Nesstar API.
Presenters
A Business Perspective on Use-Case-Driven Challenges for Software Architectures to Document Study and Variable Information
- Thomas Bosch, Matthäus Zloch, Dennis Wegener
GESIS – Leibniz Institute for the Social Sciences
The DDI Discovery Vocabulary represents the most important parts of DDI-Codebook and DDI-Lifecycle in the Web of Data, covering the discovery use case. Researchers can now publish and link their data and metadata in the Linked Open Data Cloud. Various software projects in the statistical domain reuse these basic DDI concepts to a large extent, so the DDI Discovery Vocabulary could serve as a common abstract data model for all of these software projects. Because software products have individual requirements, the common abstract data model has to be customized in the form of project-specific data models. The projects MISSY and StarDat document variable and study information, respectively, and could serve as representative use cases showing how abstract data models and project-specific data models cover the requirements of such projects. This presentation gives an overview of the projects' software architectures and the interaction of their layers and components. In addition, the succeeding talk on the detailed technical perspective describes how to implement the proposed software architecture and how to support different persistence formats such as DDI-XML, DDI-RDF, and relational databases.
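To make the Linked Open Data idea concrete, the sketch below emits a study and one variable as DDI-RDF Discovery ("disco") triples in Turtle. The study and variable URIs and labels are invented for illustration, and the exact property set should be checked against the published vocabulary; this is a minimal hand-rolled serializer, not any project's actual export code.

```python
# Sketch: emitting a study and its variables as DDI-RDF Discovery (Turtle).
# The URIs and labels below are hypothetical examples; disco:Study,
# disco:Variable, and disco:variable come from the Discovery Vocabulary.
PREFIXES = """\
@prefix disco: <http://rdf-vocabulary.ddialliance.org/discovery#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
"""


def study_to_turtle(study_uri, title, variables):
    """Render a study and its (uri, label) variables as a Turtle document."""
    lines = [PREFIXES]
    lines.append(f"<{study_uri}> a disco:Study ;")
    lines.append(f'    dcterms:title "{title}" ;')
    var_refs = " , ".join(f"<{uri}>" for uri, _ in variables)
    lines.append(f"    disco:variable {var_refs} .")
    for var_uri, var_label in variables:
        lines.append(f"<{var_uri}> a disco:Variable ;")
        lines.append(f'    rdfs:label "{var_label}" .')
    return "\n".join(lines)
```

A discovery client could then load such a document into any RDF store and query studies by variable, which is the use case the vocabulary targets.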
A Technical Perspective on Use-Case-Driven Challenges for Software Architectures to Document Study and Variable Information
- Matthäus Zloch, Thomas Bosch, Dennis Wegener
GESIS – Leibniz Institute for the Social Sciences
Building software projects on well-established architectural techniques such as the model-view-controller pattern has several advantages, including the separation of self-contained functionality and the generation of interactive modules. The main intention behind creating an abstract application programming interface (API) is to let individual software projects profit from shared functionality. As shown in the previous presentation on the business perspective, the idea is to create a reusable core data model, the DDI Discovery Vocabulary, which can be extended and adjusted to the requirements of an individual project. In this presentation, we will show how the abstract implementation of the DDI Discovery Vocabulary model integrates into the structured software architecture of a software project and how it might be extended, demonstrated through the technical implementation of the use case project MISSY. Based on the business perspective and a project's data model requirements, possible physical persistence implementations for storing data are presented behind an API. We will also give step-by-step guidance on how a project that uses the DDI Discovery Vocabulary as an exchange format and core data model can be built up from scratch.
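The persistence idea described above can be sketched as an abstract repository interface that application code programs against, with concrete backends (DDI-XML, DDI-RDF, relational database) plugged in behind it. All class and method names here are invented for illustration; they are not the MISSY project's actual API.

```python
# Sketch of a persistence API: callers depend only on the abstract
# interface, so DDI-XML, DDI-RDF, or relational backends are
# interchangeable. Names are hypothetical, not from MISSY itself.
from abc import ABC, abstractmethod


class StudyRepository(ABC):
    """Abstract persistence API for study-level metadata."""

    @abstractmethod
    def save(self, study_id: str, metadata: dict) -> None:
        """Persist metadata for a study."""

    @abstractmethod
    def load(self, study_id: str) -> dict:
        """Retrieve previously stored metadata for a study."""


class InMemoryStudyRepository(StudyRepository):
    """Stand-in backend; a real project would add XML/RDF/SQL backends
    implementing the same interface."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def save(self, study_id: str, metadata: dict) -> None:
        self._store[study_id] = dict(metadata)

    def load(self, study_id: str) -> dict:
        return self._store[study_id]
```

Swapping the storage format then means swapping the concrete class, with no change to the application layer above it.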
DataForge
- Pascal Heus
Metadata Technologies North America
Statistical data exist in many different shapes and forms, such as proprietary software files (SAS, Stata, SPSS), ASCII text (fixed, CSV, delimited), databases (Microsoft, Oracle, MySQL), or spreadsheets (Excel). Such a wide variety of formats presents producers, archivists, analysts, and other users with significant challenges in terms of data usability, preservation, and dissemination. These files also commonly contain essential information, such as the data dictionary, that can be extracted and leveraged for documentation purposes, task automation, or further processing. In mid-2013, Metadata Technology will launch a new software utility suite, "DataForge", for reading and writing data across packages, producing various flavors of DDI metadata, and performing other useful operations on statistical datasets to support data management, dissemination, or analysis activities. DataForge will initially be made available as desktop-based products under both freeware and commercial licenses, with a web-based version to follow. IASSIST 2013 will mark the initial launch of the product. This presentation will provide an overview of DataForge's capabilities and describe how to get access to the software.
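To illustrate the data-dictionary extraction the abstract mentions, the toy sketch below infers a minimal dictionary (variable name plus a crude type) from a delimited text file. This is the kind of metadata a utility suite could then turn into DDI; it is not DataForge's actual implementation, and the type inference is deliberately simplistic.

```python
# Sketch: infer a minimal data dictionary from CSV text.
# Illustrates data-dictionary extraction in general, not DataForge itself.
import csv
import io


def extract_dictionary(csv_text: str) -> list[dict]:
    """Return [{'name': ..., 'type': 'numeric'|'string'}, ...] per column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    dictionary = []
    for i, name in enumerate(header):
        values = [row[i] for row in rows if row[i]]
        # Crude check: every non-empty value parses as a signed decimal.
        numeric = bool(values) and all(
            v.lstrip("-").replace(".", "", 1).isdigit() for v in values
        )
        dictionary.append({"name": name, "type": "numeric" if numeric else "string"})
    return dictionary
```

A real tool would additionally capture labels, value codes, and missing-value definitions from the source package's own metadata, which is exactly what makes formats like SPSS and Stata files valuable inputs.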
Colectica for Excel: Increasing Data Accessibility Using Open Standards
- Jeremy Iverson, Dan Smith
Colectica
Traditionally, data in spreadsheets and plain-text formats do not carry rich documentation. Often, single-word column headers are the only hint given to data users, making it difficult to make sense of the data. Colectica for Microsoft Excel is a new, free tool for documenting spreadsheet data using DDI, the open standard for data documentation. With this Excel add-in, users can add extensive information about each column of data. Variables, code lists, and datasets can be globally identified and described in a standard format. This documentation is embedded in the spreadsheet, ensuring the information is available when data are shared. The add-in also adds support for SPSS and Stata formats to Excel: when opening an SPSS or Stata file in Excel, standard metadata is automatically created from the variable and value labels. Colectica for Excel can create print-ready reports based on the data documentation, and the information can also be exported to the DDI standard for ingestion into other standards-based tools. This presentation will include a live demonstration of the Colectica for Excel tool, showing how to document the contents of a spreadsheet, publish the information, and use the documentation to access data in an informed way.
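To show what exported variable documentation can look like, the sketch below renders name/label pairs as minimal DDI-Codebook-style XML. The `dataDscr`, `var`, and `labl` element names come from DDI Codebook's data description section, but the attribute set here is deliberately minimal and the export logic is an invented illustration, not Colectica's.

```python
# Sketch: render variable (name, label) pairs as minimal
# DDI-Codebook-style XML. Element names follow DDI Codebook's
# dataDscr section; namespaces and most attributes are omitted.
import xml.etree.ElementTree as ET


def variables_to_ddi(variables: list[tuple[str, str]]) -> str:
    """Return a <dataDscr> fragment with one <var> per variable."""
    data_dscr = ET.Element("dataDscr")
    for name, label in variables:
        var = ET.SubElement(data_dscr, "var", name=name)
        labl = ET.SubElement(var, "labl")
        labl.text = label
    return ET.tostring(data_dscr, encoding="unicode")
```

Because the result is standard DDI markup, any standards-based tool that understands DDI Codebook can ingest it, which is the interoperability point the abstract makes.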
Integrating Colectica, Nesstar, and DDI-Lifecycle
- Dan Smith
Colectica
- Ørnulf Risnes
Nesstar
Both Colectica and Nesstar are software applications based on the DDI metadata standard. Many organizations use Nesstar to publish data on the web, documented with DDI Codebook, and many organizations use Colectica to document their studies and datasets using DDI Lifecycle. With the creation of the new Nesstar API, an integration opportunity exists to publish data described in DDI Lifecycle to the DDI Codebook-based Nesstar server. This presentation will demonstrate the integration now implemented between Nesstar and Colectica: in this demo, a dataset and a study documented in Colectica using DDI Lifecycle metadata will be published to a Nesstar server using the Nesstar API. This integration gives users of both software packages greater opportunities for expanded metadata documentation and improved data publication and visualization.
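In outline, this kind of integration converts DDI Lifecycle metadata to a server-acceptable form and uploads it over HTTP. The sketch below only builds such an upload request without sending it; the endpoint path, and the absence of authentication, are hypothetical placeholders, since the real Nesstar API's URLs and protocol details are not reproduced here.

```python
# Sketch: build (but do not send) an HTTP request that would upload a
# DDI document to a publishing server. The '/api/studies' path is a
# hypothetical placeholder, not the real Nesstar API endpoint.
import urllib.request


def build_publish_request(server_url: str, ddi_xml: str) -> urllib.request.Request:
    """Prepare a POST request carrying a DDI XML document."""
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/api/studies",  # hypothetical path
        data=ddi_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

A real client would add authentication and handle the server's response, but the shape of the exchange, DDI in the request body over a documented HTTP API, is the mechanism the integration relies on.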