Database Design and Management of Derived Data in the Context of Big Data
University of Maryland, USA
A database design methodology like the one offered in the Open Model Initiative (http://www.openmodels.at/web/sdbd) is a necessary tool for designing a database. In the good old days of databases, database design was a process followed by data entry, data loading, query and application development, testing, and optimization before the database was made operational. Each of these steps was distinct and time consuming. In the past decade or so, the environment has changed drastically. The majority of data is now generated by machines, scanners, sensors, and cameras, and much of it is offered by multiple external data services on the internet. The result is what we today call Big Data, characterized by the 5Vs: Volume (very high), Variety (multiple types), Velocity (speed of change), Variability (inconsistencies), and Veracity (quality). The last two Vs require data to be curated and stored before use, which adds redundancy and lineage metadata to the database.

In the Big Data context, what is the role of a database design methodology? And what are the necessary tools to deal with the challenges along these 5V dimensions? In this talk, we will cover the basic database design methodology and its extensions that turn it into a continuous process able to handle the 5Vs. We will also cover continuous database schema evolution and the vertical data integration in-the-small necessary to deal with Velocity and derived data.
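To make the abstract's points about curation, lineage metadata, and continuous schema evolution concrete, here is a minimal sketch using Python and SQLite. The table and column names (sensor_readings, source_feed, curation_rule, and so on) are illustrative assumptions, not taken from the talk: the sketch only shows curated data stored alongside its lineage metadata, followed by a single in-place schema-evolution step of the kind a continuous design process must accommodate.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical names throughout; this is an illustrative sketch, not the
# methodology presented in the talk.
conn = sqlite3.connect(":memory:")

# A curated table: each stored reading carries lineage metadata recorded
# during curation (where the value came from, when it arrived, which
# cleaning rule accepted it), addressing Variability and Veracity.
conn.execute("""
    CREATE TABLE sensor_readings (
        reading_id    INTEGER PRIMARY KEY,
        sensor_id     TEXT NOT NULL,
        value         REAL NOT NULL,
        source_feed   TEXT NOT NULL,   -- external service or device of origin
        ingested_at   TEXT NOT NULL,   -- arrival timestamp of the raw record
        curation_rule TEXT             -- validation/cleaning rule applied
    )
""")

conn.execute(
    "INSERT INTO sensor_readings (sensor_id, value, source_feed, ingested_at, curation_rule) "
    "VALUES (?, ?, ?, ?, ?)",
    ("cam-042", 17.3, "vendor-api-v2",
     datetime.now(timezone.utc).isoformat(), "range-check"),
)

# A minimal schema-evolution step: as new feeds appear (Velocity, Variety),
# the schema is extended in place rather than redesigned from scratch.
conn.execute("ALTER TABLE sensor_readings ADD COLUMN unit TEXT DEFAULT 'celsius'")

for row in conn.execute(
    "SELECT sensor_id, value, unit, source_feed, curation_rule FROM sensor_readings"
):
    print(row)
```

The point of keeping lineage columns next to the data, rather than in a separate log, is that queries over derived data can then carry provenance along at no extra modeling cost; the ALTER TABLE step stands in for the continuous, incremental redesign that the talk contrasts with the old one-shot design process.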
Lecture at NEMO2016
Date/Time: Wednesday, July 20, 2016 at 11:30