More than ever, reducing the cost of data integration by evaluating queries efficiently is an important challenge: the economic cost of computing cycles (see your cloud invoice), energy consumption, and the performance required for some critical tasks have all become major concerns. Moreover, new applications require solving even more complex queries that involve millions of sources and data with high volume and variety. These challenges call for intelligent processes that can learn from previous experience and adapt to changing requirements and dynamic execution contexts.
Massive heterogeneous data integration is part of a continuum that starts with data, goes through sources, and ends in knowledge extraction and decision-making processes. Despite many years of academic and industrial research and consolidated results, data integration remains an important topic with open issues: data quality; trust in data, data providers, processing operations, and the infrastructures that handle data; and the differing understandings of what trust and quality mean, and of what levels of these properties are acceptable, according to the requirements of different data consumers.
Underlying approaches and algorithms continue to evolve, particularly given the new levels of volume, velocity and value associated with data. Contemporary data infrastructures are deployed on heterogeneous target architectures such as cloud, multi-cloud, and the Internet of Things, consisting of sensors and server farms deployed around the world. These infrastructures go beyond classic data management solutions and evolve towards different stack configurations (data processing systems “à la carte”). They need to cope with new notions of scalability and of resource consumption guided by economic models, service level agreements and other quality guarantees.
OBJECTIVE
The STRAPS workshop aims to promote scientific discussion on how data stemming from different providers and produced under different conditions can be efficiently integrated to answer simple, relational, and analytical queries while ensuring trust in providers, algorithms and data.