Semantic Big Data (SBD 2016)

Workshop @ ACM SIGMOD 2016


International Workshop on
Semantic Big Data (SBD 2016)


Aims of the Workshop

The current World Wide Web enables easy, instant access to a vast amount of online information. However, Web content is typically intended for human consumption and is not tailored for machine processing. The Semantic Web is therefore intended to establish a machine-understandable Web, and it is currently also used in many domains beyond the Web. The World Wide Web Consortium (W3C) has developed a number of standards around this vision. Among them is the Resource Description Framework (RDF), which is used as the data model of the Semantic Web. The W3C has also defined SPARQL as the RDF query language, RIF as the rule language, and the ontology languages RDFS and OWL to describe schemas of RDF data. The use of common ontologies increases interoperability between heterogeneous data sets, and proprietary ontologies, through their additional abstraction layer, facilitate the integration of these data sets. Therefore, we can argue that the Semantic Web is ideally suited to heterogeneous Big Data environments.

We define Semantic Big Data as the intersection of Semantic Web data and Big Data. Large amounts of Semantic Web data are freely available to the public, thanks to the efforts of the Linked Data initiative. According to current statistics, the freely available Semantic Web data comprises approximately 90 billion triples in over 3,300 datasets, many of which are accessible via SPARQL query servers called SPARQL endpoints. Anyone can submit SPARQL queries to SPARQL endpoints via a standardized protocol: the queries are processed on the endpoints' datasets, and the query results are sent back in a standardized format. Hence, not only is Semantic Big Data freely available, but distributed execution environments for Semantic Big Data are freely accessible as well. This makes the Semantic Web an ideal playground for Big Data research.
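To make the endpoint interaction above concrete: under the W3C SPARQL 1.1 Protocol, a client can send a query as an HTTP GET request with the query text in a `query` URL parameter, and can ask for results in the SPARQL 1.1 Query Results JSON Format. The following is a minimal sketch using only the Python standard library, not a prescribed client. The endpoint URL and the helper names `build_query_request` and `parse_select_results` are hypothetical, chosen for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used for illustration only.
ENDPOINT = "https://example.org/sparql"


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request.

    The query text goes into the `query` URL parameter, and the Accept
    header asks for the SPARQL 1.1 Query Results JSON Format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_select_results(body: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON SELECT result into {variable: value} dicts.

    Variables left unbound in a solution are absent from that binding,
    so they are also absent from the corresponding returned dict.
    """
    doc = json.loads(body)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Usage: build a request for a small SELECT query. Passing `req` to
# urllib.request.urlopen would send it; this sketch does not do so.
req = build_query_request(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3")
```

Keeping request construction separate from result parsing lets each step be checked offline; a real client would add error handling and timeouts around the network call.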

The goal of this workshop is to bring together academic researchers and industry practitioners to address the challenges of Semantic Big Data, to report and exchange research findings, including new approaches, techniques and applications, to make substantial theoretical and empirical contributions, and to significantly advance the state of the art of Semantic Big Data.

Types of Papers

The workshop solicits papers of different types:

  • Research Papers propose new approaches, theories or techniques related to Semantic Big Data including new data structures, algorithms and whole systems. They should make substantial theoretical and empirical contributions to the research field.

  • Experiments and Analysis Papers focus on the experimental evaluation of existing approaches, including data structures and algorithms for Semantic Big Data, and bring new insights through the analysis of these experiments. For example, such papers may show the benefits of well-known approaches in new settings and environments, open new research problems by demonstrating unexpected behavior or phenomena, or compare a set of traditional approaches in an experimental survey.

  • Application Papers report practical experiences with applications of Semantic Big Data. They might describe how to apply Semantic Web technologies to specific application domains with big data demands, such as social networks, web search, e-business, collaborative environments, e-learning, medical informatics, bioinformatics and geographic information systems. They might also describe applications that use linked data in new ways.

  • Vision Papers identify emerging or future research issues and directions, and describe new research visions that place demands on Semantic Big Data. Such visions may have a great impact on society.

Topics of Interest

We welcome papers on the following topics:

  • Semantic Data Management, Query Processing and Optimization in

    • Big Data
    • Cloud Computing
    • Internet of Things
    • Graph Databases
    • Federations
    • Spatial and Spatio-Temporal Data

  • Evaluation Strategies of Rule-based Languages like RIF and SWRL for Semantic Big Data
  • Ontology-based Approaches for Modeling, Mapping, Evolution and Real-world ontologies in the context of Semantic Big Data
  • Reasoning Approaches (Real-World Applications, Efficient Algorithms) especially designed for Semantic Big Data environments
  • Linked Data

    • Integration of Heterogeneous Linked Data
    • Real-World Applications
    • Statistics and Visualizations
    • Quality
    • Ranking Techniques
    • Provenance
    • Mining and Consuming Linked Data

  • Semantic Web stream processing (Dynamic Data, Temporal Semantics)
  • Semantic Internet of Things
  • Semantic Smart Homes/Companies/Cities
  • Performance, Evaluation and Benchmarking of Semantic Web Technologies, Applications and Databases
  • Semantic Web Services
  • Semantic Big Data Archives

    • Efficient Archiving and Preservation Techniques
    • Evolution Representation
    • Compression Approaches
    • Querying Techniques

  • Semantic Big Data on Emergent Hardware Technologies

    • FPGA
    • GPU
    • SSD
    • Main-Memory Databases

Important Dates

Time Schedule
Submission (extended): February 29, 2016
Notification: April 22, 2016
Workshop: July 1, 2016

Diversity Considerations of the Program Committee

We have recruited 46 PC members and chairs, listed below, who are experts in the topics of interest of our workshop. They come from 17 countries around the world. While most PC members are from academia, 5 are experts from industry (11%). Eight of the PC members and chairs are women (17%).



Program Committee Chairs

Program Committee

  • Muhammad Intizar Ali, DERI, National University of Ireland, Ireland
  • Carlos Buil Aranda, Universidad Técnica Federico Santa María, Chile
  • Feng Cao, IBM China Research Laboratory, China
  • Isabel Cruz, University of Illinois at Chicago, USA
  • Paulo Rupino da Cunha, University of Coimbra, Portugal
  • Melike Şah Direkoglu, Near East University, North Cyprus
  • Julian Dolby, IBM Research, USA
  • Vadim Ermolayev, Zaporozhye National University, Ukraine
  • Javier D. Fernández, Vienna University of Economics and Business, WU Vienna, Austria
  • Carlos Juiz García, Universitat de les Illes Balears, Spain
  • Panagiotis Germanakos, University of Cyprus, Cyprus
  • Katja Gilly de La Sierra-Llamazares, Miguel Hernandez University, Spain
  • Ekaterini Ioannou, Technical University of Crete, Greece
  • Prudhvi Janga, University of Cincinnati and Amazon Web Services, USA
  • Ioannis Konstantinou, National Technical University of Athens, Greece
  • Nectarios Koziris, National Technical University of Athens, Greece
  • Herbert Kuchen, University of Münster, Germany
  • Wookey Lee, Inha University, Korea
  • Isaac Lera, Universitat de les Illes Balears, Spain
  • Xiang Lian, University of Texas-Pan American, USA
  • Qing Liu, CSIRO, Australia
  • Nuno Lopes, Smarter Cities Technology Centre, IBM Research, Dublin, Ireland
  • Fadi Maali, National University of Ireland Galway, Ireland
  • Ioana Manolescu, INRIA and Université Paris-Sud, France
  • Daniel Miranker, The University of Texas at Austin, USA
  • Z. Meral Özsoyoglu, Case Western Reserve University, USA
  • Grażyna Paliwoda-Pękosz, Cracow University of Economics, Poland
  • Nikolaos Papailiou, National Technical University of Athens, Greece
  • Richard Picking, Glyndwr University, UK
  • Alfredo Pulvirenti, University of Catania, Italy
  • Louiqa Raschid, University of Maryland, USA
  • Sherif Sakr, School of Computer Science and Engineering, University of New South Wales, Australia
  • Ismael Sanz, Universitat Jaume I, Spain
  • Stephan Seufert, Trifacta, Inc., USA
  • Rudi Studer, Institute AIFB, Karlsruhe Institute of Technology (KIT), Germany
  • Dezhao Song, Research and Development, Thomson Reuters, USA
  • Martin Theobald, University of Ulm, Germany
  • Dimitrios Tsoumakos, Department of Informatics, Ionian University, Greece
  • Juergen Umbrich, Vienna University of Economics and Business, Vienna, Austria
  • Dongyan Zhao, Peking University, Beijing, China
  • Xiang Zhao, National University of Defense Technology, China
  • Weiguo Zheng, Chinese University of Hong Kong, China
  • Dimitrios Zissis, University of the Aegean, Greece
  • Lei Zou, Peking University, China

Evaluation of Papers

To verify the originality of submissions, we will use plagiarism detection tools to check submitted manuscripts against previous publications.

Papers will be evaluated according to the following aspects:

  • Relevance to the Workshop
  • Novelty and Practical Impact
  • Technical Soundness
  • Appropriateness and Adequacy of
    • Literature Review
    • Background Discussion
    • Analysis of Issues
  • Presentation, including
    • Overall Organization
    • English
    • Readability

Accepted Papers

The proceedings are available in the ACM Digital Library.


Session 1

Time Type Description
8:30: keynote Pascal Hitzler:
Semantic Technologies for Big Data Integration
Abstract: Increasing amounts of data are shared, often publicly on the World Wide Web, for reuse by third parties. Such reuse usually necessitates the integration of this data with other data, or with software, in order to enable data-based applications, fine-grained search, data analytics, etc. This integration is often a significant cost factor due to the wide variance regarding representational choices for data, ranging from syntactic data formats to semantic heterogeneity stemming from different viewpoints of data providers. In this presentation, we will shed light on the role of knowledge modeling for data sharing and reuse. In particular, we will discuss how Semantic Web Technologies make it easier to integrate and thus reuse heterogeneous data.
Bio: Pascal Hitzler is a full Professor and Director of Data Science at the Department of Computer Science and Engineering at Wright State University in Dayton, Ohio, USA. His research record lists over 300 publications in areas as diverse as the semantic web, neural-symbolic integration, knowledge representation and reasoning, machine learning, denotational semantics, and set-theoretic topology. He is Editor-in-Chief of the Semantic Web journal by IOS Press and of the IOS Press book series Studies on the Semantic Web. He is a co-author of the W3C Recommendation OWL 2 Primer and of the book Foundations of Semantic Web Technologies (CRC Press, 2010), which was named one of seven Outstanding Academic Titles 2010 in Information and Computer Science by the American Library Association's Choice Magazine and has been translated into German and Chinese. He serves on the editorial boards of several journals and book series and is a founding steering committee member of the Web Reasoning and Rule Systems (RR) conference series, the Neural-Symbolic Learning and Reasoning (NeSy) workshop series, and the Association for Ontology Design and Patterns (ODPA).
9:15: paper Sagnik Ray Choudhury, Shuting Wang, C. Lee Giles:
Scalable Algorithms for Scholarly Figure Mining and Semantics
DOI: 10.1145/2928294.2928305
9:40: paper Jian Wu, Chen Liang, Huaiyu Yang, C. Lee Giles:
CiteSeerX Data: Semanticizing Scholarly Papers
DOI: 10.1145/2928294.2928306
10:05: break Coffee Break

Session 2

Time Type Description
10:30: paper Sangkeun Lee, Supriya Chinthavali, Sisi Duan, Mallikarjun Shankar:
Utilizing Semantic Big Data for realizing a National-scale Infrastructure Vulnerability Analysis System
DOI: 10.1145/2928294.2928295
10:55: paper Richard M. Keller, Shubha Ranjan, Mei Y. Wei, Michelle M. Eshow:
Semantic Representation and Scale-up of Integrated Air Traffic Management Data
DOI: 10.1145/2928294.2928296
11:20: paper Stefano Bortoli, Flavio Pompermaier, Paolo Bouquet, Andrea Molinari:
Semantic Big Data for Tax Assessment
DOI: 10.1145/2928294.2928297
11:45: paper Mohammad Sadnan Al Manir, Alexandre Riazanov, Harold Boley, Artjom Klein, Christopher J.O. Baker:
Automated Generation of SADI Semantic Web Services for Clinical Intelligence
DOI: 10.1145/2928294.2928298
12:10: break Lunch Break (lunch on your own)

Session 3

Time Type Description
13:30: keynote Ivan Bercovich:
General purpose semantic platform as an information retrieval system
Abstract: Over the past couple of decades, information retrieval systems could be roughly categorized into two groups: keyword search and faceted search. Keyword search is the most popular offering, primarily driven by giant search engines like Google and Bing. Faceted search applications tend to be narrower and focused on specific verticals, such as e-commerce, travel, cars, etc. While traditional search provides convenience, breadth, and flexibility, it falls short when it comes to the precision and structure of the results. On the other hand, faceted search is more constrained, but its results often convey a higher degree of structure and context. Roughly, we can say that keyword search retrieves documents, whereas faceted search returns records/entities. In essence, traditional search provides a more natural interface for formulating queries, while faceted search provides better results. Therefore, the ideal experience would combine a natural language approach to query construction with a structured knowledge base to power the results. In this presentation we will show a working product, powered by a comprehensive knowledge graph (data) and the corresponding knowledge platform (software), which leverages insights from the fields of data ingestion, semantic data, natural language processing, and faceted search to create a hybrid information retrieval experience. To achieve this experience, we had to build a vast knowledge graph, with billions of entities and relationships and hundreds of billions of facts. We cover dozens of verticals, from politics to sports to health, and have hundreds of entity collections for each one. Our knowledge graph is seen by over 300 million eyeballs a month, both through our owned and operated websites and through our partnerships with publishers and other enterprises.
Bio: Ivan Bercovich is Vice President of Engineering at Graphiq.
14:10: paper Hassan Issa, Ludger van Elst, Andreas Dengel:
Using Smartphones for Prototyping Semantic Sensor Analysis Systems
DOI: 10.1145/2928294.2928299
14:35: paper Shohreh Hosseinzadeh, Natalia Díaz Rodríguez, Seppo Virtanen, Johan Lilius:
A semantic security framework and context-aware role-based access control ontology for Smart Spaces
DOI: 10.1145/2928294.2928300
15:00: break Coffee Break

Session 4

Time Type Description
15:30: paper Rafael Peixoto, Thomas Hassan, Christophe Cruz, Aurélie Bertaux, Nuno Silva:
An unsupervised classification process for large datasets using web reasoning
DOI: 10.1145/2928294.2928301
Extended Version: DOI: 10.19210/1006.3.1.1
15:55: paper Marta Tatu, Steven Werner, Mithun Balakrishna, Tatiana Erekhinskaya, Dan Moldovan:
Semantic Question Answering on Big Data
DOI: 10.1145/2928294.2928302
Extended Version: DOI: 10.19210/1006.3.1.16
16:20: paper Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang, Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle:
Querying and reasoning over large scale building data sets: an outline of a performance benchmark
DOI: 10.1145/2928294.2928303
16:45: paper Dieter De Witte, Laurens De Vocht, Ruben Verborgh, Kenny Knecht, Filip Pattyn, Hans Constandt, Erik Mannens, Rik Van de Walle:
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
DOI: 10.1145/2928294.2928304
17:10: break End of Workshop

Manuscript Preparation

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.

Manuscripts should be submitted electronically as PDF files and formatted using the ACM double-column camera-ready proceedings templates. Papers must not exceed 6 pages.

Accepted papers will be published online in the ACM Digital Library. Papers must include the standard ACM copyright notice on the first page.

The PDF version of your paper should meet the following requirements:

  • The PDF should be optimized for fast web viewing.
  • The PDF should include the ACM Computing Classification categories and terms (CCS concepts). The ACM templates provide space for this indexing; please refer to the ACM Computing Classification Scheme.
  • The PDF should contain keywords.
  • The PDF should have the rights management statement and bibliographic strip at the bottom of the left column of the first page.
  • Please start numbering your paper with page number 1.
  • The PDF should use Type 1 fonts (scalable), not Type 3 (bit-mapped). All fonts MUST be embedded within the PDF file (this must be fixed in the source files before the PDF is generated, according to the ACM documentation).


Submission is currently closed. Please check the Important Dates.

Contact Program Chairs

Please contact us for any further information:

Previous Editions

Please use the following links for further information on previous editions of the International Workshop on Semantic Big Data (SBD):