Semantic Big Data (SBD 2017)

Workshop @ ACM SIGMOD 2017



The International Workshop on Semantic Big Data (SBD 2017)

In conjunction with ACM SIGMOD 2017

Aims of the Workshop

The current World Wide Web enables easy, instant access to a vast amount of online information. However, content on the Web is typically intended for human consumption and is not tailored for machine processing. The Semantic Web is therefore intended to establish a machine-understandable Web, and its technologies are now also used in many domains beyond the Web. The World Wide Web Consortium (W3C) has developed a number of standards around this vision. Among them is the Resource Description Framework (RDF), which serves as the data model of the Semantic Web. The W3C has also defined SPARQL as the RDF query language, RIF as the rule language, and the ontology languages RDFS and OWL to describe schemas of RDF data. The use of common ontologies increases interoperability between heterogeneous data sets, and proprietary ontologies, with their additional abstraction layer, facilitate the integration of these data sets. We therefore argue that the Semantic Web is well suited to heterogeneous Big Data environments.

We define Semantic Big Data as the intersection of Semantic Web data and Big Data. Masses of Semantic Web data are freely available to the public, thanks to the efforts of the linked data initiative. According to current statistics, the freely available Semantic Web data amounts to approximately 150 billion triples in over 2,800 datasets, many of which are accessible via SPARQL query servers called SPARQL endpoints. Anyone can submit SPARQL queries to these endpoints via a standardized protocol; the queries are processed on the endpoints' datasets, and the results are returned in a standardized format. Hence, not only is Semantic Big Data freely available, but distributed execution environments for it are freely accessible as well. This makes the Semantic Web an ideal playground for Big Data research.
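
As a rough illustration of the standardized protocol and result format mentioned above, the following Python sketch builds a SPARQL Protocol query request (HTTP GET with a `query` parameter) and flattens a response in the SPARQL Query Results JSON format. The endpoint URL, the query, and the sample response are assumptions made up for this example; the request is constructed but never sent, and the helper names `build_query_request` and `rows_from_results` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://example.org/sparql"

# A generic query that asks for three arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query request via HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the endpoint for the SPARQL Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(document: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one dict per solution."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a solution, so skip missing ones.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hand-written sample response (an assumption, not real endpoint output).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [{
        "s": {"type": "uri", "value": "http://example.org/alice"},
        "p": {"type": "uri", "value": "http://xmlns.com/foaf/0.1/name"},
        "o": {"type": "literal", "value": "Alice"},
    }]},
})

if __name__ == "__main__":
    request = build_query_request(ENDPOINT, QUERY)
    print(request.full_url)
    print(rows_from_results(SAMPLE_RESPONSE))
```

To query a real endpoint, the request object could be passed to `urllib.request.urlopen` and the response body handed to `rows_from_results`; whether a given endpoint supports the JSON result format is something to check per endpoint.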

The goal of this workshop is to bring together academic researchers and industry practitioners to address the challenges of Semantic Big Data and to report and exchange research findings, including new approaches, techniques and applications, that make substantial theoretical and empirical contributions to, and significantly advance, the state of the art of Semantic Big Data.

Types of Papers

The workshop solicits papers of different types:

  • Research Papers propose new approaches, theories or techniques related to Semantic Big Data including new data structures, algorithms and whole systems. They should make substantial theoretical and empirical contributions to the research field.

  • Experiments and Analysis Papers focus on the experimental evaluation of existing approaches, including data structures and algorithms for Semantic Big Data, and bring new insights through the analysis of these experiments. Such papers can, for example, show the benefits of well-known approaches in new settings and environments, open new research problems by demonstrating unexpected behavior or phenomena, or compare a set of traditional approaches in an experimental survey.

  • Application Papers report practical experiences with applications of Semantic Big Data. They might describe how to apply Semantic Web technologies to specific application domains with big data demands, such as social networks, web search, e-business, collaborative environments, e-learning, medical informatics, bioinformatics and geographic information systems. They might also describe applications that use linked data in a new way.

  • Vision Papers identify emerging or future research issues and directions, and describe new research visions that place demands on Semantic Big Data. The new visions will potentially have a great impact on society.

Topics of Interest

We welcome papers on the following topics:

  • Semantic Data Management, Query Processing and Optimization in

    • Big Data
    • Cloud Computing
    • Internet of Things
    • Graph Databases
    • Federations
    • Spatial and Spatio-Temporal Data

  • Evaluation Strategies for Rule-based Languages like RIF and SWRL on Semantic Big Data
  • Ontology-based Approaches for Modeling, Mapping, Evolution and Real-world ontologies in the context of Semantic Big Data
  • Reasoning Approaches (Real-World Applications, Efficient Algorithms) especially designed for Semantic Big Data environments
  • Linked Data

    • Integration of Heterogeneous Linked Data
    • Real-World Applications
    • Statistics and Visualizations
    • Quality
    • Ranking Techniques
    • Provenance
    • Mining and Consuming Linked Data

  • Semantic Web stream processing (Dynamic Data, Temporal Semantics)
  • Semantic Internet of Things
  • Semantic Smart Homes/Companies/Cities
  • Performance, Evaluation and Benchmarking of Semantic Web Technologies, Applications and Databases
  • Semantic Web Services
  • Semantic Big Data Archives

    • Efficient Archiving and Preservation Techniques
    • Evolution Representation
    • Compression Approaches
    • Querying Techniques

  • Semantic Big Data on Emergent Hardware Technologies

    • FPGA
    • GPU
    • SSD
    • Main-Memory Databases

Important Dates

Time Schedule
Submission (extended): March 6, 2017
Notification: March 20, 2017
Workshop: May 19, 2017

Diversity Considerations of the Program Committee

We have currently recruited 41 PC members and chairs, listed below, who are experts in the topics of interest of our workshop. They come from 18 nations all over the world, as also shown by the map below. While most PC members are from academia, 5 experts are from industry (12%). Of the PC members and chairs, 8 are women (20%).


[Map: number of program committee members and chairs per country, on a scale from 1 to 8]

Program Committee Chairs

Program Committee

  • Muhammad Intizar Ali, Insight, National University of Ireland, Galway
  • Carlos Buil Aranda, Universidad Técnica Federico Santa María, Chile
  • Mithun Balakrishna, Lymba Corporation, USA
  • Isabel Cruz, University of Illinois at Chicago, USA
  • Paulo Rupino da Cunha, University of Coimbra, Portugal
  • Melike Şah Direkoglu, Near East University, North Cyprus
  • Julian Dolby, IBM Research, USA
  • Vadim Ermolayev, Zaporizhzhya National University, Ukraine
  • Javier D. Fernández, Vienna University of Economics and Business, WU Vienna, Austria
  • Carlos Juiz García, Universitat de les Illes Balears, Spain
  • Katja Gilly de La Sierra-Llamazares, Miguel Hernandez University, Spain
  • Andreas Harth, Institute AIFB, Karlsruhe Institute of Technology (KIT), Germany
  • Ekaterini Ioannou, Technical University of Crete, Greece
  • Prudhvi Janga, University of Cincinnati and Amazon Web Services, USA
  • Ioannis Konstantinou, National Technical University of Athens, Greece
  • Nectarios Koziris, National Technical University of Athens, Greece
  • Herbert Kuchen, University of Münster, Germany
  • Wookey Lee, Inha University, Korea
  • Isaac Lera, Universitat de les Illes Balears, Spain
  • Xiang Lian, Kent State University, USA
  • Qing Liu, CSIRO, Australia
  • Nuno Lopes, TopQuadrant
  • Ioana Manolescu, INRIA and Ecole Polytechnique, France
  • Daniel Miranker, The University of Texas at Austin, USA
  • Grażyna Paliwoda-Pękosz, Cracow University of Economics, Poland
  • Nikolaos Papailiou, National Technical University of Athens, Greece
  • Alfredo Pulvirenti, University of Catania, Italy
  • Sherif Sakr, School of Computer Science and Engineering, University of New South Wales, Australia
  • Stephan Seufert, Amazon Machine Learning (Industry), Germany
  • Omair Shafiq, Carleton University, Canada
  • Marta Tatu, Lymba Corporation, USA
  • Martin Theobald, University of Luxembourg, Luxembourg
  • Dimitrios Tsoumakos, Department of Informatics, Ionian University, Greece
  • Juergen Umbrich, Vienna University of Economics and Business, Vienna, Austria
  • Dongyan Zhao, Peking University, Beijing, China
  • Xiang ZHAO, National University of Defense Technology, China
  • Weiguo Zheng, Chinese University of Hong Kong, China
  • Dimitrios Zissis, University of the Aegean, Greece
  • Lei Zou, Peking University, China

Evaluation of Papers

To verify the originality of submissions, we will use plagiarism detection tools to check the content of the submitted manuscripts against previous publications.

Papers will be evaluated according to the following aspects:

  • Relevance to the Workshop
  • Novelty and practical impact
  • Technical soundness
  • Appropriateness and adequacy of:
    • Literature review
    • Background discussion
    • Analysis of issues
  • Presentation, including:
    • Overall organization and structure
    • Correctness of English language
    • Readability

Accepted Papers

The proceedings are available in the ACM Digital Library.


Session 1

Time Type Description
9:00: keynote Martin Theobald (University of Luxembourg, Luxembourg):
Scalable RDF Data Management with a Touch of Uncertainty
Abstract: The keynote provides an overview of our recent research activities and also highlights a number of research challenges in the context of extracting, indexing and querying large collections of RDF data. A core part of our work focuses on handling uncertain facts obtained from various information-extraction techniques, where we aim to develop efficient algorithms for querying the resulting uncertain RDF knowledge base with the help of a probabilistic database. A second, very recent research focus lies in scaling out these approaches to a distributed setting. Here, we aim to process declarative queries, posed in either SQL or logical query languages such as Datalog, via a proprietary, asynchronous communication protocol based on the Message Passing Interface. Our current RDF engine, coined "TriAD", has proven to be one of the fastest engines over a number of RDF benchmarks with up to 1.8 billion triples.
Bio: Martin Theobald was appointed Professor of Computer Science with a focus on "Big Data" by the University of Luxembourg in 2017. He previously held positions as a Professor and Co-Director of the Institute for Databases and Information Systems (DBIS) at the University of Ulm and as a Professor in the Advanced Database Research and Modeling (ADReM) group at the University of Antwerp. He obtained a doctoral degree from the Max Planck Institute for Informatics in Saarbrücken in 2006 and subsequently spent two years as a Postdoctoral Researcher at the Stanford University Infolab. Between 2008 and 2012, Martin led the research group for "Ranking and Uncertain Data Management" at the Max Planck Institute for Informatics. His current research interests lie at the intersection of information extraction, probabilistic databases and distributed architectures. The "Big Data" group at the University of Luxembourg investigates the whole lifecycle of semantic-data management, beginning with the extraction of entities and relations from textual and semi-structured sources and continuing to data-cleaning aspects and probabilistic inference. Martin has been an area editor for Elsevier’s "Information Systems" since 2013 and frequently serves as a reviewer and PC member for international journals and conferences such as CACM, TODS, TKDE, VLDB, SIGMOD, SIGIR, WSDM, CIKM and ICDE.
10:00: paper Daniel Janke, Steffen Staab, Matthias Thimm:
On Data Placement Strategies in Distributed RDF Stores
DOI: 10.1145/3066911.3066915
10:30: break Coffee Break

Session 2

Time Type Description
11:00: paper Thomas Hassan, Christophe Cruz, Aurélie Bertaux:
Ontology-based approach for unsupervised and adaptive focused crawling
DOI: 10.1145/3066911.3066912
11:30: paper Mayank Kejriwal, Pedro Szekely:
Supervised Typing of Big Graphs using Semantic Embeddings
DOI: 10.1145/3066911.3066918
Extended Version: URN: urn:nbn:de:101:1-2017100112160
12:00: paper Prashanti Manda, Todd J. Vision:
Evolution of anatomical concept usage over time: Mining 200 years of biodiversity literature
DOI: 10.1145/3066911.3066919
12:30: break Lunch Break

Session 3

Time Type Description
14:00: keynote Julian Dolby (IBM's Thomas J. Watson Research Center, USA):
Toward Scalable Semantic Big Data
Abstract: SPARQL is the query language for RDF and linked data, and such data has been a focus of our work for quite a few years. In this talk, I shall start by summarizing some of our older work in the scalable semantics and reasoning space. The most basic is work scaling reasoning using refinement techniques. Built on that is work applying our reasoning to the medical domain, matching patients to clinical trials. Next, I shall discuss our work in scaling SPARQL queries in an RDF store. With this introduction, the main topic will be extending SPARQL to conveniently query across both RDF and non-RDF data. There are now standards to virtualize non-RDF datasets as RDF, such as R2RML, CSV2RDF and XSPARQL; thus SPARQL can be increasingly used to access RDF and non-RDF data. However, there are two chief shortcomings to using SPARQL in such contexts. First, SPARQL has no notion of modularity, and modularity is a key feature in assembling complex queries of the kind that are needed when one integrates very different datasets. Second, its support for query federation over different endpoints is limited: the endpoints all need to be SPARQL endpoints, and the language does not allow for posting data to an endpoint. To rectify these shortcomings, we propose two simple extensions to the language: functions and generalized service. In designing these extensions, we were careful to keep them minimal, to preserve SPARQL's declarative semantics. We define the semantics of each extension and provide an open-source reference implementation of the extended language, which provides processing over both relational and non-relational backends.
Bio: Julian Dolby has been a Research Staff Member at IBM's Thomas J. Watson Research Center since 2000. He works on a range of topics, including static program analysis, software testing and the Semantic Web. He was educated at the University of Wisconsin-Madison as an undergraduate, and at the University of Illinois at Urbana-Champaign as a graduate student where he worked with Professor Andrew Chien on programming systems for massively-parallel machines. His work has been included in various IBM products like Rational AppScan and in the RDF support in DB2.
15:00: paper Tien Duc Cao, Ioana Manolescu, Xavier Tannier:
Extracting Linked Data from statistic spreadsheets
DOI: 10.1145/3066911.3066914
15:30: break Coffee Break

Session 4

Time Type Description
16:00: paper Michael J. Lewis, George K. Thiruvathukal, Venkatram Vishwanath, Michael E. Papka, Andrew Johnson:
A Distributed Graph Approach For Pre-processing Linked RDF Data Using Supercomputers
DOI: 10.1145/3066911.3066913
16:30: paper Yogesh Pandey, Srividya K. Bansal:
Safety Check – A Semantic Web Application for Emergency Management
DOI: 10.1145/3066911.3066917
Extended Version: available from the publisher
17:00: paper Michelle C. Krzyzanowski, Josh Levy, Grier P. Page, Nathan C. Gaddis, Robert F. Clark:
Using Semantic Web Technologies to Power LungMAP, a Molecular Data Repository
DOI: 10.1145/3066911.3066916
17:30: break End of Workshop

Manuscript Preparation

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.

Manuscripts should be submitted electronically as PDF files via the submission webpage and must be formatted using the camera-ready templates for the ACM proceedings double-column format, according to the "sigconf" proceedings template. Papers must not exceed 6 pages in length.

Accepted papers will be published online in the ACM digital library. The papers must include the standard ACM copyright notice on the first page.

The PDF version of your paper should meet the following requirements:

  • The PDF should be optimized for fast web viewing.

  • The PDF should apply the ACM Computing Classification categories and terms (CCS concepts). The ACM templates provide space for this indexing; please follow the Computing Classification Scheme.

  • The PDF should contain the keywords.

  • The PDF should have the rights management statement and bibliographic strip at the bottom of the left column of the first page.

  • Please start numbering your paper with page number 1.

  • The PDF should use Type 1 fonts (scalable), not Type 3 (bit-mapped). All fonts MUST be embedded within the PDF file (to be corrected in the source files before the PDF is generated, according to ACM documentation).


The submission is currently closed. Please check our Important Dates page.

Contact Program Chairs

Please contact us for any further information:

