Big Data in Emergent Distributed Environments (BiDEDE 2022)

Workshop @ ACM SIGMOD 2022

The International Workshop on Big Data in Emergent Distributed Environments (BiDEDE 2022)

In conjunction with ACM SIGMOD 2022

Important Message on Covid-19:
We will operate the workshop as a hybrid event, so presenters and participants can choose to participate in person in Philadelphia (USA) or remotely.

Aims of the Workshop

Today, new forms of distributed environments beyond Cloud Computing are emerging that enable new kinds of applications but pose new challenges for data management. Recent efforts in serverless computing aim to simplify the process of deploying code into production in the Cloud by hiding scaling, capacity planning and maintenance operations from the developer or operator. Other initiatives avoid communication with the Cloud by deploying and running data processing environments near the data sources in Internet-of-Things scenarios (e.g., fog and edge computing) for large-scale smart homes, companies and cities, or near the applications (e.g., Cloudlets for mobile applications and Offline First technologies for web applications).

Research on distributed data management is evolving to address the challenges specific to these new environments. Properties of emergent distributed environments, such as the capabilities of nodes, communication bandwidth, battery lifetime of nodes, reliability of nodes and communication, and heterogeneity of configurations, impact data management mechanisms and approaches, including those for fault tolerance, replication, resource provisioning, buffer management, query processing and optimization, and transaction management. In addition, federated approaches and polystores spanning several emergent distributed environments remain open research challenges: these environments need to be combined into one distributed runtime environment that makes it easy to handle Big Data in different models and to globally optimize data management tasks across them.

The goal of this workshop is to bring together academic researchers and industry practitioners to discuss challenges and solutions, including new approaches, techniques and applications, that would significantly advance the state of the art of Big Data in emergent distributed environments.

Categories of Papers

The workshop solicits papers in the following categories:

  • Research Papers propose new approaches, theories or techniques related to Big Data in emergent distributed environments including new data structures, protocols and algorithms. They should make substantial theoretical and empirical contributions to the research field.

  • System Papers describe new data management tools, stream processing engines, databases and other systems, which are able to handle Big Data in emergent distributed environments.

  • Experiments and Analysis Papers focus on the experimental evaluation of existing approaches including data structures and algorithms for Big Data in emergent distributed environments and bring new insights through the analysis of these experiments. Results of Experiments and Analysis Papers can be, for example, showing benefits of well-known approaches in new settings and environments, opening new research problems by demonstrating unexpected behavior or phenomena, or comparing a set of traditional approaches in an experimental survey.

  • Application Papers report practical experiences with applications of Big Data in emergent distributed environments. They might describe how to apply technologies to specific application domains with big data demands in emergent distributed environments, such as social networks, web search, e-business, collaborative environments, e-learning, medical informatics, bioinformatics and geographic information systems.

  • Vision Papers identify emerging or future research issues and directions, and describe new research visions that place demands on Big Data in emergent distributed environments. The new visions should have the potential for great impact on society.

  • Demo Papers deal with innovative systems and applications for Big Data in emergent distributed environments. These papers describe a showcase of the proposed system/application, but may also explain the novelty of the system's architecture. We are especially interested in demonstrations having a WOW-effect.

Papers must be between 4 and 6 pages long. Accepted papers will be presented as oral presentations.

Topics of Interest

We are interested in all issues concerning the management of data to be processed in emergent distributed environments such as the following:

  • Cloud Computing

  • Serverless Computing

    • Cloud Functions
    • App Engines
    • Cloud Runs

  • Post-Cloud Computing

    • Cloudlet
    • Fog Computing
    • Edge Computing
    • Dew Computing
    • Offline First
    • Smart Homes/Companies/Cities

The Data Management issues to be solved in the emergent distributed environments include, but are not limited to, the following:

  • Query Processing and Optimization
  • Transaction Management
  • Fault Tolerance Mechanisms
  • Cloud Data Warehouses
  • Distributed Databases
  • Federation/Polystore Architectures
  • Data Lakes
  • Artificial Intelligence in Big Data Environments
  • Interactive Data Analytics and Big Data Science

Important Dates

Time Schedule
Submission (extended): March 27, 2022
Notification: April 15, 2022
Workshop: June 12, 2022

Diversity Considerations of the Program Committee

We have currently recruited 27 PC members and chairs, listed below, who are experts in the topics of interest of our workshop. They come from 15 countries around the world, as also shown by the map below. While most PC members are from academia, 6 are experts from industry (22%). 7 of the PC members and chairs are women (26%).

[Map legend: number of program committee members and chairs per country, from 1 to 11]

Program Committee Chairs

Steering Committee

Program Committee

  • Ahmed S. Abdelhamid, Purdue University, USA
  • Mithun Balakrishna, Lymba Corporation, USA
  • Brad Glasbergen, University of Waterloo, Canada
  • Jinghua Groppe, University of Lübeck, Germany
  • Ekaterini Ioannou, Tilburg University
  • Alekh Jindal, Keebo, USA
  • Ioannis Kontopoulos, Harokopio University of Athens, Greece
  • Xiang Lian, Kent State University, USA
  • Qing Liu, Data61, CSIRO, Australia
  • Renato Marroquín, Oracle
  • Grażyna Paliwoda-Pękosz, Cracow University of Economics, Poland
  • Alfredo Pulvirenti, University of Catania, Italy
  • Praveen Rao, University of Missouri-Columbia, USA
  • Arjun Satish, Confluent Inc., USA
  • Omair Shafiq, Carleton University, Canada
  • Katja Gilly de La Sierra-Llamazares, Miguel Hernandez University, Spain
  • Marta Tatu, Raytheon Technologies
  • Konstantinos Tserpes, Harokopio University of Athens, Greece
  • Xikui Wang, Google, USA
  • Benjamin Warnke, University of Lübeck, Germany
  • Robert Wrembel, Poznan University of Technology, Poland
  • Steffen Zeuch, Technische Universität Berlin, Germany
  • Xiang Zhao, National University of Defense Technology, China
  • Zhuoyue Zhao, University at Buffalo

Evaluation of Papers

To verify the originality of submissions, we will use plagiarism detection tools to check submitted manuscripts against previous publications.

Papers will be evaluated according to the following aspects:

  • Relevance to the Workshop
  • Novelty and practical impact
  • Technical soundness
  • Appropriateness and adequacy of:
    • Literature review
    • Background discussion
    • Analysis of issues
  • Presentation, including:
    • Overall organization and structure
    • Correctness of English language
    • Readability

Accepted Papers

The proceedings are available here.
  • Maruth Goyal, Aditya Akella:
    Think Before You Shuffle: Data-Driven Shuffles for Geo-Distributed Analytics
    DOI: 10.1145/3530050.3532922
  • Chetan Phalak, Mayank Mishra, Shruti Kunde, Rekha Singhal, Sana Iqbal:
    Metamodel driven acceleration of actor-based simulation
    DOI: 10.1145/3530050.3532921
  • Patrick Hansert, Sebastian Michel:
    Ameliorating Data Compression and Query Performance through Cracked Parquet
    DOI: 10.1145/3530050.3532923
  • Ted Shaowang, Xi Liang, Sanjay Krishnan:
    Sensor Fusion on the Edge: Initial Experiments in the EdgeServe System
    DOI: 10.1145/3530050.3532924
  • Yuanli Wang, Baiqing Lyu, Vasiliki Kalavri:
    The Non-Expert Tax: Quantifying the cost of auto-scaling in Cloud-based data stream analytics
    DOI: 10.1145/3530050.3532925
  • Michal Bodziony, Rafal Morawski, Robert Wrembel:
    Evaluating push-down on NoSQL data sources
    DOI: 10.1145/3530050.3532916
  • Varad Pimpalkhute, Shruti Kunde, Rekha Singhal, Surya Palepu, Dheeraj Chahal, Amey Pandit:
    MetaFaaS: Learning to learn on serverless
    DOI: 10.1145/3530050.3532926
    Video
  • Benjamin Warnke, Johann Mantler, Sven Groppe, Yuri Cotrado Sehgelmeble, Stefan Fischer:
    A SPARQL Benchmark for Distributed Databases in IoT Environments
    DOI: 10.1145/3530050.3532929
    Video
  • Simon Paasche, Sven Groppe:
    Enhancing data quality and process optimization for smart manufacturing lines in industry 4.0 scenarios
    DOI: 10.1145/3530050.3532928
  • Thomas Bodner, Tobias Pietz, Lars Jonas Bollmeier, Daniel Ritter:
    Doppler: Understanding Serverless Query Execution
    DOI: 10.1145/3530050.3532919

Program

Keynote 1 and Paper

Time Type Description
8:30am (EDT)/2:30pm (CEST): keynote Rajkumar Buyya (Cloud Computing and Distributed Systems (CLOUDS) Lab, The University of Melbourne):
Neoteric Frontiers in Cloud and Edge Computing
Bio: Dr. Rajkumar Buyya is a Redmond Barry Distinguished Professor and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as the founding CEO of Manjrasoft, a spin-off company of the University, commercializing its innovations in Cloud Computing. He has authored over 850 publications and seven text books including "Mastering Cloud Computing" published by McGraw Hill, China Machine Press, and Morgan Kaufmann for Indian, Chinese and international markets respectively. Dr. Buyya is one of the highly cited authors in computer science and software engineering worldwide (h-index=152, g-index=332, and 122,300+ citations). Dr. Buyya is recognised as Web of Science "Highly Cited Researcher" for six consecutive years since 2016, IEEE Fellow, and Scopus Researcher of the Year 2017 with Excellence in Innovative Research Award by Elsevier. He has been recognised as the "Best of the World" twice for research fields (in Computing Systems in 2019 and Software Systems in 2021) as well as "Lifetime Achiever" and "Superstar of Research" in "Engineering and Computer Science" discipline twice (2019 and 2021) by the Australian Research Review. Recently, he received "Research Innovation Award" from IEEE Technical Committee on Services Computing and "Research Impact Award" from IEEE Technical Committee on Cloud Computing.
      Software technologies for Grid, Cloud, and Fog computing developed under Dr. Buyya's leadership have gained rapid acceptance and are in use at several academic institutions and commercial enterprises in 50+ countries around the world. Manjrasoft's Aneka Cloud technology, developed under his leadership, has received the "Frost New Product Innovation Award". He served as founding Editor-in-Chief of the IEEE Transactions on Cloud Computing. He is currently serving as Editor-in-Chief of Software: Practice and Experience, a long-standing journal in the field established 50+ years ago. For further information on Dr. Buyya, please visit his cyberhome: www.buyya.com
Abstract: Computing is being transformed into a model consisting of services that are delivered in a manner similar to utilities such as water, electricity, gas, and telephony. In such a model, users access services based on their requirements without regard to where the services are hosted or how they are delivered. The Cloud computing paradigm has turned this vision of "computing utilities" into a reality. It offers infrastructure, platform, and software as services, which are made available to consumers as subscription-oriented services. Cloud application platforms need to offer (1) APIs and tools for rapid creation of elastic applications and (2) a runtime system for deployment of applications on geographically distributed computing infrastructure in a seamless manner.
      The Internet of Things (IoT) paradigm enables seamless integration of the cyber and physical worlds and opens up opportunities for creating a new class of applications for domains such as smart cities and smart healthcare. The emerging Fog/Edge computing paradigm extends the Cloud computing model to edge resources for latency-sensitive IoT applications, with a seamless integration of network-wide resources all the way from the edge to the Cloud.
      This keynote presentation will cover (a) a 21st century vision of computing and the IT paradigms promising to deliver the vision of computing utilities; (b) an innovative architecture for creating elastic Clouds integrating edge resources and managed Clouds; (c) Aneka 5G, a Cloud Application Platform for rapid development of Cloud/Big Data applications and their deployment on private/public Clouds with resource provisioning driven by SLAs; (d) a novel FogBus software framework with Blockchain-based data-integrity management for facilitating end-to-end IoT-Fog/Edge-Cloud integration for execution of sensitive IoT applications; (e) experimental results on deploying Cloud and Big Data/IoT applications in engineering, health care (e.g., COVID-19), deep learning/artificial intelligence (AI), satellite image processing, natural language processing (mining COVID-19 research literature for new insights) and smart cities on elastic Clouds; and (f) directions for delivering our 21st century vision along with pathways for future research in Cloud and Edge/Fog computing.
9:30am (EDT)/3:30pm (CEST): paper Varad Pimpalkhute, Shruti Kunde, Rekha Singhal, Surya Palepu, Dheeraj Chahal, Amey Pandit:
MetaFaaS: Learning to learn on serverless
DOI: 10.1145/3530050.3532926
Video
9:50am (EDT)/3:50pm (CEST): break Coffee Break

Paper Session 1

Time Type Description
11am (EDT)/5pm (CEST): paper Thomas Bodner, Tobias Pietz, Lars Jonas Bollmeier, Daniel Ritter:
Doppler: Understanding Serverless Query Execution
DOI: 10.1145/3530050.3532919
11:20am (EDT)/5:20pm (CEST): paper Chetan Phalak, Mayank Mishra, Shruti Kunde, Rekha Singhal, Sana Iqbal:
Metamodel driven acceleration of actor-based simulation
DOI: 10.1145/3530050.3532921
11:40am (EDT)/5:40pm (CEST): paper Michal Bodziony, Rafal Morawski, Robert Wrembel:
Evaluating push-down on NoSQL data sources
DOI: 10.1145/3530050.3532916
12pm (EDT)/6pm (CEST): paper Benjamin Warnke, Johann Mantler, Sven Groppe, Yuri Cotrado Sehgelmeble, Stefan Fischer:
A SPARQL Benchmark for Distributed Databases in IoT Environments
DOI: 10.1145/3530050.3532929
Video
12:20pm (EDT)/6:20pm (CEST): lunch Lunch Break

Keynote 2 and Paper

Time Type Description
1:30pm (EDT)/7:30pm (CEST): keynote Volker Markl (Database Systems and Information Management (DIMA) Group, Technische Universität Berlin):
NebulaStream: Data Management for the Internet of Things
Bio:
Volker Markl is a German Professor of Computer Science. He leads the Chair of Database Systems and Information Management at TU Berlin and the Intelligent Analytics for Massive Data Research Department at DFKI. In addition, he is Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He is a database systems researcher, conducting research at the intersection of distributed systems, scalable data processing, and machine learning. Volker led the Stratosphere project, which resulted in the creation of Apache Flink. Volker has received numerous honors and prestigious awards, including two ACM SIGMOD Research Highlight Awards and best paper awards at ACM SIGMOD, VLDB, ICDE, and EDBT. He was recognized as ACM Fellow for his contributions to query optimization, scalable data processing, and data programmability. In 2014, he was elected one of Germany's leading "Digital Minds" (Digitale Köpfe) by the German Informatics Society. He is a member of the Berlin-Brandenburg Academy of Sciences and serves as advisor to academic institutions, governmental organizations, and technology companies. Volker holds eighteen patents and has been co-founder and mentor to several startups.
Abstract: The global database research community has greatly impacted the functionality and performance of data storage and processing systems along the dimensions that define "big data", i.e., volume, velocity, variety, and veracity. Although much progress has been made, when looking at the overall big data stack, a major challenge for the database research community remains: how to maintain ease of use despite the increasing heterogeneity and complexity of data analytics, which involves a complex environment that stresses various aspects of an end-to-end data analytics pipeline, and in particular, how to operate in a massively distributed environment, processing thousands of concurrent, continuous queries on millions of streaming data sources. At TU Berlin, DFKI, and the Berlin Institute for the Foundations of Learning and Data (BIFOLD) we currently aim to advance research in this field via the NebulaStream project. Our goal is to remedy some of the heterogeneity challenges that hamper developer productivity and limit the use of data science technologies to just the privileged few, who are coveted experts. In this talk, we will outline how state-of-the-art stream processing engines (SPEs) have to change to exploit the new capabilities of the IoT and showcase how we tackle specific challenges. We will present our vision for the NebulaStream system, provide an overview of its architecture, and discuss several of our key research challenges. Furthermore, we will present its current status as well as our steps towards establishing a thriving open-source community around the system.
Video
2:30pm (EDT)/8:30pm (CEST): paper Patrick Hansert, Sebastian Michel:
Ameliorating Data Compression and Query Performance through Cracked Parquet
DOI: 10.1145/3530050.3532923
2:50pm (EDT)/8:50pm (CEST): break Coffee Break

Paper Session 2

Time Type Description
3:30pm (EDT)/9:30pm (CEST): paper Yuanli Wang, Baiqing Lyu, Vasiliki Kalavri:
The Non-Expert Tax: Quantifying the cost of auto-scaling in Cloud-based data stream analytics
DOI: 10.1145/3530050.3532925
3:50pm (EDT)/9:50pm (CEST): paper Ted Shaowang, Xi Liang, Sanjay Krishnan:
Sensor Fusion on the Edge: Initial Experiments in the EdgeServe System
DOI: 10.1145/3530050.3532924
4:10pm (EDT)/10:10pm (CEST): paper Simon Paasche, Sven Groppe:
Enhancing data quality and process optimization for smart manufacturing lines in industry 4.0 scenarios
DOI: 10.1145/3530050.3532928
4:30pm (EDT)/10:30pm (CEST): paper Maruth Goyal, Aditya Akella:
Think Before You Shuffle: Data-Driven Shuffles for Geo-Distributed Analytics
DOI: 10.1145/3530050.3532922
4:50pm (EDT)/10:50pm (CEST): break End of Workshop

Manuscript Preparation

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.

Manuscripts should be submitted electronically as PDF files using this webpage and be formatted using the camera-ready templates in the ACM proceedings double-column format according to the "sigconf" proceedings template. Papers cannot exceed 6 pages in length.

Accepted papers will be published online in the ACM digital library. The papers must include the standard ACM copyright notice on the first page.

The pdf version of your paper should consider the following items:

  • The pdf should be optimized for fast web viewing.

  • The pdf should apply the ACM Computing Classification categories and terms (CCS concepts). The ACM templates provide space for this indexing; please consult the ACM Computing Classification System.

  • The pdf should contain the keywords.

  • The pdf should have the rights management statement and bibliographic strip on the bottom of the first page left column.

  • Please start numbering your paper with page number 1.

  • The pdf should have Type 1 fonts (scalable), not Type 3 (bit-mapped). All fonts MUST be embedded within the PDF file (to be corrected in the source files before the PDF is generated according to ACM documentation).

Submission to International Workshop on Big Data in Emergent Distributed Environments (BiDEDE 2022)

Please submit your manuscript by carefully filling in the information in the following web form. If there are technical problems, you may also submit your manuscript by sending the information and the manuscript to .

Title

Please specify the title of your paper here:

Authors

Please provide the necessary information about the authors of your submission here. Please mark the contact authors, who will be contacted for the main correspondence.

Author 1:


Name:
EMail:
Affiliation:
Webpage (optional):

Conflicts of Interest

Please specify any conflicts of interest here. Conflicts of interest occur, e.g., if the author and the reviewer are colleagues, work or have worked closely together, or are relatives.

Paper upload

Please choose your manuscript file for uploading. It should be a pdf file. Please take care that your manuscript is formatted according to the templates provided by ACM. Manuscripts not formatted according to the ACM templates will be rejected without review!

If you do not want the reviewers to know your identity, please submit a blinded manuscript that leaves out identifying information such as authors' names and affiliations.

Submission

Please check all information about your manuscript above. For submission please press the SUBMIT button below:

Contact Program Chairs

Please contact us for any further information:

Editions

Please use the following links for further information on the edition of the given year of the International Workshop on Big Data in Emergent Distributed Environments (BiDEDE):