The International Workshop on Big Data in Emergent Distributed Environments (BiDEDE 2021)

In conjunction with ACM SIGMOD 2021

Program Committee

The members of our Program Committee come from all around the world!

Big Data in Emergent Distributed Environments

Do you want to know why we believe that Big Data in Emergent Distributed Environments is a hot topic in research?

Questions

If you have any questions, please do not hesitate to contact the workshop chairs!

Types of Papers

We accept six types of papers:

Research Papers
System Papers
Experiments and Analysis Papers
Application Papers
Vision Papers
Demo Papers

Evaluation Criteria

We evaluate our submissions according to a set of criteria...

Topics of Interest

We are interested in submissions on any topic related to Big Data in Emergent Distributed Environments...

This workshop is organized in cooperation with the International Federation for Information Processing (IFIP) Working Group WG2.6 Database.

Aims of the Workshop

Today, new forms of distributed environments beyond Cloud Computing are emerging that enable new kinds of applications but pose new challenges for data management. Recent efforts in serverless computing aim to simplify the deployment of code into production in the Cloud by hiding scaling, capacity planning and maintenance operations from the developer or operator. Other initiatives avoid communication with the Cloud by deploying and running data processing environments near the data sources in Internet-of-Things scenarios (e.g., fog and edge computing) for large-scale smart homes, companies and cities, or near the applications (e.g., Cloudlets for mobile applications and Offline First technologies for web applications).

Research on distributed data management is evolving to address the challenges specific to these new environments. Properties of emergent distributed environments, such as the capabilities of nodes, communication bandwidth, battery lifetime of nodes, reliability of nodes and communication, and heterogeneity of configurations, affect data management mechanisms and approaches, including those for fault tolerance, replication, resource provisioning, buffer management, query processing and optimization, and transaction management. In addition, federated approaches and polystores spanning several emergent distributed environments remain open research challenges. They are motivated by the need to combine these different environments into one distributed runtime environment that makes it easy to handle Big Data in different models and to globally optimize data management tasks across the environments.

The goal of this workshop is to bring together academic researchers and industry practitioners to address these challenges and to report and exchange research findings on Big Data in emergent distributed environments, including new approaches, techniques and applications. Contributions should make substantial theoretical and empirical advances and significantly advance the state of the art of Big Data in emergent distributed environments.

Categories of Papers

The workshop solicits papers of different categories:

Research Papers propose new approaches, theories or techniques related to Big Data in emergent distributed environments including new data structures, protocols and algorithms. They should make substantial theoretical and empirical contributions to the research field.
System Papers describe new data management tools, stream processing engines, databases and other systems, which are able to handle Big Data in emergent distributed environments.
Experiments and Analysis Papers focus on the experimental evaluation of existing approaches, including data structures and algorithms for Big Data in emergent distributed environments, and bring new insights through the analysis of these experiments. Such papers can, for example, show the benefits of well-known approaches in new settings and environments, open new research problems by demonstrating unexpected behavior or phenomena, or compare a set of traditional approaches in an experimental survey.
Application Papers report practical experiences with applications of Big Data in emergent distributed environments. Application Papers might describe how to apply technologies to specific application domains with big data demands in emergent distributed environments, such as social networks, web search, e-business, collaborative environments, e-learning, medical informatics, bioinformatics and geographic information systems.
Vision Papers identify emerging or future research issues and directions, and describe new research visions with demands for Big Data in emergent distributed environments. The new visions will potentially have a great impact on society.
Demo Papers deal with innovative systems and applications for Big Data in emergent distributed environments. These papers describe a showcase of the proposed system or application, but may also explain the novelty of the system's architecture. We are especially interested in demonstrations with a wow effect.

Papers must be between 4 and 6 pages long. Accepted papers will be presented as oral presentations.

Topics of Interest

We are interested in all issues concerning the management of data to be processed in emergent distributed environments such as the following:

Cloud Computing
Serverless Computing
- Cloud Functions
- App Engines
- Cloud Runs
Post-Cloud Computing
- Cloudlet
- Fog Computing
- Edge Computing
- Dew Computing
- Offline First
- Smart Home/Companies/Cities

The Data Management issues to be solved in the emergent distributed environments include, but are not limited to, the following:

Query Processing and Optimization
Transaction Management
Fault Tolerance Mechanisms
Cloud Data Warehouses
Distributed Databases
Federation/Polystore Architectures
Data Lakes
Artificial Intelligence in Big Data Environments
Interactive Data Analytics and Big Data Science

Important Dates

Time Schedule
Submission (extended):	March 18, 2021
Notification:	April 15, 2021
Workshop:	June 20, 2021

Diversity Considerations of the Program Committee

We have recruited the 33 PC members and chairs listed below, all of whom are experts in the topics of interest of our workshop. They come from 14 nations all over the world, as the map below also shows. While most PC members are from academia, we also have 7 experts from industry (21%). 7 of the PC members and chairs are women (21%).

[Map: number of program committee members and chairs per country; legend scale from 1 to 15]

Program Committee Chairs

Sven Groppe, University of Lübeck, Germany
Le Gruenwald, University of Oklahoma, USA
Ching-Hsien Hsu, Asia University, Taiwan

Steering Committee

Nik Bessis, Edge Hill University, U.K.
Pedro Garcia Lopez, Universitat Rovira i Virgili, Spain
Claudio Agostino Ardagna, Universita' degli Studi di Milano, Italy
Schahram Dustdar, TU Wien, Austria
Konstantinos Karanasos, Microsoft, USA

Program Committee

Ahmed S. Abdelhamid, Purdue University, USA
Ehab Abdelhamid, Datometry, Inc., USA
Mithun Balakrishna, Lymba Corporation, USA
Brad Glasbergen, University of Waterloo, Canada
Jinghua Groppe, University of Lübeck, Germany
Ekaterini Ioannou, Tilburg University, The Netherlands
Alekh Jindal, Microsoft, USA
Ioannis Kontopoulos, Harokopio University of Athens, Greece
Isaac Lera, Universitat de les Illes Balears, Spain
Xiang Lian, Kent State University, USA
Qing Liu, Data61, CSIRO, Australia
Renato Marroquín, Oracle
Gourab Mitra, Datometry, Inc., USA
Ingo Müller, ETH Zurich, Switzerland
Grażyna Paliwoda-Pękosz, Cracow University of Economics, Poland
Alfredo Pulvirenti, University of Catania, Italy
Praveen Rao, University of Missouri-Columbia, USA
Arjun Satish, Confluent Inc., USA
Omair Shafiq, Carleton University, Canada
Katja Gilly de La Sierra-Llamazares, Miguel Hernandez University, Spain
Marta Tatu, Lymba Corporation, USA
Konstantinos Tserpes, Harokopio University of Athens, Greece
Xikui Wang, University of California Irvine, USA
Benjamin Warnke, University of Lübeck, Germany
Robert Wrembel, Poznan University of Technology, Poland
Chenggang Wu, UC Berkeley, USA
Steffen Zeuch, Technische Universität Berlin, Germany
Yi Zhang, University of Pennsylvania, USA
Xiang Zhao, National University of Defense Technology, China
Zhuoyue Zhao, University of Utah, USA

Evaluation of Papers

To verify the originality of submissions, we will use plagiarism detection tools to check the content of the submitted manuscripts against previous publications.

Papers will be evaluated according to the following aspects:

Relevance to the Workshop
Novelty and practical impact
Technical soundness
Appropriateness and adequacy of:
- Literature review
- Background discussion
- Analysis of issues
Presentation, including:
- Overall organization and structure
- Correctness of English language
- Readability

Accepted Papers

The proceedings are available here.

Guanjin Qu, Huaming Wu, Naichuan Cui:
Joint Blockchain and Federated Learning-based Offloading in Harsh Edge Computing Environments
DOI: 10.1145/3460866.3461765
Shruti Kunde, Amey Pandit, Mayank Mishra, Rekha Singhal:
Distributed training for accelerating metalearning algorithms
DOI: 10.1145/3460866.3461773
Maximilian Böther, Tilmann Rabl:
Scale-Down Experiments on TPCx-HS
DOI: 10.1145/3460866.3461774
Qifan Deng, Mohammad Goudarzi, Rajkumar Buyya:
FogBus2: A Lightweight and Distributed Container-based Framework for Integration of IoT-enabled Systems with Edge and Cloud Computing
DOI: 10.1145/3460866.3461768
Alina Nesen, Bharat Bhargava:
Situational Awareness with Multimodal Streaming Data Fusion: Serverless Computing Approach
DOI: 10.1145/3460866.3461769
Servio Palacios, Drew Zabrocki, Bharat Bhargava, Vaneet Aggarwal:
Auditable Serverless Computing for Farm Management
DOI: 10.1145/3460866.3461770
Michal Bodziony, Hubert Krzyzanowski, Lukasz Pieta, Robert Wrembel:
On Discovering Semantics of User-Defined Functions in Data Processing Workflows
DOI: 10.1145/3460866.3461771
Christophe Cerin, Frédéric Andres, Danielle Geldwerth-Feniger:
Towards an Emulation Tool based on Ontologies and Data Life Cycles for Studying Smart Buildings
DOI: 10.1145/3460866.3461772

Program

The workshop is streamed online due to COVID-19. Times are given in Beijing time (CST), with the US and European time zones (ET/PT/CEST) in brackets. We try to avoid local night-time slots for our presenters. For video conference and streaming links, see http://2021.sigmod.org/program/program_overview.shtml
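For attendees who want to check a slot against their own clock, the conversion between the listed time zones can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `convert_slot` and `format_slot` are hypothetical, and the fixed UTC offsets assume the workshop day (June 20, 2021), when daylight saving time was in effect in the US and Europe.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets valid on the workshop day (June 20, 2021), when daylight
# saving time was in effect in the US and Europe. These are assumptions of
# this sketch; a general tool should use a time zone database instead.
OFFSETS = {
    "CST": timezone(timedelta(hours=8)),    # Beijing time
    "ET": timezone(timedelta(hours=-4)),    # US Eastern (EDT)
    "PT": timezone(timedelta(hours=-7)),    # US Pacific (PDT)
    "CEST": timezone(timedelta(hours=2)),   # Central European Summer Time
}


def convert_slot(hour, minute, day=20):
    """Convert a Beijing-time slot on the given June 2021 day to all zones."""
    start = datetime(2021, 6, day, hour, minute, tzinfo=OFFSETS["CST"])
    return {
        name: start.astimezone(tz).strftime("%H:%M")
        for name, tz in OFFSETS.items()
    }


def format_slot(hour, minute, day=20):
    """Render a slot in a style close to the one used in the schedule."""
    t = convert_slot(hour, minute, day)
    return f"CST:{t['CST']} (ET:{t['ET']}/PT:{t['PT']}/CEST:{t['CEST']})"


if __name__ == "__main__":
    print(format_slot(20, 0))         # first talk of Session 1
    print(format_slot(0, 0, day=21))  # first talk of Session 2 ("+1d")
```

The sketch prints zero-padded 24-hour times, so its output differs slightly in formatting from the schedule entries. Slots marked "+1d" in the schedule correspond to `day=21` in Beijing time.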
Session 1
Time	Type	Description
CST:20:00 (ET:8:00/PT:5:00/CEST:14:00):	paper	Qifan Deng, Mohammad Goudarzi, Rajkumar Buyya: FogBus2: A Lightweight and Distributed Container-based Framework for Integration of IoT-enabled Systems with Edge and Cloud Computing DOI: 10.1145/3460866.3461768
CST:20:20 (ET:8:20/PT:5:20/CEST:14:20):	paper	Guanjin Qu, Huaming Wu, Naichuan Cui: Joint Blockchain and Federated Learning-based Offloading in Harsh Edge Computing Environments DOI: 10.1145/3460866.3461765
CST:20:40 (ET:8:40/PT:5:40/CEST:14:40):	paper	Shruti Kunde, Amey Pandit, Mayank Mishra, Rekha Singhal: Distributed training for accelerating metalearning algorithms DOI: 10.1145/3460866.3461773
CST:21:00 (ET:9:00/PT:6:00/CEST:15:00):	paper	Maximilian Böther, Tilmann Rabl: Scale-Down Experiments on TPCx-HS DOI: 10.1145/3460866.3461774
CST:21:20 (ET:9:20/PT:6:20/CEST:15:20):	break	Coffee Break
Keynote 1
Time	Type	Description
CST:22:30 (ET:10:30/PT:7:30/CEST:16:30):	keynote	Konstantinos Karanasos (Microsoft's Gray Systems Lab (GSL)): Enterprise-Grade Machine Learning in Azure Data
Bio: Konstantinos Karanasos is a Principal Scientist Lead at Microsoft's Gray Systems Lab (GSL), Azure Data's applied research group. He is the manager of the Bay Area branch of GSL and the tech lead for several systems-for-ML efforts within the group. Konstantinos' work at Microsoft previously focused on resource management for the company's production analytics clusters. This work was deployed on over 300K machines across Microsoft and was key to enabling the company to operate the world's largest YARN clusters. He has also contributed a large part of his work at Microsoft to open source projects: he is a committer and member of the Project Management Committee (PMC) of Apache Hadoop, and a contributor to ONNX Runtime. Before joining Microsoft, he was a postdoctoral researcher at IBM Almaden Research Center. Konstantinos holds a PhD from Inria, France, and a Diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece. For more details, visit https://www.microsoft.com/en-us/research/people/kokarana/.
Abstract: Machine learning (ML) is being widely adopted in the enterprise and is on track to revolutionize every industry, including healthcare, manufacturing, image and speech recognition, and autonomous vehicle management, to name just a few. Enterprise-Grade ML (EGML) is a complex endeavor that involves several personas (data scientists, data/business analysts, software engineers), processes (model training, model scoring, data governance), and systems (ML runtimes, data engines, model management/deployment systems). High-value enterprise data, typically stored in relational databases, data warehouses, or data lakes, lie at the heart of EGML.
In this talk, I will discuss various efforts across Microsoft's Azure Data org to improve the customer experience, governance, and performance of EGML applications. Several of these efforts started as research projects and are finding their way to production. Specifically, I will discuss model scoring within various data engines with the goal of providing a unified experience for publishing and consuming models in a variety of form factors, be it on Azure, multi-cloud, on the edge, or on premises. Then I will describe our work on the end-to-end optimization of ML pipelines through novel transformations and hardware acceleration. Finally, I will present our efforts on provenance for data science applications.
CST:23:30 (ET:11:30/PT:8:30/CEST:17:30):	break	Coffee Break
Session 2
Time	Type	Description
CST:0:00+1d (ET:12:00/PT:9:00/CEST:18:00):	paper	Alina Nesen, Bharat Bhargava: Situational Awareness with Multimodal Streaming Data Fusion: Serverless Computing Approach DOI: 10.1145/3460866.3461769
CST:0:20+1d (ET:12:20/PT:9:20/CEST:18:20):	paper	Servio Palacios, Drew Zabrocki, Bharat Bhargava, Vaneet Aggarwal: Auditable Serverless Computing for Farm Management DOI: 10.1145/3460866.3461770
CST:0:40+1d (ET:12:40/PT:9:40/CEST:18:40):	paper	Michal Bodziony, Hubert Krzyzanowski, Lukasz Pieta, Robert Wrembel: On Discovering Semantics of User-Defined Functions in Data Processing Workflows DOI: 10.1145/3460866.3461771
CST:1:00+1d (ET:13:00/PT:10:00/CEST:19:00):	paper	Christophe Cerin, Frédéric Andres, Danielle Geldwerth-Feniger: Towards an Emulation Tool based on Ontologies and Data Life Cycles for Studying Smart Buildings DOI: 10.1145/3460866.3461772
CST:1:20+1d (ET:13:20/PT:10:20/CEST:19:20):	break	Coffee Break
Keynote 2
Time	Type	Description
CST:2:00+1d (ET:14:00/PT:11:00/CEST:20:00):	keynote	Alvin Cheung (UC Berkeley): A PACTful Agenda for Cloud Programming Research
Bio: Alvin Cheung is an assistant professor in UC Berkeley's EECS Dept. His research focuses on designing new techniques to solve data systems problems. Alvin's research has been recognized through multiple early career awards such as the US Presidential Early Career Award for Scientists and Engineers and the Sloan Fellowship, along with a number of best paper and demo awards. For more details, visit https://people.eecs.berkeley.edu/~akcheung.
Abstract: We have witnessed two decades of cloud computing research. Yet, programming the cloud remains a tedious task for both application and cloud infrastructure developers: application developers need to consider various cloud deployment aspects as they write code, while infrastructure developers must determine how to execute and optimize code that mingles application semantics, fault tolerance, hardware constraints, etc., all into a single program. In this talk, I will describe our new Hydro project for next generation cloud programming research. The key ideas behind Hydro are: 1) exposing aspects such as scaling and budgetary constraints programmatically for application developers to manipulate, 2) partitioning the user application into four facets called PACT: Program semantics, Availability, Consistency and Targets of optimization, and 3) using program synthesis and other learning-based techniques to drive PACT program compilation rather than a static pattern-matching approach targeted for a specific backend. I will explain these ideas and describe the early progress we have made thus far.
CST:3:00+1d (ET:15:00/PT:12:00/CEST:21:00):	break	End of Workshop

Manuscript Preparation

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.

Manuscripts should be submitted electronically as PDF files using this webpage and be formatted using the camera-ready templates in the ACM proceedings double-column format according to the "sigconf" proceedings template. Papers cannot exceed 6 pages in length.

Accepted papers will be published online in the ACM digital library. The papers must include the standard ACM copyright notice on the first page.

The PDF version of your paper should satisfy the following requirements:

The PDF should be optimized for fast web viewing.
The PDF should apply the ACM Computing Classification categories and terms (CCS concepts). The ACM templates provide space for this indexing; please consult the Computing Classification Scheme.
The PDF should contain the keywords.
The PDF should have the rights management statement and bibliographic strip at the bottom of the left column of the first page.
Please start numbering your paper with page number 1.
The PDF should use Type 1 fonts (scalable), not Type 3 (bit-mapped). All fonts MUST be embedded within the PDF file (correct this in the source files before the PDF is generated, according to the ACM documentation).

Submission

Submission is currently closed. Please check our Important Dates page.

Contact Program Chairs

Please contact us for any further information:

Editions

Please use the following links for further information on the edition of the given year of the International Workshop on Big Data in Emergent Distributed Environments (BiDEDE):

2021
2022
2023
2024