Big Data in Emergent Distributed Environments (BiDEDE 2021)

Workshop @ ACM SIGMOD 2021

International Workshop on
Big Data in Emergent Distributed Environments (BiDEDE 2021)
in conjunction with the 2021 ACM SIGMOD Conference (online)
Call for Papers: txt (UTF-8), txt (ASCII), pdf

Aims of the Workshop

Today, new forms of distributed environments beyond Cloud Computing are emerging that enable new kinds of applications but pose new challenges for data management. Recent efforts in serverless computing aim to simplify deploying code to production in the Cloud by hiding scaling, capacity planning and maintenance operations from the developer or operator. Other initiatives avoid communication with the Cloud by deploying and running data processing environments near data sources in Internet-of-Things scenarios (e.g., fog and edge computing) for large-scale smart homes, companies and cities, and near the applications (e.g., Cloudlets for mobile applications and Offline First technologies for web applications).

Research on distributed data management is evolving to address new challenges specific to these environments. Properties of emergent distributed environments, such as node capabilities, communication bandwidth, battery lifetime of nodes, reliability of nodes and communication, and heterogeneity of configurations, impact data management mechanisms and approaches, including those for fault tolerance, replication, resource provisioning, buffer management, query processing and optimization, and transaction management. In addition, federated approaches and polystores spanning several emergent distributed environments remain open research challenges: these different environments need to be combined into one distributed runtime environment that makes it easy to handle Big Data in different models and to globally optimize data management tasks across them.

The goal of this workshop is to bring together academic researchers and industry practitioners to address these challenges, to report and exchange research findings on Big Data in emergent distributed environments (including new approaches, techniques and applications), to make substantial theoretical and empirical contributions, and to significantly advance the state of the art in this field.

Categories of Papers

The workshop solicits papers of different categories:

  • Research Papers propose new approaches, theories or techniques related to Big Data in emergent distributed environments including new data structures, protocols and algorithms. They should make substantial theoretical and empirical contributions to the research field.

  • System Papers describe new data management tools, stream processing engines, databases and other systems, which are able to handle Big Data in emergent distributed environments.

  • Experiments and Analysis Papers focus on the experimental evaluation of existing approaches including data structures and algorithms for Big Data in emergent distributed environments and bring new insights through the analysis of these experiments. Results of Experiments and Analysis Papers can be, for example, showing benefits of well-known approaches in new settings and environments, opening new research problems by demonstrating unexpected behavior or phenomena, or comparing a set of traditional approaches in an experimental survey.

  • Application Papers report practical experiences on applications of Big Data in emergent distributed environments. Application Papers might describe how to apply technologies to specific application domains with big data demands in emergent distributed environments, such as social networks, web search, e-business, collaborative environments, e-learning, medical informatics, bioinformatics and geographic information systems.

  • Vision Papers identify emerging new or future research issues and directions, and describe new research visions having demands for Big Data in emergent distributed environments. The new visions will potentially have great impacts on society.

  • Demo Papers deal with innovative systems and applications for Big Data in emergent distributed environments. These papers describe a showcase of the proposed system/application, but may also explain the novelty of the system's architecture. We are especially interested in demonstrations having a WOW-effect.

Papers must be between 4 and 6 pages long. Accepted papers will be presented as oral presentations.

Topics of Interest

We are interested in all issues concerning the management of data to be processed in emergent distributed environments such as the following:

  • Cloud Computing

  • Serverless Computing

    • Cloud Functions
    • App Engines
    • Cloud Runs

  • Post-Cloud Computing

    • Cloudlet
    • Fog Computing
    • Edge Computing
    • Dew Computing
    • Offline First
    • Smart Home/Companies/Cities

The Data Management issues to be solved in the emergent distributed environments include, but are not limited to, the following:

  • Query Processing and Optimization
  • Transaction Management
  • Fault Tolerance Mechanisms
  • Cloud Data Warehouses
  • Distributed Databases
  • Federation/Polystore Architectures
  • Data Lakes
  • Artificial Intelligence in Big Data Environments
  • Interactive Data Analytics and Big Data Science

Important Dates

Time Schedule
Submission (extended): March 18, 2021
Notification: April 15, 2021
Workshop: June 20, 2021

Diversity Considerations of the Program Committee

We have currently recruited 33 PC members and chairs, listed below, who are experts in the workshop's topics of interest. They come from 14 nations all over the world, as the map below also shows. While most PC members are from academia, 7 experts are from industry (21%). 7 of the PC members and chairs are women (21%).

[Map: program committee members and chairs per country; legend scale from 1 to 15]

Program Committee Chairs

Steering Committee

Program Committee

  • Ahmed S. Abdelhamid, Purdue University, USA
  • Ehab Abdelhamid, Datometry, Inc., USA
  • Mithun Balakrishna, Lymba Corporation, USA
  • Brad Glasbergen, University of Waterloo, Canada
  • Jinghua Groppe, University of Lübeck, Germany
  • Ekaterini Ioannou, Tilburg University, Netherlands
  • Alekh Jindal, Microsoft, USA
  • Ioannis Kontopoulos, Harokopio University of Athens, Greece
  • Isaac Lera, Universitat de les Illes Balears, Spain
  • Xiang Lian, Kent State University, USA
  • Qing Liu, Data61, CSIRO, Australia
  • Renato Marroquín, Oracle
  • Gourab Mitra, Datometry, Inc., USA
  • Ingo Müller, ETH Zurich, Switzerland
  • Grażyna Paliwoda-Pękosz, Cracow University of Economics, Poland
  • Alfredo Pulvirenti, University of Catania, Italy
  • Praveen Rao, University of Missouri-Columbia, USA
  • Arjun Satish, Confluent Inc., USA
  • Omair Shafiq, Carleton University, Canada
  • Katja Gilly de La Sierra-Llamazares, Miguel Hernandez University, Spain
  • Marta Tatu, Lymba Corporation, USA
  • Konstantinos Tserpes, Harokopio University of Athens, Greece
  • Xikui Wang, University of California Irvine, USA
  • Benjamin Warnke, University of Lübeck, Germany
  • Robert Wrembel, Poznan University of Technology, Poland
  • Chenggang Wu, UC Berkeley, USA
  • Steffen Zeuch, Technische Universität Berlin, Germany
  • Yi Zhang, University of Pennsylvania, USA
  • Xiang Zhao, National University of Defense Technology, China
  • Zhuoyue Zhao, University of Utah, USA

Evaluation of Papers

To verify the originality of submissions, we will use Plagiarism Detection Tools to check the content of the submitted manuscripts against previous publications.

Papers will be evaluated according to the following aspects:

  • Relevance to the Workshop
  • Novelty and practical impact
  • Technical soundness
  • Appropriateness and adequacy of:
    • Literature review
    • Background discussion
    • Analysis of issues
  • Presentation, including:
    • Overall organization and structure
    • Correctness of English language
    • Readability

Accepted Papers

The proceedings are available here.
  • Guanjin Qu, Huaming Wu, Naichuan Cui:
    Joint Blockchain and Federated Learning-based Offloading in Harsh Edge Computing Environments
    DOI: 10.1145/3460866.3461765
  • Shruti Kunde, Amey Pandit, Mayank Mishra, Rekha Singhal:
    Distributed training for accelerating metalearning algorithms
    DOI: 10.1145/3460866.3461773
  • Maximilian Böther, Tilmann Rabl:
    Scale-Down Experiments on TPCx-HS
    DOI: 10.1145/3460866.3461774
  • Qifan Deng, Mohammad Goudarzi, Rajkumar Buyya:
    FogBus2: A Lightweight and Distributed Container-based Framework for Integration of IoT-enabled Systems with Edge and Cloud Computing
    DOI: 10.1145/3460866.3461768
  • Alina Nesen, Bharat Bhargava:
    Situational Awareness with Multimodal Streaming Data Fusion: Serverless Computing Approach
    DOI: 10.1145/3460866.3461769
  • Servio Palacios, Drew Zabrocki, Bharat Bhargava, Vaneet Aggarwal:
    Auditable Serverless Computing for Farm Management
    DOI: 10.1145/3460866.3461770
  • Michal Bodziony, Hubert Krzyzanowski, Lukasz Pieta, Robert Wrembel:
    On Discovering Semantics of User-Defined Functions in Data Processing Workflows
    DOI: 10.1145/3460866.3461771
  • Christophe Cerin, Frédéric Andres, Danielle Geldwerth-Feniger:
    Towards an Emulation Tool based on Ontologies and Data Life Cycles for Studying Smart Buildings
    DOI: 10.1145/3460866.3461772

Program

Due to COVID-19, the workshop is streamed online. Times are given in Beijing time (CST), with US and European time zones (ET/PT/CEST) in brackets. We try to avoid local night times for our presenters. For video conference and streaming links see http://2021.sigmod.org/program/program_overview.shtml
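The bracketed times in the schedule below can be reproduced with a small conversion sketch. It is only an illustration, not part of the workshop tooling: the mapping of the abbreviations CST, ET, PT and CEST to IANA zone names is an assumption, as is the helper name `convert_slot`. The date is the workshop date given under Important Dates. The sketch needs Python 3.9 or later and an installed time zone database.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Assumption: these IANA zones stand in for the abbreviations used on this page.
ZONES = {
    "CST": ZoneInfo("Asia/Shanghai"),       # Beijing time
    "ET": ZoneInfo("America/New_York"),
    "PT": ZoneInfo("America/Los_Angeles"),
    "CEST": ZoneInfo("Europe/Berlin"),
}

# Workshop date as listed under Important Dates.
WORKSHOP_DAY = datetime(2021, 6, 20, tzinfo=ZONES["CST"])


def convert_slot(hour, minute, day_offset=0):
    """Convert a slot given in Beijing time to all listed zones.

    day_offset=1 corresponds to the "+1d" suffix in the schedule.
    Returns a dict mapping each abbreviation to an "H:MM" string.
    """
    start = WORKSHOP_DAY + timedelta(days=day_offset, hours=hour, minutes=minute)
    result = {}
    for name, zone in ZONES.items():
        local = start.astimezone(zone)
        result[name] = f"{local.hour}:{local.minute:02d}"
    return result


if __name__ == "__main__":
    print(convert_slot(20, 0))               # first slot of Session 1
    print(convert_slot(0, 0, day_offset=1))  # first slot of Session 2
```

Using named zones rather than fixed UTC offsets lets the library apply daylight saving rules for the workshop date.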

Session 1

Time Type Description
CST:20:00 (ET:8:00/PT:5:00/CEST:14:00): paper Qifan Deng, Mohammad Goudarzi, Rajkumar Buyya:
FogBus2: A Lightweight and Distributed Container-based Framework for Integration of IoT-enabled Systems with Edge and Cloud Computing
DOI: 10.1145/3460866.3461768
CST:20:20 (ET:8:20/PT:5:20/CEST:14:20): paper Guanjin Qu, Huaming Wu, Naichuan Cui:
Joint Blockchain and Federated Learning-based Offloading in Harsh Edge Computing Environments
DOI: 10.1145/3460866.3461765
CST:20:40 (ET:8:40/PT:5:40/CEST:14:40): paper Shruti Kunde, Amey Pandit, Mayank Mishra, Rekha Singhal:
Distributed training for accelerating metalearning algorithms
DOI: 10.1145/3460866.3461773
CST:21:00 (ET:9:00/PT:6:00/CEST:15:00): paper Maximilian Böther, Tilmann Rabl:
Scale-Down Experiments on TPCx-HS
DOI: 10.1145/3460866.3461774
CST:21:20 (ET:9:20/PT:6:20/CEST:15:20): break Coffee Break

Keynote 1

Time Type Description
CST:22:30 (ET:10:30/PT:7:30/CEST:16:30): keynote Konstantinos Karanasos (Microsoft's Gray Systems Lab (GSL)):
Enterprise-Grade Machine Learning in Azure Data
Bio: Konstantinos Karanasos is a Principal Scientist Lead at Microsoft's Gray Systems Lab (GSL), Azure Data's applied research group. He is the manager of the Bay Area branch of GSL and the tech lead for several systems-for-ML efforts within the group. Konstantinos' work at Microsoft previously focused on resource management for the company's production analytics clusters. This work was deployed in over 300K machines across Microsoft and was key to enabling the company to operate the world’s largest YARN clusters. He has also contributed a large part of his work at Microsoft to open source projects: he is a committer and member of the Project Management Committee (PMC) of Apache Hadoop, and a contributor to ONNX Runtime. Before joining Microsoft, he was a postdoctoral researcher at IBM Almaden Research Center. Konstantinos holds a PhD from Inria, France, and a Diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece. For more details, visit https://www.microsoft.com/en-us/research/people/kokarana/.
Abstract: Machine learning (ML) is being widely adopted in the enterprise and is on track to revolutionize every industry, including healthcare, manufacturing, image and speech recognition, and autonomous vehicle management, just to name a few. Enterprise-Grade ML (EGML) is a complex endeavor that involves several personas (data scientists, data/business analysts, software engineers), processes (model training, model scoring, data governance), and systems (ML runtimes, data engines, model management/deployment systems). High-value enterprise data, typically stored in relational databases, data warehouses, or data lakes, lie at the heart of EGML.
      In this talk, I will discuss various efforts across Microsoft's Azure Data org to improve the customer experience, governance, and performance of EGML applications. Several of these efforts started as research projects and are finding their way to production. Specifically, I will discuss model scoring within various data engines with a goal to provide a unified experience for publishing and consuming models in a variety of form factors, be it on Azure, multi-cloud, on the edge, or on premises. Then I will describe our work on the end-to-end optimization of ML pipelines through novel transformations and hardware acceleration. Finally, I will present our efforts on provenance for data science applications.
CST:23:30 (ET:11:30/PT:8:30/CEST:17:30): break Coffee Break

Session 2

Time Type Description
CST:0:00+1d (ET:12:00/PT:9:00/CEST:18:00): paper Alina Nesen, Bharat Bhargava:
Situational Awareness with Multimodal Streaming Data Fusion: Serverless Computing Approach
DOI: 10.1145/3460866.3461769
CST:0:20+1d (ET:12:20/PT:9:20/CEST:18:20): paper Servio Palacios, Drew Zabrocki, Bharat Bhargava, Vaneet Aggarwal:
Auditable Serverless Computing for Farm Management
DOI: 10.1145/3460866.3461770
CST:0:40+1d (ET:12:40/PT:9:40/CEST:18:40): paper Michal Bodziony, Hubert Krzyzanowski, Lukasz Pieta, Robert Wrembel:
On Discovering Semantics of User-Defined Functions in Data Processing Workflows
DOI: 10.1145/3460866.3461771
CST:1:00+1d (ET:13:00/PT:10:00/CEST:19:00): paper Christophe Cerin, Frédéric Andres, Danielle Geldwerth-Feniger:
Towards an Emulation Tool based on Ontologies and Data Life Cycles for Studying Smart Buildings
DOI: 10.1145/3460866.3461772
CST:1:20+1d (ET:13:20/PT:10:20/CEST:19:20): break Coffee Break

Keynote 2

Time Type Description
CST:2:00+1d (ET:14:00/PT:11:00/CEST:20:00): keynote Alvin Cheung (UC Berkeley):
A PACTful Agenda for Cloud Programming Research
Bio: Alvin Cheung is an assistant professor in UC Berkeley's EECS Dept. His research focuses on designing new techniques to solve data systems problems. Alvin's research has been recognized through multiple early career awards such as the US Presidential Early Career Award for Scientists and Engineers, the Sloan Fellowship, along with a number of best paper and demo awards. For more details, visit https://people.eecs.berkeley.edu/~akcheung.
Abstract: We have witnessed two decades of cloud computing research. Yet, programming the cloud remains a tedious task for both the application and cloud infrastructure developers: application developers need to consider various cloud deployment aspects as they write code, while infrastructure developers must determine how to execute and optimize code that mingles application semantics, fault tolerance, hardware constraints, etc., all into a single program.
      In this talk, I will describe our new Hydro project for next generation cloud programming research. The key ideas behind Hydro are: 1) exposing aspects such as how to scale and budgetary constraints programmatically for application developers to manipulate, 2) partitioning the user application into four facets called PACT: Program semantics, Availability, Consistency and Targets of optimization, and 3) using program synthesis and other learning-based techniques to drive PACT program compilation rather than a static pattern-matching approach targeted for a specific backend. I will explain these ideas and describe the early progress we have made thus far.
CST:3:00+1d (ET:15:00/PT:12:00/CEST:21:00): break End of Workshop

Manuscript Preparation

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.

Manuscripts should be submitted electronically as PDF files using this webpage and be formatted using the camera-ready templates in the ACM proceedings double-column format according to the "sigconf" proceedings template. Papers cannot exceed 6 pages in length.

Accepted papers will be published online in the ACM digital library. The papers must include the standard ACM copyright notice on the first page.

The pdf version of your paper should consider the following items:

  • The pdf should be optimized for fast web viewing.

  • The pdf should apply the ACM Computing Classification categories and terms (CCS concepts). The ACM templates provide space for this indexing; please consult the Computing Classification Scheme.

  • The pdf should contain the keywords.

  • The pdf should have the rights management statement and bibliographic strip on the bottom of the first page left column.

  • Please start numbering your paper with page number 1.

  • The pdf should have Type 1 fonts (scalable), not Type 3 (bit-mapped). All fonts MUST be embedded within the PDF file; according to the ACM documentation, font problems should be corrected in the source files before the PDF is generated.

Submission

The submission is currently closed. Please check our Important Dates page.

Contact Program Chairs

Please contact us for any further information:

Editions

Please use the following links for further information on the edition of the given year of the International Workshop on Big Data in Emergent Distributed Environments (BiDEDE):