Data Linking Infrastructure – Foundations and Architecture
Funded by DFG
Duration: 01.01.2019 - 31.12.2025
Principal Investigator:
- Ralf Möller (Universität Hamburg)
Research Associates:
- Thomas Asselborn, M.Sc. (Universität Hamburg)
- Dr. Marcel Gehrke, M.Sc. (Universität Hamburg)
- Dr. Sylvia Melzer, Dipl.-Ing. (University of Lübeck)
- Simon Schiff, M.Sc. (University of Lübeck)
Project Description
A data linking infrastructure is envisioned to support humanities scholars from all research fields of the Cluster of Excellence "Understanding Written Artefacts" so that various kinds of data can be combined easily and systematically to foster scientific progress. On the one hand, there are images and videos of written artefacts, in some cases associated with text data that make parts of the image (or video) content explicit, e.g., obtained with optical character recognition techniques. On the other hand, different kinds of chemistry and materials science data are collected to further describe the written artefacts under investigation, almost always in combination with descriptive temporal and spatial data. Data of this kind must be made available to humanities scholars in a way that best supports their scientific work. Publications from humanities projects, e.g., journal articles, conference papers, and PhD theses, will refer to artefact data of the kind described above, so that over time the artefact data are referenced in a considerable number of natural language publications. Publications are provided as documents, which are represented, e.g., as PDF data. Further natural language data come from existing humanities research databases. All data can be described using suitable metadata formalisms (date of creation, author, etc.). In addition to metadata, and distinct from them, all kinds of base data (also called raw data) may be extended with derived data that make certain features explicit (e.g., to support visualization, information retrieval, or other research efforts).
Link to Project Details
https://www.csmc.uni-hamburg.de/research/cluster-projects/field-f/rff01.html
Activities
Editorial
- S. Melzer, J. Gippert, S. Thiemann, H. Peukert: Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2021), CEUR Workshop Proceedings, 2022 (proceedings)
- S. Melzer, S. Thiemann, H. Peukert: Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2022), CEUR Workshop Proceedings, 2022 (proceedings)
- S. Melzer, H. Peukert, S. Thiemann: Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023), CEUR Workshop Proceedings, 2023 (proceedings)
Organisation
- R. Möller, S. Melzer: Data Linking Study Day 2021, Universität Hamburg, online, 15.06.2021, Organiser
- S. Melzer: 44th German Conference on Artificial Intelligence, September 27-October 1, 2021, Berlin, Germany (KI2021), Junior Research Chair (abstracts)
- S. Melzer, S. Thiemann, J. Gippert: Humanities-Centred Artificial Intelligence (CHAI), 44th German Conference on Artificial Intelligence, September 27-October 1, 2021, Berlin, Germany (KI2021), Workshop Organiser and Chair (proceedings, abstracts)
- S. Melzer, S. Thiemann, H. Peukert: 2nd Workshop on Humanities-Centred Artificial Intelligence (CHAI), 45th German Conference on Artificial Intelligence, September 19-September 23, 2022, Trier, Germany (KI2022), Workshop Organiser
- R. Möller, S. Melzer: Doctoral Symposium, 25th International Symposium on Formal Methods (FM 2023), 06.03.2023, Lübeck, PC member and mentor
- S. Melzer, H. Hu-von Hinüber: Data Linking Workshop 2023: Computer Vision and Natural Language Processing – Challenges in the Humanities, 27-28 June 2023, Hamburg, Germany, Workshop Organiser and Chair
- S. Melzer, S. Thiemann, H. Peukert: 3rd Workshop on Humanities-Centred Artificial Intelligence (CHAI), 46th German Conference on Artificial Intelligence, September 26, 2023, Berlin, Germany (KI2023), Workshop Organiser
Publications
2024
Estimating Causal Effects in Partially Directed Parametric Causal Factor Graphs, in Proceedings of the Sixteenth International Conference on Scalable Uncertainty Management (SUM-2024), Springer, Nov. 2024. pp. 265--280.
DOI: | https://doi.org/10.1007/978-3-031-76235-2_20 |
File: | File link |
Bibtex: | @inproceedings{LuBrMoGe24, author = {Malte Luttermann and Tanya Braun and Ralf Möller and Marcel Gehrke}, title = {{Estimating Causal Effects in Partially Directed Parametric Causal Factor Graphs}}, booktitle = {Proceedings of the Sixteenth International Conference on Scalable Uncertainty Management (SUM-2024)}, year = {2024}, pages = {265--280}, publisher = {Springer}, } |
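Efficient Detection of Commutative Factors in Factor Graphs, in Proceedings of the Twelfth International Conference on Probabilistic Graphical Models (PGM-2024), PMLR, Sep. 2024. pp. 38--56.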
Weblink: | https://proceedings.mlr.press/v246/luttermann24a.html |
File: | File link |
Bibtex: | @inproceedings{LuMaGe24b, author = {Malte Luttermann and Johann Machemer and Marcel Gehrke}, title = {{Efficient Detection of Commutative Factors in Factor Graphs}}, booktitle = {Proceedings of the Twelfth International Conference on Probabilistic Graphical Models (PGM-2024)}, year = {2024}, volume = {246}, pages = {38--56}, publisher = {{PMLR}}, url = {https://proceedings.mlr.press/v246/luttermann24a.html} } |
Efficient Detection of Exchangeable Factors in Factor Graphs, in Proceedings of the Thirty-Seventh International FLAIRS Conference (FLAIRS-24), Florida Online Journals, May 2024.
Weblink: | https://journals.flvc.org/FLAIRS/article/view/135518 |
File: | File link |
Bibtex: | @inproceedings{LuMaGe24, author = {Malte Luttermann and Johann Machemer and Marcel Gehrke}, title = {Efficient Detection of Exchangeable Factors in Factor Graphs}, booktitle = {Proceedings of the Thirty-Seventh International FLAIRS Conference (FLAIRS-24)}, year = {2024}, volume = {37}, publisher = {Florida Online Journals}, url = {https://journals.flvc.org/FLAIRS/article/view/135518}, } |
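Lifted Causal Inference in Relational Domains, in Proceedings of the Third Conference on Causal Learning and Reasoning (CLeaR-24), PMLR, Apr. 2024. pp. 827--842.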
Weblink: | https://proceedings.mlr.press/v236/luttermann24a.html |
File: | File link |
Bibtex: | @inproceedings{LuHaBrMoGe24, author = {Malte Luttermann and Mattis Hartwig and Tanya Braun and Ralf Möller and Marcel Gehrke}, title = {Lifted Causal Inference in Relational Domains}, booktitle = {Proceedings of the Third Conference on Causal Learning and Reasoning (CLeaR-24)}, year = {2024}, volume = {236}, pages = {827--842}, publisher = {PMLR}, url = {https://proceedings.mlr.press/v236/luttermann24a.html}, } |
Colour Passing Revisited: Lifted Model Construction with Commutative Factors, in Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24), AAAI Press, Feb. 2024. pp. 20500--20507.
DOI: | https://doi.org/10.1609/aaai.v38i18.30034 |
File: | File link |
Bibtex: | @inproceedings{LuBrMoGe24, author = {Malte Luttermann and Tanya Braun and Ralf M\"oller and Marcel Gehrke}, title = {{Colour Passing Revisited: Lifted Model Construction with Commutative Factors}}, booktitle = {Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)}, year = {2024}, volume = {38}, pages = {20500--20507}, publisher = {{AAAI} Press}, doi = {https://doi.org/10.1609/aaai.v38i18.30034}, } |
Aggregating Predicted Individual Hospital Length of Stay to Predict Bed Occupancy for Hospitals, in Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC-2024), February 21-23, Rome, Italy, SciTePress, 2024. pp. 175--184.
ISBN: | 978-989-758-688-0 |
Development of functional architectures for cyber-physical systems using interconnectable models, Systems Engineering, 2024. Wiley Online Library.
DOI: | 10.1002/sys.21761 |
File: | sys.21761 |
Implementation of information systems for the long-term reuse of data in humanities research, in CENTERIS - International Conference on ENTERprise Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and Social Care Information Systems and Technologies 2024, 2024.
ReFrESH – Relation-preserving Feedback-reliant Enhancement of Subjective Content Descriptions, in 18th IEEE International Conference on Semantic Computing (ICSC 2024), February 5-7, IEEE, 2024. pp. 17-24.
DOI: | 10.1109/ICSC59802.2024.00010 |
File: | File link |
Bibtex: | @INPROCEEDINGS{BeBrMoGe, author ={Magnus Bender and Tanya Braun and Ralf M\"oller and Marcel Gehrke}, title ={ReFrESH – Relation-preserving Feedback-reliant Enhancement of Subjective Content Descriptions}, booktitle ={18th {IEEE} International Conference on Semantic Computing, ({ICSC} 2024), February 5-7}, year ={2024}, pages = {17--24}, publisher = {{IEEE}}, url = {https://dx.doi.org/10.1109/ICSC59802.2024.00010} } |
Sustainable Development of Information Systems Using SysML, FAS and DOL, in The 18th Annual International Systems Conference, 2024.
DOI: | 10.1109/SysCon61195.2024.10553629 |
Weblink: | https://ieeexplore.ieee.org/document/10553629 |
Unsupervised Estimation of Subjective Content Descriptions in an Information System, International Journal of Semantic Computing, vol. 18, no. 1, 2024.
DOI: | 10.1142/S1793351X24410034 |
File: | File link |
Bibtex: | @article{BeBrMoGe24, author={Magnus Bender and Tanya Braun and Ralf M\"oller and Marcel Gehrke}, title={Unsupervised Estimation of Subjective Content Descriptions in an Information System}, journal = {International Journal of Semantic Computing}, volume= {18}, number={1}, pages= {}, year={2024}, doi = {10.1142/S1793351X24410034} } |
Using Data Synthesis to Improve Length of Stay Predictions for Patients with Rare Diagnoses, The International FLAIRS Conference Proceedings, vol. 37, no. 1, 2024.
File: | 135651 |
2023
Federated Information Retrieval in Cross-Domain Information Systems, in Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023), CEUR Workshop Proceedings, Dec. 2023. pp. 52-67.
File: | paper7.pdf |
Bibtex: | @InProceedings{MePeDaLiAsMo2023, author = {Sylvia Melzer and Hagen Peukert and Eliana Dal Sasso and Charles Li and Thomas Asselborn and Ralf M\"oller}, booktitle = {Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023)}, year = {2023}, month = dec, title = {Federated Information Retrieval in Cross-Domain Information Systems}, pages = {52-67}, publisher = {CEUR Workshop Proceedings}, url = {https://ceur-ws.org/Vol-3580/paper7.pdf} } |
Fine-tuning BERT Models on Demand for Information Systems Explained Using Training Data from Pre-modern Arabic, in Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023), CEUR Workshop Proceedings, Dec. 2023. pp. 38--51.
File: | paper5.pdf |
Bibtex: | @inproceedings{AsMeAlBeMaHiMo, author = {Thomas Asselborn and Sylvia Melzer and Said Aljoumani and Magnus Bender and Florian Andreas Marwitz and Konrad Hirschler and Ralf M\"oller}, booktitle = {Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023)}, year = {2023}, month = dec, title = {Fine-tuning BERT Models on Demand for Information Systems Explained Using Training Data from Pre-modern Arabic}, pages = {38--51}, publisher = {CEUR Workshop Proceedings}, url ={https://ceur-ws.org/Vol-3580/paper5.pdf} } |
Lifting Factor Graphs with Some Unknown Factors, in Proceedings of the Seventeenth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-23), Springer, Nov. 2023. pp. 337--347.
DOI: | https://doi.org/10.1007/978-3-031-45608-4_25 |
File: | File link |
Bibtex: | @inproceedings{LuMoGe23, author = {Malte Luttermann and Ralf Möller and Marcel Gehrke}, title = {Lifting Factor Graphs with Some Unknown Factors}, booktitle = {Proceedings of the Seventeenth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-23)}, year = {2023}, volume = {14294}, pages = {337--347}, publisher = {Springer}, doi = {https://doi.org/10.1007/978-3-031-45608-4_25}, } |
Dissertation Abstract: Taming Exact Inference in Temporal Probabilistic Relational Models, KI-Künstliche Intelligenz, pp. 1--6, Sep. 2023. Springer.
DOI: | 10.1007/s13218-023-00813-w |
File: | s13218-023-00813-w |
Bibtex: | @article{Geh, author={Marcel Gehrke}, title={Dissertation Abstract: Taming Exact Inference in Temporal Probabilistic Relational Models}, journal={KI-K{\"u}nstliche Intelligenz}, publisher={Springer}, volume= {}, number={}, pages= {1--6}, year={2023}, url = {https://doi.org/10.1007/s13218-023-00813-w}, doi = {10.1007/s13218-023-00813-w} } |
EpiDoc Data Matching for Federated Information Retrieval in the Humanities, in 1st International Workshop on AI in Digital Humanities, Computational Social Sciences and Economics Research as part of the 18th Conference on Computer Science and Intelligence Systems (FedCSIS), Proceedings of the 2023 Federated Conference on Computer Science and Intelligence Systems, 2023. pp. 1063--1068.
File: | 1515.pdf |
Bibtex: | @InProceedings{MeKlWeHaMo:2023, author = {Sylvia Melzer and Meike Klettke and Franziska Weise and Kaja Harter-Uibopuu and Ralf Möller}, title = {{EpiDoc Data Matching for Federated Information Retrieval in the Humanities}}, booktitle = {{1st International Workshop on AI in Digital Humanities, Computational Social Sciences and Economics Research as part of the 18th Conference on Computer Science and Intelligence Systems (FedCSIS)}}, year = 2023, publisher = {Proceedings of the 2023 Federated Conference on Computer Science and Intelligence Systems}, pages = {1063--1068}, url = {https://annals-csis.org/proceedings/2023/pliks/1515.pdf} } |
Poster: Digital Data Handling at UWA, in Digital Total - Computing & Data Science an der Universität Hamburg und in der Wissenschaftsmetropole Hamburg, 2023.
File: | Poster_DDH@UWA_A0_FINAL.pdf |
Bibtex: | @INPROCEEDINGS {MetThMo:2023, author={Sylvia Melzer and Stefan Thiemann and Ralf Möller}, doi={}, booktitle={Digital Total - Computing \& Data Science an der Universität Hamburg und in der Wissenschaftsmetropole Hamburg}, title={Poster: Digital Data Handling at UWA}, year={2023}, month={October}, volume={}, pages={}, url = {https://www.conferences.uni-hamburg.de/event/387/contributions/1502/attachments/559/1055/Poster_DDH@UWA_A0_FINAL.pdf} } |
Unsupervised Estimation of Subjective Content Descriptions, in 17th IEEE International Conference on Semantic Computing (ICSC 2023), February 1-3, IEEE, 2023.
DOI: | 10.1109/ICSC56153.2023.00052 |
File: | File link |
Bibtex: | @INPROCEEDINGS{BeBrMoGe, author ={Magnus Bender and Tanya Braun and Ralf M\"oller and Marcel Gehrke}, title ={Unsupervised Estimation of Subjective Content Descriptions}, booktitle ={17th {IEEE} International Conference on Semantic Computing, ({ICSC} 2023), February 1-3}, year ={2023}, pages = {}, publisher = {{IEEE}}, doi = {https://dx.doi.org/10.1109/ICSC56153.2023.00052}, keywords ={Subjective Content Descriptions; Text Mining;Text Annotation;Sentence clustering}, } |
Simulation of Database Interactions for Early Validation of Digitized Enterprise Processes, Procedia Computer Science, Elsevier, vol. 219, pp. 658--665, 2023.
DOI: | https://doi.org/10.1016/j.procs.2023.01.336 |
File: | S1877050923003459 |
Bibtex: | @article{Melzer2023658, author = {Sylvia Melzer and Oliver C. Eichmann and Hongxu Wang and Ralf God}, title = {Simulation of Database Interactions for Early Validation of Digitized Enterprise Processes}, journal = {Procedia Computer Science, Elsevier}, volume = {219}, pages = {658--665}, year = {2023}, issn = {1877-0509}, doi = {https://doi.org/10.1016/j.procs.2023.01.336}, url = {https://www.sciencedirect.com/science/article/pii/S1877050923003459}, note = {CENTERIS – International Conference on ENTERprise Information Systems / ProjMAN – International Conference on Project MANagement / HCist – International Conference on Health and Social Care Information Systems and Technologies 2022}, keywords = {Entity-Relationship Modeling, Relational Databases, Enterprise Process Digitization, Model-based Systems Engineering}, abstract = {Digitized enterprise processes often encompass interaction with relational databases. Describing and simulating large-scale and complex processes on different abstraction levels lead to the use of tools and methods of Model-based Systems Engineering. In practice, current entity-relationship modeling approaches solely enable modeling relational database structure without simulation of database interactions at an early development stage. However, in general, it is known that early validation improves common understanding and communication in the development team and reduces the risk of design flaws. This paper presents an approach for model-based enterprise process digitization and a previously developed and now enhanced broker-based SysML Toolbox for integrating real relational databases into SysML simulations. The approach comprises status quo documentation concerning enterprise processes, development of digitized processes and required relational database structures as well as validation of digitized processes using the SysML Toolbox.} } |
Query Transformation for Processing Streams in Decision-making Agents, in The International FLAIRS Conference Proceedings, 2023.
DOI: | 10.32473/flairs.36.133104 |
Bibtex: | @InProceedings{schiff2023transformation, author = {Simon Schiff and Mena Leemhuis and {\"{O}}zg{\"{u}}r L{\"{u}}tf{\"{u}} {\"{O}}z{\c{c}}ep and Ralf Möller}, title = {Query Transformation for Processing Streams in Decision-making Agents}, booktitle = {The International FLAIRS Conference Proceedings}, date = {2023-05-15}, language = {en}, pubstate = {to appear}, journaltitle = {The Thirty-Six International Flairs Conference} } |
PETS: Predicting Efficiently using Temporal Symmetries in Temporal PGMs, in Proceedings of the Seventeenth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-23), Springer, 2023.
File: | File link |
Bibtex: | @inproceedings{MaMoGe23, author = {Florian Andreas Marwitz and Ralf M\"oller and Marcel Gehrke}, title = {{PETS: Predicting Efficiently using Temporal Symmetries in Temporal PGMs}}, booktitle = {Proceedings of the Seventeenth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-23)}, year = {2023}, pages = {}, publisher = {Springer}, } |
On Domain-specific Topic Modelling Using the Case of a Humanities Journal, in Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023), CEUR Workshop Proceedings, 2023.
LESS is More: LEan Computing for Selective Summaries, in KI 2023: Advances in Artificial Intelligence, Springer Nature Switzerland, 2023. pp. 1--14.
DOI: | 10.1007/978-3-031-42608-7_1 |
File: | File link |
Bibtex: | @InProceedings{BeBrMoGe23c, author={Magnus Bender and Tanya Braun and Ralf M\"oller and Marcel Gehrke}, title={LESS is More: LEan Computing for Selective Summaries}, journal = {International Journal of Semantic Computing}, booktitle= {KI 2023: Advances in Artificial Intelligence}, publisher= {Springer Nature Switzerland}, year={2023}, doi ={https://doi.org/10.1007/978-3-031-42608-7_1}, pages={1--14}, } |
Introduction to the Third Workshop on Humanities-Centred Artificial Intelligence, in Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023), Sylvia Melzer, Hagen Peukert, and Stefan Thiemann, Eds. CEUR Workshop Proceedings, 2023. pp. 1-3.
File: | preface.pdf |
Bibtex: | @inproceedings{melzer2023introduction, title = "Introduction to the Third Workshop on Humanities-Centred Artificial Intelligence", author = "Sylvia Melzer and Hagen Peukert and Stefan Thiemann", year = "2023", booktitle = "Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023)", editor = "Sylvia Melzer and Hagen Peukert and Stefan Thiemann", publisher = "CEUR Workshop Proceedings", volume = "3580", pages = "1-3", url = "https://ceur-ws.org/Vol-3580/preface.pdf" } |