Data Linking Infrastructure – Foundations and Architecture

Funded by DFG

Duration: 1 January 2019 - 31 December 2025

Principal Investigator: 

Research Associates: 

Project Description

A data linking infrastructure is envisioned to support humanities scholars from all research fields of the Cluster of Excellence "Understanding Written Artefacts" so that various kinds of data can be combined easily and systematically to foster scientific progress. On the one hand, there are images and videos of written artefacts, in some cases associated with text data that makes parts of the image (or video) content explicit, e.g., using optical character recognition techniques. On the other hand, different kinds of chemistry and materials science data are collected to further describe the written artefacts under investigation, almost always in combination with descriptive temporal and spatial data. Data of this kind must be made available to humanities scholars in a way that best supports their scientific work. Publications from humanities projects will refer to artefact data of the kind described above, so over time artefact data will be referenced in a considerable number of natural language publications, e.g., journal articles, conference papers, and PhD theses. Publications are provided as documents, which are represented, e.g., as PDF data. Further natural language data comes from existing humanities research databases. All data can be described using suitable metadata formalisms (date of creation, author, etc.). In addition, and different from metadata, all kinds of base data (also called raw data) might be extended with derived data, which makes certain features explicit (e.g., for supporting visualization, for information retrieval, or for other research efforts).

Link to Project Details

https://www.csmc.uni-hamburg.de/research/cluster-projects/field-f/rff01.html

Activities

Editorial

  • S. Melzer, J. Gippert, S. Thiemann, H. Peukert: Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2021), CEUR Workshop Proceedings, 2022 (proceedings)
  • S. Melzer, S. Thiemann, H. Peukert: Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2022), CEUR Workshop Proceedings, 2022 (proceedings)
  • S. Melzer, H. Peukert, S. Thiemann: Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023), CEUR Workshop Proceedings, 2023 (proceedings)

Organisation

Publications

2025

Jan Speller, Malte Luttermann, Marcel Gehrke, and Tanya Braun,
Compression versus Accuracy: A Hierarchy of Lifted Models, in Proceedings of the Twenty-Eighth European Conference on Artificial Intelligence (ECAI-2025), IOS Press, Oct. 2025. pp. 5051-5058.
DOI:https://doi.org/10.3233/FAIA251420
Bibtex: BibTeX
@inproceedings{SpLuGeBr25,
	author    = {Jan Speller and Malte Luttermann and Marcel Gehrke and Tanya Braun},
	title     = {{Compression versus Accuracy: A Hierarchy of Lifted Models}},
	booktitle = {Proceedings of the Twenty-Eighth European Conference on Artificial Intelligence (ECAI-2025)},
	year      = {2025},
	pages     = {5051--5058},
	publisher = {{IOS} Press},
}
Thomas Asselborn, Magnus Bender, Ralf Möller, and Sylvia Melzer,
Treating OCR Output as a Language (TOOL) – Improving OCR Output with Seq2Seq Translation, in Annals of Computer Science and Intelligence Systems – Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS), Oct. 2025. pp. 471–478.
DOI:10.15439/2025F1103
Bibtex: BibTeX
@inbook{AsBeMöMe25,
title = "Treating OCR Output as a Language (TOOL) – Improving OCR Output with Seq2Seq Translation",
author = "Thomas Asselborn and Magnus Bender and Ralf M{\"o}ller and Sylvia Melzer",
year = "2025",
month = oct,
day = "15",
doi = "10.15439/2025F1103",
language = "English",
volume = "43",
pages = "471--478",
booktitle = "Annals of Computer Science and Intelligence Systems",
note = "20th Conference on Computer Science and Intelligence Systems FedCSIS 2025 ; Conference date: 14-09-2025 Through 17-09-2025",
url = "https://2025.fedcsis.org/",

}
Marcel Gehrke and Malte Luttermann,
StaRAI: From a Probabilistic Propositional Model to a Highly Compressed Probabilistic Relational Model (Extended Abstract), in Joint Proceedings of the ECSQARU 2025 Workshops and Tutorials, HAL Open Science, Oct. 2025. pp. 71-74.
Weblink: https://hal.science/hal-05294280v1
Bibtex: BibTeX
@inproceedings{Gehrke2025a,
    author    = {Marcel Gehrke and Malte Luttermann},
    title     = {{StaRAI: From a Probabilistic Propositional Model to a Highly Compressed Probabilistic Relational Model (Extended Abstract)}},
    booktitle = {Joint Proceedings of the ECSQARU 2025 Workshops and Tutorials},
    year      = {2025},
    pages     = {71--74},
    publisher = {{HAL} Open Science},
}
Jan Speller, Malte Luttermann, Marcel Gehrke, and Tanya Braun,
Towards Explainability of Approximate Lifted Model Construction: A Geometric Perspective, in Proceedings of the First Joint Workshop on Humanities-Centered Artificial Intelligence and Formal & Cognitive Reasoning (CHAI-2025 and FCR-2025), CEUR, Oct. 2025. pp. 41-56.
Weblink: https://ceur-ws.org/Vol-4058/paper4.pdf
Bibtex: BibTeX
@inproceedings{SpLuGeBr25b,
	author    = {Jan Speller and Malte Luttermann and Marcel Gehrke and Tanya Braun},
	title     = {{Towards Explainability of Approximate Lifted Model Construction: A Geometric Perspective}},
	booktitle = {Proceedings of the Eleventh Workshop on Formal and Cognitive Reasoning (FCR-2025)},
	year      = {2025},
	pages     = {41--56},
	publisher = {{CEUR}},
}
Malte Luttermann, Jan Speller, Marcel Gehrke, Tanya Braun, Ralf Möller, and Mattis Hartwig,
Approximate Lifted Model Construction, in Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-2025), IJCAI Organization, Aug. 2025. pp. 9077-9085.
DOI:https://doi.org/10.24963/ijcai.2025/1009
Bibtex: BibTeX
@inproceedings{LuSpGeBrMoHa25,
	author    = {Malte Luttermann and Jan Speller and Marcel Gehrke and Tanya Braun and Ralf Möller and Mattis Hartwig},
	title     = {{Approximate Lifted Model Construction}},
	booktitle = {Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-2025)},
	year      = {2025},
	pages     = {9077--9085},
	publisher = {{IJCAI} Organization},
}
Malte Luttermann, Ralf Möller, and Marcel Gehrke,
Lifted Model Construction Without Normalisation: A Vectorised Approach to Exploit Symmetries in Factor Graphs, in Proceedings of the Third Learning on Graphs Conference (LoG-2024), PMLR, Jul. 2025. pp. 46:1-46:17.
Weblink: https://proceedings.mlr.press/v269/luttermann25a.html
Bibtex: BibTeX
@inproceedings{LuMoGe25b,
    author    = {Malte Luttermann and Ralf Möller and Marcel Gehrke},
    title     = {{Lifted Model Construction without Normalisation: A Vectorised Approach to Exploit Symmetries in Factor Graphs}},
    booktitle = {Proceedings of the Third Learning on Graphs Conference (LoG-2024)},
    year      = {2025},
    pages     = {46:1--46:17},
    publisher = {{PMLR}},
}
Malte Luttermann, Ralf Möller, and Marcel Gehrke,
Lifting Factor Graphs with Some Unknown Factors for New Individuals, International Journal of Approximate Reasoning, Apr. 2025. Elsevier.
DOI:https://doi.org/10.1016/j.ijar.2025.109371
Bibtex: BibTeX
@article{LuMoGe25,
	author    = {Malte Luttermann and Ralf Möller and Marcel Gehrke},
	title     = {{Lifting Factor Graphs with Some Unknown Factors for New Individuals}},
	journal   = {International Journal of Approximate Reasoning},
	volume    = {179},
	year      = {2025},
	pages     = {109371},
	publisher = {Elsevier},
}
Thomas Asselborn, Magnus Bender, Florian Marwitz, Ralf Möller, and Sylvia Melzer,
Verbalisation Process of a RAG-Based Chatbot to Support Tabular Data Evaluation for Humanities Researchers, in Proceedings of the Workshop on Large Language Models for Research Data Management?! co-located with the INFORMATIK Festival 2025 (55th Annual Conference of the German Informatics Society), CEUR Workshop Proceedings, 2025. pp. 56-63.
File: paper1.pdf
Bibtex: BibTeX
@inbook{AsBeMaMöMe25,
title = "Verbalisation Process of a RAG-Based Chatbot to Support Tabular Data Evaluation for Humanities Researchers",
author = "Thomas Asselborn and Magnus Bender and Florian Marwitz and Ralf M{\"o}ller and Sylvia Melzer",
year = "2025",
month = dec,
day = "22",
language = "English",
volume = "4140",
pages = "56--63",
editor = "Magnus Bender and Sylvia Melzer and Ralf M{\"o}ller and Stefan Thiemann",
booktitle = "Proceedings of the Workshop on Large Language Models for Research Data Management?! co-located with the INFORMATIK Festival 2025 (55th Annual Conference of the German Informatics Society), September 18, 2025, Potsdam, Germany (INFORMATIK FESTIVAL)",
publisher = "CEUR-WS.org",
note = "Large Language Models for Research Data Management?! ; Conference date: 18-09-2025 Through 18-09-2025",
url = "https://informatik2025.gi.de/workshops\_a-z.html, https://www.conferences.uni-hamburg.de/event/621/",

}
Edyta Jurkiewicz-Rohrbacher and Thomas Asselborn,
Challenges in Automatic Speech Recognition in the Research on Multilingualism, in Proceedings of the Workshop on Large Language Models for Research Data Management?! co-located with the INFORMATIK Festival 2025 (55th Annual Conference of the German Informatics Society), CEUR Workshop Proceedings, 2025. pp. 37-44.
File: paper5.pdf
Bibtex: BibTeX
@inbook{JuRoAs25,
title = "Challenges in Automatic Speech Recognition in the Research on Multilingualism",
abstract = "This paper explores the potential of using Large Language Models in multilingualism research to accelerate the management and processing of spoken data. The speech-to-text processing of utterances by multilingual speakers are in the focus. Qualitative discussion of the main issues relating to the non-standard language use of bilingual individuals is provided, using Polish-German recordings from the LangGener corpus as an example.",
author = "Edyta Jurkiewicz-Rohrbacher and Thomas Asselborn",
year = "2025",
month = dec,
day = "22",
language = "English",
pages = "37--44",
booktitle = "Proceedings of the Workshop on Large Language Models for Research Data Management?! co-located with the INFORMATIK Festival 2025 (55th Annual Conference of the German Informatics Society)",
note = "Large Language Models for Research Data Management?! ; Conference date: 18-09-2025 Through 18-09-2025",
url = "https://informatik2025.gi.de/workshops\_a-z.html, https://www.conferences.uni-hamburg.de/event/621/",

}
Thomas Asselborn, Magnus Bender, Ralf Möller, and Sylvia Melzer,
Publishing a Chatbot: Opportunities and Challenges, in CHAI+FCR 2025 Humanities-Centred Artificial Intelligence 2025 and Formal & Cognitive Reasoning 2025 – Proceedings of the Joint Workshop on Humanities-Centred Artificial Intelligence and Formal & Cognitive Reasoning co-located with 48th German Conference on Artificial Intelligence, CEUR Workshop Proceedings, 2025. pp. 16-27.
File: paper2.pdf
Bibtex: BibTeX
@inproceedings{AsBeMöMe25b,
title = "Publishing a Chatbot: Opportunities and Challenges",
author = "Thomas Asselborn and Magnus Bender and Ralf M{\"o}ller and Sylvia Melzer",
year = "2025",
month = oct,
day = "8",
language = "English",
volume = "4058",
pages = "16--27",
booktitle = "CHAI+FCR 2025 Humanities-Centred Artificial Intelligence 2025 and Formal \& Cognitive Reasoning 2025",
publisher = "CEUR-WS.org",
note = "5th Workshop on Humanities-Centred AI (CHAI) ; Conference date: 16-09-2025",
url = "https://www.csmc.uni-hamburg.de/ki2025-chai",
}
Sylvia Melzer, Simon Schiff, Franziska Weise, Thomas Asselborn, Meike Klettke, Özgür Lütfü Özçep, Kaja Harter-Uibopuu, and Ralf Möller,
Ontology-based Federated Information Systems with NLP-Enhanced Retrieval, International Journal of Digital Humanities, 2025. pp. 417–441.
DOI:10.1007/s42803-025-00110-y
Bibtex: BibTeX
@article{cite-key,
	author = {Melzer, Sylvia and Schiff, Simon and Weise, Franziska and Asselborn, Thomas and Klettke, Meike and {\"O}z{\c c}ep, {\"O}zg{\"u}r L{\"u}tf{\"u} and Harter-Uibopuu, Kaja and M{\"o}ller, Ralf},
	date = {2025-12-01},
	doi = {10.1007/s42803-025-00110-y},
	issn = {2524-7840},
	journal = {International Journal of Digital Humanities},
	number = {3},
	pages = {417--441},
	title = {Ontology-based federated information systems with NLP-enhanced retrieval},
	url = {https://doi.org/10.1007/s42803-025-00110-y},
	volume = {7},
	year = {2025},
}

2024

Malte Luttermann, Tanya Braun, Ralf Möller, and Marcel Gehrke,
Estimating Causal Effects in Partially Directed Parametric Causal Factor Graphs, in Proceedings of the Sixteenth International Conference on Scalable Uncertainty Management (SUM-2024), Springer, Nov. 2024. pp. 265-280.
DOI:https://doi.org/10.1007/978-3-031-76235-2_20
Bibtex: BibTeX
@inproceedings{LuBrMoGe24b,
    author    = {Malte Luttermann and Tanya Braun and Ralf Möller and Marcel Gehrke},
    title     = {{Estimating Causal Effects in Partially Directed Parametric Causal Factor Graphs}},
    booktitle = {Proceedings of the Sixteenth International Conference on Scalable Uncertainty Management (SUM-2024)},
    year      = {2024},
    pages     = {265--280},
    publisher = {Springer},
}
Malte Luttermann, Johann Machemer, and Marcel Gehrke,
Efficient Detection of Commutative Factors in Factor Graphs, in Proceedings of the Twelfth International Conference on Probabilistic Graphical Models (PGM-2024), PMLR, Sep. 2024. pp. 38-56.
Weblink: https://proceedings.mlr.press/v246/luttermann24a.html
Bibtex: BibTeX
@inproceedings{LuMaGe24b,
	author    = {Malte Luttermann and Johann Machemer and Marcel Gehrke},
	title     = {{Efficient Detection of Commutative Factors in Factor Graphs}},
	booktitle = {Proceedings of the Twelfth International Conference on Probabilistic Graphical Models (PGM-2024)},
	year      = {2024},
	volume    = {246},
	pages     = {38--56},
	publisher = {{PMLR}},
	url       = {https://proceedings.mlr.press/v246/luttermann24a.html}
}
Malte Luttermann, Johann Machemer, and Marcel Gehrke,
Efficient Detection of Exchangeable Factors in Factor Graphs, in Proceedings of the Thirty-Seventh International FLAIRS Conference (FLAIRS-24), Florida Online Journals, May 2024.
Weblink: https://journals.flvc.org/FLAIRS/article/view/135518
Bibtex: BibTeX
@inproceedings{LuMaGe24,
  author    = {Malte Luttermann and Johann Machemer and Marcel Gehrke},
  title     = {Efficient Detection of Exchangeable Factors in Factor Graphs},
  booktitle = {Proceedings of the Thirty-Seventh International FLAIRS Conference (FLAIRS-24)},
  year      = {2024},
  volume    = {37},
  publisher = {Florida Online Journals},
  url       = {https://journals.flvc.org/FLAIRS/article/view/135518},
}
Malte Luttermann, Mattis Hartwig, Tanya Braun, Ralf Möller, and Marcel Gehrke,
Lifted Causal Inference in Relational Domains, in Proceedings of the Third Conference on Causal Learning and Reasoning (CLeaR-24), PMLR, Apr. 2024. pp. 827-842.
Weblink: https://proceedings.mlr.press/v236/luttermann24a.html
Bibtex: BibTeX
@inproceedings{LuHaBrMoGe24,
  author    = {Malte Luttermann and Mattis Hartwig and Tanya Braun and Ralf Möller and Marcel Gehrke},
  title     = {Lifted Causal Inference in Relational Domains},
  booktitle = {Proceedings of the Third Conference on Causal Learning and Reasoning (CLeaR-24)},
  year      = {2024},
  volume    = {236},
  pages     = {827--842},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v236/luttermann24a.html},
}
Malte Luttermann, Tanya Braun, Ralf Möller, and Marcel Gehrke,
Colour Passing Revisited: Lifted Model Construction with Commutative Factors, in Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24), AAAI Press, Feb. 2024. pp. 20500-20507.
DOI:https://doi.org/10.1609/aaai.v38i18.30034
Bibtex: BibTeX
@inproceedings{LuBrMoGe24,
    author    = {Malte Luttermann and Tanya Braun and Ralf M\"oller and Marcel Gehrke},
    title     = {{Colour Passing Revisited: Lifted Model Construction with Commutative Factors}},
    booktitle = {Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)},
    year      = {2024},
    volume    = {38},
    pages     = {20500--20507},
    publisher = {{AAAI} Press},
    doi       = {10.1609/aaai.v38i18.30034},
}
Simon Schiff, Sebastian Wolfrum, Ralf Möller, and Mattis Hartwig,
Using Data Synthesis to Improve Length of Stay Predictions for Patients with Rare Diagnoses, The International FLAIRS Conference Proceedings, vol. 37, no. 1, 2024.
File: 135651
Magnus Bender, Tanya Braun, Ralf Möller, and Marcel Gehrke,
Unsupervised Estimation of Subjective Content Descriptions in an Information System, International Journal of Semantic Computing, vol. 18, no. 1, 2024.
DOI:10.1142/S1793351X24410034
Bibtex: BibTeX
@article{BeBrMoGe24,
author={Magnus Bender and Tanya Braun and Ralf M\"oller and Marcel Gehrke},
title={Unsupervised Estimation of Subjective Content Descriptions in an Information System},
journal = {International Journal of Semantic Computing},
volume= {18},
number={1},
pages= {},
year={2024},
doi  = {10.1142/S1793351X24410034}
}
Magnus Bender, Tanya Braun, Ralf Möller, and Marcel Gehrke,
ReFrESH – Relation-preserving Feedback-reliant Enhancement of Subjective Content Descriptions, in 18th IEEE International Conference on Semantic Computing (ICSC 2024), February 5-7, IEEE, 2024. pp. 17-24.
DOI:10.1109/ICSC59802.2024.00010
Bibtex: BibTeX
@INPROCEEDINGS{BeBrMoGe,
author ={Magnus Bender and Tanya Braun and Ralf M\"oller and Marcel Gehrke},
title ={ReFrESH – Relation-preserving Feedback-reliant Enhancement of Subjective Content Descriptions},
booktitle ={18th {IEEE} International Conference on Semantic Computing, ({ICSC} 2024), February 5-7},
year ={2024},
pages = {17--24},
publisher = {{IEEE}},
url = {https://dx.doi.org/10.1109/ICSC59802.2024.00010}
}
Sylvia Melzer, Tim Weilkiens, Christian Muggeo, and Axel Berres,
Sustainable Development of Information Systems Using SysML, FAS and DOL, in The 18th Annual International Systems Conference, 2024.
DOI:10.1109/SysCon61195.2024.10553629
Weblink: https://ieeexplore.ieee.org/document/10553629
Thomas Asselborn, Ralf Möller, and Sylvia Melzer,
Implementation of information systems for the long-term reuse of data in humanities research, in CENTERIS - International Conference on ENTERprise Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and Social Care Information Systems and Technologies, 2024, pp. 86-92.
DOI:10.1016/j.procs.2025.02.099
Bibtex: BibTeX
@article{ASSELBORN202586,
title = {Implementation of information systems for the long-term reuse of data in humanities research},
journal = {Procedia Computer Science},
volume = {256},
pages = {86-92},
year = {2025},
note = {CENTERIS - International Conference on ENTERprise Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and Social Care Information Systems and Technologies},
issn = {1877-0509},
doi = {10.1016/j.procs.2025.02.099},
url = {https://www.sciencedirect.com/science/article/pii/S1877050925004569},
author = {Thomas Asselborn and Ralf Möller and Sylvia Melzer},
keywords = {information systems on demand, humanities, research data repository, relational databases}
}
Mattis Hartwig, Sebastian Wolfrum, Ralf Möller, and Simon Schiff,
Aggregating Predicted Individual Hospital Length of Stay to Predict Bed Occupancy for Hospitals, in Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC-2024), February 21-23, Rome, Italy, SciTePress, 2024. pp. 175-184.
ISBN:978-989-758-688-0
Hagen Peukert, Lucas F. Voges, Thomas Asselborn, Magnus Bender, Ralf Möller, and Sylvia Melzer,
Humanities in the Center of Data Usability: Data Visualization in Institutional Research Repositories, in Proceedings of the CHAI Workshop 2024, CEUR Workshop Proceedings, 2024. pp. 67-74.
File: paper6.pdf
Bibtex: BibTeX
@inbook{PeVoAsBeMöMe24,
title = "Humanities in the Center of Data Usability: Data Visualization in Institutional Research Repositories",
author = "Hagen Peukert and Voges, {Lucas Filipo} and Thomas Asselborn and Magnus Bender and Ralf M{\"o}ller and Sylvia Melzer",
year = "2024",
month = oct,
day = "30",
language = "English",
volume = "3814",
pages = "67--74",
editor = "Sylvia Melzer and Hagen Peukert and Stefan Thiemann and Erik Radisch",
booktitle = "Proceedings of the CHAI Workshop 2024",
publisher = "CEUR-WS.org",

}
Oliver C. Eichmann, Jesko G. Lamm, Sylvia Melzer, Tim Weilkiens, and Ralf God,
Development of functional architectures for cyber-physical systems using interconnectable models, Systems Engineering, 2024. Wiley Online Library.
DOI:10.1002/sys.21761
File: sys.21761

2023

Thomas Asselborn, Sylvia Melzer, Said Aljoumani, Magnus Bender, Florian Andreas Marwitz, Konrad Hirschler, and Ralf Möller,
Fine-tuning BERT Models on Demand for Information Systems Explained Using Training Data from Pre-modern Arabic, in Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023), CEUR Workshop Proceedings, Dec. 2023. pp. 38-51.
File: paper5.pdf
Bibtex: BibTeX
@inproceedings{AsMeAlBeMaHiMo,
  author    = {Thomas Asselborn and Sylvia Melzer and Said Aljoumani and Magnus Bender and Florian Andreas Marwitz and Konrad Hirschler and Ralf M\"oller},
  booktitle = {Proceedings of the Workshop on Humanities-Centred Artificial Intelligence (CHAI 2023)},
  year = {2023},
  month = dec,
  title     = {Fine-tuning BERT Models on Demand for Information Systems Explained Using Training Data from Pre-modern Arabic},
  pages     = {38--51},
  publisher = {CEUR Workshop Proceedings},
  url ={https://ceur-ws.org/Vol-3580/paper5.pdf}
}