Towards Scalable Schema Mapping using Large Language Models

The growing need to integrate information from many diverse sources poses significant scalability challenges for data integration systems. These systems often rely on manually written schema mappings, which are complex and costly to maintain. While recent advances suggest that large language models (LLMs) can assist in automating schema mapping, key challenges remain. This work motivates future research in schema mapping generation by highlighting these challenges, presenting a competitive bidirectional schema matching pipeline, and exploring the limitations of current methods for generating more complex mappings. This work was presented at the SIGMOD Workshop MIDAS 2025. The official publication is forthcoming.

June 2025 · Christopher Buss, Mahdis Safari, Arash Termehchy, Stefan Lee, David Maier

Effective Entity Augmentation By Querying External Data Sources

Users often need to integrate information from multiple data sources. This paper proposes autonomous systems that progressively discover and integrate relevant information from multiple data sources while requiring minimal expert intervention. The proposed systems leverage end users’ feedback to learn how to retrieve information relevant to each entity in a dataset from external data sources. Our empirical evaluation shows that our approach learns accurate strategies for delivering relevant information quickly.

July 2023 · Christopher Buss, Jasmin Mousavi, Mikhail Tokarev, Arash Termehchy, David Maier, Stefan Lee

Generating Data Augmentation Queries Using Large Language Models

As an alternative to manually writing mappings from entities to queries, one can learn these mappings progressively by leveraging end users’ feedback. We evaluate parameter-efficient fine-tuning techniques for adapting a pretrained large language model (LLM) online to this task of query policy learning. Also presented at the 2nd NeurIPS Table Representation Learning Workshop, December 2023.

July 2023 · Christopher Buss, Jasmin Mousavi, Mikhail Tokarev, Arash Termehchy, David Maier, Stefan Lee