Invited Talk: Data quality in the era of Foundational Models

  • Saúl Marcelo Calderón Ramírez, Instituto Tecnológico de Costa Rica

Abstract

Deep learning models usually need extensive amounts of data, and these data have to be labeled, which becomes a concern in real-world applications. Labeling a dataset is costly in terms of time, money, and resources. Foundational models are becoming a strong trend across application fields, from natural language processing to image analysis. Commonly, foundational models are pre-trained on very large, often multi-modal datasets (text, images, audio, etc.) in a self-supervised fashion. Using these models in target domains and tasks reduces the need to label very large target datasets, especially when combined with scarce-label training regimes: semi-supervised, self-supervised, few-shot learning, etc. However, even in these settings, the quality of the data used to train and evaluate the model remains important. In this talk, we address different data quality attributes for both training and evaluation that are still relevant for systems built upon foundational models.
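As a minimal illustrative sketch of the scarce-label regime mentioned in the abstract (not material from the talk itself): a lightweight linear probe is trained on embeddings assumed to come from a frozen, self-supervised pre-trained encoder, so only a small labeled set is needed. Synthetic random vectors stand in for real embeddings so the snippet runs on its own; all sizes, dimensions, and variable names are placeholder assumptions.

```python
# Minimal sketch: linear probe on frozen foundation-model embeddings
# using only a small labeled set. Random vectors stand in for the
# embeddings a real frozen encoder would produce (assumption).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Placeholder sizes: 100 labeled examples, 500 evaluation examples,
# 512-dimensional embeddings, 5 target classes.
n_labeled, n_test, dim, n_classes = 100, 500, 512, 5
X_train = rng.normal(size=(n_labeled, dim))    # few labeled embeddings
y_train = rng.integers(0, n_classes, size=n_labeled)
X_test = rng.normal(size=(n_test, dim))        # held-out evaluation embeddings
y_test = rng.integers(0, n_classes, size=n_test)

# Only this small classifier is trained; the encoder stays frozen,
# which is what keeps the labeling cost low.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

print("linear-probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```

Even with such a lightweight probe, the held-out evaluation labels still have to be trustworthy, which is precisely the data-quality concern the abstract raises for both training and evaluation sets.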

Published
2023-10-12
How to Cite
Calderón Ramírez, S. (2023). Conferencia Invitada: Data quality in the era of Foundational Models. Proceedings of JAIIO, 9(12). Retrieved from https://ojs.sadio.org.ar/index.php/JAIIO/article/view/783
Section
SAIV - Simposio Argentino de Imágenes y Visión