Nov 10 – 14, 2025
UFLA
Timezone: America/Sao_Paulo

How Good Lusophones Are Data Science LLM Agents? Evaluating Agentic Approaches for Data Science in Portuguese Contexts

Nov 10, 2025, 13:30
1h 30m
Centro de Eventos (UFLA)

Avenida Norte - Lavrinhas, Lavras - MG, 37200-900
Simple Abstract | Computer Science | Day 1

Description

Data science (DS) and data analysis are complex fields that often require highly specialized professionals and time-intensive methods. Analyzing data and building predictive models commonly involve intricate planning and reasoning once considered exclusive to humans. Recent advances in large language models (LLMs), such as complex reasoning and tool use, have challenged this notion, as reflected in the numerous recent studies that effectively apply tool-using LLM agents to data analysis and machine learning tasks. These developments are not as accessible as one might hope, however: most frameworks and evaluations are conducted exclusively with English prompts, data, and metadata. In particular, LLM-automated DS in Portuguese contexts remains largely unassessed in the literature. To address this gap, we evaluate how capable language models are at conducting agentic data analysis in Portuguese. To that end, we are developing a new evaluation set for automated data science in Portuguese, which we plan to use to validate LLMs' accuracy and linguistic consistency when performing exploratory data analysis (EDA) and machine learning engineering (MLE). Additionally, we explore a novel approach to assessing agents on automated exploratory analysis: ranking their analyses by the improvement they provide to the subsequent task of automated MLE, compared with an EDA-free baseline. Our preliminary results show that, on average, LLM agents perform only 73% as well in Portuguese as in English, which suggests a deficit in knowledge transfer between languages and difficulty adapting to new linguistic contexts.
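The two quantities mentioned in the abstract, the EDA-based ranking and the Portuguese-to-English performance ratio, can be illustrated with a minimal sketch. The abstract does not specify the metric, the aggregation, or the agents involved, so everything below is an assumption: the function names, the use of a simple score difference as the "improvement", the mean of per-task ratios as the "average", and all numbers, which are made up for illustration only.

```python
from statistics import mean


def rank_by_eda_gain(mle_scores, baseline_score):
    """Rank agents by the gain their EDA gives to the downstream MLE score.

    mle_scores: hypothetical mapping of agent name -> MLE score obtained
                when the MLE step is given that agent's EDA report.
    baseline_score: MLE score obtained with no EDA at all (EDA-free baseline).
    Assumes higher scores are better.
    """
    gains = {agent: score - baseline_score for agent, score in mle_scores.items()}
    # Largest improvement over the baseline first.
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


def relative_performance(pt_scores, en_scores):
    """Mean per-task ratio of Portuguese score to English score.

    Assumes the two lists are paired by task and that English scores are nonzero.
    """
    return mean(pt / en for pt, en in zip(pt_scores, en_scores))


# Illustrative, made-up scores (not results from the study).
example_mle_scores = {"agent_a": 0.81, "agent_b": 0.77, "agent_c": 0.84}
example_ranking = rank_by_eda_gain(example_mle_scores, baseline_score=0.79)
example_ratio = relative_performance([1.0, 3.0], [2.0, 4.0])
```

In this sketch an agent whose EDA lowers the downstream score receives a negative gain and ranks below the baseline, which is one way to read "improvement compared to an EDA-free baseline"; the authors' actual scoring may differ.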

Type of work: Full Paper

Author

João Paulo Lima (UFLA)

Co-author

Denilson Pereira (UFLA)
