Fine-tuning a Large Language Model for Curriculum Vitae Classification

Juan Diego Salcedo-Salazar

doi:10.17268/scien.inge.2025.02.02

Authors

Juan Diego Salcedo-Salazar, Master's Program in Systems and Informatics Engineering, Faculty of Systems and Informatics Engineering, Universidad Nacional Mayor de San Marcos, Av. Carlos Germán Amezaga #375, Cercado de Lima, Ciudad Universitaria, Lima, Peru. https://orcid.org/0009-0005-7692-460X

DOI:

https://doi.org/10.17268/scien.inge.2025.02.02

Keywords:

Natural Language Processing, Large Language Model, Fine-tuning, Text Classification, Curriculum Vitae

Abstract

The main objective of this work was to classify resumes by professional area, an important task in human resources management and personnel recruitment. The research explores the classification capabilities of Large Language Models (LLMs) through a comparative analysis against traditional Machine Learning methods. To this end, Google's pre-trained English model BERT-base-uncased was fine-tuned for 3 epochs on a dataset of more than 3,000 resumes spanning 25 professional areas. It was then compared with traditional Random Forest, SVM, Logistic Regression and Multinomial Naive Bayes models. The methodology consists of 7 essential stages for adapting a pre-trained model to a specific task and is designed to ensure optimal performance. The comparison focuses on four metrics: Accuracy, F1-score, Precision and Recall. The fine-tuned BERT base model achieved the highest Accuracy (83.0%) and Precision (82.3%). Multinomial Naive Bayes achieved the highest F1-score (82.8%) and Recall (86.2%). These results indicate that the BERT base model performs well at predicting the correct resume class, while Multinomial Naive Bayes is better at detecting the majority of positive cases. This research shows how LLMs perform on the classification task compared with their traditional Machine Learning counterparts, and it offers an innovative approach to human resource management and staff recruitment practices.
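The abstract compares a fine-tuned BERT model against classical baselines such as Multinomial Naive Bayes, scored with Accuracy, Precision, Recall and F1. The following is a minimal, dependency-free sketch of one such baseline and of the four metrics. It is not the author's pipeline. It assumes lowercase whitespace tokenization, Laplace smoothing with `alpha = 1.0` and macro averaging (the unweighted mean over classes), and the abstract specifies none of these. The toy resume snippets and the labels `IT`, `HR` and `FINANCE` are hypothetical, as are the helper names `tokenize` and `macro_metrics`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; the study's preprocessing is not specified.
    return text.lower().split()


class MultinomialNB:
    """Toy multinomial Naive Bayes over token counts with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing constant

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        self.classes = sorted(set(labels))
        self.vocab = {tok for toks in tokenized for tok in toks}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_toks = [toks for toks, y in zip(tokenized, labels) if y == c]
            # log P(c): fraction of training resumes that belong to class c
            self.log_prior[c] = math.log(len(class_toks) / len(docs))
            counts = Counter(tok for toks in class_toks for tok in toks)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(token | c), smoothed over the shared vocabulary
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, docs):
        preds = []
        for doc in docs:
            toks = [t for t in tokenize(doc) if t in self.vocab]  # unseen tokens are ignored
            scores = {c: self.log_prior[c] + sum(self.log_lik[c][t] for t in toks)
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))  # class with highest log-posterior
        return preds


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged (unweighted per-class mean) precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(p for p, _, _ in per_class) / n,
        "recall": sum(r for _, r, _ in per_class) / n,
        "f1": sum(f for _, _, f in per_class) / n,
    }


# Hypothetical toy data; the actual study used 3,000+ resumes across 25 areas.
TRAIN_DOCS = [
    "python java software developer sql",
    "network engineer linux cloud python",
    "recruitment payroll onboarding employee relations",
    "talent acquisition recruitment interviews employee training",
    "accounting audit tax financial statements",
    "financial analyst budgeting forecasting audit",
]
TRAIN_LABELS = ["IT", "IT", "HR", "HR", "FINANCE", "FINANCE"]
TEST_DOCS = ["senior python developer cloud", "payroll and recruitment specialist", "tax audit accountant"]
TEST_LABELS = ["IT", "HR", "FINANCE"]

model = MultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS)
predictions = model.predict(TEST_DOCS)
print(predictions)  # ['IT', 'HR', 'FINANCE']
print(macro_metrics(TEST_LABELS, predictions))
```

Precision penalizes false positives and recall penalizes false negatives, so a model can lead on one metric and trail on the other, as the two models do in the abstract. In practice a library implementation, such as scikit-learn's `MultinomialNB` over count or TF-IDF features, would be the usual choice. The hand-rolled version above only makes the computation visible.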
