FindTD Tool

Identifying Self-Admitted Technical Debt (SATD) in source code is one of the challenges present in contemporary software development. This debt reflects compromises in code quality, often resulting from design or implementation decisions made quickly and temporarily. FindTD is a Machine Learning tool developed to identify self-admitted technical debt in source code.

This research was conducted under the Samsung-UFAM Project for Education and Research (SUPER) and builds on previous work available on GitHub (Li, Soliman and Avgeriou, 2022). The tool uses a pre-trained model and a dataset extracted from that work as a starting point. Additionally, an SVM (Support Vector Machine) model was developed to identify different types of technical debt. Using 80% of the dataset for training and 20% for testing, the SVM model achieved an accuracy of 65% and an F1-score of 64.13%.

Features

Automated Identification

Uses a pre-trained CNN model to analyze comments and identify potential areas of technical debt in the source code.

Classification by Types

Applies an SVM model to classify the technical debt found by type, such as complex code or lack of documentation.

Detailed Reports

Generates detailed reports on the location, nature, and severity of the identified technical debt, facilitating the planning and prioritization of refactoring activities.

Implementation

1. Analysis and Understanding of the Problem

This step focused on understanding technical debt and how it affects software development. We conducted a Quick Review to identify the different types of technical debt and their common causes.

2. Dataset and Model Acquisition

After the first step, we gained access to the pre-trained CNN model and the data. We identified the need for preprocessing, including cleaning, handling missing values, and removing duplicates. Once this process was completed, we refined the dataset to focus on only five types of technical debt that can be detected via source code.
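The preprocessing described above can be illustrated with a minimal sketch. It is not the project's actual code: the record layout (`text` and `label` keys), the `preprocess` function, and the label names used in the example are assumptions made for illustration, and the five debt types kept by FindTD are not listed here.

```python
def preprocess(records, kept_types):
    """Clean comment records: normalize text, drop incomplete rows,
    remove duplicates, and keep only the selected debt types."""
    seen = set()
    cleaned = []
    for record in records:
        text = record.get("text")
        label = record.get("label")
        # Handle missing values: skip rows without a comment text or a label.
        if not text or not label:
            continue
        # Cleaning: collapse runs of whitespace and trim both ends.
        text = " ".join(text.split())
        if not text:
            continue
        # Remove duplicates, comparing case-insensitively on (text, label).
        key = (text.lower(), label)
        if key in seen:
            continue
        seen.add(key)
        # Keep only the debt types of interest.
        if label in kept_types:
            cleaned.append({"text": text, "label": label})
    return cleaned


# Hypothetical records; the labels are placeholders, not the real dataset's.
raw = [
    {"text": "TODO:   refactor this  method", "label": "code"},
    {"text": "todo: refactor this method", "label": "code"},  # duplicate
    {"text": None, "label": "documentation"},  # missing text
    {"text": "FIXME: missing docs for public API", "label": "documentation"},
    {"text": "build script is a hack", "label": "build"},  # filtered out
]
print(preprocess(raw, kept_types={"code", "documentation"}))
```

Passing the kept types as a parameter keeps the filter independent of any particular label set.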

3. Model Training and Evaluation

For the training process, we used frameworks such as TensorFlow, FastText, and joblib. This step involved training the SVM model using the preprocessed dataset. The model's performance was evaluated using appropriate metrics, such as accuracy, recall, and F1-score.
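To make the evaluation step concrete, the sketch below shows an 80/20 split and the three metrics computed in plain Python. It is an illustration under assumptions, not the project's evaluation code: the helper names `train_test_split` and `evaluate` are hypothetical, the labels are placeholders, and macro-averaging is assumed because the averaging scheme used for the reported scores is not stated.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged recall, and macro-averaged F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1_scores = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": accuracy,
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


# Hypothetical labels and predictions for four test comments.
y_true = ["code", "code", "documentation", "documentation"]
y_pred = ["code", "documentation", "documentation", "documentation"]
print(evaluate(y_true, y_pred))
```

In practice a library implementation of these metrics would normally be used; the hand-written version only makes the definitions explicit.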

4. Tool Implementation

The next step was to integrate the CNN model and the SVM model into a single tool. The tool not only detects the presence of technical debt in the source code but also determines its type and the specific line where it occurs. After further refinement, the tool became able to recognize multiple programming languages.
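The two-stage flow described in this step (detect first, then classify, while tracking line numbers) might look roughly like the sketch below. Everything in it is hypothetical: `is_satd` and `classify_type` are keyword heuristics standing in for the CNN and SVM models, the `Finding` structure and the label names are invented for illustration, and the comment extraction is deliberately naive (it would also match a comment prefix inside a string literal).

```python
from dataclasses import dataclass

# Line-comment prefixes per file extension (a small, illustrative subset).
COMMENT_PREFIXES = {".py": "#", ".java": "//", ".c": "//", ".js": "//"}


@dataclass
class Finding:
    line: int
    comment: str
    debt_type: str


def extract_comments(source, extension):
    """Yield (line_number, comment_text) for each line comment in source."""
    prefix = COMMENT_PREFIXES[extension]
    for number, line in enumerate(source.splitlines(), start=1):
        _, found, comment = line.partition(prefix)
        if found and comment.strip():
            yield number, comment.strip()


def is_satd(comment):
    """Stand-in for the CNN detector: a keyword heuristic, not the real model."""
    return any(word in comment.lower() for word in ("todo", "fixme", "hack"))


def classify_type(comment):
    """Stand-in for the SVM classifier: returns a hypothetical label."""
    return "documentation" if "doc" in comment.lower() else "code"


def find_debt(source, extension, detector=is_satd, classifier=classify_type):
    """Run detection first, then classify only the comments flagged as SATD."""
    return [
        Finding(number, comment, classifier(comment))
        for number, comment in extract_comments(source, extension)
        if detector(comment)
    ]


sample = (
    "int total = 0; // TODO: handle overflow\n"
    "// plain comment\n"
    "// FIXME: docs are outdated\n"
)
for finding in find_debt(sample, ".java"):
    print(finding)
```

Passing the detector and classifier as parameters means the real models could be swapped in without changing the extraction logic.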

5. Testing and Debugging

In this step, we tested FindTD across the different types of technical debt it identifies and across different kinds of source code, to verify its performance and correct behavior. This testing helps ensure that the tool works reliably when developers use it to identify self-admitted technical debt in source code.

SUPER Project

SUPER is a collaborative project between SAMSUNG and the Federal University of Amazonas (UFAM), aimed at fostering training and research in 11 undergraduate programs at UFAM. At the Manaus campus, the participating programs include Computer Science, Electrical Engineering (with emphases in Electronics, Telecommunications, and Electrical Engineering), Computer Engineering, Software Engineering, Production Engineering, and Design. Meanwhile, at the Itacoatiara Campus, the programs covered are Production Engineering, Software Engineering, and Information Systems.

Want to Know More?

We have developed a tutorial explaining how to use the tool step by step, as well as how to download the pre-trained models and the dataset used, free of charge.

Reference

Li, Y.; Soliman, M. and Avgeriou, P. (2022). Identifying Self-Admitted Technical Debt in Issue Tracking Systems Using Machine Learning. Empirical Software Engineering, v. 27, n. 131.