TSE2024

Number of papers: 14

An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Authors: Sch"{a}fer, Max and Nadi, Sarah and Eghbali, Aryaz and Tip, Frank
Abstract: Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including their suggested use for automated generation of unit tests, but while requiring additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation on the effectiveness...
Link: Read Paper
Labels: program testing, unit testing, empirical study

Automatic Commit Message Generation: A Critical Review and Directions for Future Work

Authors: Zhang, Yuxia and Qiu, Zhiqing and Stol, Klaas-Jan and Zhu, Wenhui and Zhu, Jiaxin and Tian, Yingchen and Liu, Hui
Abstract: Commit messages are critical for code comprehension and software maintenance. Writing a high-quality message requires skill and effort. To support developers and reduce their effort on this task, several approaches have been proposed to automatically generate commit messages. Despite the promising performance reported, we have identified three significant and prevalent threats in these automated approaches: 1) the datasets used to train and evaluate these approaches contain a considerable amount...
Link: Read Paper
Labels: software maintenance and deployment, commit message generation, empirical study

BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection

Authors: Jiang, Shuai and Fu, Cai and He, Shuai and Lv, Jianqiang and Han, Lansheng and Hu, Hong
Abstract: Binary Code Similarity Detection (BCSD) is a fundamental binary analysis technique in the area of software security. Recently, advanced deep learning algorithms are integrated into BCSD platforms to achieve superior performance on well-known benchmarks. However, real-world large programs embed more complex diversities due to different compilers, various optimization levels, multiple architectures and even obfuscations. Existing BCSD solutions suffer from low accuracy issues in such complicated r...
Link: Read Paper
Labels: static analysis, code similarity analysis, code model, code model training, binary code model

Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models

Authors: Yang, Guang and Zhou, Yu and Chen, Xiang and Zhang, Xiangyu and Zhuo, Terry Yue and Chen, Taolue
Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (<inline-formula><tex-math notation="LaTeX">$ell$</tex-math><alterna...
Link: Read Paper
Labels: code generation, program synthesis, empirical study

ChatGPT vs SBST: A Comparative Assessment of Unit Test Suite Generation

Authors: Tang, Yutian and Liu, Zhijie and Zhou, Zhichao and Luo, Xiapu
Abstract: Recent advancements in large language models (LLMs) have demonstrated exceptional success in a wide range of general domain tasks, such as question answering and following instructions. Moreover, LLMs have shown potential in various software engineering applications. In this study, we present a systematic comparison of test suites generated by the ChatGPT LLM and the state-of-the-art SBST tool EvoSuite. Our comparison is based on several critical factors, including correctness, readability, code...
Link: Read Paper
Labels: program testing, unit testing, empirical study

Code Review Automation: Strengths and Weaknesses of the State of the Art

Authors: Tufano, Rosalia and Dabi'{c}, Ozren and Mastropaolo, Antonio and Ciniselli, Matteo and Bavota, Gabriele
Abstract: The automation of code review has been tackled by several researchers with the goal of reducing its cost. The adoption of deep learning in software engineering pushed the automation to new boundaries, with techniques <italic>imitating</italic> developers in generative tasks, such as commenting on a code change as a reviewer would do or addressing a reviewer's comment by modifying code. The performance of these techniques is usually assessed through quantitative metrics, <italic&gt...
Link: Read Paper
Labels: code review, empirical study

Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

Authors: Kang, Sungmin and Yoon, Juyeon and Askarbekkyzy, Nargiz and Yoo, Shin
Abstract: Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly focused on crash bugs, which are easier to automatically detect and verify. In this work, we overcome this limitation by using large language models (LLMs), which have been demonstrated to be adept at natural language processing and code generation. By prompti...
Link: Read Paper
Labels: program testing, bug reproduction, empirical study

Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration

Authors: Paltenghi, Matteo and Pandita, Rahul and Henley, Austin Z. and Ziegler, Albert
Abstract: Recent neural models of code, such as OpenAI Codex and AlphaCode, have demonstrated remarkable proficiency at code generation due to the underlying attention mechanism. However, it often remains unclear how the models actually process code, and to what extent their reasoning and the way their attention mechanism scans the code matches the patterns of developers. A poor understanding of the model reasoning process limits the way in which current neural models are leveraged today, so far mostly fo...
Link: Read Paper
Labels: code generation, code completion, code model, code model training, source code model, benchmark

Isolating Compiler Bugs by Generating Effective Witness Programs With Large Language Models

Authors: Tu, Haoxin and Zhou, Zhide and Jiang, He and Yusuf, Imam Nur Bani and Li, Yuxian and Jiang, Lingxiao
Abstract: Compiler bugs pose a significant threat to safety-critical applications, and promptly as well as effectively isolating these bugs is crucial for assuring the quality of compilers. However, the limited availability of debugging information on reported bugs complicates the compiler bug isolation task. Existing compiler bug isolation approaches convert the problem into a test program mutation problem, but they are still limited by ineffective mutation strategies or high human effort requirements. D...
Link: Read Paper
Labels: program testing, debugging, code model, code model training, source code model

LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Authors: Fakhoury, Sarah and Naik, Aaditya and Sakkas, Georgios and Chakraborty, Saikat and Lahiri, Shuvendu K.
Abstract: Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow <sc>TiCoder</sc> for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate...
Link: Read Paper
Labels: code generation, program synthesis, empirical study

Learning to Generate Structured Code Summaries From Hybrid Code Context

Authors: Zhou, Ziyi and Li, Mingchen and Yu, Huiqun and Fan, Guisheng and Yang, Penghui and Huang, Zijie
Abstract: Code summarization aims to automatically generate natural language descriptions for code, and has become a rapidly expanding research area in the past decades. Unfortunately, existing approaches mainly focus on the “one-to-one” mapping from methods to short descriptions, which hinders them from becoming practical tools: 1) The program context is ignored, so they have difficulty in predicting labels outside the target method; 2) They are typically trained to generate brief function descriptions w...
Link: Read Paper
Labels: static analysis, code summarization, benchmark

No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT

Authors: Liu, Zhijie and Tang, Yutian and Luo, Xiapu and Zhou, Yuming and Zhang, Liang Feng
Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing (NLP) tasks, such as machine translation, question answering, summarization, and so on. Additionally, LLMs are also highly valuable in supporting software engineering tasks, particularly in the field of code generation. Automatic code generation is a process of automatically generating source code or executable code based on given specifications or requirements, improving developer p...
Link: Read Paper
Labels: code generation, program synthesis, empirical study

Software Testing With Large Language Models: Survey, Landscape, and Vision

Authors: Wang, Junjie and Huang, Yuchao and Chen, Chunyang and Liu, Zhe and Wang, Song and Wang, Qing
Abstract: Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effect...
Link: Read Paper
Labels: program testing, survey

Towards Efficient Fine-Tuning of Language Models With Organizational Data for Automated Software Review

Authors: Nashaat, Mona and Miller, James
Abstract: Large language models like BERT and GPT possess significant capabilities and potential impacts across various applications. Software engineers often use these models for code-related tasks, including generating, debugging, and summarizing code. Nevertheless, large language models still have several flaws, including model hallucination. (e.g., generating erroneous code and producing outdated and inaccurate programs) and the substantial computational resources and energy required for training and ...
Link: Read Paper
Labels: code generation, program repair, code model, code model training, source code model

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

TSE2024

An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Automatic Commit Message Generation: A Critical Review and Directions for Future Work

BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection

Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models

ChatGPT vs SBST: A Comparative Assessment of Unit Test Suite Generation

Code Review Automation: Strengths and Weaknesses of the State of the Art

Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration

Isolating Compiler Bugs by Generating Effective Witness Programs With Large Language Models

LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Learning to Generate Structured Code Summaries From Hybrid Code Context

No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT

Software Testing With Large Language Models: Survey, Landscape, and Vision

Towards Efficient Fine-Tuning of Language Models With Organizational Data for Automated Software Review

Files

README.md

Latest commit

History

README.md

File metadata and controls

TSE2024