ISSTA2024

Number of papers: 28

Paper 1
  • Authors: Sun, Zhensu and Du, Xiaoning and Yang, Zhou and Li, Li and Lo, David
  • Abstract: Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks, including mathematical problems. However, the grammar and layout of current programs are designed to cater to the needs of human developers -- with many grammar tokens and formatting tokens being used ...
  • Link: Read Paper
  • Labels: code generation, program synthesis

Paper 2
  • Authors: Zhang, Yuntong and Ruan, Haifeng and Fan, Zhiyu and Roychoudhury, Abhik
  • Abstract: Researchers have made significant progress in automating the software development process in the past decades. Automated techniques for issue summarization, bug reproduction, fault localization, and program repair have been built to ease the workload of developers. Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves the...
  • Link: Read Paper
  • Labels: code generation, program repair

Paper 3
  • Authors: Xia, Chunqiu Steven and Zhang, Lingming
  • Abstract: Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Traditional APR techniques suffer from a lack of patch variety as they rely heavily on handcrafted or mined bug fixing patterns and cannot easily generalize to other bug/fix types. To address this limitation, recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using ...
  • Link: Read Paper
  • Labels: code generation, program repair

Paper 4
  • Authors: Xia, Chunqiu Steven and Zhang, Lingming
  • Abstract: Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Traditional APR techniques suffer from a lack of patch variety as they rely heavily on handcrafted or mined bug fixing patterns and cannot easily generalize to other bug/fix types. To address this limitation, recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using ...
  • Link: Read Paper
  • Labels: code generation, program repair

Paper 5
  • Authors: Pan, Shengyi and Wang, You and Liu, Zhongxin and Hu, Xing and Xia, Xin and Li, Shanping
  • Abstract: Forking is a typical way of code reuse, which provides a simple way for developers to create a variant software (denoted as a hard fork) by copying and modifying an existing codebase. Despite the benefits, forking also leads to duplicate efforts in software maintenance. Developers need to port patches across the hard forks to address similar bugs or implement similar features. Due to the divergence between the source project and the hard fork, patch porting is complicated, which requires an ada...
  • Link: Read Paper
  • Labels: software maintenance and deployment, empirical study

Paper 6
  • Authors: Ouyang, Yicheng and Yang, Jun and Zhang, Lingming
  • Abstract: As bugs are inevitable and prevalent in real-world programs, many Automated Program Repair (APR) techniques have been proposed to generate patches for them. However, due to the lack of a standard for evaluating APR techniques, prior works tend to use different settings and benchmarks in evaluation, threatening the trustworthiness of the evaluation results. Additionally, they typically only adopt plausibility and genuineness as evaluation metrics, which may potentially mask some underlying issues...
  • Link: Read Paper
  • Labels: code generation, program repair, benchmark

Paper 7
  • Authors: Yang, Boyang and Tian, Haoye and Pian, Weiguo and Yu, Haoran and Wang, Haitao and Klein, Jacques and Bissyandé, Tegawendé F. and Jin, Shunfu
  • Abstract: With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, existing repair benchmarks might have influenced LLM training data, potentially causing data leakage. To evaluate LLMs’ realistic repair capabilities, (i) we introduce an extensive, non-crawled benchmark TutorCode, comprising 1,239 C++ defect codes and associated information such as tutor guidance, solution description, failing test cases, an...
  • Link: Read Paper
  • Labels: code generation, program repair

Paper 8
  • Authors: Liu, Chenyan and Cai, Yufan and Lin, Yun and Huang, Yuhuan and Pei, Yunrui and Jiang, Bo and Yang, Ping and Dong, Jin Song and Mei, Hong
  • Abstract: Recent years have seen the development of LLM-based code generation. Compared to generating code in a software project, incremental code edits are empirically observed to be more frequent. The emerging code editing approaches usually formulate the problem as generating an edit based on known relevant prior edits and context. However, practical code edits can be more complicated. First, an editing session can include multiple (ir)relevant edits to the code under edit. Second, the inference of the...
  • Link: Read Paper
  • Labels: code generation, code completion, code model, code model training, source code model

Paper 9
  • Authors: Li, Dong and Yan, Meng and Zhang, Yaosheng and Liu, Zhongxin and Liu, Chao and Zhang, Xiaohong and Chen, Ting and Lo, David
  • Abstract: Large Language Models (LLMs) specialized in code have shown exceptional proficiency across various programming-related tasks, particularly code generation. Nonetheless, due to its nature of pretraining on massive uncritically filtered data, prior studies have shown that code LLMs are prone to generate code with potential vulnerabilities. Existing approaches to mitigate this risk involve crafting data without vulnerability and subsequently retraining or fine-tuning the model. As the number of par...
  • Link: Read Paper
  • Labels: code model, code model security

Paper 10
  • Authors: Zeng, Zhengran and Wang, Yidong and Xie, Rui and Ye, Wei and Zhang, Shikun
  • Abstract: In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail to capture the multi-tasking nature of software development. To address this, we introduce CoderUJB, a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledg...
  • Link: Read Paper
  • Labels: code generation, program testing, bug detection, benchmark

Paper 11
  • Authors: Wen, Xin-Cheng
  • Abstract: Large Language Model (LLM)-based methods have proven to be effective for many software engineering domains, with the potential for substantial productivity gains in software vulnerability detection. However, due to the limited length of LLM input contexts, existing LLM-based methods mainly focus on detecting function-level vulnerabilities, leveraging in-file context information (i.e., intra-procedural vulnerabilities) and ignoring the more complex inter-pro...
  • Link: Read Paper
  • Labels: code generation, bug detection

Paper 12
  • Authors: Shin, Jiho and Hashtroudi, Sepehr and Hemmati, Hadi and Wang, Song
  • Abstract: Recently, deep learning-based test case generation approaches have been proposed to automate the generation of unit test cases. In this study, we leverage Transformer-based code models to generate unit tests with the help of Domain Adaptation (DA) at a project level. Specifically, we use CodeT5, a relatively small language model trained on source code data, and fine-tune it on the test generation task. Then, we apply domain adaptation to each target project data to le...
  • Link: Read Paper
  • Labels: program testing, unit testing

Paper 13
  • Authors: Shan, Shiwen and Huo, Yintong and Su, Yuxin and Li, Yichen and Li, Dan and Zheng, Zibin
  • Abstract: Configurable software systems are prone to configuration errors, resulting in significant losses to companies. However, diagnosing these errors is challenging due to the vast and complex configuration space. These errors pose significant challenges for both experienced maintainers and new end-users, particularly those without access to the source code of the software systems. Given that logs are easily accessible to most end-users, we conduct a preliminary study to outline the challenges and opp...
  • Link: Read Paper
  • Labels: software maintenance and deployment, software configuration

Paper 14
  • Authors: Eom, Jueon and Jeong, Seyeon and Kwon, Taekyoung
  • Abstract: JavaScript interpreters, crucial for modern web browsers, require an effective fuzzing method to identify security-related bugs. However, the strict grammatical requirements for input present significant challenges. Recent efforts to integrate language models for context-aware mutation in fuzzing are promising but lack the necessary coverage guidance to be fully effective. This paper presents a novel technique called CovRL (Coverage-guided Reinforcement Learning) that combines Large Language Mo...
  • Link: Read Paper
  • Labels: program testing, fuzzing, compiler testing

Paper 15
  • Authors: Wang, Luqiao and Wang, Qiangqiang and Wang, Jiaqi and Zhao, Yutong and Wei, Minjie and Quan, Zhou and Cui, Di and Li, Qingshan
  • Abstract: HECS is an advanced tool designed for Extract Class refactoring by leveraging hypergraph learning to model complex dependencies within large classes. Unlike traditional tools that rely on direct one-to-one dependency graphs, HECS uses intra-class dependency hypergraphs to capture one-to-many relationships. This allows HECS to provide more accurate and relevant refactoring suggestions. The tool constructs hypergraphs for each target class, attributes nodes using a pre-trained code model, and trai...
  • Link: Read Paper
  • Labels: code generation, program transformation

Paper 16
  • Authors: Zhang, Cen and Zheng, Yaowen and Bai, Mingqiang and Li, Yeting and Ma, Wei and Xie, Xiaofei and Li, Yuekang and Sun, Limin and Liu, Yang
  • Abstract: Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach for generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more generalized and capable of harnessing a variety of API usage information, resulting in code that is friendly ...
  • Link: Read Paper
  • Labels: program testing, fuzzing

Paper 17
  • Authors: Xue, Zhiyi and Li, Liangguo and Tian, Senyue and Chen, Xiaohong and Li, Pingping and Chen, Liangyu and Jiang, Tingting and Zhang, Min
  • Abstract: FinTech software, crucial for both safety and timely market deployment, presents a compelling case for automated acceptance testing against regulatory business rules. However, the inherent challenges of comprehending unstructured natural language descriptions of these rules and crafting comprehensive test cases demand human intelligence. The emergence of Large Language Models (LLMs) holds promise for automated test case generation, leveraging their natural language processing capabilities. Yet, ...
  • Link: Read Paper
  • Labels: program testing

Paper 18
  • Authors: Zhang, Mengxiao and Tian, Yongqiang and Xu, Zhenyang and Dong, Yiwen and Tan, Shin Hwei and Sun, Chengnian
  • Abstract: Program reduction is a widely used technique to facilitate debugging compilers by automatically minimizing programs that trigger compiler bugs. Existing program reduction techniques are either generic to a wide range of languages (such as Perses and Vulcan) or specifically optimized for one certain language by exploiting language-specific knowledge (e.g., C-Reduce). However, synergistically combining both g...
  • Link: Read Paper
  • Labels: code generation, program transformation, program testing, debugging

Paper 19
  • Authors: Tian, Zhao and Shu, Honglin and Wang, Dong and Cao, Xuejie and Kamei, Yasutaka and Chen, Junjie
  • Abstract: Mutation testing is vital for ensuring software quality. However, the presence of equivalent mutants is known to introduce redundant cost and bias issues, hindering the effectiveness of mutation testing in practical use. Although numerous equivalent mutant detection (EMD) techniques have been proposed, they exhibit limitations due to the scarcity of training data and challenges in generalizing to unseen mutants. Recently, large language models (LLMs) have been extensively adopted in various code...
  • Link: Read Paper
  • Labels: program testing, mutation testing, empirical study

Paper 20
  • Authors: Yu, Zeliang and Wen, Ming and Guo, Xiaochen and Jin, Hai
  • Abstract: As the largest package registry, Node Package Manager (NPM) has become the prime target for various supply chain attacks recently and has been flooded with numerous malicious packages, posing significant security risks to end-users. Learning-based methods have demonstrated promising performance with good adaptability to various types of attacks. However, they suffer from two main limitations. First, they often utilize metadata features or coarse-grained code features extracted at the package lev...
  • Link: Read Paper
  • Labels: static analysis, software composition analysis

Paper 21
  • Authors: Chen, Yang and Jabbarvand, Reyhaneh
  • Abstract: Test flakiness, a non-deterministic behavior of builds irrelevant to code changes, is a major and continuing impediment to delivering reliable software. The very few techniques for the automated repair of test flakiness are specifically crafted to repair either Order-Dependent (OD) or Implementation-Dependent (ID) flakiness. They are also all symbolic approaches, i.e., they leverage program analysis to detect and repair known test flakiness patterns and root causes, failing to generalize. T...
  • Link: Read Paper
  • Labels: code generation, program repair

Paper 22
  • Authors: Cui, Di and Wang, Qiangqiang and Zhao, Yutong and Wang, Jiaqi and Wei, Minjie and Hu, Jingzhao and Wang, Luqiao and Li, Qingshan
  • Abstract: Excessively large classes that encapsulate multiple responsibilities are challenging to comprehend and maintain. Addressing this issue, several Extract Class refactoring tools have been proposed, employing a two-phase process: identifying suitable fields or methods for extraction, and implementing the mechanics of refactoring. These tools traditionally generate an intra-class dependency graph to analyze the class structure, applying hard-coded rules based on this graph to unearth refactoring opp...
  • Link: Read Paper
  • Labels: code generation, program transformation

Paper 23
  • Authors: Fan, Zhiyu and Ruan, Haifeng and Mechtaev, Sergey and Roychoudhury, Abhik
  • Abstract: While large language models (LLMs) have shown significant advancements in code generation, their susceptibility to producing incorrect code poses a significant challenge to the adoption of LLM-generated programs. This issue largely stems from the reliance on natural language descriptions as informal oracles in code generation. Current strategies to mitigate this involve selecting the best program from multiple LLM-generated alternatives, judged by criteria like the consistency of their execution...
  • Link: Read Paper
  • Labels: code generation, program synthesis

Paper 24
  • Authors: Wen, Xin-Cheng and Gao, Cuiyun and Gao, Shuzheng and Xiao, Yang and Lyu, Michael R.
  • Abstract: Recently, there has been a growing interest in automatic software vulnerability detection. Pre-trained model-based approaches have demonstrated superior performance than other Deep Learning (DL)-based approaches in detecting vulnerabilities. However, the existing pre-trained model-based approaches generally employ code sequences as input during prediction, and may ignore vulnerability-related structural information, as reflected in the following two aspects. First, they tend to fail ...
  • Link: Read Paper
  • Labels: static analysis, bug detection

Paper 25
  • Authors: Xue, Zhipeng and Gao, Zhipeng and Wang, Shaohua and Hu, Xing and Xia, Xin and Li, Shanping
  • Abstract: Code executability plays a vital role in software debugging and testing (e.g., detecting runtime exceptions or assertion violations). However, code execution, especially partial or arbitrary code execution, is a non-trivial task due to missing definitions and complex third-party dependencies. To make partial code (such as code snippets posted on the web or code fragments deep inside complex software projects) executable, the existing study has proposed a machine learning model to predict the und...
  • Link: Read Paper
  • Labels: program testing, debugging

Paper 26
  • Authors: Yin, Xin and Ni, Chao and Wang, Shaohua and Li, Zhenhao and Zeng, Limin and Yang, Xiaohu
  • Abstract: Though many approaches have been proposed for Automated Program Repair (APR) and indeed achieved remarkable performance, they still have limitations in fixing bugs that require analyzing and reasoning about the logic of the buggy program. Recently, large language models (LLMs) instructed by prompt engineering have attracted much attention for their powerful ability to address many kinds of tasks including bug-fixing. However, the quality of the prompt will highly affect the ability of LLMs and m...
  • Link: Read Paper
  • Labels: code generation, program repair

Paper 27
  • Authors: He, Yifeng and Huang, Jiabo and Rong, Yuyang and Guo, Yiwen and Wang, Ethan and Chen, Hao
  • Abstract: The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community. However, existing code LLMs often demonstrate unsatisfactory capabilities in generating accurate, complete tests since they were trained on code snippets collected without differentiating between code for testing and for other purposes. In this paper, we present a large-scale dataset, UniTSyn, which can enhance LLMs for Unit Test Synthesis. A...
  • Link: Read Paper
  • Labels: program testing, unit testing, benchmark

Paper 28
  • Authors: Guo, Lianghong and Wang, Yanlin and Shi, Ensheng and Zhong, Wanjun and Zhang, Hongyu and Chen, Jiachi and Zhang, Ruikai and Ma, Yuchi and Zheng, Zibin
  • Abstract: Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long generation time poses a significant limitation in practical use. In this paper, we first conduct an in-depth preliminary study with different Code LLMs on the code generation task and identify a significant efficiency issue, i.e., continual generation of excess to...
  • Link: Read Paper
  • Labels: code generation, program synthesis