FSE2023

Number of papers: 13

An Extensive Study on Adversarial Attack against Pre-trained Models of Code

Authors: Du, Xiaohu and Wen, Ming and Wei, Zichao and Wang, Shangwen and Jin, Hai
Abstract: Transformer-based pre-trained models of code (PTMC) have been widely utilized and have achieved state-of-the-art performance in many mission-critical applications. However, they can be vulnerable to adversarial attacks through identifier substitution or coding style transformation, which can significantly degrade accuracy and may further incur security concerns. Although several approaches have been proposed to generate adversarial examples for PTMC, the effectiveness and efficiency of such appr...
Link: Read Paper
Labels: code model, code model security, empirical study
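The abstract mentions identifier substitution. The sketch below shows that kind of transformation on its own: it consistently renames one identifier in a Python snippet using the standard `tokenize` module. The function name `rename_identifier` and the example snippet are hypothetical. This is not the attack studied in the paper, which also has to search for substitutions that actually degrade a model's accuracy.

```python
import io
import keyword
import tokenize


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename every NAME token equal to `old` to `new` (hypothetical helper).

    This is a semantics-preserving rewrite of the kind that identifier-substitution
    attacks build on. It does not check for clashes with existing names, and it
    also renames attribute accesses such as `obj.old`, since those are NAME tokens too.
    """
    if keyword.iskeyword(new) or not new.isidentifier():
        raise ValueError(f"invalid identifier: {new!r}")
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == old:
            tok = tok._replace(string=new)  # keep original positions, swap text
        tokens.append(tok)
    # Full 5-tuples make untokenize rebuild spacing from the original positions.
    return tokenize.untokenize(tokens)


print(rename_identifier("def add(x, y):\n    return x + y\n", "x", "a"), end="")
```

Only NAME tokens are touched, so occurrences of the old name inside string literals and comments are left unchanged.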

Assisting Static Analysis with Large Language Models: A ChatGPT Experiment

Authors: Li, Haonan and Hao, Yu and Zhai, Yizhuo and Qian, Zhiyun
Abstract: Recent advances of Large Language Models (LLMs), e.g., ChatGPT, exhibited strong capabilities of comprehending and responding to questions across a variety of domains. Surprisingly, ChatGPT even possesses a strong understanding of program code. In this paper, we investigate where and how LLMs can assist static analysis by asking appropriate questions. In particular, we target a specific bug-finding tool, which produces many false positives from the static analysis. In our evaluation, we find tha...
Link: Read Paper
Labels: static analysis, bug detection

Baldur: Whole-Proof Generation and Repair with Large Language Models

Authors: First, Emily and Rabe, Markus N. and Ringer, Talia and Brun, Yuriy
Abstract: Formally verifying software is a highly desirable but labor-intensive task. Recent work has developed methods to automate formal verification using proof assistants, such as Coq and Isabelle/HOL, e.g., by training a model to predict one proof step at a time and using that model to search through the space of possible proofs. This paper introduces a new method to automate formal verification: We use large language models, trained on natural language and code and fine-tuned on proofs, to generat...
Link: Read Paper
Labels: static analysis, program verification

Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair

Authors: Wei, Yuxiang and Xia, Chunqiu Steven and Zhang, Lingming
Abstract: During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful “copilots” in assisting developers with various coding tasks, and have also been directly applied for patch synthesis. However, most LLMs treat programs as sequences of tokens, meaning that they are ignorant of the underlying semantics constraints of the tar...
Link: Read Paper
Labels: code generation, program repair

Evaluating Transfer Learning for Simplifying GitHub READMEs

Authors: Gao, Haoyu and Treude, Christoph and Zahedi, Mansooreh
Abstract: Software documentation captures detailed knowledge about a software product, e.g., code, technologies, and design. It plays an important role in the coordination of development teams and in conveying ideas to various stakeholders. However, software documentation can be hard to comprehend if it is written with jargon and complicated sentence structure. In this study, we explored the potential of text simplification techniques in the domain of software engineering to automatically simplify GitHub ...
Link: Read Paper
Labels: software maintenance and deployment, documentation generation

Grace: Language Models Meet Code Edits

Authors: Gupta, Priyanshu and Khare, Avishree and Bajpai, Yasharth and Chakraborty, Saikat and Gulwani, Sumit and Kanade, Aditya and Radhakrishna, Arjun and Soares, Gustavo and Tiwari, Ashish
Abstract: Developers spend a significant amount of time in editing code for a variety of reasons such as bug fixing or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research due to the diversity of code edits and the difficulty of capturing the developer intent. In this work, we address these challenges by endowing pre-trained large language models (LLMs) with the knowledge of relevant prior associated edits, which we call the Grace (Gene...
Link: Read Paper
Labels: code generation, code completion, code model, code model training, source code model

InferFix: End-to-End Program Repair with LLMs

Authors: Jin, Matthew and Shahriar, Syed and Tufano, Michele and Shi, Xin and Lu, Shuai and Sundaresan, Neel and Svyatkovskiy, Alexey
Abstract: Software development life cycle is profoundly influenced by bugs; their introduction, identification, and eventual resolution account for a significant portion of software development cost. This has motivated software engineering researchers and practitioners to propose different approaches for automating the identification and repair of software defects. Large Language Models (LLMs) have been adapted to the program repair task through few-shot demonstration learning and instruction prompting, t...
Link: Read Paper
Labels: code generation, program repair

LLM-Based Code Generation Method for Golang Compiler Testing

Authors: Gu, Qiuhan
Abstract: Modern optimizing compilers are among the most complex software systems humans build. One way to identify subtle compiler bugs is fuzzing. Both the quantity and the quality of testcases are crucial to the performance of fuzzing. Traditional testcase-generation methods, such as Csmith and YARPGen, have been proven successful at discovering compiler bugs. However, such generated testcases have limited coverage and quantity. In this paper, we present a code generation method for compiler testing ba...
Link: Read Paper
Labels: program testing, fuzzing, compiler testing

Log Parsing with Generalization Ability under New Log Types

Authors: Yu, Siyu and Wu, Yifan and Li, Zhijing and He, Pinjia and Chen, Ningjiang and Liu, Changjian
Abstract: Log parsing, which converts semi-structured logs into structured logs, is the first step for automated log analysis. Existing parsers are still unsatisfactory in real-world systems due to new log types in new-coming logs. In practice, available logs collected during system runtime often do not contain all the possible log types of a system because log types related to infrequently activated system states are unlikely to be recorded and new log types are frequently introduced with system update...
Link: Read Paper
Labels: software maintenance and deployment, system log analysis
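The abstract describes log parsing as turning semi-structured logs into structured ones. The sketch below does this for a single line, splitting it into a (template, parameters) pair by masking tokens that look like variables. The masking rules and the `<*>` placeholder are assumptions modeled on common heuristic parsers. This is not the parser proposed in the paper, and fixed rules like these are the kind of approach that tends to break on unseen log types.

```python
import re

# Assumed masking rules: any token matching one of these is treated as a variable.
VARIABLE_PATTERNS = [
    re.compile(r"^\d+(\.\d+){3}(:\d+)?$"),  # IPv4 address with optional port
    re.compile(r"^0x[0-9a-fA-F]+$"),        # hexadecimal value
    re.compile(r"^-?\d+(\.\d+)?$"),          # integer or decimal number
]


def parse_log_line(line: str) -> tuple[str, list[str]]:
    """Split on whitespace and replace variable-looking tokens with <*>."""
    template_tokens: list[str] = []
    params: list[str] = []
    for token in line.split():
        if any(p.match(token) for p in VARIABLE_PATTERNS):
            template_tokens.append("<*>")
            params.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), params


print(parse_log_line("Received block 4521 of size 67108864 from 10.250.19.102:50010"))
```

Lines that differ only in their variable parts, such as the same message with different block IDs, map to the same template and therefore the same structured event.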

Multilingual Code Co-evolution using Large Language Models

Authors: Zhang, Jiyang and Nie, Pengyu and Li, Junyi Jessy and Gligoric, Milos
Abstract: Many software projects implement APIs and algorithms in multiple programming languages. Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is being propagated, timely and without errors, to implementations in other programming languages. In the world of ever-changing software, using rule-based translation tools (i.e., transpilers) or machine learning models for translating code from one language to another provides limited value...
Link: Read Paper
Labels: code generation, program transformation, code model, code model training, source code model

Self-Supervised Query Reformulation for Code Search

Authors: Mao, Yuetian and Wan, Chengcheng and Jiang, Yuze and Gu, Xiaodong
Abstract: Automatic query reformulation is a widely utilized technology for enriching user requirements and enhancing the outcomes of code search. It can be conceptualized as a machine translation task, wherein the objective is to rephrase a given query into a more comprehensive alternative. While showing promising results, training such a model typically requires a large parallel corpus of query pairs (i.e., the original query and a reformulated query) that are confidential and unpublished by online code...
Link: Read Paper
Labels: static analysis, code search

The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification

Authors: Grishina, Anastasiia and Hort, Max and Moonen, Leon
Abstract: The use of modern Natural Language Processing (NLP) techniques has shown to be beneficial for software engineering tasks, such as vulnerability detection and type inference. However, training deep NLP models requires significant computational resources. This paper explores techniques that aim at achieving the best usage of resources and available information in these models. We propose a generic approach, EarlyBIRD, to build composite representations of code from the early layers of a pre-train...
Link: Read Paper
Labels: static analysis, bug detection, code model, code model training, source code model, empirical study
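The abstract is cut off before it explains how EarlyBIRD combines early layers. The sketch below shows just one assumed way to build a composite representation from the first `k` encoder layers: it averages each layer's first-token vector. The nested-list layout of `hidden_states` and the function name `early_layer_representation` are hypothetical, and the paper may use different combinations.

```python
from typing import Sequence

Vector = list[float]


def early_layer_representation(
    hidden_states: Sequence[Sequence[Vector]], k: int
) -> Vector:
    """Average the first-token vector of the first k layers (assumed combination).

    hidden_states[l][t] is the vector for token t in the output of layer l
    (hypothetical layout). Only layers 0..k-1 are read.
    """
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    first_tokens = [layer[0] for layer in hidden_states[:k]]
    dim = len(first_tokens[0])
    # Element-wise mean over the k selected first-token vectors.
    return [sum(vec[i] for vec in first_tokens) / k for i in range(dim)]


states = [[[1.0, 2.0], [9.0, 9.0]], [[3.0, 4.0], [9.0, 9.0]], [[100.0, 100.0], [0.0, 0.0]]]
print(early_layer_representation(states, 2))
```

If only the first `k` layers feed the classifier, the later layers never need to be computed. That is one plausible source of the efficiency gain the title refers to.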

Towards Greener Yet Powerful Code Generation via Quantization: An Empirical Study

Authors: Wei, Xiaokai and Gonugondla, Sujan Kumar and Wang, Shiqi and Ahmad, Wasi and Ray, Baishakhi and Qian, Haifeng and Li, Xiaopeng and Kumar, Varun and Wang, Zijian and Tian, Yuchen and Sun, Qing and Athiwaratkun, Ben and Shang, Mingyue and Ramanathan, Murali Krishna and Bhatia, Parminder and Xiang, Bing
Abstract: ML-powered code generation aims to assist developers to write code in a more productive manner by intelligently generating code blocks based on natural language prompts. Recently, large pretrained deep learning models have pushed the boundary of code generation and achieved impressive performance. However, the huge number of model parameters poses a significant challenge to their adoption in a typical software development environment, where a developer might use a standard laptop or mid-size ser...
Link: Read Paper
Labels: code generation, program synthesis, empirical study
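The abstract is cut off before it describes the paper's quantization setup. To make the idea in the title concrete, the sketch below shows a generic symmetric, per-tensor int8 scheme: each weight `w` is approximated as `scale * q`, with `q` an integer in [-127, 127]. The function names are hypothetical, and this is not claimed to be the configuration the authors evaluated.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; use 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer; the clamp guards against boundary rounding.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer codes."""
    return [scale * v for v in q]


q, scale = quantize_int8([0.5, -1.27, 0.0, 1.27])
print(q, scale)
```

Storing `q` as 8-bit integers instead of 32-bit floats cuts weight memory roughly fourfold. The cost is a rounding error of at most about `scale / 2` per weight.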
