Empirical Study

Agent Design

  • Can GPT-4 Replicate Empirical Software Engineering Research?, (FSE2024)

    • Abstract: Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodolo...
    • Labels: empirical study, agent design
  • Self-Planning Code Generation with Large Language Models, (TOSEM2024)

    • Abstract: Although large language models (LLMs) have demonstrated impressive ability in code generation, they are still struggling to address the complicated intent provided by humans. It is widely acknowledged that humans typically employ planning to decompose complex problems and schedule solution steps prior to implementation. To this end, we introduce planning into code generation to help the model understand complex intent and reduce the difficulty of problem-solving. This paper proposes a self-plann...
    • Labels: code generation, program synthesis, agent design, planning, empirical study

Code Generation

Code Model

General Coding Task

Hallucination In Reasoning

Program Testing

  • An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation, (TSE2024)

    • Abstract: Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including their suggested use for automated generation of unit tests, though this has required additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation on the effectiveness...
    • Labels: program testing, unit testing, empirical study
  • ChatGPT vs SBST: A Comparative Assessment of Unit Test Suite Generation, (TSE2024)

    • Abstract: Recent advancements in large language models (LLMs) have demonstrated exceptional success in a wide range of general domain tasks, such as question answering and following instructions. Moreover, LLMs have shown potential in various software engineering applications. In this study, we present a systematic comparison of test suites generated by the ChatGPT LLM and the state-of-the-art SBST tool EvoSuite. Our comparison is based on several critical factors, including correctness, readability, code...
    • Labels: program testing, unit testing, empirical study
  • Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction, (TSE2024)

    • Abstract: Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly focused on crash bugs, which are easier to automatically detect and verify. In this work, we overcome this limitation by using large language models (LLMs), which have been demonstrated to be adept at natural language processing and code generation. By prompti...
    • Labels: program testing, bug reproduction, empirical study
  • Evaluating and Improving ChatGPT for Unit Test Generation, (FSE2024)

    • Abstract: Unit testing plays an essential role in detecting bugs in functionally-discrete program units (e.g., methods). Manually writing high-quality unit tests is time-consuming and laborious. Although the traditional techniques are able to generate tests with reasonable coverage, they are shown to exhibit low readability and still cannot be directly adopted by developers in practice. Recent work has shown the large potential of large language models (LLMs) in unit test generation. By being pre-trained ...
    • Labels: program testing, unit testing, empirical study, code generation
  • Large Language Models for Equivalent Mutant Detection: How Far Are We?, (ISSTA2024)

    • Abstract: Mutation testing is vital for ensuring software quality. However, the presence of equivalent mutants is known to introduce redundant cost and bias issues, hindering the effectiveness of mutation testing in practical use. Although numerous equivalent mutant detection (EMD) techniques have been proposed, they exhibit limitations due to the scarcity of training data and challenges in generalizing to unseen mutants. Recently, large language models (LLMs) have been extensively adopted in various code...
    • Labels: program testing, mutation testing, empirical study
  • On the Evaluation of Large Language Models in Unit Test Generation, (ASE2024)

    • Abstract: Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and Codex) with fixed prompting strategies, leaving the capabilities of advanced open-source LLMs with various prompting settings une...
    • Labels: program testing, unit testing, empirical study
  • Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation, (ASE2024)

    • Abstract: Automatic testing has garnered significant attention and success over the past few decades. Techniques such as unit testing and coverage-guided fuzzing have revealed numerous critical software bugs and vulnerabilities. However, a long-standing, formidable challenge for existing techniques is how to achieve higher testing coverage. Constraint-based techniques, such as symbolic execution and concolic testing, have been well-explored and integrated into the existing approaches. With the popularity ...
    • Labels: program testing, unit testing, empirical study

Software Maintenance And Deployment

Static Analysis