Static Analysis

Syntactic Analysis

Pointer Analysis

Call Graph Analysis

Data-flow Analysis

Type Inference

  • Generative Type Inference for Python, (ASE2023)

    • Abstract: Python is a popular dynamic programming language, evidenced by its ranking as the second most commonly used language on GitHub. However, its dynamic type system can lead to potential type errors, leading researchers to explore automatic type inference approaches for Python programs. Existing type inference approaches can be generally grouped into three categories, i.e., rule-based, supervised, and cloze-style approaches. The rule-based type inference approaches can ensure the accuracy of predic...
    • Labels: static analysis, type inference
  • PyTy: Repairing Static Type Errors in Python, (ICSE2024)

    • Abstract: Gradual typing enables developers to annotate types of their own choosing, offering a flexible middle ground between no type annotations and a fully statically typed language. As more and more code bases get type-annotated, static type checkers detect an increasingly large number of type errors. Unfortunately, fixing these errors requires manual effort, hampering the adoption of gradual typing in practice. This paper presents PyTy, an automated program repair approach targeted at statically dete...
    • Labels: code generation, program repair, static analysis, type inference
  • Risky Dynamic Typing-related Practices in Python: An Empirical Study, (TOSEM2024)

    • Abstract: Python’s dynamic typing nature provides developers with powerful programming abstractions. However, many type-related bugs are accumulated in code bases of Python due to the misuse of dynamic typing. The goal of this article is to aid in the understanding of developers’ high-risk practices toward dynamic typing and the early detection of type-related bugs. We first formulate the rules of six types of risky dynamic typing-related practices (type smells for short) in Python. We then develop a rule...
    • Labels: static analysis, type inference, bug detection, empirical study

Specification Inference

Equivalence Checking

Code Similarity Analysis

Bug Detection

  • A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection, (arXiv2024)

    • Abstract: Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabilities. Although recent work has applied LLMs to vulnerability detection using generic prompting techniq...
    • Labels: static analysis, bug detection, empirical study
  • Assisting Static Analysis with Large Language Models: A ChatGPT Experiment, (FSE2023)

    • Abstract: Recent advances of Large Language Models (LLMs), e.g., ChatGPT, exhibited strong capabilities of comprehending and responding to questions across a variety of domains. Surprisingly, ChatGPT even possesses a strong understanding of program code. In this paper, we investigate where and how LLMs can assist static analysis by asking appropriate questions. In particular, we target a specific bug-finding tool, which produces many false positives from the static analysis. In our evaluation, we find tha...
    • Labels: static analysis, bug detection
  • Beware of the Unexpected: Bimodal Taint Analysis, (ISSTA2023)

    • Abstract: Static analysis is a powerful tool for detecting security vulnerabilities and other programming problems. Global taint tracking, in particular, can spot vulnerabilities arising from complicated data flow across multiple functions. However, precisely identifying which flows are problematic is challenging, and sometimes depends on factors beyond the reach of pure program analysis, such as conventions and informal knowledge. For example, learning that a parameter name of an API function locale ends...
    • Labels: static analysis, bug detection
  • CORE: Resolving Code Quality Issues using LLMs, (FSE2024)

    • Abstract: As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality issues. However, developers need to spend extra efforts to revise their code to improve code quality based on the tool findings. In this work, we investigate the use of (instruction-following) large language models (LLMs) to assist developers in revising code to ...
    • Labels: static analysis, bug detection
  • CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios, (ISSTA2024)

    • Abstract: In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail to capture the multi-tasking nature of software development. To address this, we introduce CoderUJB, a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledg...
    • Labels: code generation, program testing, bug detection, benchmark
  • Collaboration to Repository-Level Vulnerability Detection, (ISSTA2024)

    • Abstract: Large Language Model (LLM)-based methods have proven to be effective for many software engineering domains, with a potential for substantial productivity effective for software vulnerability detection. However, due to the limitation of the length of input contexts of LLM, the existing LLM-based methods mainly focus on detecting function-level and leveraging the in-file context information for vulnerability detection (i.e., intra-procedural vulnerabilities), ignoring the more complex inter-pro...
    • Labels: code generation, bug detection
  • Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications, (ICSE2025)

    • Abstract: Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that large language models (LLMs) have potential in auditing smart contracts, but the state-of-the-art indicates that even GPT-4 can achieve only 30% precision (when both decision and justification are correct). This is likely because off-the-shelf LLMs were primarily pre-trained on a general text/code corpus and not fine-tuned on the specific domain of Solidity smart contract auditing....
    • Labels: static analysis, bug detection, agent design
  • Continuous Learning for Android Malware Detection, (USENIXSec2023)

    • Abstract: Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective, due to the evolution of malware apps and benign apps. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples....
    • Labels: static analysis, bug detection, empirical study
  • Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection, (ICSE2024)

    • Abstract: Deep learning-based vulnerability detection has shown great performance and, in some studies, outperformed static analysis tools. However, the highest-performing approaches use token-based transformer models, which are not the most efficient to capture code semantics required for vulnerability detection. Classical program analysis techniques such as dataflow analysis can detect many types of bugs based on their root causes. In this paper, we propose to combine such causal-based vulnerability det...
    • Labels: static analysis, bug detection, code model, code model training, source code model
  • DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection, (RAID2023)

    • Abstract: We propose and release a new vulnerable source code dataset. We curate the dataset by crawling security issue websites, extracting vulnerability-fixing commits and source codes from the corresponding projects. Our new dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits. Our dataset covers 295 more projects than all previous datasets combined. Combining our new dataset with previous datasets, we present an analysis of the...
    • Labels: static analysis, bug detection, benchmark
  • Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection, (arXiv2023)

    • Abstract: Recently, pretrained language models have shown state-of-the-art performance on the vulnerability detection task. These models are pretrained on a large corpus of source code, then fine-tuned on a smaller supervised vulnerability dataset. Due to the different training objectives and the performance of the models, it is interesting to consider whether the models have learned the semantics of code relevant to vulnerability detection, namely bug semantics, and if so, how the alignment to bug semant...
    • Labels: static analysis, bug detection, empirical study
  • Do you still need a manual smart contract audit?, (arXiv2023)

    • Abstract: We investigate the feasibility of employing large language models (LLMs) for conducting the security audit of smart contracts, a traditionally time-consuming and costly process. Our research focuses on the optimization of prompt engineering for enhanced security analysis, and we evaluate the performance and accuracy of LLMs using a benchmark dataset comprising 52 Decentralized Finance (DeFi) smart contracts that have previously been compromised. Our findings reveal that, when applied to vuln...
    • Labels: static analysis, bug detection
  • Effective Vulnerable Function Identification based on CVE Description Empowered by Large Language Models, (ASE2024)

    • Abstract: Open-source software (OSS) has profoundly transformed the software development paradigm by facilitating effortless code reuse. However, in recent years, there has been an alarming increase in disclosed vulnerabilities within OSS, posing significant security risks to downstream users. Therefore, analyzing existing vulnerabilities and precisely assessing their threats to downstream applications become pivotal. Plenty of efforts have been made recently towards this problem, such as vulnerability re...
    • Labels: static analysis, bug detection
  • Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach, (OOPSLA2024)

    • Abstract: While static analysis is instrumental in uncovering software bugs, its precision in analyzing large and intricate codebases remains challenging. The emerging prowess of Large Language Models (LLMs) offers a promising avenue to address these complexities. In this paper, we present LLift, a pioneering framework that synergizes static analysis and LLMs, with a spotlight on identifying use-before-initialization (UBI) bugs within the Linux kernel. Drawing from our insights into variable usage convent...
    • Labels: static analysis, bug detection
  • Examining Zero-Shot Vulnerability Repair with Large Language Models, (S&P2023)

    • Abstract: Human developers can produce code with cybersecurity bugs. Can emerging ‘smart’ code completion tools help repair those bugs? In this work, we examine the use of large language models (LLMs) for code (such as OpenAI’s Codex and AI21’s Jurassic J-1) for zero-shot vulnerability repair. We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code. This is difficult due to the numerous ways to phrase key information— both semantically and synta...
    • Labels: static analysis, bug detection, empirical study
  • Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation, (ICSE2023)

    • Abstract: Software bugs claim ≈ 50% of development time and cost the global economy billions of dollars. Once a bug is reported, the assigned developer attempts to identify and understand the source code responsible for the bug and then corrects the code. Over the last five decades, there has been significant research on automatically finding or correcting software bugs. However, there has been little research on automatically explaining the bugs to the developers, which is essential but a highly challen...
    • Labels: static analysis, bug detection
  • E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification, (Microsoft2023)

    • Abstract: Static analysis, the process of examining code without executing it, is crucial for identifying software issues. Yet, static analysis is hampered by its complexity and the need for customization for different targets. Traditional static analysis tools require extensive human effort and are often limited to specific target programs and programming languages. Recent advancements in Large Language Models (LLMs), such as GPT-4 and Llama, offer new capabilities for software engineering tasks. However...
    • Labels: static analysis, bug detection
  • GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis, (ICSE2024)

    • Abstract: Smart contracts are prone to various vulnerabilities, leading to substantial financial losses over time. Current analysis tools mainly target vulnerabilities with fixed control- or data-flow patterns, such as re-entrancy and integer overflow. However, a recent study on Web3 security bugs revealed that about 80% of these bugs cannot be audited by existing tools due to the lack of domain-specific property description and checking. Given recent advances in Large Language Models (LLMs), it is worth...
    • Labels: static analysis, bug detection
  • Harnessing the Power of LLM to Support Binary Taint Analysis, (TOSEM2024)

    • Abstract: This paper proposes LATTE, the first static binary taint analysis that is powered by a large language model (LLM). LATTE is superior to the state of the art (e.g., Emtaint, Arbiter, Karonte) in three aspects. First, LATTE is fully automated while prior static binary taint analyzers need to rely on human expertise to manually customize taint propagation rules and vulnerability inspection rules. Second, LATTE is significantly effective in vulnerability detection, demonstrated by our comprehensive eva...
    • Labels: static analysis, bug detection
  • How Far Have We Gone in Vulnerability Detection Using Large Language Models, (arXiv2023)

    • Abstract: As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of large language models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark VulBench. This bench...
    • Labels: static analysis, bug detection, benchmark
  • Interleaving Static Analysis and LLM Prompting, (SOAP2024)

    • Abstract: This paper presents a new approach for using Large Language Models (LLMs) to improve static program analysis. Specifically, during program analysis, we interleave calls to the static analyzer and queries to the LLM: the prompt used to query the LLM is constructed using intermediate results from the static analysis, and the result from the LLM query is used for subsequent analysis of the program. We apply this novel approach to the problem of error-specification inference of functions in systems ...
    • Labels: static analysis, bug detection
  • jTrans: Jump-Aware Transformer for Binary Code Similarity Detection, (ISSTA2022)

    • Abstract: Binary code similarity detection (BCSD) has important applications in various fields such as vulnerabilities detection, software component analysis, and reverse engineering. Recent studies have shown that deep neural networks (DNNs) can comprehend instructions or control-flow graphs (CFG) of binary code and support BCSD. In this study, we propose a novel Transformer-based approach, namely jTrans, to learn representations of binary code. It is the first solution that embeds control flow informati...
    • Labels: static analysis, bug detection, code model, code model training, binary code model
  • LLM-Assisted Static Analysis for Detecting Security Vulnerabilities, (arXiv2024)

    • Abstract: Software is prone to security vulnerabilities. Program analysis tools to detect them have limited effectiveness in practice due to their reliance on human labeled specifications. Large language models (or LLMs) have shown impressive code generation capabilities but they cannot do complex reasoning over code to detect such vulnerabilities especially since this task requires whole-repository analysis. We propose IRIS, a neuro-symbolic approach that systematically combines LLMs with static analysis...
    • Labels: static analysis, bug detection
  • LLM-based Resource-Oriented Intention Inference for Static Resource Detection, (ICSE2025)

    • Abstract: Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources, suffering from both (1) false negatives caused by the incompleteness of predefined resource acquisition/release APIs and (2) false positives caused by the unsoundness of resource reachability validatio...
    • Labels: static analysis, bug detection
  • LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning, (arXiv2024)

    • Abstract: Large language models (LLMs) have demonstrated significant potential in various tasks, including vulnerability detection. However, current efforts in this area are preliminary, lacking clarity on whether LLMs' vulnerability reasoning capabilities stem from the models themselves or external aids such as knowledge retrieval and tooling support. This paper aims to isolate LLMs' vulnerability reasoning from other capabilities, such as vulnerability knowledge adoption, context information retrieval, a...
    • Labels: static analysis, bug detection, benchmark
  • LLMDFA: Analyzing Dataflow in Code with Large Language Model, (NeurIPS2024)

    • Abstract: Dataflow analysis is a fundamental code analysis technique that identifies dependencies between program values. Traditional approaches typically necessitate successful compilation and expert customization, hindering their applicability and usability for analyzing uncompilable programs with evolving analysis needs in real-world scenarios. This paper presents LLMDFA, an LLM-powered compilation-free and customizable dataflow analysis framework. To address hallucinations for reliable results, we deco...
    • Labels: static analysis, bug detection
  • LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks, (S&P2024)

    • Abstract: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigati...
    • Labels: static analysis, bug detection, code generation, program repair, empirical study
  • Large Language Models for Code Analysis: Do LLMs Really Do Their Job?, (USENIXSec2024)

    • Abstract: Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks. Their capacity to comprehend and generate human-like code has spurred research into harnessing LLMs for code analysis purposes. However, the existing body of literature falls short in delivering a systematic evaluation and assessment of LLMs' effectiveness in code analysis, particularly in the context of obfuscated code. This paper seeks to bri...
    • Labels: static analysis, bug detection, empirical study
  • Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives, (arXiv2023)

    • Abstract: This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4 to dig out vulnerabilities within smart contracts based on our ongoing research. For the task of smart contract vulnerability detection, achieving practical usability hinges on identifying as many true vulnerabilities as possible while minimizing the number of false positives. Nonetheless, our empirical study reveals contradictory yet intere...
    • Labels: static analysis, bug detection
  • Learning to Detect and Localize Multilingual Bugs, (FSE2024)

    • Abstract: Increasing studies have shown bugs in multi-language software as a critical loophole in modern software quality assurance, especially those induced by language interactions (i.e., multilingual bugs). Yet existing tool support for bug detection/localization remains largely limited to single-language software, despite the long-standing prevalence of multi-language systems in various real-world software domains. Extant static/dynamic analysis and deep learning (DL) based approaches all face major c...
    • Labels: static analysis, bug detection, code model, code model training, source code model
  • Leveraging Large Language Model to Assist Detecting Rust Code Comment Inconsistency, (ASE2024)

    • Abstract: Rust is renowned for its robust memory safety capabilities, yet its distinctive memory management model poses substantial challenges in both writing and understanding programs. Within Rust source code, comments are employed to clearly delineate conditions that might cause panic behavior, thereby warning developers about potential hazards associated with specific operations. Therefore, comments are particularly crucial for documenting Rust's program logic and design. Nevertheless, as modern softw...
    • Labels: static analysis, bug detection
  • Leveraging Semantic Relations in Code and Data to Enhance Taint Analysis of Embedded Systems, (USENIXSec2024)

    • Abstract: IoT devices have significantly impacted our daily lives, and detecting vulnerabilities in embedded systems early on is critical for ensuring their security. Among the existing vulnerability detection techniques for embedded systems, static taint analysis has been proven effective in detecting severe vulnerabilities, such as command injection vulnerabilities, which can cause remote code execution. Nevertheless, static taint analysis is faced with the problem of identifying sources comprehensively...
    • Labels: static analysis, bug detection, code model, code model training, source code model
  • Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks, (ICSE2024)

    • Abstract: Vulnerability analysis is crucial for software security. Inspired by the success of pre-trained models on software engineering tasks, this work focuses on using pre-training techniques to enhance the understanding of vulnerable code and boost vulnerability analysis. The code understanding ability of a pre-trained model is highly related to its pre-training objectives. The semantic structure, e.g., control and data dependencies, of code is important for vulnerability analysis. However, existing p...
    • Labels: static analysis, bug detection, code model, code model training, source code model
  • RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?, (EMNLP2024)

    • Abstract: The latest advancements in large language models (LLMs) have sparked interest in their potential for software vulnerability detection. However, there is currently a lack of research specifically focused on vulnerabilities in the PHP language, and challenges in data sampling and processing persist, hindering the model’s ability to effectively capture the characteristics of specific vulnerabilities. In this paper, we present RealVul, the first LLM-based framework designed for PHP vulnerability det...
    • Labels: static analysis, bug detection
  • Risky Dynamic Typing-related Practices in Python: An Empirical Study, (TOSEM2024)

    • Abstract: Python’s dynamic typing nature provides developers with powerful programming abstractions. However, many type-related bugs are accumulated in code bases of Python due to the misuse of dynamic typing. The goal of this article is to aid in the understanding of developers’ high-risk practices toward dynamic typing and the early detection of type-related bugs. We first formulate the rules of six types of risky dynamic typing-related practices (type smells for short) in Python. We then develop a rule...
    • Labels: static analysis, type inference, bug detection, empirical study
  • SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection, (ISSTA2024)

    • Abstract: Recently, there has been a growing interest in automatic software vulnerability detection. Pre-trained model-based approaches have demonstrated superior performance than other Deep Learning (DL)-based approaches in detecting vulnerabilities. However, the existing pre-trained model-based approaches generally employ code sequences as input during prediction, and may ignore vulnerability-related structural information, as reflected in the following two aspects. First, they tend to fail ...
    • Labels: static analysis, bug detection
  • Sanitizing Large Language Models in Bug Detection with Data-Flow, (EMNLP2024)

    • Abstract: Large language models (LLMs) show potential in code reasoning tasks, facilitating the customization of detecting bugs in software development. However, the hallucination effect can significantly compromise the reliability of bug reports. This work formulates a new schema of bug detection and presents a novel sanitization technique that detects false positives for hallucination mitigation. Our key idea is to enforce LLMs to emit data-flow paths in few-shot chain-of-thought prompting and validate ...
    • Labels: static analysis, bug detection, data-flow analysis
  • Semantic Sleuth: Identifying Ponzi Contracts via Large Language Models, (ASE2024)

    • Abstract: Smart contracts, self-executing agreements directly encoded in code, are fundamental to blockchain technology, especially in decentralized finance (DeFi) and Web3. However, the rise of Ponzi schemes in smart contracts poses significant risks, leading to substantial financial losses and eroding trust in blockchain systems. Existing detection methods, such as PonziGuard, depend on large amounts of labeled data and struggle to identify unseen Ponzi schemes, limiting their reliability and generaliza...
    • Labels: static analysis, bug detection
  • SkipAnalyzer: An Embodied Agent for Code Analysis with Large Language Models, (arXiv2023)

    • Abstract: We introduce SkipAnalyzer, a large language model (LLM)-powered tool for static code analysis. SkipAnalyzer has three components: 1) an LLM-based static bug detector that scans source code and reports specific types of bugs, 2) an LLM-based false-positive filter that can identify false-positive bugs in the results of static bug detectors (e.g., the result of step 1) to improve detection accuracy, and 3) an LLM-based patch generator that can generate patches for the detected bugs above. As a proo...
    • Labels: static analysis, bug detection, agent design
  • SmartInv: Multimodal Learning for Smart Contract Invariant Inference, (S&P2024)

    • Abstract: Smart contracts are software programs that enable diverse business activities on the blockchain. Recent research has identified new classes of "machine un-auditable" bugs that arise from source code not meeting underlying transaction contexts. Existing detection methods require human understanding of underlying transaction logic and manual reasoning across different sources of context (i.e., modalities), such as code and natural language specifying the expected transaction behavior. To automate t...
    • Labels: static analysis, bug detection
  • Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs, (arXiv2024)

    • Abstract: Currently, deep learning successfully applies to code vulnerability detection by learning from code sequences or property graphs. However, sequence-based methods often overlook essential code attributes such as syntax, control flow, and data dependencies, whereas graph-based approaches might underestimate the semantics of code and face challenges in capturing long-distance contextual information. To address this gap, we propose Vul-LMGNN, a unified model that combines pre-trained code language mo...
    • Labels: static analysis, bug detection, code model, code model training, source code model
  • Stanceformer: Target-Aware Transformer for Stance Detection, (EMNLP2024)

    • Abstract: The task of Stance Detection involves discerning the stance expressed in a text towards a specific subject or target. Prior works have relied on existing transformer models that lack the capability to prioritize targets effectively. Consequently, these models yield similar performance regardless of whether we utilize or disregard target information, undermining the task’s significance. To address this challenge, we introduce Stanceformer, a target-aware transformer model that incorporates enhanc...
    • Labels: static analysis, bug detection, empirical study
  • The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification, (FSE2023)

    • Abstract: The use of modern Natural Language Processing (NLP) techniques has shown to be beneficial for software engineering tasks, such as vulnerability detection and type inference. However, training deep NLP models requires significant computational resources. This paper explores techniques that aim at achieving the best usage of resources and available information in these models. We propose a generic approach, EarlyBIRD, to build composite representations of code from the early layers of a pre-train...
    • Labels: static analysis, bug detection, code model, code model training, source code model, empirical study
  • The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks, (Forge2024)

    • Abstract: Binary code similarity detection (BCSD), as a fundamental technique in software security, has various applications, including malware family detection, known vulnerability detection and code plagiarism detection. Recent deep learning-based BCSD approaches have demonstrated promising performance. However, they face two significant challenges that limit detection performance. First, most approaches that use sequence networks (like RNN and Transformer) utilize coarse-grained tokenization methods, wh...
    • Labels: static analysis, bug detection, empirical study
  • Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection, (arXiv2024)

    • Abstract: According to our survey of the machine learning for vulnerability detection (ML4VD) literature published in the top Software Engineering conferences, every paper in the past 5 years defines ML4VD as a binary classification problem: Given a function, does it contain a security flaw? In this paper, we ask whether this decision can really be made without further context and study both vulnerable and non-vulnerable functions in the most popular ML4VD datasets. A function is vulnerable if it was invol...
    • Labels: static analysis, bug detection, empirical study
  • Twin Graph-Based Anomaly Detection via Attentive Multi-Modal Learning for Microservice System, (ASE2023)

    • Abstract: Microservice architecture has sprung up over recent years for managing enterprise applications, due to its ability to independently deploy and scale services. Despite its benefits, ensuring the reliability and safety of a microservice system remains highly challenging. Existing anomaly detection algorithms based on a single data modality (i.e., metrics, logs, or traces) fail to fully account for the complex correlations and interactions between different modalities, leading to false negatives an...
    • Labels: static analysis, bug detection
  • Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities, (arXiv2023)

    • Abstract: While automated vulnerability detection techniques have made promising progress in detecting security vulnerabilities, their scalability and applicability remain challenging. The remarkable performance of Large Language Models (LLMs), such as GPT-4 and CodeLlama, on code-related tasks has prompted recent works to explore if LLMs can be used to detect vulnerabilities. In this paper, we perform a more comprehensive study by concurrently examining a higher number of datasets, languages and LLMs, an...
    • Labels: static analysis, bug detection, empirical study
  • VALAR: Streamlining Alarm Ranking in Static Analysis with Value-Flow Assisted Active Learning, (ASE2023)

    • Abstract: Static analyzers play a critical role in program defects and security vulnerabilities detection. Despite their importance, the widespread adoption of static analysis techniques in industrial development faces numerous obstacles, among which the high rate of false alarms constitutes a significant one. To address this issue, we propose a novel approach called Valar, which performs alarm ranking for advanced value-flow analysis using the active learning technique. Active learning algorithms minimiz...
    • Labels: static analysis, bug detection
  • VGX: Large-Scale Sample Generation for Boosting Learning-Based Software Vulnerability Analyses, (ICSE2024)

    • Abstract: Accompanying the successes of learning-based defensive software vulnerability analyses is the lack of large and quality sets of labeled vulnerable program samples, which impedes further advancement of those defenses. Existing automated sample generation approaches have shown potentials yet still fall short of practical expectations due to the high noise in the generated samples. This paper proposes VGX, a new technique aimed for large-scale generation of high-quality vulnerability datasets. Give...
    • Labels: static analysis, bug detection, code model, code model training, source code model
  • VULGEN: Realistic Vulnerability Generation Via Pattern Mining and Deep Learning, (ICSE2023)

    • Abstract: Building new, powerful data-driven defenses against prevalent software vulnerabilities needs sizable, quality vulnerability datasets, so does large-scale benchmarking of existing defense solutions. Automatic data generation would promisingly meet the need, yet there is little work aimed to generate much-needed quality vulnerable samples. Meanwhile, existing similar and adaptable techniques suffer critical limitations for that purpose. In this paper, we present VULGEN, the first injection-based v...
    • Labels: static analysis, bug detection, benchmark
  • VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection, (arXiv2024)

    • Abstract: Deep Learning (DL)-based methods have proven to be effective for software vulnerability detection, with a potential for substantial productivity enhancements for detecting vulnerabilities. Current methods mainly focus on detecting single functions (i.e., intra-procedural vulnerabilities), ignoring the more complex inter-procedural vulnerability detection scenarios in practice. For example, developers routinely engage with program analysis to detect vulnerabilities that span multiple functions wi...
    • Labels: static analysis, bug detection, benchmark
  • VulExplainer: A Transformer-Based Hierarchical Distillation for Explaining Vulnerability Types, (TSE2023)

    • Abstract: Deep learning-based vulnerability prediction approaches are proposed to help under-resourced security practitioners to detect vulnerable functions. However, security practitioners still do not know what type of vulnerabilities correspond to a given prediction (aka CWE-ID). Thus, a novel approach to explain the type of vulnerabilities for a given prediction is imperative. In this paper, we propose VulExplainer, an approach to explain the type of vulnerabilities. We re...
    • Labels: static analysis, bug detection
  • Vulnerability Detection with Code Language Models: How Far Are We?, (ICSE2025)

    • Abstract: In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing vulnerability datasets, including poor data quality, low label accuracy, and high duplication rates, leading to unreliable model performance in realistic vulnerability detection scenarios. Additionally, the evaluation methods used with these datasets are not representat...
    • Labels: static analysis, bug detection, benchmark
  • Where is it? Tracing the Vulnerability-relevant Files from Vulnerability Reports, (ICSE2024)

    • Abstract: With the widely usage of open-source software, supply-chain-based vulnerability attacks, including SolarWind and Log4Shell, have posed significant risks to software security. Currently, people rely on vulnerability advisory databases or commercial software bill of materials (SBOM) to defend against potential risks. Unfortunately, these datasets do not provide finer-grained file-level vulnerability information, compromising their effectiveness. Previous works have not adequately addressed this is...
    • Labels: static analysis, bug detection
  • Who Judges the Judge: An Empirical Study on Online Judge Tests, (ISSTA2023)

    • Abstract: Online Judge platforms play a pivotal role in education, competitive programming, recruitment, career training, and large language model training. They rely on predefined test suites to judge the correctness of submitted solutions. It is therefore important that the solution judgement is reliable and free from potentially misleading false positives (i.e., incorrect solutions that are judged as correct). In this paper, we conduct an empirical study of 939 coding problems with 541,552 solutions, a...
    • Labels: static analysis, bug detection
  • Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection, (arXiv2024)

    • Abstract: Software, while beneficial, poses potential cybersecurity risks due to inherent vulnerabilities. Detecting these vulnerabilities is crucial, and deep learning has shown promise as an effective tool for this task due to its ability to perform well without extensive feature engineering. However, a challenge in deploying deep learning for vulnerability detection is the limited availability of training data. Recent research highlights the deep learning efficacy in diverse tasks. This success is attr...
    • Labels: static analysis, bug detection, code model, code model training, source code model
  • iSMELL: Assembling LLMs with Expert Toolsets for Code Smell Detection and Refactoring, (ASE2024)

    • Abstract: Detecting and refactoring code smells is challenging, laborious, and sustaining. Although large language models have demonstrated potential in identifying various types of code smells, they also have limitations such as input-output token restrictions, difficulty in accessing repository-level knowledge, and performing dynamic source code analysis. Existing learning-based methods or commercial expert toolsets have advantages in handling complex smells. They can analyze project structures and cont...
    • Labels: code generation, program repair, static analysis, bug detection

Program Verification

  • Automated Program Refinement: Guide and Verify Code Large Language Model with Refinement Calculus, (POPL2025)

    • Abstract: Recently, the rise of code-centric large language models (LLMs) appears to have reshaped the software engineering world with low-barrier tools like Copilot that can generate code easily. However, there is no correctness guarantee for the code generated by LLMs, which suffer from the hallucination problem, and their output is fraught with risks. Besides, the end-to-end process from specification to code through LLMs is a non-transparent and uncontrolled black box. This opacity makes it difficult ...
    • Labels: code generation, program transformation, static analysis, program verification
  • Baldur: Whole-Proof Generation and Repair with Large Language Models, (FSE2023)

    • Abstract: Formally verifying software is a highly desirable but labor-intensive task. Recent work has developed methods to automate formal verification using proof assistants, such as Coq and Isabelle/HOL, e.g., by training a model to predict one proof step at a time and using that model to search through the space of possible proofs. This paper introduces a new method to automate formal verification: We use large language models, trained on natural language and code and fine-tuned on proofs, to generat...
    • Labels: static analysis, program verification
  • Can ChatGPT support software verification?, (FASE2024)

    • Abstract: Large language models have become increasingly effective in software engineering tasks such as code generation, debugging and repair. Language models like ChatGPT can not only generate code, but also explain its inner workings and in particular its correctness. This raises the question whether we can utilize ChatGPT to support formal software verification....
    • Labels: static analysis, program verification
  • Can large language models reason about program invariants?, (ICML2023)

    • Abstract: Identifying invariants is an important program analysis task with applications towards program understanding, bug finding, vulnerability analysis, and formal verification. Existing tools for identifying program invariants rely on dynamic analysis, requiring traces collected from multiple executions in order to produce reliable invariants. We study the application of large language models to invariant prediction, finding that models trained on source code and fine-tuned for invariant generation c...
    • Labels: static analysis, program verification
  • CoqPilot, a plugin for LLM-based generation of proofs, (ASE2024)

    • Abstract: We present CoqPilot, a VS Code extension designed to help automate writing of Coq proofs. The plugin collects the parts of proofs marked with the admit tactic in a Coq file, i.e., proof holes, and combines LLMs along with non-machine-learning methods to generate proof candidates for the holes. Then, CoqPilot checks if each proof candidate solves the given subgoal and, if successful, replaces the hole with it. The focus of CoqPilot is twofold. Firstly, we want to allow users to seamlessly combine...
    • Labels: code generation, program synthesis, static analysis, program verification
  • Enchanting program specification synthesis by large language models using static analysis and program verification, (CAV2024)

    • Abstract: Formal verification provides a rigorous and systematic approach to ensure the correctness and reliability of software systems. Yet, constructing specifications for the full proof relies on domain expertise and non-trivial manpower. In view of such needs, an automated approach for specification synthesis is desired. While existing automated approaches are limited in their versatility, i.e., they either focus only on synthesizing loop invariants for numerical programs, or are tailored for specific...
    • Labels: static analysis, program verification, specification inference
  • Finding inductive loop invariants using large language models, (arXiv2023)

    • Abstract: Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting the entire program, thus are indispensable artifacts in a formal proof of correctness. Finding in...
    • Labels: static analysis, program verification
  • Hypothesis search: Inductive reasoning with language models, (ICLR2024)

    • Abstract: Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which can then be robustly generalized to novel scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning tasks by directly prompting them yielding "in context learning." This can work well for straightforward inductive tasks, but performs very poorly on more complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose...
    • Labels: code generation, program synthesis, static analysis, program verification
  • LLM Meets Bounded Model Checking: Neuro-symbolic Loop Invariant Inference, (ASE2024)

    • Abstract: Loop invariant inference, a key component in program verification, is a challenging task due to the inherent undecidability and complex loop behaviors in practice. Recently, machine learning based techniques have demonstrated impressive performance in generating loop invariants automatically. However, these methods highly rely on the labeled training data, and are intrinsically random and uncertain, leading to unstable performance. In this paper, we investigate a synergy of large language models...
    • Labels: static analysis, program verification
  • LLM-Generated Invariants for Bounded Model Checking Without Loop Unrolling, (ASE2024)

    • Abstract: We investigate a modification of the classical Bounded Model Checking (BMC) procedure that does not handle loops through unrolling but via modifications to the control flow graph (CFG). A portion of the CFG representing a loop is replaced by a node asserting invariants of the loop. We generate these invariants using Large Language Models (LLMs) and use a first-order theorem prover to ensure the correctness of the generated statements. We thus transform programs to loop-free variants in a sound m...
    • Labels: static analysis, program verification
  • Lemur: Integrating large language models in automated program verification, (ICLR2024)

    • Abstract: The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that typically demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of derivation rules and prove its soundness. We instantiate the...
    • Labels: static analysis, program verification
  • QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning, (ICSE2025)

    • Abstract: Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis...
    • Labels: static analysis, program verification
  • Ranking LLM-generated loop invariants for program verification, (EMNLP2023)

    • Abstract: Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a *re-ranking* approach for the generated results ...
    • Labels: static analysis, program verification, prompt strategy, sampling and ranking
  • Towards AI-Assisted Synthesis of Verified Dafny Methods, (FSE2024)

    • Abstract: Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models honest is to incorporate formal verification: generating programs’ specifications as well as code so that the code can be proved correct with respect to the specifications. Unfortunately, existing large language models show a severe lack of proficiency in ver...
    • Labels: code generation, program synthesis, static analysis, program verification
  • Towards General Loop Invariant Generation: A Benchmark of Programs with Memory Manipulation, (NeurIPS2024)

    • Abstract: Program verification is vital for ensuring software reliability, especially in the context of increasingly complex systems. Loop invariants, remaining true before and after each iteration of loops, are crucial for this verification process. Traditional provers and machine learning based methods for generating loop invariants often require expert intervention or extensive labeled data, and typically only handle numerical property verification. These methods struggle with programs involving comple...
    • Labels: static analysis, program verification, benchmark
  • VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners, (arXiv2024)

    • Abstract: Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popularity has prompted research on safe and correct transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based approaches can theoretically produce correct transpilations that maintain input-output equivalence to th...
    • Labels: code generation, program transformation, static analysis, program verification
  • Verified Code Transpilation with LLMs, (NeurIPS2024)

    • Abstract: Domain-specific languages (DSLs) are integral to various software workflows. Such languages offer domain-specific optimizations and abstractions that improve code readability and maintainability. However, leveraging these languages requires developers to rewrite existing code using the specific DSL's API. While large language models (LLMs) have shown some success in automatic code transpilation, none of them provide any functional correctness guarantees on the transpiled code. Another approach f...
    • Labels: code generation, program synthesis, static analysis, program verification

Program Optimization

Code Summarization

Code Search

  • Natural Is the Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models, (FSE2024)

    • Abstract: Pre-trained Large Language Models (LLM) have achieved remarkable successes in several domains. However, code-oriented LLMs are often heavy in computational complexity, and quadratically with the length of the input code sequence. Toward simplifying the input program of an LLM, the state-of-the-art approach has the strategies to filter the input code tokens based on the attention scores given by the LLM. The decision to simplify the input program should not rely on the attention patterns of an LL...
    • Labels: static analysis, code search, code summarization, code model, code model training, source code model
  • On the Effectiveness of Transfer Learning for Code Search, (TSE2023)

    • Abstract: The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to and improve code search. To this end, we pre-train a BERT-based model on combinations of natural language and source code data and fine-tune it on pairs of StackOverflow question titles and code answers. Our results show that the pre-trained models consistently ...
    • Labels: static analysis, code search, code model, code model training, source code model
  • Self-Supervised Query Reformulation for Code Search, (FSE2023)

    • Abstract: Automatic query reformulation is a widely utilized technology for enriching user requirements and enhancing the outcomes of code search. It can be conceptualized as a machine translation task, wherein the objective is to rephrase a given query into a more comprehensive alternative. While showing promising results, training such a model typically requires a large parallel corpus of query pairs (i.e., the original query and a reformulated query) that are confidential and unpublished by online code...
    • Labels: static analysis, code search
  • Survey of Code Search Based on Deep Learning, (TOSEM2024)

    • Abstract: Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given natural language query by effectively capturing the semantic similarity between the query and code. Deep learning, being able to extract complex semantics information, has achieved great success in this field. Recently, various deep learning methods, such as graph neural networks and pretraining models, have b...
    • Labels: survey, static analysis, code search
  • Virtual Compiler Is All You Need For Assembly Code Search, (ACL2024)

    • Abstract: Assembly code search is vital for reducing the burden on reverse engineers, allowing them to quickly identify specific functions using natural language within vast binary programs. Despite its significance, this critical task is impeded by the complexities involved in building high-quality datasets. This paper explores training a Large Language Model (LLM) to emulate a general compiler. By leveraging Ubuntu packages to compile a dataset of 20 billion tokens, we further continue pre-train CodeLlam...
    • Labels: code generation, program transformation, static analysis, code search, code model, code model training, source code model

Software Composition Analysis