Skip to content

Latest commit

 

History

History
72 lines (52 loc) · 9.07 KB

README.md

File metadata and controls

72 lines (52 loc) · 9.07 KB

TOSEM2024

Number of papers: 10

  • Authors: Liu, Puzhuo and Sun, Chengnian and Zheng, Yaowen and Feng, Xuan and Qin, Chuan and Wang, Yuncheng and Li, Zhi and Sun, Limin
  • Abstract: This paper proposes LATTE, the first static binary taint analysis that is powered by a large language model (LLM). LATTE is superior to the state of the art (e.g., Emtaint, Arbiter, Karonte) in three aspects. First, LATTE is fully automated while prior static binary taint analyzers need rely on human expertise to manually customize taint propagation rules and vulnerability inspection rules. Second, LATTE is significantly effective in vulnerability detection, demonstrated by our comprehensive eva...
  • Link: Read Paper
  • Labels: static analysis, bug detection
  • Authors: Zirak, Armin and Hemmati, Hadi
  • Abstract: Automated Program Repair (APR) is defined as the process of fixing a bug/defect in the source code, by an automated tool. APR tools have recently experienced promising results by leveraging state-of-the-art Neural Language Processing (NLP) techniques. APR tools such as TFix and CodeXGLUE that combine text-to-text transformers with software-specific techniques are outperforming alternatives, these days. However, in most APR studies, the train and test sets are chosen from the same set of projects...
  • Link: Read Paper
  • Labels: code generation, program repair
  • Authors: Feng, Xiaoning and Han, Xiaohong and Chen, Simin and Yang, Wei
  • Abstract: Large Language Models (LLMs) have received much recent attention due to their human-level accuracy. While existing works mostly focus on either improving accuracy or testing accuracy robustness, the computation efficiency of LLMs, which is of paramount importance due to often vast generation demands and real-time requirements, has surprisingly received little attention. In this article, we make the first attempt to understand and test potential computation efficiency robustness in state-of-the-a...
  • Link: Read Paper
  • Labels: code model, code model robustness
  • Authors: Huang, Qing and Luo, Zhiwen and Xing, Zhenchang and Zeng, Jinshan and Chen, Jieshan and Xu, Xiwei and Chen, Yong
  • Abstract: Dataflow graphs (DFGs) capture definitions (defs) and uses across program blocks, which is a fundamental program representation for program analysis, testing and maintenance. However, dynamically typed programming languages like Python present implicit dataflow issues that make it challenging to determine def-use flow information at compile time. Static analysis methods like Soot and WALA are inadequate for handling these issues, and manually enumerating comprehensive heuristic rules is impracti...
  • Link: Read Paper
  • Labels: static analysis, data-flow analysis
  • Authors: Chen, Zhifei and Chen, Lin and Yang, Yibiao and Feng, Qiong and Li, Xuansong and Song, Wei
  • Abstract: Python’s dynamic typing nature provides developers with powerful programming abstractions. However, many type-related bugs are accumulated in code bases of Python due to the misuse of dynamic typing. The goal of this article is to aid in the understanding of developers’ high-risk practices toward dynamic typing and the early detection of type-related bugs. We first formulate the rules of six types of risky dynamic typing-related practices (type smells for short) in Python. We then develop a rule...
  • Link: Read Paper
  • Labels: static analysis, type inference, bug detection, empirical study
  • Authors: Dong, Yihong and Jiang, Xue and Jin, Zhi and Li, Ge
  • Abstract: Although large language models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex tasks through collaborative teamwork, a strategy that significantly controls development complexity and enhances software quality. Inspired by this, we present a self-collaboration framework for code generation employing LLMs, exemplified by ChatGPT. Specifically, through role instructions, (1) Multip...
  • Link: Read Paper
  • Labels: code generation, program synthesis
  • Authors: Jiang, Xue and Dong, Yihong and Wang, Lecheng and Fang, Zheng and Shang, Qiwei and Li, Ge and Jin, Zhi and Jiao, Wenpin
  • Abstract: Although large language models (LLMs) have demonstrated impressive ability in code generation, they are still struggling to address the complicated intent provided by humans. It is widely acknowledged that humans typically employ planning to decompose complex problems and schedule solution steps prior to implementation. To this end, we introduce planning into code generation to help the model understand complex intent and reduce the difficulty of problem-solving. This paper proposes a self-plann...
  • Link: Read Paper
  • Labels: code generation, program synthesis, agent design, planning, empirical study
  • Authors: Xie, Yutao and Lin, Jiayi and Dong, Hande and Zhang, Lei and Wu, Zhonghai
  • Abstract: Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given natural language query by effectively capturing the semantic similarity between the query and code. Deep learning, being able to extract complex semantics information, has achieved great success in this field. Recently, various deep learning methods, such as graph neural networks and pretraining models, have b...
  • Link: Read Paper
  • Labels: survey, static analysis, code search
  • Authors: Ma, Wei and Liu, Shangqing and Zhao, Mengjie and Xie, Xiaofei and Wang, Wenhang and Hu, Qiang and Zhang, Jie and Liu, Yang
  • Abstract: Code models have made significant advancements in code intelligence by encoding knowledge about programming languages. While previous studies have explored the capabilities of these models in learning code syntax, there has been limited investigation on their ability to understand code semantics. Additionally, existing analyses assume that the number of edges between nodes at the abstract syntax tree (AST) is related to syntax distance, and also often require transforming the high-dimension...
  • Link: Read Paper
  • Labels: static analysis, pointer analysis, data-flow analysis, empirical study
  • Authors: Fu, Michael and Nguyen, Van and Tantithamthavorn, Chakkrit and Phung, Dinh and Le, Trung
  • Abstract: Recently, automated vulnerability repair approaches have been widely adopted to combat increasing software security issues. In particular, transformer-based encoder-decoder models achieve competitive results. Whereas vulnerable programs may only consist of a few vulnerable code areas that need repair, existing AVR approaches lack a mechanism guiding their model to pay more attention to vulnerable code areas during repair generation. In this article, we propose a novel vulnerability repair framew...
  • Link: Read Paper
  • Labels: code generation, program repair