-
BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching, (ICSE2024)
- Abstract: While third-party libraries (TPLs) are extensively reused to enhance productivity during software development, they can also introduce potential security risks such as vulnerability propagation. Software composition analysis (SCA), proposed to identify reused TPLs for reducing such risks, has become an essential procedure within modern DevSecOps. As one of the mainstream SCA techniques, binary-to-source SCA identifies the third-party source projects contained in binary files via binary source co...
- Labels: static analysis, software composition analysis, code model, code model training, binary code model
-
Can Large Language Models Comprehend Code Stylometry?, (ASE2024)
- Abstract: Code Authorship Attribution (CAA) has several applications, such as copyright disputes, plagiarism detection, and criminal prosecution. Existing studies have mainly approached CAA with machine learning (ML) and deep learning (DL) based techniques. The main limitations of ML-based techniques are that (a) manual feature engineering is required to train these models and (b) they are vulnerable to adversarial attacks. In this study, we initially fine-tune five Large Language Models (LLMs) for CAA and eva...
- Labels: static analysis, software composition analysis
-
Maltracker: A Fine-Grained NPM Malware Tracker Copiloted by LLM-Enhanced Dataset, (ISSTA2024)
- Abstract: As the largest package registry, Node Package Manager (NPM) has become the prime target for various supply chain attacks recently and has been flooded with numerous malicious packages, posing significant security risks to end-users. Learning-based methods have demonstrated promising performance with good adaptability to various types of attacks. However, they suffer from two main limitations. First, they often utilize metadata features or coarse-grained code features extracted at the package lev...
- Labels: static analysis, software composition analysis