In this project, a combination of object detection and computer vision methods was used to inspect paper clips for defects. For normal clips, additional steps were taken to extract physical characteristics, such as size categorization and angle estimation. To establish a reference for accurately converting pixel measurements to real-world centimeters, an ArUco marker with ID 0 and a size of 50x50 mm was used.
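A minimal sketch of how such a marker can be turned into a scale factor with OpenCV's ArUco module (4.7+ API) is shown below; the dictionary choice and the helper name are illustrative assumptions, not the exact code used in the notebook:

```python
import cv2
import numpy as np

MARKER_SIDE_MM = 50.0  # physical side length of the ArUco marker (50x50 mm)

# Assumption: the marker belongs to the 4x4 dictionary; adjust if a different one was printed.
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(aruco_dict, cv2.aruco.DetectorParameters())

def mm_per_pixel(frame):
    """Return the mm-per-pixel scale derived from the marker with ID 0, or None if not found."""
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is None:
        return None
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        if marker_id == 0:
            pts = marker_corners.reshape(4, 2)
            # Average the four side lengths of the detected marker in pixels.
            side_px = np.mean([np.linalg.norm(pts[i] - pts[(i + 1) % 4]) for i in range(4)])
            return MARKER_SIDE_MM / side_px
    return None
```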
For the object detection and classification stage, a pretrained YOLOv11-OBB (Oriented Bounding Box) model provided by Ultralytics was fine-tuned. Since the clips appear in various orientations, the OBB model was chosen over standard YOLOv11 to handle rotated bounding boxes and detect clips accurately even when rotated. Annotations were prepared using CVAT, which supports rotated annotations, and data augmentation techniques such as rotation, scaling, flipping, and brightness adjustments were applied to further enhance the model's performance.
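As a rough illustration of how such a fine-tuning run looks with the Ultralytics API, the snippet below starts from a pretrained OBB checkpoint; the dataset YAML name and the hyperparameter values are placeholders rather than the exact settings used in this project:

```python
from ultralytics import YOLO

# Start from a pretrained OBB checkpoint (nano size shown here as an example;
# "yolo11n-obb.pt" is the name Ultralytics distributes it under).
model = YOLO("yolo11n-obb.pt")

# Hypothetical dataset config and hyperparameters. degrees/scale/flip*/hsv_v enable
# the rotation, scaling, flipping, and brightness augmentations mentioned above.
model.train(
    data="paperclips.yaml",    # paths to train/val images and class names
    epochs=100,
    imgsz=1024,
    degrees=180.0,             # random rotation
    scale=0.5,                 # random scaling
    fliplr=0.5, flipud=0.5,    # horizontal / vertical flips
    hsv_v=0.4,                 # brightness jitter
)
```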
The pipeline for this project begins by detecting the ArUco marker to establish the scale for real-world measurements. The YOLOv11-OBB model then identifies and classifies the objects in the frame. If a paper clip is classified as normal, additional computer vision techniques are used to determine its dimensions, categorize it into one of three predefined size categories, and calculate its angle relative to the frame. All of these operations run in real time, with results displayed on the video feed.
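A simplified sketch of this inference loop is given below. It assumes the `mm_per_pixel` helper from the earlier snippet, a class named `normal`, and hypothetical size thresholds, so treat it as an outline of the approach rather than the notebook's exact code:

```python
import math
import cv2
from ultralytics import YOLO

model = YOLO("PaperClipInspection-YOLOv11-OBB.pt")  # fine-tuned weights
# Hypothetical thresholds (mm) separating the 24, 32 and 44 mm clips.
SIZE_BINS = [(28, "small (24x7 mm)"), (38, "medium (32x9 mm)"), (float("inf"), "large (44x11 mm)")]

cap = cv2.VideoCapture(0)  # camera index, or the provided test .mp4
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    scale = mm_per_pixel(frame)          # ArUco-based scale, sketched earlier
    result = model(frame, verbose=False)[0]
    if result.obb is not None and scale is not None:
        for box, cls in zip(result.obb.xywhr.cpu().numpy(), result.obb.cls.cpu().numpy()):
            cx, cy, w, h, rot = box
            if model.names[int(cls)] != "normal":   # class name is an assumption
                continue
            length_mm = max(w, h) * scale            # clip length in millimetres
            angle_deg = math.degrees(rot) % 180      # orientation relative to the frame
            category = next(label for limit, label in SIZE_BINS if length_mm < limit)
            cv2.putText(frame, f"{category}, {angle_deg:.0f} deg", (int(cx), int(cy)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("PaperClipInspection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```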
This project demonstrates how computer vision and deep learning techniques can be integrated to create a robust automated inspection system. With modifications and new data collection, the system can also be adapted to inspect other objects, providing a practical solution for quality control in manufacturing environments where consistency and accuracy are essential.
The PaperClip dataset for this project was captured using a Razer Kiyo X Web Camera Full HD and includes 80 high-resolution images (1920x1080). These images showcase standard paper clips in a variety of configurations, capturing both normal and defective conditions across three different sizes: 24x7 mm, 32x9 mm, and 44x11 mm. Below is an image that illustrates the range of sizes along with the ArUco marker used in the project:
To enhance dataset variability and support a more robust model, both normal and defective paper clips were captured in various positions and orientations. The defective clips exhibit a range of defects, such as bending, twisting, or asymmetrical warping. Overall, the dataset consists of 80 images featuring a total of 247 paper clips, 127 normal and 120 defective, so each image may contain multiple clips rather than just one. To achieve balanced training, the dataset is divided into two parts: the training set includes 68 images depicting 110 normal clips and 107 defective ones, while the validation set contains 12 images with 17 normal and 17 defective clips, making up approximately 15% of the total dataset. The table below summarizes the distribution of normal and defective clips across the training and validation sets, along with the total number of images in each set.
| | Normal Clips | Defective Clips | Total Images |
|---|---|---|---|
| Training Set | 110 | 107 | 68 |
| Validation Set | 17 | 17 | 12 |
In addition to the training and validation sets, a 23-second test video (approximately 690 frames) in .mp4 format, captured at the same resolution, is provided for users who wish to test their model, since obtaining these specific types and sizes of paper clips may be challenging. The video features various paper clips, both normal and defective, with some defects not included in the training or validation datasets. This makes it ideal for evaluating the model’s capabilities and provides a realistic test of its performance in real-world scenarios.
- Clone the repository: `git clone https://github.com/Dalageo/PaperClipInspection.git`
- Navigate to the cloned directory: `cd PaperClipInspection`
- Open the `Analyzing Paper Clips Using Deep Learning and Computer Vision Techniques.ipynb` notebook using your preferred Jupyter-compatible environment (e.g., Jupyter Notebook, VS Code, or PyCharm).
- Update the `best_yolo` variable to point to the location of the `PaperClipInspection-YOLOv11-OBB.pt` model on your local environment (see the snippet below).
- Run the cells sequentially to reproduce the results in real-time.
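In the notebook, updating `best_yolo` typically amounts to something like the following (the path shown is only an example):

```python
from ultralytics import YOLO

# Hypothetical local path; replace it with wherever you stored the downloaded weights.
best_yolo = "models/PaperClipInspection-YOLOv11-OBB.pt"
model = YOLO(best_yolo)
```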
You can run this project in real-time using your own camera only if you have the same paper clip types and sizes used in this project. If you don't, you can use the .mp4 video provided in the `test` folder and modify the code accordingly. To run YOLOv11 on the GPU, you will need to enable GPU support for your operating system and install the required dependencies; you can follow this guide provided by PyTorch for detailed instructions.
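As a quick sanity check, you can let PyTorch pick the device and pass it to Ultralytics explicitly; the video filename below is a placeholder for the file in the `test` folder:

```python
import torch
from ultralytics import YOLO

device = 0 if torch.cuda.is_available() else "cpu"   # first CUDA GPU if one is visible
model = YOLO("PaperClipInspection-YOLOv11-OBB.pt")

# Placeholder filename; point this at the .mp4 provided in the test folder.
for result in model.predict(source="test/clips.mp4", device=device, stream=True):
    pass  # inspect or draw each frame's detections here
```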
Special thanks to Ultralytics for providing the pretrained YOLOv11 model for educational purposes, as well as to the CVAT community for their user-friendly and free annotation software. Both were essential to the development of this project.
The provided fine-tuned model, dataset, notebook, and accompanying documentation are licensed under the AGPL-3.0 license. This license was chosen to promote open collaboration, ensure transparency, and allow others to freely use, modify, and contribute to the work, while maintaining consistency, as the provided pretrained YOLOv11 model is also licensed under AGPL-3.0. Any modifications or improvements must also be shared under the same license, with appropriate acknowledgment.