YOLO Manual

1. Overview

This project uses YOLO (You Only Look Once), maintained by Ultralytics.

This manual focuses on practical usage and deployment, not internal implementation.
For full details, refer to the official documentation: https://docs.ultralytics.com/

2. Hardware

A dedicated GPU is strongly recommended, CPU is typically ~10× slower，for example RTX 3050 (used for development in Spring 2026)
Ensure sufficient disk space. During development, generated 4k video data can quickly accumulate and may reach up to ~100 GB.

If no sufficiently powerful GPU is available, use the Aalto HPC cluster (Paniikki) with RTX 3080: https://scicomp.aalto.fi/aalto/paniikki/

3. Development environment Setup

Python Environment (Optional)

Use a virtual environment:

venv (recommended)
Conda (optional), especially on Paniikki.

Notes:

Virtual environments do NOT include CUDA / cuDNN automatically
These must be installed at the system level
Use pip, since most NVIDIA-related instructions are written via pip

4. Installation

Jetson (aarch64) Setup

For Jetson devices, follow the official guide for Jetson: https://docs.ultralytics.com/guides/nvidia-jetson/ (especially Run on JetPack 6.1 section)

Newest PyTorch & Torchvision for Jetson: https://pypi.jetson-ai-lab.io/jp6/cu126

https://forums.developer.nvidia.com/t/pytorch-for-jetson/340102
https://developer.nvidia.com/cuda/gpus

Other Setup:

For other devices (e.g. x86 computers), follow the official guide: https://docs.ultralytics.com/#how-can-i-get-started-with-yolo-installation-and-setup

5. Quickstart

Follow the official quickstart guide: https://docs.ultralytics.com/quickstart/

6. Usage

6.1 Model Types

.pt (PyTorch)
Default format for training and inference
.onnx
Cross-platform and flexible
.engine (TensorRT)
Fastest performance
Must be generated on the target device

Important: TensorRT models are NOT portable, copying .engine files between devices will NOT work

6.2 Predict

Find objects using images or video.

The official guide for predict: https://docs.ultralytics.com/modes/predict/

yolo detect predict model=your_model.pt source=path/to/source

Example:

yolo detect predict model=yolo26n.pt source=video.mp4

6.3 Train

Train the model by using customize dataset.

The official guide for train: https://docs.ultralytics.com/modes/train/

yolo detect train data=your_data.yaml model=your_model epochs=100 imgsz=640

6.4 Export (TensorRT)

Change the format of the model.

The official guide for export: https://docs.ultralytics.com/modes/export/

yolo export model=yolo26n.pt format=engine

6.5 Other Features

Track: https://docs.ultralytics.com/modes/track/
Segment: https://docs.ultralytics.com/tasks/segment/
Solutions: https://docs.ultralytics.com/solutions/

Note: Some solutions are experimental and may require further development

7. Common Issues

7.1 TensorRT Not Found in virtual environment

Error:

ModuleNotFoundError: No module named 'tensorrt'

Cause: TensorRT is installed system-wide but not visible in virtual environment

Temporary solution:

ln -s /usr/lib/python3.10/dist-packages/tensorrt \
      ~/your_env/lib/python3.10/site-packages/tensorrt

7.2 Other Common Problems

Version mismatch (PyTorch / CUDA / TensorRT/ Python)
Architecture mismatch: Jetson uses aarch64 (ARM).
Typos in commands
Out of memory (OOM)

8. Dataset & Training Pipeline

8.1 COCO128 Dataset

Default dataset for yolo:
https://docs.ultralytics.com/datasets/detect/coco128/

Contains 128 common object categories
Used for testing and validation

8.2 Project Dataset

8.2.1 ball_tree

A customize dataset contains ball and trees, mainly designed for nordic environment.

pretrained Yolo26 models with result (nano, small, medium, large, extra large): https://huggingface.co/niy1/

8.3 Data Sources

Google Open Images:
https://docs.ultralytics.com/datasets/detect/open-images-v7/

Download tool: https://github.com/DmitryRyumin/OIDv6

8.4 Annotation

We use Roboflow for dataset preparation.

Notes:

Online training in Roboflow may require payment, train models locally instead

Guide:
https://blog.roboflow.com/getting-started-with-roboflow

9. Best Practices

Performance

Use FP16 when exporting TensorRT models, faster with minimal accuracy loss

Training

Official model training tips: https://docs.ultralytics.com/guides/model-training-tips
Control batch size
Large batch → high memory usage → swapping → slowdown
Do NOT blindly increase imgsz during train, may not improve accuracy
Example performance
~14 hours to train 4000 images (YOLO26x) on RTX 3050

Development

Always name output models clearly (yolo XXX name=”your_name”), avoid relying on default filenames.
Prefer Ultralytics built-in tools for better performance. For example use “yolo export format=engine” instead of “trtexec”,

Workflow Tips

Read official documentation first
Search forums when encountering issues
Allocate sufficient time for annotation (depend on image, time can vary from few seconds to few minutes per image)

Practical Advice

Training on local machines can be noisy (GPU load)
Start from small models, the drone cannot run heavy models.

10. Additional Resources

YOLO paper:
https://doi.org/10.48550/arXiv.1506.02640
Video tutorial:
https://www.youtube.com/watch?v=r0RspiLG260
Extract frames from video:
https://www.geeksforgeeks.org/python/python-program-extract-frames-using-opencv/

11. Future Work

Improve labeling quality
Current method: semi-automatic labeling using SAM3

Known issue:

SAM3 may fail on certain objects (e.g., spruce)