AL-ViT: Label-Efficient Robusta Coffee-Bean Defect Detection in Thailand Using Active Learning Vision Transformers

Journal article



Publication Details

Author list: Sirawich Vachmanus, Wimolsiri Pridasawas, Worapan Kusakunniran, Kitti Thamrongaphichartkul, Noppanan Phinklao

Publisher: Elsevier

Publication year: 2026

Volume number: 29

ISSN: 2667-3053

Languages: English-United States (EN-US)



Abstract

In major trading and export markets, the coffee bean grading process still relies heavily on manual
labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming,
costly, and prone to human error, especially in Thailand’s rapidly expanding Robusta coffee sector.
This study introduces AL-ViT, an end-to-end Active-Learning Vision Transformer framework that
operationalizes active learning and transformer-based feature extraction within a single, production-oriented
pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL)
query strategies: random sampling, entropy-based selection, Bayesian Active Learning by
Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set
diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed
to balance informativeness and representativeness during sample acquisition. A high-resolution dataset
of 2,098 Robusta coffee bean images was collected under controlled lighting conditions matching
grading-machine setups, with only 5 % of the images initially labeled and the remainder forming the AL pool.
Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1 % accuracy and an
F1 score of 0.956 on the defective ("bad") class using just 850 labels (41 % of the dataset), within 0.3
percentage points of full supervision. Operational reliability, defined as 95 % accuracy (consistent with prior
inspection benchmarks), was reached with only 407 labels, a 75 % reduction in annotation effort. Entropy
sampling showed the fastest early-stage gains, whereas BADGE lagged by more than 1 percentage point; Core-Set and
ensemble disagreement provided moderate but stable results. Augmentation and calibration analyses indicated that
explicit augmentation methods (MixUp, CutMix, RandAugment) offered no further benefit, as the hybrid pipeline
already produced well-calibrated probabilities. Statistical validation via paired t-tests, effect sizes, and
bootstrap confidence intervals (CIs) confirmed consistent improvements of uncertainty-driven strategies over
random sampling. Overall, the proposed AL-ViT framework establishes a label-efficient and practically
deployable approach to agricultural quality control, achieving near-supervised accuracy at a fraction
of the labeling cost.
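
The abstract does not specify how the hybrid uncertainty–diversity strategy combines its two criteria. The sketch below shows one common way to do it, under stated assumptions: predictive entropy as the uncertainty score, and a greedy k-center (Core-Set-style) pass over image embeddings as the diversity criterion, applied to an entropy shortlist. The function names `predictive_entropy` and `hybrid_select`, the `shortlist_factor` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of an [n, C] probability matrix."""
    eps = 1e-12  # guards against log(0) for very confident predictions
    return -np.sum(probs * np.log(probs + eps), axis=1)


def hybrid_select(
    probs: np.ndarray,
    embeddings: np.ndarray,
    labeled_embeddings: np.ndarray,
    budget: int,
    shortlist_factor: int = 4,
) -> np.ndarray:
    """Hypothetical uncertainty-then-diversity query over an unlabeled pool.

    probs:              [n_pool, C] softmax outputs for the pool
    embeddings:         [n_pool, D] feature vectors (e.g. ViT [CLS] features)
    labeled_embeddings: [n_labeled, D] features of already-labeled images
    Returns pool indices of the samples to send for annotation.
    """
    n_pool = probs.shape[0]
    k = min(n_pool, budget * shortlist_factor)

    # 1) Uncertainty: shortlist the k highest-entropy pool samples.
    shortlist = np.argsort(-predictive_entropy(probs), kind="stable")[:k]
    cand = embeddings[shortlist]

    # 2) Diversity: greedy k-center selection within the shortlist.
    #    min_dist[i] = distance from candidate i to its nearest labeled or chosen point.
    if len(labeled_embeddings) > 0:
        d = np.linalg.norm(cand[:, None, :] - labeled_embeddings[None, :, :], axis=2)
        min_dist = d.min(axis=1)
    else:
        min_dist = np.full(k, np.inf)

    chosen = []
    for _ in range(min(budget, k)):
        idx = int(np.argmax(min_dist))  # farthest from everything covered so far
        chosen.append(idx)
        new_d = np.linalg.norm(cand - cand[idx], axis=1)
        min_dist = np.minimum(min_dist, new_d)
        min_dist[idx] = -np.inf  # never pick the same candidate twice
    return shortlist[np.array(chosen, dtype=int)]


# Toy binary (good/bad) pool: six unlabeled images, one labeled image at the origin.
probs = np.array([[0.50, 0.50], [0.52, 0.48], [0.99, 0.01],
                  [0.60, 0.40], [0.98, 0.02], [0.55, 0.45]])
emb = np.array([[0, 0], [0.1, 0], [0, 10], [10, 0], [3, 3], [5, 5]], dtype=float)
print(hybrid_select(probs, emb, np.array([[0.0, 0.0]]), budget=2, shortlist_factor=2))  # [3 5]
```

In the toy run, images 0 and 1 are the most uncertain but sit next to the already-labeled image, so the diversity pass prefers images 3 and 5. Setting `shortlist_factor=1` reduces the sketch to pure entropy sampling, while a very large value approaches greedy Core-Set selection over the whole pool.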

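The statistical checks named in the abstract (paired t-tests, effect sizes, and bootstrap CIs across five seeds) can be sketched as follows. This is a minimal NumPy illustration, not the authors' analysis code: `paired_comparison` is a hypothetical helper, and the per-seed accuracies below are made-up placeholders, not results from the study.

```python
import numpy as np


def paired_comparison(acc_a, acc_b, n_boot=10_000, seed=0):
    """Compare two AL strategies evaluated on the same random seeds.

    Returns the mean paired difference, the paired t statistic (df = n - 1),
    Cohen's d_z effect size, and a 95 % percentile-bootstrap CI of the mean difference.
    """
    diffs = np.asarray(acc_a, dtype=float) - np.asarray(acc_b, dtype=float)
    n = diffs.size
    mean = diffs.mean()
    sd = diffs.std(ddof=1)  # sample SD of the paired differences
    t_stat = mean / (sd / np.sqrt(n))
    d_z = mean / sd  # effect size for paired designs
    rng = np.random.default_rng(seed)
    # Resample the per-seed differences with replacement and recompute the mean.
    boot_means = rng.choice(diffs, size=(n_boot, n), replace=True).mean(axis=1)
    ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
    return mean, t_stat, d_z, (ci_low, ci_high)


# Placeholder per-seed accuracies (illustrative only, not from the paper).
strategy_a = [0.97, 0.96, 0.97, 0.98, 0.97]
strategy_b = [0.95, 0.95, 0.96, 0.96, 0.95]
print(paired_comparison(strategy_a, strategy_b))
```

Pairing assumes both strategies are run with the same seeds (for example, the same initial 5 % labeled split), so seed-to-seed variation cancels in the differences. For a p-value, `scipy.stats.ttest_rel(acc_a, acc_b)` performs the same paired test.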

Keywords

Active Learning; Coffee Grading; Vision Transformer


Last updated on 2025-08-12 at 12:00