From Imitation to Task Performance: Scheduled Reward Weighting for Energy-Efficient Bipedal Locomotion

Conference proceedings article


Publication Details

Author list: Tuchapong Sangthaworn, Bawornsak Sakulkueakulsuk

Publication year: 2025


Abstract

Bipedal locomotion requires balancing two objectives: natural, human-like motion and accurate task performance. Fixed reward weights force practitioners to choose a single trade-off point, which demands extensive manual tuning and often converges to suboptimal solutions. Sophisticated adaptive methods exist but require complex implementations. This paper introduces Scheduled Reward Weighting (SRW), a simple approach that transitions from imitation-focused to task-focused learning using basic mathematical functions. By starting with high imitation weights, which use human motion as an exploration guide, and then gradually decaying them in favor of task optimization, SRW enables robots to discover novel energy-efficient gaits. The strategy requires minimal modification to existing frameworks while achieving consistent outcomes. Experiments on the Unitree G1 humanoid in simulation show that SRW achieves low energy consumption (12.5-13.6 kJ), accurate velocity tracking (RMSE 0.23-0.27 m/s), and consistent results across training runs, with 68% lower variance than fixed-weight baselines. These results indicate that simple temporal reward dynamics can balance competing objectives without the complexity of adaptive methods.
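
The scheduling idea described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the abstract does not say which "basic mathematical functions" are used, so the linear, cosine, and exponential schedules, the convex combination of the two reward terms, the start and end weights (0.9 and 0.1), the decay rate, and the names imitation_weight and combined_reward are all assumptions made for this example.

```python
import math


def imitation_weight(step, total_steps, w_start=0.9, w_end=0.1, schedule="linear"):
    """Return a hypothetical imitation weight for the given training step.

    All defaults are illustrative assumptions, not values from the paper.
    """
    # Training progress, clamped to [0, 1].
    t = min(max(step / total_steps, 0.0), 1.0)

    # Each schedule maps progress to a decay factor that goes from 1 to 0.
    if schedule == "linear":
        decay = 1.0 - t
    elif schedule == "cosine":
        decay = 0.5 * (1.0 + math.cos(math.pi * t))
    elif schedule == "exponential":
        k = 5.0  # assumed decay rate
        # Rescaled so the factor is exactly 1 at t=0 and 0 at t=1.
        decay = (math.exp(-k * t) - math.exp(-k)) / (1.0 - math.exp(-k))
    else:
        raise ValueError(f"unknown schedule: {schedule}")

    return w_end + (w_start - w_end) * decay


def combined_reward(r_imitation, r_task, step, total_steps, **schedule_kwargs):
    """Blend imitation and task rewards with the scheduled weight.

    The convex combination is an assumption; the paper may weight terms differently.
    """
    w_im = imitation_weight(step, total_steps, **schedule_kwargs)
    return w_im * r_imitation + (1.0 - w_im) * r_task
```

In an existing training loop, a function like combined_reward would replace the fixed-weight sum of the two reward terms, with step taken from the trainer's iteration counter. Early in training the imitation term dominates; by the end the task term does.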



Last updated on 2026-10-02 at 00:00