Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty
Abstract
Reinforcement learning policies are improved with an action Jacobian penalty that eliminates unrealistic high-frequency control signals, while a new Linear Policy Net architecture reduces computational overhead and enables faster convergence and more efficient inference for motion imitation tasks.
Reinforcement learning provides a framework for learning control policies that can reproduce diverse motions for simulated characters. However, such policies often exploit unnatural high-frequency signals that are unachievable by humans or physical robots, making them poor representations of real-world behaviors. Existing work addresses this issue by adding a reward term that penalizes large changes in actions over time; this term often requires substantial tuning effort. We propose to use the action Jacobian penalty, which penalizes changes in actions with respect to changes in the simulated state, computed directly through automatic differentiation. This effectively eliminates unrealistic high-frequency control signals without task-specific tuning. While effective, the action Jacobian penalty introduces significant computational overhead when used with traditional fully connected neural network architectures. To mitigate this, we introduce a new architecture called a Linear Policy Net (LPN) that significantly reduces the computational burden of calculating the action Jacobian penalty during training. In addition, an LPN requires no parameter tuning, exhibits faster learning convergence compared to baseline methods, and can be queried more efficiently at inference time than a fully connected neural network. We demonstrate that a Linear Policy Net, combined with the action Jacobian penalty, is able to learn policies that generate smooth signals while solving a number of motion imitation tasks with different characteristics, including dynamic motions such as a backflip and various challenging parkour skills. Finally, we apply this approach to create policies for dynamic motions on a physical quadrupedal robot equipped with an arm.
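The core quantity in the abstract is the Jacobian of the policy's action with respect to the simulated state. As a minimal sketch (not the paper's implementation), assume the penalty is the squared Frobenius norm of ∂a/∂s. The paper obtains this via automatic differentiation; the toy below instead uses finite differences on a hypothetical linear policy a = Ks + b, whose Jacobian is exactly the gain matrix K, so the result can be checked by hand:

```python
def linear_policy(K, b, s):
    """Toy linear policy a = K s + b, with K given as a list of rows."""
    return [sum(K[i][j] * s[j] for j in range(len(s))) + b[i]
            for i in range(len(b))]

def jacobian_fd(policy, s, eps=1e-6):
    """Finite-difference approximation of the Jacobian da/ds at state s."""
    a0 = policy(s)
    J = []
    for i in range(len(a0)):
        row = []
        for j in range(len(s)):
            sp = list(s)
            sp[j] += eps
            row.append((policy(sp)[i] - a0[i]) / eps)
        J.append(row)
    return J

def jacobian_penalty(J):
    """Assumed penalty form: squared Frobenius norm of the action Jacobian."""
    return sum(x * x for row in J for x in row)

K = [[1.0, -2.0], [0.5, 3.0]]
b = [0.1, -0.1]
J = jacobian_fd(lambda s: linear_policy(K, b, s), [0.3, -0.7])
# For a linear policy the Jacobian equals K, so the penalty is
# ||K||_F^2 = 1 + 4 + 0.25 + 9 = 14.25 (up to finite-difference error).
print(round(jacobian_penalty(J), 3))
```

The linear case also hints at why an LPN is cheap to regularize: when the policy is (locally) linear in the state, the action Jacobian is available in closed form as the gain matrix, rather than requiring a backward pass through a deep network for every state.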
Community
Introduces an action Jacobian penalty with a Linear Policy Net to train smooth, motion-imitation policies and quadruped-arm control, reducing tuning and computational overhead.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Flow Policy Gradients for Robot Control (2026)
- Efficiently Learning Robust Torque-Based Locomotion Through Reinforcement With Model-Based Supervision (2026)
- ZEST: Zero-shot Embodied Skill Transfer for Athletic Robot Control (2026)
- HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation (2026)
- General Humanoid Whole-Body Control via Pretraining and Fast Adaptation (2026)
- Dynamic Policy Learning for Legged Robot with Simplified Model Pretraining and Model Homotopy Transfer (2025)
- PMG: Parameterized Motion Generator for Human-like Locomotion Control (2026)