This is a collection of research papers on Vision-Language-Action (VLA) models with memory, designed to solve long-horizon, partially observable tasks. The repository will be updated regularly to track the frontier. Because there is no generally accepted definition of memory in the context of VLA, we include all works related to the ability of VLA systems to process information over horizons longer than a single step.
Keywords: memory · key frame selection · long-horizon · partial observability · POMDP · long context
Contributions are welcome. Submit a PR with relevant papers or resources you consider significant.
Format:
- [title](paper link)
- author1, author2, and author3...
-
μVLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models
- Egor Cherepanov, Nikita Kachaev, Daniil Zelezetsky, Aydar Bulatov, Artem Pshenitsyn, Yuri Kuratov, Alexey Skrynnik, Aleksandr I. Panov, Alexey K. Kovalev
-
MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models
- Hao Shi, Weiye Li, Bin Xie, Yulin Wang, Renping Zhou, Tiancai Wang, Xiangyu Zhang, Ping Luo, Gao Huang
-
WeaveLA: Event Driven Cross-Subtask Latent Memory Weaving for Repetitive Robot Manipulation
- Shoujing Zhu, Zhenyang Liu, Fungmiu Wang, Jiafeng Wang, Bo Yue, Guiliang Liu, Simo Wu, Xiangyang Xue, Taiping Zeng
-
TRACE: Trajectory-Routed Causal Memory for Delayed-Evidence Visuomotor Imitation
- Zihao Li, Ranpeng Qiu, Yincong Chen, Guoqiang Ren, Weiming Zhi
-
AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing
- Jisong Cai, Long Ling, Shiwei Chu, Zhongshan Liu, Jiayue Kang, Zhixuan Liang, Wenjie Xu, Yinan Mao, Weinan Zhang, Xiaokang Yang, Ru Ying, Ran Zheng, Yao Mu
-
MotionVLA: Injecting Geometric Motion into Vision-Language-Action Model
- Shanglin Yuan, Weiheng Zhao, Xianda Guo, Wei Sui, Li Yu, Wenyu Liu, Xinggang Wang
-
HiMem-WAM: Hierarchical Memory-Gated World Action Models for Robotic Manipulation
- Xiaoquan Sun, Ruijian Zhang, Chen Cao, Yihan Sun, Jiahui Chen, Zetian Xu, Bo Chen, Haijier Chen, Zhen Yang, Jiarun Zhu, Yijun Hong, JingZhe Xu, Jingrui Pang, Mingqi Yuan, Jiayu Chen
-
Action-Effect Memory Pretraining for Robot Manipulation
- Yijing Zhou, Qiwei Liang, Sitong Zhuang, Jiaxi Li, Xianpeng Wang, Boyang Cai, Yunyang Mo, Renjing Xu
-
- Li Xiang, Yali Li, Yuan Wang, Shengjin Wang
-
AURA: Action-Gated Memory for Robot Policies at Constant VRAM
- Josef Chen
-
VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
- Shengyu Si, Yuanzhuo Lu, Ruimeng Yang, Ziyi Ye, Zuxuan Wu, Yu-Gang Jiang
-
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
- Jianchao Zhao, Huoren Yang, Yusong Hu, Yuyang Gao, Qiguan Ou, Cong Wan, SongLin Dong, Zhiheng Ma, Yihong Gong
-
Spatial Memory for Out-of-Vision Manipulation in Vision-Language-Action
- Pengteng Li, Weiyu Guo, He Zhang, Tiefu Cai, Xiao He, Yandong Guo, Hui Xiong
-
- Alex S. Huang, Jiahui Zhang, Shiqing Tang, Yu Xiang
-
RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
- Huashuo Lei, Wenxuan Song, Huarui Zhang, Jieyuan Pei, Jiayi Chen, Haodong Yan, Han Zhao, Pengxiang Ding, Zhipeng Zhang, Lida Huang, Donglin Wang, Yan Wang, Haoang Li
-
ECHO: Continuous Hierarchical Memory for Vision-Language-Action Models
- Yanbin Hu, Jin Cui, Jiayi Lu, Ruixuan Yang, Jun Ye, Boran Zhao, Xingyu Chen, Xuguang Lan, Pengju Ren
-
Long-Term Memory for VLA-based Agents in Open-World Task Execution
- Xu Huang, Weixin Mao, Yinhao Li, Hua Chen, Jiabao Zhao
-
- Khoa Vo, Sieu Tran, Taisei Hanyu, Yuki Ikebe, Duy Nguyen, Bui Duy Quoc Nghi, Minh Vu, Anthony Gunderman, Chase Rainwater, Anh Nguyen, Ngan Le
-
- Yihuai Gao, Jinyun Liu, Shuang Li, Shuran Song
-
LongBench: Evaluating Robotic Manipulation Policies on Real-World Long-Horizon Tasks
- Xueyao Chen, Jingkai Jia, Tong Yang, Yibo Fu, Wei Li, Wenqiang Zhang
-
Goal2Skill: Long-Horizon Manipulation with Adaptive Planning and Reflection
- Zhen Liu, Xinyu Ning, Zhe Hu, Xinxin Xie, Weize Li, Zhipeng Tang, Chongyu Wang, Zejun Yang, Hanlin Wang, Yitong Liu, Zhongzhu Pu
-
Long-Horizon Manipulation via Trace-Conditioned VLA Planning
- Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu
-
Chameleon: Episodic Memory for Long-Horizon Robotic Manipulation
- Xinying Guo, Chenxi Jiang, Hyun Bin Kim, Ying Sun, Yang Xiao, Yuhang Han, Jianfei Yang
-
Scaling Short-Term Memory of Visuomotor Policies for Long-Horizon Tasks
- Rutav Shah, Rajat Kumar Jenamani, Xiaohan Zhang, Lingfeng Sun, Roberto Martín-Martín, Yuke Zhu, Deva Ramanan, Karl Schmeckpeper
-
PhysMem: Scaling Test-time Physical Memory for Robot Manipulation
- Haoyang Li, Yang You, Hao Su, Leonidas Guibas
-
HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation
- Zijian Zeng, Fei Ding, Huiming Yang, Xianwei Li
-
ReMem-VLA: Empowering Vision-Language-Action Model with Memory via Dual-Level Recurrent Queries
- Hang Li, Fengyi Shen, Dong Chen, Liudi Yang, Xudong Wang, Jinkui Shi, Zhenshan Bing, Ziyuan Liu, Alois Knoll
-
MEM: Multi-Scale Embodied Memory for Vision Language Action Models
- Marcel Torne, Karl Pertsch, Homer Walke, Kyle Vedder, Suraj Nair, Brian Ichter, Allen Z. Ren, Haohuan Wang, Jiaming Tang, Kyle Stachowicz, Karan Dhabalia, Michael Equi, Quan Vuong, Jost Tobias Springenberg, Sergey Levine, Chelsea Finn, Danny Driess
-
- Jun Sun, Boyu Yang, Jiahao Zhang, Ning Ma, Chencheng Wu, Siqing Zhang, Yiou Huang, Qiufeng Wang, Shan Liang, Yaran Chen
-
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
- Yinpei Dai, Hongze Fu, Jayjun Lee, Yuejiang Liu, Haoran Zhang, Jianing Yang, Chelsea Finn, Nima Fazeli, Joyce Chai
-
- Wang Honghui, Jing Zhi, Ao Jicong, Song Shiji, Li Xuelong, Huang Gao, Bai Chenjia
-
RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design
- Tianxing Chen, Yuran Wang, Mingleyang Li, Yan Qin, Hao Shi, Zixuan Li, Yifan Hu, Yingsheng Zhang, Kaixuan Wang, Yue Chen, Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong, Ping Luo
-
Non-Markovian Long-Horizon Robot Manipulation via Keyframe Chaining
- Yipeng Chen, Wentao Tan, Lei Zhu, Fengling Li, Jingjing Li, Guoli Yang, Heng Tao Shen
-
VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory
- Yuheng Lei, Zhixuan Liang, Hongyuan Zhang, Ping Luo
-
LongNav-R1: Horizon-Adaptive Multi-Turn RL for Long-Horizon VLA Navigation
- Yue Hu, Avery Xi, Qixin Xiao, Seth Isaacson, Henry X. Liu, Ram Vasudevan, Maani Ghaffari
-
TacMamba: A Tactile History Compression Adapter Bridging Fast Reflexes and Slow VLA Reasoning
- Zhenan Wang, Yanzhe Wang, Meixuan Ren, Peng Li, Yang Liu, Yifei Nie, Limin Long, Yun Ye, Xiaofeng Wang, Zhen Zhu, Huixu Dong
-
RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation
- Songhao Han, Boxiang Qiu, Yue Liao, Siyuan Huang, Chen Gao, Shuicheng Yan, Si Liu
-
Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks
- Sanjay Haresh, Daniel Dijkman, Apratim Bhattacharyya, Roland Memisevic
-
BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
- Max Sobol Mark, Jacky Liang, Maria Attarian, Chuyuan Fu, Debidatta Dwibedi, Dhruv Shah, Aviral Kumar
-
- Yalcin Tur, Jalal Naghiyev, Haoquan Fang, Wei-Chuan Tsai, Jiafei Duan, Dieter Fox, Ranjay Krishna
-
Recursive Belief Vision Language Action Models
- Vaidehi Bagaria, Bijo Sebastian, Nirav Kumar Patel
-
LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies
- Yue Yang, Shuo Cheng, Yu Fang, Homanga Bharadhwaj, Mingyu Ding, Gedas Bertasius, Daniel Szafir
-
- Zaijing Li, Bing Hu, Rui Shao, Gongwei Chen, Dongmei Jiang, Pengwei Xie, Jianye Hao, Liqiang Nie
-
- Haoxuan Wang, Gengyu Zhang, Yan Yan, Ramana Rao Kompella, Gaowen Liu
-
Efficient Long-Horizon Vision-Language-Action Models via Static-Dynamic Disentanglement
- Weikang Qiu, Tinglin Huang, Rex Ying
-
Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation
- Huajie Tan, Peterson Co, Yijie Xu, Shanyu Rong, Yuheng Ji, Cheng Chi, Xiansheng Chen, Qiongyu Zhang, Zhongxia Zhao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
- HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
- Minghui Lin, Pengxiang Ding, Shu Wang, Zifeng Zhuang, Yang Liu, Xinyang Tong, Wenxuan Song, Shangke Lyu, Siteng Huang, Donglin Wang
- LoLA: Long Horizon Latent Action Learning for General Robot Manipulation
- Xiaofan Wang, Xingyu Gao, Jianlong Fu, Zuolei Li, Dean Fortier, Galen Mullins, Andrey Kolobov, Baining Guo
- Affordance Field Intervention: Enabling VLAs to Escape Memory Traps in Robotic Manipulation
- Siyu Xu, Zijian Wang, Yunke Wang, Chenghao Xia, Tao Huang, Chang Xu
- Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective
- Nhat Chung, Taisei Hanyu, Toan Nguyen, Huy Le, Frederick Bumgarner, Duy Minh Ho Nguyen, Khoa Vo, Kashu Yamazaki, Chase Rainwater, Tung Kieu, Anh Nguyen, Ngan Le
- HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
- Myungkyu Koo, Daewon Choi, Taeyoung Kim, Kyungmin Lee, Changyeon Kim, Younggyo Seo, Jinwoo Shin
- LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
- Yi Yang, Jiaxuan Sun, Siqi Kou, Yihan Wang, Zhijie Deng
- RoboHiMan: A Hierarchical Evaluation Paradigm for Compositional Generalization in Long-Horizon Manipulation
- Yangtao Chen, Zixuan Chen, Nga Teng Chan, Junting Chen, Junhui Yin, Jieqi Shi, Yang Gao, Yong-Lu Li, Jing Huo
- Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding
- Maxim A. Patratskiy, Alexey K. Kovalev, Aleksandr I. Panov
- KV-Efficient VLA: A Method to Speed up Vision Language Models with RNN-Gated Chunked KV Cache
- Wanshun Xu, Long Zhuang, Lianlei Shan
- F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
- Qi Lv, Weijie Kong, Hao Li, Jia Zeng, Zherui Qiu, Delin Qu, Haoming Song, Qizhi Chen, Xiang Deng, Jiangmiao Pang
- Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation
- Yiguo Fan, Pengxiang Ding, Shuanghao Bai, Xinyang Tong, Yuyang Zhu, Hongchao Lu, Fengqi Dai, Wei Zhao, Yang Liu, Siteng Huang, Zhaoxin Fan, Badong Chen, Donglin Wang
- RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems
- Mingcong Lei, Honghao Cai, Zezhou Cui, Liangchen Tan, Junkun Hong, Gehan Hu, Shuangyu Zhu, Yimou Wu, Shaohan Jiang, Ge Wang, Yuyuan Yang, Junyuan Tan, Zhenglin Wan, Zhen Li, Shuguang Cui, Yiming Zhao, Yatong Han
- DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
- Yifan Zhong, Xuchuan Huang, Ruochong Li, Ceyao Zhang, Zhang Chen, Tianrui Guan, Fanlian Zeng, Ka Num Lui, Yuyao Ye, Yitao Liang, Yaodong Yang, Yuanpei Chen
- RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
- Huajie Tan, Xiaoshuai Hao, Cheng Chi, Minglan Lin, Yaoxu Lyu, Mingyu Cao, Dong Liang, Zhuo Chen, Mengsi Lyu, Cheng Peng, Chenrui He, Yulong Ao, Yonghua Lin, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
- EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
- Ruihan Yang, Qinxi Yu, Yecheng Wu, Rui Yan, Borui Li, An-Chieh Cheng, Xueyan Zou, Yunhao Fang, Xuxin Cheng, Ri-Zhao Qiu, Hongxu Yin, Sifei Liu, Song Han, Yao Lu, Xiaolong Wang
- CycleManip: Enabling Cyclic Task Manipulation via Effective Historical Perception and Understanding
- Yi-Lin Wei, Haoran Liao, Yuhao Lin, Pengyue Wang, Zhizhao Liang, Guiliang Liu, Wei-Shi Zheng
- History-Aware Visuomotor Policy Learning via Point Tracking
- Jingjing Chen, Hongjie Fang, Chenxi Wang, Shiquan Wang, Cewu Lu
- MemER: Scaling Up Memory for Robot Control via Experience Retrieval
- Ajay Sridhar, Jennifer Pan, Satvik Sharma, Chelsea Finn
- MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
- Hao Shi, Bin Xie, Yingfei Liu, Lin Sun, Fengrong Liu, Tiancai Wang, Erjin Zhou, Haoqiang Fan, Xiangyu Zhang, Gao Huang
- Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
- Egor Cherepanov, Nikita Kachaev, Alexey K. Kovalev, Aleksandr I. Panov
- MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation
- Runhao Li, Wenkai Guo, Zhenyu Wu, Changyuan Wang, Haoyuan Deng, Zhenyu Weng, Yap-Peng Tan, Ziwei Wang
- MVP: Memory-enhanced Vision-Language-Action Policy with Feedback Learning
- Anonymous authors. Paper under double-blind review
- Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
- Haoxuan Li, Sixu Yan, Yuhan Li, Xinggang Wang
- EvoVLA: Self-Evolving Vision-Language-Action Model
- Zeting Liu, Zida Yang, Zeyu Zhang, Hao Tang
- EchoVLA: Robotic Vision-Language-Action Model with Synergistic Declarative Memory for Mobile Manipulation
- Min Lin, Xiwen Liang, Bingqian Lin, Liu Jingzhi, Zijian Jiao, Kehan Li, Yuhan Ma, Yuecheng Liu, Shen Zhao, Yuzheng Zhuang, Xiaodan Liang
- Mixture of Horizons in Action Chunking
- Dong Jing, Gang Wang, Jiaqi Liu, Weiliang Tang, Zelong Sun, Yunchao Yao, Zhenyu Wei, Yunhui Liu, Zhiwu Lu, Mingyu Ding
- AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention
- Lei Xiao, Jifeng Li, Juntao Gao, Feiyang Ye, Yan Jin, Jingjing Qian, Jing Zhang, Yong Wu, Xiaoyuan Yu
- Vision-Language Memory for Spatial Reasoning
- Zuntao Liu, Yi Du, Taimeng Fu, Shaoshu Su, Cherie Ho, Chen Wang
- CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
- Hao Li, Shuai Yang, Yilun Chen, Xinyi Chen, Xiaoda Yang, Yang Tian, Hanqing Wang, Tai Wang, Dahua Lin, Feng Zhao, Jiangmiao Pang
- ContextVLA: Vision-Language-Action Model with Amortized Multi-Frame Context
- Huiwon Jang, Sihyun Yu, Heeseung Kwon, Hojin Jeon, Younggyo Seo, Jinwoo Shin
- GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
- Yunfei Li, Xiao Ma, Jiafeng Xu, Yu Cui, Zhongren Cui, Zhigang Han, Liqun Huang, Tao Kong, Yuxiao Liu, Hao Niu, Wanli Peng, Jingchao Qiao, Zeyu Ren, Haixin Shi, Zhi Su, Jiawen Tian, Yuyang Xiao, Shenyu Zhang, Liwei Zheng, Hang Li, Yonghui Wu
- CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
- Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard