The Mamba Explosion
Since the publication of the original Mamba paper on December 1st, 2023, there has been rapid adoption and experimentation with the selective state space model architecture. I have attempted to catalogue much of this activity. If you are aware of any significant work that is missing, you can email me at jason@statespace.info.
If you are looking for a place to start with Mamba, I highly recommend the Mamba Primer and Mamba The Hard Way by Sasha Rush, as well as Sparse Notes Mamba and Mamba The Easy Way by Jack Cook.
For podcasts:
- Interview with Tri Dao and Michael Poli - by Nathan Lambert at Interconnects
- Mamba, Memory, and the SSM Moment - by Nathan Labenz at Cognitive Revolution
- Mamba-Palooza: 90 Days of Mamba-Inspired Research - Part 1 - by Nathan Labenz and myself
- Mamba-Palooza: 90 Days of Mamba-Inspired Research - Part 2 - by Nathan Labenz and myself
Mamba Papers
Mamba for speech synthesis - Using Mamba for speech synthesis - 1/3/24
MoE-Mamba - Efficient Selective State Space Models with Mixture of Experts (Poland) - 1/8/24
U-Mamba - Enhancing Long-range Dependency for Biomedical Image Segmentation (University of Toronto) - 1/9/24
MambaTab - A Simple Yet Effective Approach for Handling Tabular Data (University of Kentucky) - 1/16/24
Vision Mamba - Efficient Visual Representation Learning with Bidirectional State Space Model (Huazhong University of Sci & Tech) - 1/17/24
VMamba - Visual State Space Model (Huawei & UCAS) - 1/18/24
MambaByte - Token-free Selective State Space Model (Cornell) - 1/24/24
SegMamba - Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation (Hong Kong University of Sci & Tech) - 1/25/24
Vivim - Video Vision Mamba for Medical Video Object Segmentation (Hong Kong University of Sci & Tech) - 1/25/24
MambaMorph - a Mamba-based Backbone with Contrastive Feature Learning for Deformable MR-CT Registration (Beihang University, China) - 1/25/24
Black Mamba - Mixture of Experts for State-Space Models (Palo Alto, Zyphra) - 2/1/24
Graph-Mamba - Towards Long-Range Graph Sequence Modeling with Selective State Spaces (University of Toronto) - 2/1/24
VM-UNet - Vision Mamba UNet for Medical Image Segmentation (Shanghai JTU) - 2/4/24
Is Mamba Capable Of In-Context Learning? - Mamba matches ICL performance of transformers (Italian Institute of Technology, Univ of Freiburg) - 2/5/24
Swin-UMamba - Mamba-based UNet with ImageNet-based pretraining, beats U-Mamba by 3.5% (Shenzhen IAT, Peng Cheng Lab) - 2/5/24
Can Mamba Learn How to Learn? - A Comparative Study on In-Context Learning Tasks (Krafton, Seoul National University) - 2/6/24
Othello-Mamba - Evaluating the Mamba architecture on the Othello game (Lille, France) - 2/6/24
U-shaped Vision Mamba - Single Image Dehazing (Nanjing Univ of Sci & Tech, China) - 2/6/24
Mamba-UNet - UNet-Like Pure Visual Mamba for Medical Image Segmentation (U of Oxford, Fudan U China, U of Pittsburgh) - 2/7/24
LongMamba - 2.8B model trained on 16k context (NTU, Singapore) - 2/8/24
Mamba-ND - Selective State Space Modeling for Multi-Dimensional Data (UCLA) - 2/8/24
Semi-Mamba-UNet - Pixel-Level Contrastive Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation (U of Oxford, Mianyang China) - 2/11/24
P-Mamba - Marrying Perona Malik Diffusion with Mamba for Efficient Pediatric Echocardiographic Left Ventricular Segmentation (Institute of Intelligent Software, Guangzhou China) - 2/13/24
FD-Vision - Mamba for Endoscopic Exposure Correction (Nanjing UST) - 2/14/24
Weak-Mamba-UNet - Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation (U of Oxford, Mianyang China) - 2/16/24
Graph Mamba - Towards Learning on Graphs with State Space Models (Cornell) - 2/19/24
PointMamba - A Simple State Space Model for Point Cloud Analysis (Huazhong UST, Baidu) - 2/19/24
Pan-Mamba - Effective pan-sharpening with State Space Model (Hefei IPS, UST China) - 2/19/24
MambaIR - A Simple Baseline for Image Restoration with State-Space Model (Tsinghua University, China) - 2/23/24
Res-VMamba - Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning (National Taiwan University) - 2/24/24
MambaMIR - An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation (Imperial College London) - 2/28/24
MambaStock - Selective state space model for stock prediction (Guangdong China) - 2/29/24
Point Cloud Mamba - Point Cloud Learning via State Space Model - achieves SOTA performance on ScanObjectNN, ModelNet40, and ShapeNetPart datasets (Wuhan University) - 3/1/24
The Hidden Attention of Mamba Models - Selective SSMs can be viewed as attention-driven models (Tel Aviv University) - 3/3/24
Theoretical Foundations of Deep Selective State-Space Models - Rough Path Theory shows Mamba captures non-linear interactions between tokens at distinct timescales (ICL London, Institute Tubingen, Oxford) - 3/4/24
Caduceus - Bi-Directional Equivariant Long-Range DNA Sequence Modeling (Cornell, Princeton, Carnegie Mellon) - 3/5/24
DenseMamba - State Space Models with Dense Hidden Connection for Efficient Large Language Models (Huawei) - 3/5/24
MedMamba - Vision Mamba for Medical Image Classification (Guangzhou Medical University) - 3/6/24
Mamba4Rec - Towards Efficient Sequential Recommendation with Selective State Space Models (Texas A&M, Shanghai JT Univ) - 3/6/24
MambaLithium - Selective state space model for remaining-useful-life, state-of-health, and state-of-charge estimation of lithium-ion batteries (Ji Hua Lab Guangdong China) - 3/8/24
MamMIL - Multiple Instance Learning for Whole Slide Images with State Space Models (Tsinghua Univ) - 3/8/24
Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy - motion guided prediction head with Mamba (Chinese Univ of Hong Kong) - 3/8/24
APRICOT-Mamba - Acuity Prediction in Intensive Care Unit (ICU) Life Sustaining Therapies Prediction Model (University of Florida, Stanford) - 3/8/24
ClinicalMamba - A Generative Clinical Language Model on Longitudinal Clinical Notes (Amherst, Univ of Mass Lowell) - 3/9/24
nnMamba - 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model (Shenzhen Research Institute of Big Data) - 3/10/24
MambaMIL - Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology (Hong Kong UST) - 3/11/24
LightM-UNet - Mamba Assists in Lightweight UNet for Medical Image Segmentation (Ministry of Education Beijing, Peking U) - 3/11/24
A multi-cohort study on prediction of acute brain dysfunction states - uses selective state space models (University of Florida) - 3/11/24
Universality of Linear Recurrences Followed by Non-linear Projections - Finite-Width Guarantees and Benefits of Complex Eigenvalues (Inst Tubingen, DeepMind) - 3/11/24
Large Window-based Mamba UNet for Medical Image Segmentation - Beyond Convolution and Self-attention (Zhejiang Univ, U of Illinois UC, Notre Dame) - 3/12/24
VideoMamba - State Space Model for Efficient Video Understanding (Shanghai AI Lab, Shenzhen CAS) - 3/12/24
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers - achieves speech enhancement performance for static and moving speakers (Zhejiang University) - 3/12/24
MD-Dose - A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction (Chengdu Chinese Acad of Sci) - 3/13/24
Activating Wider Areas in Image Super-Resolution - single image super-resolution with Vim based models (JiaoTong Univ) - 3/13/24
TimeMachine - A Time Series is Worth 4 Mambas for Long-term Forecasting (Univ of Kentucky) - 3/14/24
Video Mamba Suite - State Space Model as a Versatile Alternative for Video Understanding (Nanjing U, Shanghai AI Lab) - 3/14/24
MambaTalk - Efficient Holistic Gesture Synthesis with Selective State Space Models (Tsinghua University) - 3/14/24
LocalMamba - Visual State Space Model with Windowed Selective Scan (U of Sydney) - 3/14/24
VM-UNET-V2 - Rethinking Vision Mamba UNet for Medical Image Segmentation (Nanjing University) - 3/14/24
On the low-shot transferability of [V]-Mamba - explores the transfer learning potential of [V]-Mamba (Quebec AI Institute) - 3/15/24
EfficientVMamba - Atrous Selective Scan for Light Weight Visual Mamba (Univ of Sydney) - 3/15/24
MiM-ISTD - Mamba-in-Mamba for Efficient Infrared Small Target Detection (Alibaba Cloud, USTC China) - 3/17/24
Is Mamba Effective for Time Series Forecasting? - saves GPU memory and training time (Northeastern University China) - 3/17/24
Point Mamba - A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy (Shanghai JT University) - 3/18/24
Motion Mamba - Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM (Monash U, Australian Natl Univ, MBZ Univ of AI, Carnegie Mellon) - 3/19/24
STG-Mamba - Spatial-Temporal Graph Learning via Selective State Space Model (U of New South Wales Australia) - 3/19/24
H-vmunet - High-order Vision Mamba UNet for Medical Image Segmentation (Shanghai University) - 3/20/24
VL-Mamba - Exploring State Space Models for Multimodal Learning (U of Adelaide, Chinese Academy of S) - 3/20/24
ZigMa - Zigzag Mamba Diffusion Model (LMU Munich) - 3/20/24
ProMamba - Prompt-Mamba for polyp segmentation (Peking U) - 3/20/24
Music to Dance as Language Translation using Sequence Models - Mamba and the Transformer have a dance-off (University of Beira Interior Portugal) - 3/22/24
SiMBA - Simplified Mamba-Based Architecture for Vision and Multivariate Time series (Microsoft) - 3/22/24
Cobra - Extending Mamba to Multi-Modal for Efficient Inference (Westlake U, Zhejiang U) - 3/22/24
CMViM - Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for Alzheimer’s Classification (Hong Kong Polytechnic) - 3/25/24
Uncovering Selective State Space Model’s Capabilities in Lifelong Sequential Recommendation - RecMamba reduces training time by 70% and memory costs by 80% (Shandong Univ, Michigan State) - 3/25/24
State Space Models as Foundation Models - A Control Theoretic Overview (ETH Zurich Switzerland) - 3/25/24
Proprioception Is All You Need - Using Mamba for Terrain Classification for Boreal Forests (Northern Robotics Lab, U Laval Quebec City) - 3/25/24
VMRNN - Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting (Hong Kong UST) - 3/26/24
ReMamber - Referring Image Segmentation with Mamba Twister (Shanghai JTU, U of Nottingham) - 3/26/24
Rotate to Scan - UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation (Guangdong U of T) - 3/26/24
PlainMamba - Improving Non-Hierarchical Mamba in Visual Recognition (U of Edinburgh, UST China, Peking) - 3/26/24
Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion - Leveraging strengths from Mamba UNet for MRI segmentation (North South U of Bangladesh) - 3/26/24
Gamba - Marry Gaussian Splatting with Mamba for single view 3D reconstruction (National U of Singapore, Nanyang Tech U) - 3/27/24
RankMamba - Benchmarking Mamba’s Document Ranking Performance in the Era of Transformers (University of Utah) - 3/27/24
Dual-path Mamba - Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation (Columbia) - 3/27/24
Jamba - A Hybrid Transformer-Mamba Language Model (AI21 Labs) - 3/28/24
RSMamba - Remote Sensing Image Classification with State Space Model (Beihang Univ, Univ of Hong Kong) - 3/28/24
HARMamba - Efficient Wearable Sensor Human Activity Recognition Based on Bidirectional Selective SSM (NNNSF of China) - 3/29/24
Decision Mamba - Reinforcement Learning via Sequence Modeling with Selective State Spaces (AILab, Cyberagent, Japan) - 3/29/24
UltraLight VM-UNet - Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation (Shanghai Univ, Univ of Shanghai for Sci and Tech) - 4/9/24
T-Mamba - Frequency-Enhanced Gated Long-Range Dependency for Tooth 3D CBCT Segmentation (Huazhong Univ of ST, Univ of Hong Kong) - 4/1/24
SpikeMba - Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding (Harbin Institute of Technology, China) - 4/1/24
Samba - Semantic Segmentation of Remotely Sensed Images with State Space Model (University of Liverpool, Suzhou China & Liverpool) - 4/2/24
SPMamba - State-space model is all you need in speech separation (Tsinghua University, China) - 4/2/24
RS3Mamba - Visual State Space Model for Remote Sensing Images Semantic Segmentation (Chinese Univ of Hong Kong) - 4/3/24
RS-Mamba - for Large Remote Sensing Image Dense Prediction (Nanjing Univ) - 4/3/24
ChangeMamba - Remote Sensing Change Detection with Spatio-Temporal State Space Model (Univ of Tokyo, Wuhan Univ, Center for AIP) - 4/4/24
Locating and Editing Factual Associations in Mamba - rank-one model editing can successfully insert facts at specific locations (Northeastern University) - 4/4/24
Sigma - Siamese Mamba Network for Multi-Modal Semantic Segmentation (Carnegie Mellon, Dalian Univ of Tech, China) - 4/5/24
3DMambaIPF - A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering (Fudan U Shanghai, Nanyang Tech U Singapore) - 4/8/24
Does Transformer Interpretability Transfer to RNNs? - transformer interpretability techniques work for RNNs (EleutherAI) - 4/9/24
RhythmMamba - Fast Remote Physiological Measurement with Arbitrary Length Videos (UST Beijing) - 4/9/24
MambaAD - Exploring State Space Models for Multi-class Unsupervised Anomaly Detection (Zhejiang Univ, Tencent) - 4/9/24
3DMambaComplete - Exploring Structured State Space Model for Point Cloud Completion (Fudan U Shanghai) - 4/10/24
Simba - Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos (IIT Odisha & West Bengal India) - 4/11/24
ViM-UNet - Vision Mamba for Biomedical Segmentation (University Gottingen) - 4/11/24
DGMamba - Domain Generalization via Generalized State Space Model (Shanghai JT Univ, Skywork AI) - 4/11/24
FusionMamba - Efficient Image Fusion with State Space Model (UCAS) - 4/11/24
SurvMamba - State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction (Xiamen Univ) - 4/11/24
MambaDFuse - A Mamba-based Dual-phase Model for Multi-modality Image Fusion (Harbin Engineering Univ) - 4/12/24
SpectralMamba - Efficient Mamba for Hyperspectral Image Classification (Aerospace Info Research Inst, Beijing) - 4/12/24
Fusion-Mamba - for Cross-modality Object Detection (Beihang Univ, ECNU, Tencent) - 4/14/24
A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion - local-enhanced vision Mamba block (UESTC Chengdu, China) - 4/14/24
FreqMamba - Viewing Mamba from a Frequency Perspective for Image Deraining (UST of China) - 4/15/24
FusionMamba - Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba (Great Bay U, Hong Kong Polytech, China) - 4/20/24
State Space Model for New-Generation Network Alternative to Transformers - A Survey (School of AI, Anhui U) - 4/15/24
HSIDMamba - Exploring Bidirectional State-Space Models for Hyperspectral Denoising (Inst of AI China) - 4/15/24
Text-controlled Motion Mamba - Text-Instructed Temporal Grounding of Human Motion (Peking Univ) - 4/17/24
CU-Mamba - Selective State Space Models with Channel Learning for Image Restoration (Stanford, KREA AI) - 4/17/24
Vim4Path - Self-Supervised Vision Mamba for Histopathology Images (Concordia Univ, U of Montreal) - 4/20/24
ST-SSMs - Spatial-Temporal Selective State of Space Model for Traffic Forecasting (University of Sydney) - 4/20/24
MambaUIE&SR - Unraveling the Ocean’s Secrets with Only 2.8 FLOPs (Beijing Info Sci & Tech Univ) - 4/22/24
Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting - combine Mamba and Transformer architecture in time series data (Illinois Inst of Tech) - 4/23/24
Mamba3D - Enhancing Local Features for 3D Point Cloud Analysis via State Space Model (Huazhong Univ of ST) - 4/23/24
Mamba Papers over time
Contesting Evidence
Repeat After Me - Transformers are Better than State Space Models at Copying (Harvard) - 2/1/24
Simple linear attention language models balance the recall-throughput tradeoff - found that Mamba struggled with the MQAR recall task (Stanford, Buffalo, Purdue) - 2/28/24
Linear Transformers with Learnable Kernel Functions are Better In-Context Models - found that RWKV & Mamba performance declines at longer sequence lengths (Tinkoff) - 2/16/24
The Illusion of State in State-Space Models - do SSMs have greater expressive power for state tracking? (NYU, Allen Inst for AI) - 4/12/24
nnU-Net Revisited - A Call for Rigorous Validation in 3D Medical Image Segmentation, observing a bias for novel architectures (Cancer Research Center DKFZ, Germany) - 4/15/24
Other Linear RNNs and Linear Transformers
Griffin - Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (Google DeepMind) - 2/29/24
Evo - DNA foundation modeling from molecular to genome scale - based on Striped Hyena (Arc Institute, Stanford, TogetherAI) - 2/27/24
Leave No Context Behind - Efficient Infinite Context Transformers with Infini-attention (Google) - 4/10/24
HGRN2 - Gated Linear RNNs with State Expansion (Shanghai AI Lab, Taptap, MIT CSAIL) - 4/11/24