This is a website where I write about state space models. The adoption of these models is happening fast (see the Mamba Explosion page).
Beyond this website, I recently appeared on the Cognitive Revolution Podcast: Mamba-Palooza Part 1 and Mamba-Palooza Part 2.
I can be contacted on Twitter - @KamaraiCode
I am also a cofounder of “Build the Future”, an Austin-based organization whose mission is to help accelerate technological progress towards a future with greater prosperity, freedom, creativity, and adventure. For now we host monthly meetups in Austin. If you are like-minded, feel free to join us! For more information: build the future
For state space models, here are some places to get started:
Links
S4 by Albert Gu - blog post introducing the S4 model. There are also links to the S4 code and paper.
The Annotated S4 - by Sasha Rush & Sidd Karamcheti. Highly recommended; I credit 80% of my intuitions about S4 to reading this.
Mamba Paper - original Mamba paper by Gu and Dao; the selective SSM architecture scales linearly with context length.
Interview w Tri Dao - a worthwhile listen.
Albert Gu tweet - tweet introducing Mamba.
Clean code implementation of Mamba - a practical implementation in PyTorch.
Chat finetuning for Mamba - one of the first I saw doing a chat fine-tune.
Gated linear attention (transformer) - Yikang Shen’s take on gated linear attention (not an SSM).
New Mamba model 12th December 3B parameters, 600B tokens - links to Albert Gu’s tweet.
Mamba, Memory, and the SSM Moment (Cog Rev Podcast) - recently found this ep, aligns with so much of my own thinking and more - highly recommended
Sparse Notes Mamba walkthrough - from S4 to Mamba
The Annotated Mamba paper - PENDING by Sasha Rush (we’re all hoping it will release soon)
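To make the "linear scaling" point from the Mamba paper entry a bit more concrete, here is a toy sketch of a selective state space recurrence in plain Python. It is not the Mamba implementation: it uses a single channel and a single scalar state, the weights `w_delta`, `w_b`, `w_c` and the constant `a` are made-up illustrative parameters, and the discretization is a simplified one. What it does show is the shape of the idea: the step size and the input/output projections depend on the current input, and the whole sequence is processed in one pass with a fixed-size state, so the work grows linearly with sequence length.

```python
import math


def selective_scan(xs, w_delta=1.0, w_b=1.0, w_c=1.0, a=-1.0):
    """Toy single-channel, single-state selective SSM (illustrative only).

    Per step t, with every "selective" quantity computed from x_t:
        delta_t = softplus(w_delta * x_t)            # input-dependent step size
        h_t     = exp(delta_t * a) * h_{t-1} + delta_t * (w_b * x_t) * x_t
        y_t     = (w_c * x_t) * h_t

    All parameter names here are hypothetical, chosen for this sketch.
    Returns the outputs and the number of recurrence steps performed.
    """
    h = 0.0          # fixed-size hidden state, independent of sequence length
    ys = []
    steps = 0
    for x in xs:
        delta = math.log1p(math.exp(w_delta * x))  # softplus keeps delta > 0
        a_bar = math.exp(delta * a)                # discretized state decay
        b_bar = delta * (w_b * x)                  # simplified discretized input gate
        h = a_bar * h + b_bar * x                  # one state update per token
        ys.append((w_c * x) * h)                   # input-dependent readout
        steps += 1
    return ys, steps


if __name__ == "__main__":
    for n in (100, 200, 400):
        _, steps = selective_scan([0.5] * n)
        print(n, steps)  # one update per token: steps equals n
```

Doubling the input length doubles the number of state updates, while the state itself stays the same size. The real model uses vector-valued states, learned projections, and a hardware-aware parallel scan, none of which are captured here.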
The Mamba Explosion:
Mamba for speech synthesis - Using Mamba for speech synthesis - 1/3/24
MoE-Mamba - Efficient Selective State Space Models with Mixture of Experts (Poland) - 1/8/24
U-Mamba - Enhancing Long-range Dependency for Biomedical Image Segmentation (University of Toronto) - 1/9/24
MambaTab - A Simple Yet Effective Approach for Handling Tabular Data (University of Kentucky) - 1/16/24
Vision Mamba - Efficient Visual Representation Learning with Bidirectional State Space Model (Huazhong University of Sci & Tech) - 1/17/24
VMamba - Visual State Space Model (Huawei & UCAS) - 1/18/24
MambaByte - Token-free Selective State Space Model (Cornell) - 1/24/24
SegMamba - Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation (Hong Kong University of Sci & Tech) - 1/25/24
Vivim - Video Vision Mamba for Medical Video Object Segmentation (Hong Kong University of Sci & Tech) - 1/25/24
MambaMorph - a Mamba-based Backbone with Contrastive Feature Learning for Deformable MR-CT Registration (Beihang University, China) - 1/25/24
Black Mamba - Mixture of Experts for State-Space Models (Palo Alto, Zyphra) - 2/1/24
Graph-Mamba - Towards Long-Range Graph Sequence Modeling with Selective State Spaces (University of Toronto) - 2/1/24
VM-UNet - Vision Mamba UNet for Medical Image Segmentation (Shanghai JTU) - 2/4/24
Is Mamba Capable Of In-Context Learning? - Mamba matches ICL performance of transformers (Italian Institute of Technology, Univ of Freiburg) - 2/5/24
Swin-UMamba - Mamba-based UNet with ImageNet-based pretraining, beats U-Mamba by 3.5% (Shenzhen IAT, Peng Cheng Lab) - 2/5/24
Can Mamba Learn How to Learn? - A Comparative Study on In-Context Learning Tasks (Krafton, Seoul National University) - 2/6/24
Othello-Mamba - Evaluating the Mamba architecture on the Othello game (Lille, France) - 2/6/24
U-shaped Vision Mamba - Single Image Dehazing (Nanjing Univ of Sci & Tech, China) - 2/6/24
Mamba-UNet - UNet-Like Pure Visual Mamba for Medical Image Segmentation (U of Oxford, Fudan U China, U of Pittsburgh) - 2/7/24
LongMamba - 2.8B model trained on 16k context (NTU, Singapore) - 2/8/24
Mamba-ND - Selective State Space Modeling for Multi-Dimensional Data (UCLA) - 2/8/24
Semi-Mamba-UNet - Pixel-Level Contrastive Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation (U of Oxford, Mianyang China) - 2/11/24
P-Mamba - Marrying Perona Malik Diffusion with Mamba for Efficient Pediatric Echocardiographic Left Ventricular Segmentation (Institute of Intelligent Software, Guangzhou China) - 2/13/24
FD-Vision - Mamba for Endoscopic Exposure Correction (Nanjing UST) - 2/14/24
Weak-Mamba-UNet - Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation (U of Oxford, Mianyang China) - 2/16/24
Graph Mamba - Towards Learning on Graphs with State Space Models (Cornell) - 2/19/24
PointMamba - A Simple State Space Model for Point Cloud Analysis (Huazhong UST, Baidu) - 2/19/24
Pan-Mamba - Effective pan-sharpening with State Space Model (Hefei IPS, UST China) - 2/19/24
MambaIR - A Simple Baseline for Image Restoration with State-Space Model (Tsinghua University, China) - 2/23/24
Res-VMamba - Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning (National Taiwan University) - 2/24/24
MambaMIR - An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation (Imperial College London) - 2/28/24
MambaStock - Selective state space model for stock prediction (Guangdong China) - 2/29/24
Point Cloud Mamba - Point Cloud Learning via State Space Model - achieves SOTA performance on ScanObjectNN, ModelNet40, and ShapeNetPart datasets (Wuhan University) - 3/1/24
The Hidden Attention of Mamba Models - Selective SSMs can be viewed as attention-driven models (Tel Aviv University) - 3/3/24
Theoretical Foundations of Deep Selective State-Space Models - Rough Path Theory shows Mamba captures non-linear interactions between tokens at distinct timescales (ICL London, Institute Tubingen, Oxford) - 3/4/24
Caduceus - Bi-Directional Equivariant Long-Range DNA Sequence Modeling (Cornell, Princeton, Carnegie Mellon) - 3/5/24
DenseMamba - State Space Models with Dense Hidden Connection for Efficient Large Language Models (Huawei) - 3/5/24
MedMamba - Vision Mamba for Medical Image Classification (Guangzhou Medical University) - 3/6/24
Mamba4Rec - Towards Efficient Sequential Recommendation with Selective State Space Models (Texas A&M, Shanghai JT Univ) - 3/6/24
MambaLithium - Selective state space model for remaining-useful-life, state-of-health, and state-of-charge estimation of lithium-ion batteries (Ji Hua Lab Guangdong China) - 3/8/24
MamMIL - Multiple Instance Learning for Whole Slide Images with State Space Models (Tsinghua Univ) - 3/8/24
Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy - motion guided prediction head with Mamba (Chinese Univ of Hong Kong) - 3/8/24
APRICOT-Mamba - Acuity Prediction in Intensive Care Unit (ICU) Life Sustaining Therapies Prediction Model (University of Florida, Stanford) - 3/8/24
ClinicalMamba - A Generative Clinical Language Model on Longitudinal Clinical Notes (Amherst, Univ of Mass Lowell) - 3/9/24
nnMamba - 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model (Shenzhen Research Institute of Big Data) - 3/10/24
MambaMIL - Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology (Hong Kong UST) - 3/11/24
LightM-UNet - Mamba Assists in Lightweight UNet for Medical Image Segmentation (Ministry of Education Beijing, Peking U) - 3/11/24
A multi-cohort study on prediction of acute brain dysfunction states - uses selective state space models (University of Florida) - 3/11/24
Universality of Linear Recurrences Followed by Non-linear Projections - Finite-Width Guarantees and Benefits of Complex Eigenvalues interpretability (Inst Tubingen, DeepMind) - 3/11/24
Large Window-based Mamba UNet for Medical Image Segmentation - Beyond Convolution and Self-attention (Zhejiang Univ, U of Illinois UC, Notre Dame) - 3/12/24
VideoMamba - State Space Model for Efficient Video Understanding (Shanghai AI Lab, Shenzhen CAS) - 3/12/24
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers - achieves speech enhancement performance for static and moving speakers (Zhejiang University) - 3/12/24
MD-Dose - A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction (Chengdu Chinese Acad of Sci) - 3/13/24
Activating Wider Areas in Image Super-Resolution - single image super-resolution with Vim based models (JiaoTong Univ) - 3/13/24
TimeMachine - A Time Series is Worth 4 Mambas for Long-term Forecasting (Univ of Kentucky) - 3/14/24
Video Mamba Suite - State Space Model as a Versatile Alternative for Video Understanding (Nanjing U, Shanghai AI Lab) - 3/14/24
MambaTalk - Efficient Holistic Gesture Synthesis with Selective State Space Models (Tsinghua University) - 3/14/24
LocalMamba - Visual State Space Model with Windowed Selective Scan (U of Sydney) - 3/14/24
VM-UNET-V2 - Rethinking Vision Mamba UNet for Medical Image Segmentation (Nanjing University) - 3/14/24
On the low-shot transferability of [V]-Mamba - explores the transfer learning potential of [V]-Mamba (Quebec AI Institute) - 3/15/24
EfficientVMamba - Atrous Selective Scan for Light Weight Visual Mamba (Univ of Sydney) - 3/15/24
MiM-ISTD - Mamba-in-Mamba for Efficient Infrared Small Target Detection (Alibaba Cloud, USTC China) - 3/17/24
Is Mamba Effective for Time Series Forecasting? - saves GPU memory and training time (Northeastern University China) - 3/17/24
Point Mamba - A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy (Shanghai JT University) - 3/18/24
Motion Mamba - Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM (Monash U, Australian Natl Univ, MBZ Univ of AI, Carnegie Mellon) - 3/19/24
STG-Mamba - Spatial-Temporal Graph Learning via Selective State Space Model (U of New South Wales Australia) - 3/19/24
H-vmunet - High-order Vision Mamba UNet for Medical Image Segmentation (Shanghai University) - 3/20/24
VL-Mamba - Exploring State Space Models for Multimodal Learning (U of Adelaide, Chinese Academy of S) - 3/20/24
ZigMa - Zigzag Mamba Diffusion Model (LMU Munich) - 3/20/24
ProMamba - Prompt-Mamba for polyp segmentation (Peking U) - 3/20/24
Music to Dance as Language Translation using Sequence Models - Mamba and the Transformer have a dance-off (University of Beira Interior Portugal) - 3/22/24
SiMBA - Simplified Mamba-Based Architecture for Vision and Multivariate Time series (Microsoft) - 3/22/24
Cobra - Extending Mamba to Multi-Modal for Efficient Inference (Westlake U, Zhejiang U) - 3/22/24
CMViM - Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for Alzheimer’s Classification (Hong Kong Polytechnic) - 3/25/24
Uncovering Selective State Space Model’s Capabilities in Lifelong Sequential Recommendation - RecMamba reduces training time by 70% and memory costs by 80% (Shandong Univ, Michigan State) - 3/25/24
State Space Models as Foundation Models - A Control Theoretic Overview (ETH Zurich Switzerland) - 3/25/24
Proprioception Is All You Need - Using Mamba for Terrain Classification for Boreal Forests (Northern Robotics Lab, U Laval Quebec City) - 3/25/24
VMRNN - Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting (Hong Kong UST) - 3/26/24
ReMamber - Referring Image Segmentation with Mamba Twister (Shanghai JTU, U of Nottingham) - 3/26/24
Rotate to Scan - UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation (Guangdong U of T) - 3/26/24
PlainMamba - Improving Non-Hierarchical Mamba in Visual Recognition (U of Edinburgh, UST China, Peking) - 3/26/24
Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion - Leveraging strengths from Mamba UNet for MRI segmentation (North South U of Bangladesh) - 3/26/24
Gamba - Marry Gaussian Splatting with Mamba for single view 3D reconstruction (National U of Singapore, Nanyang Tech U) - 3/27/24
RankMamba - Benchmarking Mamba’s Document Ranking Performance in the Era of Transformers (University of Utah) - 3/27/24
Dual-path Mamba - Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation (Columbia) - 3/27/24
Jamba - A Hybrid Transformer-Mamba Language Model (AI21 Labs) - 3/28/24
RSMamba - Remote Sensing Image Classification with State Space Model (Beihang Univ, Univ of Hong Kong) - 3/28/24
HARMamba - Efficient Wearable Sensor Human Activity Recognition Based on Bidirectional Selective SSM (NNNSF of China) - 3/29/24
Decision Mamba - Reinforcement Learning via Sequence Modeling with Selective State Spaces (AILab, Cyberagent, Japan) - 3/29/24
UltraLight VM-UNet - Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation (Shanghai Univ, Univ of Shanghai for Sci and Tech) - 4/9/24
T-Mamba - Frequency-Enhanced Gated Long-Range Dependency for Tooth 3D CBCT Segmentation (Huazhong Univ of ST, Univ of Hong Kong) - 4/1/24
SpikeMba - Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding (Harbin Institute of Technology, China) - 4/1/24
Samba - Semantic Segmentation of Remotely Sensed Images with State Space Model (University of Liverpool, Suzhou China & Liverpool) - 4/2/24
SPMamba - State-space model is all you need in speech separation (Tsinghua University, China) - 4/2/24
RS3Mamba - Visual State Space Model for Remote Sensing Images Semantic Segmentation (Chinese Univ of Hong Kong) - 4/3/24
RS-Mamba - for Large Remote Sensing Image Dense Prediction (Nanjing Univ) - 4/3/24
ChangeMamba - Remote Sensing Change Detection with Spatio-Temporal State Space Model (Univ of Tokyo, Wuhan Univ, Center for AIP) - 4/4/24
Locating and Editing Factual Associations in Mamba - rank-one model editing can successfully insert facts at specific locations (Northeastern University) - 4/4/24
Sigma - Siamese Mamba Network for Multi-Modal Semantic Segmentation (Carnegie Mellon, Dalian Univ of Tech, China) - 4/5/24
3DMambaIPF - A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering (Fudan U Shanghai, Nanyang Tech U Singapore) - 4/8/24
Does Transformer Interpretability Transfer to RNNs? - transformer interpretability techniques work for RNNs (EleutherAI) - 4/9/24
RhythmMamba - Fast Remote Physiological Measurement with Arbitrary Length Videos (UST Beijing) - 4/9/24
MambaAD - Exploring State Space Models for Multi-class Unsupervised Anomaly Detection (Zhejiang Univ, Tencent) - 4/9/24
3DMambaComplete - Exploring Structured State Space Model for Point Cloud Completion (Fudan U Shanghai) - 4/10/24
Simba - Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos (IIT Odisha & West Bengal India) - 4/11/24
ViM-UNet - Vision Mamba for Biomedical Segmentation (University Gottingen) - 4/11/24
DGMamba - Domain Generalization via Generalized State Space Model (Shanghai JT Univ, Skywork AI) - 4/11/24
FusionMamba - Efficient Image Fusion with State Space Model (UCAS) - 4/11/24
SurvMamba - State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction (Xiamen Univ) - 4/11/24
MambaDFuse - A Mamba-based Dual-phase Model for Multi-modality Image Fusion (Harbin Engineering Univ) - 4/12/24
SpectralMamba - Efficient Mamba for Hyperspectral Image Classification (Aerospace Info Research Inst, Beijing) - 4/12/24
Fusion-Mamba - for Cross-modality Object Detection (Beihang Univ, ECNU, Tencent) - 4/14/24
A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion - local-enhanced vision Mamba block (UESTC Chengdu, China) - 4/14/24
FreqMamba - Viewing Mamba from a Frequency Perspective for Image Deraining (UST of China) - 4/15/24
FusionMamba - Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba (Great Bay U, Hong Kong Polytech, China) - 4/20/24
State Space Model for New-Generation Network Alternative to Transformers - A Survey (School of AI, Anhui U) - 4/15/24
HSIDMamba - Exploring Bidirectional State-Space Models for Hyperspectral Denoising (Inst of AI China) - 4/15/24
Text-controlled Motion Mamba - Text-Instructed Temporal Grounding of Human Motion (Peking Univ) - 4/17/24
CU-Mamba - Selective State Space Models with Channel Learning for Image Restoration (Stanford, KREA AI) - 4/17/24
Vim4Path - Self-Supervised Vision Mamba for Histopathology Images (Concordia Univ, U of Montreal) - 4/20/24
ST-SSMs - Spatial-Temporal Selective State of Space Model for Traffic Forecasting (University of Sydney) - 4/20/24
MambaUIE&SR - Unraveling the Ocean’s Secrets with Only 2.8 FLOPs (Beijing Info Sci & Tech Univ) - 4/22/24
Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting - combines Mamba and Transformer architectures for time series data (Illinois Inst of Tech) - 4/23/24
Mamba3D - Enhancing Local Features for 3D Point Cloud Analysis via State Space Model (Huazhong Univ of ST) - 4/23/24
Mamba Repos
S4 - The original S4 repo - Apache 2.0 license
Mamba - The original Mamba repo - Apache 2.0 license
Mamba chat - fine-tuned for chat on 16k samples from the HF ultrachat_200k dataset - Apache 2.0 license
Mamba on HF - 130m, 370m, 790m, 1.4b, 2.8b, 2.8b-slimpj models on Hugging Face - Apache 2.0 license
Striped Hyena - a hybrid architecture composed of multi-head, grouped-query attention and gated convolutions arranged in Hyena blocks - Apache 2.0 license
I’ve also recently given my thoughts on the new LLM framework DSPy.