Home

This is a website where I write about state space models. The adoption of these models is happening fast (see the Mamba Explosion page).

Beyond this website, I recently appeared on the Cognitive Revolution Podcast: Mamba-Palooza Part 1 and Mamba-Palooza Part 2.

I can be contacted on Twitter - @KamaraiCode

I am also cofounder of “Build the Future”, an Austin-based organization whose mission is to help accelerate technological progress towards a future with greater prosperity, freedom, creativity, and adventure. As of now we host monthly meetups in Austin. If you are like-minded, feel free to join us! For more information: build the future

For state space models, here are some places to get started:

Links

S4 by Albert Gu - blog post introducing the S4 model. There are also links to the S4 code and paper.

The Annotated S4 - by Sasha Rush & Sidd Karamcheti. Highly recommended; I credit 80% of my intuitions about S4 to reading this.

Mamba Paper - the original Mamba paper by Gu and Dao; the selective SSM architecture scales linearly with context length.

Interview with Tri Dao - a worthwhile listen.

Albert Gu tweet - tweet introducing Mamba.

Clean code implementation of Mamba - a practical implementation in PyTorch.

Chat finetuning for Mamba - one of the first I saw doing a chat fine-tune.

Gated linear attention (transformer) - Yikang Shen’s take on gated linear attention (not an SSM).

New Mamba model (December 12th): 3B parameters, 600B tokens - links to Albert Gu’s tweet.

Mamba, Memory, and the SSM Moment (Cog Rev Podcast) - recently found this ep, aligns with so much of my own thinking and more - highly recommended

Sparse Notes Mamba walk through - from S4 to Mamba

The Annotated Mamba paper - PENDING by Sasha Rush (we’re all hoping it will release soon)
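To make the linear-scaling point from the Mamba Paper entry a bit more concrete, here is a minimal sketch of a diagonal state space recurrence in plain Python. It is only an illustration under simplifying assumptions: the input is a single scalar channel, the function name `ssm_scan` is hypothetical, and the per-step parameters `a`, `b`, `c` are passed in directly rather than computed from the input as Mamba does. The real implementation uses a hardware-aware parallel scan, which this loop does not attempt to reproduce.

```python
from typing import List, Sequence


def ssm_scan(
    xs: Sequence[float],
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    c: Sequence[Sequence[float]],
) -> List[float]:
    """Run a diagonal state space recurrence over a 1-D input sequence.

    For each step t and state dimension i:
        h[i] = a[t][i] * h[i] + b[t][i] * xs[t]
        y[t] = sum_i c[t][i] * h[i]
    Letting a, b, c vary with t stands in for the "selective" part;
    using the same values at every t gives a time-invariant SSM.
    """
    n_state = len(a[0]) if a else 0
    h = [0.0] * n_state  # fixed-size state, independent of sequence length
    ys: List[float] = []
    for t, x in enumerate(xs):  # single pass: work grows linearly with len(xs)
        h = [a[t][i] * h[i] + b[t][i] * x for i in range(n_state)]
        ys.append(sum(c[t][i] * h[i] for i in range(n_state)))
    return ys


if __name__ == "__main__":
    xs = [1.0, 0.0, 0.0, 2.0]
    steps = len(xs)
    a = [[0.5, 0.25]] * steps  # per-step decay for a 2-dimensional state
    b = [[1.0, 1.0]] * steps   # per-step input weights
    c = [[1.0, 1.0]] * steps   # per-step output weights
    print(ssm_scan(xs, a, b, c))  # [2.0, 0.75, 0.3125, 4.140625]
```

Each step touches only the fixed-size state `h`, so the cost is proportional to sequence length times state size, in contrast to attention, where every new token is compared against all previous ones.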

The Mamba Explosion:

Mamba for speech synthesis - Using Mamba for speech synthesis - 1/3/24

MoE-Mamba - Efficient Selective State Space Models with Mixture of Experts (Poland) - 1/8/24

U-Mamba - Enhancing Long-range Dependency for Biomedical Image Segmentation (University of Toronto) - 1/9/24

MambaTab - A Simple Yet Effective Approach for Handling Tabular Data (University of Kentucky) - 1/16/24

Vision Mamba - Efficient Visual Representation Learning with Bidirectional State Space Model (Huazhong University of Sci & Tech) - 1/17/24

VMamba - Visual State Space Model (Huawei & UCAS) - 1/18/24

MambaByte - Token-free Selective State Space Model (Cornell) - 1/24/24

SegMamba - Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation (Hong Kong University of Sci & Tech) - 1/25/24

Vivim - Video Vision Mamba for Medical Video Object Segmentation (Hong Kong University of Sci & Tech) - 1/25/24

MambaMorph - a Mamba-based Backbone with Contrastive Feature Learning for Deformable MR-CT Registration (Beihang University, China) - 1/25/24

Black Mamba - Mixture of Experts for State-Space Models (Palo Alto, Zyphra) - 2/1/24

Graph-Mamba - Towards Long-Range Graph Sequence Modeling with Selective State Spaces (University of Toronto) - 2/1/24

VM-UNet - Vision Mamba UNet for Medical Image Segmentation (Shanghai JTU) - 2/4/24

Is Mamba Capable Of In-Context Learning? - Mamba matches ICL performance of transformers (Italian Institute of Technology, Univ of Freiburg) - 2/5/24

Swin-UMamba - Mamba-based UNet with ImageNet-based pretraining, beats U-Mamba by 3.5% (Shenzhen IAT, Peng Cheng Lab) - 2/5/24

Can Mamba Learn How to Learn? - A Comparative Study on In-Context Learning Tasks (Krafton, Seoul National University) - 2/6/24

Othello-Mamba - Evaluating the Mamba architecture on the Othello game (Lille, France) - 2/6/24

U-shaped Vision Mamba - Single Image Dehazing (Nanjing Univ of Sci & Tech, China) - 2/6/24

Mamba-UNet - UNet-Like Pure Visual Mamba for Medical Image Segmentation (U of Oxford, Fudan U China, U of Pittsburgh) - 2/7/24

LongMamba - 2.8B model trained on 16k context (NTU, Singapore) - 2/8/24

Mamba-ND - Selective State Space Modeling for Multi-Dimensional Data (UCLA) - 2/8/24

Semi-Mamba-UNet - Pixel-Level Contrastive Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation (U of Oxford, Mianyang China) - 2/11/24

P-Mamba - Marrying Perona Malik Diffusion with Mamba for Efficient Pediatric Echocardiographic Left Ventricular Segmentation (Institute of Intelligent Software, Guangzhou China) - 2/13/24

FD-Vision - Mamba for Endoscopic Exposure Correction (Nanjing UST) - 2/14/24

Weak-Mamba-UNet - Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation (U of Oxford, Mianyang China) - 2/16/24

Graph Mamba - Towards Learning on Graphs with State Space Models (Cornell) - 2/19/24

PointMamba - A Simple State Space Model for Point Cloud Analysis (Huazhong UST, Baidu) - 2/19/24

Pan-Mamba - Effective pan-sharpening with State Space Model (Hefei IPS, UST China) - 2/19/24

MambaIR - A Simple Baseline for Image Restoration with State-Space Model (Tsinghua University, China) - 2/23/24

Res-VMamba - Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning (National Taiwan University) - 2/24/24

MambaMIR - An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation (Imperial College London) - 2/28/24

MambaStock - Selective state space model for stock prediction (Guangdong China) - 2/29/24

Point Cloud Mamba - Point Cloud Learning via State Space Model - achieves SOTA performance on ScanObjectNN, ModelNet40, and ShapeNetPart datasets (Wuhan University) - 3/1/24

The Hidden Attention of Mamba Models - Selective SSMs can be viewed as attention-driven models (Tel Aviv University) - 3/3/24

Theoretical Foundations of Deep Selective State-Space Models - Rough Path Theory shows Mamba captures non-linear interactions between tokens at distinct timescales (Imperial College London, Institute Tubingen, Oxford) - 3/4/24

Caduceus - Bi-Directional Equivariant Long-Range DNA Sequence Modeling (Cornell, Princeton, Carnegie Mellon) - 3/5/24

DenseMamba - State Space Models with Dense Hidden Connection for Efficient Large Language Models (Huawei) - 3/5/24

MedMamba - Vision Mamba for Medical Image Classification (Guangzhou Medical University) - 3/6/24

Mamba4Rec - Towards Efficient Sequential Recommendation with Selective State Space Models (Texas A&M, Shanghai JT Univ) - 3/6/24

MambaLithium - Selective state space model for remaining-useful-life, state-of-health, and state-of-charge estimation of lithium-ion batteries (Ji Hua Lab Guangdong China) - 3/8/24

MamMIL - Multiple Instance Learning for Whole Slide Images with State Space Models (Tsinghua Univ) - 3/8/24

Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy - motion guided prediction head with Mamba (Chinese Univ of Hong Kong) - 3/8/24

APRICOT-Mamba - Acuity Prediction in Intensive Care Unit (ICU) Life Sustaining Therapies Prediction Model (University of Florida, Stanford) - 3/8/24

ClinicalMamba - A Generative Clinical Language Model on Longitudinal Clinical Notes (Amherst, Univ of Mass Lowell) - 3/9/24

nnMamba - 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model (Shenzhen Research Institute of Big Data) - 3/10/24

MambaMIL - Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology (Hong Kong UST) - 3/11/24

LightM-UNet - Mamba Assists in Lightweight UNet for Medical Image Segmentation (Ministry of Education Beijing, Peking U) - 3/11/24

A multi-cohort study on prediction of acute brain dysfunction states - uses selective state space models (University of Florida) - 3/11/24

Universality of Linear Recurrences Followed by Non-linear Projections - Finite-Width Guarantees and Benefits of Complex Eigenvalues (Inst Tubingen, DeepMind) - 3/11/24

Large Window-based Mamba UNet for Medical Image Segmentation - Beyond Convolution and Self-attention (Zhejiang Univ, U of Illinois UC, Notre Dame) - 3/12/24

VideoMamba - State Space Model for Efficient Video Understanding (Shanghai AI Lab, Shenzhen CAS) - 3/12/24

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers - achieves speech enhancement performance for static and moving speakers (Zhejiang University) - 3/12/24

MD-Dose - A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction (Chengdu Chinese Acad of Sci) - 3/13/24

Activating Wider Areas in Image Super-Resolution - single image super-resolution with Vim based models (JiaoTong Univ) - 3/13/24

TimeMachine - A Time Series is Worth 4 Mambas for Long-term Forecasting (Univ of Kentucky) - 3/14/24

Video Mamba Suite - State Space Model as a Versatile Alternative for Video Understanding (Nanjing U, Shanghai AI Lab) - 3/14/24

MambaTalk - Efficient Holistic Gesture Synthesis with Selective State Space Models (Tsinghua University) - 3/14/24

LocalMamba - Visual State Space Model with Windowed Selective Scan (U of Sydney) - 3/14/24

VM-UNET-V2 - Rethinking Vision Mamba UNet for Medical Image Segmentation (Nanjing University) - 3/14/24

On the low-shot transferability of [V]-Mamba - explores the transfer learning potential of [V]-Mamba (Quebec AI Institute) - 3/15/24

EfficientVMamba - Atrous Selective Scan for Light Weight Visual Mamba (Univ of Sydney) - 3/15/24

MiM-ISTD - Mamba-in-Mamba for Efficient Infrared Small Target Detection (Alibaba Cloud, USTC China) - 3/17/24

Is Mamba Effective for Time Series Forecasting? - saves GPU memory and training time (Northeastern University China) - 3/17/24

Point Mamba - A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy (Shanghai JT University) - 3/18/24

Motion Mamba - Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM (Monash U, Australian Natl Univ, MBZ Univ of AI, Carnegie Mellon) - 3/19/24

STG-Mamba - Spatial-Temporal Graph Learning via Selective State Space Model (U of New South Wales Australia) - 3/19/24

H-vmunet - High-order Vision Mamba UNet for Medical Image Segmentation (Shanghai University) - 3/20/24

VL-Mamba - Exploring State Space Models for Multimodal Learning (U of Adelaide, Chinese Academy of Sciences) - 3/20/24

ZigMa - Zigzag Mamba Diffusion Model (LMU Munich) - 3/20/24

ProMamba - Prompt-Mamba for polyp segmentation (Peking U) - 3/20/24

Music to Dance as Language Translation using Sequence Models - Mamba and the Transformer have a dance-off (University of Beira Interior Portugal) - 3/22/24

SiMBA - Simplified Mamba-Based Architecture for Vision and Multivariate Time series (Microsoft) - 3/22/24

Cobra - Extending Mamba to Multi-Modal for Efficient Inference (Westlake U, Zhejiang U) - 3/22/24

CMViM - Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for Alzheimer’s Classification (Hong Kong Polytechnic) - 3/25/24

Uncovering Selective State Space Model’s Capabilities in Lifelong Sequential Recommendation - RecMamba reduces training time by 70% and memory costs by 80% (Shandong Univ, Michigan State) - 3/25/24

State Space Models as Foundation Models - A Control Theoretic Overview (ETH Zurich Switzerland) - 3/25/24

Proprioception Is All You Need - Using Mamba for Terrain Classification for Boreal Forests (Northern Robotics Lab, U Laval Quebec City) - 3/25/24

VMRNN - Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting (Hong Kong UST) - 3/26/24

ReMamber - Referring Image Segmentation with Mamba Twister (Shanghai JTU, U of Nottingham) - 3/26/24

Rotate to Scan - UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation (Guangdong U of T) - 3/26/24

PlainMamba - Improving Non-Hierarchical Mamba in Visual Recognition (U of Edinburgh, UST China, Peking) - 3/26/24

Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion - Leveraging strengths from Mamba UNet for MRI segmentation (North South U of Bangladesh) - 3/26/24

Gamba - Marry Gaussian Splatting with Mamba for single view 3D reconstruction (National U of Singapore, Nanyang Tech U) - 3/27/24

RankMamba - Benchmarking Mamba’s Document Ranking Performance in the Era of Transformers (University of Utah) - 3/27/24

Dual-path Mamba - Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation (Columbia) - 3/27/24

Jamba - A Hybrid Transformer-Mamba Language Model (AI21 Labs) - 3/28/24

RSMamba - Remote Sensing Image Classification with State Space Model (Beihang Univ, Univ of Hong Kong) - 3/28/24

HARMamba - Efficient Wearable Sensor Human Activity Recognition Based on Bidirectional Selective SSM (NNNSF of China) - 3/29/24

Decision Mamba - Reinforcement Learning via Sequence Modeling with Selective State Spaces (AILab, Cyberagent, Japan) - 3/29/24

UltraLight VM-UNet - Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation (Shanghai Univ, Univ of Shanghai for Sci and Tech) - 3/29/24

T-Mamba - Frequency-Enhanced Gated Long-Range Dependency for Tooth 3D CBCT Segmentation (Huazhong Univ of ST, Univ of Hong Kong) - 4/1/24

SpikeMba - Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding (Harbin Institute of Technology, China) - 4/1/24

Samba - Semantic Segmentation of Remotely Sensed Images with State Space Model (University of Liverpool, Suzhou China & Liverpool) - 4/2/24

SPMamba - State-space model is all you need in speech separation (Tsinghua University, China) - 4/2/24

RS3Mamba - Visual State Space Model for Remote Sensing Images Semantic Segmentation (Chinese Univ of Hong Kong) - 4/3/24

RS-Mamba - for Large Remote Sensing Image Dense Prediction (Nanjing Univ) - 4/3/24

ChangeMamba - Remote Sensing Change Detection with Spatio-Temporal State Space Model (Univ of Tokyo, Wuhan Univ, Center for AIP) - 4/4/24

Locating and Editing Factual Associations in Mamba - rank-one model editing can successfully insert facts at specific locations (Northeastern University) - 4/4/24

Sigma - Siamese Mamba Network for Multi-Modal Semantic Segmentation (Carnegie Mellon, Dalian Univ of Tech, China) - 4/5/24

3DMambaIPF - A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering (Fudan U Shanghai, Nanyang Tech U Singapore) - 4/8/24

Does Transformer Interpretability Transfer to RNNs? - transformer interpretability techniques work for RNNs (EleutherAI) - 4/9/24

RhythmMamba - Fast Remote Physiological Measurement with Arbitrary Length Videos (UST Beijing) - 4/9/24

MambaAD - Exploring State Space Models for Multi-class Unsupervised Anomaly Detection (Zhejiang Univ, Tencent) - 4/9/24

3DMambaComplete - Exploring Structured State Space Model for Point Cloud Completion (Fudan U Shanghai) - 4/10/24

Simba - Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos (IIT Odisha & West Bengal India) - 4/11/24

ViM-UNet - Vision Mamba for Biomedical Segmentation (University of Göttingen) - 4/11/24

DGMamba - Domain Generalization via Generalized State Space Model (Shanghai JT Univ, Skywork AI) - 4/11/24

FusionMamba - Efficient Image Fusion with State Space Model (UCAS) - 4/11/24

SurvMamba - State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction (Xiamen Univ) - 4/11/24

MambaDFuse - A Mamba-based Dual-phase Model for Multi-modality Image Fusion (Harbin Engineering Univ) - 4/12/24

SpectralMamba - Efficient Mamba for Hyperspectral Image Classification (Aerospace Info Research Inst, Beijing) - 4/12/24

Fusion-Mamba - for Cross-modality Object Detection (Beihang Univ, ECNU, Tencent) - 4/14/24

A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion - local-enhanced vision Mamba block (UESTC Chengdu, China) - 4/14/24

FreqMamba - Viewing Mamba from a Frequency Perspective for Image Deraining (UST of China) - 4/15/24

FusionMamba - Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba (Great Bay U, Hong Kong Polytech, China) - 4/15/24

State Space Model for New-Generation Network Alternative to Transformers - A Survey (School of AI, Anhui U) - 4/15/24

HSIDMamba - Exploring Bidirectional State-Space Models for Hyperspectral Denoising (Inst of AI China) - 4/15/24

Text-controlled Motion Mamba - Text-Instructed Temporal Grounding of Human Motion (Peking Univ) - 4/17/24

CU-Mamba - Selective State Space Models with Channel Learning for Image Restoration (Stanford, KREA AI) - 4/17/24

Vim4Path - Self-Supervised Vision Mamba for Histopathology Images (Concordia Univ, U of Montreal) - 4/20/24

ST-SSMs - Spatial-Temporal Selective State of Space Model for Traffic Forecasting (University of Sydney) - 4/20/24

MambaUIE&SR - Unraveling the Ocean’s Secrets with Only 2.8 FLOPs (Beijing Info Sci & Tech Univ) - 4/22/24

Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting - combine Mamba and Transformer architecture in time series data (Illinois Inst of Tech) - 4/23/24

Mamba3D - Enhancing Local Features for 3D Point Cloud Analysis via State Space Model (Huazhong Univ of ST) - 4/23/24

Too many to count; here are the rest of them.

Mamba Repos

S4 - The original S4 repo - Apache 2.0 license

Mamba - The original mamba repo - Apache 2.0 license

Mamba chat - fine-tuned for chat on 16k samples from the HF ultrachat_200k dataset - Apache 2.0 license

Mamba on HF - 130m, 370m, 790m, 1.4b, 2.8b, 2.8b-slimpj models on Hugging Face - Apache 2.0 license

Striped Hyena - a hybrid architecture composed of multi-head, grouped-query attention and gated convolutions arranged in Hyena blocks - Apache 2.0 license

I’ve also recently given my thoughts on the new LLM framework DSPy.