- Provider: Meta AI
- License: Apache 2.0 (permissive open-source license)
- Access: Open weights on Hugging Face and GitHub
- Architecture: Vision Transformer (ViT-H) segmentation model
## 📝 Overview
Segment Anything Model (SAM) is a general-purpose image segmentation foundation model developed by Meta AI. Unlike traditional segmentation models that require task-specific training, SAM can generate accurate segmentation masks for almost any object in an image using simple prompts such as points, boxes, or masks.
SAM is trained on SA-1B, one of the largest segmentation datasets ever created, containing 1.1 billion masks across 11 million images.
Key strengths:
- 🖼️ Promptable segmentation using points, bounding boxes, or masks
- ⚡ Zero-shot capability across diverse image domains
- 🧠 Foundation model for vision pipelines
## ⚙️ Technical Specs
- Architecture: ViT-H image encoder paired with a lightweight prompt encoder and mask decoder
- Model Type: Promptable image segmentation
- Training Dataset: SA-1B dataset
- Embedding Backbone: Transformer-based image encoder
- Prompt Inputs: points, bounding boxes, masks
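The prompt inputs above follow simple array conventions in Meta's reference `segment-anything` package: points are (x, y) pixel coordinates with per-point labels (1 = foreground, 0 = background), and boxes are XYXY corner coordinates. A minimal sketch of those shapes (the coordinate values here are illustrative, not from the source):

```python
import numpy as np

# Prompt conventions used by SamPredictor.predict in Meta's
# segment-anything package:
#   point_coords: (N, 2) array of (x, y) pixel coordinates
#   point_labels: (N,)   array, 1 = foreground, 0 = background
#   box:          (4,)   array [x_min, y_min, x_max, y_max] (XYXY)
input_points = np.array([[450, 600], [200, 300]], dtype=np.float32)
input_labels = np.array([1, 0], dtype=np.int32)  # keep 1st point, exclude 2nd
input_box = np.array([100, 100, 700, 900], dtype=np.float32)

print(input_points.shape, input_labels.shape, input_box.shape)
```

A mask prompt (a low-resolution prior mask from a previous prediction) can be supplied alongside these to refine the output iteratively.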
## 🚀 Deployment
- Hugging Face Repo: https://huggingface.co/facebook/sam-vit-huge
- Frameworks: PyTorch, 🤗 Transformers
- Use Cases: dataset labeling, robotics perception, image editing, visual analytics
- Hardware: GPU recommended for large-scale inference
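A minimal sketch of point-prompted inference through the 🤗 Transformers API (the function name `segment_with_point` is ours; the ViT-H checkpoint is ~2.4 GB and downloads on first use, so the heavy imports are deferred into the function body):

```python
def segment_with_point(image, point_xy):
    """Segment the object at pixel (x, y) of a PIL image with SAM ViT-H.

    Sketch only: requires `pip install torch transformers` and downloads
    the facebook/sam-vit-huge weights on first call.
    """
    import torch
    from transformers import SamModel, SamProcessor

    processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")
    model = SamModel.from_pretrained("facebook/sam-vit-huge")

    # One image, one point prompt: nesting is [image][prompt point](x, y).
    inputs = processor(image, input_points=[[list(point_xy)]],
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Upscale the low-resolution mask logits back to the original image size.
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks,
        inputs["original_sizes"],
        inputs["reshaped_input_sizes"],
    )
    # Boolean masks plus SAM's predicted quality (IoU) score per mask.
    return masks[0], outputs.iou_scores
```

SAM returns multiple candidate masks per prompt; the returned IoU scores let a pipeline pick the best one automatically, which is what makes it practical for large-scale dataset labeling.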