Provider: Meta AI
License: Apache 2.0 (permissive open-source license)
Access: Open weights on Hugging Face and GitHub
Architecture: Vision Transformer (ViT-H) segmentation model


🔍 Overview

Segment Anything Model (SAM) is a general-purpose image segmentation foundation model developed by Meta AI. Unlike traditional segmentation models that require task-specific training, SAM can generate accurate segmentation masks for almost any object in an image using simple prompts such as points, boxes, or masks.

SAM is trained on SA-1B, one of the largest segmentation datasets ever created, containing over 1.1 billion masks across 11 million licensed, privacy-respecting images.

Key strengths:

  • 🖼️ Promptable segmentation using points, bounding boxes, or masks
  • ⚡ Zero-shot capability across diverse image domains
  • 🧠 Foundation model for vision pipelines
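Prompts are given in the coordinate frame of the original image; under the hood, SAM's preprocessing resizes the image so its longest side is 1024 pixels, and prompt coordinates are rescaled by the same factor. The helper below is a minimal illustrative sketch of that rescaling (`rescale_prompt` is a hypothetical name, not part of SAM's public API):

```python
def rescale_prompt(points, orig_size, target_longest=1024):
    """Scale (x, y) prompt coordinates from the original image frame
    into SAM's resized input frame (longest side -> target_longest).

    points: list of (x, y) tuples; orig_size: (height, width).
    Illustrative helper only, not SAM's actual preprocessing code.
    """
    h, w = orig_size
    scale = target_longest / max(h, w)
    return [(x * scale, y * scale) for x, y in points]

# A point at (x=450, y=600) in a 1200x900 (h x w) image
# lands at (384.0, 512.0) in the 1024-longest-side frame.
print(rescale_prompt([(450, 600)], (1200, 900)))
```

In practice the 🤗 Transformers `SamProcessor` performs this rescaling for you; the sketch only shows why prompts can be specified in original-image pixels.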

⚙️ Technical Specs

  • Architecture: Vision Transformer (ViT-H)
  • Model Type: Promptable image segmentation
  • Training Dataset: SA-1B dataset
  • Embedding Backbone: Transformer-based image encoder
  • Prompt Inputs: points, bounding boxes, masks

🚀 Deployment

  • Hugging Face Repo: https://huggingface.co/facebook/sam-vit-huge
  • Frameworks: PyTorch, 🤗 Transformers
  • Use Cases: dataset labeling, robotics perception, image editing, visual analytics
  • Hardware: GPU recommended for large-scale inference
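A minimal inference sketch with 🤗 Transformers is shown below. The lighter `facebook/sam-vit-base` checkpoint and a synthetic gray image are used here only to keep the example small; swap in `facebook/sam-vit-huge` (the repo above) and a real image for actual use:

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

# Lighter checkpoint to keep the sketch small; use
# "facebook/sam-vit-huge" for the full model.
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base")

# Synthetic 256x256 image stands in for real input.
image = Image.new("RGB", (256, 256), color="gray")

# One foreground point prompt, nested as [image][point set][(x, y)].
input_points = [[[128, 128]]]

inputs = processor(image, input_points=input_points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Resize predicted low-res mask logits back to the original resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks, inputs["original_sizes"], inputs["reshaped_input_sizes"]
)
print(masks[0].shape)        # (num_prompts, num_masks, H, W)
print(outputs.iou_scores)    # one predicted quality score per mask
```

SAM returns three candidate masks per prompt with predicted IoU scores, so downstream code typically keeps the highest-scoring mask.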

🔗 Resources