Provider: Keras / Google Research
License: Open-weight for research purposes (usage terms defined on Kaggle)
Access: Hosted on Kaggle Models
Architecture: Decoder-only transformer implemented in Keras 3
Framework: Keras 3 with the JAX backend on TPU
🔍 Overview
Gemma 3 (Keras) is an open, research-focused large language model built on the Keras 3 API with the JAX backend. It serves as an educational and experimental platform for those exploring advanced LLM development and TPU-accelerated training on Google’s infrastructure.
This model emphasizes:
- 🧪 Experimental Design: Not production-ready, but structured for reproducible LLM prototyping
- ⚙️ Keras 3 Integration: Demonstrates declarative model building with modern tooling
- ⚡ TPU-Ready Workflows: Enables efficient training via JAX/XLA on Google Cloud TPUs
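The JAX/XLA workflow these bullets describe can be sketched minimally. This is an illustrative example (not taken from the Gemma codebase): a `jax.jit`-compiled training step on a tiny linear model with synthetic data, showing how the same Python code is XLA-compiled and runs unchanged on CPU, GPU, or TPU.

```python
# Illustrative sketch of a JAX/XLA training loop; the tiny linear model
# and synthetic data are assumptions for demonstration, not Gemma code.
import jax
import jax.numpy as jnp

LR = 0.1  # learning rate (illustrative constant)

def loss_fn(params, x, y):
    # Simple linear model: predictions = x @ w + b
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # XLA-compiles the step; targets CPU, GPU, or TPU transparently
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    # Plain gradient descent over the parameter pytree
    return jax.tree_util.tree_map(lambda p, g: p - LR * g, params, grads)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 3))
true_w = jnp.array([[1.0], [-2.0], [0.5]])
y = x @ true_w
params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros((1,))}

for _ in range(200):
    params = train_step(params, x, y)

final_loss = float(loss_fn(params, x, y))
```

On a TPU VM the identical code runs accelerated; only the installed `jax` build changes, which is the portability point the workflow relies on.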
⚙️ Technical Highlights
- Architecture: GPT-style decoder with standard transformer layers
- Training Backend: Keras 3 with JAX (TPU support)
- Weights: Downloadable via Kaggle; also useful as a template for pretraining projects
- Tokenizer: SentencePiece-based
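To make the "GPT-style decoder" point concrete, here is a minimal NumPy sketch of single-head causal self-attention, the core operation of such a decoder layer. This is an illustrative reconstruction, not the actual Gemma implementation (which adds multi-head projections, normalization, and other details).

```python
# Illustrative single-head causal self-attention (not actual Gemma code).
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) -> (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # Causal mask: position i may attend only to positions <= i
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the unmasked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
```

The causal mask is what makes the block a *decoder*: perturbing a later token cannot change the outputs at earlier positions, which is the property autoregressive pretraining depends on.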
🚀 Deployment & Usage
- Model Access: Gemma 3 on Kaggle Models
- Training Recipes: Provided within the Kaggle Notebooks ecosystem
- Use Case: Research and benchmarking; not intended for direct production use