Research
Data Ethics
How we source and handle training data. No real user data is used for training.
No real user data
Our models are trained entirely on synthetic data and ethically sourced public datasets. We use no customer data and no scraped content. When organizations deploy Rabbit, their data stays on their infrastructure and is never used for model training.
Synthetic generation methodology
We use a seed-and-expand approach: a set of hand-crafted examples for each signal is expanded into thousands of examples using controlled generation with the Claude API, then quality-filtered through automated and human review. Each training example is checked for format compliance, factual consistency, and diversity.
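The automated part of this pipeline can be sketched roughly as follows. This is a minimal illustration, not our production code: the record schema (`signal`, `text`), the function names, and the `fake_generate` stand-in for the model call are all assumptions made for the example, and the diversity check is reduced to a simple near-duplicate filter. Factual consistency and human review are not modeled here.

```python
import json
from typing import Callable

# Hypothetical schema: every example must carry these keys as non-empty strings.
REQUIRED_KEYS = {"signal", "text"}


def is_format_compliant(raw: str) -> bool:
    """Accept only JSON objects with every required key set to a non-empty string."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    return all(
        isinstance(record.get(key), str) and bool(record[key].strip())
        for key in REQUIRED_KEYS
    )


def expand_and_filter(
    seeds: list[dict],
    generate: Callable[[dict, int], list[str]],
    per_seed: int,
) -> list[dict]:
    """Expand each seed into candidates, then drop malformed and duplicate ones."""
    kept: list[dict] = []
    seen: set[str] = set()
    for seed in seeds:
        for raw in generate(seed, per_seed):
            if not is_format_compliant(raw):
                continue  # format compliance check
            record = json.loads(raw)
            # Normalize case and whitespace so trivial rewrites count as duplicates.
            fingerprint = " ".join(record["text"].lower().split())
            if fingerprint in seen:
                continue  # crude diversity check
            seen.add(fingerprint)
            kept.append(record)
    return kept


def fake_generate(seed: dict, n: int) -> list[str]:
    """Stand-in for a model call (assumption): deterministic candidates,
    plus one malformed output and one near-duplicate to exercise the filters."""
    out = [
        json.dumps({"signal": seed["signal"], "text": f"{seed['text']} (variant {i})"})
        for i in range(n)
    ]
    out.append("not json")
    out.append(
        json.dumps({"signal": seed["signal"], "text": f"{seed['text']}  (Variant 0)"})
    )
    return out


seeds = [{"signal": "preference", "text": "I prefer tea"}]
examples = expand_and_filter(seeds, fake_generate, per_seed=3)
```

In this sketch the generator is passed in as a function, so the filtering logic can be tested without calling any external API.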
Continuous evaluation
Every version is evaluated against held-out test sets with human review. All benchmarks, training parameters, and model versions are published transparently in our research log.
No data sharing across tenants
Each API key maps to an isolated storage namespace. Memories, embeddings, and knowledge graphs are strictly separated. No data from one tenant is ever accessible to another.
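As a rough illustration of key-to-namespace isolation, the sketch below derives a namespace from each API key and scopes every read and write to it. The class name `TenantStore`, the SHA-256 derivation, and the in-memory dictionaries are assumptions chosen for the example; they do not describe the actual storage backend.

```python
import hashlib
from typing import Optional


class TenantStore:
    """Toy in-memory store where every operation is scoped to one namespace."""

    def __init__(self) -> None:
        # namespace -> {memory_id -> value}
        self._namespaces: dict[str, dict[str, str]] = {}

    @staticmethod
    def namespace_for(api_key: str) -> str:
        # Hypothetical derivation: a stable digest of the key, so the raw key
        # is never used as a storage identifier.
        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:16]

    def put(self, api_key: str, memory_id: str, value: str) -> None:
        namespace = self.namespace_for(api_key)
        self._namespaces.setdefault(namespace, {})[memory_id] = value

    def get(self, api_key: str, memory_id: str) -> Optional[str]:
        # Lookups never leave the caller's own namespace.
        namespace = self.namespace_for(api_key)
        return self._namespaces.get(namespace, {}).get(memory_id)


store = TenantStore()
store.put("key-tenant-a", "m1", "tenant A memory")
own_read = store.get("key-tenant-a", "m1")    # the owner sees its data
cross_read = store.get("key-tenant-b", "m1")  # another tenant sees nothing
```

Because the namespace is resolved inside every call, there is no code path in this sketch that takes a namespace from the caller directly.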
Right to deletion
Any user can delete their memories at any time through the API. Deletion is permanent and takes effect immediately across all storage layers, including vector indices and the knowledge graph.
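To show what deleting across all layers involves, here is a minimal sketch in which one call removes a memory from a record store, a vector index, and a set of graph edges. The `MemoryLayers` class, its fields, and the edge format are hypothetical and stand in for the real storage layers.

```python
from typing import Iterable


class MemoryLayers:
    """Toy model of three storage layers that must stay consistent on delete."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}          # memory_id -> text
        self.vectors: dict[str, list[float]] = {}  # memory_id -> embedding
        self.edges: set[tuple[str, str, str]] = set()  # (source, relation, target)

    def add(
        self,
        memory_id: str,
        text: str,
        embedding: list[float],
        edges: Iterable[tuple[str, str, str]] = (),
    ) -> None:
        self.records[memory_id] = text
        self.vectors[memory_id] = embedding
        self.edges.update(edges)

    def delete(self, memory_id: str) -> bool:
        """Remove the memory from every layer; return True if it existed."""
        existed = memory_id in self.records
        self.records.pop(memory_id, None)
        self.vectors.pop(memory_id, None)
        # Drop every graph edge that starts or ends at the deleted memory.
        self.edges = {
            edge for edge in self.edges if memory_id not in (edge[0], edge[2])
        }
        return existed


layers = MemoryLayers()
layers.add("m1", "likes tea", [0.1, 0.2], [("m1", "mentions", "tea")])
layers.add("m2", "drinks tea daily", [0.3, 0.4], [("m2", "related_to", "m1")])
first_delete = layers.delete("m1")
second_delete = layers.delete("m1")  # deleting again is a harmless no-op
```

Note that the edge from `m2` to `m1` is removed as well, so no dangling reference to the deleted memory remains in the graph.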