Data Ethics

This page describes how we source and handle the data used to train our models. No real user data is used for training.

No real user data

Our models are trained entirely on synthetic data and ethically sourced public datasets, with no customer data and no scraped content. When organizations deploy Rabbit, their data stays on their infrastructure and is never used for model training.

Synthetic generation methodology

We use a seed-and-expand approach: hand-crafted examples for each signal are expanded into thousands through controlled generation with the Claude API, then quality-filtered through automated and human review. Each training example is verified for format compliance, factual consistency, and diversity.
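The seed-and-expand flow can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the record shape (`text` and `signal` fields), the helper names, and `stub_generate`, which stands in for a real generation call so the example runs offline. It covers only the automated format and duplicate checks; factual-consistency checks and human review are not modeled.

```python
import json
from typing import Callable

# Hypothetical record shape: {"text": ..., "signal": ...}
Seed = dict[str, str]
Generator = Callable[[Seed, int], list[str]]


def expand_seeds(seeds: list[Seed], generate: Generator, per_seed: int) -> list[str]:
    """Ask the generator for `per_seed` raw candidates per seed example."""
    candidates: list[str] = []
    for seed in seeds:
        candidates.extend(generate(seed, per_seed))
    return candidates


def is_well_formed(raw: str) -> bool:
    """Format check: a JSON object with non-empty 'text' and 'signal' strings."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    return all(
        isinstance(record.get(key), str) and record[key].strip() != ""
        for key in ("text", "signal")
    )


def filter_candidates(candidates: list[str]) -> list[Seed]:
    """Keep well-formed candidates and drop near-verbatim duplicates.

    Duplicates are detected on case- and whitespace-normalized text, a
    deliberately crude proxy for a diversity check.
    """
    seen: set[str] = set()
    kept: list[Seed] = []
    for raw in candidates:
        if not is_well_formed(raw):
            continue
        record = json.loads(raw)
        key = " ".join(record["text"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


def stub_generate(seed: Seed, n: int) -> list[str]:
    """Offline stand-in for a model call: n variants plus two bad candidates."""
    variants = [
        json.dumps({"text": f"{seed['text']} (variant {i})", "signal": seed["signal"]})
        for i in range(n)
    ]
    noise = [
        "not valid json",  # fails the format check
        json.dumps({"text": seed["text"].upper() + " (VARIANT 0)",
                    "signal": seed["signal"]}),  # duplicate of variant 0
    ]
    return variants + noise
```

In a real pipeline the generator argument would wrap the model API call, and the records that survive `filter_candidates` would go on to human review.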

Continuous evaluation

Every model version is evaluated against held-out test sets, and the results are reviewed by humans. All benchmarks, training parameters, and model versions are published in our research log.

No data sharing across tenants

Each API key maps to an isolated storage namespace. Memories, embeddings, and knowledge graphs are strictly separated. No data from one tenant is ever accessible to another.
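One way to picture key-to-namespace isolation is the sketch below. It is an illustration under stated assumptions, not a description of the production system: the `TenantStore` class, the SHA-256-derived namespace ID, and the in-memory dictionaries are all hypothetical stand-ins for real storage.

```python
import hashlib
from typing import Optional


class TenantStore:
    """Toy store in which every read and write is scoped to one namespace."""

    def __init__(self) -> None:
        # namespace ID -> {memory ID -> stored value}
        self._namespaces: dict[str, dict[str, str]] = {}

    @staticmethod
    def namespace_for(api_key: str) -> str:
        """Derive a stable namespace ID from the API key (assumed scheme)."""
        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:16]

    def put(self, api_key: str, memory_id: str, value: str) -> None:
        """Write a value into the caller's own namespace only."""
        namespace = self._namespaces.setdefault(self.namespace_for(api_key), {})
        namespace[memory_id] = value

    def get(self, api_key: str, memory_id: str) -> Optional[str]:
        """Read from the caller's own namespace; other namespaces are unreachable."""
        return self._namespaces.get(self.namespace_for(api_key), {}).get(memory_id)


store = TenantStore()
store.put("key-tenant-a", "m1", "prefers dark mode")
# The same memory ID under a different key resolves to a different namespace.
other_tenant_view = store.get("key-tenant-b", "m1")
```

The point of the design is that no method accepts a namespace directly. Callers can only supply an API key, so there is no code path that addresses another tenant's data.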

Right to deletion

Any user can delete their memories at any time through the API. Deletion is permanent and takes effect immediately across all storage layers, including vector indices and the knowledge graph.
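As a rough illustration of deleting across layers, the sketch below removes a memory from three in-memory structures in a single call. The `MemoryLayers` class, its fields, and the edge representation are hypothetical and chosen only to make the idea concrete; the actual API and storage engines are not shown here.

```python
class MemoryLayers:
    """Toy model of three storage layers that all reference one memory ID."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}              # raw memory text
        self.vector_index: dict[str, list[float]] = {}   # embeddings by memory ID
        self.graph_edges: set[tuple[str, str, str]] = set()  # (source, relation, target)

    def add(self, memory_id: str, text: str, embedding: list[float],
            edges: list[tuple[str, str, str]]) -> None:
        """Store a memory in every layer."""
        self.documents[memory_id] = text
        self.vector_index[memory_id] = embedding
        self.graph_edges.update(edges)

    def delete(self, memory_id: str) -> bool:
        """Remove the memory from every layer; return whether it existed."""
        existed = memory_id in self.documents
        self.documents.pop(memory_id, None)
        self.vector_index.pop(memory_id, None)
        # Drop every graph edge that starts or ends at the deleted memory.
        self.graph_edges = {
            edge for edge in self.graph_edges
            if memory_id not in (edge[0], edge[2])
        }
        return existed


layers = MemoryLayers()
layers.add("m1", "prefers dark mode", [0.1, 0.2], [("m1", "about", "ui")])
layers.add("m2", "uses a laptop", [0.3, 0.4], [("m2", "related_to", "m1")])
was_present = layers.delete("m1")
```

After the call, nothing in any layer refers to `m1`, including the edge from `m2` that pointed at it. A second `delete("m1")` is a harmless no-op that returns `False`.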