Models & Limits

Available models

Each model has a daily completion limit, and completions are currently rate-limited to 20 requests per minute. If you need a higher quota, increased rate limits, or support for a new model, feel free to reach out to us with your use case.

For each vertical, we offer two models: kirha and kirha-flash. A model ID is the model name followed by a colon and the vertical. For example, the crypto vertical has two models: kirha:crypto and kirha-flash:crypto.

  • kirha is our primary model, optimized for high-quality, nuanced, and context-aware responses. It’s ideal for complex tasks that require reasoning, depth, or extended context. It is currently based on OpenAI GPT-4.1 and limited to 150 completions per day.

  • kirha-flash is a faster, lightweight variant designed for low-latency use cases. It’s best suited for high-throughput applications or situations where response time is critical. It is currently based on Gemini 2.5 Flash and limited to 500 completions per day.
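The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the helper name build_model_id and the KNOWN_MODELS and KNOWN_VERTICALS sets are hypothetical and not part of any Kirha SDK, and the only vertical listed is the one this page mentions.

```python
# Hypothetical helper, not part of any official client library.
KNOWN_MODELS = {"kirha", "kirha-flash"}
KNOWN_VERTICALS = {"crypto"}  # the only vertical named on this page


def build_model_id(model: str, vertical: str) -> str:
    """Join a model name and a vertical into a '<model>:<vertical>' ID."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if vertical not in KNOWN_VERTICALS:
        raise ValueError(f"unknown vertical: {vertical!r}")
    return f"{model}:{vertical}"


if __name__ == "__main__":
    print(build_model_id("kirha", "crypto"))
    print(build_model_id("kirha-flash", "crypto"))
```

Validating against a fixed set catches typos before a request is sent, at the cost of needing an update whenever a new vertical becomes available.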

Model ID              Quota
kirha:crypto          150 completions per day and 20 requests per minute
kirha-flash:crypto    500 completions per day and 20 requests per minute
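To stay within these quotas, a client can track its own usage before sending a request. The sketch below is one possible approach, not an official mechanism: the QuotaTracker class is hypothetical, and it assumes a sliding 60-second window and a daily counter that the caller resets manually, since this page does not say how the service measures the minute or when the day rolls over.

```python
import time
from collections import deque

# Limits copied from the quota table; everything else here is an assumption.
DAILY_LIMITS = {"kirha:crypto": 150, "kirha-flash:crypto": 500}
REQUESTS_PER_MINUTE = 20


class QuotaTracker:
    """Client-side bookkeeping for one model ID (hypothetical helper)."""

    def __init__(self, model_id: str, clock=time.monotonic):
        self.daily_limit = DAILY_LIMITS[model_id]
        self.clock = clock          # injectable so the logic can be tested
        self.recent = deque()       # timestamps of requests in the last 60 s
        self.used_today = 0

    def try_acquire(self) -> bool:
        """Return True and record the request if both limits allow it."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if self.used_today >= self.daily_limit:
            return False
        if len(self.recent) >= REQUESTS_PER_MINUTE:
            return False
        self.recent.append(now)
        self.used_today += 1
        return True

    def reset_day(self) -> None:
        """Call when the daily quota renews (the timing is not documented here)."""
        self.used_today = 0
```

A caller would check tracker.try_acquire() before each completion request and wait or back off when it returns False. The server remains the source of truth, so a rejected request should still be handled gracefully.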
