{"data":[{"model_id":"claude-fable-5","model_name":"Claude Fable 5","developer_id":2,"desc":"（The official channels have temporarily suspended all access to the model.）Anthropic's most capable widely released model, for the most demanding reasoning and long-horizon agentic work（This model is extremely expensive and is not recommended for casual use.）","pricing":{"cache_read":1.1,"cache_write":13.75,"input":11,"output":55},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":1000000},{"model_id":"claude-opus-4-8","model_name":"Claude Opus 4.8","developer_id":2,"desc":"Claude Opus 4.8 is Anthropic’s newest and most powerful publicly available model. It is suitable for the most complex tasks. It is Anthropic’s strongest model for complex reasoning, long-horizon agent programming, and highly autonomous work.claude-opus-4-8 does not display thought content by default; you need to set the additional parameter \"display\": \"summarized\" to enable it.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"gemini-3.5-flash","model_name":"Gemini 3.5 Flash","developer_id":8,"desc":"Gemini 3.5 Flash provides sustained frontier-level intelligence optimized for real-world tasks at a higher speed and lower cost. Designed for the agentic era, it excels at sub-agent deployment, multi-step workflows, and long-horizon tasks at scale. This model is particularly effective for rapid agentic loops involving complex coding cycles and iterations.","pricing":{"cache_read":1.5,"input":1.5,"output":9},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,web,deepsearch,long_context","input_modalities":"text,image,audio,video","endpoints":"chat_completions,gemini_api,claude_api","max_output":64000,"context_length":1000000},{"model_id":"grok-build-0.1","model_name":"Grok Build 0.1","developer_id":9,"desc":"Fast coding model trained specifically for agentic coding workflows.","pricing":{"cache_read":0.2,"input":1,"output":2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"gemini-3.1-flash-image","model_name":"Gemini 3.1 Flash Image","developer_id":8,"desc":"gemini-3.1-flash-image (Nano Banana 2) features professional-grade visual intelligence, lightning-fast efficiency, and realistic, grounded generative capabilities. This model serves as the high-efficiency counterpart to Gemini 3 Pro Image, optimized for speed and high-volume developer use cases.","pricing":{"cache_read":0.5,"input":0.5,"output":3},"types":"image_generation,llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"kimi-k2.7-code","model_name":"Kimi K2.7 Code","developer_id":15,"desc":"Kimi K2.7 Code is Kimi’s most intelligent Coding model, capable of completing programming tasks with higher success rates in long context. It features a native multimodal architecture that supports text, image, video input, and thinking modes, and dialogue and agent tasks.","pricing":{"cache_read":0.160835,"input":0.95,"output":3.9995},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32768,"context_length":262144},{"model_id":"step-3.7-flash-free","model_name":"Step 3.7 Flash (free)","developer_id":16,"desc":"step-3.7-flash-free is the free, publicly available version of step-3.7-flash, offering the same model capabilities with usage limits in place to ensure service stability. Limits include up to 10 requests per minute, a maximum of 200 requests per day, and a daily quota of 2,000,000 tokens. Free usage is based on shared capacity and is limited in availability. This version is intended for testing and light usage; for consistent and reliable access, please switch to the paid model.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":256000},{"model_id":"gemini-3-pro-image","model_name":"Gemini 3 Pro Image","developer_id":8,"desc":"Gemini-3-Pro-Image (Nano Banana Pro) is a high-performance image generation and editing model built on Gemini 3 Pro. It delivers enhanced multimodal understanding and real-world semantic reasoning, enabling fast creation of well-structured visual content such as infographics, product sketches, and multi-subject scenes. It can also leverage real-time knowledge through Search grounding. The model excels in text rendering, consistent multi-image blending, and identity preservation, while offering fine-grained creative controls like localized edits, lighting and focus adjustments, camera transformations, and flexible aspect ratios. It’s ideal for rapid design, concept previews, product visualization, and everyday image generation workflows.","pricing":{"cache_read":2,"input":2,"output":12},"types":"image_generation,llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"hy3-preview","model_name":"Hy3 Preview","developer_id":32,"desc":"Hunyuan Hy3 preview is designed for agent workloads, adopting a MoE architecture with 295B capacity and 21B activated parameters. It provides three modes within the same model—no_think (ultra-fast response), think_low (fast thinking), and think_high (deep reasoning)—to accommodate different latency and depth requirements from high-frequency interactions to complex engineering tasks. On code benchmarks such as SWE-bench Verified it approaches the current state of the art, and its 256K context supports cross-file code refactoring and long-document analysis. It is suitable for developers who require reliable task completion while being sensitive to inference costs.","pricing":{"cache_read":0.051,"input":0.17,"output":0.566661},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text","endpoints":"","max_output":128000,"context_length":256000},{"model_id":"minimax-m3","model_name":"MiniMax M3","developer_id":18,"desc":"The MiniMax M3 is a flagship programming model built for real-world productivity. As a production-grade model natively designed for Agent scenarios, it has achieved state-of-the-art (SOTA) performance in coding, agentic tool use, search, and office work.","pricing":{"input":0.288,"output":1.152},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":192000,"context_length":204800},{"model_id":"qwen3.7-plus","model_name":"Qwen3.7 Plus","developer_id":13,"desc":"The Qwen 3.7 series' mid-to-high cost-performance \"Plus\" model builds on strong text capabilities with a comprehensive upgrade to vision-language abilities, while retaining full agent capabilities in coding, tool use, and productivity workflows. Its core features are multimodal interactive hybrid agent capabilities, able to perceive real-world scenes, read screens and operate GUIs, generate code based on visual references, and provide end-to-end navigation of mobile applications.","pricing":{"cache_read":0.0564,"cache_write":0.3525,"input":0.282,"output":1.128},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"step-3.7-flash","model_name":"Step 3.7 Flash","developer_id":16,"desc":"step-3.7-flash is stepfun's flagship inference model（This model is available at a limited-time 90% discount; everyone is welcome to try it.）, designed for high-complexity tasks that require deep reasoning and fast execution. It excels at decomposing multi-step problems, performing tool calls, and maintaining consistency across massive datasets. It is the preferred choice for complex workloads such as long-context agents, advanced software engineering, and end-to-end research automation.","pricing":{"cache_read":0.0044,"input":0.022,"output":0.132},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":256000},{"model_id":"claude-opus-4-8-think","model_name":"Claude Opus 4.8 Thinking","developer_id":2,"desc":"The claude-opus-4-8-think model has adaptive thinking mode pre-enabled; the default thinking intensity is \"medium\", and it can be invoked directly via the OpenAI unified API. The claude-opus-4-8 model does not display thinking content by default; you need to set the extra parameter \"display\": \"summarized\" to enable it.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"qwen3.7-max","model_name":"Qwen3.7 Max","developer_id":13,"desc":"The Max model, the largest and most capable in the Qwen3.7 series, is currently offering its pure-text model capabilities for trial. Qwen3.7 is a new-generation flagship model designed for the agent era; its core strengths lie in the breadth and depth of its agent capabilities: it performs excellently in programming, office and productivity tasks, and long-term autonomous execution.","pricing":{"cache_read":0.169,"cache_write":2.1125,"input":1.69,"output":5.07},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"gpt-image-2","model_name":"GPT Image 2","developer_id":12,"desc":"GPT-image-2 is OpenAI's latest cutting-edge image generation model. Key value adds include better performance, quality, editing controls, and face preservation.\n(Currently the model's image generation time may be \u003e5 minutes; it is recommended to set the client timeout to ≥10 minutes.)\nThe model supports high input_fidelity and adding/removing one aspect of the image while retaining others. This model includes improvements in aspect ratio, resolution, and editing capabilities.","pricing":{"cache_read":5,"input":5,"output":30},"types":"image_generation,llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"ernie-5.1","model_name":"ERNIE 5.1","developer_id":25,"desc":"ERNIE 5.1 is the latest model in the Wenxin series, with comprehensive upgrades to its foundational capabilities and significant improvements in agents, knowledge, reasoning, and deep search. This upgrade uses a decoupled fully-asynchronous reinforcement learning technique to specifically address challenges encountered as large models evolve toward agent-based autonomous decision-making, such as training–inference numerical bias, low utilization of heterogeneous resources, and global issues caused by long-tail effects. It is paired with scaled agent post-training techniques to enhance model capabilities and generalization, enabling a three-step collaboration of environment, expert, and fusion that both ensures training efficiency and significantly improves the model’s stability and performance on complex tasks.","pricing":{"cache_read":0.5634,"input":0.5634,"output":2.5353},"types":"llm","features":"thinking","input_modalities":"text,image,audio,video","endpoints":"","max_output":64000,"context_length":119000},{"model_id":"gemini-3.1-flash-lite","model_name":"Gemini 3.1 Flash Lite","developer_id":8,"desc":"gemini-3.1-flash-lite is currently Google's latest and most cost-effective model, optimized for large-scale agent-based tasks, translation, and simple data processing.","pricing":{"cache_read":0.25,"input":0.25,"output":1.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,web,deepsearch,long_context","input_modalities":"text,image,audio,video","endpoints":"chat_completions,gemini_api,claude_api","max_output":64000,"context_length":1000000},{"model_id":"grok-4.3","model_name":"Grok 4.3","developer_id":9,"desc":"Grok 4.3 is amongst the leading models in intelligence and well priced when comparing to other models of similar price. It's also notably fast, however very verbose. The model supports text and image input, outputs text, and has a 1m tokens context window.","pricing":{"cache_read":0.2,"input":1.25,"output":2.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":1000000,"context_length":1000000},{"model_id":"coding-glm-5.2","model_name":"Coding GLM 5.2","developer_id":5,"desc":"Only supports OpenAI-compatible formats.","pricing":{"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"happyhorse-1.0-i2v","model_name":"Happyhorse 1.0 I2v","developer_id":13,"desc":"HappyHorse-1.0-I2V supports image-to-video generation, featuring highly faithful dynamic visual generation capabilities, accurately understanding textual semantics and producing smooth, natural, and detail-rich high-quality videos.\n\n","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"happyhorse-1.0-r2v","model_name":"Happyhorse 1.0 R2v","developer_id":13,"desc":"HappyHorse-1.0-R2V supports reference-guided video generation, offering more stable subject and scene referencing, supports up to 9 reference images, and can accurately maintain creative intent to achieve stronger expressive capabilities.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"happyhorse-1.0-t2v","model_name":"Happyhorse 1.0 T2v","developer_id":13,"desc":"HappyHorse-1.0-T2V supports text-to-video generation, featuring highly faithful dynamic visual generation capabilities, accurately understanding textual semantics and producing smooth, natural, and detail-rich high-quality videos.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"happyhorse-1.0-video-edit","model_name":"Happyhorse 1.0 Video Edit","developer_id":13,"desc":"HappyHorse-1.0-Video-Edit supports video editing, allows editing videos via natural language commands, can reference up to 5 images to edit video elements locally or globally, and can accurately replicate video dynamics to achieve stronger expressive capabilities.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-5.5","model_name":"GPT 5.5","developer_id":12,"desc":"GPT-5.5 raises the baseline for complex production workflows. It’s a strong fit for coding use cases, tool-heavy agents, grounded assistants, long-context retrieval, product-spec-to-plan workflows, and customer-facing workflows where execution quality and response polish are critical.","pricing":{"cache_read":0.5,"input":5,"output":30},"types":"llm","features":"thinking,function_calling,structured_outputs,web,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":1050000},{"model_id":"gpt-5.5-pro","model_name":"GPT 5.5 Pro","developer_id":12,"desc":"Please note: this model is extremely expensive and very slow. If a request fails due to network issues, you may still be charged heavily; we cannot refund charges incurred by requests to this model.\nGPT-5.5 pro is available in the Responses API only to enable support for multi-turn model interactions before responding to API requests, and other advanced API features in the future. Since GPT-5.5 pro is designed to tackle tough problems, some requests may take several minutes to finish. To avoid timeout, please set a longer timeout duration. It is recommended to use this under good network conditions.","pricing":{"cache_read":30,"input":30,"output":180},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":1050000},{"model_id":"coding-minimax-m3-free","model_name":"Coding MiniMax M3 (free)","developer_id":18,"desc":"coding-minimax-m3-free is a free and open version offered by AIHubMix specifically for MiniMax users. To maintain stable service operations, the following usage limits apply: a maximum of 5 requests per minute, 500 total requests per day, and a daily quota of 1 million tokens.","pricing":{"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"deepseek-v4-flash","model_name":"DeepSeek V4 Flash","developer_id":7,"desc":"DeepSeek-V4 features an ultra-long context of one million characters and achieves leading performance domestically and in the open-source domain in agent capabilities, world knowledge, and reasoning.","pricing":{"cache_read":0.00308,"input":0.154,"output":0.308},"types":"llm","features":"tools,function_calling,structured_outputs,thinking","input_modalities":"text","endpoints":"","max_output":384000,"context_length":1000000},{"model_id":"deepseek-v4-pro","model_name":"DeepSeek V4 Pro","developer_id":7,"desc":"DeepSeek-V4 features an ultra-long context of one million characters and achieves leading performance domestically and in the open-source domain in agent capabilities, world knowledge, and reasoning.( Directly requesting deepseek-v4-pro will route you through the official discount channel.)","pricing":{"cache_read":0.003851,"input":0.464,"output":0.928},"types":"llm","features":"tools,function_calling,structured_outputs,thinking","input_modalities":"text","endpoints":"","max_output":384000,"context_length":1000000},{"model_id":"ernie-5.0","model_name":"ERNIE 5.0","developer_id":25,"desc":"ERNIE 5.0 is the next-generation natively multimodal foundation model in the ERNIE family. Built on a unified multimodal architecture, it jointly learns from text, images, audio, and video to deliver broad multimodal capabilities.\n\nERNIE 5.0 features significantly upgraded core capabilities and shows strong performance across benchmarks, with notable gains in multimodal understanding, instruction following, creative writing, factual accuracy, and agent planning with tool use.","pricing":{"cache_read":0.82192,"input":0.82192,"output":3.28768},"types":"llm","features":"thinking","input_modalities":"text,image,audio,video","endpoints":"","max_output":64000,"context_length":119000},{"model_id":"kimi-k2.6","model_name":"Kimi K2.6","developer_id":15,"desc":"Kimi K2.6 is Kimi's latest and most intelligent model, with stronger and more stable long-range code-writing capabilities, significantly improved instruction-following and self-correction abilities, and support for text, image, and video inputs, thinking and non-thinking modes, as well as dialogue and Agent tasks.The model has a context length of 256k, supports long-form thinking, and excels at deep reasoning.","pricing":{"cache_read":0.160835,"input":0.95,"output":3.9995},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32768,"context_length":262144},{"model_id":"qwen3.6-max-preview","model_name":"Qwen3.6 Max Preview","developer_id":13,"desc":"The Max model Preview version, the largest and most capable model in the Qwen3.6 series, currently offers its pure-text model capabilities for trial. Compared with the previously released Qwen3-Max and Qwen3.6-Plus, this model further enhances vibe coding capabilities, executes coding agents more efficiently, significantly improves front-end programming and development capabilities, and further upgrades long-tail knowledge handling.","pricing":{"cache_read":0.1268,"cache_write":1.585,"input":1.268,"output":7.608},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text","endpoints":"","max_output":64000,"context_length":240000},{"model_id":"xiaomi-mimo-v2.5-pro","model_name":"Xiaomi Mimo V2.5 Pro","developer_id":31,"desc":"MiMo-V2.5-Pro is Xiaomi's most powerful model to date. In areas such as general agent capabilities, complex software engineering, and long-horizon tasks, it can now directly compete with the world's top agent models (Claude Opus 4.6, GPT-5.4). Compared with the previous-generation MiMo-V2-Pro, it achieves an all-around leap forward.","pricing":{"cache_read":0.00384,"input":0.48,"output":0.96},"types":"llm","features":"web","input_modalities":"text","endpoints":"","max_output":0,"context_length":1000000},{"model_id":"xiaomi-mimo-v2.5","model_name":"Xiaomi Mimo V2.5","developer_id":31,"desc":"MiMo-V2.5 is a native, fully multimodal large model designed for agent scenarios; it can see, hear, and read, and translate understanding into action. It has over 1 trillion total parameters (42B active parameters), employs an innovative hybrid-attention architecture, and supports an ultra-long 1M context length. Built on a powerful model base, we continuously scale compute across broader agent scenarios, further expanding the agent’s action space and achieving an important generalization from coding to claw.","pricing":{"cache_read":0.0031,"input":0.155,"output":0.31},"types":"llm","features":"web","input_modalities":"text,image,video,audio","endpoints":"","max_output":0,"context_length":256000},{"model_id":"qwen3.6-27b","model_name":"Qwen3.6 27B","developer_id":13,"desc":"The Qwen3.6 series 27B native vision-language Dense model. Compared with the 3.5-27B, the model notably improves Agentic coding capability and further enhances STEM and reasoning abilities; on the visual modality side, spatial intelligence, object localization and detection capabilities are significantly strengthened, and video understanding, document OCR, and visual agent capabilities have steadily improved.","pricing":{"input":0.422,"output":2.532},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":254000},{"model_id":"qwen3.6-35b-a3b","model_name":"Qwen3.6 35B A3B","developer_id":13,"desc":"Qwen 3.6, the native vision-language Plus series model, demonstrates outstanding performance comparable to the current top cutting-edge models, with a significant improvement over the 3.5 series. The model's capabilities have been markedly enhanced in Agentic coding, front-end programming, Vibe coding and other coding abilities, universal multimodal recognition, OCR, object localization, and more.","pricing":{"cache_read":0.254,"input":0.254,"output":1.524},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":254000},{"model_id":"qwen3.6-flash","model_name":"Qwen3.6 Flash","developer_id":13,"desc":"Qwen 3.6, the native vision-language Plus series model, demonstrates outstanding performance comparable to the current top cutting-edge models, with a significant improvement over the 3.5 series. The model's capabilities have been markedly enhanced in Agentic coding, front-end programming, Vibe coding and other coding abilities, universal multimodal recognition, OCR, object localization, and more.","pricing":{"cache_read":0.0169,"cache_write":0.21125,"input":0.169,"output":1.014},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"gpt-chat-latest","model_name":"GPT Chat","developer_id":12,"desc":"GPT Chat Latest points to OpenAI's stable API alias chat-latest that always resolves to the latest Instant chat model used in ChatGPT. As OpenAI rolls out new Instant model updates in the future, they are routed behind this slug automatically.","pricing":{"cache_read":0.5,"input":5,"output":30},"types":"llm","features":"thinking,function_calling,structured_outputs,web,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":1050000},{"model_id":"claude-opus-4-7","model_name":"Claude Opus 4.7","developer_id":2,"desc":"Claude Opus 4.7 is Anthropic’s latest and most powerful publicly available model. It has high autonomy and performs exceptionally well on long-horizon agent tasks, knowledge work, vision tasks, and memory tasks.  The claude-opus-4-7 model does not display thinking content by default; you need to set the extra parameter \"display\": \"summarized\" to enable it.This page summarizes all the new features at release.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"claude-opus-4-7-think","model_name":"Claude Opus 4.7 Thinking","developer_id":2,"desc":"The claude-opus-4-7-think model has adaptive thinking mode pre-enabled; the default thinking intensity is \"medium\", and it can be invoked directly via the OpenAI unified API. The claude-opus-4-7 model does not display thinking content by default; you need to set the extra parameter \"display\": \"summarized\" to enable it.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"image,text","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"cohere-rerank-v4.0-fast","model_name":"Cohere Rerank V4.0 Fast","developer_id":6,"desc":"Rerank 4 is the most advanced set of reranker models available today, purpose-built to meet the realities and challenges of enterprise AI search. It delivers best-in-class retrieval, outperforming the likes of MongoDB’s Voyage models and ElasticSearch’s Jina rerankers in overall search relevance, as well as improved latency, flexible deployment options, deep customizability and robust multilingual performance. Designed for business-critical applications across key industries and domains, Rerank 4 sets a new standard for accuracy and adaptability in enterprise search.","pricing":{"cache_read":0.068,"input":0.068,"output":0},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"cohere-rerank-v4.0-pro","model_name":"Cohere Rerank V4.0 Pro","developer_id":6,"desc":"Rerank 4 is the most advanced set of reranker models available today, purpose-built to meet the realities and challenges of enterprise AI search. It delivers best-in-class retrieval, outperforming the likes of MongoDB’s Voyage models and ElasticSearch’s Jina rerankers in overall search relevance, as well as improved latency, flexible deployment options, deep customizability and robust multilingual performance. Designed for business-critical applications across key industries and domains, Rerank 4 sets a new standard for accuracy and adaptability in enterprise search.","pricing":{"cache_read":0.068,"input":0.068,"output":0},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"mai-image-2e","model_name":"Mai Image 2e","developer_id":3,"desc":"MAI-Image-2-Efficient is designed for builders who need high-quality image generation at speed and scale.","pricing":{"cache_read":0,"input":5,"output":19.5},"types":"image_generation","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-image-2.0","model_name":"Qwen Image 2.0","developer_id":13,"desc":"The Qwen-Image-2.0 series accelerated models integrate image generation and image editing; they offer more professional text rendering with support for 1k-token instructions, finer realistic textures and delicate portrayal of photorealistic scenes, and stronger semantic adherence. The accelerated version effectively achieves an optimal balance between model quality and performance.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-image-2.0-pro","model_name":"Qwen Image 2.0 Pro","developer_id":13,"desc":"The Qwen-Image-2.0 full-powered models achieve the integration of image generation and image editing; they offer more professional text rendering with support for 1k-token instructions, more refined photorealistic textures and delicate depiction of realistic scenes, and stronger semantic adherence. The full-powered version delivers the strongest text rendering capability and realism in the 2.0 series.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-4-20-non-reasoning","model_name":"Grok 4 20","developer_id":9,"desc":"Grok 4.2 is xAI’s latest large language model, built for strong reasoning, multimodal understanding, and enterprise use. It improves instruction following, honesty, and calibration over earlier Grok versions, while supporting both single‑agent and multi‑agent workflows. Designed as a general‑purpose, truth‑seeking assistant, Grok 4.2 is well suited for research, analysis, coding, and complex professional tasks when deployed with appropriate guardrails.","pricing":{"cache_read":0.2,"input":2,"output":6},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"grok-4-20-reasoning","model_name":"Grok 4 20 (reasoning)","developer_id":9,"desc":"Grok 4.2 is xAI’s latest large language model, built for strong reasoning, multimodal understanding, and enterprise use. It improves instruction following, honesty, and calibration over earlier Grok versions, while supporting both single‑agent and multi‑agent workflows. Designed as a general‑purpose, truth‑seeking assistant, Grok 4.2 is well suited for research, analysis, coding, and complex professional tasks when deployed with appropriate guardrails.","pricing":{"cache_read":0.2,"input":2,"output":6},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"doubao-seedance-2-0-260128","model_name":"Doubao Seedance 2.0 260128","developer_id":4,"desc":"The Doubao large-model team has launched a new-generation professional-grade multimodal video-creation model, Seedance 2.0. It supports images, videos, and audio as multimodal reference inputs to generate videos, and also offers video editing and extension capabilities. It can accurately reproduce various details while maintaining stable character features, delivers highly realistic audiovisual stability, and is deeply adapted to core scenarios such as commercial advertising, film and TV production, and social media marketing.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seedance-2-0-fast-260128","model_name":"Doubao Seedance 2.0 Fast 260128","developer_id":4,"desc":"Seedance 2.0 fast is a next-generation multimodal video-creation model launched by the Doubao large-model team; it inherits the core features and advantages of Seedance 2.0 and generates content faster.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-5.1","model_name":"GLM 5.1","developer_id":5,"desc":"GLM-5.1 is Zhipu's latest flagship model, with greatly enhanced coding capabilities and significantly improved long-range task performance. It can continuously and autonomously work for up to 8 hours on a single task, completing the full closed loop from planning and execution to iterative optimization, delivering engineering-grade results.\nIn terms of general capability and coding ability, GLM-5.1's overall performance aligns with Claude Opus 4.6, and it demonstrates stronger sustained work capability in long-range autonomous execution, complex engineering optimization, and real-world development scenarios, making it an ideal foundation for building Autonomous Agents and long-horizon Coding Agents.","pricing":{"cache_read":0.183112,"input":0.845,"output":3.38},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":128000,"context_length":200000},{"model_id":"glm-image","model_name":"GLM Image","developer_id":5,"desc":"GLM-Image is Zhipu AI's new flagship image generation model. The model is trained entirely on domestic chips and adopts an original hybrid architecture combining \"autoregressive + diffusion decoder,\" balancing global instruction understanding with local detail depiction. It overcomes generation challenges in knowledge-intensive scenarios such as posters, PPTs, and popular science illustrations. This represents an important exploration towards the new generation of \"cognitive generation\" technology paradigms exemplified by Nano Banana Pro.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.7-r2v","model_name":"Wan2.7 R2v","developer_id":13,"desc":"Wanxiang 2.7 — reference-driven video generation: more stable references for characters, props, and scenes; supports up to five mixed image/video references; supports audio/timbre references; combined with foundational capability upgrades to deliver stronger performance.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.7-videoedit","model_name":"Wan2.7 Videoedit","developer_id":13,"desc":"Wanxiang 2.7 — video editing: edit videos using natural-language commands, supporting local or global edits; can use reference images to replace video elements; supports replicating video actions, special effects, camera movements, and other dynamic processes.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.7-t2v","model_name":"Wan2.7 T2v","developer_id":13,"desc":"Wanxiang 2.7 — text-to-video: performance capabilities comprehensively upgraded. Dramatic/dialogue scenes convey delicate, natural emotions; action scenes are intense, with every punch landing hard. Paired with more dramatic and rhythmically paced camera cuts, it achieves stronger overall acting performance.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.7-i2v","model_name":"Wan2.7 I2v","developer_id":13,"desc":"Wanxiang 2.7 — image-to-video: performance capabilities comprehensively upgraded. Dramatic/dialogue scenes convey delicate, natural emotions; action scenes are fierce, with every punch landing hard. Combined with more dramatic, rhythmically paced camera cuts, it achieves stronger overall performance.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3.6-plus","model_name":"Qwen3.6 Plus","developer_id":13,"desc":"Qwen 3.6, the native vision-language Plus series model, demonstrates outstanding performance comparable to the current top cutting-edge models, with a significant improvement over the 3.5 series. The model's capabilities have been markedly enhanced in Agentic coding, front-end programming, Vibe coding and other coding abilities, universal multimodal recognition, OCR, object localization, and more.","pricing":{"cache_read":0.0282,"cache_write":0.3525,"input":0.282,"output":1.692},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"gemma-4-26b-a4b-it","model_name":"Gemma 4 26B A4B It","developer_id":8,"desc":"A Mixture-of-Experts model that activates only 4B parameters per inference,delivering high-performance reasoning with a fraction of the memory cost - idealfor cost-efficient, high-throughput server deployments.","pricing":{"cache_read":0,"input":0.14,"output":0.39998},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":131100,"context_length":262100},{"model_id":"gemma-4-31b-it","model_name":"Gemma 4 31B It","developer_id":8,"desc":"Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license.","pricing":{"cache_read":0,"input":0.14,"output":0.39998},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":131100,"context_length":262100},{"model_id":"gpt-5.4","model_name":"GPT 5.4","developer_id":12,"desc":"GPT-5.4 is our frontier model for complex professional work.Reasoning.effort supports: none (default), low, medium, high and xhigh.","pricing":{"cache_read":0.25,"input":2.5,"output":15},"types":"llm","features":"thinking,function_calling,structured_outputs,web,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"cc-k2.6-code-preview","model_name":"CC K2.6 Code Preview","developer_id":15,"desc":"for claude code","pricing":{"cache_read":0.02,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.7-image-pro","model_name":"Wan2.7 Image Pro","developer_id":13,"desc":"Wanxiang 2.7 — image generation and editing: supports text-to-image, text-to-multi-image, image-to-multi-image, image editing, multi-image reference generation, and interactive editing, with stronger performance in text rendering, subject consistency, and following complex instructions.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.7-image","model_name":"Wan2.7 Image","developer_id":13,"desc":"Wanxiang 2.7 — image generation and editing: supports text-to-image, text-to-multi-image, image-to-multi-image, image editing, multi-image reference generation, and interactive editing, with stronger performance in text rendering, subject consistency, and following complex instructions.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seed-2-0-lite-260428","model_name":"Doubao Seed 2.0 Lite 260428","developer_id":4,"desc":"Doubao  Coding model optimized for real-world programming environments that can reliably invoke tools in common IDEs such as Claude Code. The model is specially optimized for frontend capabilities and performs well with common frontend frameworks. The model supports Skills and can work with various custom skills.","pricing":{"cache_read":0.018082,"input":0.09041,"output":0.54246},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,video,audio","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"doubao-seed-2-0-mini-260428","model_name":"Doubao Seed 2.0 Mini 260428","developer_id":4,"desc":"Doubao  Coding model optimized for real-world programming environments that can reliably invoke tools in common IDEs such as Claude Code. The model is specially optimized for frontend capabilities and performs well with common frontend frameworks. The model supports Skills and can work with various custom skills.","pricing":{"cache_read":0.00564,"input":0.0282,"output":0.282},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"gemini-3.1-flash-image-preview","model_name":"Gemini 3.1 Flash Image Preview","developer_id":8,"desc":"gemini-3.1-flash-image-preview (Nano Banana 2) features professional-grade visual intelligence, lightning-fast efficiency, and realistic, grounded generative capabilities. This model serves as the high-efficiency counterpart to Gemini 3 Pro Image, optimized for speed and high-volume developer use cases.","pricing":{"cache_read":0.5,"input":0.5,"output":3},"types":"image_generation,llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-3.1-flash-lite-preview","model_name":"Gemini 3.1 Flash Lite Preview","developer_id":8,"desc":"gemini-3.1-flash-lite-preview is currently Google's latest and most cost-effective model, optimized for large-scale agent-based tasks, translation, and simple data processing.","pricing":{"cache_read":0.25,"input":0.25,"output":1.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,web,deepsearch,long_context","input_modalities":"text,image,audio,video","endpoints":"chat_completions,gemini_api,claude_api","max_output":64000,"context_length":1000000},{"model_id":"gemini-3.1-pro-preview","model_name":"Gemini 3.1 Pro Preview","developer_id":8,"desc":"Gemini 3.1 Pro Preview is designed to further optimize the performance and reliability of the Gemini 3 Pro series, offering improved reasoning capabilities, greater token efficiency, and a more robust, factually consistent user experience. It is optimized for software-engineering behaviors and usability, and is also suitable for agent workflows that require precise tool invocation and reliable multi-step execution, enabling stable operation across a variety of real-world scenarios.","pricing":{"cache_read":0.2,"input":2,"output":12},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,web,deepsearch,long_context","input_modalities":"text,image,audio,video","endpoints":"chat_completions,gemini_api,claude_api","max_output":64000,"context_length":1000000},{"model_id":"gemini-3.1-pro-preview-customtools","model_name":"Gemini 3.1 Pro Preview Customtools","developer_id":8,"desc":"gemini-3.1-pro-preview-customtools\n\nFor users who build applications mixing bash and custom tools, the Gemini 3.1 Pro preview provides a separate endpoint accessible via the API call gemini-3.1-pro-preview-customtools. This endpoint is better at prioritizing your custom tools (for example, view_file or search_code).\n\nPlease note that while gemini-3.1-pro-preview-customtools is optimized for agent workflows that use custom tools and Bash, you may experience quality fluctuations in some use cases that cannot benefit from these tools.","pricing":{"cache_read":0.2,"input":2,"output":12},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,web,deepsearch,long_context","input_modalities":"text,image,audio,video","endpoints":"chat_completions,gemini_api,claude_api","max_output":64000,"context_length":1000000},{"model_id":"gemini-3.1-pro-preview-search","model_name":"Gemini 3.1 Pro Preview Search","developer_id":8,"desc":"Gemini-3.1-pro-preview-search integrates Google's official search functionality; the search feature incurs an additional separate fee log directly incorporated into the scoring, but the log details are not displayed; this will be fixed in the future to show the details; it only supports OpenAI-compatible format calls and does not support the Gemini SDK; for the Gemini native SDK, please directly set the official search parameters.","pricing":{"cache_read":0.2,"input":2,"output":12},"types":"llm","features":"thinking,web,deepsearch,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"chat_completions","max_output":0,"context_length":0},{"model_id":"gpt-5.4-mini","model_name":"GPT 5.4 Mini","developer_id":12,"desc":"GPT-5.4 mini is a faster, more efficient model that inherits the advantages of GPT-5.4 and is specifically optimized for high-volume workloads.","pricing":{"cache_read":0.075,"input":0.75,"output":4.5},"types":"llm","features":"tools,function_calling,structured_outputs,web","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.4-nano","model_name":"GPT 5.4 Nano","developer_id":12,"desc":"GPT-5.4 nano is designed for tasks where speed and cost are most important, such as classification, data extraction, ranking, and sub-agent scenarios.","pricing":{"cache_read":0.02,"input":0.2,"output":1.25},"types":"llm","features":"tools,function_calling,structured_outputs,thinking,web","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.5-free","model_name":"GPT 5.5 (free)","developer_id":12,"desc":"This free model API comes from the OpenAI model deployed on Azure. To prevent abuse, the external content filter provided by Azure has been enforced, which will result in additional delays. If you want to experience the full version of the model API without filters, please use the paid version and request the model name ID without \"-free\".To ensure stable service operation, usage limits are set: up to 5 requests per minute, no more than 500 requests per day, and a daily quota of 1,000,000 tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,function_calling,structured_outputs,web,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":1050000},{"model_id":"qwen3.5-plus","model_name":"Qwen3.5 Plus","developer_id":13,"desc":"The Qwen 3.5 native vision-language Plus model is built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In multiple task evaluations, the 3.5 series has demonstrated outstanding performance comparable to current leading frontier models, with leapfrog improvements over the 3 series in both pure-text and multimodal capabilities. This model version is functionally equivalent to the snapshot model qwen3.5-plus-2026-02-15.","pricing":{"cache_read":0.01096,"cache_write":0.137,"input":0.1096,"output":0.6576},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"claude-sonnet-4-6","model_name":"Claude Sonnet 4.6","developer_id":2,"desc":"Claude Sonnet 4.6 delivers frontier intelligence at scale—built for coding, agents, and enterprise workflows. Use cases include: Agents: Sonnet 4.6 excels at complex, multi-step tasks requiring sustained reasoning and adaptive decision-making—ideal for workflows where reliability and autonomy matter most. Coding: Sonnet 4.6 is built for iterative development work, handling complex codebases without losing quality as you guide it through building, refactoring, and debugging. It can compress multi-day projects into hours with the technical depth to deliver production-ready solutions. Enterprise workflows: Sonnet 4.6 powers agents that manage professional projects from start to finish, leveraging memory to maintain context across files with a step-change improvement in creating spreadsheets, slides, and docs. Financial analysis: Sonnet 4.6 connects the dots across regulatory filings, market reports, and internal data—enabling sophisticated modeling and proactive compliance. Cybersecurity: Sonnet 4.6 brings professional-grade analysis to security workflows, correlating logs, vulnerability databases, and threat intelligence for proactive threat detection and automated incident response. Computer use: Sonnet 4.6 delivers confident, consistent navigation with more human-like browsing—enabling better web QA, workflow automation, and advanced user experiences.","pricing":{"cache_read":0.3,"cache_write":3.75,"input":3,"output":15},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"chat_completions,gemini_api,claude_api","max_output":64000,"context_length":1000000},{"model_id":"coding-xiaomi-mimo-v2.5","model_name":"Coding Xiaomi Mimo V2.5","developer_id":31,"desc":"Only supports OpenAI-compatible formats.","pricing":{"cache_read":0.0016,"input":0.08,"output":0.16},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-xiaomi-mimo-v2.5-pro","model_name":"Coding Xiaomi Mimo V2.5 Pro","developer_id":31,"desc":"Only supports OpenAI-compatible formats.","pricing":{"cache_read":0.0016,"input":0.2,"output":0.4},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-sonnet-4-6-think","model_name":"Claude Sonnet 4.6 Thinking","developer_id":2,"desc":"Claude sonnet 4.6 does not enable reasoning mode by default. To access its deep reasoning capabilities, users would typically need to call the native Claude API. To make this capability available through an OpenAI-compatible interface, we provide the claude-sonnet-4-6-think model, which has reasoning mode pre-enabled and a default 32k-token context window, allowing it to be called directly via the OpenAI unified API.\nClaude sonnet 4.5 Think is a reasoning-focused variant of Claude sonnet 4.6 designed for advanced tasks that require rigorous reasoning, complex decision-making, and long-chain analysis. Aside from its enhanced reasoning mechanism, all other capabilities remain consistent with the standard Claude sonnet 4.6 model, making it well suited for complex engineering problem decomposition, multi-stage planning, and logic-intensive analysis.","pricing":{"cache_read":0.3,"cache_write":3.75,"input":3,"output":15},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"image,text","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"coding-xiaomi-mimo-v2-omni","model_name":"Coding Xiaomi Mimo V2 Omni","developer_id":31,"desc":"Only supports OpenAI-compatible formats.","pricing":{"cache_read":0.016,"input":0.08,"output":0.4},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-xiaomi-mimo-v2-pro","model_name":"Coding Xiaomi Mimo V2 Pro","developer_id":31,"desc":"Only supports OpenAI-compatible formats.","pricing":{"cache_read":0.04,"input":0.2,"output":0.6},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-5.3-chat-latest","model_name":"GPT 5.3 Chat","developer_id":12,"desc":"GPT-5.3Chat refers to the GPT-5.3 snapshot currently used in ChatGPT and is optimized for conversational use cases. While GPT-5.2 is recommended for most API applications, GPT-5.3Chat is ideal for testing the latest improvements in chat-based interactions.","pricing":{"cache_read":0.175,"input":1.75,"output":14},"types":"llm","features":"function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"gpt-5.3-codex","model_name":"GPT-5.3-Codex","developer_id":12,"desc":"GPT-5.3-Codex is optimized for agentic coding tasks in Codex or similar environments. GPT-5.3-Codex supports low, medium, high, and xhigh reasoning effort settings. If you want to learn more about prompting GPT-5.3-Codex, refer to our dedicated guide.","pricing":{"cache_read":0.175,"input":1.75,"output":14},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-image-2-free","model_name":"GPT Image 2 (free)","developer_id":12,"desc":"This free model API comes from the OpenAI model deployed on Azure. To prevent abuse, the external content filter provided by Azure has been enforced, which will result in additional delays. If you want to experience the full version of the model API without filters, please use the paid version and request the model name ID without \"-free\".","pricing":{"cache_read":0,"input":0,"output":0},"types":"image_generation,llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3.5-122b-a10b","model_name":"Qwen3.5 122B A10B","developer_id":13,"desc":"The Qwen 3.5 native vision-language Plus model is built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In multiple task evaluations, the 3.5 series has demonstrated outstanding performance comparable to current leading frontier models, with leapfrog improvements over the 3 series in both pure-text and multimodal capabilities. This model version is functionally equivalent to the snapshot model qwen3.5-plus-2026-02-15.","pricing":{"cache_read":0.1126,"input":0.1126,"output":0.9008},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"qwen3.5-27b","model_name":"Qwen3.5 27B","developer_id":13,"desc":"The Qwen 3.5 native vision-language Plus model is built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In multiple task evaluations, the 3.5 series has demonstrated outstanding performance comparable to current leading frontier models, with leapfrog improvements over the 3 series in both pure-text and multimodal capabilities. This model version is functionally equivalent to the snapshot model qwen3.5-plus-2026-02-15.","pricing":{"cache_read":0.0846,"input":0.0846,"output":0.6768},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"qwen3.5-35b-a3b","model_name":"Qwen3.5 35B A3B","developer_id":13,"desc":"The Qwen 3.5 native vision-language Plus model is built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In multiple task evaluations, the 3.5 series has demonstrated outstanding performance comparable to current leading frontier models, with leapfrog improvements over the 3 series in both pure-text and multimodal capabilities. This model version is functionally equivalent to the snapshot model qwen3.5-plus-2026-02-15.","pricing":{"cache_read":0.0564,"input":0.0564,"output":0.4512},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"qwen3.5-397b-a17b","model_name":"Qwen3.5 397B A17B","developer_id":13,"desc":"The Qwen 3.5 native vision-language Plus model is built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In multiple task evaluations, the 3.5 series has demonstrated outstanding performance comparable to current leading frontier models, with leapfrog improvements over the 3 series in both pure-text and multimodal capabilities. This model version is functionally equivalent to the snapshot model qwen3.5-plus-2026-02-15.","pricing":{"cache_read":0.1644,"input":0.1644,"output":0.9864},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"qwen3.5-flash","model_name":"Qwen3.5 Flash","developer_id":13,"desc":"The Qwen3.5 native vision-language Flash series models are designed with a hybrid architecture that integrates linear attention mechanisms and sparse mixture-of-experts models, achieving higher inference efficiency. Compared with the 3 series, the models deliver leapfrog improvements in both pure-text and multimodal performance; they respond quickly and combine inference speed with high performance.","pricing":{"cache_read":0.00282,"cache_write":0.03525,"input":0.0282,"output":0.282},"types":"llm","features":"tools,function_calling,structured_outputs,web,long_context,thinking","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":991000},{"model_id":"qwen3-coder-next","model_name":"Qwen3 Coder Next","developer_id":13,"desc":"The Qwen3 series is a next-generation code-generation model with results close to Qwen3-Coder-Plus while offering superior performance. The model is optimized for repository-level understanding, supports multi-turn tool interactions, and improves compatibility with agentic coding tools.","pricing":{"cache_read":0.137,"input":0.137,"output":0.548},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":64000,"context_length":2000000},{"model_id":"coding-glm-5.1","model_name":"Coding GLM 5.1","developer_id":5,"desc":"Only supports OpenAI-compatible formats.","pricing":{"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"xiaomi-mimo-v2.5-pro-free","model_name":"Xiaomi Mimo V2.5 Pro (free)","developer_id":31,"desc":"xiaomi-mimo-v2.5-pro-free is the open free version of xiaomi-mimo-v2.5-pro5. To ensure stable service operation, usage limits are set: up to 5 requests per minute, no more than 500 requests per day, and a daily quota of 1,000,000 tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"web","input_modalities":"text,image,video,audio","endpoints":"","max_output":0,"context_length":256000},{"model_id":"xiaomi-mimo-v2.5-free","model_name":"Xiaomi Mimo V2.5 (free)","developer_id":31,"desc":"xiaomi-mimo-v2.5-free is the open free version of xiaomi-mimo-v2.5. To ensure stable service operation, usage limits are set: up to 5 requests per minute, no more than 500 requests per day, and a daily quota of 1,000,000 tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"web","input_modalities":"text,image,video,audio","endpoints":"","max_output":0,"context_length":256000},{"model_id":"xiaomi-mimo-v2-pro-free","model_name":"Xiaomi Mimo V2 Pro (free)","developer_id":31,"desc":"xiaomi-mimo-v2-pro-free is the open free version of xiaomi-mimo-v2-pro. To ensure stable service operation, usage limits are set: up to 5 requests per minute, no more than 500 requests per day, and a daily quota of 1,000,000 tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"web","input_modalities":"text,image,video,audio","endpoints":"","max_output":0,"context_length":256000},{"model_id":"xiaomi-mimo-v2-omni-free","model_name":"Xiaomi Mimo V2 Omni (free)","developer_id":31,"desc":"xiaomi-mimo-v2-omni-free is the open free version of xiaomi-mimo-v2-omni. To ensure stable service operation, usage limits are set: up to 5 requests per minute, no more than 500 requests per day, and a daily quota of 1,000,000 tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"web","input_modalities":"text,image,video,audio","endpoints":"","max_output":0,"context_length":256000},{"model_id":"doubao-seed-2-0-pro","model_name":"Doubao Seed 2.0 Pro","developer_id":4,"desc":"Doubao flagship all-purpose general model, targeting complex reasoning and long-chain task execution scenarios in the Agent era. It emphasizes multimodal understanding, long-context reasoning, structured generation, and tool-augmented execution. It excels at handling complex instructions and multi-constraint execution, reliably addressing multi-step complex planning, intricate image-text reasoning, video content understanding, and high-difficulty analysis scenarios.","pricing":{"cache_read":0.09644,"input":0.4822,"output":2.411},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":128000,"context_length":256000},{"model_id":"gpt-5.4-high","model_name":"GPT 5.4 High","developer_id":12,"desc":"GPT-5.4 supports configurable reasoning effort only through the /responses endpoint. To make higher-intensity reasoning available directly via the /chat interface, GPT-5.4-High is provided as a reasoning-enhanced variant of GPT-5.4 with reasoning_effort preset to high. It is designed for tasks that require deeper analysis, stronger result consistency, and greater controllability. By applying more aggressive reasoning strategies and more effective use of extended context, the model delivers clearer and more reliable responses, making it well suited for complex agent workflows, long-chain decision-making, and reliability-critical advanced applications.","pricing":{"cache_read":0.25,"input":2.5,"output":15},"types":"llm","features":"thinking,function_calling,web,structured_outputs,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.4-low","model_name":"GPT 5.4 Low","developer_id":12,"desc":"GPT-5.4 supports configuring reasoning strength only through the /responses endpoint. To make lower-overhead reasoning available directly in the /chat endpoint, the GPT-5.4-Low model is provided. This model is based on GPT-5.4 with reasoning_effort preset to low. This model is designed for use cases that are sensitive to response latency and cost. By adopting a lighter reasoning strategy, it delivers stable responses with lower latency and higher throughput. It is well suited for high-concurrency conversations, real-time interactions, basic Q\u0026A, and scenarios where deep reasoning is not required.","pricing":{"cache_read":0.25,"input":2.5,"output":15},"types":"llm","features":"thinking,function_calling,web,structured_outputs,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.4-pro","model_name":"GPT 5.4 Pro","developer_id":12,"desc":"Please note: this model is extremely expensive and very slow. If a request fails due to network issues, you may still be charged heavily; we cannot refund charges incurred by requests to this model.\nGPT-5.4 pro is available in the Responses API only to enable support for multi-turn model interactions before responding to API requests, and other advanced API features in the future. Since GPT-5.4 pro is designed to tackle tough problems, some requests may take several minutes to finish. To avoid timeout, please set a longer timeout duration. It is recommended to use this under good network conditions.","pricing":{"cache_read":30,"input":30,"output":180},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":1050000},{"model_id":"glm-5","model_name":"GLM 5","developer_id":5,"desc":"GLM-5 is an advanced, open-source large language model designed for developers tackling the toughest challenges. It excels at long-context reasoning, multi-step tool orchestration, and complex systems engineering, making it the ideal choice for powering sophisticated agents and applications that require high-level cognitive tasks.","pricing":{"cache_read":0.176,"input":0.88,"output":2.816},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":202752},{"model_id":"glm-5v-turbo","model_name":"GLM 5 Vision Turbo","developer_id":5,"desc":"GLM-5V-Turbo is Zhipu's first multimodal coding foundation model, built for visual programming tasks. It can natively handle multimodal inputs such as images, videos, and text, and is adept at long-horizon planning, complex programming, and action execution; deeply adapted to Agent workflows, it can collaborate closely with agents like Claude Code and OpenClaw to complete the full closed loop of \"understand the environment → plan actions → execute tasks.\"","pricing":{"cache_read":0.169008,"input":0.7042,"output":3.09848},"types":"llm","features":"","input_modalities":"text,image,video","endpoints":"","max_output":128000,"context_length":200000},{"model_id":"claude-opus-4-6","model_name":"Claude Opus 4.6","developer_id":2,"desc":"Claude Opus 4.6 is Anthropic’s latest state-of-the-art reasoning model. It features an adaptive “thinking” mode that dynamically decides when to think and how much to think. At the default effort level (high), Claude will almost always engage in thinking. At lower effort levels, it may skip thinking for simple problems.\n ⚠️ The minimum cache token for claude-opus-4-6 has been increased from 1,024 to 4,096 tokens.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"coding-glm-5.1-free","model_name":"Coding GLM 5.1 (free)","developer_id":5,"desc":"coding-glm-5.1-free is the open and free version of coding-glm-5.1. To ensure stable service performance, usage limits are in place: up to 5 requests per minute, 500 requests per day, and a daily token allowance of 1 million.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-5.2-free","model_name":"Coding GLM 5.2 (free)","developer_id":5,"desc":"coding-glm-5.2-free is the open and free version of coding-glm-5.2. To ensure stable service performance, usage limits are in place: up to 5 requests per minute, 500 requests per day, and a daily token allowance of 1 million.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-minimax-m2.7-free","model_name":"Coding MiniMax M2.7 (free)","developer_id":18,"desc":"coding-minimax-m2.7-free is a free and open version offered by AIHubMix specifically for MiniMax users. To maintain stable service operations, the following usage limits apply: a maximum of 5 requests per minute, 500 total requests per day, and a daily quota of 1 million tokens.","pricing":{"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"minimax-m2.7","model_name":"MiniMax M2.7","developer_id":18,"desc":"MiniMax M2.7 can autonomously build complex Agent Harnesses and, leveraging capabilities such as Agent Teams, complex Skills, and the Tool Search tool, complete highly complex productivity tasks.","pricing":{"cache_read":0.05916,"input":0.2958,"output":1.1832},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":128000,"context_length":200000},{"model_id":"claude-opus-4-6-think","model_name":"Claude Opus 4.6 Thinking","developer_id":2,"desc":"Claude Opus 4.6 does not enable reasoning mode by default. To access its deep reasoning capabilities, users would typically need to call the native Claude API. To make this capability available through an OpenAI-compatible interface, we provide the claude-opus-4-6-think model, which has reasoning mode pre-enabled and a default 32k-token context window, allowing it to be called directly via the OpenAI unified API.\nClaude Opus 4.5 Think is a reasoning-focused variant of Claude Opus 4.6 designed for advanced tasks that require rigorous reasoning, complex decision-making, and long-chain analysis. Aside from its enhanced reasoning mechanism, all other capabilities remain consistent with the standard Claude Opus 4.6 model, making it well suited for complex engineering problem decomposition, multi-stage planning, and logic-intensive analysis.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"image,text","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"coding-glm-5-free","model_name":"Coding GLM 5 (free)","developer_id":5,"desc":"coding-glm-5-free is the open and free version of coding-glm-5. To ensure stable service performance, usage limits are in place: up to 5 requests per minute, 500 requests per day, and a daily token allowance of 1 million.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-5-turbo-free","model_name":"Coding GLM 5 Turbo (free)","developer_id":5,"desc":"coding-glm-5-turbo-free is the open and free version of coding-glm-5-turbo. To ensure stable service performance, usage limits are in place: up to 5 requests per minute, 500 requests per day, and a daily token allowance of 1 million.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-minimax-m2.5-free","model_name":"Coding MiniMax M2.5 (free)","developer_id":18,"desc":"coding-minimax-m2.5-free is a free and open version offered by AIHubMix specifically for MiniMax users. To maintain stable service operations, the following usage limits apply: a maximum of 5 requests per minute, 500 total requests per day, and a daily quota of 1 million tokens.","pricing":{"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"doubao-seed-2-0-code-preview","model_name":"Doubao Seed 2.0 Code Preview","developer_id":4,"desc":"The Doubao 2.0 series is a coding model optimized for real programming environments, capable of reliably invoking tools in common IDEs such as Claude Code. The model is specially optimized for frontend capabilities and performs well with common frontend frameworks. The model supports using Skills and can work with various custom skills.","pricing":{"cache_read":0.09644,"input":0.4822,"output":2.411},"types":"llm","features":"thinking,web,tools,function_calling","input_modalities":"text,image,video","endpoints":"","max_output":128000,"context_length":256000},{"model_id":"doubao-seed-2-0-lite-260215","model_name":"Doubao Seed 2.0 Lite 260215","developer_id":4,"desc":"Doubao  Coding model optimized for real-world programming environments that can reliably invoke tools in common IDEs such as Claude Code. The model is specially optimized for frontend capabilities and performs well with common frontend frameworks. The model supports Skills and can work with various custom skills.","pricing":{"cache_read":0.018082,"input":0.09041,"output":0.54246},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"doubao-seed-2-0-mini","model_name":"Doubao Seed 2.0 Mini","developer_id":4,"desc":"Doubao 2.0 series is designed for low-latency, high-concurrency, and cost-sensitive scenarios, emphasizing fast responses and flexible inference deployment. Model performance is comparable to Doubao-Seed-1.6. It supports a 256k context window, four levels of thinking length, and multimodal understanding, making it suitable for lightweight tasks that prioritize cost and speed.","pricing":{"cache_read":0.006027,"input":0.030136,"output":0.30136},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"gemini-3-flash-preview","model_name":"Gemini 3 Flash Preview","developer_id":8,"desc":"gemini-3-flash-preview is Google's latest released, most balanced model, excelling in speed, scale, and cutting-edge intelligence.","pricing":{"cache_read":0.05,"input":0.5,"output":3},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,audio","endpoints":"","max_output":0,"context_length":1048576},{"model_id":"gemini-3-flash-preview-search","model_name":"Gemini 3 Flash Preview Search","developer_id":8,"desc":"Gemini-3-flash-preview-search integrates Google's official search functionality; the search feature incurs an additional separate fee log directly incorporated into the scoring, but the log details are not displayed; this will be fixed in the future to show the details; it only supports OpenAI-compatible format calls and does not support the Gemini SDK; for the Gemini native SDK, please directly set the official search parameters.","pricing":{"cache_read":0.05,"input":0.5,"output":3},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,audio","endpoints":"","max_output":1048576,"context_length":1048576},{"model_id":"glm-5-turbo","model_name":"GLM 5 Turbo","developer_id":5,"desc":"GLM-5-Turbo is a foundational model deeply optimized for the OpenClaw scenario. From the training stage it has been specifically optimized for the core requirements of OpenClaw tasks, enhancing key capabilities such as tool invocation, instruction following, scheduled and persistent tasks, and long-chain execution.","pricing":{"cache_read":0.24,"input":1.2,"output":3.9996},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":202752},{"model_id":"embed-v-4-0","model_name":"Embed V 4.0","developer_id":6,"desc":"Cohere’s Embed 4 is a multilingual multimodal embedding model. It is capable of transforming different modalities such as images, texts, and interleaved images and texts into a single vector representation. Embed 4 offers state-of-the-art performance across all modalities (texts, images, interleaved texts and image) and in both English and multilingual settings. Embed 4 supports a 128k context length and an images can have a maximum of 2MM pixels.","pricing":{"cache_read":0.12,"input":0.12,"output":0},"types":"embedding","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":128000},{"model_id":"ernie-image-turbo","model_name":"ERNIE Image Turbo","developer_id":25,"desc":"The Ernie-image-Turbo model is an 8-step distilled version of the Ernie-image model, also with 8 billion parameters, offering a 6x speedup compared to pre-distillation, and is suitable for low-latency, local/on-device scenarios.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-3.1-flash-image-preview-free","model_name":"Gemini 3.1 Flash Image Preview (free)","developer_id":8,"desc":"This model is the free trial version of gemini-3.1-flash-image-preview (officially marketed as Nano Banana 2). To ensure stable service operation, usage limits have been set and resources are limited and may not be available at all times. For a stable version, please choose the official release; the request name for the official release is: gemini-3.1-flash-image-preview.","pricing":{"cache_read":0,"input":0,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-glm-5.1","model_name":"CC GLM 5.1","developer_id":5,"desc":"Supports Claude native interface, can be directly requested in Claude Code.","pricing":{"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-opus-4-5","model_name":"Claude Opus 4.5","developer_id":2,"desc":"Claude Opus 4.5 is Anthropic’s latest frontier reasoning model, optimized for complex engineering, agentic workflows, and long-horizon computer use. It features strong multimodal capabilities, improved resistance to prompt injection, and a new Verbosity parameter to control token efficiency. With advanced tool use, extended context, and multi-agent support, Opus 4.5 excels in autonomous research, debugging, planning, and spreadsheet/browser operations.\n⚠️ The minimum cache token for claude-opus-4-5 has been increased from 1,024 to 4,096 tokens.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"claude-opus-4-5-think","model_name":"Claude Opus 4.5 Thinking","developer_id":2,"desc":"Claude Opus 4.5 does not enable reasoning mode by default. To access its deep reasoning capabilities, users would typically need to call the native Claude API. To make this capability available through an OpenAI-compatible interface, we provide the claude-opus-4-5-think model, which has reasoning mode pre-enabled and a default 32k-token context window, allowing it to be called directly via the OpenAI unified API.\nClaude Opus 4.5 Think is a reasoning-focused variant of Claude Opus 4.5 designed for advanced tasks that require rigorous reasoning, complex decision-making, and long-chain analysis. Aside from its enhanced reasoning mechanism, all other capabilities remain consistent with the standard Claude Opus 4.5 model, making it well suited for complex engineering problem decomposition, multi-stage planning, and logic-intensive analysis.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"image,text","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"coding-step-3.7-flash-free","model_name":"Coding Step 3.7 Flash (free)","developer_id":16,"desc":"coding-step-3.7-flash-free is the free, publicly available version of coding-step-3.7-flash, offering the same model capabilities with usage limits in place to ensure service stability. Limits include up to 5 requests per minute, a maximum of 250 requests per day, and a daily quota of 500,000 tokens. Free usage is based on shared capacity and is limited in availability. This version is intended for testing and light usage; for consistent and reliable access, please switch to the paid model.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":256000},{"model_id":"mimo-v2-omni","model_name":"MiMo V2 Omni","developer_id":31,"desc":"MiMo-V2-Omni is designed for complex real-world multimodal interaction and execution scenarios. We've built an all-modal foundation from the ground up that fuses text, vision, and speech, and use a unified architecture to deeply bind \"perception\" and \"action.\" This not only breaks the traditional models' limitation of emphasizing understanding over execution, but also natively equips the model with multimodal perception, tool invocation, function execution, and GUI operation capabilities. MiMo-V2-Omni can seamlessly integrate with major agent frameworks, achieving a leap from understanding to manipulation and significantly lowering the barrier to deploying full-modal agents.","pricing":{"cache_read":0.088,"input":0.44,"output":2.2},"types":"llm","features":"web","input_modalities":"text,image,video,audio","endpoints":"","max_output":0,"context_length":256000},{"model_id":"mimo-v2-pro","model_name":"MiMo V2 Pro","developer_id":31,"desc":"Xiaomi MiMo-V2-Pro is built for high-intensity agent work scenarios in the real world. It has over 1 trillion total parameters (42B active parameters), employs an innovative hybrid-attention architecture, and supports an ultra-long 1M-token context length. On top of a powerful model foundation, we continuously scale compute across broader agent scenarios, further expanding the intelligent action space and achieving significant generalization from Coding to Claw.","pricing":{"cache_read":0.22,"input":1.1,"output":3.3},"types":"llm","features":"web","input_modalities":"text,image,video,audio","endpoints":"","max_output":0,"context_length":1000000},{"model_id":"cohere-command-a","model_name":"Cohere Command A","developer_id":6,"desc":"Command A is Cohere most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.","pricing":{"cache_read":0,"input":2.5,"output":10},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-3-flash-preview-free","model_name":"Gemini 3 Flash Preview (free)","developer_id":8,"desc":"gemini-3-flash-preview-free is the free, publicly available version of gemini-3-flash-preview, offering the same model capabilities with usage limits in place to ensure service stability. Limits include up to 5 requests per minute, a maximum of 250 requests per day, and a daily quota of 500,000 tokens. Free usage is based on shared capacity and is limited in availability. This version is intended for testing and light usage; for consistent and reliable access, please switch to the paid model.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"tools,function_calling,structured_outputs,thinking","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"cc-minimax-m3","model_name":"CC MiniMax M3","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-minimax-m3","model_name":"Coding MiniMax M3","developer_id":18,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"coding-step-3.7-flash","model_name":"Coding Step 3.7 Flash","developer_id":16,"desc":"Originates from the tiered package.","pricing":{"cache_read":0.0176,"input":0.088,"output":0.528},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":256000},{"model_id":"gpt-4.1-free","model_name":"GPT 4.1 (free)","developer_id":12,"desc":"This free model API comes from the OpenAI model deployed on Azure. To prevent abuse, the external content filter provided by Azure has been enforced, which will result in additional delays. If you want to experience the full version of the model API without filters, please use the paid version and request the model name ID without \"-free\".","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":32768,"context_length":1047576},{"model_id":"gpt-4.1-mini-free","model_name":"GPT 4.1 Mini (free)","developer_id":12,"desc":"This free model API comes from the OpenAI model deployed on Azure. To prevent abuse, the external content filter provided by Azure has been enforced, which will result in additional delays. If you want to experience the full version of the model API without filters, please use the paid version and request the model name ID without \"-free\".","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":32768,"context_length":1047576},{"model_id":"gpt-4.1-nano-free","model_name":"GPT 4.1 Nano (free)","developer_id":12,"desc":"This free model API comes from the OpenAI model deployed on Azure. To prevent abuse, the external content filter provided by Azure has been enforced, which will result in additional delays. If you want to experience the full version of the model API without filters, please use the paid version and request the model name ID without \"-free\".","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":32768,"context_length":1047576},{"model_id":"gpt-4o-free","model_name":"GPT 4o (free)","developer_id":12,"desc":"This free model API comes from the OpenAI model deployed on Azure. To prevent abuse, the external content filter provided by Azure has been enforced, which will result in additional delays. If you want to experience the full version of the model API without filters, please use the paid version and request the model name ID without \"-free\".","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":32768,"context_length":1047576},{"model_id":"coding-glm-5","model_name":"Coding GLM 5","developer_id":5,"desc":"Only supports OpenAI-compatible formats.","pricing":{"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-5-turbo","model_name":"Coding GLM 5 Turbo","developer_id":5,"desc":"Only supports OpenAI-compatible formats.","pricing":{"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4.7","model_name":"GLM 4.7","developer_id":5,"desc":"GLM-4.7 is Zhiyuan's latest flagship model. GLM-4.7 enhances coding capabilities, long-range task planning, and tool collaboration for Agentic Coding scenarios, achieving leading performance among open-source models on several current public benchmarks. It features improved general capabilities, with responses that are more concise and natural, and writing that is more immersive. When executing complex agent tasks and tool usage, it follows instructions more strictly, with further improvements in the frontend aesthetics of Artifacts and Agentic Coding as well as the efficiency of completing long-range tasks.","pricing":{"cache_read":0.054795,"input":0.273974,"output":1.095896},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":128000,"context_length":200000},{"model_id":"veo-3.1-lite-generate-preview","model_name":"Veo 3.1 Lite Generate Preview","developer_id":8,"desc":"Veo 3.1 is Google's state-of-the-art model for generating high-fidelity, 8-second 720p , 1080p videos featuring stunning realism and natively generated audio.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4.7-flash-free","model_name":"GLM 4.7 Flash (free)","developer_id":5,"desc":"The glm-4.7-flash free model has usage restrictions to ensure stable service operation: a maximum of 5 requests per minute, no more than 500 requests per day, and a daily usage quota of 1 million tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,structured_outputs,function_calling","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-4.7-free","model_name":"Coding GLM 4.7 (free)","developer_id":5,"desc":"coding-glm-4.7-free is the open and free version of coding-glm-4.7. To ensure stable service performance, usage limits are in place: up to 5 requests per minute, 500 requests per day, and a daily token allowance of 1 million.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seedance-1-5-pro-251215","model_name":"Doubao Seedance 1.5 Pro 251215","developer_id":4,"desc":"The Doubao video generation model Seedance 1.5 Pro, as a world-leading video generation model, can produce video content with high-precision audio–visual synchronization. It supports multi-person, multi-language dialogue and comprehensively covers ambient sounds, action sounds, synthesized sounds, musical instrument sounds, background sounds, and human voices. It supports start-and-end frames to achieve film-level narrative effects, meeting the advanced creative needs of film, animated series, e-commerce, and advertising.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seedance-1-0-pro-250528","model_name":"Doubao Seedance 1.0 Pro 250528","developer_id":4,"desc":"Seedance 1.0 Pro is a foundational video-generation model that supports multi-shot storytelling and excels across multiple dimensions. It has made breakthroughs in semantic understanding and instruction-following, enabling the generation of 1080P HD videos with smooth motion, rich detail, diverse styles, and cinematic aesthetics.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seedance-1-0-pro-fast-251015","model_name":"Doubao Seedance 1.0 Pro Fast 251015","developer_id":4,"desc":"Seedance 1.0 Pro Fast is a comprehensive model that delivers rock-bottom prices and peak performance, striking an outstanding balance among video generation quality, speed, and cost. It inherits the core advantages of Seedance 1.0 Pro while increasing generation speed and offering more competitive pricing, providing creators with dual optimization of efficiency and cost.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-3-pro-image-preview","model_name":"Gemini 3 Pro Image Preview","developer_id":8,"desc":"Gemini-3-Pro-Image-Preview (Nano Banana Pro) is a high-performance image generation and editing model built on Gemini 3 Pro. It delivers enhanced multimodal understanding and real-world semantic reasoning, enabling fast creation of well-structured visual content such as infographics, product sketches, and multi-subject scenes. It can also leverage real-time knowledge through Search grounding. The model excels in text rendering, consistent multi-image blending, and identity preservation, while offering fine-grained creative controls like localized edits, lighting and focus adjustments, camera transformations, and flexible aspect ratios. It’s ideal for rapid design, concept previews, product visualization, and everyday image generation workflows.","pricing":{"cache_read":2,"input":2,"output":12},"types":"image_generation,llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-embedding-2-preview","model_name":"Gemini Embedding 2 Preview","developer_id":8,"desc":"Google's first multimodal embedding model .aihubmix does not currently support multimodal input; support is expected next week.perfectly meeting the need for fast, scalable similarity computation on large-scale multimodal datasets.","pricing":{"cache_read":0.2,"input":0.2,"output":0},"types":"embedding","features":"","input_modalities":"text,image,audio,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-5.2-codex","model_name":"GPT-5.2-Codex","developer_id":12,"desc":"GPT-5.2-Codex is an upgraded version of GPT-5.2, optimized for agentic coding tasks in Codex and similar execution environments. The model is specifically enhanced for code generation, modification, refactoring, and automated execution workflows, enabling more efficient participation in multi-step, tool-driven programming processes. GPT-5.2-Codex supports low, medium, high, and xhigh reasoning effort settings, allowing flexible trade-offs between latency, reasoning depth, and token usage depending on task complexity.","pricing":{"cache_read":0.175,"input":1.75,"output":14},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"deepinfra-gemma-4-26b-a4b-it","model_name":"Deepinfra Gemma 4 26B A4B It","developer_id":8,"desc":"A Mixture-of-Experts model that activates only 4B parameters per inference,delivering high-performance reasoning with a fraction of the memory cost - idealfor cost-efficient, high-throughput server deployments.","pricing":{"cache_read":0.011,"input":0.088,"output":0.385},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":131100,"context_length":262100},{"model_id":"doubao-seedream-5.0-lite","model_name":"Doubao Seedream 5.0 Lite","developer_id":4,"desc":"Doubao-Seedream-5.0-lite is the latest image-creation model released by ByteDance. For the first time, the model includes an online retrieval capability, allowing it to integrate real-time web information and improve the timeliness of generated images. At the same time, the model’s intelligence has been further upgraded, enabling it to accurately parse complex instructions and visual content. In addition, the model has been enhanced in terms of breadth of world knowledge, reference consistency, and generation quality in professional scenarios, better meeting enterprise-level visual-creation needs.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-image-1.5","model_name":"GPT Image 1.5","developer_id":12,"desc":"GPT Image 1.5 is a new image generation model powered by OpenAI’s flagship visual capabilities, comprehensively upgraded for high-quality creative and production workflows. It delivers significant improvements in instruction understanding, fine-grained image editing, and detail preservation, while achieving up to 4× faster generation compared to previous versions — reducing latency without compromising quality.\n\nGPT Image 1.5 is well suited for image generation, precise visual editing, and professional content creation, balancing performance with efficiency.","pricing":{"cache_read":5,"input":5,"output":10},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-3.1-flash-lite-preview-nothink","model_name":"Gemini 3.1 Flash Lite Preview (no think)","developer_id":8,"desc":"gemini-3.1-flash-lite-preview is currently Google's latest and most cost-effective model, optimized for large-scale agent-based tasks, translation, and simple data processing.","pricing":{"cache_read":0.25,"input":0.25,"output":1.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,web,deepsearch,long_context","input_modalities":"text,image,audio,video","endpoints":"chat_completions,gemini_api,claude_api","max_output":64000,"context_length":1000000},{"model_id":"gpt-5.2","model_name":"GPT 5.2","developer_id":12,"desc":"GPT-5.2 is an advanced general-purpose model that improves on GPT-5.1 with more reliable, flexible, and user-friendly interactions. It delivers clearer responses, stronger instruction-following, and adaptive reasoning that scales from simple requests to complex, multi-step tasks. With enhanced control over tone and structure and support for extended context, GPT-5.2 is well suited for agent workflows, analysis, coding, and cross-domain applications.","pricing":{"cache_read":0.175,"input":1.75,"output":14},"types":"llm","features":"thinking,function_calling,structured_outputs,web,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.2-chat-latest","model_name":"GPT 5.2 Chat","developer_id":12,"desc":"GPT-5.2Chat refers to the GPT-5.2 snapshot currently used in ChatGPT and is optimized for conversational use cases. While GPT-5.2 is recommended for most API applications, GPT-5.2Chat is ideal for testing the latest improvements in chat-based interactions.","pricing":{"cache_read":0.175,"input":1.75,"output":14},"types":"llm","features":"function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"gpt-5.2-high","model_name":"GPT 5.2 High","developer_id":12,"desc":"GPT-5.2 supports configurable reasoning effort only through the /responses endpoint. To make higher-intensity reasoning available directly via the /chat interface, GPT-5.2-High is provided as a reasoning-enhanced variant of GPT-5.2 with reasoning_effort preset to high. It is designed for tasks that require deeper analysis, stronger result consistency, and greater controllability. By applying more aggressive reasoning strategies and more effective use of extended context, the model delivers clearer and more reliable responses, making it well suited for complex agent workflows, long-chain decision-making, and reliability-critical advanced applications.","pricing":{"cache_read":0.175,"input":1.75,"output":14},"types":"llm","features":"thinking,function_calling,web,structured_outputs,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.2-low","model_name":"GPT 5.2 Low","developer_id":12,"desc":"GPT-5.2 supports configuring reasoning strength only through the /responses endpoint. To make lower-overhead reasoning available directly in the /chat endpoint, the GPT-5.2-Low model is provided. This model is based on GPT-5.2 with reasoning_effort preset to low. This model is designed for use cases that are sensitive to response latency and cost. By adopting a lighter reasoning strategy, it delivers stable responses with lower latency and higher throughput. It is well suited for high-concurrency conversations, real-time interactions, basic Q\u0026A, and scenarios where deep reasoning is not required.","pricing":{"cache_read":0.175,"input":1.75,"output":14},"types":"llm","features":"thinking,function_calling,web,structured_outputs,tools","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.2-pro","model_name":"GPT 5.2 Pro","developer_id":12,"desc":"GPT-5.2 pro is available in the Responses API only to enable support for multi-turn model interactions before responding to API requests, and other advanced API features in the future. Since GPT-5.2 pro is designed to tackle tough problems, some requests may take several minutes to finish. To avoid timeout, please set a longer timeout duration. It is recommended to use this under good network conditions.","pricing":{"cache_read":2.1,"input":21,"output":168},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.1","model_name":"GPT 5.1","developer_id":12,"desc":"GPT-5 is OpenAI’s most advanced language model, designed for complex tasks that require step-by-step reasoning, precise instruction following, and high reliability. It improves reasoning, code generation, and prompt understanding—including test-time routing and intent cues like “think hard about this”—while reducing hallucination and sycophancy.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"thinking,web,tools,deepsearch,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.1-codex-max","model_name":"GPT-5.1-Codex Max","developer_id":12,"desc":"GPT-5.1-Codex-Max is a frontier programming model built for the agent-driven era. Powered by an upgraded core reasoning architecture, it is specially trained for complex agentic tasks in software engineering, mathematics, and scientific research. It delivers faster performance, greater stability, and higher token efficiency across the entire development lifecycle, including code generation, refactoring, debugging, and engineering collaboration. With native support for multiple context windows and a built-in compaction mechanism, the model can coherently process millions of tokens within a single task.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"function_calling,structured_outputs,thinking","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"doubao-seed-1-8","model_name":"Doubao Seed 1.8","developer_id":4,"desc":"Doubao's strongest multimodal Agent model Seed1.8 has powerful multimodal capabilities, supports image and text input, and can efficiently and accurately complete tasks in scenarios such as information retrieval, code generation, GUI interaction, and complex workflows, meeting increasingly diverse technical demands.","pricing":{"cache_read":0.021918,"input":0.10959,"output":0.273975},"types":"llm","features":"thinking,web,tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":64000,"context_length":256000},{"model_id":"gpt-5.1-chat-latest","model_name":"GPT 5.1 Chat","developer_id":12,"desc":"GPT-5.1 Chat refers to the GPT-5.1 snapshot currently used in ChatGPT and is optimized for conversational use cases. While GPT-5.1 is recommended for most API applications, GPT-5.1 Chat is ideal for testing the latest improvements in chat-based interactions.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"gpt-5.1-codex","model_name":"GPT-5.1-Codex","developer_id":12,"desc":"GPT-5.1-Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments. It's available in the Responses API only and the underlying model snapshot will be regularly updated. ","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5.1-codex-mini","model_name":"GPT-5.1-Codex Mini","developer_id":12,"desc":"GPT-5.1 Codex mini is a smaller, more cost-effective, less-capable version of GPT-5.1-Codex.","pricing":{"cache_read":0.025,"input":0.25,"output":2},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"grok-4.20-multi-agent-0309","model_name":"Grok 4.20 Multi Agent 0309","developer_id":9,"desc":"Grok 4.20 is our newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherance, delivering consistently precise and truthful responses.","pricing":{"cache_read":0.2,"input":2,"output":6},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"claude-haiku-4-5","model_name":"Claude Haiku 4.5","developer_id":2,"desc":"Claude Haiku 4.5 is a fast, affordable, and highly capable AI model, excelling at coding and agentic tasks. Its combination of speed and low cost makes it ideal for powering real-time applications like chatbots, high-volume free services, and specialized \"sub-agents\" for complex tasks in coding, finance, and research. It can also handle common business tasks like creating office documents and assisting with strategy and analysis.\n⚠️ The minimum cache token for claude-haiku-4-5 has been increased from 1,024 to 4,096 tokens.","pricing":{"cache_read":0.11,"cache_write":1.375,"input":1.1,"output":5.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":131072,"context_length":204800},{"model_id":"claude-sonnet-4-5","model_name":"Claude Sonnet 4.5","developer_id":2,"desc":"Sonnet 4.5 is the best model in the world for agents, coding, and computer usage. It is also our most accurate and detailed model for long-running tasks, with enhanced knowledge in coding, finance, and cybersecurity.  \nThis model supports a thinking parameter to enable thinking requests in Claude mode.","pricing":{"cache_read":0.33,"cache_write":4.125,"input":3.3,"output":16.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":64000,"context_length":1000000},{"model_id":"claude-sonnet-4-5-think","model_name":"Claude Sonnet 4.5 Thinking","developer_id":2,"desc":"Claude Sonnet 4.5 does not enable reasoning mode by default. To access its deep reasoning capabilities, users would typically need to call the native Claude API. To make this capability available through an OpenAI-compatible interface, the claude-sonnet-4-5-think model is provided with reasoning mode pre-enabled and a default 64k-token context window, allowing it to be called directly via the OpenAI unified API. Claude Sonnet 4.5 Think is a reasoning-focused variant of Claude Sonnet 4.5 designed for advanced tasks that require rigorous reasoning, complex decision-making, and long-chain analysis; aside from its enhanced reasoning mechanism, all other capabilities remain consistent with the standard Claude Sonnet 4.5 model, making it well suited for complex problem decomposition, multi-step planning, and logic-intensive analysis.","pricing":{"cache_read":0.33,"cache_write":4.125,"input":3.3,"output":16.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":64000,"context_length":1000000},{"model_id":"mistral-large-3","model_name":"Mistral Large 3","developer_id":10,"desc":"Mistral Large 3 is a MoE model with 67.5B total parameters and 41B active parameters, supporting a 256K-token context window. Trained from scratch on 3,000 NVIDIA H200 GPUs, it is one of the strongest permissively licensed open-weight models available.\n\nDesigned for advanced reasoning and long-context understanding, Mistral Large 3 delivers performance on par with the best instruction-tuned open-weight models for general-purpose tasks, while also offering image understanding capabilities. Its multilingual strengths are particularly notable for non-English/Chinese languages, making it well-suited for global applications.\n\nTypical use cases include enterprise assistants, multilingual customer support, content generation and editing, data analysis over long documents, code assistance, and research workflows that require handling large corpora or complex instructions. With its MoE architecture, Mistral Large 3 balances strong performance with efficient inference, providing a versatile backbone for building reliable, production-grade AI systems.","pricing":{"input":0.5,"output":1.5},"types":"llm","features":"function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"k2.6-code-preview-free","model_name":"K2.6 Code Preview (free)","developer_id":15,"desc":"kimi-for-coding-free is a free and open version offered by AIHubMix specifically for Kimi users. To maintain stable service operations, the following usage limits apply: a maximum of 5 requests per minute 500 total requests per day, and a daily quota of 1 million tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"mimo-v2-flash","model_name":"MiMo V2 Flash","developer_id":31,"desc":"MiMo-V2-Flash is a mixture of experts (MoE) language model with a total of 309 billion parameters and 15 billion activated parameters. It is designed for high-speed inference and proxy workflows, adopting a novel hybrid attention architecture and multi-token prediction (MTP), significantly reducing inference costs while achieving state-of-the-art performance.","pricing":{"cache_read":0.03836,"input":0.1918,"output":0.5754},"types":"llm","features":"web","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"musesteamer-air-image","model_name":"Musesteamer Air Image","developer_id":25,"desc":"musesteamer-air-image is a text-to-image model developed by the Baidu Search team aimed at providing extreme cost-effectiveness. It can quickly generate clear images with coherent actions based on user prompts, making it easy to convert users' descriptions into images.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3.6-plus-preview-free","model_name":"Qwen3.6 Plus Preview (free)","developer_id":13,"desc":"This model has been removed from the platform.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,long_context","input_modalities":"text","endpoints":"","max_output":65535,"context_length":1000000},{"model_id":"cc-glm-5","model_name":"CC GLM 5","developer_id":5,"desc":"Supports Claude native interface, can be directly requested in Claude Code.","pricing":{"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-glm-5-turbo","model_name":"CC GLM 5 Turbo","developer_id":5,"desc":"Supports Claude native interface, can be directly requested in Claude Code.","pricing":{"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-step-3.5-flash-free","model_name":"Coding Step 3.5 Flash (free)","developer_id":16,"desc":"coding-step-3.5-flash-free is the free, publicly available version of coding-step-3.5-flash, offering the same model capabilities with usage limits in place to ensure service stability. Limits include up to 5 requests per minute, a maximum of 250 requests per day, and a daily quota of 500,000 tokens. Free usage is based on shared capacity and is limited in availability. This version is intended for testing and light usage; for consistent and reliable access, please switch to the paid model.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":256000},{"model_id":"gemini-2.5-flash-image","model_name":"Gemini 2.5 Flash Image","developer_id":8,"desc":"Gemini 2.5 Flash Image (Nano-Banana) is a state-of-the-art image generation and editing model that enables seamless blending of multiple images into a single composition while maintaining character consistency for rich visual storytelling. It supports precise, targeted image transformations through natural language instructions and leverages built-in world knowledge for both image generation and editing, making it well suited for creative design, content production, advertising, and visual expression workflows.","pricing":{"cache_read":0.3,"input":0.3,"output":2.499},"types":"image_generation,llm","features":"","input_modalities":"image,text","endpoints":"","max_output":8000,"context_length":32800},{"model_id":"grok-4-1-fast-non-reasoning","model_name":"Grok 4.1 Fast","developer_id":9,"desc":"Grok 4.1 is a new conversational model with significant improvements in real-world usability, delivering exceptional performance in creative, emotional, and collaborative interactions. It is more perceptive to nuanced user intent, more engaging to converse with, and more coherent in personality, while fully preserving its core intelligence and reliability. Built on large-scale reinforcement learning infrastructure, the model is optimized for style, personality, helpfulness, and alignment, and leverages frontier agentic reasoning models as reward evaluators to autonomously assess and iterate on responses at scale, significantly enhancing overall interaction quality.","pricing":{"cache_read":0.05,"input":0.2,"output":0.5},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"grok-4-1-fast-reasoning","model_name":"Grok 4.1 Fast (reasoning)","developer_id":9,"desc":"Grok 4.1 is a new conversational model with significant improvements in real-world usability, delivering exceptional performance in creative, emotional, and collaborative interactions. It is more perceptive to nuanced user intent, more engaging to converse with, and more coherent in personality, while fully preserving its core intelligence and reliability. Built on large-scale reinforcement learning infrastructure, the model is optimized for style, personality, helpfulness, and alignment, and leverages frontier agentic reasoning models as reward evaluators to autonomously assess and iterate on responses at scale, significantly enhancing overall interaction quality.","pricing":{"cache_read":0.05,"input":0.2,"output":0.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"grok-code-fast-1","model_name":"Grok Code Fast 1","developer_id":9,"desc":"Grok 4.1 is a new conversational model with significant improvements in real-world usability, delivering exceptional performance in creative, emotional, and collaborative interactions. It is more perceptive to nuanced user intent, more engaging to converse with, and more coherent in personality, while fully preserving its core intelligence and reliability. Built on large-scale reinforcement learning infrastructure, the model is optimized for style, personality, helpfulness, and alignment, and leverages frontier agentic reasoning models as reward evaluators to autonomously assess and iterate on responses at scale, significantly enhancing overall interaction quality.","pricing":{"cache_read":0.05,"input":0.2,"output":0.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":10000,"context_length":256000},{"model_id":"zai-glm-5-turbo","model_name":"Zai Glm 5 Turbo","developer_id":5,"desc":"","pricing":{"cache_read":0.24,"input":1.2,"output":3.9996},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-5","model_name":"GPT 5","developer_id":12,"desc":"GPT-5 is OpenAI’s most advanced general-purpose model, delivering major improvements in reasoning, code quality, and overall user experience. It is optimized for complex tasks that require step-by-step reasoning, precise instruction following, and high accuracy in high-stakes scenarios. The model supports test-time routing and advanced prompt understanding, including user-specified intent such as “think hard about this,” while significantly reducing hallucination and sycophancy and improving performance in coding, writing, and health-related tasks.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"deepseek-v3.2","model_name":"DeepSeek V3.2","developer_id":7,"desc":"DeepSeek-V3.2 is an efficient large language model equipped with DeepSeek Sparse Attention and reinforced reasoning performance, but its core strength lies in powerful agentic capabilities—enabled by large-scale task-synthesis that tightly integrates reasoning with real-world tool use, delivering robust, compliant, and generalizable agent behaviour. Users can toggle deeper reasoning through the reasoning_enabled switch.","pricing":{"cache_read":0.0302,"input":0.302,"output":0.453},"types":"llm","features":"tools,function_calling,structured_outputs,thinking","input_modalities":"text","endpoints":"","max_output":64000,"context_length":128000},{"model_id":"deepseek-v3.2-think","model_name":"DeepSeek V3.2 Thinking","developer_id":7,"desc":"DeepSeek-V3.2 is an efficient large language model equipped with DeepSeek Sparse Attention and reinforced reasoning performance, but its core strength lies in powerful agentic capabilities—enabled by large-scale task-synthesis that tightly integrates reasoning with real-world tool use, delivering robust, compliant, and generalizable agent behaviour. Users can toggle deeper reasoning through the reasoning_enabled switch.","pricing":{"cache_read":0.0302,"input":0.302,"output":0.453},"types":"llm","features":"tools,function_calling,structured_outputs,thinking","input_modalities":"text","endpoints":"","max_output":64000,"context_length":128000},{"model_id":"gpt-5-codex","model_name":"GPT-5-Codex","developer_id":12,"desc":"GPT-5-Codex is a version of GPT-5 optimized for autonomous coding tasks in Codex or similar environments. It is only available in the Responses API, and the underlying model snapshots will be updated regularly. https://docs.aihubmix.com/en/api/Responses-API You can also use it in codex-cll; see https://docs.aihubmix.com/en/api/Codex-CLI for using codex-cll through Aihubmix.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"DeepSeek-V3.1-Terminus","model_name":"DeepSeek V3.1 Terminus","developer_id":7,"desc":"DeepSeek-V3.1 non-thinking mode has now been updated to the DeepSeek-V3.1-Terminus version.","pricing":{"input":0.56,"output":1.68},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":32000,"context_length":160000},{"model_id":"DeepSeek-V3.1-Think","model_name":"DeepSeek V3.1 Thinking","developer_id":7,"desc":"Thinking mode of DeepSeek-V3.1;  \nDeepSeek V3.1 is a text generation model provided by DeepSeek, featuring a hybrid reasoning architecture that achieves an effective integration of thinking and non-thinking modes.","pricing":{"input":0.56,"output":1.68},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":32000,"context_length":128000},{"model_id":"gpt-5-pro","model_name":"GPT 5 Pro","developer_id":12,"desc":"GPT-5 pro uses more compute to think harder and provide consistently better answers.\n\nGPT-5 pro is available in the Responses API only to enable support for multi-turn model interactions before responding to API requests, and other advanced API features in the future. Since GPT-5 pro is designed to tackle tough problems, some requests may take several minutes to finish. To avoid timeouts, try using background mode. As our most advanced reasoning model, GPT-5 pro defaults to (and only supports) reasoning.effort: high. GPT-5 pro does not support code interpreter.","pricing":{"input":15,"output":120},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5-mini","model_name":"GPT 5 Mini","developer_id":12,"desc":"GPT-5 mini is a faster, more cost-efficient version of GPT-5. It's great for well-defined tasks and precise prompts.","pricing":{"cache_read":0.025,"input":0.25,"output":2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5-nano","model_name":"GPT 5 Nano","developer_id":12,"desc":"GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, designed specifically for developer tools and environments that demand rapid interactions and ultra-low latency. While it offers a more lightweight solution with limited reasoning depth compared to its larger counterparts, GPT-5-Nano excels in core capabilities such as instruction-following and maintaining critical safety features. As the successor to GPT-4.1-nano, it provides an optimal choice for cost-sensitive or real-time applications, where efficiency and speed are paramount. Particularly well-suited for summarization and classification tasks, GPT-5-Nano is a powerful tool for developers needing a swift, reliable AI model for streamlined processes.","pricing":{"cache_read":0.005,"input":0.05,"output":0.4},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"gpt-5-chat-latest","model_name":"GPT 5 Chat","developer_id":12,"desc":"GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT. GPT-5 is our next-generation, high-intelligence flagship model. It accepts both text and image inputs, and produces text outputs.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":400000},{"model_id":"claude-opus-4-1","model_name":"Claude Opus 4.1","developer_id":2,"desc":"Opus 4.1 is an upgraded version of Claude Opus 4, with improvements mainly in agent tasks, practical coding, and reasoning. Compared to Opus 4, there is a slight improvement in software engineering accuracy; Opus 4.1 has higher accuracy at 74.5%.","pricing":{"input":16.5,"output":82.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"o3-deep-research","model_name":"O3 Deep Research","developer_id":12,"desc":"Only supported through requests to the v1/responses interface.\no3-deep-research is OpenAI most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data—brought in through MCP connectors.","pricing":{"cache_read":2.5,"input":10,"output":40},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"kimi-k2.5","model_name":"Kimi K2.5","developer_id":15,"desc":"Kimi K2.5 is the smartest model of Kimi to date, achieving open-source state-of-the-art (SoTA) performance in Agent, coding, visual understanding, and a series of general intelligent tasks. At the same time, Kimi K2.5 is also the most versatile model of Kimi so far, with a native multimodal architecture design that supports both visual and text input, thinking and non-thinking modes, as well as dialogue and Agent tasks.","pricing":{"cache_read":0.105,"input":0.6,"output":3},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":256000},{"model_id":"qwen3-max-2026-01-23","model_name":"Qwen3 Max 2026 01-23","developer_id":13,"desc":"The snapshot version of the Tongyi Qianwen 3 series Max model is from January 23, 2026. By default, it does not require thinking, but thinking mode can be enabled through the enable_thinking parameter, as detailed in the code example. (After enabling thinking by passing parameters, it becomes: Qwen3-Max-Thinking). This model has a total parameter count exceeding one trillion (1T) and a pre-training data volume of up to 36T Tokens, making it the largest and most powerful reasoning model from Alibaba to date.","pricing":{"cache_read":0.09016,"cache_write":0.5635,"input":0.4508,"output":1.8032},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":32000,"context_length":252000},{"model_id":"qwen3-vl-flash","model_name":"Qwen3 VL Flash","developer_id":13,"desc":"The Qwen3 series of compact visual-understanding models achieves an effective fusion of thinking mode and non-thinking mode, outperforming the open-source Qwen3-VL-30B-A3B with faster response speeds. It comprehensively upgrades image and video understanding, supporting ultra-long contexts such as long videos and long documents, spatial awareness, and universal object recognition; it also possesses visual 2D/3D localization capabilities and is capable of handling complex real-world tasks.","pricing":{"cache_read":0.00412,"input":0.0206,"output":0.206},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":254000},{"model_id":"qwen3-vl-flash-2026-01-22","model_name":"Qwen3 VL Flash 2026 01-22","developer_id":13,"desc":"The Qwen3 series of compact visual-understanding models achieves an effective fusion of thinking mode and non-thinking mode, outperforming the open-source Qwen3-VL-30B-A3B with faster response speeds. It comprehensively upgrades image and video understanding, supporting ultra-long contexts such as long videos and long documents, spatial awareness, and universal object recognition; it also possesses visual 2D/3D localization capabilities and is capable of handling complex real-world tasks.","pricing":{"cache_read":0.0206,"input":0.0206,"output":0.206},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":254000},{"model_id":"qwen3-vl-plus","model_name":"Qwen3 VL Plus","developer_id":13,"desc":"The Qwen3 series visual understanding model achieves an effective fusion of thinking and non-thinking modes. Its visual agent capabilities reach world-class levels on public test sets such as OS World. This version features comprehensive upgrades in visual coding, spatial perception, and multimodal reasoning; visual perception and recognition abilities are greatly enhanced, supporting ultra-long video understanding.","pricing":{"cache_read":0.0274,"input":0.137,"output":1.37},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"minimax-m2.5","model_name":"MiniMax M2.5","developer_id":18,"desc":"The MiniMax M2.5 is a flagship programming model built for real-world productivity. As a production-grade model natively designed for Agent scenarios, it has achieved state-of-the-art (SOTA) performance in coding, agentic tool use, search, and office work.","pricing":{"input":0.288,"output":1.152},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":192000,"context_length":204800},{"model_id":"minimax-m2.5-highspeed","model_name":"MiniMax M2.5 Highspeed","developer_id":18,"desc":"• Same performance as minimax-m2.5\n• Significantly faster inference","pricing":{"input":0.288,"output":1.152},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":192000,"context_length":204800},{"model_id":"mm-minimax-m2.7-highspeed","model_name":"Mm Minimax M2.7 Highspeed","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-minimax-m2.7","model_name":"CC MiniMax M2.7","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-minimax-m2.7-highspeed","model_name":"CC MiniMax M2.7 Highspeed","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-minimax-m2.7","model_name":"Coding MiniMax M2.7","developer_id":18,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"coding-minimax-m2.7-highspeed","model_name":"Coding MiniMax M2.7 Highspeed","developer_id":18,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"cc-minimax-m2.5","model_name":"CC MiniMax M2.5","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-minimax-m2.5-highspeed","model_name":"CC MiniMax M2.5 Highspeed","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-minimax-m2.5","model_name":"Coding MiniMax M2.5","developer_id":18,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"coding-minimax-m2.5-highspeed","model_name":"Coding MiniMax M2.5 Highspeed","developer_id":18,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"sora-2","model_name":"Sora 2","developer_id":12,"desc":"Sora-2 is the next-generation text-to-video model evolved from Sora, optimized for higher visual realism, stronger physical consistency, and longer temporal coherence. It delivers more stable character consistency, complex motion rendering, camera control, and narrative continuity, while supporting higher resolutions and minute-level video generation for film production, advertising, virtual content creation, and creative multimedia workflows.","pricing":{"input":2,"output":2},"types":"video","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"sora-2-pro","model_name":"Sora 2 Pro","developer_id":12,"desc":"OpenAI video model Sora2-pro official API.","pricing":{"input":2,"output":2},"types":"video","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seedream-4-5","model_name":"Doubao Seedream 4.5","developer_id":4,"desc":"Seedream 4.5 is ByteDance's latest multimodal image model, integrating capabilities such as text-to-image, image-to-image, and multi-image output, along with incorporating common sense and reasoning abilities. Compared to the previous 4.0 model, it significantly improves generation quality, offering better editing consistency and multi-image fusion effects, with more precise control over image details. The generation of small text and small faces is more natural.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-audio-preview","model_name":"GPT 4o Audio Preview","developer_id":12,"desc":"OpenAI voice input and output model, with prices consistent with the official ones. For now, only the text portion prices are displayed; voice prices can be found on the official OpenAI website. Backend billing is the same as the official.","pricing":{"input":2.5,"output":10},"types":"llm","features":"","input_modalities":"text,audio","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"gpt-4o-mini-audio-preview","model_name":"GPT 4o Mini Audio Preview","developer_id":12,"desc":"","pricing":{"input":0.15,"output":0.6},"types":"llm","features":"","input_modalities":"text,audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"minimax-m2.1","model_name":"MiniMax M2.1","developer_id":18,"desc":"MiniMax-M2.1 redefines efficiency for intelligent agents. It is a compact, fast, and cost-effective MoE model with a total of 230 billion parameters and 10 billion active parameters, designed for top performance in coding and intelligent agent tasks while maintaining strong general intelligence. With only 10 billion active parameters, MiniMax-M2 delivers the complex end-to-end tool usage performance expected from today's leading models, but in a more streamlined form factor, making deployment and scaling easier than ever before.","pricing":{"input":0.288,"output":1.152},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":192000,"context_length":204800},{"model_id":"o3","model_name":"O3","developer_id":12,"desc":"OpenAI o3 is a powerful model across multiple domains, setting a new standard for coding, math, science, and visual reasoning tasks.","pricing":{"cache_read":0.5,"input":2,"output":8},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":100000,"context_length":200000},{"model_id":"cc-glm-4.7","model_name":"CC GLM 4.7","developer_id":5,"desc":"Supports Claude native interface, can be directly requested in Claude Code.","pricing":{"input":0.06,"output":0.22},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-minimax-m2.1","model_name":"CC MiniMax M2.1","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-4.7","model_name":"Coding GLM 4.7","developer_id":5,"desc":"Only supports OpenAI-compatible formats.","pricing":{"cache_read":0.010998,"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-minimax-m2.1","model_name":"Coding MiniMax M2.1","developer_id":18,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"coding-minimax-m2.1-free","model_name":"Coding MiniMax M2.1 (free)","developer_id":18,"desc":"coding-minimax-m2.1-free is a free and open version offered by AIHubMix specifically for MiniMax users. To maintain stable service operations, the following usage limits apply: a maximum of 5 requests per minute, 500 total requests per day, and a daily quota of 1 million tokens.","pricing":{"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"wan2.6-t2v","model_name":"Wan2.6 T2v","developer_id":13,"desc":"Wan 2.6 - Text-to-Video generation features intelligent storyboard scheduling supporting multi-shot narration, higher quality sound generation, stable multi-person dialogue, more natural and realistic voice tones, and supports video generation up to 15 seconds in length.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.6-i2v","model_name":"Wan2.6 I2v","developer_id":13,"desc":"Wan 2.6 - Text-to-Video generation features intelligent storyboard scheduling supporting multi-shot narration, higher quality sound generation, stable multi-person dialogue, more natural and realistic voice tones, and supports video generation up to 15 seconds in length.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.5-t2v-preview","model_name":"Wan2.5 T2v Preview","developer_id":13,"desc":"Tongyi Wanxiang 2.5 - Text-to-Video Preview, newly upgraded model architecture, supports sound generation synchronized with visuals, supports 10-second long video generation, enhanced instruction compliance, improved motion capability, and further enhanced visual quality.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.5-i2v-preview","model_name":"Wan2.5 I2v Preview","developer_id":13,"desc":"Tongyi Wanxiang 2.5 - Text-to-Video Preview features a newly upgraded technical architecture, supporting sound generation synchronized with visuals, 10-second long video generation, stronger instruction-following capabilities, and further improvements in motion ability and visual quality.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.2-i2v-plus","model_name":"Wan2.2 I2v Plus","developer_id":13,"desc":"The newly upgraded Tongyi Wanxiang 2.2 text-to-video offers higher video quality. It optimizes video generation stability and success rate, features stronger instruction-following capabilities, consistently maintains image text, portrait, and product consistency, and provides precise camera motion control.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"kimi-for-coding-free","model_name":"Kimi For Coding (free)","developer_id":15,"desc":"kimi-for-coding-free is a free and open version offered by AIHubMix specifically for Kimi users. To maintain stable service operations, the following usage limits apply: a maximum of 5 requests per minute 500 total requests per day, and a daily quota of 1 million tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"o3-pro","model_name":"O3 Pro","developer_id":12,"desc":"o3-pro\nThis model only supports Requests API interface requests.The model's thinking time is relatively long, so the response will be slow.","pricing":{"cache_read":20,"input":20,"output":80},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":100000,"context_length":200000},{"model_id":"qianfan-ocr","model_name":"Qianfan Ocr","developer_id":25,"desc":"Qianfan-OCR-Fast is a multimodal large model specialized for OCR, trained primarily on OCR-domain data while retaining appropriate general multimodal capabilities, and it outperforms Qianfan-OCR.","pricing":{"input":0.062,"output":0.248},"types":"ocr","features":"","input_modalities":"text,image","endpoints":"","max_output":28000,"context_length":32000},{"model_id":"qianfan-ocr-fast","model_name":"Qianfan Ocr Fast","developer_id":25,"desc":"Qianfan-OCR-Fast is a multimodal large model specialized for OCR, trained primarily on OCR-domain data while retaining appropriate general multimodal capabilities, and it outperforms Qianfan-OCR.","pricing":{"input":0.664,"output":2.738336},"types":"ocr","features":"","input_modalities":"text,image","endpoints":"","max_output":28000,"context_length":32000},{"model_id":"step-3.5-flash","model_name":"Step 3.5 Flash","developer_id":16,"desc":"step-3.5-flash is stepfun's flagship inference model, designed for high-complexity tasks that require deep reasoning and fast execution. It excels at decomposing multi-step problems, performing tool calls, and maintaining consistency across massive datasets. It is the preferred choice for complex workloads such as long-context agents, advanced software engineering, and end-to-end research automation.","pricing":{"input":0.11,"output":0.33},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":256000},{"model_id":"flux-2-flex","model_name":"Flux 2 Flex","developer_id":27,"desc":"FLUX.2 is purpose-built for real-world creative production workflows. It delivers high-quality images while maintaining character and style consistency across multiple reference images, shows exceptional understanding and execution of structured prompts, and supports complex text reading and writing. It also adheres to brand guidelines, handles lighting, layout, and logo elements with stability, and enables image editing at resolutions up to 4MP — all while preserving fine details, striking a balance between creativity and professional-grade visual output.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"flux-2-pro","model_name":"Flux 2 Pro","developer_id":27,"desc":"FLUX.2 is purpose-built for real-world creative production workflows. It delivers high-quality images while maintaining character and style consistency across multiple reference images, shows exceptional understanding and execution of structured prompts, and supports complex text reading and writing. It also adheres to brand guidelines, handles lighting, layout, and logo elements with stability, and enables image editing at resolutions up to 4MP — all while preserving fine details, striking a balance between creativity and professional-grade visual output.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-2.5-pro","model_name":"Gemini 2.5 Pro","developer_id":8,"desc":"Gemini 2.5 Pro is an advanced reasoning model developed by Google, optimized for solving highly complex problems across multiple domains. It can deeply understand large-scale information from diverse sources, including text, audio, images, video, and even entire codebases. The model demonstrates strong reasoning capabilities in coding, mathematics, and STEM-related tasks, and supports long-context analysis for large datasets, codebases, and technical documentation.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"tools,function_calling,structured_outputs,long_context,web,thinking,deepsearch","input_modalities":"text,image,audio,video,pdf","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"glm-4.6","model_name":"GLM 4.6","developer_id":5,"desc":"GLM-4.6 is Zhipu’s latest flagship model (total parameters 355B, activation parameters 32B), comprehensively surpassing GLM-4.5. Its coding capability is aligned with Claude Sonnet 4, making it a top domestic coding model; the context window has been expanded from 128K to 200K, better suited for long code and agent tasks; inference capabilities have been significantly enhanced and support tool invocation during processing; improvements have been made in tool calling, search agents, writing style, role play, and multilingual translation. The model is named glm-4.6 and is provided by three vendors, with calls prioritized to the Sophnet platform.","pricing":{"cache_read":0.054795,"input":0.273974,"output":1.095896},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":131072,"context_length":204800},{"model_id":"glm-4.6v","model_name":"GLM 4.6 Vision","developer_id":5,"desc":"Zhipu's latest visual reasoning model achieves state-of-the-art visual understanding accuracy at the same scale upon release. It natively supports tool invocation, can automatically complete tasks, supports ultra-long 128K context length, and allows flexible toggling of reasoning.","pricing":{"cache_read":0.0274,"input":0.137,"output":0.411},"types":"llm","features":"","input_modalities":"text,image,video","endpoints":"","max_output":128000,"context_length":128000},{"model_id":"glm-ocr","model_name":"GLM Ocr","developer_id":5,"desc":"GLM-OCR is a lightweight professional OCR model with only 0.9B parameters, yet multiple capabilities have reached SOTA levels, establishing a new benchmark for document parsing with a \"small size, high precision\" approach.","pricing":{"input":0.0282,"output":0.0282},"types":"ocr","features":"","input_modalities":"image","endpoints":"","max_output":0,"context_length":32000},{"model_id":"cc-glm-4.6","model_name":"CC GLM 4.6","developer_id":5,"desc":"for claude code","pricing":{"input":0.06,"output":0.22},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-4.6","model_name":"Coding GLM 4.6","developer_id":5,"desc":"","pricing":{"cache_read":0.010998,"input":0.06,"output":0.22},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-4.6-free","model_name":"Coding GLM 4.6 (free)","developer_id":5,"desc":"coding-glm-4.6-free is the open and free version of coding-glm-4.6. To ensure stable service performance, usage limits are in place: up to 5 requests per minute, 500 requests per day, and a daily token allowance of 1 million.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":128000,"context_length":200000},{"model_id":"coding-minimax-m2","model_name":"Coding MiniMax M2","developer_id":18,"desc":"coding-minimax-m2 is a free and open version offered by AIHubMix specifically for MiniMax users. To maintain stable service operations, the following usage limits apply: a maximum of 10 requests per minute, 1,000 total requests per day, and a daily quota of 5 million tokens.204800","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"coding-minimax-m2-free","model_name":"Coding MiniMax M2 (free)","developer_id":18,"desc":"coding-minimax-m2-free is a free and open version offered by AIHubMix specifically for MiniMax users. To maintain stable service operations, the following usage limits apply: a maximum of 5 requests per minute, 500 total requests per day, and a daily quota of 1 million tokens.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":13100,"context_length":204800},{"model_id":"coding-step-3.5-flash","model_name":"Coding Step 3.5 Flash","developer_id":16,"desc":"Originates from the tiered package.","pricing":{"input":0.03,"output":0.09},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":256000},{"model_id":"gemini-2.5-pro-search","model_name":"Gemini 2.5 Pro Search","developer_id":8,"desc":"gemini-2.5-pro-search integrates Google's official search functionality; the search feature will have an additional separate fee log directly incorporated into the scoring, with detailed logs not displayed; this will be fixed and displayed later; only supports OpenAI-compatible formats for invocation, does not support Gemini SDK; for Gemini's native SDK, please set parameters directly using the official search parameters.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm,search","features":"thinking,web,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video,pdf","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"kimi-k2-thinking","model_name":"Kimi K2 Thinking","developer_id":15,"desc":"Kimi K2 Thinking is Moonshot AI's most advanced open-source inference model to date, extending the K2 series into intelligent agent and long-context inference domains. The model is built on the trillion-parameter mixture of experts (MoE) architecture introduced by Kimi K2, activating 32 billion parameters per forward pass and supporting a context window of 256,000 tokens.","pricing":{"cache_read":0.137,"input":0.548,"output":2.192},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":262144,"context_length":262144},{"model_id":"claude-opus-4-0","model_name":"Claude Opus 4.0","developer_id":2,"desc":"Alias \nclaude-opus-4-20250514","pricing":{"input":16.5,"output":82.5},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"claude-sonnet-4-0","model_name":"Claude Sonnet 4.0","developer_id":2,"desc":"Claude Sonnet 4 is a significant upgrade to Sonnet 3.7, delivering superior performance in coding and reasoning with enhanced precision and control. Achieving a state-of-the-art 72.7% on SWE-bench, the model expertly balances advanced capability with computational efficiency. Key improvements include more reliable codebase navigation and complex instruction following, making it ideal for a wide range of applications, from routine coding to complex software development projects.","pricing":{"cache_read":0.33,"cache_write":4.125,"input":3.3,"output":16.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":64000,"context_length":1000000},{"model_id":"gemini-2.5-flash","model_name":"Gemini 2.5 Flash","developer_id":8,"desc":"Gemini 2.5 Flash is Google’s best model in terms of both performance and cost efficiency, offering a comprehensive set of capabilities. It is the first Flash model to support visible reasoning, allowing insight into the thought process behind its responses. With its strong price–performance ratio, the model is well suited for large-scale processing, low-latency, high-throughput tasks that require reasoning, as well as agent-based application scenarios.","pricing":{"cache_read":0.03,"input":0.3,"output":2.499},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-flash-preview-09-2025","model_name":"Gemini 2.5 Flash Preview 09 2025","developer_id":8,"desc":"This latest 2.5 Flash model comes with improvements in two key areas we heard consistent feedback on:\n\nBetter agentic tool use: We've improved how the model uses tools, leading to better performance in more complex, agentic and multi-step applications. This model shows noticeable improvements on key agentic benchmarks, including a 5% gain on SWE-Bench Verified, compared to our last release (48.9% → 54%). More efficient: With thinking on, the model is now significantly more cost-efficient—achieving higher quality outputs while using fewer tokens, reducing latency and cost (see charts above).","pricing":{"cache_read":0.03,"input":0.3,"output":2.499},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"glm-4.5v","model_name":"GLM 4.5 Vision","developer_id":5,"desc":"GLM-4.5V is a vision-language foundational model designed for multimodal agent applications. Based on a mixture-of-experts (MoE) architecture, it has 106 billion parameters and 12 billion active parameters. It delivers outstanding performance in video understanding, image question answering, OCR, and document parsing, and achieves significant improvements in front-end web encoding, basic reasoning, and spatial reasoning.","pricing":{"cache_read":0.274,"input":0.274,"output":0.822},"types":"llm,ocr","features":"","input_modalities":"text,image,video","endpoints":"","max_output":16384,"context_length":64000},{"model_id":"gemini-2.5-flash-lite","model_name":"Gemini 2.5 Flash Lite","developer_id":8,"desc":"Gemini 2.5 Flash-Lite is a balanced model from Google, optimized for applications that require low-latency performance. It retains the practical capabilities of the Gemini 2.5 family, including configurable reasoning based on budget, integration with tools such as grounding via Google Search and code execution, multimodal input support, and an ultra-long context window of up to 1 million tokens, delivering a strong balance between efficiency, functionality, and cost.","pricing":{"cache_read":0.01,"input":0.1,"output":0.4},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-flash-lite-nothink","model_name":"Gemini 2.5 Flash Lite (no think)","developer_id":8,"desc":"Gemini 2.5 Flash-Lite is a balanced model from Google, optimized for applications that require low-latency performance. It retains the practical capabilities of the Gemini 2.5 family, including configurable reasoning based on budget, integration with tools such as grounding via Google Search and code execution, multimodal input support, and an ultra-long context window of up to 1 million tokens, delivering a strong balance between efficiency, functionality, and cost.","pricing":{"cache_read":0.01,"input":0.1,"output":0.4},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-flash-lite-preview-09-2025","model_name":"Gemini 2.5 Flash Lite Preview 09 2025","developer_id":8,"desc":"gemini-2.5-flash-lite latest preview version","pricing":{"cache_read":0.01,"input":0.1,"output":0.4},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-flash-lite-preview-09-2025-nothink","model_name":"Gemini 2.5 Flash Lite Preview 09 2025 (no think)","developer_id":8,"desc":"gemini-2.5-flash-lite latest preview version","pricing":{"cache_read":0.01,"input":0.1,"output":0.4},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-flash-nothink","model_name":"Gemini 2.5 Flash (no think)","developer_id":8,"desc":"Gemini-2.5-flash defaults to thinking enabled; to disable thinking, request the name gemini-2.5-flash-nothink, which only supports OpenAI-compatible format calls and does not support Gemini SDK; for the native Gemini SDK, please set the parameter budget=0 directly.","pricing":{"cache_read":0.03,"input":0.3,"output":2.499},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1047576},{"model_id":"gemini-2.5-flash-search","model_name":"Gemini 2.5 Flash Search","developer_id":8,"desc":"gemini-2.5-flash-search integrates Google's official search functionality; the search feature will have an additional separate fee log directly incorporated into the scoring, with detailed logs not displayed; this will be fixed and displayed later; only supports OpenAI-compatible formats for invocation, does not support Gemini SDK; for Gemini's native SDK, please set parameters directly using the official search parameters.","pricing":{"cache_read":0.03,"input":0.3,"output":2.499},"types":"llm,search","features":"web,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-flash-preview-05-20-nothink","model_name":"Gemini 2.5 Flash Preview 05-20 (no think)","developer_id":8,"desc":"Gemini-2.5-flash-preview-05-20 is enabled by default for thinking; to disable it, request the name gemini-2.5-flash-preview-05-20-nothink.Only OpenAI-compatible format calls are supported; Gemini SDK is not supported. For the native Gemini SDK, please set the parameter budget=0 directly.","pricing":{"cache_read":0.03,"input":0.3,"output":2.499},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-flash-preview-05-20-search","model_name":"Gemini 2.5 Flash Preview 05-20 Search","developer_id":8,"desc":"Gemini-2.5 Flash Preview 05-20 Search integrates Google's official search functionality; the search feature will have an additional separate fee log directly integrated into the scoring deduction, with detailed logs not displayed. It will be fixed and displayed later. Only OpenAI-compatible formats are supported for invocation; Gemini SDK is not supported. For Gemini's native SDK, please set parameters directly using the official search parameters.","pricing":{"cache_read":0.03,"input":0.3,"output":2.499},"types":"llm,search","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"DeepSeek-V3-Fast","model_name":"DeepSeek V3 Fast","developer_id":7,"desc":"V3 Ultra-Fast Version,The current price is a limited-time 50% discount and will return to the original price on July 31st. The original price is: input: $0.55/M, output: $2.2/M. The model provider is the Sophnet platform. DeepSeek V3 Fast is a high-TPS, ultra-fast version of DeepSeek V3 0324, featuring full-precision (non-quantized) performance, enhanced code and math capabilities, and faster responses!\n\nDeepSeek V3 0324 is a powerful Mixture-of-Experts (MoE) model with a total parameter count of 671B, activating 37B parameters per token.\nIt adopts Multi-Head Latent Attention (MLA) and the DeepSeekMoE architecture to achieve efficient inference and economical training costs.\nIt innovatively implements a load balancing strategy without auxiliary loss and sets multi-token prediction training targets to enhance performance.\nThe model is pre-trained on 14.8 trillion diverse, high-quality tokens and further optimized through supervised fine-tuning and reinforcement learning stages to fully realize its capabilities.\nComprehensive evaluations show that DeepSeek V3 outperforms other open-source models and rivals leading closed-source models in performance.\nThe entire training process only requires 2.788M H800 GPU hours and remains highly stable, with no irrecoverable loss spikes or rollbacks.","pricing":{"input":0.56,"output":2.24},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":32000,"context_length":32000},{"model_id":"imagen-4.0","model_name":"Imagen 4.0","developer_id":8,"desc":"Imagen 4 is a high-quality text-to-image model developed by Google, designed for strong visual fidelity, diverse artistic styles, and precise controllability. It delivers near photographic realism with sharp details and natural lighting while significantly reducing common artifacts such as distorted hands. The model supports a wide range of styles including photorealistic, illustration, anime, oil painting, and pixel art, and offers flexible aspect ratios for use cases from content covers to mobile wallpapers. It also enables image editing and secondary creation on existing images, provides fast and stable generation, and offers strong commercial usability with high visual quality and reliable content safety.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"imagen-4.0-fast-generate-001","model_name":"Imagen 4.0 Fast Generate 001","developer_id":8,"desc":"Imagen 4 is a new-generation image generation model designed to balance high-quality output, inference efficiency, and content safety. It supports image generation, digital watermarking with authenticity verification, user-configurable safety settings, and prompt enhancement via the Prompt Rewriter, while also delivering reliable person generation capabilities. The model ID is imagen-4.0-generate-001, making it suitable for professional creation, design workflows, and various generative AI applications.","pricing":{"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"imagen-4.0-fast-generate-preview-06-06","model_name":"Imagen 4.0 Fast Generate Preview 06-06","developer_id":8,"desc":"","pricing":{"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"imagen-4.0-generate-001","model_name":"Imagen 4.0 Generate 001","developer_id":8,"desc":"Imagen 4 is a new-generation image generation model designed to balance high-quality output, inference efficiency, and content safety. It supports image generation, digital watermarking with authenticity verification, user-configurable safety settings, and prompt enhancement via the Prompt Rewriter, while also delivering reliable person generation capabilities. The model ID is imagen-4.0-generate-001, making it suitable for professional creation, design workflows, and various generative AI applications.","pricing":{"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"imagen-4.0-ultra-generate-001","model_name":"Imagen 4.0 Ultra Generate 001","developer_id":8,"desc":"","pricing":{"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"imagen-4.0-ultra","model_name":"Imagen 4.0 Ultra","developer_id":8,"desc":"","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-image-1","model_name":"GPT Image 1","developer_id":12,"desc":"Azure OpenAI’s gpt-image-1 image generation API offers both text-to-image generation and image-to-image editing with text guidance capabilities.\nBefore using this API, please ensure you have the latest OpenAI package installed by running pip install -U openai.","pricing":{"cache_read":5,"input":5,"output":40},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-image-1-mini","model_name":"GPT Image 1 Mini","developer_id":12,"desc":"OpenAI image generation model gpt-image-1-mini\nBefore use, please run pip install -U openai to upgrade to the latest openai package.","pricing":{"cache_read":5,"input":5,"output":40},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"o4-mini","model_name":"O4 Mini","developer_id":12,"desc":"o4-mini is a remarkably smart model for its speed and cost-efficiency. This allows it to support significantly higher usage limits than o3, making it a strong high-volume, high-throughput option for everyone with questions that benefit from reasoning.","pricing":{"cache_read":0.275,"input":1.1,"output":4.4},"types":"llm","features":"thinking,tool,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":100000,"context_length":200000},{"model_id":"kimi-k2-0711","model_name":"Kimi K2 0711","developer_id":15,"desc":"Kimi-K2 is a MoE architecture foundational model with extremely powerful coding and agent capabilities, featuring a total of 1 trillion parameters and activating 32 billion parameters. In benchmark performance tests across major categories such as general knowledge reasoning, programming, mathematics, and agents, the K2 model outperforms other mainstream open-source models.\nThe Kimi-K2 model supports a context length of 128k tokens.\nIt does not support visual capabilities.","pricing":{"input":0.54,"output":2.16},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":131000,"context_length":131000},{"model_id":"kimi-k2-instruct","model_name":"Kimi K2 Instruct","developer_id":15,"desc":"Kimi-K2 is a MoE architecture foundational model with extremely powerful coding and agent capabilities, featuring a total of 1 trillion parameters and activating 32 billion parameters. In benchmark performance tests across major categories such as general knowledge reasoning, programming, mathematics, and agents, the K2 model outperforms other mainstream open-source models.\nThe Kimi-K2 model supports a context length of 128k tokens.\nIt does not support visual capabilities.","pricing":{"input":0.54,"output":2.16},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"kimi-k2-turbo-preview","model_name":"Kimi K2 Turbo Preview","developer_id":15,"desc":"The kimi-k2-turbo-preview model is a high-speed version of kimi-k2, with the same model parameters as kimi-k2, but the output speed has been increased from 10 tokens per second to 40 tokens per second.","pricing":{"cache_read":0.3,"input":1.2,"output":4.8},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":262144,"context_length":262144},{"model_id":"paddleocr-vl-0.9b","model_name":"Paddleocr VL 0.9b","developer_id":25,"desc":"PaddleOCR-VL is an advanced and efficient document parsing model specifically designed for element recognition within documents. Its core component, PaddleOCR-VL-0.9B, is a compact yet powerful vision-language model (VLM) composed of a NaViT-style dynamic resolution visual encoder and the ERNIE-4.5-0.3B language model, enabling precise element recognition. This model supports 109 languages and excels at recognizing complex elements such as text, tables, formulas, and charts while maintaining extremely low resource consumption. Through comprehensive evaluations on widely used public benchmarks and internal benchmarks, PaddleOCR-VL achieves state-of-the-art (SOTA) performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions, multimodal document parsing approaches, and advanced general-purpose multimodal large models, while also offering faster inference speed.","pricing":{"input":2,"output":0},"types":"ocr","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"pp-structurev3","model_name":"Pp Structurev3","developer_id":25,"desc":"PP-StructureV3 is an efficient and comprehensive document parsing solution that can effectively convert document images and PDF files into structured content (such as Markdown format). It features powerful capabilities including layout area detection, table recognition, formula recognition, chart understanding, and multi-column reading order recovery. This tool performs excellently across various document types and can handle complex document data.","pricing":{"input":2,"output":0},"types":"ocr","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-vl-235b-a22b-instruct","model_name":"Qwen3 VL 235B A22B Instruct","developer_id":13,"desc":"The Qwen3 series open-source models include hybrid models, thinking models, and non-thinking models, with both reasoning capabilities and general abilities reaching industry SOTA levels at the same scale.","pricing":{"input":0.274,"output":1.096},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":33000,"context_length":131000},{"model_id":"qwen3-vl-235b-a22b-thinking","model_name":"Qwen3 VL 235B A22B Thinking","developer_id":13,"desc":"The Qwen3 series open-source models include hybrid models, thinking models, and non-thinking models, with both reasoning capabilities and general abilities reaching industry SOTA levels at the same scale.","pricing":{"input":0.274,"output":2.74},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":33000,"context_length":131000},{"model_id":"qwen3-vl-30b-a3b-instruct","model_name":"Qwen3 VL 30B A3B Instruct","developer_id":13,"desc":"The Qwen3-VL series’ second-largest MoE model Instruct version offers fast response speed and supports ultra-long contexts such as long videos and long documents; it features comprehensive upgrades in image/video understanding, spatial perception, and universal recognition abilities; it also provides visual 2DD/3D localization capabilities, making it capable of handling complex real-world tasks.","pricing":{"input":0.1028,"output":0.4112},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":128000},{"model_id":"qwen3-vl-30b-a3b-thinking","model_name":"Qwen3 VL 30B A3B Thinking","developer_id":13,"desc":"The Qwen3-VL series’ second-largest MoE model Thinking version offers fast response speed, stronger multimodal understanding and reasoning, visual agent capabilities, and ultra-long context support for long videos and long documents; it features comprehensive upgrades in image/video understanding, spatial perception, and universal recognition abilities, making it capable of handling complex real-world tasks.","pricing":{"input":0.1028,"output":1.028},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":128000},{"model_id":"deepseek-ocr","model_name":"DeepSeek Ocr","developer_id":7,"desc":"DeepSeek-OCR is a vision-language model launched by DeepSeek AI, focusing on optical character recognition (OCR) and “contextual optical compression.” The model is designed to explore the limits of compressing contextual information from images, efficiently processing documents and converting them into structured text formats such as Markdown. The model requires an image as input.","pricing":{"input":0.02,"output":0.02},"types":"ocr","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":8000},{"model_id":"ernie-5.0-thinking-exp","model_name":"ERNIE 5.0 Thinking Exp","developer_id":25,"desc":"ERNIE 5.0 is the next-generation natively multimodal foundation model in the ERNIE family. Built on a unified multimodal architecture, it jointly learns from text, images, audio, and video to deliver broad multimodal capabilities.\n\nERNIE 5.0 features significantly upgraded core capabilities and shows strong performance across benchmarks, with notable gains in multimodal understanding, instruction following, creative writing, factual accuracy, and agent planning with tool use.","pricing":{"cache_read":0.82192,"input":0.82192,"output":3.28768},"types":"llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":64000,"context_length":119000},{"model_id":"flux-kontext-max","model_name":"Flux Kontext Max","developer_id":27,"desc":"","pricing":{"cache_read":0,"input":2,"output":0},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-2.5-flash-image-preview","model_name":"Gemini 2.5 Flash Image Preview","developer_id":8,"desc":"Aihubmix supports the gemini-2.5-flash-image-preview model; you can add extra parameters modalities=[\"text\", \"image\"] through the OpenAI-compatible chat interface; https://docs.aihubmix.com/en/api/Gemini-Guides#gemini-2-5-flash%3A-quick-task-support","pricing":{"cache_read":0.3,"input":0.3,"output":1.2},"types":"image_generation","features":"","input_modalities":"image,text","endpoints":"","max_output":8000,"context_length":32800},{"model_id":"glm-4.5","model_name":"GLM 4.5","developer_id":5,"desc":"GLM-4.5","pricing":{"input":0.4,"output":1.6},"types":"","features":"","input_modalities":"text","endpoints":"","max_output":98304,"context_length":131072},{"model_id":"gpt-4.1","model_name":"GPT 4.1","developer_id":12,"desc":"The latest flagship multimodal model supports million-token context, with encoding capability (SWE-bench 54.6%) and instruction-following (Scale AI 38.3%) performance significantly surpassing GPT-4o, while reducing costs by 26%, making it suitable for complex tasks. Its automatic caching mechanism offers a 75% cost reduction on cache hits.","pricing":{"cache_read":0.5,"input":2,"output":8},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":32768,"context_length":1047576},{"model_id":"grok-4","model_name":"Grok 4","developer_id":9,"desc":"Grok, their latest and greatest flagship model, offers unparalleled performance in natural language, math, and reasoning – the perfect jack of all trades.\nThe current pointing model version is grok-4-0709.","pricing":{"cache_read":0.825,"input":3.3,"output":16.5},"types":"llm","features":"function_calling,structured_outputs,thinking","input_modalities":"text,image","endpoints":"","max_output":64000,"context_length":256000},{"model_id":"grok-4-fast-non-reasoning","model_name":"Grok 4 Fast","developer_id":9,"desc":"Grok-4-fast is a cost-effective inference model developed by xAI that delivers cutting-edge performance with excellent token efficiency. The model features a 2 million token context window, advanced Web and X search capabilities, and a unified architecture supporting both \"inference\" and \"non-inference\" modes. Compared to Grok 4, it reduces thinking tokens by an average of 40% and lowers the price by 98% while achieving the same performance.","pricing":{"cache_read":0.05,"input":0.2,"output":0.5},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":30000,"context_length":2000000},{"model_id":"grok-4-fast-reasoning","model_name":"Grok 4 Fast (reasoning)","developer_id":9,"desc":"Grok-4-fast is a cost-effective inference model developed by xAI that delivers cutting-edge performance with excellent token efficiency. The model features a 2 million token context window, advanced Web and X search capabilities, and a unified architecture supporting both \"inference\" and \"non-inference\" modes. Compared to Grok 4, it reduces thinking tokens by an average of 40% and lowers the price by 98% while achieving the same performance.","pricing":{"cache_read":0.05,"input":0.2,"output":0.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":30000,"context_length":2000000},{"model_id":"veo-3.1-generate-preview","model_name":"Veo 3.1 Generate Preview","developer_id":8,"desc":"Veo 3.1 is Google's state-of-the-art model for generating high-fidelity, 8-second 720p , 1080p or 4k videos featuring stunning realism and natively generated audio.","pricing":{"cache_read":0,"input":2,"output":2},"types":"video","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"veo-3.1-fast-generate-preview","model_name":"Veo 3.1 Fast Generate Preview","developer_id":8,"desc":"Veo 3.1 is Google's state-of-the-art model for generating high-fidelity, 8-second 720p , 1080p or 4k videos featuring stunning realism and natively generated audio.","pricing":{"input":2,"output":0},"types":"video","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"veo-3.0-generate-preview","model_name":"Veo 3.0 Generate Preview","developer_id":8,"desc":"Veo 3.0 Generate Preview is an advanced AI video generation model that supports text-to-video creation with synchronized audio, featuring excellent physical simulation and lip-sync capabilities. Users can generate vivid video clips from short story prompts. 🎟️ Limited-Time Deal: Save 10% Now.","pricing":{"cache_read":0,"input":2,"output":2},"types":"video","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"DeepSeek-OCR","model_name":"DeepSeek Ocr","developer_id":7,"desc":"DeepSeek-OCR is a vision-language model launched by DeepSeek AI, focusing on optical character recognition (OCR) and “contextual optical compression.” The model is designed to explore the limits of compressing contextual information from images, efficiently processing documents and converting them into structured text formats such as Markdown. The model requires an image as input.","pricing":{"input":0.02,"output":0.02},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":8000},{"model_id":"alicloud-kimi-k2-instruct","model_name":"Alicloud Kimi K2 Instruct","developer_id":15,"desc":"Kimi-K2 is a MoE architecture foundational model with extremely powerful coding and agent capabilities, featuring a total of 1 trillion parameters and activating 32 billion parameters. In benchmark performance tests across major categories such as general knowledge reasoning, programming, mathematics, and agents, the K2 model outperforms other mainstream open-source models.\nThe Kimi-K2 model supports a context length of 128k tokens.\nIt does not support visual capabilities.","pricing":{"input":0.548,"output":2.192},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-router","model_name":"Aihubmix Router","developer_id":12,"desc":"New model routing capability; request aihubmix-router to automatically route models based on question complexity, so everyone no longer needs to manually switch models; in our tests comparing the use of the model router versus only using GPT-4.1, we observed up to 60% cost savings while maintaining similar accuracy.  \nThe context length of the model router depends on the base model used for each prompt. Input size is 200,000, output size is 32,768.  \nCurrently, there are four routing models: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o4-mini.  \nPricing: Due to our current billing structure system, requests through aihubmix-router are billed at the price of gpt-4.1-mini regardless of which final model is used; future billing will be based on the actual model invoked.  \nEveryone is welcome to try it out; the interface will return the name of the actual called model.","pricing":{"cache_read":0.1,"input":0.4,"output":1.6},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4.1-mini","model_name":"GPT 4.1 Mini","developer_id":12,"desc":"Lightweight, high-performance model with million-token context and near-flagship-level encoding and image understanding capabilities, while reducing costs by 83%. It is suitable for rapid development and small to medium-sized applications. The automatic caching mechanism provides a 75% cost reduction on cache hits.","pricing":{"cache_read":0.1,"input":0.4,"output":1.6},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":32768,"context_length":1047576},{"model_id":"gpt-4.1-nano","model_name":"GPT 4.1 Nano","developer_id":12,"desc":"Ultra-lightweight model with million-token context, optimized for speed and low latency, costing only $0.10 per million input tokens. It is suitable for edge computing and real-time interaction. The automatic caching mechanism offers a 75% cost reduction on cache hits.","pricing":{"cache_read":0.025,"input":0.1,"output":0.4},"types":"llm","features":"tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":32768,"context_length":1047576},{"model_id":"gemini-2.5-pro-preview-05-06","model_name":"Gemini 2.5 Pro Preview 05-06","developer_id":8,"desc":"gemini-2.5-pro latest model","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"thinking,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"gemini-2.5-pro-preview-03-25","model_name":"Gemini 2.5 Pro Preview 03-25","developer_id":8,"desc":"Supports high concurrency.  \nThe Gemini 2.5 Pro preview version is here, with higher limits for production testing.  \nGoogle's latest and most powerful model;","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-2.5-pro-preview-05-06-search","model_name":"Gemini 2.5 Pro Preview 05-06 Search","developer_id":8,"desc":"Integrated with Google's official search function.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm,search","features":"thinking,web","input_modalities":"text,image,audio,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-2.5-pro-preview-03-25-search","model_name":"Gemini 2.5 Pro Preview 03-25 Search","developer_id":8,"desc":"Integrated with Google's official search function.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm,search","features":"thinking,web,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-max-preview","model_name":"Qwen3 Max Preview","developer_id":13,"desc":"Qwen3-Max-Preview is the latest preview model in the Qwen3 series. This version is functionally equivalent to Qwen3-Max-Thinking — simply set extra_body={\"enable_thinking\": True} to enable the thinking mode. Compared to the Qwen2.5 series, it delivers significant improvements in overall general capabilities, including English–Chinese text understanding, complex instruction following, open-ended reasoning, multilingual processing, and tool-use proficiency. The model also exhibits fewer hallucinations and stronger overall reliability.","pricing":{"cache_read":0.1692,"input":0.846,"output":3.384},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-max","model_name":"Qwen3 Max","developer_id":13,"desc":"The Tongyi Qianwen 3 series Max model has undergone special upgrades in intelligent agent programming and tool invocation compared to the preview version. The officially released model this time reaches SOTA level in the field and is adapted to more complex intelligent agent scenarios.","pricing":{"cache_read":0.09016,"cache_write":0.5635,"input":0.4508,"output":1.8032},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":65536,"context_length":262144},{"model_id":"qwen3-next-80b-a3b-instruct","model_name":"Qwen3 Next 80B A3B Instruct","developer_id":13,"desc":"Qwen3-Next-80B-A3B-Instruct is an instruction-tuned model in the Qwen3-Next series, optimized for delivering fast, stable, and direct final answers without showing its reasoning steps (\"thinking traces\").\n\nUnlike chain-of-thought models, it focuses on generating consistent, instruction-following outputs, making it ideal for production environments. It excels at complex tasks like reasoning and coding while maintaining high throughput and stability, especially with ultra-long inputs and multi-turn dialogues.\n\nEngineered for efficiency, its performance rivals larger Qwen3 systems, making it perfectly suited for RAG, tool use, and agentic workflows where deterministic results are critical.","pricing":{"input":0.138,"output":0.552},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"qwen3-next-80b-a3b-thinking","model_name":"Qwen3 Next 80B A3B Thinking","developer_id":13,"desc":"Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that excels by outputting structured 'thinking' traces (Chain-of-Thought) by default.\n\nDesigned for hard, multi-step problems, it is ideal for tasks like math proofs, code synthesis, logic puzzles, and agentic planning. Compared to other Qwen3 variants, it offers greater stability during long reasoning chains and is tuned to follow complex instructions without getting repetitive or off-task.\n\nThis model is perfectly suited for agent frameworks, tool use (function calling), and benchmarks where a step-by-step breakdown is required. It leverages throughput-oriented techniques for fast generation of detailed, procedural outputs.","pricing":{"input":0.142,"output":1.42},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"qwen3-235b-a22b-instruct-2507","model_name":"Qwen3 235B A22B Instruct 2507","developer_id":13,"desc":"Qwen3-235B-A22B-Instruct-2507","pricing":{"input":0.28,"output":1.12},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":262144,"context_length":262144},{"model_id":"qwen3-235b-a22b-thinking-2507","model_name":"Qwen3 235B A22B Thinking 2507","developer_id":13,"desc":"The open-source thinking model based on Qwen3 has significantly improved in logical ability, general capability, knowledge enhancement, and creative ability compared to the previous version (Tongyi Qianwen 3-235B-A22B). It is suitable for high-difficulty and strong reasoning scenarios.","pricing":{"input":0.28,"output":2.8},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":262144,"context_length":262144},{"model_id":"qwen3-coder-30b-a3b-instruct","model_name":"Qwen3 Coder 30B A3B Instruct","developer_id":13,"desc":"The code generation model based on Qwen3 has powerful Coding Agent capabilities, achieving state-of-the-art performance compared to open-source models.The model adopts tiered pricing.","pricing":{"cache_read":0.2,"input":0.2,"output":0.8},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":262000,"context_length":2000000},{"model_id":"qwen3-coder-480b-a35b-instruct","model_name":"Qwen3 Coder 480B A35B Instruct","developer_id":13,"desc":"The code generation model based on Qwen3 has powerful Coding Agent capabilities, achieving state-of-the-art performance compared to open-source models.The model adopts tiered pricing.","pricing":{"cache_read":0.82,"input":0.82,"output":3.28},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":262000,"context_length":262000},{"model_id":"imagen-4.0-ultra-generate-exp-05-20","model_name":"Imagen 4.0 Ultra Generate","developer_id":8,"desc":"Image 4.0 Beta version, for testing purposes only. For production environment, it is recommended to use imagen-4.0-generate-preview-05-20.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-embeddings-v5-text-nano","model_name":"Jina Embeddings V5 Text Nano","developer_id":22,"desc":"A 3.8-billion-parameter general vector model (embedding model) for state-of-the-art multilingual embeddings for edge deployment.","pricing":{"input":0.05,"output":0},"types":"embedding","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-embeddings-v5-text-small","model_name":"Jina Embeddings V5 Text Small","developer_id":22,"desc":"A 3.8-billion-parameter general vector (embedding) model providing state-of-the-art multilingual embeddings with task-specific adapters.","pricing":{"input":0.05,"output":0},"types":"embedding","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-235b-a22b","model_name":"Qwen3 235B A22B","developer_id":13,"desc":"Qwen3-235B-A22B is a massive 235B parameter Mixture-of-Experts (MoE) model that operates with the efficiency of a 22B model. Its standout feature is the ability to seamlessly switch between a \"thinking\" mode for complex reasoning and a \"non-thinking\" mode for fast conversation, offering both world-class power and practical speed.","pricing":{"input":0.28,"output":1.12},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":131100},{"model_id":"qwen3-coder-flash","model_name":"Qwen3 Coder Flash","developer_id":13,"desc":"Qwen3 Coder Flash is Alibaba's fast and cost efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.","pricing":{"cache_read":0.136,"input":0.136,"output":0.544},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":65536,"context_length":256000},{"model_id":"qwen3-coder-plus","model_name":"Qwen3 Coder Plus","developer_id":13,"desc":"The code generation model based on Qwen3 has powerful Coding Agent capabilities, excels in tool invocation and environment interaction, and can achieve autonomous programming with outstanding coding abilities while also possessing general capabilities.The model adopts tiered pricing.","pricing":{"cache_read":0.108,"input":0.54,"output":2.16},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"qwen3-coder-plus-2025-07-22","model_name":"Qwen3 Coder Plus 2025 07-22","developer_id":13,"desc":"The code generation model based on Qwen3 has powerful Coding Agent capabilities, excels in tool invocation and environment interaction, and can achieve autonomous programming with outstanding coding abilities while also possessing general capabilities.The model adopts tiered pricing.","pricing":{"cache_read":0.54,"input":0.54,"output":2.16},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":65536,"context_length":128000},{"model_id":"gemini-2.5-pro-preview-06-05-search","model_name":"Gemini 2.5 Pro Preview 06-05 Search","developer_id":8,"desc":"Integrated with Google's official search function.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm,search","features":"thinking,web,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"DeepSeek-V3","model_name":"DeepSeek V3","developer_id":7,"desc":"It has been automatically upgraded to the latest released version, 250324.\nAutomatically upgraded to the latest released version 250324.","pricing":{"input":0.272,"output":1.088},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":1638000,"context_length":1638000},{"model_id":"LongCat-Flash-Chat","model_name":"Longcat Flash Chat","developer_id":28,"desc":"Meituan has officially released and open-sourced LongCat-Flash-Chat, which utilizes an innovative Mixture of Experts (MoE) and \"zero-computation expert\" mechanism to achieve a total of 560B parameters, while only activating around 27B parameters per token as needed. At the same time, end-to-end optimization for agents (including a self-built evaluation set and multi-agent trajectory data) significantly enhances its performance in tool usage and complex task orchestration.","pricing":{"input":0.14,"output":0.7},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen2.5-VL-72B-Instruct","model_name":"Qwen2.5 VL 72B Instruct","developer_id":13,"desc":"The model provider is the Sophon platform. Qwen2.5-VL-72B-Instruct is the latest vision-language model released by the Qwen team. This model excels not only at recognizing common objects such as flowers, birds, fish, and insects, but also at efficiently analyzing text, charts, icons, graphics, and layouts within images. As a visual agent, it is capable of reasoning and dynamically guiding tool usage, supporting both computer and mobile operations. Moreover, it can understand videos longer than one hour and accurately locate relevant video segments.","pricing":{"input":0.62,"output":0.62},"types":"","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"ernie-5.0-thinking-preview","model_name":"ERNIE 5.0 Thinking Preview","developer_id":25,"desc":"The new generation Wenxin model, Wenxin 5.0, is a native full-modal large model that adopts native full-modal unified modeling technology, jointly modeling text, images, audio, and video, possessing comprehensive full-modal capabilities. Wenxin 5.0's basic abilities are comprehensively upgraded, performing excellently on benchmark test sets, especially in multimodal understanding, instruction compliance, creative writing, factual accuracy, intelligent agent planning, and tool application.","pricing":{"cache_read":0.822,"input":0.822,"output":3.288},"types":"llm","features":"thinking,structured_outputs,function_calling","input_modalities":"text","endpoints":"","max_output":64000,"context_length":183000},{"model_id":"inclusionAI/Ling-1T","model_name":"Ling 1t","developer_id":29,"desc":"Ling-1T is the first flagship non-thinking model in the “Ling 2.0” series, featuring 1 trillion total parameters and approximately 50 billion active parameters per token. Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of efficient inference and scalable cognition. Ling-1T-base was pretrained on over 20 trillion high-quality, reasoning-intensive tokens, supports up to a 128K context length, and incorporates an Evolutionary Chain of Thought (Evo-CoT) process during mid-stage and post-stage training. This training regimen greatly enhances the model’s efficiency and depth of reasoning, enabling Ling-1T to achieve top performance across multiple complex reasoning benchmarks, balancing accuracy and efficiency.","pricing":{"input":0.548,"output":2.192},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"inclusionAI/Ring-1T","model_name":"Ring 1t","developer_id":29,"desc":"Ring-1T is an open-source idea model with a trillion parameters released by the Bailing team. It is based on the Ling 2.0 architecture and the Ling-1T-base foundational model for training, with a total parameter count of 1 trillion, an active parameter count of 50 billion, and supports up to a 128K context window. The model is trained via large-scale verifiable reward reinforcement learning (RLVR), combined with the self-developed Icepop reinforcement learning stabilization method and the efficient ASystem reinforcement learning system, significantly improving the model’s deep reasoning and natural language reasoning capabilities. Ring-1T achieves leading performance among open-source models on high-difficulty reasoning benchmarks such as mathematics competitions (e.g., IMO 2025), code generation (e.g., ICPC World Finals 2025), and logical reasoning.","pricing":{"input":0.548,"output":2.192},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gte-rerank-v2","model_name":"Gte Rerank V2","developer_id":13,"desc":"gte-rerank-v2 is a multilingual unified text ranking model developed by Tongyi Lab, covering multiple major languages worldwide and providing high-quality text ranking services. It is typically used in scenarios such as semantic retrieval and RAG, and can simply and effectively improve text retrieval performance. Given a query and a set of candidate texts (documents), the model ranks the candidates from highest to lowest based on their semantic relevance to the query.","pricing":{"input":0.11,"output":0.11},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"inclusionAI/Ling-flash-2.0","model_name":"Ling Flash 2.0","developer_id":29,"desc":"Ling-flash-2.0 is a language model from inclusionAI with a total of 100 billion parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). As part of the Ling 2.0 architecture series, it is designed as a lightweight yet powerful Mixture-of-Experts (MoE) model. It aims to deliver performance comparable to or even exceeding that of 40B-level dense models and other larger MoE models, but with a significantly smaller active parameter count. The model represents a strategy focused on achieving high performance and efficiency through extreme architectural design and training methods.","pricing":{"input":0.136,"output":0.544},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"inclusionAI/Ling-mini-2.0","model_name":"Ling Mini 2.0","developer_id":29,"desc":"Ling-mini-2.0 is a small-sized, high-performance large language model based on the MoE architecture. It has a total of 16 billion parameters, but only activates 1.4 billion parameters per token (non-embedding 789 million), achieving extremely high generation speed. Thanks to the efficient MoE design and large-scale high-quality training data, despite activating only 1.4 billion parameters, Ling-mini-2.0 still demonstrates top-tier performance on downstream tasks comparable to dense LLMs under 10 billion parameters and even larger-scale MoE models.","pricing":{"input":0.068,"output":0.272},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"inclusionAI/Ring-flash-2.0","model_name":"Ring Flash 2.0","developer_id":29,"desc":"Ring-flash-2.0 is a high-performance thinking model deeply optimized based on the Ling-flash-2.0-base. It uses a mixture-of-experts (MoE) architecture with a total of 100 billion parameters, but only activates 6.1 billion parameters per inference. The model employs the original Icepop algorithm to solve the instability issues of large MoE models during reinforcement learning (RL) training, enabling its complex reasoning capabilities to continuously improve over long training cycles. Ring-flash-2.0 has achieved significant breakthroughs on multiple high-difficulty benchmarks, including mathematics competitions, code generation, and logical reasoning. Its performance not only surpasses top dense models under 40 billion parameters but also rivals larger open-source MoE models and closed-source high-performance thinking models. Although the model focuses on complex reasoning, it also performs exceptionally well on creative writing tasks. Furthermore, thanks to its efficient architecture, Ring-flash-2.0 delivers high performance with low-latency inference, significantly reducing deployment costs in high-concurrency scenarios.","pricing":{"input":0.136,"output":0.544},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-deepsearch-v1","model_name":"Jina Deepsearch V1","developer_id":22,"desc":"DeepSearch combines search, reading, and reasoning capabilities to pursue the best possible answer. It's fully compatible with OpenAI's Chat API format—just replace api.openai.com with aihubmix.com to get started.  \nThe stream will return the thinking process.","pricing":{"input":0.05,"output":0.05},"types":"llm,search","features":"thinking,web,deepsearch","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":1000000},{"model_id":"jina-embeddings-v4","model_name":"Jina Embeddings V4","developer_id":22,"desc":"A general-purpose vector model with 3.8 billion parameters, used for multimodal and multilingual retrieval, supporting both unidirectional and multi-vector embedding outputs.","pricing":{"input":0.05,"output":0.05},"types":"embedding","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-reranker-v3","model_name":"Jina Reranker V3","developer_id":22,"desc":"Multimodal multilingual document reranker, 131K context, 0.6B parameters, for visual document sorting.","pricing":{"input":0.05,"output":0.05},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":131000},{"model_id":"llama-4-maverick","model_name":"Llama 4 Maverick","developer_id":11,"desc":"Llama 4 Maverick is a high-capacity Mixture-of-Experts (MoE) model from Meta, featuring 400B total parameters and 128 experts, while activating an efficient 17B parameters per inference. Engineered for peak performance, it excels at advanced multimodal tasks.\n\nMaverick natively supports text and image input, producing multilingual text and code. With a 1-million-token context window and instruction tuning, it is optimized for complex image reasoning and general-purpose assistant-like interactions.\n\nReleased under the Llama 4 Community License, Maverick is ideal for research and commercial applications demanding state-of-the-art multimodal understanding and high throughput.","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":1048576},{"model_id":"llama-4-scout","model_name":"Llama 4 Scout","developer_id":11,"desc":"Llama 4 Scout is a highly efficient Mixture-of-Experts (MoE) model from Meta, activating 17B out of 109B total parameters per inference. It natively supports multimodal input (text and image) and multilingual output (text and code) across 12 languages.\n\nDesigned for assistant-style interaction and visual reasoning, Scout features a massive 10-million-token context window. It is instruction-tuned for tasks like multilingual chat and image understanding and is released under the Llama 4 Community License for local or commercial deployment.","pricing":{"input":0.2,"output":0.2},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":131000,"context_length":131000},{"model_id":"qwen-image","model_name":"Qwen Image","developer_id":13,"desc":"Qwen-Image is a foundational image generation model in the Qwen series, achieving significant progress in complex text rendering and precise image editing. Experiments show that the model has strong general capabilities in image generation and editing, especially excelling in Chinese text rendering.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-image-edit","model_name":"Qwen Image Edit","developer_id":13,"desc":"Qwen-Image-Edit is the image editing version of Qwen-Image. Based on the 20B Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image's unique text rendering capabilities to image editing tasks, achieving precise text editing. Additionally, Qwen-Image-Edit can input the same image into Qwen2.5-VL (for visual semantic control) and the VAE encoder (for visual appearance control), enabling both semantic and appearance editing functionalities.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-image-max","model_name":"Qwen Image Max","developer_id":13,"desc":"Qwen-Image-Edit is the image editing version of Qwen-Image. Based on the 20B Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image's unique text rendering capabilities to image editing tasks, achieving precise text editing. Additionally, Qwen-Image-Edit can input the same image into Qwen2.5-VL (for visual semantic control) and the VAE encoder (for visual appearance control), enabling both semantic and appearance editing functionalities.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-mt-plus","model_name":"Qwen Mt Plus","developer_id":13,"desc":"Based on the comprehensive upgrade of Qwen3, this flagship translation large model supports bidirectional translation across 92 languages. It offers fully enhanced model performance and translation quality, along with more stable terminology customization, format fidelity, and domain-prompting capabilities, making translations more accurate and natural.","pricing":{"input":0.492,"output":1.476},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":8000,"context_length":16000},{"model_id":"qwen-mt-turbo","model_name":"Qwen Mt Turbo","developer_id":13,"desc":"Based on the comprehensive upgrade of Qwen3, this flagship translation large model supports bidirectional translation across 92 languages. It offers fully enhanced model performance and translation quality, along with more stable terminology customization, format fidelity, and domain-prompting capabilities, making translations more accurate and natural.","pricing":{"input":0.192,"output":0.534912},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":8000,"context_length":16000},{"model_id":"qwen3-embedding-0.6b","model_name":"Qwen3 Embedding 0.6b","developer_id":13,"desc":"The Qwen3 Embedding model series is the latest proprietary model family from Qwen, specifically designed for text embedding and ranking tasks. Based on the dense base models of the Qwen3 series, it offers comprehensive text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the excellent multilingual capabilities, long-text understanding, and reasoning skills of its base models. The Qwen3 Embedding series demonstrates significant advancements in various text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bilingual text mining.","pricing":{"input":0.068,"output":0},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-embedding-4b","model_name":"Qwen3 Embedding 4B","developer_id":13,"desc":"The Qwen3 Embedding model series is the latest proprietary model family from Qwen, specifically designed for text embedding and ranking tasks. Based on the dense base models of the Qwen3 series, it offers comprehensive text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the excellent multilingual capabilities, long-text understanding, and reasoning skills of its base models. The Qwen3 Embedding series demonstrates significant advancements in various text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bilingual text mining.","pricing":{"input":0.068,"output":0.068},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-embedding-8b","model_name":"Qwen3 Embedding 8B","developer_id":13,"desc":"The Qwen3 Embedding model series is the latest proprietary model family from Qwen, specifically designed for text embedding and ranking tasks. Based on the dense base models of the Qwen3 series, it offers comprehensive text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the excellent multilingual capabilities, long-text understanding, and reasoning skills of its base models. The Qwen3 Embedding series demonstrates significant advancements in various text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bilingual text mining.","pricing":{"input":0.068,"output":0},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-reranker-0.6b","model_name":"Qwen3 Reranker 0.6b","developer_id":13,"desc":"Based on the dense foundational model of the Qwen3 series, it is specifically designed for ranking tasks. It inherits the base model’s outstanding multilingual capabilities, long-text understanding, and reasoning skills, achieving significant advancements in ranking tasks.","pricing":{"input":0.11,"output":0.11},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":8000,"context_length":16000},{"model_id":"qwen3-reranker-4b","model_name":"Qwen3 Reranker 4B","developer_id":13,"desc":"Based on the dense foundational model of the Qwen3 series, it is specifically designed for ranking tasks. It inherits the base model’s outstanding multilingual capabilities, long-text understanding, and reasoning skills, achieving significant advancements in ranking tasks.","pricing":{"input":0.11,"output":0.11},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-reranker-8b","model_name":"Qwen3 Reranker 8B","developer_id":13,"desc":"Based on the dense foundational model of the Qwen3 series, it is specifically designed for ranking tasks. It inherits the base model’s outstanding multilingual capabilities, long-text understanding, and reasoning skills, achieving significant advancements in ranking tasks.","pricing":{"input":0.11,"output":0.11},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"tao-8k","model_name":"Tao 8K","developer_id":25,"desc":"","pricing":{"input":0.068,"output":0.068},"types":"embedding","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seedream-4-0","model_name":"Doubao Seedream 4.0","developer_id":4,"desc":"Seedream 4.0 is a SOTA-level multimodal image creation model based on leading architecture. It breaks the creative boundaries of traditional text-to-image models by natively supporting text, single image, and multiple image inputs. Users can freely combine text and images to achieve various creative styles within the same model, such as multi-image fusion creation based on subject consistency, image editing, and set image generation, making image creation more flexible and controllable.\nSeedream 4.0 supports composite editing with up to 10 images in a single input. Through deep reasoning of prompt words, it automatically adapts the optimal image aspect ratio and generation quantity, enabling continuous output of up to 15 content-related images at one time. Additionally, the model significantly improves the accuracy and content diversity of Chinese generation, supports 4K ultra-high-definition output, and provides a one-stop solution from generation to editing for professional image creation.","pricing":{"cache_read":0,"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"embedding-v1","model_name":"Embedding V1","developer_id":25,"desc":"Embedding-V1 is a text representation model based on Baidu's Wenxin large model technology, capable of converting text into numerical vector forms for applications such as text retrieval, information recommendation, and knowledge mining. Embedding-V1 provides an Embeddings interface that generates corresponding vector representations based on the input content. By calling this interface, you can input text into the model and obtain the corresponding vector representations for subsequent text processing and analysis.","pricing":{"input":0.068,"output":0},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"ernie-4.5-turbo-latest","model_name":"ERNIE 4.5 Turbo","developer_id":25,"desc":"Wenxin 4.5 Turbo also has significant improvements in hallucination reduction, logical reasoning, and coding capabilities. Compared to Wenxin 4.5, it is faster and more affordable.","pricing":{"input":0.11,"output":0.44},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":12000,"context_length":135000},{"model_id":"glm-4.5-x","model_name":"GLM 4.5 X","developer_id":5,"desc":"GLM-4.5-X is the high-speed version of GLM-4.5, offering powerful performance with a generation speed of up to 100 tokens per second.","pricing":{"cache_read":0.44,"input":2.2,"output":8.91},"types":"","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gme-qwen2-vl-2b-instruct","model_name":"Gme Qwen2 VL 2B Instruct","developer_id":13,"desc":"The GME-Qwen2VL series is a unified multimodal Embedding model trained based on the Qwen2-VL multimodal large language model (MLLMs). The GME model supports three types of inputs: text, images, and image-text pairs. All these input types can generate universal vector representations and exhibit excellent retrieval performance.","pricing":{"input":0.138,"output":0.138},"types":"embedding","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"bce-reranker-base","model_name":"Bce Reranker Base","developer_id":13,"desc":"Based on the dense foundational model of the Qwen3 series, it is specifically designed for ranking tasks. It inherits the base model’s outstanding multilingual capabilities, long-text understanding, and reasoning skills, achieving significant advancements in ranking tasks.","pricing":{"input":0.068,"output":0},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"codex-mini-latest","model_name":"Codex Mini","developer_id":12,"desc":"Only supports v1/responses API calls.https://docs.aihubmix.com/en/api/Responses-API\ncodex-mini-latest is a fine-tuned version of o4-mini specifically for use in Codex CLI. For direct use in the API, we recommend starting with gpt-4.1.","pricing":{"cache_read":0.375,"input":1.5,"output":6},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-clip-v2","model_name":"Jina Clip V2","developer_id":22,"desc":"Multi-modal Embeddings Model, multilingual, 1024-dimensional, 865M parameters.","pricing":{"input":0.05,"output":0.05},"types":"embedding","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-reranker-m0","model_name":"Jina Reranker M0","developer_id":22,"desc":"Multimodal multilingual document reranker, 10K context, 2.4B parameters, for visual document sorting.","pricing":{"input":0.05,"output":0.05},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-colbert-v2","model_name":"Jina Colbert V2","developer_id":22,"desc":"Multi-language ColBERT embeddings model, 560M parameters, used for embedding and reranking.","pricing":{"input":0.05,"output":0.05},"types":"embedding,rerank","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"DeepSeek-R1","model_name":"DeepSeek R1","developer_id":7,"desc":"DeepSeek R1 is a new open-source model with performance on par with OpenAI's o1 and features fully open reasoning tokens. It is a 671B-parameter Mixture-of-Experts (MoE) model that activates 37B parameters during inference.","pricing":{"input":0.4,"output":2},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":1638000,"context_length":1638000},{"model_id":"gpt-4o-search-preview","model_name":"GPT 4o Search Preview","developer_id":12,"desc":"Using the Chat Completions API, you can directly access the fine-tuned models and tool used by Search in ChatGPT.\n\nWhen using Chat Completions, the model always retrieves information from the web before responding to your query. To use web_search_preview as a tool that models like gpt-4o and gpt-4o-mini invoke only when necessary, switch to using the Responses API.\n\nCurrently, you need to use one of these models to use web search in Chat Completions:\n\ngpt-4o-search-preview\ngpt-4o-mini-search-preview\nWeb search parameter example\nimport OpenAI from \"openai\";\nconst client = new OpenAI();\n\nconst completion = await client.chat.completions.create({\n    model: \"gpt-4o-search-preview\",\n    web_search_options: {},\n    messages: [{\n        \"role\": \"user\",\n        \"content\": \"What was a positive news story from today?\"\n    }],\n});\n\nconsole.log(completion.choices[0].message.content);\nOutput and citations\nThe API response item in the choices array will include:\n\nmessage.content with the text result from the model, inclusive of any inline citations\nannotations with a list of cited URLs\nBy default, the model's response will include inline citations for URLs found in the web search results. In addition to this, the url_citation annotation object will contain the URL and title of the cited source, as well as the start and end index characters in the model's response where those sources were used.","pricing":{"cache_read":1.25,"input":2.5,"output":10},"types":"llm,search","features":"web,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"gpt-4o-mini-search-preview","model_name":"GPT 4o Mini Search Preview","developer_id":12,"desc":"Using the Chat Completions API, you can directly access the fine-tuned models and tool used by Search in ChatGPT.\n\nWhen using Chat Completions, the model always retrieves information from the web before responding to your query. To use web_search_preview as a tool that models like gpt-4o and gpt-4o-mini invoke only when necessary, switch to using the Responses API.\n\nCurrently, you need to use one of these models to use web search in Chat Completions:\n\ngpt-4o-search-preview\ngpt-4o-mini-search-preview\nWeb search parameter example\nimport OpenAI from \"openai\";\nconst client = new OpenAI();\n\nconst completion = await client.chat.completions.create({\n    model: \"gpt-4o-search-preview\",\n    web_search_options: {},\n    messages: [{\n        \"role\": \"user\",\n        \"content\": \"What was a positive news story from today?\"\n    }],\n});\n\nconsole.log(completion.choices[0].message.content);\nOutput and citations\nThe API response item in the choices array will include:\n\nmessage.content with the text result from the model, inclusive of any inline citations\nannotations with a list of cited URLs\nBy default, the model's response will include inline citations for URLs found in the web search results. In addition to this, the url_citation annotation object will contain the URL and title of the cited source, as well as the start and end index characters in the model's response where those sources were used.","pricing":{"cache_read":0.075,"input":0.15,"output":0.6},"types":"llm,search","features":"web,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"jina-embeddings-v3","model_name":"Jina Embeddings V3","developer_id":22,"desc":"Text Embeddings Model, multilingual, 1024-dimensional, 570M parameters.","pricing":{"cache_read":0,"input":0.05,"output":0.05},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"mimo-v2-flash-free","model_name":"MiMo V2 Flash (free)","developer_id":31,"desc":"MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It adopts a MoE architecture with 309B total parameters and 15B active parameters per inference, balancing performance and efficiency. The model features a hybrid attention architecture, supports a hybrid-thinking toggle, and offers a 256K context window, enabling strong capabilities in complex reasoning, code generation, and agent-based scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks #1 among open-source models globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much.","pricing":{"cache_read":0,"input":0,"output":0},"types":"llm","features":"web","input_modalities":"text","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"ernie-4.5","model_name":"ERNIE 4.5","developer_id":25,"desc":"Wenxin Large Model 4.5 is a next-generation native multimodal foundational model independently developed by Baidu. It achieves collaborative optimization through joint modeling of multiple modalities, demonstrating excellent multimodal understanding capabilities; it possesses more advanced language abilities, with comprehensive improvements in comprehension, generation, logic, and memory, as well as significant enhancements in hallucination reduction, logical reasoning, and coding capabilities.ERNIE-4.5-21B-A3B is an aligned open-source model with a MoE structure, having a total of 21 billion parameters and 3 billion activated parameters.","pricing":{"input":0.068,"output":0.272},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":64000,"context_length":160000},{"model_id":"ernie-4.5-turbo-vl","model_name":"ERNIE 4.5 Turbo VL","developer_id":25,"desc":"The new version of the Wenxin Yiyan large model significantly improves capabilities in image understanding, creation, translation, and coding. It supports a context length of up to 32K tokens for the first time, with a notable reduction in the latency of the first token.","pricing":{"input":0.4,"output":1.2},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":16000,"context_length":139000},{"model_id":"claude-3-7-sonnet","model_name":"Claude 3.7 Sonnet","developer_id":2,"desc":"Support for the thinking parameter through the original Claude SDK.","pricing":{"input":3.3,"output":16.5},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":128000,"context_length":200000},{"model_id":"FLUX-1.1-pro","model_name":"Flux 1.1 Pro","developer_id":27,"desc":"FLUX-1.1-pro is an AI image generation tool for professional creators and content workflows. It understands complex semantic and structural instructions to deliver high consistency, multi-image coherence, and style customization from text prompts.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"o3-mini","model_name":"O3 Mini","developer_id":12,"desc":"OpenAI's latest fast inference model excels at STEAM tasks and offers exceptional cost-effectiveness. Official support for cache hits reduces input prices by half.","pricing":{"cache_read":0.55,"input":1.1,"output":4.4},"types":"llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":100000,"context_length":200000},{"model_id":"doubao-seed-1-6","model_name":"Doubao Seed 1.6","developer_id":4,"desc":"Doubao-Seed-1.6 is a brand new multimodal deep reasoning model that supports four types of reasoning effort: minimal, low, medium, and high. It offers stronger model performance, serving complex tasks and challenging scenarios. It supports a 256k context window, with output length up to a maximum of 32k tokens.","pricing":{"cache_read":0.036,"input":0.18,"output":1.8},"types":"llm","features":"thinking，tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"doubao-seed-1-6-flash","model_name":"Doubao Seed 1.6 Flash","developer_id":4,"desc":"Doubao-Seed-1.6-flash is an extremely fast multimodal deep thinking model, with TPOT requiring only 10ms. It supports both text and visual understanding, with its text comprehension skills surpassing the previous generation lite model and its visual understanding on par with competitor's pro series models. It supports a 256k context window and an output length of up to 16k tokens.","pricing":{"cache_read":0.0088,"input":0.044,"output":0.44},"types":"llm","features":"thinking，tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":33000,"context_length":256000},{"model_id":"doubao-seed-1-6-lite","model_name":"Doubao Seed 1.6 Lite","developer_id":4,"desc":"Doubao-Seed-1.6-lite is a brand new multimodal deep reasoning model that supports adjustable reasoning effort, with four modes: Minimal, Low, Medium, and High. It offers better cost performance, making it the best choice for common tasks, with a context window of up to 256k.","pricing":{"cache_read":0.0164,"input":0.082,"output":0.656},"types":"llm","features":"thinking，tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"doubao-seed-1-6-thinking","model_name":"Doubao Seed 1.6 Thinking","developer_id":4,"desc":"The Doubao-Seed-1.6-thinking model has significantly enhanced reasoning capabilities. Compared with Doubao-1.5-thinking-pro, it has further improvements in fundamental abilities such as coding, mathematics, and logical reasoning, and now also supports visual understanding. It supports a 256k context window, with output length supporting up to 16k tokens.","pricing":{"cache_read":0.036,"input":0.18,"output":1.8},"types":"llm","features":"thinking，tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"qwen3-30b-a3b-instruct-2507","model_name":"Qwen3 30B A3B Instruct 2507","developer_id":13,"desc":"Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise.\nMarkedly better general capabilities, such as instruction following, tool usage, text generation, and alignment with human preferences.\nEnhanced 256K long-context understanding capabilities.","pricing":{"input":0.1028,"output":0.4112},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-30b-a3b-thinking-2507","model_name":"Qwen3 30B A3B Thinking 2507","developer_id":13,"desc":"Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise.\nMarkedly better general capabilities, such as instruction following, tool usage, text generation, and alignment with human preferences.\nEnhanced 256K long-context understanding capabilities.","pricing":{"input":0.12,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-3-235b-a22b-thinking-2507","model_name":"Qwen 3 235B A22B Thinking 2507","developer_id":13,"desc":"cerebras","pricing":{"input":0.28,"output":2.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-embedding-001","model_name":"Gemini Embedding 001","developer_id":8,"desc":"Latest version","pricing":{"input":0.15,"output":0.15},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-oss-120b","model_name":"gpt-oss-120b","developer_id":12,"desc":"gpt-oss-120b is a 117B-parameter open-weight Mixture-of-Experts (MoE) language model from OpenAI, designed for high-reasoning, agentic, and general-purpose production use cases. Activating just 5.1B parameters per pass, it is optimized to run on a single H100 GPU with native MXFP4 quantization. The model features configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.","pricing":{"input":0.18,"output":0.9},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":32768,"context_length":131072},{"model_id":"Qwen2-VL-72B-Instruct","model_name":"Qwen2 VL 72B Instruct","developer_id":13,"desc":"The model provider is the Sophnet platform. Qwen2-VL-72B-Instruct is the latest iteration in the Qwen2-VL series launched by Alibaba Cloud, representing nearly a year of innovative achievements. This model has 72 billion parameters and can understand images of various resolutions and aspect ratios. Additionally, it supports video understanding of over 20 minutes, enabling high-quality video question answering, dialogue, and content creation, along with complex reasoning and decision-making capabilities.\n\n- State-of-the-art image understanding: capable of processing images of various resolutions and aspect ratios, performing excellently across multiple visual understanding benchmarks.\n- Long video understanding: supports video comprehension exceeding 20 minutes, enabling high-quality video Q\u0026A, dialogues, and content creation.\n- Agent operation capability: equipped with complex reasoning and decision-making abilities, it can integrate with devices such as phones and robots to perform automated operations based on visual environments and textual instructions.\n- Multilingual support: in addition to English and Chinese, it supports understanding text in images in multiple languages, including most European languages, Japanese, Korean, Arabic, Vietnamese, and more.\n- Supports a maximum context length of 128K tokens, offering powerful processing capabilities.","pricing":{"input":2.18,"output":6.54},"types":"","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen2-VL-7B-Instruct","model_name":"Qwen2 VL 7B Instruct","developer_id":13,"desc":"The model provider is the Sophnet platform. Qwen2-VL-7B-Instruct is the latest vision-language model launched by Alibaba Cloud and the newest member of the Qwen family. This model is proficient not only in recognizing common objects but also in analyzing text, charts, icons, and layouts within images. As a visual agent, it can reason and dynamically guide tool usage, supporting operations on computers and mobile phones. Additionally, it can understand long videos exceeding one hour and capture key events, accurately locate objects in images, and generate structured outputs for data such as invoices and tables, making it suitable for various scenarios including finance and business.\n\n- Vision understanding capability: not only recognizes common objects but also analyzes text, charts, icons, and layouts within images.\n- Agent capability: functions as a visual agent capable of reasoning and dynamically guiding tool usage, supporting operations on computers and mobile phones.\n- Long video understanding: can comprehend video content over one hour in length and accurately localize relevant video segments.\n- Visual localization: precisely locates objects within images by generating bounding boxes or points, providing stable JSON coordinate outputs.\n- Structured output: supports structured data output for invoices, tables, and other data, suitable for finance, business, and various other scenarios.","pricing":{"input":0.28,"output":0.7},"types":"","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-kimi-for-coding","model_name":"CC Kimi For Coding","developer_id":15,"desc":"for claude code","pricing":{"cache_read":0.02,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen3-30B-A3B","model_name":"Qwen3 30B A3B","developer_id":13,"desc":"Provided by chutes.ai","pricing":{"input":1,"output":1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen3-32B","model_name":"Qwen3 32B","developer_id":13,"desc":"","pricing":{"input":0.4,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-32b","model_name":"Qwen3 32B","developer_id":13,"desc":"","pricing":{"input":0.16,"output":0.64},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen3-14B","model_name":"Qwen3 14B","developer_id":13,"desc":"Provided by chutes.ai","pricing":{"input":0.5,"output":0.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen3-8B","model_name":"Qwen3 8B","developer_id":13,"desc":"Provided by chutes.ai","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"embedding-2","model_name":"Embedding 2","developer_id":5,"desc":"A text vector model that converts input text information into vector representations so that, in conjunction with a vector database, it provides an external knowledge base for the large model, thereby improving the accuracy of the model’s reasoning.","pricing":{"input":0.0686,"output":0.0686},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":8000},{"model_id":"embedding-3","model_name":"Embedding 3","developer_id":5,"desc":"A text vector model that converts input text into vector representations to work with a vector database and provide an external knowledge base for a large model. The model supports custom vector dimensions; it is recommended to choose 256, 512, 1024, or 2048 dimensions.","pricing":{"input":0.0686,"output":0.0686},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":8000},{"model_id":"gemini-2.5-pro-preview-06-05","model_name":"Gemini 2.5 Pro Preview 06-05","developer_id":8,"desc":"Google’s latest multimodal flagship model, combining exceptional coding and reasoning capabilities. Its massive 1 million token context window (soon to expand to 2 million) places it at the top of the WebDevArena and LMArena leaderboards. It is particularly well-suited for developing aesthetically pleasing and highly functional interactive web applications, code transformation, and complex workflows. The newly introduced \"reasoning budget\" feature cleverly balances cost and performance, while optimized tool calls and response styles further enhance development efficiency, making it the ideal choice for rapid prototyping and advanced coding.","pricing":{"cache_read":0.125,"input":1.25,"output":10},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":65536,"context_length":1048576},{"model_id":"Qwen/Qwen2.5-VL-72B-Instruct","model_name":"Qwen2.5 VL 72B Instruct","developer_id":13,"desc":"Qwen2.5-VL is a visual language model from the Qwen2.5 series, equipped with strong visual understanding and reasoning capabilities. It can recognize objects, analyze text and charts, understand key events in long videos, and accurately locate targets within images. The model supports structured output, making it suitable for data such as invoices and forms, and performs excellently in multiple benchmark tests.","pricing":{"cache_read":0,"input":0.5,"output":0.5},"types":"llm","features":"","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"o1","model_name":"O1","developer_id":12,"desc":"OpenAI's most powerful O-series model supports official cache hits that halve the input cost.","pricing":{"cache_read":7.5,"input":15,"output":60},"types":"llm","features":"thinking","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"o1-pro","model_name":"O1 Pro","developer_id":12,"desc":"The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.","pricing":{"cache_read":170,"input":170,"output":680},"types":"llm","features":"thinking","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"ByteDance-Seed/Seed-OSS-36B-Instruct","model_name":"Seed Oss 36B Instruct","developer_id":4,"desc":"Seed-OSS is a series of open-source large language models developed by ByteDance's Seed team, designed specifically for powerful long-context processing, reasoning, agents, and general capabilities. Among this series, Seed-OSS-36B-Instruct is an instruction-tuned model with 36 billion parameters that natively supports ultra-long context lengths, enabling it to process massive documents or complex codebases in a single pass. This model is specially optimized for reasoning, code generation, and agent tasks (such as tool usage), while maintaining balanced and excellent general capabilities. A notable feature of this model is the \"Thinking Budget\" functionality, which allows users to flexibly adjust the inference length as needed, thereby effectively improving inference efficiency in practical applications.","pricing":{"input":0.2,"output":0.534},"types":"llm","features":"thinking，tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":256000},{"model_id":"doubao-seed-1-6-250615","model_name":"Doubao Seed 1.6 250615","developer_id":4,"desc":"Doubao-Seed-1.6 is a brand new multimodal deep reasoning model that supports four types of reasoning effort: minimal, low, medium, and high. It offers stronger model performance, serving complex tasks and challenging scenarios. It supports a 256k context window, with output length up to a maximum of 32k tokens.","pricing":{"cache_read":0.036,"input":0.18,"output":2.52},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seed-1-6-flash-250615","model_name":"Doubao Seed 1.6 Flash 250615","developer_id":4,"desc":"Doubao-Seed-1.6-flash is an extremely fast multimodal deep thinking model, with TPOT requiring only 10ms. It supports both text and visual understanding, with its text comprehension skills surpassing the previous generation lite model and its visual understanding on par with competitor's pro series models. It supports a 256k context window and an output length of up to 16k tokens.","pricing":{"cache_read":0.0088,"input":0.044,"output":0.44},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seed-1-6-thinking-250615","model_name":"Doubao Seed 1.6 Thinking 250615","developer_id":4,"desc":"The Doubao-Seed-1.6-thinking model has significantly enhanced reasoning capabilities. Compared with Doubao-1.5-thinking-pro, it has further improvements in fundamental abilities such as coding, mathematics, and logical reasoning, and now also supports visual understanding. It supports a 256k context window, with output length supporting up to 16k tokens.","pricing":{"cache_read":0.036,"input":0.18,"output":2.52},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seed-1-6-vision-250815","model_name":"Doubao Seed 1.6 Vision 250815","developer_id":4,"desc":"Doubao-Seed-1.6-vision is a visual deep-thinking model that demonstrates stronger general multimodal understanding and reasoning capabilities in scenarios such as education, image moderation, inspection and security, and AI search Q\u0026A. It supports a 256K context window and an output length of up to 64K tokens.","pricing":{"cache_read":0.021918,"input":0.10959,"output":1.0959},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-2.5-flash-preview-tts","model_name":"Gemini 2.5 Flash Preview Tts","developer_id":8,"desc":"Gemini 2.5 Flash Preview TTS is a lightweight, low-latency text-to-speech model designed for real-time voice generation. It produces natural, expressive speech with accurate control over tone, style, and pacing, while dynamically adjusting speaking speed based on context and instructions. The model also maintains consistent and distinguishable voices across multi-turn and multi-speaker conversations, making it well-suited for interactive and conversational applications that require stable, high-quality audio output.","pricing":{"cache_read":0,"input":0.5,"output":0.5},"types":"tts","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-2.5-pro-preview-tts","model_name":"Gemini 2.5 Pro Preview Tts","developer_id":8,"desc":"Gemini 2.5 Pro Preview TTS is a high-fidelity text-to-speech model designed for premium voice experiences and complex speech generation scenarios. It delivers highly natural and expressive audio with precise control over tone, style, and emotional nuance, while maintaining smooth, context-aware pacing across long-form content. The model excels in multi-speaker and dialogue-heavy use cases, preserving consistent character voices and conversational coherence, making it well-suited for narration, storytelling, and advanced conversational AI applications.","pricing":{"cache_read":0,"input":1,"output":1},"types":"tts","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemma-3-12b-it","model_name":"Gemma 3 12B It","developer_id":8,"desc":"Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemma-3-27b-it","model_name":"Gemma 3 27B It","developer_id":8,"desc":"Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemma-3-4b-it","model_name":"Gemma 3 4B It","developer_id":8,"desc":"Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemma-3n-e4b-it","model_name":"Gemma 3n E4B It","developer_id":8,"desc":"Gemma 3n is a generative AI model optimized for use in everyday devices, such as phones, laptops, and tablets. This model includes innovations in parameter-efficient processing, including Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture that provides the flexibility to reduce compute and memory requirements. These models feature audio input handling, as well as text and visual data.","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-1.5-thinking-pro","model_name":"Doubao 1.5 Thinking Pro","developer_id":4,"desc":"Doubao-1.5 is a brand-new deep thinking model that excels in specialized fields such as mathematics, programming, scientific reasoning, and general tasks like creative writing. It achieves or approaches the top-tier industry level on multiple authoritative benchmarks including AIME 2024, Codeforces, and GPQA. It supports a 128k context window and 16k output.","pricing":{"cache_read":0.62,"input":0.62,"output":2.48},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-minimax-m2","model_name":"CC MiniMax M2","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-Prover-V2-671B","model_name":"DeepSeek Prover V2 671B","developer_id":7,"desc":"Provided by chutes.ai\nDeepSeek Prover V2 is a 671B parameter model, speculated to be geared towards logic and mathematics. Likely an upgrade from DeepSeek-Prover-V1.5 Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description.","pricing":{"input":0.1,"output":0.1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemma-3-1b-it","model_name":"Gemma 3 1B It","developer_id":8,"desc":"Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-r1-distill-llama-70b","model_name":"DeepSeek R1 Distill Llama 70B","developer_id":7,"desc":"Provided by Groq, the DeepSeek-R1-Distill model is fine-tuned based on an open-source model, using samples generated by DeepSeek-R1. We have made slight modifications to their configurations and tokenizers. Please use our settings to run these models.","pricing":{"input":0.8,"output":1.6},"types":"llm","features":"thinking","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-mini-tts","model_name":"GPT 4o Mini Tts","developer_id":12,"desc":"OpenAI’s latest TTS model, gpt-4o-mini-tts, uses the same API endpoint (/v1/audio/speech) as tts-1. However, OpenAI introduced a new pricing method without providing billing details via API, causing discrepancies between official pricing and aihubmix’s charges—some requests may cost more, others less. Avoid using this model if precise billing accuracy is essential.","pricing":{"cache_read":0.6,"input":0.6,"output":12},"types":"tts","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"tngtech/DeepSeek-R1T-Chimera","model_name":"DeepSeek R1t Chimera","developer_id":7,"desc":"Provided by chutes.ai\nDeepSeek-R1T-Chimera merges DeepSeek-R1’s reasoning strengths with DeepSeek-V3 (0324)’s token-efficiency improvements into a MoE Transformer optimized for general text generation. It integrates pretrained weights from both models and is released under the MIT license for research and commercial use.\n","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-3-5-sonnet","model_name":"Claude 3.5 Sonnet","developer_id":2,"desc":"Claude 3.5 Sonnet delivers performance superior to Opus and speeds faster than its predecessor, all at the same price point. Its core strengths include:\n\nCoding: Autonomously writes, edits, and executes code with advanced reasoning and troubleshooting.\nData Science: Augments human expertise by analyzing unstructured data and using multiple tools to generate insights.\nVisual Processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to derive high-level insights.\nAgentic Tasks: Exceptional tool use makes it highly effective for complex, multi-step agentic workflows that interact with other systems.","pricing":{"input":3.3,"output":16.5},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":8192,"context_length":200000},{"model_id":"veo-2.0-generate-001","model_name":"Veo 2.0 Generate 001","developer_id":8,"desc":"Veo 2.0 is an advanced video generation model capable of producing high-quality videos based on text or image prompts. It excels in understanding real-world physics and human motion, resulting in fluid character movements and lifelike scenes. Veo 2.0 supports various visual styles and camera control options, including lens types, angles, and motion effects. Users can generate 8-second video clips at 720p resolution.","pricing":{"cache_read":0,"input":2,"output":2},"types":"video","features":"","input_modalities":"video","endpoints":"","max_output":0,"context_length":0},{"model_id":"o1-preview","model_name":"O1 Preview","developer_id":12,"desc":"The latest and most powerful inference model from OpenAI; AiHubMix uses both OpenAI and Microsoft Azure OpenAI channels simultaneously to achieve high-concurrency load balancing.","pricing":{"cache_read":7.5,"input":15,"output":60},"types":"llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"o1-mini","model_name":"O1 Mini","developer_id":12,"desc":"o1-mini is faster and 80% cheaper, and is competitive with o1-preview on coding tasks. AiHubMix uses both OpenAI and Microsoft Azure OpenAI channels simultaneously.","pricing":{"cache_read":1.5,"input":3,"output":12},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-2024-11-20","model_name":"GPT 4o 2024 11-20","developer_id":12,"desc":"The latest version of the GPT-4o model; it is recommended to use this version, as it is currently smarter than the regular 4o.","pricing":{"cache_read":1.25,"input":2.5,"output":10},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"gpt-4o","model_name":"GPT 4o","developer_id":12,"desc":"GPT-4o (“o” stands for “omni”) is a new-generation multimodal model designed for more natural human–computer interaction. It can accept any combination of text, audio, image, and video as input, and generate multimodal outputs including text, audio, and images. With audio response latency as low as 232 milliseconds on average around 320 milliseconds, it approaches real human conversational speed. The model delivers strong performance in English text and code, significantly improved multilingual understanding, and outstanding capabilities in visual and audio perception, while offering faster API performance and substantially reduced cost for real-time and complex multimodal applications.","pricing":{"cache_read":1.25,"input":2.5,"output":10},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"gpt-4o-mini","model_name":"GPT 4o Mini","developer_id":12,"desc":"The lightweight version of GPT-4o, which is affordable and fast, suitable for handling simple tasks; our site supports the official automatic caching for this model, and charges for cache hits will be automatically halved.","pricing":{"cache_read":0.075,"input":0.15,"output":0.6},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":16384,"context_length":128000},{"model_id":"AiHubmix-mistral-medium","model_name":"Aihubmix Mistral Medium","developer_id":10,"desc":"Mistral Medium 3 is a SOTA \u0026 versatile model designed for a wide range of tasks, including programming, mathematical reasoning, understanding long documents, summarization, and dialogue.\n\nIt boasts multi-modal capabilities, enabling it to process visual inputs, and supports dozens of languages, including over 80 coding languages. Additionally, it features function calling and agentic workflows.\n\nMistral Medium 3 is optimized for single-node inference, particularly for long-context applications. Its size allows it to achieve high throughput on a single node.","pricing":{"input":0.4,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"ERNIE-X1.1-Preview","model_name":"ERNIE X1.1 Preview","developer_id":25,"desc":"The Wenxin large model X1.1 has made significant improvements in question answering, tool invocation, intelligent agents, instruction following, logical reasoning, mathematics, and coding tasks, with notable enhancements in factual accuracy. The context length has been extended to 64K tokens, supporting longer inputs and dialogue history, which improves the coherence of long-chain reasoning while maintaining response speed.","pricing":{"input":0.136,"output":0.544},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":64000,"context_length":119000},{"model_id":"Qwen/QwQ-32B","model_name":"QwQ 32B","developer_id":13,"desc":"Silicon-based flow provision","pricing":{"input":0.14,"output":0.56},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"chutesai/Mistral-Small-3.1-24B-Instruct-2503","model_name":"Mistral Small 3.1 24B Instruct 2503","developer_id":10,"desc":"Mistral's latest open-source small model; provided by chutes.ai.","pricing":{"input":0.2,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"minimax-m2","model_name":"MiniMax M2","developer_id":18,"desc":"MiniMax-M2 redefines efficiency for intelligent agents. It is a compact, fast, and cost-effective MoE model with a total of 230 billion parameters and 10 billion active parameters, designed for top performance in coding and intelligent agent tasks while maintaining strong general intelligence. With only 10 billion active parameters, MiniMax-M2 delivers the complex end-to-end tool usage performance expected from today's leading models, but in a more streamlined form factor, making deployment and scaling easier than ever before.","pricing":{"input":0.288,"output":1.152},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":192000,"context_length":204800},{"model_id":"ernie-x1.1-preview","model_name":"ERNIE X1.1 Preview","developer_id":25,"desc":"The Wenxin large model X1.1 has made significant improvements in question answering, tool invocation, intelligent agents, instruction following, logical reasoning, mathematics, and coding tasks, with notable enhancements in factual accuracy. The context length has been extended to 64K tokens, supporting longer inputs and dialogue history, which improves the coherence of long-chain reasoning while maintaining response speed.","pricing":{"input":0.136,"output":0.544},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"ernie-4.5-0.3b","model_name":"ERNIE 4.5 0.3b","developer_id":25,"desc":"Wenxin Large Model 4.5 is a next-generation native multimodal foundational large model independently developed by Baidu. It achieves collaborative optimization through joint modeling of multiple modalities, demonstrating excellent multimodal understanding capabilities. The model possesses enhanced language abilities, with comprehensive improvements in understanding, generation, reasoning, and memory. It significantly reduces hallucinations and shows notable advancements in logical reasoning and coding skills.","pricing":{"input":0.0136,"output":0.0544},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"ernie-4.5-turbo-128k-preview","model_name":"ERNIE 4.5 Turbo 128K Preview","developer_id":25,"desc":"Wenxin 4.5 Turbo also shows significant enhancements in reducing hallucinations, logical reasoning, and coding capabilities. Compared to Wenxin 4.5, it is faster and more cost-effective.","pricing":{"input":0.108,"output":0.432},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"ernie-x1-turbo","model_name":"ERNIE X1 Turbo","developer_id":25,"desc":"Wenxin Large Model X1 possesses enhanced abilities in understanding, planning, reflection, and evolution. As a more comprehensive deep-thinking model, Wenxin X1 combines accuracy, creativity, and literary elegance, excelling particularly in Chinese knowledge Q\u0026A, literary creation, document writing, daily conversations, logical reasoning, complex calculations, and tool invocation.","pricing":{"input":0.136,"output":0.544},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":28000,"context_length":50500},{"model_id":"MiniMaxAI/MiniMax-M1-80k","model_name":"MiniMax M1 80K","developer_id":18,"desc":"MiniMax-M1 is an open-source large-scale hybrid attention model with 456B total parameters (45.9B activated per token). It natively supports 1M-token context and reduces FLOPs by 75% versus DeepSeek R1 in 100K-token generation tasks via lightning attention. Built on MoE architecture and optimized by CISPO algorithm, it achieves state-of-the-art performance in long-context reasoning and real-world software engineering scenarios.","pricing":{"input":0.6,"output":2.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2.5-VL-32B-Instruct","model_name":"Qwen2.5 VL 32B Instruct","developer_id":13,"desc":"Qwen2.5-VL-32B-Instruct is an advanced multimodal model from the Tongyi Qianwen team that can recognize objects, analyze text and graphics in images, operate tools, locate objects in images, and generate structured outputs. Through reinforcement learning, it has improved mathematics and problem-solving capabilities, with a more concise and natural response style.","pricing":{"input":0.24,"output":0.24},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"baidu/ERNIE-4.5-300B-A47B","model_name":"ERNIE 4.5 300B A47B","developer_id":25,"desc":"ERNIE-4.5-300B-A47B is a large language model developed by Baidu based on a Mixture of Experts (MoE) architecture. The model has a total of 300 billion parameters, but only activates 47 billion parameters per token during inference, which balances strong performance with computational efficiency. As one of the core models in the ERNIE 4.5 series, it demonstrates outstanding capabilities in tasks such as text understanding, generation, reasoning, and programming. The model employs an innovative multimodal heterogeneous MoE pretraining approach, leveraging joint training of textual and visual modalities to effectively enhance the model’s overall abilities, particularly excelling in instruction following and world knowledge memorization. Baidu has open-sourced this model along with other models in the series, aiming to promote the research and application of AI technology.","pricing":{"cache_read":0,"input":0.32,"output":1.28},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"bge-large-en","model_name":"Bge Large En","developer_id":30,"desc":"bge-large-en, open-sourced by the Beijing Academy of Artificial Intelligence (BAAI), is currently the most powerful vector representation model for Chinese tasks, with its semantic representation capabilities comprehensively surpassing those of similar open-source models.","pricing":{"input":0.068,"output":0.068},"types":"embedding","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"bge-large-zh","model_name":"Bge Large Zh","developer_id":30,"desc":"bge-large-zh, open-sourced by the Beijing Academy of Artificial Intelligence (BAAI), is currently the most powerful vector representation model for Chinese tasks, with its semantic representation capabilities comprehensively surpassing those of similar open-source models.","pricing":{"input":0.068,"output":0.068},"types":"embedding","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"codestral-latest","model_name":"Codestral","developer_id":10,"desc":"Mistral has launched a new code model - Codestral 25.01; https://mistral.ai/news/codestral-2501/","pricing":{"input":0.4,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"unsloth/gemma-3-27b-it","model_name":"Gemma 3 27B It","developer_id":8,"desc":"Google's latest open-source model; provided by chutes.ai","pricing":{"cache_read":0,"input":0.22,"output":0.22},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"kat-dev","model_name":"Kat Dev","developer_id":13,"desc":"KAT-Dev (32B) is an open-source 32B parameter model specifically designed for software engineering tasks. It achieved a 62.4% resolution rate on the SWE-Bench Verified benchmark, ranking fifth among all open-source models of various scales. The model is optimized through multiple stages, including intermediate training, supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT), as well as large-scale agent reinforcement learning (RL). Based on Qwen3-32B, its training process lays the foundation for subsequent fine-tuning and reinforcement learning stages by enhancing fundamental abilities such as tool usage, multi-turn interaction, and instruction following. During the fine-tuning phase, the model not only learns eight carefully curated task types and programming scenarios but also innovatively introduces a reinforcement fine-tuning (RFT) stage guided by human engineer-annotated “teacher trajectories.” The final agent reinforcement learning phase addresses scalability challenges through multi-level prefix caching, entropy-based trajectory pruning, and efficient architecture.","pricing":{"input":0.137,"output":0.548},"types":"llm","features":"tools","input_modalities":"text","endpoints":"","max_output":0,"context_length":128000},{"model_id":"llama-3.3-70b","model_name":"Llama 3.3 70B","developer_id":11,"desc":"The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.","pricing":{"input":0.6,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":8192,"context_length":65536},{"model_id":"moonshotai/Kimi-Dev-72B","model_name":"Kimi Dev 72B","developer_id":15,"desc":"Kimi-Dev-72B is a new generation open-source programming large model that achieved a leading performance of 60.4% on SWE-bench Verified. Through large-scale reinforcement learning optimization, it can automatically fix code in real Docker environments, receiving rewards only when passing the complete test suite, thereby ensuring the correctness and robustness of solutions and aligning more closely with real software development standards.","pricing":{"cache_read":0,"input":0.32,"output":1.28},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshotai/Moonlight-16B-A3B-Instruct","model_name":"Moonlight 16B A3B Instruct","developer_id":15,"desc":"Provided by chutes.ai.","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"nvidia-nemotron-3-super-120b-a12b","model_name":"Nvidia Nemotron 3 Super 120B A12B","developer_id":17,"desc":"An open-source, efficient hybrid Mamba-Transformer MoE model that supports a context length of one million tokens and excels at agent reasoning, programming, planning, and tool invocation.","pricing":{"cache_read":0.0275,"input":0.11,"output":0.55},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text","endpoints":"","max_output":0,"context_length":1000000},{"model_id":"o1-global","model_name":"O1 Global","developer_id":1,"desc":"OpenAI new model","pricing":{"cache_read":7.5,"input":15,"output":60},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qianfan-qi-vl","model_name":"Qianfan Qi VL","developer_id":25,"desc":"The Qianfan-QI-VL model is a proprietary image quality inspection and visual understanding large model (Quality Inspection Large Vision Language Model, Qianfan-QI-VL) developed by Baidu Cloud’s Qianfan platform. It is designed for quality inspection of product images uploaded in e-commerce scenarios, with detection capabilities including AIGC human defect detection, mosaic recognition, watermark recognition, and trademark detection.","pricing":{"input":0.2,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-vl-72b-instruct","model_name":"Qwen2.5 VL 72B Instruct","developer_id":13,"desc":"Strong capability in Chinese domain recognition, comparable to ChatGPT-4.0.","pricing":{"input":2.4,"output":7.2},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"tencent/Hunyuan-A13B-Instruct","model_name":"Hunyuan A13B Instruct","developer_id":24,"desc":"Hunyuan-A13B-Instruct has 8 billion parameters and can match larger models by activating only 1.3 billion parameters, supporting \"fast thinking/slow thinking\" hybrid inference. It offers stable long text understanding. Verified by BFCL-v3 and τ-Bench, its Agent capabilities are leading in the field. Combined with GQA and multiple quantization formats, it enables efficient inference.","pricing":{"input":0.14,"output":0.56},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-qwq-32b","model_name":"Qwen Qwq 32B","developer_id":13,"desc":"","pricing":{"input":0.4,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-exp-1206","model_name":"Gemini","developer_id":8,"desc":"Google's latest experimental model, currently Google's most powerful model.","pricing":{"input":1.25,"output":5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-zh","model_name":"GPT 4o Zh","developer_id":12,"desc":"","pricing":{"input":2.5,"output":10},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"unsloth/gemma-3-12b-it","model_name":"Gemma 3 12B It","developer_id":8,"desc":"Provided by chutes.ai.","pricing":{"cache_read":0,"input":0.2,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-max-0125","model_name":"Qwen Max 0125","developer_id":13,"desc":"Qwen 2.5-Max latest model","pricing":{"input":0.38,"output":1.52},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-3-5-haiku","model_name":"Claude 3.5 Haiku","developer_id":2,"desc":"Claude 3.5 Haiku is the next generation of Claude's fastest model.","pricing":{"input":1.1,"output":5.5},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":8192,"context_length":200000},{"model_id":"BAAI/bge-large-en-v1.5","model_name":"Bge Large En V1.5","developer_id":30,"desc":"BAAI/bge-large-en-v1.5 is a large English text embedding model and part of the BGE (BAAI General Embedding) series. It achieves excellent performance on the MTEB benchmark, with an average score of 64.23 across 56 datasets, excelling in tasks such as retrieval, clustering, and text pair classification. The model supports a maximum input length of 512 tokens and is suitable for various natural language processing tasks, such as text retrieval and semantic similarity computation.","pricing":{"input":0.034,"output":0.034},"types":"embedding","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"BAAI/bge-large-zh-v1.5","model_name":"Bge Large Zh V1.5","developer_id":30,"desc":"BAAI/bge-large-zh-v1.5 is a large Chinese text embedding model and part of the BGE (BAAI General Embedding) series. It performs excellently on the C-MTEB benchmark, achieving an average score of 64.53 across 31 datasets, with outstanding results in tasks such as retrieval, semantic similarity, and text pair classification. The model supports a maximum input length of 512 tokens and is suitable for various Chinese natural language processing tasks, such as text retrieval and semantic similarity computation.","pricing":{"input":0.034,"output":0.034},"types":"embedding","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"BAAI/bge-reranker-v2-m3","model_name":"Bge Reranker V2 M3","developer_id":30,"desc":"BAAI/bge-reranker-v2-m3 is a lightweight multilingual reranking model. It is developed based on the bge-m3 model, offering strong multilingual capabilities, easy deployment, and fast inference. The model takes a query and documents as input and directly outputs similarity scores instead of embedding vectors. It is suitable for multilingual scenarios and performs particularly well in both Chinese and English processing.","pricing":{"input":0.034,"output":0.034},"types":"rerank","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"tencent/Hunyuan-MT-7B","model_name":"Hunyuan Mt 7B","developer_id":24,"desc":"Hunyuan-MT-7B is a lightweight translation model with 7 billion parameters, designed to translate source text into target languages. The model supports translation among 33 languages as well as 5 Chinese minority languages. In the WMT25 International Machine Translation Competition, Hunyuan-MT-7B achieved first place in 30 out of 31 language categories it participated in, demonstrating its exceptional translation capabilities. For translation scenarios, Tencent Hunyuan proposed a complete training paradigm from pre-training to supervised fine-tuning, followed by translation reinforcement and ensemble reinforcement, enabling it to achieve industry-leading performance among models of similar scale. The model is computationally efficient, easy to deploy, and suitable for various application scenarios.","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"V3","model_name":"V3","developer_id":21,"desc":"Fast and high-quality — top image quality in just 11 seconds per piece, with almost no extra time for batch generation.\nFlexible ratios — supports ultra-wide and tall formats like 3:1, 2:1, offering diverse perspectives.\nUnique strengths — outstanding design capabilities in the V3 and V2 series, with powerful text rendering (Chinese support coming soon).\nPrecise local editing — fine-tuned mask control for area redrawing (edit) and easy background replacement (replace-background).","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"V_2","model_name":"V_2","developer_id":21,"desc":"The Ideogram AI drawing interface is now live. This model boasts powerful text-to-image capabilities, supporting endpoints are: /generate, /remix, /edit.\nThis model is the stable V_2 version, highly recommended for editing.\nUS $0.08/ 1 IMG.\nFor usage examples and pricing details, refer to the documentation at https://docs.aihubmix.com/cn/api/IdeogramAI.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"V_2_TURBO","model_name":"V_2_turbo","developer_id":21,"desc":"The Ideogram AI drawing interface is now live. This model boasts powerful text-to-image capabilities, supporting endpoints are: /generate, /remix, /edit.\nThis model is the fast version of V_2, offering increased speed at the slight expense of quality.\nUS $0.05/ IMG.\nFor usage examples and pricing details, refer to the documentation at https://docs.aihubmix.com/cn/api/IdeogramAI.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"V_2A","model_name":"V_2a","developer_id":21,"desc":"The Ideogram AI drawing interface is now live. This model boasts powerful text-to-image capabilities, supporting endpoints are: /generate, /remix.\nThis model is the fast version of V_2, faster and cheaper.\nUS $0.04/ IMG.\nFor usage examples and pricing details, refer to the documentation at https://docs.aihubmix.com/cn/api/IdeogramAI.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"V_2A_TURBO","model_name":"V_2a_turbo","developer_id":21,"desc":"The Ideogram AI drawing interface is now live. This model boasts powerful text-to-image capabilities, supporting endpoints are: /generate, /remix.\nThis model is the ultra-fast version of V_2, delivering the highest speed while slightly reducing quality.\nUS $0.025/ IMG.\nFor usage examples and pricing details, refer to the documentation at https://docs.aihubmix.com/cn/api/IdeogramAI.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"V_1","model_name":"V_1","developer_id":21,"desc":"V_1 is a text-to-image model in the Ideogram series. It delivers strong text rendering capabilities, high photorealistic image quality, and precise prompt adherence. The model also introduces Magic Prompt, a new feature that automatically refines input prompts to generate more detailed and creative visuals.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"V_1_TURBO","model_name":"V_1_turbo","developer_id":21,"desc":"The Ideogram AI drawing interface is now live. This model boasts powerful text-to-image capabilities, supporting endpoints are: /generate, /remix.\nThis model is the ultra-fast version of the original V_1, offering increased speed at the slight expense of quality.\nUS $0.02/ IMG.\nFor usage examples and pricing details, refer to the documentation at https://docs.aihubmix.com/cn/api/IdeogramAI.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"kimi-thinking-preview","model_name":"Kimi Thinking Preview","developer_id":15,"desc":"The latest kimi model.","pricing":{"input":30,"output":30},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-embedding-large-text-240915","model_name":"Doubao Embedding Large Text 240915","developer_id":4,"desc":"doubao-embedding-large-text-240915\nDoubao Embedding is a semantic vectorization model developed by ByteDance, primarily designed for vector search scenarios. It supports both Chinese and English languages and has a maximum context length of approximately 4K tokens.","pricing":{"input":0.1,"output":0.1},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-2024-08-06","model_name":"GPT 4o 2024 08-06","developer_id":12,"desc":"Supports caching, with automatic halving of charges upon a cache hit.","pricing":{"cache_read":1.25,"input":2.5,"output":10},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-plus-2025-07-28","model_name":"Qwen Plus 2025 07-28","developer_id":13,"desc":"The Tongyi Qianwen series balanced capability model has inference performance and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, making it suitable for moderately complex tasks. This model adopts tiered pricing.","pricing":{"cache_read":0.02252,"cache_write":0.14075,"input":0.1126,"output":1.126},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-plus-latest","model_name":"Qwen Plus","developer_id":13,"desc":"The Qwen series models with balanced capabilities have inference performance and speed between Qwen-Max and Qwen-Turbo, making them suitable for moderately complex tasks. This model is a dynamically updated version, and updates will not be announced in advance. The current version is qwen-plus-2025-04-28.The model adopts tiered pricing.","pricing":{"cache_read":0.02252,"cache_write":0.14075,"input":0.1126,"output":1.126},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"sonar","model_name":"Sonar","developer_id":19,"desc":"Latest Perplexity Model","pricing":{"input":1.6,"output":1.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"stepfun-ai/step3","model_name":"Step3","developer_id":16,"desc":"Step3 is a multimodal reasoning model released by StepFun. It uses a Mixture‑of‑Experts (MoE) architecture with 321 billion total parameters and 38 billion activation parameters. The model follows an end‑to‑end design that reduces decoding cost while delivering top‑tier performance on vision‑language reasoning tasks. Thanks to the combined use of Multi‑Head Factorized Attention (MFA) and Attention‑FFN Decoupling (AFD), Step3 remains highly efficient on both flagship and low‑end accelerators. During pre‑training, it processed over 20 trillion text tokens and 4 trillion image‑text mixed tokens, covering more than ten languages. On benchmarks for mathematics, code, and multimodal tasks, Step3 consistently outperforms other open‑source models.","pricing":{"input":1.1,"output":2.75},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-embedding-v4","model_name":"Text Embedding V4","developer_id":13,"desc":"This is the Tongyi Laboratory's multilingual unified text vector model trained based on Qwen3, which significantly improves performance in text retrieval, clustering, and classification compared to version V3; it achieves a 15% to 40% improvement on evaluation tasks such as MTEB multilingual, Chinese-English, and code retrieval; supports user-defined vector dimensions ranging from 64 to 2048.","pricing":{"input":0.08,"output":0.08},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-turbo-latest","model_name":"Qwen Turbo","developer_id":13,"desc":"The Qwen series model with the fastest speed and lowest cost, suitable for simple tasks. This model is a dynamically updated version, and updates will not be announced in advance. The model's overall Chinese and English abilities have been significantly improved, human preference alignment has been greatly enhanced, inference capability and complex instruction understanding have been substantially strengthened, performance on difficult tasks is better, and mathematics and coding skills have been significantly improved. The current version is qwen-turbo-2025-04-28.","pricing":{"cache_read":0.0092,"input":0.046,"output":0.092},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"AiHubmix-Phi-4-mini-reasoning","model_name":"Aihubmix Phi 4 Mini (reasoning)","developer_id":3,"desc":"Phi-4-mini-reasoning is a lightweight open model designed for advanced mathematical reasoning and logic-intensive problem-solving. It is particularly well-suited for tasks such as formal proofs, symbolic computation, and solving multi-step word problems. With its efficient architecture, the model balances high-quality reasoning performance with cost-effective deployment, making it ideal for educational applications, embedded tutoring, and lightweight edge or mobile systems.\n\nPhi-4-mini-reasoning supports a 128K token context length, enabling it to process and reason over long mathematical problems and proofs. Built on synthetic and high-quality math datasets, the model leverages advanced fine-tuning techniques such as supervised fine-tuning and preference modeling to enhance reasoning capabilities. Its training incorporates safety and alignment protocols, ensuring robust and reliable performance across supported use cases.","pricing":{"input":0.12,"output":0.12},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":4000,"context_length":128000},{"model_id":"aihub-Phi-4-multimodal-instruct","model_name":"Aihub Phi 4 Multimodal Instruct","developer_id":3,"desc":"Microsoft's latest model","pricing":{"input":0.12,"output":0.48},"types":"llm","features":"","input_modalities":"text,image,audio","endpoints":"","max_output":4000,"context_length":128000},{"model_id":"qwen3-30b-a3b","model_name":"Qwen3 30B A3B","developer_id":13,"desc":"Achieves effective integration of thinking and non-thinking modes, allowing mode switching during conversations. Its reasoning ability matches that of QwQ-32B with a smaller parameter size, and its general capability significantly surpasses Qwen2.5-14B, reaching state-of-the-art (SOTA) levels among industry models of the same scale.","pricing":{"cache_read":0,"input":0.12,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihub-Phi-4-mini-instruct","model_name":"Aihub Phi 4 Mini Instruct","developer_id":3,"desc":"Microsoft's latest model","pricing":{"input":0.12,"output":0.48},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":4000,"context_length":128000},{"model_id":"grok-3","model_name":"Grok 3","developer_id":9,"desc":"Grok's latest model","pricing":{"input":3,"output":15},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-embedding-text-240715","model_name":"Doubao Embedding Text 240715","developer_id":4,"desc":"doubao-embedding-text-240715\nDoubao Embedding is a semantic vectorization model developed by ByteDance, primarily designed for vector search scenarios. It supports both Chinese and English languages and has a maximum context length of approximately 4K tokens.","pricing":{"input":0.7,"output":0.7},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-3-beta","model_name":"Grok 3 Beta","developer_id":9,"desc":"Grok's latest model\nThis model ID with beta has been officially taken offline. Using this model grok-3-beta will automatically point to grok-3.","pricing":{"cache_read":0,"input":3,"output":15},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-14b","model_name":"Qwen3 14B","developer_id":13,"desc":"Achieves effective integration of thinking and non-thinking modes, enabling mode switching during conversations. Its reasoning ability reaches state-of-the-art (SOTA) levels among models of the same scale, and its general capability significantly surpasses Qwen2.5-14B.","pricing":{"cache_read":0,"input":0.16,"output":1.6},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihub-Phi-4","model_name":"Aihub Phi 4","developer_id":3,"desc":"Phi-4 is a state-of-the-art open model based on a combination of synthetic datasets, curated public domain website data, and acquired academic books and QA datasets. The approach aims to ensure that small, efficient models are trained using data focused on high quality and advanced reasoning.","pricing":{"input":0.12,"output":0.48},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":16400,"context_length":16400},{"model_id":"claude-3-opus-20240229","model_name":"Claude 3 Opus 20240229","developer_id":2,"desc":"Claude’s previous generation strongest model","pricing":{"input":16.5,"output":82.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"dall-e-3","model_name":"Dall E 3","developer_id":12,"desc":"dall-e-3 is an AI image generation model that converts natural language prompts into realistic visuals and artistic content. It delivers accurate semantic understanding, supports customizable output resolutions, and produces high-quality images across a wide range of styles, making it well-suited for concept design, creative prototyping, and professional content workflows.","pricing":{"input":40,"output":40},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-3-fast","model_name":"Grok 3 Fast","developer_id":9,"desc":"","pricing":{"cache_read":0,"input":5.5,"output":27.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-8b","model_name":"Qwen3 8B","developer_id":13,"desc":"Achieves effective integration of thinking and non-thinking modes, enabling mode switching during conversations. Its reasoning ability reaches state-of-the-art (SOTA) levels among models of the same scale, and its general capability significantly surpasses Qwen2.5-7B.","pricing":{"cache_read":0,"input":0.08,"output":0.8},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-4b","model_name":"Qwen3 4B","developer_id":13,"desc":"Achieves effective integration of thinking and non-thinking modes, allowing mode switching during conversations. Its reasoning ability reaches state-of-the-art (SOTA) levels among models of the same scale, with significantly enhanced human preference alignment. There are notable improvements in creative writing, role-playing, multi-turn dialogue, and instruction following, resulting in a noticeably better user experience.","pricing":{"cache_read":0,"input":0.046,"output":0.46},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-R1-Zero","model_name":"DeepSeek R1 Zero","developer_id":7,"desc":"Openly deployed by chutes.ai; inference with FP8; zero is the initial preliminary version of R1 without optimizations and is not recommended for use unless for research purposes.","pricing":{"input":2.2,"output":2.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-3-fast-beta","model_name":"Grok 3 Fast Beta","developer_id":9,"desc":"","pricing":{"cache_read":0,"input":5.5,"output":27.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-3-mini","model_name":"Grok 3 Mini","developer_id":9,"desc":"","pricing":{"cache_read":0,"input":0.3,"output":0.501},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-3-mini-beta","model_name":"Grok 3 Mini Beta","developer_id":9,"desc":"This model ID with beta has been officially taken offline. Using this model grok-3-mini-beta will automatically point to grok-3-mini.","pricing":{"cache_read":0,"input":0.33,"output":0.5511},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-1.7b","model_name":"Qwen3 1.7b","developer_id":13,"desc":"Effectively integrates thinking and non-thinking modes, allowing mode switching during conversations. Its general capabilities significantly surpass those of the Qwen2.5 small-scale series, with greatly enhanced human preference alignment. There are notable improvements in creative writing, role-playing, multi-turn dialogue, and instruction following, resulting in a significantly better expected user experience.","pricing":{"cache_read":0,"input":0.046,"output":0.46},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen3-0.6b","model_name":"Qwen3 0.6b","developer_id":13,"desc":"Effectively integrates thinking and non-thinking modes, allowing mode switching during conversations. Its general capabilities significantly surpass those of the Qwen2.5 small-scale series.","pricing":{"cache_read":0,"input":0.046,"output":0.46},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-3-32b","model_name":"Qwen 3 32B","developer_id":13,"desc":"cerebras","pricing":{"input":0.4,"output":1.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-turbo-2025-04-28","model_name":"Qwen Turbo 2025 04-28","developer_id":13,"desc":"The Qwen3 series Turbo model effectively integrates thinking and non-thinking modes, allowing seamless switching between modes during conversations. With a smaller parameter size, its reasoning ability rivals that of QwQ-32B, and its general capabilities significantly surpass those of Qwen2.5-Turbo, reaching state-of-the-art (SOTA) levels among models of the same scale. This version is a snapshot model as of April 28, 2025.","pricing":{"cache_read":0,"input":0.046,"output":0.092},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-3-mini-fast-beta","model_name":"Grok 3 Mini Fast Beta","developer_id":9,"desc":"","pricing":{"input":0.33,"output":2.20011},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"alicloud-glm-5","model_name":"Alicloud Glm 5","developer_id":5,"desc":"GLM-5 is an advanced, open-source large language model designed for developers tackling the toughest challenges. It excels at long-context reasoning, multi-step tool orchestration, and complex systems engineering, making it the ideal choice for powering sophisticated agents and applications that require high-level cognitive tasks.","pricing":{"cache_read":0.11268,"input":0.5634,"output":2.5353},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-a-03-2025","model_name":"Command A 03 2025","developer_id":6,"desc":"Command A is Cohere most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.","pricing":{"cache_read":0,"input":2.5,"output":10},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-plus-2025-04-28","model_name":"Qwen Plus 2025 04-28","developer_id":13,"desc":"The Qwen3 series Plus model effectively integrates thinking and non-thinking modes, allowing for mode switching during conversations. Its reasoning abilities significantly surpass those of QwQ, and its general capabilities are markedly superior to Qwen2.5-Plus, reaching state-of-the-art (SOTA) levels among models of the same scale. This version is a snapshot model as of April 28, 2025.","pricing":{"cache_read":0.02252,"cache_write":0.14075,"input":0.1126,"output":1.126},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"THUDM/GLM-Z1-32B-0414","model_name":"GLM Z1 32B 0414","developer_id":5,"desc":"GLM-Z1-32B-0414 is a reasoning-focused AI model built on GLM-4-32B-0414. It has been enhanced through cold-start methods and reinforcement learning, with a strong emphasis on math, coding, and logic tasks. Despite having only 32B parameters, it performs comparably to the 671B DeepSeek-R1 on some benchmarks. It excels in complex reasoning tasks, as shown in evaluations like AIME 24/25, LiveCodeBench, and GPQA.","pricing":{"input":0.08,"output":0.08},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Pro/THUDM/GLM-4.1V-9B-Thinking","model_name":"Thudm/glm 4.1 Vision 9B Thinking","developer_id":5,"desc":"GLM-4.1V-9B-Thinking is an open-source Vision Language Model (VLM) jointly released by Zhipu AI and the KEG Laboratory at Tsinghua University, designed specifically for handling complex multimodal cognitive tasks. Based on the GLM-4-9B-0414 foundation model, it significantly enhances cross-modal reasoning ability and stability by introducing the “Chain-of-Thought” reasoning mechanism and using reinforcement learning strategies. As a lightweight model with 9 billion parameters, it strikes a balance between deployment efficiency and performance. In 28 authoritative benchmark evaluations, it matched or even outperformed the 72-billion-parameter Qwen-2.5-VL-72B model in 18 tasks. The model excels not only in image-text understanding, mathematical and scientific reasoning, and video understanding, but also supports images up to 4K resolution and inputs of arbitrary aspect ratios.","pricing":{"cache_read":0,"input":0.04,"output":0.16},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"THUDM/GLM-4.1V-9B-Thinking","model_name":"GLM 4.1 Vision 9B Thinking","developer_id":5,"desc":"GLM-4.1V-9B-Thinking is an open-source Vision Language Model (VLM) jointly released by Zhipu AI and the KEG Laboratory at Tsinghua University, designed specifically for handling complex multimodal cognitive tasks. Based on the GLM-4-9B-0414 foundation model, it significantly enhances cross-modal reasoning ability and stability by introducing the “Chain-of-Thought” reasoning mechanism and using reinforcement learning strategies. As a lightweight model with 9 billion parameters, it strikes a balance between deployment efficiency and performance. In 28 authoritative benchmark evaluations, it matched or even outperformed the 72-billion-parameter Qwen-2.5-VL-72B model in 18 tasks. The model excels not only in image-text understanding, mathematical and scientific reasoning, and video understanding, but also supports images up to 4K resolution and inputs of arbitrary aspect ratios.","pricing":{"cache_read":0,"input":0.1,"output":0.1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-embedding-004","model_name":"Text Embedding 004","developer_id":8,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"THUDM/GLM-4-32B-0414","model_name":"GLM 4 32B 0414","developer_id":5,"desc":"GLM-4-32B-0414 is a next-generation open-source model with 32 billion parameters, delivering performance comparable to OpenAI’s GPT series and DeepSeek V3/R1. It supports smooth local deployment.\n\nThe base model was pre-trained on 15T of high-quality data, including a large amount of reasoning-focused synthetic content, setting the stage for advanced reinforcement learning.\n\nIn the post-training phase, techniques like human preference alignment, rejection sampling, and reinforcement learning were used to improve the model’s ability to follow instructions, generate code, and handle function calls—core skills needed for agent-style tasks.\n\nGLM-4-32B-0414 has shown strong results in engineering code, artifact generation, function calling, search-based QA, and report writing—sometimes matching or even surpassing larger models like GPT-4o and DeepSeek-V3 (671B) on specific benchmarks.","pricing":{"input":0.08,"output":0.08},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"THUDM/GLM-Z1-9B-0414","model_name":"GLM Z1 9B 0414","developer_id":5,"desc":"GLM-Z1-9B-0414 is a small but powerful model in the GLM series, with only 9 billion parameters. Despite its size, it delivers strong performance in math reasoning and general tasks, ranking among the best in its class of open-source models.\n\nTrained with the same techniques as larger models, it strikes an excellent balance between performance and efficiency—making it a great option for low-resource or lightweight deployment scenarios.","pricing":{"input":0.05,"output":0.05},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"THUDM/GLM-4-9B-0414","model_name":"GLM 4 9B 0414","developer_id":5,"desc":"GLM-4-9B-0414 is a lightweight model in the GLM family, with 9 billion parameters. It inherits the core tech from GLM-4-32B and offers an efficient option for deployment on limited resources.\n\nDespite its smaller size, it performs well in tasks like code generation, web design, SVG graphics creation, and search-based writing. It also supports function calling to interact with external tools, enhancing its versatility.\n\nGLM-4-9B-0414 strikes a solid balance between efficiency and performance, making it a strong choice for low-resource environments—while remaining competitive on various benchmarks.","pricing":{"input":0.05,"output":0.05},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-doubao-seed-code-preview-latest","model_name":"CC Doubao Seed Code Preview","developer_id":4,"desc":"claude code ","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-seed-code-preview-latest","model_name":"Doubao Seed Code Preview","developer_id":4,"desc":"chat","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/Janus-Pro-7B","model_name":"Janus Pro 7B","developer_id":7,"desc":"","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-zero-preview","model_name":"GLM Zero Preview","developer_id":5,"desc":"Simply put, it is the intelligent enhanced version of O1.","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-3-235b-a22b-instruct-2507","model_name":"Qwen 3 235B A22B Instruct 2507","developer_id":13,"desc":"cerebras","pricing":{"input":0.28,"output":1.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"nvidia-llama-3.1-nemotron-70b-instruct","model_name":"Nvidia Llama 3.1 Nemotron 70B Instruct","developer_id":17,"desc":"","pricing":{"input":1.32,"output":1.32},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"nvidia-llama-3.3-nemotron-super-49b-v1.5","model_name":"Nvidia Llama 3.3 Nemotron Super 49B V1.5","developer_id":17,"desc":"","pricing":{"input":0.11,"output":0.44},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"nvidia-nemotron-3-nano-30b-a3b","model_name":"Nvidia Nemotron 3 Nano 30B A3B","developer_id":17,"desc":"","pricing":{"input":0.066,"output":0.264},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"nvidia-nemotron-nano-12b-v2-vl","model_name":"Nvidia Nemotron Nano 12B V2 VL","developer_id":17,"desc":"","pricing":{"input":0.22,"output":0.66},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"nvidia-nemotron-nano-9b-v2","model_name":"Nvidia Nemotron Nano 9B V2","developer_id":17,"desc":"","pricing":{"input":0.044,"output":0.176},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"o1-preview-2024-09-12","model_name":"O1 Preview 2024 09-12","developer_id":12,"desc":"","pricing":{"cache_read":7.5,"input":15,"output":60},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4.5-air","model_name":"GLM 4.5 Air","developer_id":5,"desc":"","pricing":{"input":0.14,"output":0.84},"types":"","features":"","input_modalities":"text","endpoints":"","max_output":98304,"context_length":131072},{"model_id":"gpt-4-32k","model_name":"GPT 4 32K","developer_id":12,"desc":"The smartest version of GPT-4; OpenAI no longer offers it officially. All the 32k versions on this site are provided by Microsoft, deployed on Azure OpenAI by the official Microsoft service.","pricing":{"input":60,"output":120},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"coding-glm-4.5-air","model_name":"Coding GLM 4.5 Air","developer_id":5,"desc":"","pricing":{"input":0.014,"output":0.084},"types":"","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepinfra-nvidia-nemotron-3-nano-30b-a3b2","model_name":"Deepinfra Nvidia Nemotron 3 Nano 30B A3b2","developer_id":17,"desc":"","pricing":{"input":0.066,"output":0.264},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/QVQ-72B-Preview","model_name":"Qvq 72B Preview","developer_id":13,"desc":"","pricing":{"input":1.2,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/QwQ-32B-Preview","model_name":"QwQ 32B Preview","developer_id":13,"desc":"","pricing":{"input":0.16,"output":0.16},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-sonar-huge-128k-online","model_name":"Llama 3.1 Sonar Huge 128K Online","developer_id":19,"desc":"On February 22, 2025, this model will be officially discontinued. The Perplexity AI official fine-tuned LLMA internet-connected interface is currently only supported at the api.aihubmix.com address.","pricing":{"input":5.6,"output":5.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-sonar-large-128k-online","model_name":"Llama 3.1 Sonar Large 128K Online","developer_id":19,"desc":"On February 22, 2025, this model will be officially discontinued; Perplexity AI's official fine-tuned LLMA internet-connected interface is currently only supported at the api.aihubmix.com address.","pricing":{"input":1.2,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Mistral-Large-2411","model_name":"Aihubmix Mistral Large 2411","developer_id":10,"desc":"The latest Mistral Large 2 model is deployed on Azure.","pricing":{"input":2,"output":6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Mistral-large-2407","model_name":"Aihubmix Mistral Large 2407","developer_id":10,"desc":"","pricing":{"input":3,"output":9},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-70b","model_name":"Llama 3.1 70B","developer_id":11,"desc":"","pricing":{"input":0.44,"output":0.44},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"wan2.6-t2i","model_name":"Wan2.6 T2i","developer_id":13,"desc":"","pricing":{"input":2,"output":0},"types":"image_generation","features":"","input_modalities":"image,text","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-2-1212","model_name":"Grok 2 1212","developer_id":9,"desc":"","pricing":{"input":1.8,"output":9},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-image-test","model_name":"GPT Image Test","developer_id":12,"desc":"","pricing":{"cache_read":0,"input":5,"output":40},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-4.20-beta-0309-non-reasoning","model_name":"Grok 4.20 Beta 0309","developer_id":9,"desc":"Grok 4.20 Beta is our latest flagship model, offering industry-leading speed and agent tool-invocation capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, enabling it to consistently deliver precise and factual responses.","pricing":{"cache_read":0.2,"input":2,"output":6},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"grok-4.20-beta-0309-reasoning","model_name":"Grok 4.20 Beta 0309 (reasoning)","developer_id":9,"desc":"Grok 4.20 Beta is our latest flagship model, offering industry-leading speed and agent tool-invocation capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, enabling it to consistently deliver precise and factual responses.","pricing":{"cache_read":0.2,"input":2,"output":6},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"grok-4.20-multi-agent-beta-0309","model_name":"Grok 4.20 Multi Agent Beta 0309","developer_id":9,"desc":"Grok 4.20 Beta is our latest flagship model, offering industry-leading speed and agent tool-invocation capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, enabling it to consistently deliver precise and factual responses.","pricing":{"cache_read":0.2,"input":2,"output":6},"types":"llm","features":"thinking,tools,function_calling,structured_outputs,long_context","input_modalities":"text,image","endpoints":"","max_output":2000000,"context_length":2000000},{"model_id":"imagen-3.0-generate-002","model_name":"Imagen 3.0 Generate 002","developer_id":8,"desc":"Imagen 3.0 is Google's latest text-to-image generation model, capable of creating high-quality images from natural language prompts. Compared to its predecessors, Imagen 3.0 offers significant improvements in detail, lighting, and reduced visual artifacts. It supports rendering in various artistic styles, from photorealism to impressionism, as well as abstract and anime styles.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama3.1-8b","model_name":"Llama3.1 8B","developer_id":11,"desc":"cerebras","pricing":{"input":0.3,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"o1-2024-12-17","model_name":"O1 2024 12-17","developer_id":12,"desc":"","pricing":{"cache_read":7.5,"input":15,"output":60},"types":"llm","features":"thinking","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"sf-kimi-k2-thinking","model_name":"Sf Kimi K2 Thinking","developer_id":15,"desc":"","pricing":{"input":0.548,"output":2.192},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"DESCRIBE","model_name":"Describe","developer_id":21,"desc":"This endpoint is used to describe an image.\nSupported image formats include JPEG, PNG, and WebP.\nUS $0.01/ IMG.\nFor usage examples and pricing details, refer to the documentation at https://docs.aihubmix.com/cn/api/IdeogramAI.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"UPSCALE","model_name":"Upscale","developer_id":21,"desc":"The super-resolution upscale interface of the Ideogram AI drawing model is designed to enlarge low-resolution images into high-resolution ones, redrawing details (with controllable similarity and detail proportions).\nUS $0.06/ IMG.\nFor usage examples and pricing details, refer to the documentation at https://docs.aihubmix.com/cn/api/IdeogramAI.","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"bai-qwen3-vl-235b-a22b-instruct","model_name":"Bai Qwen3 VL 235B A22B Instruct","developer_id":13,"desc":"The Qwen3 series open-source models include hybrid models, thinking models, and non-thinking models, with both reasoning capabilities and general abilities reaching industry SOTA levels at the same scale.","pricing":{"input":0.274,"output":1.096},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-MiniMax-M2","model_name":"CC MiniMax M2","developer_id":18,"desc":"For Claude Code only","pricing":{"input":0.1,"output":0.1},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-deepseek-v3","model_name":"CC DeepSeek V3","developer_id":7,"desc":"For Claude code only","pricing":{"input":0.3,"output":0.3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-deepseek-v3.1","model_name":"CC DeepSeek V3.1","developer_id":7,"desc":"For Claude code only","pricing":{"input":0.56,"output":1.68},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-ernie-4.5-300b-a47b","model_name":"CC ERNIE 4.5 300B A47B","developer_id":25,"desc":"For Claude code only","pricing":{"cache_read":0,"input":0.32,"output":1.28},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-kimi-dev-72b","model_name":"CC Kimi Dev 72B","developer_id":15,"desc":"For Claude code only","pricing":{"cache_read":0,"input":0.32,"output":1.28},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-kimi-k2-instruct","model_name":"CC Kimi K2 Instruct","developer_id":15,"desc":"For Claude code only","pricing":{"input":1.1,"output":3.3},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-kimi-k2-instruct-0905","model_name":"CC Kimi K2 Instruct 0905","developer_id":15,"desc":"For Claude code only","pricing":{"input":1.1,"output":3.3},"types":"llm","features":"tools,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"cc-kimi-k2-thinking","model_name":"CC Kimi K2 Thinking","developer_id":15,"desc":"Dedicated for Claude Code","pricing":{"input":0.548,"output":2.192},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"computer-use-preview","model_name":"Computer Use Preview","developer_id":12,"desc":"","pricing":{"input":3,"output":12},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Baichuan3-Turbo","model_name":"Baichuan3 Turbo","developer_id":20,"desc":"","pricing":{"input":1.9,"output":1.9},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Baichuan3-Turbo-128k","model_name":"Baichuan3 Turbo 128K","developer_id":20,"desc":"","pricing":{"input":3.8,"output":3.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Baichuan4","model_name":"Baichuan4","developer_id":20,"desc":"","pricing":{"input":16,"output":16},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Baichuan4-Air","model_name":"Baichuan4 Air","developer_id":20,"desc":"","pricing":{"input":0.16,"output":0.16},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Baichuan4-Turbo","model_name":"Baichuan4 Turbo","developer_id":20,"desc":"","pricing":{"input":2.4,"output":2.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"DeepSeek-v3","model_name":"DeepSeek V3","developer_id":7,"desc":"","pricing":{"input":0.272,"output":1.088},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-1.5-lite-32k","model_name":"Doubao 1.5 Lite 32K","developer_id":4,"desc":"Doubao-1.5-lite, a brand-new generation of lightweight model, offers exceptional response speed with both performance and latency reaching world-class levels. It supports a 32k context window and an output length of up to 12k tokens.","pricing":{"cache_read":0.01,"input":0.05,"output":0.1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-1.5-pro-256k","model_name":"Doubao 1.5 Pro 256K","developer_id":4,"desc":"Doubao-1.5-pro-256k, a fully upgraded version based on Doubao-1.5-Pro, delivers an overall performance improvement of 10%. It supports inference with a 256k context window and an output length of up to 12k tokens. With higher performance, larger window size, and exceptional cost-effectiveness, it is suitable for a wider range of application scenarios.","pricing":{"cache_read":0.8,"input":0.8,"output":1.44},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-1.5-pro-32k","model_name":"Doubao 1.5 Pro 32K","developer_id":4,"desc":"Doubao-1.5-pro, a brand-new generation of flagship model, features comprehensive performance upgrades and excels in knowledge, coding, reasoning, and other aspects. It supports a 32k context window and an output length of up to 12k tokens.","pricing":{"cache_read":0.0268,"input":0.134,"output":0.335},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-1.5-vision-pro-32k","model_name":"Doubao 1.5 Vision Pro 32K","developer_id":4,"desc":"Doubao-1.5-vision-pro is a newly upgraded multimodal large model that supports image recognition at any resolution and extreme aspect ratios. It enhances visual reasoning, document recognition, detailed information understanding, and instruction-following capabilities. It supports a 32k context window and an output length of up to 12k tokens.","pricing":{"input":0.46,"output":1.38},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-lite-128k","model_name":"Doubao Lite 128K","developer_id":4,"desc":"","pricing":{"cache_read":0.14,"input":0.14,"output":0.28},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-lite-32k","model_name":"Doubao Lite 32K","developer_id":4,"desc":"","pricing":{"cache_read":0.012,"input":0.06,"output":0.12},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-lite-4k","model_name":"Doubao Lite 4K","developer_id":4,"desc":"","pricing":{"cache_read":0.06,"input":0.06,"output":0.12},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-pro-128k","model_name":"Doubao Pro 128K","developer_id":4,"desc":"","pricing":{"input":0.8,"output":1.44},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-pro-256k","model_name":"Doubao Pro 256K","developer_id":4,"desc":"","pricing":{"cache_read":0.8,"input":0.8,"output":1.44},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-pro-32k","model_name":"Doubao Pro 32K","developer_id":4,"desc":"","pricing":{"cache_read":0.028,"input":0.14,"output":0.35},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Doubao-pro-4k","model_name":"Doubao Pro 4K","developer_id":4,"desc":"","pricing":{"input":0.14,"output":0.35},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"GPT-OSS-20B","model_name":"gpt-oss-20b","developer_id":12,"desc":"","pricing":{"input":0.11,"output":0.55},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Gryphe/MythoMax-L2-13b","model_name":"Mythomax L2 13B","developer_id":11,"desc":"","pricing":{"input":0.4,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"MiniMax-Text-01","model_name":"MiniMax Text 01","developer_id":18,"desc":"","pricing":{"input":0.14,"output":1.12},"types":"","features":"long_context","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"Mistral-large-2407","model_name":"Mistral Large 2407","developer_id":10,"desc":"","pricing":{"input":3,"output":9},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2-1.5B-Instruct","model_name":"Qwen2 1.5b Instruct","developer_id":13,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2-57B-A14B-Instruct","model_name":"Qwen2 57B A14B Instruct","developer_id":13,"desc":"","pricing":{"input":0.24,"output":0.24},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2-72B-Instruct","model_name":"Qwen2 72B Instruct","developer_id":13,"desc":"","pricing":{"input":0.8,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2-7B-Instruct","model_name":"Qwen2 7B Instruct","developer_id":13,"desc":"","pricing":{"input":0.08,"output":0.08},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2.5-32B-Instruct","model_name":"Qwen2.5 32B Instruct","developer_id":13,"desc":"","pricing":{"input":0.6,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2.5-72B-Instruct","model_name":"Qwen2.5 72B Instruct","developer_id":13,"desc":"","pricing":{"input":0.8,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2.5-72B-Instruct-128K","model_name":"Qwen2.5 72B Instruct 128K","developer_id":13,"desc":"","pricing":{"input":0.8,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2.5-7B-Instruct","model_name":"Qwen2.5 7B Instruct","developer_id":13,"desc":"","pricing":{"input":0.4,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen/Qwen2.5-Coder-32B-Instruct","model_name":"Qwen2.5 Coder 32B Instruct","developer_id":13,"desc":"","pricing":{"input":0.16,"output":0.16},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Qwen3-235B-A22B-Thinking-2507","model_name":"Qwen3 235B A22B Thinking 2507","developer_id":13,"desc":"","pricing":{"input":0.28,"output":2.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"Stable-Diffusion-3-5-Large","model_name":"Stable Diffusion 3.5 Large","developer_id":23,"desc":"Stable Diffusion 3.5 Large, developed by Stability AI, is a text-to-image generation model that supports high-quality image creation with excellent prompt responsiveness and customization, suitable for professional applications.","pricing":{"cache_read":0,"input":4,"output":4},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"WizardLM/WizardCoder-Python-34B-V1.0","model_name":"Wizardcoder Python 34B V1.0","developer_id":11,"desc":"","pricing":{"input":0.9,"output":0.9},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"ahm-Phi-3-5-MoE-instruct","model_name":"Ahm Phi 3.5 Moe Instruct","developer_id":3,"desc":"","pricing":{"input":0.4,"output":1.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"ahm-Phi-3-5-mini-instruct","model_name":"Ahm Phi 3.5 Mini Instruct","developer_id":3,"desc":"Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the dataset used for Phi-3—which includes synthetic data and carefully curated publicly available websites—focusing on very high-quality, reasoning-intensive data. This model is part of the Phi-3 model family and supports a context length of 128K tokens.","pricing":{"input":1,"output":3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"ahm-Phi-3-5-vision-instruct","model_name":"Ahm Phi 3.5 Vision Instruct","developer_id":3,"desc":"","pricing":{"input":0.4,"output":1.6},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"ahm-Phi-3-medium-128k","model_name":"Ahm Phi 3 Medium 128K","developer_id":3,"desc":"","pricing":{"input":6,"output":18},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"ahm-Phi-3-medium-4k","model_name":"Ahm Phi 3 Medium 4K","developer_id":3,"desc":"","pricing":{"input":1,"output":3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"ahm-Phi-3-small-128k","model_name":"Ahm Phi 3 Small 128K","developer_id":3,"desc":"","pricing":{"input":1,"output":3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Codestral-2501","model_name":"Aihubmix Codestral 2501","developer_id":10,"desc":"","pricing":{"input":0.4,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Cohere-command-r","model_name":"Aihubmix Cohere Command R","developer_id":6,"desc":"","pricing":{"input":0.64,"output":1.92},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Jamba-1-5-Large","model_name":"Aihubmix Jamba 1.5 Large","developer_id":1,"desc":"","pricing":{"input":2.2,"output":8.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Llama-3-1-405B-Instruct","model_name":"Aihubmix Llama 3.1 405B Instruct","developer_id":11,"desc":"","pricing":{"input":5,"output":15},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Llama-3-1-70B-Instruct","model_name":"Aihubmix Llama 3.1 70B Instruct","developer_id":11,"desc":"","pricing":{"input":0.6,"output":0.78},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Llama-3-1-8B-Instruct","model_name":"Aihubmix Llama 3.1 8B Instruct","developer_id":11,"desc":"","pricing":{"input":0.3,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Llama-3-2-11B-Vision","model_name":"Aihubmix Llama 3.2 11B Vision","developer_id":11,"desc":"","pricing":{"input":0.4,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Llama-3-2-90B-Vision","model_name":"Aihubmix Llama 3.2 90B Vision","developer_id":11,"desc":"","pricing":{"input":2.4,"output":2.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Llama-3-70B-Instruct","model_name":"Aihubmix Llama 3 70B Instruct","developer_id":11,"desc":"","pricing":{"input":0.7,"output":0.7},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-Mistral-large","model_name":"Aihubmix Mistral Large","developer_id":10,"desc":"","pricing":{"input":4,"output":12},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-command-r-08-2024","model_name":"Aihubmix Command R 08 2024","developer_id":6,"desc":"","pricing":{"input":0.2,"output":0.8},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-command-r-plus","model_name":"Aihubmix Command R Plus","developer_id":6,"desc":"","pricing":{"input":3.84,"output":19.2},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"aihubmix-command-r-plus-08-2024","model_name":"Aihubmix Command R Plus 08 2024","developer_id":6,"desc":"","pricing":{"input":2.8,"output":11.2},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"alicloud-deepseek-v3.2","model_name":"Alicloud Deepseek V3.2","developer_id":7,"desc":"","pricing":{"cache_read":0.0548,"cache_write":0.3425,"input":0.274,"output":0.411},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"alicloud-glm-4.7","model_name":"Alicloud Glm 4.7","developer_id":5,"desc":"","pricing":{"cache_read":0.41096,"input":0.41096,"output":1.917786},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"alicloud-kimi-k2-thinking","model_name":"Alicloud Kimi K2 Thinking","developer_id":15,"desc":"","pricing":{"input":0.548,"output":2.192},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"alicloud-kimi-k2.5","model_name":"Alicloud Kimi K2.5","developer_id":15,"desc":"","pricing":{"cache_read":0.0959,"input":0.548,"output":2.877},"types":"","features":"","input_modalities":"","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"alicloud-minimax-m2.5","model_name":"Alicloud Minimax M2.5","developer_id":18,"desc":"","pricing":{"cache_read":0.05752,"input":0.2876,"output":1.1504},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"anthropic-opus-4-6","model_name":"Anthropic Opus 4.6","developer_id":2,"desc":"Claude Opus 4.6 is Anthropic’s latest state-of-the-art reasoning model. It features an adaptive “thinking” mode that dynamically decides when to think and how much to think. At the default effort level (high), Claude will almost always engage in thinking. At lower effort levels, it may skip thinking for simple problems.\n ⚠️ The minimum cache token for claude-opus-4-6 has been increased from 1,024 to 4,096 tokens.","pricing":{"cache_read":0.5,"cache_write":6.25,"input":5,"output":25},"types":"llm","features":"thinking,tools,function_calling,structured_outputs","input_modalities":"text,image","endpoints":"","max_output":32000,"context_length":200000},{"model_id":"azure-deepseek-v3.2","model_name":"Azure Deepseek V3.2","developer_id":7,"desc":"","pricing":{"input":0.58,"output":1.680028},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"azure-deepseek-v3.2-speciale","model_name":"Azure Deepseek V3.2 Speciale","developer_id":7,"desc":"","pricing":{"input":0.58,"output":1.680028},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"azure-kimi-k2.5","model_name":"Azure Kimi K2.5","developer_id":15,"desc":"","pricing":{"input":0.6,"output":3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":256000,"context_length":256000},{"model_id":"cbs-glm-4.7","model_name":"Cbs Glm 4.7","developer_id":5,"desc":"","pricing":{"input":2.25,"output":2.749995},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cerebras-llama-3.3-70b","model_name":"Cerebras Llama 3.3 70B","developer_id":11,"desc":"","pricing":{"input":0.6,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"chatglm_lite","model_name":"Chatglm_lite","developer_id":5,"desc":"","pricing":{"input":0.2858,"output":0.2858},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"chatglm_pro","model_name":"Chatglm_pro","developer_id":5,"desc":"","pricing":{"input":1.4286,"output":1.4286},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"chatglm_std","model_name":"Chatglm_std","developer_id":5,"desc":"","pricing":{"input":0.7144,"output":0.7144},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"chatglm_turbo","model_name":"Chatglm_turbo","developer_id":5,"desc":"","pricing":{"input":0.7144,"output":0.7144},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-2","model_name":"Claude 2","developer_id":2,"desc":"","pricing":{"input":8.8,"output":8.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-2.0","model_name":"Claude 2.0","developer_id":2,"desc":"","pricing":{"input":8.8,"output":39.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-2.1","model_name":"Claude 2.1","developer_id":2,"desc":"","pricing":{"input":8.8,"output":39.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-3-5-sonnet-20240620","model_name":"Claude 3.5 Sonnet 20240620","developer_id":2,"desc":"Claude 3.5 Sonnet delivers performance superior to Opus and speeds faster than its predecessor, all at the same price point. Its core strengths include:\n\nCoding: Autonomously writes, edits, and executes code with advanced reasoning and troubleshooting.\nData Science: Augments human expertise by analyzing unstructured data and using multiple tools to generate insights.\nVisual Processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to derive high-level insights.\nAgentic Tasks: Exceptional tool use makes it highly effective for complex, multi-step agentic workflows that interact with other systems.","pricing":{"input":3.3,"output":16.5},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":8192,"context_length":200000},{"model_id":"claude-3-haiku-20240229","model_name":"Claude 3 Haiku 20240229","developer_id":2,"desc":"","pricing":{"input":0.275,"output":0.275},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-3-haiku-20240307","model_name":"Claude 3 Haiku 20240307","developer_id":2,"desc":"","pricing":{"input":0.275,"output":1.375},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-3-haiku@20240307","model_name":"Claude 3 Haiku@20240307","developer_id":2,"desc":"","pricing":{"input":0.275,"output":1.375},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-3-opus@20240229","model_name":"Claude 3 Opus@20240229","developer_id":2,"desc":"","pricing":{"input":16.5,"output":82.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-3-sonnet-20240229","model_name":"Claude 3 Sonnet 20240229","developer_id":2,"desc":"","pricing":{"input":3.3,"output":16.5},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-instant-1","model_name":"Claude Instant 1","developer_id":2,"desc":"","pricing":{"input":1.793,"output":1.793},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"claude-instant-1.2","model_name":"Claude Instant 1.2","developer_id":2,"desc":"","pricing":{"input":0.88,"output":3.96},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"code-davinci-edit-001","model_name":"Code Davinci Edit 001","developer_id":5,"desc":"","pricing":{"input":20,"output":20},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cogview-3","model_name":"Cogview 3","developer_id":5,"desc":"","pricing":{"input":35.5,"output":35.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"cogview-3-plus","model_name":"Cogview 3 Plus","developer_id":5,"desc":"","pricing":{"input":10,"output":10},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"command","model_name":"Command","developer_id":6,"desc":"","pricing":{"input":1,"output":2},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-light","model_name":"Command Light","developer_id":6,"desc":"","pricing":{"input":1,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-light-nightly","model_name":"Command Light Nightly","developer_id":6,"desc":"","pricing":{"input":1,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-nightly","model_name":"Command Nightly","developer_id":6,"desc":"","pricing":{"input":1,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-r","model_name":"Command R","developer_id":6,"desc":"","pricing":{"input":0.64,"output":1.92},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-r-08-2024","model_name":"Command R 08 2024","developer_id":6,"desc":"","pricing":{"input":0.2,"output":0.8},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-r-plus","model_name":"Command R Plus","developer_id":6,"desc":"","pricing":{"input":3.84,"output":19.2},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"command-r-plus-08-2024","model_name":"Command R Plus 08 2024","developer_id":6,"desc":"","pricing":{"input":2.8,"output":11.2},"types":"llm","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"dall-e-2","model_name":"Dall E 2","developer_id":12,"desc":"","pricing":{"input":16,"output":16},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"davinci","model_name":"Davinci","developer_id":12,"desc":"","pricing":{"input":20,"output":20},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"davinci-002","model_name":"Davinci 002","developer_id":12,"desc":"","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepinfra-llama-3.1-8b-instant","model_name":"Deepinfra Llama 3.1 8B Instant","developer_id":11,"desc":"","pricing":{"input":0.033,"output":0.054978},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepinfra-llama-3.3-70b-instant-turbo","model_name":"Deepinfra Llama 3.3 70B Instant Turbo","developer_id":11,"desc":"","pricing":{"input":0.11,"output":0.352},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepinfra-llama-4-maverick-17b-128e-instruct","model_name":"Deepinfra Llama 4 Maverick 17B 128e Instruct","developer_id":11,"desc":"","pricing":{"input":1.65,"output":6.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepinfra-llama-4-scout-17b-16e-instruct","model_name":"Deepinfra Llama 4 Scout 17B 16e Instruct","developer_id":11,"desc":"","pricing":{"cache_read":0,"input":0.088,"output":0.33},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-Coder-V2-Instruct","model_name":"DeepSeek Coder V2 Instruct","developer_id":7,"desc":"","pricing":{"input":0.16,"output":0.32},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-R1-Distill-Llama-70B","model_name":"DeepSeek R1 Distill Llama 70B","developer_id":7,"desc":"","pricing":{"input":0.6,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","model_name":"DeepSeek R1 Distill Llama 8B","developer_id":7,"desc":"","pricing":{"input":0.01,"output":0.01},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","model_name":"DeepSeek R1 Distill Qwen 1.5b","developer_id":7,"desc":"","pricing":{"input":0.01,"output":0.01},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B","model_name":"DeepSeek R1 Distill Qwen 14B","developer_id":7,"desc":"Open source deployment from SiliconFlow, the model itself is obtained through knowledge distillation.","pricing":{"input":0.1,"output":0.1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B","model_name":"DeepSeek R1 Distill Qwen 32B","developer_id":7,"desc":"Open source deployment from SiliconFlow, the model itself is obtained through knowledge distillation.","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B","model_name":"DeepSeek R1 Distill Qwen 7B","developer_id":7,"desc":"Open source deployment from SiliconFlow, the model itself is obtained through knowledge distillation.","pricing":{"input":0.01,"output":0.01},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-V2-Chat","model_name":"DeepSeek V2 Chat","developer_id":7,"desc":"","pricing":{"input":0.16,"output":0.32},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/DeepSeek-V2.5","model_name":"DeepSeek V2.5","developer_id":7,"desc":"","pricing":{"input":0.16,"output":0.32},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/deepseek-llm-67b-chat","model_name":"DeepSeek Llm 67B Chat","developer_id":7,"desc":"","pricing":{"input":0.16,"output":0.16},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-ai/deepseek-vl2","model_name":"DeepSeek Vl2","developer_id":7,"desc":"","pricing":{"input":0.16,"output":0.16},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-v3","model_name":"DeepSeek V3","developer_id":7,"desc":"","pricing":{"cache_read":0,"input":0.272,"output":1.088},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"distil-whisper-large-v3-en","model_name":"Distil Whisper Large V3 En","developer_id":12,"desc":"","pricing":{"input":5.556,"output":5.556},"types":"stt","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-1-5-thinking-vision-pro-250428","model_name":"Doubao 1.5 Thinking Vision Pro 250428","developer_id":4,"desc":"Deep Thinking  \nImage Understanding  \nVisual Localization  \nVideo Understanding  \nTool Invocation  \nStructured Output","pricing":{"cache_read":2,"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"fx-flux-2-pro","model_name":"Fx Flux 2 Pro","developer_id":27,"desc":"","pricing":{"cache_read":0,"input":2,"output":0},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-2.5-pro-exp-03-25","model_name":"Gemini 2.5 Pro","developer_id":8,"desc":"Google’s latest experimental model, highly unstable, for experience only.\nIt boasts strong reasoning and coding capabilities, able to \"think\" before responding, enhancing performance and accuracy in complex tasks. It supports multimodal inputs (text, audio, images, video) and a 1 million token context window, suitable for advanced programming, math, and science tasks.\n\nThis means Gemini 2.5 can handle more complex problems in coding, science and math, and support more context-aware agents.","pricing":{"cache_read":0.125,"input":1.25,"output":5},"types":"llm","features":"structured_outputs,tools,long_context","input_modalities":"text,image,audio,video","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-embedding-exp-03-07","model_name":"Gemini Embedding","developer_id":8,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-exp-1114","model_name":"Gemini","developer_id":8,"desc":"","pricing":{"input":1.25,"output":5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-exp-1121","model_name":"Gemini","developer_id":8,"desc":"","pricing":{"input":1.25,"output":5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-pro","model_name":"Gemini Pro","developer_id":8,"desc":"","pricing":{"input":0.2,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemini-pro-vision","model_name":"Gemini Pro Vision","developer_id":8,"desc":"","pricing":{"input":1,"output":1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gemma-7b-it","model_name":"Gemma 7B It","developer_id":8,"desc":"","pricing":{"input":0.1,"output":0.1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-3-turbo","model_name":"GLM 3 Turbo","developer_id":5,"desc":"","pricing":{"input":0.71,"output":0.71},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4","model_name":"GLM 4","developer_id":5,"desc":"","pricing":{"input":14.2,"output":14.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4-flash","model_name":"GLM 4 Flash","developer_id":5,"desc":"","pricing":{"input":0.1,"output":0.1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4-plus","model_name":"GLM 4 Plus","developer_id":5,"desc":"","pricing":{"input":8,"output":8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4.5-airx","model_name":"GLM 4.5 Airx","developer_id":5,"desc":"GLM-4.5-AirX is the high-speed version of GLM-4.5-Air, with faster response times, specifically designed for large-scale high-speed demands.","pricing":{"cache_read":0.22,"input":1.1,"output":4.51},"types":"","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4v","model_name":"GLM 4 Vision","developer_id":5,"desc":"","pricing":{"input":14.2,"output":14.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"glm-4v-plus","model_name":"GLM 4 Vision Plus","developer_id":5,"desc":"","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"google-gemma-3-12b-it","model_name":"Google Gemma 3 12B It","developer_id":8,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"google-gemma-3-27b-it","model_name":"Google Gemma 3 27B It","developer_id":8,"desc":"","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"google-gemma-3-4b-it","model_name":"Google Gemma 3 4B It","developer_id":8,"desc":"","pricing":{"cache_read":0,"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"google/gemini-exp-1114","model_name":"Gemini","developer_id":8,"desc":"","pricing":{"input":1.25,"output":5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"google/gemma-2-27b-it","model_name":"Gemma 2 27B It","developer_id":8,"desc":"","pricing":{"input":0.8,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"google/gemma-2-9b-it:free","model_name":"Gemma 2 9B It (free)","developer_id":8,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-3.5-turbo","model_name":"GPT 3.5 Turbo","developer_id":12,"desc":"Since the GPT-3.5-turbo model has been officially deprecated, all requests targeting this model will be automatically routed to GPT-40-mini. We recommend using GPT-40-mini directly as a replacement.","pricing":{"input":0.5,"output":1.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-3.5-turbo-0301","model_name":"GPT 3.5 Turbo 0301","developer_id":12,"desc":"","pricing":{"input":1.5,"output":1.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-3.5-turbo-0613","model_name":"GPT 3.5 Turbo 0613","developer_id":12,"desc":"","pricing":{"input":1.5,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-3.5-turbo-1106","model_name":"GPT 3.5 Turbo 1106","developer_id":12,"desc":"","pricing":{"input":1,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-3.5-turbo-16k","model_name":"GPT 3.5 Turbo 16K","developer_id":12,"desc":"","pricing":{"input":3,"output":4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-3.5-turbo-16k-0613","model_name":"GPT 3.5 Turbo 16K 0613","developer_id":12,"desc":"","pricing":{"input":3,"output":4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-3.5-turbo-instruct","model_name":"GPT 3.5 Turbo Instruct","developer_id":12,"desc":"","pricing":{"input":1.5,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4","model_name":"GPT 4","developer_id":12,"desc":"","pricing":{"input":30,"output":60},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-0125-preview","model_name":"GPT 4 0125 Preview","developer_id":12,"desc":"","pricing":{"input":10,"output":30},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-0314","model_name":"GPT 4 0314","developer_id":12,"desc":"","pricing":{"input":30,"output":60},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-0613","model_name":"GPT 4 0613","developer_id":12,"desc":"","pricing":{"input":30,"output":60},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-1106-preview","model_name":"GPT 4 1106 Preview","developer_id":12,"desc":"","pricing":{"input":10,"output":30},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-32k-0314","model_name":"GPT 4 32K 0314","developer_id":12,"desc":"","pricing":{"input":60,"output":120},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-32k-0613","model_name":"GPT 4 32K 0613","developer_id":12,"desc":"","pricing":{"input":60,"output":120},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-turbo","model_name":"GPT 4 Turbo","developer_id":12,"desc":"","pricing":{"input":10,"output":30},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-turbo-2024-04-09","model_name":"GPT 4 Turbo 2024 04-09","developer_id":12,"desc":"","pricing":{"input":10,"output":30},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-turbo-preview","model_name":"GPT 4 Turbo Preview","developer_id":12,"desc":"","pricing":{"input":10,"output":30},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4-vision-preview","model_name":"GPT 4 Vision Preview","developer_id":12,"desc":"","pricing":{"input":10,"output":30},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-2024-05-13","model_name":"GPT 4o 2024 05-13","developer_id":12,"desc":"","pricing":{"cache_read":5,"input":5,"output":15},"types":"","features":"","input_modalities":"","endpoints":"","max_output":4096,"context_length":128000},{"model_id":"gpt-4o-mini-2024-07-18","model_name":"GPT 4o Mini 2024 07-18","developer_id":12,"desc":"","pricing":{"cache_read":0.075,"input":0.15,"output":0.6},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-oss-20b","model_name":"gpt-oss-20b","developer_id":12,"desc":"gpt-oss-20b is a 21-billion parameter open-weight model released by OpenAI under the Apache 2.0 license. Its core feature is a Mixture-of-Experts (MoE) architecture that uses only 3.6B active parameters, enabling low-latency inference and deployment on consumer GPUs. The model also supports fine-tuning, function calling, tool use, and structured outputs.","pricing":{"input":0.11,"output":0.55},"types":"llm","features":"thinking,function_calling,structured_outputs","input_modalities":"text","endpoints":"","max_output":128000,"context_length":128000},{"model_id":"grok-2-vision-1212","model_name":"Grok 2 Vision 1212","developer_id":9,"desc":"grok-2-vision-1212 is the latest vision model in the Grok family, delivering outstanding performance on vision-based tasks and achieving state-of-the-art results in visual mathematical reasoning and document-based question answering. It supports a wide range of visual inputs, including documents, charts, screenshots, and real-world images, making it well-suited for advanced visual understanding and reasoning use cases.\n\nThe price of calling this model in AIhubMix is ​​10% lower than on the official website.","pricing":{"input":1.8,"output":9},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"grok-vision-beta","model_name":"Grok Vision Beta","developer_id":9,"desc":"","pricing":{"input":5.6,"output":16.8},"types":"llm","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"groq-llama-3.1-8b-instant","model_name":"Groq Llama 3.1 8B Instant","developer_id":11,"desc":"","pricing":{"input":0.055,"output":0.088},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"groq-llama-3.3-70b-versatile","model_name":"Groq Llama 3.3 70B Versatile","developer_id":11,"desc":"","pricing":{"input":0.649,"output":0.869011},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"groq-llama-4-maverick-17b-128e-instruct","model_name":"Groq Llama 4 Maverick 17B 128e Instruct","developer_id":11,"desc":"","pricing":{"input":0.22,"output":0.66},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"groq-llama-4-scout-17b-16e-instruct","model_name":"Groq Llama 4 Scout 17B 16e Instruct","developer_id":11,"desc":"","pricing":{"input":0.122,"output":0.366},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"imagen-4.0-generate-preview-05-20","model_name":"Imagen 4.0 Generate Preview 05-20","developer_id":8,"desc":"Google's latest raw image model","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"text,image","endpoints":"","max_output":0,"context_length":0},{"model_id":"jina-embeddings-v2-base-code","model_name":"Jina Embeddings V2 Base Code","developer_id":22,"desc":"Model optimized for code and document search, 768-dimensional, 137M parameters.","pricing":{"input":0.05,"output":0.05},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"learnlm-1.5-pro-experimental","model_name":"Learnlm 1.5 Pro Experimental","developer_id":8,"desc":"","pricing":{"input":1.25,"output":5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-405b-instruct","model_name":"Llama 3.1 405B Instruct","developer_id":11,"desc":"","pricing":{"input":4,"output":4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-405b-reasoning","model_name":"Llama 3.1 405B (reasoning)","developer_id":11,"desc":"","pricing":{"input":4,"output":4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-70b-versatile","model_name":"Llama 3.1 70B Versatile","developer_id":11,"desc":"","pricing":{"input":0.6,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-8b-instant","model_name":"Llama 3.1 8B Instant","developer_id":11,"desc":"","pricing":{"input":0.3,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.1-sonar-small-128k-online","model_name":"Llama 3.1 Sonar Small 128K Online","developer_id":19,"desc":"On February 22, 2025, this model will be officially discontinued. The Perplexity AI official fine-tuned LLMA online interface is currently supported only at the api.aihubmix.com address.","pricing":{"input":0.3,"output":0.3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.2-11b-vision-preview","model_name":"Llama 3.2 11B Vision Preview","developer_id":11,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.2-1b-preview","model_name":"Llama 3.2 1B Preview","developer_id":11,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.2-3b-preview","model_name":"Llama 3.2 3B Preview","developer_id":11,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama-3.2-90b-vision-preview","model_name":"Llama 3.2 90B Vision Preview","developer_id":11,"desc":"","pricing":{"input":2.4,"output":2.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama2-70b-4096","model_name":"Llama2 70B 4096","developer_id":11,"desc":"","pricing":{"input":0.5,"output":0.5},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama2-70b-40960","model_name":"Llama2 70B 40960","developer_id":11,"desc":"","pricing":{"input":0.5,"output":0.5},"types":"llm","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama2-7b-2048","model_name":"Llama2 7B 2048","developer_id":11,"desc":"","pricing":{"input":0.1,"output":0.1},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama3-70b-8192","model_name":"Llama3 70B 8192","developer_id":11,"desc":"","pricing":{"input":0.7,"output":0.937288},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama3-70b-8192(33)","model_name":"Llama3 70B 8192(33)","developer_id":11,"desc":"","pricing":{"input":2.65,"output":2.65},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama3-8b-8192","model_name":"Llama3 8B 8192","developer_id":11,"desc":"","pricing":{"input":0.06,"output":0.12},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama3-8b-8192(33)","model_name":"Llama3 8B 8192(33)","developer_id":11,"desc":"","pricing":{"input":0.3,"output":0.3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama3-groq-70b-8192-tool-use-preview","model_name":"Llama3 Groq 70B 8192 Tool Use Preview","developer_id":11,"desc":"","pricing":{"input":0.00089,"output":0.00089},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"llama3-groq-8b-8192-tool-use-preview","model_name":"Llama3 Groq 8B 8192 Tool Use Preview","developer_id":11,"desc":"","pricing":{"input":0.00019,"output":0.00019},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"mai-image-2","model_name":"Mai Image 2","developer_id":3,"desc":"","pricing":{"cache_read":0,"input":2,"output":2},"types":"image_generation","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama/Llama-3.2-90B-Vision-Instruct","model_name":"Llama 3.2 90B Vision Instruct","developer_id":11,"desc":"","pricing":{"input":0.5,"output":0.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama/llama-3.1-405b-instruct:free","model_name":"Llama 3.1 405B Instruct (free)","developer_id":11,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama/llama-3.1-70b-instruct:free","model_name":"Llama 3.1 70B Instruct (free)","developer_id":11,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama/llama-3.1-8b-instruct:free","model_name":"Llama 3.1 8B Instruct (free)","developer_id":11,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama/llama-3.2-11b-vision-instruct:free","model_name":"Llama 3.2 11B Vision Instruct (free)","developer_id":11,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama/llama-3.2-3b-instruct:free","model_name":"Llama 3.2 3B Instruct (free)","developer_id":11,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta/llama-3.1-405b-instruct","model_name":"Llama 3.1 405B Instruct","developer_id":11,"desc":"","pricing":{"input":5,"output":5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta/llama3-8B-chat","model_name":"Llama3 8B Chat","developer_id":11,"desc":"","pricing":{"input":0.3,"output":0.3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"mistralai/mistral-7b-instruct:free","model_name":"Mistral 7B Instruct (free)","developer_id":10,"desc":"","pricing":{"input":0.002,"output":0.002},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"mm-minimax-m3","model_name":"Mm Minimax M3","developer_id":18,"desc":"","pricing":{"input":0.288,"output":1.152},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshot-kimi-k2.5","model_name":"Moonshot Kimi K2.5","developer_id":15,"desc":"","pricing":{"cache_read":0.105,"input":0.6,"output":3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshot-v1-128k","model_name":"Moonshot V1 128K","developer_id":15,"desc":"","pricing":{"input":10,"output":10},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshot-v1-128k-vision-preview","model_name":"Moonshot V1 128K Vision Preview","developer_id":15,"desc":"","pricing":{"input":10,"output":10},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshot-v1-32k","model_name":"Moonshot V1 32K","developer_id":15,"desc":"","pricing":{"input":4,"output":4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshot-v1-32k-vision-preview","model_name":"Moonshot V1 32K Vision Preview","developer_id":15,"desc":"","pricing":{"input":4,"output":4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshot-v1-8k","model_name":"Moonshot V1 8K","developer_id":15,"desc":"","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"moonshot-v1-8k-vision-preview","model_name":"Moonshot V1 8K Vision Preview","developer_id":15,"desc":"","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"nvidia/Llama-3_1-Nemotron-Ultra-253B-v1","model_name":"Llama 3_1 Nemotron Ultra 253B V1","developer_id":17,"desc":"Llama-3.1-Nemotron-Ultra-253B is a 253 billion parameter reasoning-focused language model optimized for efficiency that excels at math, coding, and general instruction-following tasks while running on a single 8xH100 node.","pricing":{"cache_read":0,"input":0.5,"output":0.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"o1-mini-2024-09-12","model_name":"O1 Mini 2024 09-12","developer_id":12,"desc":"","pricing":{"cache_read":1.5,"input":3,"output":12},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"omni-moderation-latest","model_name":"Omni Moderation","developer_id":12,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-flash","model_name":"Qwen Flash","developer_id":13,"desc":"The model adopts tiered pricing.","pricing":{"cache_read":0.02,"input":0.02,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-flash-2025-07-28","model_name":"Qwen Flash 2025 07-28","developer_id":13,"desc":"The model adopts tiered pricing.","pricing":{"cache_read":0.02,"input":0.02,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-long","model_name":"Qwen Long","developer_id":13,"desc":"","pricing":{"input":0.1,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-max","model_name":"Qwen Max","developer_id":13,"desc":"","pricing":{"input":0.38,"output":1.52},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-max-longcontext","model_name":"Qwen Max Longcontext","developer_id":13,"desc":"","pricing":{"input":7,"output":21},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-plus","model_name":"Qwen Plus","developer_id":13,"desc":"","pricing":{"cache_read":0.02252,"cache_write":0.14075,"input":0.1126,"output":1.126},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-turbo","model_name":"Qwen Turbo","developer_id":13,"desc":"","pricing":{"cache_read":0.0092,"input":0.046,"output":0.092},"types":"llm","features":"long_context","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen-turbo-2024-11-01","model_name":"Qwen Turbo 2024 11-01","developer_id":13,"desc":"","pricing":{"input":0.046,"output":0.092},"types":"llm","features":"long_context","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-14b-instruct","model_name":"Qwen2.5 14B Instruct","developer_id":13,"desc":"","pricing":{"input":0.4,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-32b-instruct","model_name":"Qwen2.5 32B Instruct","developer_id":13,"desc":"","pricing":{"input":0.6,"output":1.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-3b-instruct","model_name":"Qwen2.5 3B Instruct","developer_id":13,"desc":"","pricing":{"input":0.4,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-72b-instruct","model_name":"Qwen2.5 72B Instruct","developer_id":13,"desc":"","pricing":{"input":0.8,"output":2.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-7b-instruct","model_name":"Qwen2.5 7B Instruct","developer_id":13,"desc":"","pricing":{"input":0.4,"output":0.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-coder-1.5b-instruct","model_name":"Qwen2.5 Coder 1.5b Instruct","developer_id":13,"desc":"","pricing":{"input":0.2,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-coder-7b-instruct","model_name":"Qwen2.5 Coder 7B Instruct","developer_id":13,"desc":"","pricing":{"input":0.2,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-math-1.5b-instruct","model_name":"Qwen2.5 Math 1.5b Instruct","developer_id":13,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-math-72b-instruct","model_name":"Qwen2.5 Math 72B Instruct","developer_id":13,"desc":"","pricing":{"input":0.8,"output":2.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qwen2.5-math-7b-instruct","model_name":"Qwen2.5 Math 7B Instruct","developer_id":13,"desc":"","pricing":{"input":0.2,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"step-2-16k","model_name":"Step 2 16K","developer_id":16,"desc":"","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-ada-001","model_name":"Text Ada 001","developer_id":12,"desc":"","pricing":{"input":0.4,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-babbage-001","model_name":"Text Babbage 001","developer_id":12,"desc":"","pricing":{"input":0.5,"output":0.5},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-curie-001","model_name":"Text Curie 001","developer_id":12,"desc":"","pricing":{"input":2,"output":2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-davinci-002","model_name":"Text Davinci 002","developer_id":12,"desc":"","pricing":{"input":20,"output":20},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-davinci-003","model_name":"Text Davinci 003","developer_id":12,"desc":"","pricing":{"input":20,"output":20},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-davinci-edit-001","model_name":"Text Davinci Edit 001","developer_id":12,"desc":"","pricing":{"input":20,"output":20},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-embedding-3-large","model_name":"Text Embedding 3 Large","developer_id":12,"desc":"","pricing":{"input":0.13,"output":0.13},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-embedding-3-small","model_name":"Text Embedding 3 Small","developer_id":12,"desc":"","pricing":{"input":0.02,"output":0.02},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-embedding-ada-002","model_name":"Text Embedding Ada 002","developer_id":12,"desc":"","pricing":{"input":0.1,"output":0.1},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-embedding-v1","model_name":"Text Embedding V1","developer_id":12,"desc":"","pricing":{"input":0.1,"output":0.1},"types":"embedding","features":"","input_modalities":"text","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-moderation-007","model_name":"Text Moderation 007","developer_id":12,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-moderation-latest","model_name":"Text Moderation","developer_id":12,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-moderation-stable","model_name":"Text Moderation Stable","developer_id":12,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"text-search-ada-doc-001","model_name":"Text Search Ada Doc 001","developer_id":12,"desc":"","pricing":{"input":20,"output":20},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"tts-1","model_name":"Tts 1","developer_id":12,"desc":"","pricing":{"input":15,"output":15},"types":"tts","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"tts-1-1106","model_name":"Tts 1 1106","developer_id":12,"desc":"","pricing":{"input":15,"output":15},"types":"tts","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"whisper-1","model_name":"Whisper 1","developer_id":12,"desc":"Ignore the displayed price on the page; the actual charge for this model request is consistent with the official, so you can use it with confidence.","pricing":{"input":100,"output":100},"types":"stt","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"whisper-large-v3","model_name":"Whisper Large V3","developer_id":12,"desc":"","pricing":{"input":30.834,"output":30.834},"types":"stt","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"whisper-large-v3-turbo","model_name":"Whisper Large V3 Turbo","developer_id":12,"desc":"","pricing":{"input":5.556,"output":5.556},"types":"stt","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"tts-1-hd-1106","model_name":"Tts 1 Hd 1106","developer_id":12,"desc":"","pricing":{"input":30,"output":30},"types":"tts","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"yi-large","model_name":"Yi Large","developer_id":14,"desc":"","pricing":{"input":3,"output":3},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"yi-large-rag","model_name":"Yi Large Rag","developer_id":14,"desc":"","pricing":{"input":4,"output":4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"yi-large-turbo","model_name":"Yi Large Turbo","developer_id":14,"desc":"","pricing":{"input":1.8,"output":1.8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"yi-lightning","model_name":"Yi Lightning","developer_id":14,"desc":"","pricing":{"input":0.2,"output":0.2},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"yi-medium","model_name":"Yi Medium","developer_id":14,"desc":"","pricing":{"input":0.4,"output":0.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"yi-vl-plus","model_name":"Yi VL Plus","developer_id":14,"desc":"","pricing":{"input":0.000852,"output":0.000852},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"tts-1-hd","model_name":"Tts 1 Hd","developer_id":12,"desc":"","pricing":{"input":30,"output":30},"types":"tts","features":"","input_modalities":"audio","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama-3-70b","model_name":"Meta Llama 3 70B","developer_id":11,"desc":"","pricing":{"input":4.795,"output":4.795},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"meta-llama-3-8b","model_name":"Meta Llama 3 8B","developer_id":11,"desc":"","pricing":{"input":0.548,"output":0.548},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"o3-global","model_name":"O3 Global","developer_id":12,"desc":"","pricing":{"cache_read":0.5,"input":2,"output":8},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"o3-mini-global","model_name":"O3 Mini Global","developer_id":12,"desc":"","pricing":{"cache_read":0.55,"input":1.1,"output":4.4},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"o3-pro-global","model_name":"O3 Pro Global","developer_id":12,"desc":"","pricing":{"input":20,"output":80},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qianfan-chinese-llama-2-13b","model_name":"Qianfan Chinese Llama 2 13B","developer_id":11,"desc":"","pricing":{"input":0.822,"output":0.822},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"qianfan-llama-vl-8b","model_name":"Qianfan Llama VL 8B","developer_id":11,"desc":"","pricing":{"input":0.274,"output":0.685},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"deepseek-r1-distill-qianfan-llama-8b","model_name":"DeepSeek R1 Distill Qianfan Llama 8B","developer_id":11,"desc":"","pricing":{"input":0.137,"output":0.548},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-1-5-pro-256k-250115","model_name":"Doubao 1.5 Pro 256K 250115","developer_id":5,"desc":"","pricing":{"input":0.684,"output":1.2312},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"doubao-1-5-pro-32k-250115","model_name":"Doubao 1.5 Pro 32K 250115","developer_id":5,"desc":"","pricing":{"input":0.108,"output":0.27},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-2024-08-06-global","model_name":"GPT 4o 2024 08-06 Global","developer_id":12,"desc":"","pricing":{"cache_read":1.25,"input":2.5,"output":10},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0},{"model_id":"gpt-4o-mini-global","model_name":"GPT 4o Mini Global","developer_id":12,"desc":"","pricing":{"cache_read":0.075,"input":0.15,"output":0.6},"types":"","features":"","input_modalities":"","endpoints":"","max_output":0,"context_length":0}],"message":"","success":true}