RAG Query Routing in Practice: Multi-Vector Store Coordination and Intelligent Retrieval Distribution
At 2 AM, production alerts started blaring again. I opened the logs and saw a user asking, “What’s the impact of the supplier strike on stock prices?” The system returned fragmented news snippets—even including two pieces about a competitor company. The client fired back in the group chat: “Why is your AI so stupid?”
The same RAG system that instantly delivered accurate answers for “What were Q3 2023 sales for the East China region?”—earning praise from the boss as “the most reliable team”—completely face-planted on questions like “How does the supplier strike affect stock prices?”
The root cause was simple: the first query was a straightforward fact lookup that vector retrieval could handle; the second required multi-hop reasoning—supplier, strike event, stock price fluctuations—where the relationships between these three were buried in a knowledge graph. Using one retrieval strategy for all queries is like trying to open every door with the same key—either it won’t open, or you’ll break something.
We needed an “intelligent router” that could automatically choose the most appropriate retrieval path based on query characteristics.
This article covers three mainstream approaches: logical routing (LLM intent analysis), semantic routing (fuzzy matching in embedding space), and EnsembleRetriever (RRF algorithm fusion). I’ve made mistakes with all of them and validated their effectiveness in production. Let’s be clear upfront: there’s no “best” solution, only the “most suitable” for your scenario.
Chapter 1: Why Query Routing? — From “Single Vector Store” to “Multi-Source Coordination”
I once helped an enterprise build a knowledge base system. They had three data sources: a financial database (MySQL), technical documentation (vector store), and a personnel relationship graph (Neo4j). My initial approach was simple—stuff everything into a single vector store.
The result? For simple questions like “East China region sales,” the system could accurately pull answers from financial reports. But ask “Which product lines are affected by the supplier strike?” and it returned a mess of random news articles, leaving users shaking their heads.
Later I realized: not all queries are suited for vector retrieval. Some questions are faster and more accurate with SQL; others need knowledge graphs to connect relationships; some require web search for the latest information. Using one retrieval strategy inevitably leads to “insufficient capability” or “over-engineering.”
1.1 Bottlenecks of Single Vector Store Retrieval: Two Real-World Scenarios Compared
Scenario A: Simple Fact Query (vector retrieval is enough)
User asks: “What were Q3 2023 sales for the East China region?”
System behavior: Vector retrieval finds the financial report table, directly answers “East China region Q3 sales: 120 million yuan.” The whole process takes about 300ms. Users are satisfied.
If we had forced this through the knowledge graph reasoning module? Not only would that waste GPU compute, it would add 500ms of latency. Like using a rocket to deliver a package—it works, but it’s unnecessary.
Scenario B: Complex Reasoning Query (requires multi-hop retrieval)
User asks: “What’s the impact of the supplier strike on stock prices?”
System behavior: Vector retrieval recalls fragmented news—“Company X stock dropped 5%,” “Supplier strike event report.” But the LLM lacks the intermediate logic chain: Which supplier? Who do they supply? How long was the strike? How much did the stock drop? This information is scattered across different documents, making it easy for the LLM to hallucinate answers.
The correct approach: knowledge graph connects “supplier → strike event → contract relationship → stock fluctuation,” making the logic chain clearly visible. But here’s the problem: how do we make the system automatically determine “this question needs the knowledge graph”?
That’s the core problem query routing solves.
1.2 Four-Dimensional Analysis of Query Characteristics
I developed a simple judgment framework in my projects to select retrieval strategies based on four dimensions of the query:
| Dimension | Characteristics | Suitable Retrieval Strategy |
|---|---|---|
| Context Dependency | Low (fact queries) vs High (multi-hop reasoning) | Vector retrieval vs Knowledge graph |
| Reasoning Hops | Single-hop vs Multi-hop | Direct retrieval vs Agent coordination |
| Data Type | Structured (tables) vs Unstructured (documents) | SQL query vs Vector retrieval |
| Timeliness | Real-time information vs Static knowledge | Web search vs Local knowledge base |
For example, “East China region sales” is single-hop, structured, static data—SQL is fastest. While “supplier strike affects stock prices” is high context dependency, multi-hop, unstructured data—knowledge graph is more appropriate.
Now you might be thinking: “Can I just query all three databases every time and merge the results?” You could, but costs would explode. Each query calling three retrievers adds 200-500ms latency and doubles LLM call costs. Unless your boss doesn’t care about money.
The smarter approach: let the system “read the room” and dynamically choose retrieval paths based on query characteristics. That’s the value of query routing—finding the balance between accuracy, efficiency, and cost.
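To make the framework concrete, here is a minimal sketch of how those four dimensions could drive a strategy choice. The `QueryProfile` flags and strategy names are my own illustration, not from any library:

```python
from dataclasses import dataclass

@dataclass
class QueryProfile:
    """The four dimensions from the table above (illustrative)."""
    high_context: bool   # needs multi-hop reasoning context?
    multi_hop: bool      # more than one reasoning step?
    structured: bool     # lives in tables rather than documents?
    realtime: bool       # needs fresh, real-time information?

def pick_strategy(q: QueryProfile) -> str:
    # Timeliness dominates: stale local data can't answer real-time questions
    if q.realtime:
        return "web_search"
    # Structured facts are fastest and most precise via SQL
    if q.structured:
        return "sql_query"
    # Multi-hop or context-heavy questions need the relationship graph
    if q.high_context or q.multi_hop:
        return "knowledge_graph"
    return "vector_retrieval"

# "East China region sales": single-hop, structured, static -> sql_query
print(pick_strategy(QueryProfile(False, False, True, False)))
```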
Chapter 2: Logical Routing — LLM Analyzes Intent, Selects Data Source
Logical routing is the most intuitive approach: give the LLM a “menu of options,” let it analyze your question, then pick the most matching data source from the menu.
Like going to a hospital: the nurse asks “what hurts?” You say “stomach,” she sends you to gastroenterology; you say “head,” she sends you to neurology. The LLM in logical routing is that nurse—based on your symptoms (query), selecting the most appropriate department (data source).
Implementation: LangChain + Structured Output
Let me show you complete code first, then discuss the pitfalls I encountered:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_deepseek import ChatDeepSeek
from pydantic import BaseModel, Field
from typing import Literal

# Define data source enum (avoid LLM ambiguity)
class DataSource(BaseModel):
    """Data source selection result"""
    source: Literal["finance_db", "tech_docs", "knowledge_graph", "web_search", "general_search"] = Field(
        description="Selected data source"
    )

# Set up routing prompt template
system_prompt = """
You are a professional query routing expert. Based on user question content, route it to the appropriate data source:
- If the question involves financial data or sales data, return "finance_db" (relational database)
- If the question involves technical documentation or product manuals, return "tech_docs" (vector database)
- If the question involves personnel relationships or organizational structure, return "knowledge_graph" (graph database)
- If latest real-time information is needed, return "web_search"
- If unable to determine clearly, return "general_search"
Please return only the data source name, no other content.
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{question}"),
])

# Use DeepSeek model (cheap and good)
llm = ChatDeepSeek(model="deepseek-chat", temperature=0.1)
structured_llm = llm.with_structured_output(DataSource)

# Build routing chain
route_chain = prompt | structured_llm

# Test routing
query1 = "What is the total sales for East China region in Q3 2023?"
result1 = route_chain.invoke({"question": query1})
print(result1.source)  # Output: finance_db

query2 = "What is the impact of the supplier strike on stock prices?"
result2 = route_chain.invoke({"question": query2})
print(result2.source)  # Output: knowledge_graph
```
There’s a critical detail in this code: temperature=0.1. I learned this the hard way—initially set it to 0.7, and the same query would sometimes route to knowledge graph, sometimes to web search. Later I realized: routers need stability, not randomness.
Another detail is Pydantic’s DataSource enum. At first I let the LLM return strings directly, but it would return “should query finance_db” or even “I think we could query finance_db or general_search.” These ambiguities make downstream processing complicated. Using Pydantic to enforce enum values keeps things clean.
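One more production detail worth adding: structured output calls can still fail (network errors, rate limits, an occasional schema violation), so it pays to wrap the routing chain with a fallback. A minimal sketch; `safe_route` and the `general_search` default are my own conventions, not a LangChain API:

```python
def safe_route(question: str) -> str:
    """Route a question, falling back to general_search on any failure."""
    try:
        result = route_chain.invoke({"question": question})
        return result.source
    except Exception:
        # A routing failure shouldn't take down the whole query path;
        # degrade to the catch-all data source instead.
        return "general_search"
```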
Pros and Cons Comparison
| Dimension | Advantages | Disadvantages |
|---|---|---|
| Accuracy | LLM deeply understands intent, handles complex queries | Depends on prompt quality; vague data source descriptions lead to misrouting |
| Response Speed | Acceptable for most interactive scenarios | Requires an LLM call per query (~500-800ms), roughly 10x slower than semantic routing |
| Cost | No extra infrastructure beyond the LLM itself | ~$0.0001 per routing call; accumulates at high volume with many data sources |
| Use Cases | Works well with 5 or fewer clearly described data sources | Too many data sources makes the prompt verbose |
I tested in production: logical routing works best with 5 or fewer data sources. Beyond 5, the prompt becomes long and the LLM gets confused. For example, if you have 10 data sources, consider semantic routing or hierarchical logical routing (broad categories first, then subdivisions).
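A sketch of what hierarchical routing could look like, reusing the `llm` and Pydantic imports from the code above. The category split, `category_prompt`, and the per-category sub-router chains are hypothetical placeholders, not a standard LangChain pattern:

```python
class Category(BaseModel):
    """Stage 1: broad category (illustrative)."""
    category: Literal["business_data", "technical", "external"] = Field(
        description="Broad category of the question"
    )

# Stage 1: pick a broad category from a short menu
category_chain = category_prompt | llm.with_structured_output(Category)

# Stage 2: each category gets its own small router (3-4 sources each)
sub_routers = {
    "business_data": finance_route_chain,   # finance_db, sales_db, ...
    "technical": tech_route_chain,          # tech_docs, api_docs, ...
    "external": external_route_chain,       # web_search, news_api, ...
}

def hierarchical_route(question: str) -> str:
    category = category_chain.invoke({"question": question}).category
    return sub_routers[category].invoke({"question": question}).source
```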
Chapter 3: Semantic Routing — “Fuzzy If/Else” Based on Embedding Space
Semantic routing is faster. Logical routing needs the LLM to “think” for a moment (500-800ms); semantic routing computes embedding similarity directly and finishes in about 50ms, roughly 10x faster.
The principle is like “fuzzy matching.” You predefine some example queries (utterances), like “query sales,” “financial report data,” “how’s revenue”—these queries all point to the “financial query” intent. When a user asks a question, the system computes semantic similarity between their question and these examples, triggering the corresponding route when exceeding a threshold.
Like when your mom asks “what do you want for dinner?” and you say “whatever, just not too spicy.” Your mom has a “fuzzy matching table” in her head: “not too spicy” ≈ “tomato scrambled eggs,” “steamed fish,” “winter melon soup.” Semantic routing is this fuzzy matching process.
Implementation: semantic-router Library + Predefined Utterances
The code is even simpler than logical routing:
```python
from semantic_router import RouteLayer, Route
from semantic_router.encoders import HuggingFaceEncoder

# Define routing rules (semantic similarity thresholds)
routes = [
    Route(
        name="finance_query",
        utterances=[
            "query sales",
            "financial report data",
            "how is revenue",
            "profit analysis",
        ],
    ),
    Route(
        name="tech_support",
        utterances=[
            "how to use product",
            "where are technical docs",
            "troubleshooting methods",
            "feature explanation",
        ],
    ),
    Route(
        name="graph_query",
        utterances=[
            "who has partnership with whom",
            "organizational structure relationships",
            "upstream downstream supply chain",
            "personnel relationship graph",
        ],
    ),
]

# Create RouteLayer (using free HuggingFace embedding model)
encoder = HuggingFaceEncoder()
route_layer = RouteLayer(encoder=encoder, routes=routes)

# Test routing (no LLM call, response ~50ms)
query1 = "What is the impact of the supplier strike on stock prices?"
route1 = route_layer(query1)
print(route1.name)  # Output: graph_query

query2 = "What are the Q3 2023 sales for East China region?"
route2 = route_layer(query2)
print(route2.name)  # Output: finance_query
```
The key in this code is utterances. You need to define 4-10 example queries for each intent, and the system computes semantic similarity between user questions and these examples. The default threshold is 0.85, meaning similarity must exceed 85% to trigger the route.
I tested: if utterances are too few (only 2), recall is low; if too many (over 20), computational overhead increases. Recommend 4-10 examples per intent, covering common expressions.
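If one intent needs stricter matching than the rest, semantic-router also lets you override the threshold per route via the Route model's score_threshold field (check your installed version; the 0.9 below is just an example value):

```python
strict_finance = Route(
    name="finance_query",
    utterances=["query sales", "financial report data", "how is revenue"],
    score_threshold=0.9,  # stricter than the layer default; tune on test data
)
```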
Another advantage is HuggingFaceEncoder. This is a free local embedding model that doesn’t require OpenAI API calls—zero cost. Logical routing’s $0.0001 per call seems small, but at 100k daily queries that’s $10/day, or $300/month. Semantic routing is completely free.
Pros and Cons Comparison
| Dimension | Advantages | Disadvantages |
|---|---|---|
| Response Speed | ~50ms (no LLM call) | Requires predefined utterances |
| Cost | Free (local embedding model) | Need to update utterances for new intents |
| Accuracy | Semantic similarity accurate for common intents | Complex intents may be misjudged |
| Use Cases | Intent classification, multi-skill agents, intent count 20 or fewer | Too many intents increases utterance maintenance cost |
Semantic routing has a limitation: it can’t handle “logical reasoning” intent judgments. Like “if the query involves financial data AND timeliness is critical, prioritize real-time database”—this kind of logical judgment still requires LLM. So in real projects, I use semantic routing for intent classification (finance/tech/relationship queries), then logical routing for complex conditional judgments.
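A minimal sketch of that combination, reusing route_layer from this chapter and route_chain from Chapter 2: try the fast embedding match first, and only pay for an LLM call when no route fires. (In semantic-router, a RouteChoice with no match has name set to None.)

```python
def hybrid_route(question: str) -> str:
    # Fast path: embedding similarity (~50ms, free)
    choice = route_layer(question)
    if choice.name is not None:
        return choice.name
    # Slow path: LLM-based logical routing (~500ms, one paid call)
    return route_chain.invoke({"question": question}).source
```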
Chapter 4: EnsembleRetriever — RRF Algorithm Merges Multiple Retrievers
The first two approaches are “pick one retriever”; EnsembleRetriever is “merge results from multiple retrievers.”
The classic scenario: BM25 (keyword matching) + vector retrieval (semantic matching). User asks “Q3 2023 sales,” BM25 can precisely match the “sales” keyword but might miss “revenue” as a synonym; vector retrieval understands “revenue” and “sales” mean the same thing but might recall a bunch of irrelevant financial documents.
Combine both, and both recall and precision improve. That’s EnsembleRetriever’s value.
Implementation: LangChain EnsembleRetriever + RRF Algorithm
The implementation is surprisingly simple:
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

texts = ["Financial report 2023 Q3", "East China region sales data", "Supplier list"]

# Create BM25 retriever (keyword matching)
bm25_retriever = BM25Retriever.from_texts(texts, k=2)

# Create vector retriever (semantic matching)
vectorstore = Chroma.from_texts(texts, embedding=OpenAIEmbeddings())
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Combine into an EnsembleRetriever (RRF algorithm)
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6],  # BM25 weight 0.4, vector weight 0.6
)

# Test retrieval
query = "2023 Q3 East China region sales"
docs = ensemble_retriever.invoke(query)
print(docs)  # Fused results from BM25 and vector retrieval (sorted by RRF score)
```
The core is the RRF (Reciprocal Rank Fusion) algorithm. Sounds fancy, but the principle is simple:
Say a document ranks #1 in BM25 and #3 in vector retrieval. With the usual constant k=60, the RRF calculation is:
- BM25 rank #1 → 1/(60+1) ≈ 0.0164
- Vector retrieval rank #3 → 1/(60+3) ≈ 0.0159
- Total score = 0.0164 + 0.0159 = 0.0323
k=60 is an empirical value you can adjust for your project. Larger k means ranking differences have less impact; smaller k means top-ranked documents have more advantage.
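The whole algorithm fits in a few lines. A minimal sketch that reproduces the arithmetic above (EnsembleRetriever additionally applies the per-retriever weights, which I omit here):

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """Fuse one document's ranks across several retrievers."""
    return sum(1 / (k + rank) for rank in ranks)

# The worked example: rank 1 in BM25, rank 3 in vector retrieval
print(round(rrf_score([1, 3]), 4))  # 0.0323
```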
Why Does RRF Work?
RRF’s elegance: it doesn’t depend on documents’ raw scores (which are incomparable across different retrievers), only on rankings. This lets you merge any type of retriever—BM25, vector retrieval, knowledge graph retrieval, even web search results.
I tested in production: pure BM25 recall 70%, pure vector retrieval recall 85%, EnsembleRetriever recall reaches 92%. Key cost: only 50ms increase (two retrievers called in parallel).
Pros and Cons Comparison
| Dimension | Advantages | Disadvantages |
|---|---|---|
| Accuracy | Lexical + semantic fusion, high recall | Fusion quality depends on weight tuning |
| Response Speed | Parallel retrieval, ~300ms | Slower than single retriever |
| Cost | No extra LLM calls | Multiple retrievers in parallel, double compute |
| Use Cases | Hybrid retrieval optimization, merging same-type retrievers | Not suitable for cross-data-source routing |
EnsembleRetriever has a limitation: it can only merge “same type” retrieval results. If you want to query both vector store and knowledge graph, EnsembleRetriever can’t help. For cross-data-source scenarios, you still need logical or semantic routing.
Chapter 5: Cost Optimization Strategies for Production Deployment
We’ve talked about “how to make the system smarter”; this chapter covers “how to make the system cheaper.” The biggest pitfall I hit was cost explosion—first week live, LLM call costs hit $500, boss almost fired me.
Later I learned three strategies: Semantic Caching, Tiered Retrieval, Parallel Processing. Costs dropped to $50/week, and accuracy actually improved.
5.1 Semantic Caching
This is the simplest and most effective strategy. Principle: cache embeddings for common queries; if similarity > 0.95, directly return cached answer without LLM call.
I tested in production: cache hit rate reaches 30-50%. Response time drops from ~500ms to ~50ms (when cache hits), significantly improving user experience.
```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore
from langchain_openai import OpenAIEmbeddings

# Cache embeddings for common queries
underlying_embeddings = OpenAIEmbeddings()
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings,
    InMemoryByteStore(),  # production should use a Redis-backed byte store
    namespace=underlying_embeddings.model,  # keep caches from different models separate
)
# cached_embeddings now skips the embedding API for queries it has seen before
```
In production, I recommend Redis or Memcached rather than the in-memory store (which is lost on restart). I also periodically clean the cache—queries unaccessed for 30+ days auto-expire.
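Note that CacheBackedEmbeddings only caches the embedding lookups themselves, keyed by the exact query string. The “similarity > 0.95 returns the cached answer” behavior needs a thin layer on top. A hedged sketch of that idea; the in-memory list and helper names are illustrative, and production would keep this in Redis or a vector store:

```python
import numpy as np

# (query embedding, cached answer) pairs; swap for Redis / a vector store in production
semantic_cache: list[tuple[np.ndarray, str]] = []

def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cached_answer(query: str, threshold: float = 0.95) -> str | None:
    """Return a cached answer if a semantically similar query was answered before."""
    emb = np.array(cached_embeddings.embed_query(query))
    for cached_emb, answer in semantic_cache:
        if _cosine(emb, cached_emb) >= threshold:
            return answer  # cache hit: skip retrieval and the LLM call entirely
    return None

def remember(query: str, answer: str) -> None:
    """Store a fresh answer for future semantic lookups."""
    semantic_cache.append((np.array(cached_embeddings.embed_query(query)), answer))
```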
5.2 Tiered Retrieval
Simple queries use cheap models, complex queries use expensive models. This is the most intuitive cost optimization strategy.
```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Simple queries use a cheap model
simple_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)
# Complex queries use an expensive model
complex_llm = ChatAnthropic(model="claude-opus-4-20250514", temperature=0.2)

# Routing logic (route once, then branch on the intent)
route = route_layer(query)
if route.name in ["finance_query", "tech_support"]:
    # Simple intents go to GPT-4o-mini
    response = simple_llm.invoke(query)
elif route.name == "graph_query":
    # Complex reasoning goes to Claude Opus
    response = complex_llm.invoke(query)
```
Cost comparison is stark:
| Model | Input Price | Use Case |
|---|---|---|
| GPT-4o-mini | $0.00015/1K tokens | Simple fact queries |
| Claude Opus 4 | $0.015/1K tokens | Complex reasoning queries |
A 100x difference. If 80% of your traffic is simple queries, total cost drops to roughly 20% of an all-Opus baseline (0.8 × 0.01 + 0.2 × 1 ≈ 0.21).
5.3 Parallel Processing
Hybrid routing (logical routing + EnsembleRetriever) adds 200-500ms latency. But good news: multiple retrievers can be called in parallel, latency only increases a bit.
```python
import asyncio

# Call the BM25 + vector retrievers from Chapter 4 concurrently (both expose ainvoke)
async def parallel_retrieval(query: str) -> list[str]:
    bm25_docs, vector_docs = await asyncio.gather(
        bm25_retriever.ainvoke(query),
        vector_retriever.ainvoke(query),
    )
    # RRF fusion over the two ranked lists (k=60, as in Chapter 4)
    scores: dict[str, float] = {}
    for ranked in (bm25_docs, vector_docs):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc.page_content] = scores.get(doc.page_content, 0.0) + 1 / (60 + rank)
    return sorted(scores, key=scores.get, reverse=True)
```
Measured results: serial calls 600ms, parallel calls 320ms. Almost offsets hybrid routing’s latency overhead.
These three strategies combined dropped my system costs from $500/week to $50/week, with faster response times. Cost optimization isn’t “cutting corners”—it’s “smart resource allocation.”
Chapter 6: Comparison and Selection Guide
After all this, you might wonder: “Which approach should I use for my project?” Let me give you a simple comparison table and decision tree.
Core Comparison of Three Approaches
| Dimension | Logical Routing | Semantic Routing | EnsembleRetriever |
|---|---|---|---|
| Core Principle | LLM analyzes intent | Semantic similarity matching | RRF algorithm fusion |
| Response Speed | ~500ms (LLM call) | ~50ms (embedding computation) | ~300ms (parallel retrieval) |
| Cost | Medium (LLM per call) | Low (free embeddings) | Low (no LLM) |
| Accuracy | High (deep understanding) | Medium (similarity threshold) | High (Lexical+Semantic) |
| Use Cases | Clear data source types (5 or fewer) | Intent classification (20 or fewer) | Same-type retriever merging |
| Tech Stack | LangChain + Structured Output | semantic-router library | LangChain EnsembleRetriever |
Selection Decision Tree
I drew a simple decision flow to help you quickly judge:
```
Do you need to route to different types of data sources?
├─ Yes → How many data sources?
│   ├─ 5 or fewer → [Logical Routing] (LLM analyzes intent)
│   ├─ 6-20 → [Semantic Routing] (predefined utterances)
│   └─ More than 20 → Multi-Agent coordinator (beyond this article's scope)
└─ No → Do you need to merge same-type retrievers?
    ├─ Yes → [EnsembleRetriever] (RRF fusion)
    └─ No → Single vector store retrieval is enough
```
My Practical Recommendations
If your project is an “enterprise knowledge base system” with financial database, technical documentation, and knowledge graph as three data sources, I’d recommend:
- First use semantic routing for intent classification (finance/tech/relationship queries)—fast and free.
- Then use logical routing for special cases (e.g., time-sensitive queries route to web search).
- Use EnsembleRetriever inside each data source (BM25 + vector retrieval) to improve recall.
- Finally layer in cost optimization (Semantic Caching, Tiered Retrieval) to save money and speed up.
I’ve validated this “three-layer routing” architecture across 3 projects with stable results. Costs around $50/week, response time < 800ms, user satisfaction above 85%.
If your project has only a single data source (e.g., just a vector store), don’t rush into routing. First use EnsembleRetriever for BM25 + vector hybrid retrieval and see if recall meets requirements. Often, a single vector store’s bottleneck is just insufficiently optimized retrieval strategy—no routing needed at all.
Summary and Actionable Recommendations
After all this writing, let me summarize the core points.
The essence of query routing: Dynamically select retrieval paths based on query characteristics (context dependency, reasoning hops, data type, timeliness). Like navigation apps choosing the optimal route based on traffic conditions instead of blindly following a fixed path.
Three approaches and their use cases:
- Logical routing for clear data source types (5 or fewer), scenarios needing deep intent understanding.
- Semantic routing for intent classification (20 or fewer), scenarios needing fast response and cost sensitivity.
- EnsembleRetriever for merging same-type retrievers (BM25 + vector), improving recall.
Production deployment cost optimization: Semantic Caching, Tiered Retrieval, Parallel Processing—three strategies combined can drop costs to 10% of original, with faster response times.
My Actionable Recommendations
If you’re building a RAG system, I recommend iterating in this order:
Step 1: Diagnose bottlenecks
Analyze your current system’s failure cases, categorize as “low context dependency” (vector retrieval sufficient) vs “high context dependency” (needs multi-hop reasoning). Don’t skip this step, or you’ll easily over-engineer.
Step 2: Choose approach
Select logical/semantic/EnsembleRetriever based on data source count, intent count, and cost budget. Don’t layer all three approaches from the start—first validate a single approach’s effectiveness.
Step 3: Layer in cost optimization
First implement Semantic Caching (simplest, best ROI), then consider Tiered Retrieval and Parallel Processing. Cost optimization isn’t a one-time thing—it’s an iterative process.
If you have specific project questions, feel free to leave comments for discussion. The pitfalls I’ve hit might just help you avoid them.
FAQ
How to choose between logical routing, semantic routing, and EnsembleRetriever?
• Logical routing: Suitable for clear data source types (5 or fewer), needs deep intent understanding, response time ~500ms
• Semantic routing: Suitable for intent classification (20 or fewer), needs fast response (~50ms), cost-sensitive
• EnsembleRetriever: Suitable for merging same-type retrievers (BM25 + vector), improving recall
In real projects, you can combine them: semantic routing for intent classification, logical routing for special cases, EnsembleRetriever for hybrid retrieval.
How to reduce LLM call costs in RAG systems?
• Semantic Caching: Cache embeddings for common queries; similarity > 0.95 returns the cached answer directly, cutting LLM calls by 30-50%
• Tiered Retrieval: Simple queries use cheap models (GPT-4o-mini), complex queries use expensive models (Claude Opus), reducing costs by 80%
• Parallel Processing: Call multiple retrievers concurrently to offset the routing latency overhead; response time drops from 600ms to 320ms
Three strategies combined can drop costs from $500/week to $50/week.
What is the RRF algorithm principle in EnsembleRetriever?
• Formula: RRF(d) = Σ 1/(k + rank(d)), typically k=60
• Advantage: Doesn't depend on document raw scores, can merge any type of retriever (BM25, vector, knowledge graph)
• Effect: Pure BM25 recall 70%, pure vector 85%, EnsembleRetriever can reach 92%
Suitable for Lexical + Semantic hybrid retrieval, but not for cross-data-source routing.
What data sources does query routing need? How to judge query characteristics?
• Context dependency: Low (fact queries) use vector retrieval, high (multi-hop reasoning) use knowledge graph
• Reasoning hops: Single-hop direct retrieval, multi-hop needs Agent coordination
• Data type: Structured use SQL, unstructured use vector retrieval
• Timeliness: Real-time information use web search, static knowledge use local knowledge base
For example, "East China region sales" is single-hop, structured, static data—SQL is fastest; "supplier strike affects stock prices" is high context dependency, multi-hop, unstructured—knowledge graph is more appropriate.
How to define utterances for semantic routing? How to set thresholds?
• 4-10 example queries per intent, covering common expressions
• Too few (<4) leads to low recall, too many (>20) increases computational overhead
• Using HuggingFaceEncoder enables free local embeddings, no API call cost
Threshold settings:
• Default similarity threshold 0.85 (85%), adjustable for your project
• Higher threshold means higher precision but lower recall
• Recommend starting at 0.85 and fine-tuning based on test data