From RAG to search agents: BEIR co-author Nandan Thakur on BrowseComp-Plus, synthetic data pipelines, GRPO economics, and why retrieval benchmarks, training cost, and harness design pull in different directions.
Enterprise RAG on financial research corpora: engineering trade-offs across vector stores, agents, and eval—ingestion throughput, retrieval granularity, entitlements, and agent latency.
Enterprise RAG and agents: what happens when vector databases meet four decades of analytics software, covering engineering tensions in regulated industries, SAS RAM, Weaviate integration, and production boundaries.
Enterprise RAG and agents: from stitched-together pipelines to an end-to-end optimizable system—RAG 2.0, active retrieval, preference learning (KTO/APO), and LMUnit-style evaluation, with evidence boundaries called out.
Enterprise AI on exabyte-scale unstructured content: permissions, layered retrieval, and agent boundaries—engineering lessons from Box × Weaviate on ACL-aware RAG, embedding economics, and production agents.
Engineering trade-offs in retrieval embeddings: how to read leaderboards, what contrastive pre-training and fine-tuning each solve, how Matryoshka representation learning lets truncated embeddings scale to billion-vector indexes, and the gap between multilingual benchmarks and proprietary data distributions. Grounded in Snowflake Arctic Embed and the Weaviate podcast.
Data agents across Snowflake, MySQL, MongoDB, and Salesforce: DAB benchmarks, DocETL, tribal knowledge, and agent-first databases, with verifiable claims separated from speaker opinion.
Compound AI: When a single LLM call is not enough—multiple model calls, retrievers, tools, and business logic as a graph; structured output, specialist pipelines, inference stacks, and deployment granularity from a Weaviate podcast with Baseten’s Philip Kiely.
AI-Powered Search: When RAG, agents, and classic IR get rewired—retrieval quality vs. agent loops, long context vs. searchable history, leaderboard embeddings vs. domain corpora, with Doug Turnbull and Trey Grainger on what ships.
Stanford’s STaRK benchmark and AvaTaR contrastive optimization for retrieval agents on semi-structured knowledge bases—metrics, multi-vector limits, when agents lose to dense retrievers, and what to ship in production.
Agentic RAG: When retrieval pipelines add LLM plan–act–observe loops, tool calling, and multi-step validation—separating verified docs from interview speculation for production teams.
Agent oversight stack: from static evaluation to trajectory-level observability. Evaluation, observability, and supervision for multi-agent systems with Percival, Lynx, and Glider; evidence boundaries called out.