LlamaIndex - Part 2 (QA and Evaluation)
Production-grade examples
QA
Use Case:
What
- Semantic search (Top-K retrieval)
- Summarization
Where
- Over documents
- Building a multi-document agent over the LlamaIndex docs
- Over structured data (e.g. JSON)
- Searching Pandas tables
- Text to SQL
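To make the Text-to-SQL case concrete, here is a minimal sketch using LlamaIndex's NLSQLTableQueryEngine; the SQLite file example.db and the city_stats table are made-up placeholders.

from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Placeholder database and table; replace with your own schema
engine = create_engine("sqlite:///example.db")
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# Translates a natural-language question into SQL, runs it, and answers from the result
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["city_stats"])
response = query_engine.query("Which city has the highest population?")
print(response)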
How
The links above all point to the Q&A patterns below.
A minimal Q&A example
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load all documents from the local "data" directory
documents = SimpleDirectoryReader("data").load_data()
# Build an in-memory vector index over the documents
index = VectorStoreIndex.from_documents(documents)
# Turn the index into a query engine and ask a question
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Routing across different data sources (Route Datasource)
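A minimal sketch of routing, assuming two data sources built from the same data directory: a vector index for semantic queries and a summary index for summarization; the tool descriptions are illustrative.

from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

documents = SimpleDirectoryReader("data").load_data()
vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)

# Each data source is wrapped as a tool; the LLM selector picks one per query based on its description
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific questions about the documents.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Useful for summarizing the documents.",
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)
print(query_engine.query("Give me a summary of the documents."))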
Compare/Contrast Queries
I don't fully understand this part yet.
Besides the explicit synthesis/routing flows described above, LlamaIndex can support more general multi-document queries as well. It does this through the SubQuestionQueryEngine class. Given a query, this query engine generates a "query plan" containing sub-queries against sub-documents before synthesizing the final answer.
This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer. This makes it especially well-suited for compare/contrast queries across documents, as well as queries pertaining to a specific document.
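A minimal sketch of a compare/contrast query with SubQuestionQueryEngine; the two document directories (data/doc_a, data/doc_b) and the tool names are made up for illustration.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One query engine per document (hypothetical directories)
doc_a_engine = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/doc_a").load_data()
).as_query_engine()
doc_b_engine = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/doc_b").load_data()
).as_query_engine()

query_engine_tools = [
    QueryEngineTool(
        query_engine=doc_a_engine,
        metadata=ToolMetadata(name="doc_a", description="Information from document A"),
    ),
    QueryEngineTool(
        query_engine=doc_b_engine,
        metadata=ToolMetadata(name="doc_b", description="Information from document B"),
    ),
]

# Generates sub-questions per tool, answers them, then synthesizes the final answer
query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
print(query_engine.query("Compare and contrast document A and document B."))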
LlamaIndex can also support iterative multi-step queries. Given a complex query, it breaks the query down into initial sub-questions, then sequentially generates follow-up sub-questions based on the returned answers until the final answer is reached.
For instance, given the question "Who was in the first batch of the accelerator program the author started?", the module will first decompose the query into a simpler initial question, "What was the accelerator program the author started?", query the index, and then ask follow-up questions.
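A minimal sketch of the multi-step pattern, assuming an index built as in the earlier Q&A example; MultiStepQueryEngine with StepDecomposeQueryTransform is used here, and the index_summary text is illustrative.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.core.query_engine import MultiStepQueryEngine
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())

# Rewrites the complex query into simpler follow-up questions, one step at a time
step_decompose = StepDecomposeQueryTransform(llm=llm, verbose=True)

query_engine = MultiStepQueryEngine(
    query_engine=index.as_query_engine(llm=llm),
    query_transform=step_decompose,
    index_summary="Used to answer questions about the author.",
)
print(query_engine.query(
    "Who was in the first batch of the accelerator program the author started?"
))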
Eval
- Evaluating responses
- Evaluating retrieval
- Evaluating responses
  - Evaluate with GPT-4
  - Evaluation dimensions
    - Generated answer vs. reference answer: Correctness and Semantic Similarity
    - Generated answer vs. retrieved contexts: Faithfulness
    - Generated answer vs. query: Answer Relevancy
    - Retrieved contexts vs. query: Context Relevancy
  - Generating reference answers
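A minimal sketch of GPT-4-based response evaluation, assuming the query_engine from the minimal Q&A example above; FaithfulnessEvaluator scores the answer against the retrieved contexts, and RelevancyEvaluator scores it against the query.

from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms.openai import OpenAI

gpt4 = OpenAI(model="gpt-4", temperature=0)
faithfulness = FaithfulnessEvaluator(llm=gpt4)
relevancy = RelevancyEvaluator(llm=gpt4)

query = "What did the author do growing up?"
response = query_engine.query(query)  # query_engine from the minimal Q&A example

# Answer vs. retrieved contexts (Faithfulness) and answer vs. query (Answer Relevancy)
print(faithfulness.evaluate_response(response=response).passing)
print(relevancy.evaluate_response(query=query, response=response).passing)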
- Evaluating retrieval
  - How to evaluate: ranking metrics such as mean reciprocal rank (MRR), hit rate, precision, and more.
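A minimal sketch of retrieval evaluation with RetrieverEvaluator, reusing the same data directory; the chunk size, top-k, and number of questions per chunk are arbitrary choices for illustration.

import asyncio
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator, generate_question_context_pairs
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

# Chunk documents into nodes so each question can be tied to expected node ids
documents = SimpleDirectoryReader("data").load_data()
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)

# Auto-generate (question, expected context) pairs with an LLM
qa_dataset = generate_question_context_pairs(
    nodes, llm=OpenAI(model="gpt-4"), num_questions_per_chunk=2
)

retriever = index.as_retriever(similarity_top_k=2)
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever
)

# Score every generated question against its expected source nodes (async API)
eval_results = asyncio.run(retriever_evaluator.aevaluate_dataset(qa_dataset))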
Usage examples
Integration with other tools
- UpTrain (1.9K): can be trialed, but you have to book a demo; it doesn't look cheap
- Tonic Validate (includes a web UI for visualizing results): has a commercial version; free to try, then $200/month
- DeepEval (1.6K)
- Ragas (4.4K)
  - Looks quite good
  - LlamaIndex --> Ragas --> LangSmith and other tools
  - However, it's rough: the quick start failed to run, always reporting
    ModuleNotFoundError: No module named 'ragas.metrics'; 'ragas' is not a package