LlamaIndex - Part 2 (QA and Evaluation)
Production-grade examples
QA
Use Case:
What
- Semantic search (Top-K retrieval)
- Summarization
Where
- Over documents
- Building a multi-document agent over the LlamaIndex docs
- Over structured data (e.g. JSON)
- Searching Pandas tables
- Text to SQL
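To make the Text-to-SQL case concrete, here is a minimal sketch using LlamaIndex's NLSQLTableQueryEngine; the SQLite file example.db and the city_stats table are made-up placeholders.

from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Placeholder database and table; replace with your own schema
engine = create_engine("sqlite:///example.db")
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# Translates a natural-language question into SQL, runs it, and answers from the result
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["city_stats"])
response = query_engine.query("Which city has the highest population?")
print(response)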
How
The links above all point to the Q&A patterns below.
A minimal Q&A example
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load all documents from the local "data" directory
documents = SimpleDirectoryReader("data").load_data()
# Build an in-memory vector index over the documents
index = VectorStoreIndex.from_documents(documents)
# Turn the index into a query engine and ask a question
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Routing across different data sources (Route Datasource)
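A minimal sketch of routing, assuming two data sources built from the same data directory: a vector index for semantic queries and a summary index for summarization; the tool descriptions are illustrative.

from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

documents = SimpleDirectoryReader("data").load_data()
vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)

# Each data source is wrapped as a tool; the LLM selector picks one per query based on its description
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific questions about the documents.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Useful for summarizing the documents.",
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)
print(query_engine.query("Give me a summary of the documents."))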
Compare/Contrast Queries
I don't fully understand this part yet.
Besides the explicit synthesis/routing flows described above, LlamaIndex can support more general multi-document queries as well. It does this through the SubQuestionQueryEngine class. Given a query, this query engine generates a "query plan" containing sub-queries against sub-documents before synthesizing the final answer.
This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer. This makes it especially well-suited for compare/contrast queries across documents, as well as queries pertaining to a specific document.
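A minimal sketch of a compare/contrast query with SubQuestionQueryEngine; the two document directories (data/doc_a, data/doc_b) and the tool names are made up for illustration.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One query engine per document (hypothetical directories)
doc_a_engine = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/doc_a").load_data()
).as_query_engine()
doc_b_engine = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/doc_b").load_data()
).as_query_engine()

query_engine_tools = [
    QueryEngineTool(
        query_engine=doc_a_engine,
        metadata=ToolMetadata(name="doc_a", description="Information from document A"),
    ),
    QueryEngineTool(
        query_engine=doc_b_engine,
        metadata=ToolMetadata(name="doc_b", description="Information from document B"),
    ),
]

# Generates sub-questions per tool, answers them, then synthesizes the final answer
query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
print(query_engine.query("Compare and contrast document A and document B."))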
LlamaIndex can also support iterative multi-step queries. Given a complex query, it breaks the query down into initial sub-questions, then sequentially generates follow-up sub-questions based on the returned answers until the final answer is reached.
For instance, given the question "Who was in the first batch of the accelerator program the author started?", the module will first decompose the query into a simpler initial question, "What was the accelerator program the author started?", query the index, and then ask follow-up questions.
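A minimal sketch of the multi-step pattern, assuming an index built as in the earlier Q&A example; MultiStepQueryEngine with StepDecomposeQueryTransform is used here, and the index_summary text is illustrative.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.core.query_engine import MultiStepQueryEngine
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())

# Rewrites the complex query into simpler follow-up questions, one step at a time
step_decompose = StepDecomposeQueryTransform(llm=llm, verbose=True)

query_engine = MultiStepQueryEngine(
    query_engine=index.as_query_engine(llm=llm),
    query_transform=step_decompose,
    index_summary="Used to answer questions about the author.",
)
print(query_engine.query(
    "Who was in the first batch of the accelerator program the author started?"
))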
Eval
- Evaluating responses
- Evaluating retrieval
- Evaluating responses
  - Evaluate with GPT-4
  - Evaluation dimensions
    - Generated answer vs. reference answer: Correctness and Semantic Similarity
    - Generated answer vs. retrieved contexts: Faithfulness
    - Generated answer vs. query: Answer Relevancy
    - Retrieved contexts vs. query: Context Relevancy
  - Generating reference answers
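A minimal sketch of GPT-4-based response evaluation, assuming the query_engine from the minimal Q&A example above; FaithfulnessEvaluator scores the answer against the retrieved contexts, and RelevancyEvaluator scores it against the query.

from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms.openai import OpenAI

gpt4 = OpenAI(model="gpt-4", temperature=0)
faithfulness = FaithfulnessEvaluator(llm=gpt4)
relevancy = RelevancyEvaluator(llm=gpt4)

query = "What did the author do growing up?"
response = query_engine.query(query)  # query_engine from the minimal Q&A example

# Answer vs. retrieved contexts (Faithfulness) and answer vs. query (Answer Relevancy)
print(faithfulness.evaluate_response(response=response).passing)
print(relevancy.evaluate_response(query=query, response=response).passing)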
- Evaluating retrieval
  - How to evaluate: ranking metrics such as mean reciprocal rank (MRR), hit rate, precision, and more.
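A minimal sketch of retrieval evaluation with RetrieverEvaluator, reusing the same data directory; the chunk size, top-k, and number of questions per chunk are arbitrary choices for illustration.

import asyncio
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator, generate_question_context_pairs
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

# Chunk documents into nodes so each question can be tied to expected node ids
documents = SimpleDirectoryReader("data").load_data()
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)

# Auto-generate (question, expected context) pairs with an LLM
qa_dataset = generate_question_context_pairs(
    nodes, llm=OpenAI(model="gpt-4"), num_questions_per_chunk=2
)

retriever = index.as_retriever(similarity_top_k=2)
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever
)

# Score every generated question against its expected source nodes (async API)
eval_results = asyncio.run(retriever_evaluator.aevaluate_dataset(qa_dataset))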
Usage examples
Integration with other tools
- UpTrain (1.9K): can be trialed, but you have to book a demo; it doesn't look cheap
- Tonic Validate (includes a web UI for visualizing results): has a commercial version; free to try, then $200/month
- DeepEval (1.6K)
- Ragas (4.4K)
  - Looks quite good
  - LlamaIndex --> Ragas --> LangSmith and other tools
  - However, it's rough: the quick start failed to run, always reporting
    ModuleNotFoundError: No module named 'ragas.metrics'; 'ragas' is not a package