
Vector Databases

Timeline

Vespa was one of the first vendors to add vector similarity search alongside the mainstream BM25-based keyword search algorithm.

Weaviate then launched a dedicated open-source vector search database at the end of 2018.

In 2019, more competition emerged in this space, including Milvus (also open source). Zilliz is the parent company of Milvus.

In 2021, three new vendors joined the competition: Vald, Qdrant, and Pinecone.

Only after that did established vendors such as Elasticsearch, Redis, and PostgreSQL begin offering vector search, in 2022 and later, which is much later than people might have expected.


Open Source vs. Commercial

Commercial: Pinecone and Zilliz

Plugin form:

  • pgvector
  • Redis Stack


Postgres

A single database can simultaneously serve as:

  • Relational database: RDS
  • Vector database: pgvector
  • Time-series database: a time-series database plays a major role in metadata filtering. It records events together with the time they occurred, and lookups by time range are very fast. In RAG applications, if the industry knowledge files are split into tens of thousands of chunks, time filtering becomes very important. For example, if we only need to retrieve contract files from March 2023, we can use the time-series data to narrow the target chunks down from tens of thousands: first filter by time, then run the vector computation on what remains.
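
The filter-then-rank idea can be sketched in plain Python. This is only an illustration under stated assumptions: the chunk records, their `date` and `embedding` fields, and the `search` helper are hypothetical, and a real deployment would push both steps into the database rather than into application code.

```python
from datetime import date
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_vec, start, end, top_k=3):
    """Filter chunks by time first, then rank the survivors by similarity."""
    # Step 1: cheap metadata filter on the half-open range [start, end).
    candidates = [c for c in chunks if start <= c["date"] < end]
    # Step 2: the vector computation runs only on the filtered candidates.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c["embedding"], query_vec),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical chunks; real embeddings would have hundreds of dimensions.
chunks = [
    {"id": "contract-a", "date": date(2023, 3, 5), "embedding": [0.9, 0.1]},
    {"id": "contract-b", "date": date(2023, 3, 20), "embedding": [0.1, 0.9]},
    {"id": "contract-c", "date": date(2022, 11, 2), "embedding": [1.0, 0.0]},
]

hits = search(chunks, [1.0, 0.0], date(2023, 3, 1), date(2023, 4, 1))
print([c["id"] for c in hits])
```

Note that `contract-c` is the closest vector to the query but is never scored, because the time filter removes it before any similarity is computed.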

Timescale Vector plugin

Faster similarity search over millions of vectors: supports the DiskANN and HNSW algorithms.

  • Optimized time-based vector search queries: Timescale Vector leverages the automatic time-based partitioning and indexing of Timescale hypertables to efficiently find the most recent embeddings, constrain vector search by time range or document age, and easily store and retrieve large language model (LLM) responses and chat history. Time-based semantic search also lets you combine retrieval-augmented generation (RAG) with time-based context retrieval to give users more useful LLM responses.
  • Simplified AI infrastructure stack: by combining vector embeddings, relational data, and time-series data in one PostgreSQL database, Timescale Vector eliminates the operational complexity of managing multiple database systems at scale.
  • Simplified metadata handling and attribute filtering: developers can use all PostgreSQL data types to store and filter metadata, and can join vector search results with relational data to get more relevant responses. In future versions, Timescale Vector will further optimize rich attribute filtering to achieve faster similarity search when filtering on metadata.
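
As a rough sketch of what a time-constrained similarity query could look like against PostgreSQL with pgvector, the helper below only builds the SQL text and parameters. The table layout (`id`, `contents`, `created_at`, `embedding`) and the function names are assumptions made for illustration, not Timescale Vector's own API; `<=>` is pgvector's cosine distance operator, and the `%s` placeholders follow the style used by Python PostgreSQL drivers such as psycopg.

```python
from datetime import datetime


def to_vector_literal(vec):
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


def build_time_filtered_query(table, query_vec, start, end, top_k=5):
    """Return (sql, params) for a similarity search limited to a time range.

    Assumes a hypothetical table with columns id, contents,
    created_at (timestamp), and embedding (pgvector `vector`).
    """
    if not table.isidentifier():
        # Table names cannot be bound as parameters, so validate them.
        raise ValueError(f"unsafe table name: {table!r}")
    vec = to_vector_literal(query_vec)
    sql = (
        "SELECT id, contents, embedding <=> %s::vector AS distance "
        f"FROM {table} "
        "WHERE created_at >= %s AND created_at < %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # Parameter order matches the placeholders from left to right.
    params = (vec, start, end, vec, top_k)
    return sql, params


sql, params = build_time_filtered_query(
    "document_chunks",  # hypothetical table name
    [0.1, 0.2, 1],
    datetime(2023, 3, 1),
    datetime(2023, 4, 1),
)
print(sql)
print(params)
```

The resulting pair could be handed to a driver's `cursor.execute(sql, params)`. Keeping the time predicate in the `WHERE` clause is what lets a time-partitioned table skip partitions outside the requested range before any distances are computed.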

Vector databases compiled by LlamaIndex


Vector Store Options & Feature Support

Vector Store | Type | Metadata Filtering | Hybrid Search | Delete | Store Documents | Async
Apache Cassandra® | self-hosted / cloud
Astra DB | cloud
Azure Cognitive Search | cloud
Azure CosmosDB MongoDB | cloud
ChatGPT Retrieval Plugin | aggregator
Chroma | self-hosted
DashVector | cloud
Deeplake | self-hosted / cloud
DocArray | aggregator
DynamoDB | cloud
Elasticsearch | self-hosted / cloud
FAISS | in-memory
txtai | in-memory
Jaguar | self-hosted / cloud
LanceDB | cloud
Lantern | self-hosted / cloud
Metal | cloud
MongoDB Atlas | self-hosted / cloud
MyScale | cloud
Milvus / Zilliz | self-hosted / cloud
Neo4jVector | self-hosted / cloud
OpenSearch | self-hosted / cloud
Pinecone | cloud
Postgres | self-hosted / cloud
pgvecto.rs | self-hosted / cloud
Qdrant | self-hosted / cloud
Redis | self-hosted / cloud
Simple | in-memory
SingleStore | self-hosted / cloud
Supabase | self-hosted / cloud
Tair | cloud
TencentVectorDB | cloud
Timescale
Typesense | self-hosted / cloud
Weaviate | self-hosted / cloud

Vector stores with the broadest feature support

Vector Store | Type | Metadata Filtering | Hybrid Search | Delete | Store Documents | Async
DashVector | cloud
Elasticsearch | self-hosted / cloud (always feels heavyweight)
Jaguar | self-hosted / cloud
Lantern | self-hosted / cloud
MyScale | cloud
Pinecone | cloud
Postgres | self-hosted / cloud
pgvecto.rs | self-hosted / cloud
Qdrant | self-hosted / cloud (the founder seems to have left)
TencentVectorDB | cloud
Weaviate | self-hosted / cloud

Elasticsearch: always feels heavyweight.

Postgres: the simplest option to start with.

Qdrant: the founder seems to have left.

LangChain's comparison of databases


Database | Use case
HNSWLib, Faiss, LanceDB, CloseVector | If you need an in-memory database that runs inside your Node.js application, with no other server required
MemoryVectorStore, CloseVector | If you're looking for something that can run in memory in a browser-like environment
HNSWLib, Faiss | If you come from Python and are looking for something similar to FAISS
Chroma | If you're looking for an open-source, full-featured vector database that can run locally in a Docker container
Zep | If you're looking for an open-source vector database that offers low-latency, local embedding of documents and supports applications on the edge
Weaviate | If you're looking for an open-source, production-ready vector database that can run locally in a Docker container or be hosted in the cloud
Supabase vector store | If you are already using Supabase, look at the Supabase vector store to keep your embeddings in the same Postgres database
Pinecone | If you're looking for a production-ready vector store that you don't have to host yourself
SingleStore vector store | If you are already using SingleStore, or you need a distributed, high-performance database, consider the SingleStore vector store
AnalyticDB vector store | If you are looking for an online MPP (Massively Parallel Processing) data warehousing service, consider the AnalyticDB vector store
MyScale | If you are looking for a cost-effective vector database that lets you run vector search with SQL
CloseVector | If you're looking for a vector database that can be loaded from both the browser and the server, take a look at CloseVector. It is designed to be cross-platform
ClickHouse | If you are looking for a scalable, open-source columnar database with excellent performance for analytical queries

Comparison of different databases

Comparison of open-source vector databases

Choices for RAG

Elasticsearch

Qdrant

Postgres

Reference

https://mp.weixin.qq.com/s/YENmch0b4rbNJ73bvBLUpQ