Index & Vector Databases
Index Database
The index database is an additional (relational) metadata database that tracks all datasets, ingested files, and the relationships between them. It enables some extra convenience features but does not store the actual data (content & embeddings). The current implementation uses SQLite by default, which is fully embedded and requires no additional setup.
You can configure it by setting a database connection string via the KNOW_INDEX_DSN environment variable.
The following options are available:
- SQLite (default):
KNOW_INDEX_DSN="sqlite:///home/me/mysqlite.db"
- Postgres:
KNOW_INDEX_DSN="postgres://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"
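Both connection strings above follow the usual URL layout: a scheme that names the backend, optional credentials and host, a path, and query parameters. As a rough illustration only, the sketch below splits the two example values into those parts using Python's standard library. The helper name describe_dsn is hypothetical and is not part of the tool; how the tool itself parses the DSN is not shown here.

```python
from urllib.parse import urlsplit, parse_qs


def describe_dsn(dsn: str) -> dict:
    """Split a DSN into URL components (hypothetical helper for illustration)."""
    parts = urlsplit(dsn)
    return {
        "scheme": parts.scheme,    # backend name, e.g. "sqlite" or "postgres"
        "user": parts.username,    # None when the DSN has no credentials
        "host": parts.hostname,    # None for file-based DSNs
        "port": parts.port,        # None when no port is given
        "path": parts.path,        # file path or "/<database name>"
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


sqlite_info = describe_dsn("sqlite:///home/me/mysqlite.db")
postgres_info = describe_dsn(
    "postgres://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"
)

print(sqlite_info["scheme"], sqlite_info["path"])
print(postgres_info["host"], postgres_info["port"], postgres_info["params"])
```

In the SQLite example, the two slashes after the scheme are followed by an absolute path, which is why the string contains three slashes in a row.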
Vector Database
The vector database is the main storage for the content and embeddings of the ingested documents along with some metadata (e.g. source file information). The current implementation uses chromem-go by default, which is fully embedded and does not require any additional setup.
You can configure it by setting a database connection string via the KNOW_VECTOR_DSN environment variable.
The following options are available:
- Chromem-Go (default):
KNOW_VECTOR_DSN="chromem:///path/to/directory"
(Note: we're using a customized fork of chromem-go, so some details may differ from the original project.)
- PGVector:
KNOW_VECTOR_DSN="pgvector://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"
- SQLite-Vec:
KNOW_VECTOR_DSN="sqlite-vec:///home/me/mysqlite.db"
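Since the scheme selects the vector backend, a wrapper script may want to check KNOW_VECTOR_DSN before launching the tool. The following is a minimal sketch under that assumption; the function vector_backend and the set VECTOR_SCHEMES are hypothetical names, and the accepted schemes are simply those listed above. It does not reflect how the tool validates the value internally.

```python
import os
from urllib.parse import urlsplit

# Schemes copied from the options listed above (assumed to be the full set).
VECTOR_SCHEMES = {"chromem", "pgvector", "sqlite-vec"}


def vector_backend(env=None):
    """Return the scheme of KNOW_VECTOR_DSN, or None if the variable is unset.

    Hypothetical pre-flight check; raises ValueError for an unlisted scheme.
    """
    env = os.environ if env is None else env
    dsn = env.get("KNOW_VECTOR_DSN")
    if not dsn:
        # Unset or empty: the embedded default described above applies.
        return None
    scheme = urlsplit(dsn).scheme
    if scheme not in VECTOR_SCHEMES:
        raise ValueError(f"unsupported vector DSN scheme: {scheme!r}")
    return scheme


if __name__ == "__main__":
    print(vector_backend({"KNOW_VECTOR_DSN": "sqlite-vec:///home/me/mysqlite.db"}))
```

Passing a dictionary instead of reading os.environ directly keeps the check easy to try out without changing the real environment.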