# Knowledge Library & ACL

> PostgreSQL-backed RAG system with agent-level access control, multi-format ingestion, and hybrid search.

<Info>
  **ArgentOS Business** -- This feature is part of ArgentOS Business. The architecture is documented here for all users, but full functionality requires a Business license. [Learn more about Business](/business)
</Info>

## Overview

The Knowledge Library is ArgentOS's Retrieval-Augmented Generation (RAG) system. It provides agents with a structured document store where files can be ingested, chunked, embedded, and searched. Access is controlled through a fine-grained ACL system that governs which agents can read, write, and own which collections.

<Warning>
  This is a **PG-only** feature -- it requires PostgreSQL and has no SQLite fallback.
</Warning>

```mermaid
flowchart LR
    A["Ingest (PDF, DOCX, XLSX, text)"] --> B["Chunk + Embed"]
    B --> C["PostgreSQL (pgvector + tsvector)"]
    C --> D["Hybrid Search (BM25 + vector)"]
    C --> E[ACL Check]
    D --> F[Agent Results]
    E --> F
```

## Collections

Collections are named document buckets that organize knowledge by scope and purpose. Example collections:

* `corporate` -- Company-wide policies and procedures
* `department-sales` -- Sales team playbooks and materials
* `department-support` -- Support knowledge base
* `agent-personal` -- An individual agent's personal reference documents

Each collection has an **owner agent**, **ACL grants** controlling access, and a **collection tag** (normalized slug).

## Document Ingestion

### Supported Formats

| Format   | MIME Types                                                                | Notes                            |
| -------- | ------------------------------------------------------------------------- | -------------------------------- |
| **PDF**  | `application/pdf`                                                         | Text extraction, max page limits |
| **DOCX** | `application/vnd.openxmlformats-officedocument.wordprocessingml.document` | Full text extraction             |
| **XLSX** | `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`       | Cell-level extraction            |
| **Text** | `text/plain`, `text/markdown`, `text/csv`, `application/json`             | Direct content                   |
| **HTML** | `text/html`                                                               | Tag stripping                    |

### Chunking

Documents are split into overlapping chunks for embedding:

| Parameter   | Default         | Description                     |
| ----------- | --------------- | ------------------------------- |
| `chunkSize` | 1800 characters | Maximum characters per chunk    |
| `overlap`   | 200 characters  | Overlap between adjacent chunks |

Each chunk receives a citation reference in the format `filename.pdf#chunk-5`, enabling precise source attribution in search results.
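
To make the chunking parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the ArgentOS implementation: `chunk_text` and `Chunk` are hypothetical names, and the zero-based chunk index in the citation is an assumption, since the documented format (`filename.pdf#chunk-5`) does not say where numbering starts.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    citation: str  # e.g. "handbook.pdf#chunk-0" (zero-based index is an assumption)
    text: str


def chunk_text(text: str, filename: str, chunk_size: int = 1800, overlap: int = 200) -> list[Chunk]:
    """Split text into chunks of at most chunk_size characters.

    Adjacent chunks share `overlap` characters. Hypothetical helper for illustration.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        piece = text[start:start + chunk_size]
        chunks.append(Chunk(citation=f"{filename}#chunk-{len(chunks)}", text=piece))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the document
        start += step
    return chunks
```

With the defaults, each window advances 1600 characters, so a 4000-character document yields three chunks and the last 200 characters of one chunk reappear at the start of the next.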

## Embedding

Chunks are embedded using the configured embedding provider:

* **OpenAI** -- `text-embedding-3-small` or `text-embedding-3-large`
* **Gemini** -- Google embedding models
* **Ollama** -- Local embedding models (free)

Embeddings are stored in pgvector columns and indexed with HNSW for fast approximate nearest-neighbor search.

## Hybrid Search

Search combines two retrieval strategies to improve both recall and precision:

<Tabs>
  <Tab title="BM25 Keyword Search">
    PostgreSQL `tsvector` with GIN indexes provides fast full-text keyword matching. This catches exact terminology, proper nouns, and technical terms that vector search might miss.
  </Tab>

  <Tab title="Vector Similarity Search">
    pgvector HNSW indexes enable semantic similarity search. This catches paraphrases, related concepts, and queries that use different terminology than the source documents.
  </Tab>
</Tabs>

Results from both strategies are merged and ranked. The hybrid approach ensures that exact keyword queries find precise matches, conceptual queries find semantically related content, and rare terms are not lost in vector space.
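
This page does not specify how the two result lists are merged. One common technique is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name, the `k = 60` constant, and the use of citation strings as chunk identifiers are all assumptions, not the documented ArgentOS behavior.

```python
def reciprocal_rank_fusion(keyword_hits: list[str], vector_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in both lists rise to the top. Illustrative only.
    """
    scores: dict[str, float] = {}
    for hits in (keyword_hits, vector_hits):
        for rank, chunk_id in enumerate(hits, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by chunk ID for a stable order.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


keyword = ["a.pdf#chunk-1", "b.pdf#chunk-3", "c.pdf#chunk-0"]
vector = ["b.pdf#chunk-3", "d.pdf#chunk-2", "a.pdf#chunk-1"]
merged = reciprocal_rank_fusion(keyword, vector)
```

Rank-based fusion avoids having to put keyword scores and vector distances on a common scale, which is why it is a frequent choice for hybrid retrieval.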

## ACL Enforcement

The ACL system controls access at the collection level. Every knowledge operation checks permissions before proceeding.

### Permission Model

| Permission  | Description                                                |
| ----------- | ---------------------------------------------------------- |
| `can_read`  | Agent can search and view documents in the collection      |
| `can_write` | Agent can ingest new documents into the collection         |
| `is_owner`  | Agent has full control: read, write, delete, manage grants |

### Grant Resolution

Grants are resolved in this order:

1. **Exact agent match**: Direct grant for the requesting agent
2. **Alias resolution**: `main` and `argent` are treated as equivalent
3. **Wildcard grant**: `agent_id='*'` grants access to all agents
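
The resolution order above can be sketched as a simple lookup. This is a hedged illustration, not the actual implementation: `Grant`, `resolve_grant`, and the helper functions are hypothetical names, and the sketch assumes the first matching grant wins rather than permissions being combined across matches. Treating `is_owner` as implying read and write follows the permission table.

```python
from dataclasses import dataclass
from typing import Optional

# `main` and `argent` are documented as equivalent agent IDs.
ALIASES = {"main": "argent", "argent": "main"}


@dataclass(frozen=True)
class Grant:
    agent_id: str
    can_read: bool = False
    can_write: bool = False
    is_owner: bool = False


def resolve_grant(agent_id: str, grants: list[Grant]) -> Optional[Grant]:
    """Return the first grant that applies: exact match, then alias, then wildcard."""
    by_agent = {grant.agent_id: grant for grant in grants}
    candidates = [agent_id]
    if agent_id in ALIASES:
        candidates.append(ALIASES[agent_id])
    candidates.append("*")
    for candidate in candidates:
        if candidate in by_agent:
            return by_agent[candidate]
    return None  # no grant at all: the caller should deny access


def may_read(agent_id: str, grants: list[Grant]) -> bool:
    grant = resolve_grant(agent_id, grants)
    return grant is not None and (grant.can_read or grant.is_owner)


def may_write(agent_id: str, grants: list[Grant]) -> bool:
    grant = resolve_grant(agent_id, grants)
    return grant is not None and (grant.can_write or grant.is_owner)
```

For example, with an owner grant for `argent` and a read-only wildcard grant, `main` resolves to the owner grant through the alias, while any other agent falls through to the wildcard and can read but not write.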

### Fail-Closed vs Fail-Open

<Warning>
  The ACL system defaults to **fail-closed** when PostgreSQL is the configured backend -- if ACL tables are unavailable or a query fails, access is denied.
</Warning>

Override with environment variables:

* `ARGENT_KNOWLEDGE_ACL_FAIL_OPEN=1` -- Allow access when ACL check fails
* `ARGENT_KNOWLEDGE_ACL_FAIL_CLOSED=1` -- Deny access when ACL check fails (explicit)
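
A minimal sketch of how these two variables might steer the failure path is shown below. Only the variable names come from this page. The helper names are hypothetical, and two behaviors are assumptions: that only the value `1` enables a flag, and that the explicit fail-closed flag wins if both are set (this page does not state the precedence).

```python
import os
from typing import Callable, Mapping, Optional


def acl_fails_open(env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a failed ACL check should allow access."""
    env = os.environ if env is None else env
    # Assumption: an explicit fail-closed flag takes precedence over fail-open.
    if env.get("ARGENT_KNOWLEDGE_ACL_FAIL_CLOSED") == "1":
        return False
    return env.get("ARGENT_KNOWLEDGE_ACL_FAIL_OPEN") == "1"


def check_access(lookup: Callable[[], bool], env: Optional[Mapping[str, str]] = None) -> bool:
    """Run an ACL lookup; if the lookup itself fails, apply the configured policy."""
    try:
        return lookup()
    except Exception:
        # ACL tables unavailable or the query failed: deny unless fail-open is set.
        return acl_fails_open(env)


def broken_lookup() -> bool:
    raise ConnectionError("ACL tables unavailable")
```

Note that the policy only applies when the check itself errors out. A lookup that completes and returns "no grant" is an ordinary denial in this sketch, regardless of the fail-open setting.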

## 4-Level Scoping

Knowledge collections can be organized at four levels, from broadest to narrowest:

| Level          | Example                                  | Typical Use                                      |
| -------------- | ---------------------------------------- | ------------------------------------------------ |
| **Global**     | `corporate`, `policies`                  | Company-wide reference, accessible to all agents |
| **Department** | `department-support`, `department-sales` | Team-specific knowledge bases                    |
| **Agent**      | `agent-argent-personal`                  | Individual agent's reference documents           |
| **Worker**     | `worker-exec-research`                   | Task-specific temporary collections              |

<Note>
  Scoping is by convention (collection naming) rather than enforced hierarchy. ACL grants control actual access.
</Note>
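
Because scoping is a naming convention, tooling that wants to group collections by level has to infer it from the name. The sketch below does this with prefixes taken from the examples in the table. The `scope_level` helper and the rule that any unprefixed name counts as global are assumptions for illustration, and the result says nothing about who can actually access a collection.

```python
# Prefixes inferred from the example collection names; not an enforced rule.
SCOPE_PREFIXES = (
    ("department-", "department"),
    ("agent-", "agent"),
    ("worker-", "worker"),
)


def scope_level(collection: str) -> str:
    """Guess a collection's scope level from its name."""
    for prefix, level in SCOPE_PREFIXES:
        if collection.startswith(prefix):
            return level
    return "global"  # assumption: unprefixed names such as `corporate` are global
```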

## Dashboard Integration

The ConfigPanel provides a full library browser:

* **Collection list** with ACL indicators (read/write/owner per collection)
* **Document browser** with search, sort, and filter
* **Ingest UI** for uploading documents to collections
* **ACL manager** for granting/revoking agent access
* **Reindex controls** for regenerating embeddings
