Data Sources

Data sources are how you organize documents in Edgebric. Think of them as folders — each data source holds a collection of related documents that Edgebric can search and answer questions from.

Types of Data Sources

Network Sources

Created by admins, shared across the organization. Everyone with access can query them.

Examples: Company policies, HR handbook, product documentation, legal contracts.

Vault Sources

Personal and private. Only you can see and query them. Stored encrypted on your device.

Examples: Personal notes, medical records, tax documents, private research.

Creating a Data Source

1. Click New Source on the Library page.
2. Enter a name and an optional description.
3. Choose the type (Network or Vault).
4. For Network sources, set access permissions:
   - All: Everyone in the organization can query this source
   - Restricted: Only specific users you add can query it
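
The two permission settings can be pictured as a simple check at query time. The sketch below is only an illustration: the `NetworkSource` class, its field names, and `can_query` are hypothetical and are not part of Edgebric's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkSource:
    """Hypothetical model of a Network source's access settings."""
    name: str
    access: str = "all"  # "all" or "restricted" (assumed values)
    allowed_users: set[str] = field(default_factory=set)


def can_query(source: NetworkSource, user: str) -> bool:
    """Return True if the user may query the source."""
    if source.access == "all":
        # All: everyone in the organization can query this source
        return True
    if source.access == "restricted":
        # Restricted: only users who were explicitly added can query it
        return user in source.allowed_users
    raise ValueError(f"unknown access mode: {source.access!r}")
```

With this model, a source created with `access="restricted"` and `allowed_users={"dana"}` answers queries from `dana` and refuses everyone else.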

Uploading Documents

Edgebric supports these file formats:

Format	Extension	Notes
PDF	`.pdf`	Text PDFs and scanned PDFs (via OCR)
Word	`.docx`	Converted to clean Markdown
Plain text	`.txt`	Direct ingestion
Markdown	`.md`	Direct ingestion
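
Edgebric identifies these formats from file content rather than from the extension alone. How it does so is not documented here; the sketch below shows one common approach, assuming signature checks on the leading bytes. The function name `sniff_format` is hypothetical.

```python
import io
import zipfile


def sniff_format(data: bytes, filename: str = "") -> str:
    """Guess 'pdf', 'docx', 'md', 'txt', or 'unknown' from file content."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        # A .docx file is a ZIP archive that contains word/document.xml
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass
        return "unknown"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    # Plain text and Markdown carry no signature, so the extension decides
    return "md" if filename.lower().endswith(".md") else "txt"
```

In this sketch a PDF that was renamed to `report.docx` is still recognized as a PDF, which is the benefit of checking content first.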

How to Upload

1. Open a data source from the Library page.
2. Drag and drop files onto the upload area, or click to browse.
3. Edgebric processes each document automatically.

What Happens During Processing

When you upload a document, Edgebric:

1. Detects the file type from the file's content, not just its extension.
2. Extracts text: PDFs are processed with Docling (layout-aware extraction that handles tables, columns, and complex formatting). Scanned PDFs fall back to OCR. Word files are converted to Markdown.
3. Splits into chunks: The text is divided at natural boundaries (headings, sections) rather than at fixed character counts. Tables are kept intact.
4. Checks for personal information: A PII detector scans for names, addresses, and other sensitive data. If any is found (in the default Warn mode), an admin must review and approve the document before it becomes searchable.
5. Creates embeddings: Each chunk is converted into a numerical representation for semantic search.
6. Indexes for keyword search: A full-text index is built alongside the vector index.
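
To make the chunking step concrete, here is a minimal sketch of heading-based splitting over Markdown text. It is not Edgebric's implementation: the function name `chunk_by_heading` is hypothetical, and the sketch assumes the extracted text uses `#`-style headings. Because it only splits at heading lines, a table inside a section always stays in one chunk.

```python
import re

# A Markdown ATX heading: one to six '#' characters followed by text
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_by_heading(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each heading line."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    # Join lines back together and drop chunks that are entirely blank
    return ["\n".join(c).strip() for c in chunks if any(l.strip() for l in c)]
```

A production splitter would also need to handle cases this sketch ignores, such as `#` lines inside fenced code blocks and sections too long for the embedding model.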

Document Status

Each document shows a status indicator:

Status	Meaning
Processing	Extraction and indexing in progress
Ready	Document is searchable
PII Review	Personal information detected — admin review needed
Rejected	Admin rejected due to PII concerns
Failed	Processing error — try re-uploading

Managing Documents

View — Click a document to see its extracted content organized by section
Download — Download the original file
Delete — Remove a document from the data source (also removes its search index)
Re-upload — Upload a newer version of a document to replace the old one

PII Detection

Edgebric automatically scans documents for personally identifiable information (PII) — names, email addresses, phone numbers, Social Security numbers, and similar data.

When PII is detected:

1. The document is paused in a PII Review state.
2. An admin sees a warning showing what was found.
3. The admin can approve (allow the document to be indexed) or reject (delete the document).

This protects against accidentally making sensitive personal data searchable across the organization.

Admins can configure PII detection behavior in Security settings:

Mode	Behavior
Warn	Flag documents with PII for review (default)
Block	Automatically reject documents with PII
Off	Skip PII detection
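
The three modes can be read as a mapping from scan result to document status. The sketch below illustrates that mapping under stated assumptions: the two regular expressions are toy patterns (a real detector covers names, addresses, and much more), the function names are hypothetical, and the choice to map Block directly to the Rejected status is an assumption rather than documented behavior.

```python
import re

# Illustrative patterns only; not Edgebric's actual detector
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the kinds of PII found in the text."""
    return [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text)]


def status_after_scan(text: str, mode: str = "warn") -> str:
    """Map a PII mode and scan result to a document status."""
    if mode not in ("warn", "block", "off"):
        raise ValueError(f"unknown PII mode: {mode!r}")
    if mode == "off" or not find_pii(text):
        return "Ready"  # nothing found, or detection skipped
    # Warn holds the document for an admin; Block rejects it outright
    return "PII Review" if mode == "warn" else "Rejected"
```

For example, a document containing `123-45-6789` would end up in PII Review under Warn, Rejected under Block, and Ready under Off.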

Document Staleness

Edgebric tracks the age of each document. A document that has not been updated within the staleness window (default: 6 months) is flagged as potentially stale, which helps you keep your knowledge base current.
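
As a rough illustration, the staleness check amounts to comparing a document's last update against a cutoff. The sketch below approximates 6 months as 183 days, which is an assumption; the exact cutoff Edgebric uses, and the name `is_stale`, are not taken from the product.

```python
from datetime import datetime, timedelta, timezone

# Roughly 6 months; an approximation chosen for this sketch
STALE_AFTER = timedelta(days=183)


def is_stale(last_updated: datetime, now: datetime,
             threshold: timedelta = STALE_AFTER) -> bool:
    """Return True if the document is older than the staleness threshold."""
    return now - last_updated > threshold


now = datetime(2025, 7, 1, tzinfo=timezone.utc)
print(is_stale(datetime(2024, 12, 1, tzinfo=timezone.utc), now))  # True (212 days old)
print(is_stale(datetime(2025, 5, 1, tzinfo=timezone.utc), now))   # False (61 days old)
```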

Cloud Sync

Instead of uploading files manually, you can sync documents from cloud storage. See Cloud Sync for setup instructions.
