Upload Document
Upload one or more documents for ingestion. The endpoint supports both single and batch uploads. For a single file, send one file in the ‘files’ field. For a batch, send multiple files in the same ‘files’ field (FastAPI parses them as a list).
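As a rough illustration of the single versus batch distinction, the sketch below assembles multipart form fields in the shape that HTTP clients such as `requests` accept. The helper name `build_upload_form` and the `tenant_id` field name are assumptions made for this example; `files`, `sub_tenant_id`, and `file_metadata` are the names used on this page.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

FileField = Tuple[str, Tuple[str, bytes]]


def build_upload_form(
    documents: List[Tuple[str, bytes]],
    tenant_id: str,
    sub_tenant_id: Optional[str] = None,
    file_metadata: Optional[List[Dict[str, Any]]] = None,
) -> Tuple[List[FileField], Dict[str, str]]:
    """Return (files, data) for a multipart upload (hypothetical helper)."""
    if not documents:
        raise ValueError("at least one document is required")
    # Single and batch uploads both repeat the same 'files' field name.
    files = [("files", (name, content)) for name, content in documents]
    data = {"tenant_id": tenant_id}  # field name assumed, not confirmed here
    if sub_tenant_id is not None:
        data["sub_tenant_id"] = sub_tenant_id
    if file_metadata is not None:
        # file_metadata is sent as a JSON array, one object per file.
        data["file_metadata"] = json.dumps(file_metadata)
    return files, data


files, data = build_upload_form(
    [("a.pdf", b"%PDF-1.4 ..."), ("b.txt", b"hello")],
    tenant_id="tenant_1234",
)
```

With `requests`, the result could then be passed as `requests.post(UPLOAD_URL, headers={"Authorization": f"Bearer {token}"}, files=files, data=data)`, where `UPLOAD_URL` is a placeholder for the endpoint URL, which this page does not state.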
Metadata Parameters
When uploading multiple files, you can provide metadata for each file using the file_metadata parameter. This allows you to associate custom metadata, organize documents, and control processing behavior on a per-file basis.
file_metadata Array
The file_metadata parameter accepts a JSON array where each object corresponds to one of the uploaded files. The order of metadata objects should match the order of files in the files parameter.
Structure:
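A minimal sketch of what such an array might look like, built in Python and serialized to JSON. The four field names come from the Metadata Fields described on this page; the keys inside `tenant_metadata` and `document_metadata` (`department`, `author`) are illustrative assumptions, since those objects accept arbitrary key-value pairs.

```python
import json

# One object per uploaded file, in the same order as the files themselves.
file_metadata = [
    {
        "file_id": "invoice_2024_001",
        "tenant_metadata": {"department": "sales"},  # illustrative key
        "document_metadata": {"author": "Alice"},    # illustrative key
        "relations": False,
    },
    {
        # Every field is optional; omitted ones fall back to their defaults.
        "file_id": "manual_v2.3",
        "relations": True,
    },
]

# The parameter is sent as a JSON array string.
payload = json.dumps(file_metadata)
```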
Metadata Fields
file_id (string, optional)
- Description: A unique identifier for the document. If not provided, the system will auto-generate one.
- Use Case: Use this to reference the document later, enable idempotent uploads, or maintain your own document naming scheme.
- Example: "doc_a", "invoice_2024_001", "manual_v2.3"
tenant_metadata (object, optional)
- Description: Key-value pairs that represent tenant-level metadata. This metadata is shared across all documents within the tenant and is useful for organization-wide filtering and categorization.
- Use Case: Store department information, project tags, organizational units, or any tenant-scoped attributes that help organize and filter documents.
- Example:
- Note: This metadata is indexed and can be used for filtering in search queries.
document_metadata (object, optional)
- Description: Key-value pairs that represent document-specific metadata. This metadata is unique to each document and provides context about the document itself.
- Use Case: Store document-specific information like author, creation date, document type, version, or any attributes that describe the individual document.
- Example:
- Note: This metadata is indexed and can be used for filtering in search queries.
relations (boolean, optional)
- Description: Controls whether the system should extract and index relationships between entities in the document. When set to true, the system will analyze the document for entity relationships and create a knowledge graph.
- Use Case: Enable relationship extraction for documents where understanding connections between entities (people, places, concepts) is important for your use case.
- Default: false
- Example:
true: Extract relationships for documents like organizational charts, knowledge bases, or interconnected documentation
false: Skip relationship extraction for simple documents or when graph features aren’t needed
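Because metadata objects are matched to files by position, a client-side check before uploading can catch ordering mistakes early. The helper below is a hypothetical sketch; requiring the two lists to have equal length is an assumption of this example, not a documented server rule.

```python
from typing import Any, Dict, List, Tuple


def pair_files_with_metadata(
    filenames: List[str], file_metadata: List[Dict[str, Any]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each file with the metadata object at the same index."""
    if len(filenames) != len(file_metadata):
        raise ValueError(
            f"{len(filenames)} files but {len(file_metadata)} metadata objects"
        )
    # zip preserves order: first object -> first file, and so on.
    return list(zip(filenames, file_metadata))


pairs = pair_files_with_metadata(
    ["a.pdf", "b.pdf"],
    [{"file_id": "doc_a"}, {"file_id": "doc_b", "relations": True}],
)
```

Printing or logging `pairs` before sending the request makes it easy to confirm that each `file_id` lines up with the intended file.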
The order of objects in the file_metadata array should match the order of files in the files parameter. The first metadata object applies to the first file, the second to the second file, and so on.
Both tenant_metadata and document_metadata are indexed and can be used to filter search results. This enables queries like “find all sales documents from Q4” or “retrieve documents authored by Alice”.
Supported file formats
Document Processing Pipeline
When you upload a document, it goes through a comprehensive processing pipeline designed to make your content searchable and retrievable:
1. Immediate Upload & Queue
- Your document is immediately accepted and stored securely
- It’s added to our processing queue for background processing
- You receive a confirmation response with a source_id for tracking
2. Processing Phase
Our system automatically handles:
- Content Extraction: Extracting text from various formats (PDF, DOCX, TXT, etc.)
- Document Parsing: Understanding document structure, headers, and formatting
- Text Cleaning: Removing formatting artifacts and normalizing content
3. Intelligent Chunking
- Documents are split into semantically meaningful chunks
- Chunk size is optimized for both context preservation and search accuracy
- Overlapping boundaries ensure no information is lost between chunks
- Metadata is preserved and associated with each chunk
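To make the idea of overlapping boundaries concrete, here is a deliberately simple sliding-window chunker. It is not the service's chunking algorithm, which is described as semantic; fixed word windows are swapped in for illustration, and the function name and parameters are assumptions of this sketch.

```python
from typing import Any, Dict, List


def chunk_document(
    text: str, metadata: Dict[str, Any], chunk_size: int = 200, overlap: int = 20
) -> List[Dict[str, Any]]:
    """Split text into word windows that share `overlap` words with the next."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        # Each chunk carries its own copy of the document metadata.
        chunks.append({"text": " ".join(window), "metadata": dict(metadata)})
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_document(
    "one two three four five six seven",
    {"file_id": "doc_a"},
    chunk_size=4,
    overlap=1,
)
```

In this run the word "four" appears at the end of the first chunk and the start of the second, which is how an overlap keeps content that straddles a boundary retrievable from either side.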
4. Embedding Generation
- Each chunk is converted into high-dimensional vector embeddings
- Embeddings capture semantic meaning and context
- Vectors are optimized for similarity search and retrieval
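Similarity search over embeddings typically compares vectors with a metric such as cosine similarity. This page does not say which metric the service uses, so the sketch below is only an assumed, generic illustration in plain Python.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, so ranking chunks by this value surfaces the ones whose embeddings are most aligned with the query embedding.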
5. Indexing & Database Updates
- Embeddings are stored in our vector database for fast similarity search
- Full-text search indexes are created for keyword-based queries
- Metadata is indexed for filtering and faceted search
- Cross-references are established for related documents
6. Quality Assurance
- Automated quality checks ensure processing accuracy
- Content validation verifies extracted text completeness
- Embedding quality is assessed for optimal retrieval performance
If you omit sub_tenant_id, the document will be uploaded to the default sub-tenant created when your tenant was set up. This is well suited to organization-wide documents that should be accessible across all departments.
File ID Management: The system uses a priority-based approach for file ID assignment:
- First Priority: If you provide a file_id in the file_metadata, that value is used
- Auto-Generation: If neither source provides a file_id, the system will automatically generate a unique identifier
Duplicate File ID Behavior
When you upload a document with a file_id that already exists in your tenant:
- Overwrite Behavior: The existing document with the same file_id will be completely replaced with the new document
- Processing: The new document will go through the full processing pipeline (content extraction, chunking, embedding generation, indexing)
- Search Results: Previous search results and embeddings from the old document will be replaced with the new document’s content
- Idempotency: Uploading the same document with the same file_id multiple times is safe and will result in the same final state
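The overwrite and idempotency behavior described above can be modeled with a toy in-memory store keyed by `file_id`. This is only a mental model written for this example, not the service's storage implementation; the class and method names are hypothetical.

```python
from typing import Dict


class ToyDocumentStore:
    """In-memory stand-in that mimics replace-on-duplicate semantics."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def upload(self, file_id: str, content: str) -> None:
        # An existing entry with the same file_id is fully replaced.
        self._docs[file_id] = content

    def get(self, file_id: str) -> str:
        return self._docs[file_id]

    def __len__(self) -> int:
        return len(self._docs)


store = ToyDocumentStore()
store.upload("doc_a", "version 1")
store.upload("doc_a", "version 2")  # replaces version 1
store.upload("doc_a", "version 2")  # repeating the upload changes nothing
```

After these three calls the store holds a single document under "doc_a" containing the latest content, which mirrors the final state the bullets above describe.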
Processing Status & Monitoring
After uploading, you can monitor your document’s processing status:
Immediate Response
Upon successful upload, you’ll receive a confirmation response that includes a source_id for tracking.
Processing States
Your document will progress through these states:
- queued: Document is in the processing queue, waiting to be processed
- in_progress: Document is actively being processed (includes content extraction, chunking, embedding generation, and indexing)
- success: Document is fully processed and searchable
- errored: Processing encountered an error (rare occurrence)
While a document is in_progress, the system performs multiple steps: content extraction, document parsing, intelligent chunking, embedding generation, and database indexing. These happen sequentially but are all part of the single in_progress state.
When Your Document is Ready
Once processing is complete, your document will be:
- ✅ Searchable via semantic search and Q&A endpoints
- ✅ Available for AI-powered applications
- ✅ Indexed for fast query performance
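One way to wait for a document to become ready is to poll its status until it reaches a terminal state. The sketch below assumes you supply a `fetch_status` callable that returns the current state string; how that status is actually retrieved is not covered on this page, so the callable, the function name, and the default interval and timeout are all assumptions.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "errored"}
KNOWN_STATES = {"queued", "in_progress"} | TERMINAL_STATES


def wait_until_processed(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the state is 'success' or 'errored', or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_status()
        if state not in KNOWN_STATES:
            raise ValueError(f"unexpected processing state: {state!r}")
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} seconds")
        sleep(interval_s)


# Demonstration with a canned sequence instead of a real status lookup.
fake_states = iter(["queued", "in_progress", "success"])
final_state = wait_until_processed(lambda: next(fake_states), sleep=lambda _: None)
```

Injecting `sleep` keeps the helper easy to test, and the 600-second default timeout is chosen to match the upper end of the 5-10 minute processing window mentioned in the troubleshooting notes.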
Best Practices
Document Preparation
- File Size: Documents up to 50MB are processed efficiently
- Content Quality: Clear, well-structured documents produce better embeddings
- Metadata: Include rich metadata for better filtering and organization
Processing Optimization
- Batch Uploads: For multiple documents, consider using our batch upload endpoint
- Metadata Consistency: Use consistent metadata schemas across your organization
- File Naming: Descriptive filenames help with document identification
Troubleshooting
Document Not Appearing in Search?
- Wait 5-10 minutes for processing to complete
- Check if the document status is errored (rare occurrence)
- Verify your search query and filters
- Large documents (100+ pages) take longer to process
- Complex formatting may require additional processing time
- High system load may temporarily slow processing
- If status shows errored, ensure your document isn’t corrupted or password-protected
- Check that the file format is supported (see Supported File Formats section above)
- Verify your API key has sufficient permissions
- For unsupported formats, you’ll receive a 400 error with the message: "Unsupported file format: [filename]. Please check our supported file formats documentation."
If the problem persists, contact support with your file_id for assistance.
Error Responses
All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
Files to be uploaded
Unique identifier for the tenant/organization
"tenant_1234"
Optional sub-tenant identifier used to organize data within a tenant. If omitted, the default sub-tenant created during tenant setup will be used.
"sub_tenant_4567"
true
JSON Array of file metadata objects
Response
Successful Response
true
List of upload results for each source.
[]
Number of sources successfully queued.
1
Number of sources that failed to upload.
1