Upload Content
Upload raw text or markdown content for ingestion. Supports both single and batch uploads via the contents list in the request.
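As a rough illustration of single versus batch uploads, the sketch below assembles a request body in Python. Only `contents` and `sub_tenant_id` are named on this page; the `tenant_id` key, the per-item `text` key, and the helper name `build_upload_body` are assumptions made for illustration, so check them against the actual request schema before use.

```python
import json
from typing import Any, Dict, List, Optional


def build_upload_body(
    texts: List[str],
    tenant_id: str,
    sub_tenant_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable body for a single or batch text upload."""
    if not texts:
        raise ValueError("at least one text item is required")
    body: Dict[str, Any] = {
        "tenant_id": tenant_id,  # assumed field name
        # "contents" is named in the description above; the per-item
        # "text" key is a hypothetical shape.
        "contents": [{"text": t} for t in texts],
    }
    # Leave sub_tenant_id out entirely so the default sub-tenant applies.
    if sub_tenant_id is not None:
        body["sub_tenant_id"] = sub_tenant_id
    return body


single = build_upload_body(["# Notes\nHello"], "tenant_1234")
batch = build_upload_body(["first", "second"], "tenant_1234", "sub_tenant_4567")
print(json.dumps(batch, indent=2))
```

A single upload is simply a `contents` list with one item, so the same helper covers both cases.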
Text Processing Pipeline
When you upload text content, it goes through a streamlined processing pipeline optimized for direct text input:
1. Immediate Upload & Queue
- Your text content is immediately accepted and stored securely
- It’s added to our processing queue for background processing
- You receive a confirmation response with a file_id for tracking
2. Text Processing Phase
Our system automatically handles:
- Content Validation: Ensuring text content is properly formatted and accessible
- Format Detection: Identifying markdown, plain text, or structured content
- Text Normalization: Cleaning and standardizing text formatting
3. Intelligent Chunking
- Text is split into semantically meaningful chunks
- Chunk size is optimized for both context preservation and search accuracy
- Overlapping boundaries ensure no information is lost between chunks
- Metadata is preserved and associated with each chunk
4. Embedding Generation
- Each chunk is converted into high-dimensional vector embeddings
- Embeddings capture semantic meaning and context
- Vectors are optimized for similarity search and retrieval
5. Indexing & Database Updates
- Embeddings are stored in our vector database for fast similarity search
- Full-text search indexes are created for keyword-based queries
- Metadata is indexed for filtering and faceted search
- Cross-references are established for related content
6. Quality Assurance
- Automated quality checks ensure processing accuracy
- Content validation verifies text completeness
- Embedding quality is assessed for optimal retrieval performance
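To make the idea of overlapping chunk boundaries in step 3 concrete, here is a minimal sketch. It uses a fixed-size word window, which is a deliberate simplification: the service's actual chunking is described as semantic and its parameters are not documented here, so `chunk_words`, `size`, and `overlap` are hypothetical names and values.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word windows where neighbours share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

Because each window repeats the tail of the previous one, a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.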
If you omit sub_tenant_id, the text content is uploaded to the default sub-tenant created when your tenant was set up. This suits organization-wide content that should be accessible across all departments.
File ID Management: The system uses a priority-based approach for file ID assignment:
- First Priority: If you provide a file_id as a direct body parameter, that specific ID will be used
- Second Priority: If no direct file_id is provided, the system checks for a file_id in the document_metadata object
- Auto-Generation: If neither source provides a file_id, the system will automatically generate a unique identifier
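The priority order above can be sketched as a small resolver. This is a client-side illustration of the documented rule, not the service's implementation: the function name `resolve_file_id` is hypothetical, and the UUID used for auto-generation is an assumption, since the format of generated IDs is not specified.

```python
import uuid
from typing import Any, Dict, Optional


def resolve_file_id(
    file_id: Optional[str] = None,
    document_metadata: Optional[Dict[str, Any]] = None,
) -> str:
    """Pick a file ID using the documented priority order."""
    # First priority: the direct body parameter.
    if file_id:
        return file_id
    # Second priority: a file_id inside document_metadata.
    if document_metadata and document_metadata.get("file_id"):
        return str(document_metadata["file_id"])
    # Fallback: generate a unique identifier (format is an assumption).
    return uuid.uuid4().hex


print(resolve_file_id("direct-id", {"file_id": "meta-id"}))  # direct-id
print(resolve_file_id(None, {"file_id": "meta-id"}))         # meta-id
```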
Duplicate File ID Behavior
When you upload text content with a file_id that already exists in your tenant:
- Overwrite Behavior: The existing text content with the same file_id will be completely replaced with the new content
- Processing: The new text content will go through the full processing pipeline (validation, chunking, embedding generation, indexing)
- Search Results: Previous search results and embeddings from the old content will be replaced with the new content
- Idempotency: Uploading the same text content with the same file_id multiple times is safe and will result in the same final state
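The overwrite and idempotency behavior can be pictured with a toy in-memory model. `InMemoryStore` is purely illustrative and stands in for the service; it only mirrors the documented outcome that a repeated file_id replaces the earlier content.

```python
from typing import Dict


class InMemoryStore:
    """Toy stand-in for the service: one stored text per file_id."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def upload(self, file_id: str, text: str) -> bool:
        """Store text under file_id; return True if it replaced older content."""
        replaced = file_id in self._docs
        self._docs[file_id] = text  # overwrite, never append
        return replaced

    def get(self, file_id: str) -> str:
        return self._docs[file_id]


store = InMemoryStore()
store.upload("doc-1", "first version")
store.upload("doc-1", "second version")   # replaces the first version
store.upload("doc-1", "second version")   # same content again: same final state
print(store.get("doc-1"))  # second version
```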
Processing Status & Monitoring
After uploading, you can monitor your text content’s processing status:
Immediate Response
Upon successful upload, you’ll receive a confirmation response with a file_id for tracking.
Processing States
Your text content will progress through these states:
- queued: Text content is in the processing queue, waiting to be processed
- in_progress: Text content is actively being processed (includes validation, chunking, embedding generation, and indexing)
- success: Text content is fully processed and searchable
- errored: Processing encountered an error (rare occurrence)
While the status is in_progress, the system performs multiple steps: content validation, format detection, intelligent chunking, embedding generation, and database indexing. These happen sequentially but are all part of the single in_progress state.
When Your Text is Ready
Once processing is complete, your text content will be:
- ✅ Searchable via semantic search and Q&A endpoints
- ✅ Retrievable through our retrieval APIs
- ✅ Available for AI-powered applications
- ✅ Indexed for fast query performance
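One way to wait for the queued, in_progress, success, and errored states described earlier is a simple polling loop. The sketch below is transport-agnostic: `get_status` is a hypothetical callable you would implement against whatever status endpoint your integration uses, and the interval and timeout values are arbitrary assumptions.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "errored"}
KNOWN_STATES = {"queued", "in_progress"} | TERMINAL_STATES


def wait_until_processed(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Poll get_status() until it reports success or errored."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in KNOWN_STATES:
            raise ValueError(f"unexpected status: {status!r}")
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("processing did not finish before the timeout")
        time.sleep(interval)


# Simulated status sequence in place of a real status call.
states = iter(["queued", "in_progress", "in_progress", "success"])
print(wait_until_processed(lambda: next(states), interval=0.0))
```

Treating both success and errored as terminal keeps the loop from spinning forever on a failed upload; the caller decides how to handle an errored result.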
Error Responses
All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
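A minimal sketch of building that header in Python follows. The helper name `auth_headers` is hypothetical, and the `Content-Type: application/json` entry is an assumption based on the JSON-style body fields below.

```python
from typing import Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Return request headers carrying the bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("token must be non-empty")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",  # assumed request content type
    }


print(auth_headers("my-token")["Authorization"])  # Bearer my-token
```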
Body
Unique identifier for the tenant/organization
"tenant_1234"
Optional sub-tenant identifier used to organize data within a tenant. If omitted, the default sub-tenant created during tenant setup will be used.
"sub_tenant_4567"
If true, update existing sources with the same source_id. Defaults to true.
true
Response
Successful Response
true
List of upload results for each source.
[]
Number of sources successfully queued.
1
Number of sources that failed to upload.
1