Text Processing Pipeline
When you upload text content, it goes through a streamlined processing pipeline optimized for direct text input:
1. Immediate Upload & Queue
- Your text content is immediately accepted and stored securely
- It’s added to our processing queue for background processing
- You receive a confirmation response with a file_id for tracking
2. Text Processing Phase
Our system automatically handles:
- Content Validation: Ensuring text content is properly formatted and accessible
- Format Detection: Identifying markdown, plain text, or structured content
- Text Normalization: Cleaning and standardizing text formatting
3. Intelligent Chunking
- Text is split into semantically meaningful chunks
- Chunk size is optimized for both context preservation and search accuracy
- Overlapping boundaries ensure no information is lost between chunks
- Metadata is preserved and associated with each chunk
4. Embedding Generation
- Each chunk is converted into high-dimensional vector embeddings
- Embeddings capture semantic meaning and context
- Vectors are optimized for similarity search and retrieval
5. Indexing & Database Updates
- Embeddings are stored in our vector database for fast similarity search
- Full-text search indexes are created for keyword-based queries
- Metadata is indexed for filtering and faceted search
- Cross-references are established for related content
6. Quality Assurance
- Automated quality checks ensure processing accuracy
- Content validation verifies text completeness
- Embedding quality is assessed for optimal retrieval performance
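To make the overlapping boundaries from step 3 concrete, here is a minimal sketch of overlap-based chunking. It is not the service's actual chunker: the pipeline above splits on semantic boundaries, while this illustration uses a simpler fixed-size word window. The function name `chunk_words` and the `chunk_size` and `overlap` defaults are assumptions chosen for the example.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word windows; consecutive chunks share `overlap` words.

    Illustrative only: a fixed-size sliding window, not semantic chunking.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Because each window starts `chunk_size - overlap` words after the previous one, a sentence that straddles a boundary appears in both neighboring chunks.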
Processing Time: Text content is typically processed and searchable within 1-3 minutes. Large text blocks (10,000+ words) may take up to 5 minutes. You can check processing status using the document ID returned in the response.
Default Sub-Tenant Behavior: If you don’t specify a sub_tenant_id, the text content will be uploaded to the default sub-tenant created when your tenant was set up. This is perfect for organization-wide content that should be accessible across all departments.
File ID Management: The system uses a priority-based approach for file ID assignment:
- First Priority: If you provide a file_id as a direct body parameter, that specific ID will be used
- Second Priority: If no direct file_id is provided, the system checks for a file_id in the document_metadata object
- Auto-Generation: If neither source provides a file_id, the system will automatically generate a unique identifier
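The priority order above can be sketched as a small client-side helper, which is handy for predicting which ID a request will end up with. This is an illustration of the documented rules, not the server's implementation: the function name `resolve_file_id` is hypothetical, and the UUID fallback is an assumption, since the format of auto-generated identifiers is not specified here.

```python
import uuid
from typing import Any, Mapping


def resolve_file_id(body: Mapping[str, Any]) -> str:
    """Mirror the documented priority order for choosing a file_id."""
    # First priority: file_id passed as a direct body parameter.
    direct = body.get("file_id")
    if direct:
        return str(direct)

    # Second priority: file_id nested inside document_metadata.
    metadata = body.get("document_metadata") or {}
    from_metadata = metadata.get("file_id")
    if from_metadata:
        return str(from_metadata)

    # Auto-generation: assumed here to be a random UUID; the real format may differ.
    return str(uuid.uuid4())
```

For example, a body that carries a file_id both at the top level and inside document_metadata resolves to the top-level value.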
Duplicate File ID Behavior
When you upload text content with a file_id that already exists in your tenant:
- Overwrite Behavior: The existing text content with the same file_id will be completely replaced with the new content
- Processing: The new text content will go through the full processing pipeline (validation, chunking, embedding generation, indexing)
- Search Results: Previous search results and embeddings from the old content will be replaced with the new content
- Idempotency: Uploading the same text content with the same file_id multiple times is safe and will result in the same final state
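The overwrite and idempotency rules above amount to upsert semantics keyed on file_id. The toy in-memory model below shows that behavior; `TenantStore` and its methods are hypothetical names used only for this sketch and do not correspond to any real client or server class.

```python
from typing import Dict


class TenantStore:
    """Toy model of a tenant's content, keyed by file_id (illustration only)."""

    def __init__(self) -> None:
        self._content: Dict[str, str] = {}

    def upload_text(self, file_id: str, text: str) -> None:
        # An existing entry with the same file_id is replaced entirely.
        self._content[file_id] = text

    def get(self, file_id: str) -> str:
        return self._content[file_id]

    def count(self) -> int:
        return len(self._content)


store = TenantStore()
store.upload_text("handbook", "version 1")
store.upload_text("handbook", "version 2")  # overwrites version 1
store.upload_text("handbook", "version 2")  # repeat upload, same final state
```

After these three calls the store holds a single entry for "handbook" containing the latest text, which is the final state the idempotency rule describes.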
Processing Status & Monitoring
After uploading, you can monitor your text content’s processing status.
Immediate Response
Upon successful upload, you’ll receive a confirmation response containing the file_id for tracking.
Processing States
Your text content will progress through these states:
- queued: Text content is in the processing queue, waiting to be processed
- in_progress: Text content is actively being processed (includes validation, chunking, embedding generation, and indexing)
- success: Text content is fully processed and searchable
- errored: Processing encountered an error (rare occurrence)
In-Progress Details: While the status shows in_progress, the system is actually performing multiple steps: content validation, format detection, intelligent chunking, embedding generation, and database indexing. These happen sequentially but are all part of the single in_progress state.
When Your Text is Ready
Once processing is complete, your text content will be:
- ✅ Searchable via semantic search and Q&A endpoints
- ✅ Retrievable through our retrieval APIs
- ✅ Available for AI-powered applications
- ✅ Indexed for fast query performance
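One way to wait for content to become ready is to poll its status until it reaches a terminal state. The sketch below assumes you supply your own `get_status` callable that returns one of the four documented state strings. How that callable fetches the status (endpoint, client method, authentication) is deliberately left out because it is not described in this section. The helper name `wait_until_processed` and the interval are illustrative, and the 300-second default timeout follows the "up to 5 minutes" figure given for large text blocks.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "errored"}
KNOWN_STATES = {"queued", "in_progress"} | TERMINAL_STATES


def wait_until_processed(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns 'success' or 'errored'.

    get_status is caller-supplied; this helper does no network I/O itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in KNOWN_STATES:
            raise ValueError(f"unexpected processing state: {status!r}")
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

A caller would typically treat a returned "success" as the signal that the content is searchable, and handle "errored" by inspecting the error and re-uploading if appropriate.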
Error Responses
All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.
Authorizations
Body
application/json