Examples
- API Request
- TypeScript
- Python (Sync)
Batch Processing Pipeline
When you upload multiple documents, each document goes through our comprehensive processing pipeline in parallel:

1. Immediate Upload & Queue
- All documents are immediately accepted and stored securely
- Each document is added to our processing queue for background processing
- You receive a confirmation response with individual `file_id`s for tracking each file
2. Parallel Processing Phase
Each document is processed independently with:
- Content Extraction: Extracting text from various supported formats (see Supported File Formats section below)
- Document Parsing: Understanding document structure, headers, and formatting
- Text Cleaning: Removing formatting artifacts and normalizing content
3. Intelligent Chunking
- Each document is split into semantically meaningful chunks
- Chunk size is optimized for both context preservation and search accuracy
- Overlapping boundaries ensure no information is lost between chunks
- Metadata is preserved and associated with each chunk
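The overlap behavior described above can be sketched in a few lines of Python. This is an illustrative sketch only; the `chunk_size` and `overlap` values are assumptions, not Cortex's actual parameters:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so no information is lost at boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so chunks share a boundary region
    return chunks
```

Because each chunk repeats the last `overlap` characters of the previous one, a sentence that straddles a chunk boundary still appears intact in at least one chunk.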
4. Embedding Generation
- Each chunk is converted into high-dimensional vector embeddings
- Embeddings capture semantic meaning and context
- Vectors are optimized for similarity search and retrieval
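Similarity search over those vectors typically relies on a distance measure such as cosine similarity. A minimal sketch (the metric Cortex uses internally is not specified here):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```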
5. Indexing & Database Updates
- Embeddings are stored in our vector database for fast similarity search
- Full-text search indexes are created for keyword-based queries
- Metadata is indexed for filtering and faceted search
- Cross-references are established between related documents
6. Quality Assurance
- Automated quality checks ensure processing accuracy for each document
- Content validation verifies extracted text completeness
- Embedding quality is assessed for optimal retrieval performance
Processing Time: Batch uploads are processed in parallel. Most documents are fully processed and searchable within 2-5 minutes. Larger documents (100+ pages) may take up to 15 minutes. You can check processing status using the individual document IDs returned in the response.
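A status-polling loop for the returned document IDs might look like the sketch below. The status-check call and the status values `"processed"` and `"errored"` are placeholders; substitute your actual status endpoint and its response fields:

```python
import time

def wait_until_processed(file_ids, get_status, timeout=900, interval=15):
    """Poll each document's status until it finishes (or the timeout expires).

    `get_status(file_id)` stands in for your status-check API call; the
    endpoint path and response shape are not specified in this document.
    Returns the set of file IDs still pending when the loop exits.
    """
    deadline = time.time() + timeout
    pending = set(file_ids)
    while pending and time.time() < deadline:
        for fid in list(pending):
            if get_status(fid) in ("processed", "errored"):
                pending.discard(fid)  # terminal state reached
        if pending:
            time.sleep(interval)
    return pending  # empty set means every document reached a terminal state
```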
Default Sub-Tenant Behavior: If you don't specify a `sub_tenant_id`, all documents will be uploaded to the default sub-tenant created when your tenant was set up. This is perfect for organization-wide document batches that should be accessible across all departments.

Recommended: For optimal performance, limit each batch to a maximum of 20 sources per request. Send multiple batch requests with an interval of 1 second between each request.
File ID Management: The system uses a priority-based approach for file ID assignment:
- First Priority: If you provide a `file_id` as a direct body parameter, that specific ID will be used
- Second Priority: If no direct `file_id` is provided, the system checks for a `file_id` in the `document_metadata` object
- Auto-Generation: If neither source provides a `file_id`, the system will automatically generate a unique identifier
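The priority order above can be expressed as a small resolution function. This is a client-side illustration of the documented rules, not the server's actual implementation:

```python
import uuid

def resolve_file_id(body_file_id=None, document_metadata=None):
    """Mirror the documented priority order for file ID assignment."""
    if body_file_id:                                            # 1. direct body parameter
        return body_file_id
    if document_metadata and document_metadata.get("file_id"):  # 2. document_metadata object
        return document_metadata["file_id"]
    return str(uuid.uuid4())                                    # 3. auto-generated identifier
```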
Duplicate File ID Behavior
When you upload documents with `file_id`s that already exist in your tenant:
- Overwrite Behavior: Each existing document with a matching `file_id` will be completely replaced with the new document
- Processing: Each new document will go through the full processing pipeline independently
- Search Results: Previous search results and embeddings from old documents will be replaced with the new documents' content
- Idempotency: Uploading the same documents with the same `file_id`s multiple times is safe and will result in the same final state
Supported File Formats
Cortex supports a comprehensive range of file formats for document processing. Files are automatically parsed and their content extracted for indexing and search.

Complete Reference: For a comprehensive list of all supported file formats with detailed information, see our Supported File Formats documentation.
Best Practices
Document Preparation
- File Size: Documents up to 50MB are processed efficiently
- Content Quality: Clear, well-structured documents produce better embeddings
- Metadata: Include rich metadata for better filtering and organization
Processing Optimization
- Batch Size: Limit each batch to a maximum of 20 sources per request
- Request Intervals: Send multiple batch requests with an interval of 1 second between each request
- Metadata Consistency: Use consistent metadata schemas across your organization
- File Naming: Descriptive filenames help with document identification
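The batch-size and request-interval recommendations above can be combined into a simple upload loop. The `upload_batch` callable is a placeholder for your actual batch upload request:

```python
import time

MAX_BATCH_SIZE = 20     # recommended maximum sources per request
REQUEST_INTERVAL = 1.0  # recommended seconds between batch requests

def send_in_batches(sources, upload_batch, interval=REQUEST_INTERVAL):
    """Split sources into batches of at most 20 and pause between requests.

    `upload_batch(batch)` stands in for your actual upload call.
    """
    responses = []
    for i in range(0, len(sources), MAX_BATCH_SIZE):
        batch = sources[i:i + MAX_BATCH_SIZE]
        responses.append(upload_batch(batch))
        if i + MAX_BATCH_SIZE < len(sources):
            time.sleep(interval)  # throttle between consecutive batch requests
    return responses
```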
Troubleshooting
Documents Not Appearing in Search?
- Wait 5-10 minutes for processing to complete
- Check if any document status is `errored` (rare occurrence)
- Verify your search query and filters
- Large documents (100+ pages) take longer to process
- Complex formatting may require additional processing time
- High system load may temporarily slow processing
- If status shows `errored`, ensure your documents aren't corrupted or password-protected
- Check that the file format is supported (see Supported File Formats section above)
- Verify your API key has sufficient permissions
- For unsupported formats, you'll receive a `400` error with the message: "Unsupported file format: [filename]. Please check our supported file formats documentation."
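Catching the unsupported-format case explicitly lets you surface the documented message to users. A minimal sketch, assuming a parsed JSON response body with a `message` field (the exact response shape may differ):

```python
def handle_upload_response(status_code: int, body: dict) -> dict:
    """Distinguish the documented unsupported-format 400 from other failures."""
    message = body.get("message", "")
    if status_code == 400 and "Unsupported file format" in message:
        # Documented error: point the caller at the supported-formats list
        raise ValueError(message)
    if status_code >= 400:
        raise RuntimeError(f"Upload failed with status {status_code}: {body}")
    return body
```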
Need Help? If documents fail to process or you're experiencing issues, contact our support team with the `file_id`s for assistance.

Error Responses
All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.

Authorizations
Body
multipart/form-data