Upload Document


Examples

API Request

curl -X POST https://api.usecortex.ai/upload-document \
  -H "Authorization: Bearer <token>" \
  -F "tenant_id=tenant_123" \
  -F "upsert=true" \
  -F "files=@a.pdf" \
  -F "files=@b.pdf" \
  -F 'file_metadata=[
{
  "file_id": "doc_a",
  "tenant_metadata": { "dept": "sales" },
  "document_metadata": { "author": "Alice" },
  "relations": false
},
{
  "file_id": "doc_b",
  "tenant_metadata": { "dept": "marketing" },
  "document_metadata": { "author": "Bob" },
  "relations": true
}
]'

TypeScript

import fs from 'fs';

// `client` is an already-initialized SDK client.
const uploadResult = await client.upload.uploadDocument({
  file: fs.readFileSync("example-file.pdf"),
  tenant_id: "tenant_1234",
  sub_tenant_id: "sub_tenant_4567",
  file_id: "doc_123456",
  tenant_metadata: {},
  document_metadata: {}
});

Python (Sync)

# `client` is an already-initialized SDK client.
# Async usage is similar, just use async_client and await
with open("example-file.pdf", 'rb') as file_obj:
    file_data = ("example-file.pdf", file_obj)
    upload_result = client.upload.upload_document(
        tenant_id="tenant_1234",
        sub_tenant_id="sub_tenant_4567",
        file=file_data,
        file_id="doc_123456",
        tenant_metadata={},
        document_metadata={}
    )

Upload documents to your tenant’s knowledge base for processing, chunking, and indexing to enable search and retrieval.

Metadata Parameters

When uploading multiple files, you can provide metadata for each file using the file_metadata parameter. This allows you to associate custom metadata, organize documents, and control processing behavior on a per-file basis.

`file_metadata` Array

The file_metadata parameter accepts a JSON array where each object corresponds to one of the uploaded files. The order of metadata objects should match the order of files in the files parameter. Structure:

[
  {
    "file_id": "string",
    "tenant_metadata": {},
    "document_metadata": {},
    "relations": boolean
  }
]
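Because the metadata array is matched to the files by position, it can help to build both from a single list so they cannot drift apart. The following is a minimal sketch under that assumption; `build_file_metadata` and the (path, metadata) pairing are hypothetical helpers for illustration, not part of the API or SDK.

```python
import json

# Keys documented for each file_metadata object on this page.
ALLOWED_KEYS = {"file_id", "tenant_metadata", "document_metadata", "relations"}


def build_file_metadata(entries):
    """Return (paths, file_metadata_json) with both in the same order.

    `entries` is a list of (path, metadata_dict) pairs. The pairing is a
    convention of this sketch only.
    """
    paths, metadata = [], []
    for path, meta in entries:
        unknown = set(meta) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"unexpected metadata keys for {path}: {sorted(unknown)}")
        paths.append(path)
        metadata.append(meta)
    # file_metadata is sent as a JSON string form field.
    return paths, json.dumps(metadata)


paths, file_metadata = build_file_metadata([
    ("a.pdf", {"file_id": "doc_a", "tenant_metadata": {"dept": "sales"}, "relations": False}),
    ("b.pdf", {"file_id": "doc_b", "document_metadata": {"author": "Bob"}, "relations": True}),
])
```

Here `paths` would supply the repeated `files` fields and `file_metadata` the single `file_metadata` field of the multipart request.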

Metadata Fields

`file_id` (string, optional)

Description: A unique identifier for the document. If not provided, the system will auto-generate one.
Use Case: Use this to reference the document later, enable idempotent uploads, or maintain your own document naming scheme.
Example: "doc_a", "invoice_2024_001", "manual_v2.3"

`tenant_metadata` (object, optional)

Description: Key-value pairs that represent tenant-level metadata. This metadata is shared across all documents within the tenant and is useful for organization-wide filtering and categorization.
Use Case: Store department information, project tags, organizational units, or any tenant-scoped attributes that help organize and filter documents.

Example:

{
  "dept": "sales",
  "project": "Q4_2024",
  "region": "us-west"
}

Note: This metadata is indexed and can be used for filtering in search queries.

`document_metadata` (object, optional)

Description: Key-value pairs that represent document-specific metadata. This metadata is unique to each document and provides context about the document itself.
Use Case: Store document-specific information like author, creation date, document type, version, or any attributes that describe the individual document.

Example:

{
  "author": "Alice",
  "created_date": "2024-01-15",
  "document_type": "invoice",
  "version": "1.0"
}

Note: This metadata is indexed and can be used for filtering in search queries.

`relations` (boolean, optional)

Description: Controls whether the system should extract and index relationships between entities in the document. When set to true, the system will analyze the document for entity relationships and create a knowledge graph.
Use Case: Enable relationship extraction for documents where understanding connections between entities (people, places, concepts) is important for your use case.
Default: false
Example:
- true: Extract relationships for documents like organizational charts, knowledge bases, or interconnected documentation
- false: Skip relationship extraction for simple documents or when graph features aren’t needed

Metadata Ordering: The order of objects in the file_metadata array should match the order of files in the files parameter. The first metadata object applies to the first file, the second to the second file, and so on.

Metadata Indexing: Both tenant_metadata and document_metadata are indexed and can be used to filter search results. This enables powerful query capabilities like “find all sales documents from Q4” or “retrieve documents authored by Alice”.

Supported file formats

Complete Reference: For a comprehensive list of all supported file formats with detailed information, see our Supported File Formats documentation.

Unsupported File Formats: If you attempt to upload a file in an unsupported format, you will receive an error response with status code 400 and the message: "Unsupported file format: [filename]. Please check our supported file formats documentation." Ensure your files are in one of the formats listed in the Supported File Formats documentation before uploading.
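A client can recognize this error and report which file was rejected. The sketch below works only from the status code and message text quoted above; how the message is nested inside the error body is not shown on this page, so the function takes the message string directly, and `unsupported_filename` is a hypothetical helper name.

```python
import re

# Matches the documented message and captures the filename portion.
UNSUPPORTED_PATTERN = re.compile(
    r"^Unsupported file format: (?P<filename>.+?)\. Please check"
)


def unsupported_filename(status_code, message):
    """Return the rejected filename, or None if this is a different error."""
    if status_code != 400:
        return None
    match = UNSUPPORTED_PATTERN.match(message)
    return match.group("filename") if match else None


rejected = unsupported_filename(
    400,
    "Unsupported file format: report.xyz. Please check our supported file formats documentation.",
)
```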

Document Processing Pipeline

When you upload a document, it goes through a comprehensive processing pipeline designed to make your content searchable and retrievable:

1. Immediate Upload & Queue

Your document is immediately accepted and stored securely
It’s added to our processing queue for background processing
You receive a confirmation response with a source_id for tracking

2. Processing Phase

Our system automatically handles:

Content Extraction: Extracting text from various formats (PDF, DOCX, TXT, etc.)
Document Parsing: Understanding document structure, headers, and formatting
Text Cleaning: Removing formatting artifacts and normalizing content

3. Intelligent Chunking

Documents are split into semantically meaningful chunks
Chunk size is optimized for both context preservation and search accuracy
Overlapping boundaries ensure no information is lost between chunks
Metadata is preserved and associated with each chunk
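To illustrate what overlapping boundaries mean, here is a deliberately simplified sketch that splits a word list into fixed-size windows sharing a few words with their neighbors. It is not the service's chunking algorithm (which is described as semantic); the window size, overlap, and function name are assumptions chosen for illustration.

```python
def chunk_with_overlap(words, size, overlap):
    """Split `words` into windows of `size` items that share `overlap` items."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(words):
            break
    return chunks


chunks = chunk_with_overlap(list("abcdefg"), size=4, overlap=1)
```

In this example the item "d" appears at the end of the first window and the start of the second, so text that straddles the boundary is present in both chunks.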

4. Embedding Generation

Each chunk is converted into high-dimensional vector embeddings
Embeddings capture semantic meaning and context
Vectors are optimized for similarity search and retrieval
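Similarity search over embeddings is commonly based on a measure such as cosine similarity. This page does not say which measure the service uses, so the sketch below is only a generic illustration of comparing two vectors, using toy two-dimensional values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention of this sketch for zero vectors
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```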

5. Indexing & Database Updates

Embeddings are stored in our vector database for fast similarity search
Full-text search indexes are created for keyword-based queries
Metadata is indexed for filtering and faceted search
Cross-references are established for related documents

6. Quality Assurance

Automated quality checks ensure processing accuracy
Content validation verifies extracted text completeness
Embedding quality is assessed for optimal retrieval performance

Processing Time: Most documents are fully processed and searchable within 1-5 minutes. Larger documents (100+ pages) may take up to 15 minutes. You can check processing status using the source_id returned in the response.

Default Sub-Tenant Behavior: If you don’t specify a sub_tenant_id, the document will be uploaded to the default sub-tenant created when your tenant was set up. This is perfect for organization-wide documents that should be accessible across all departments.

File ID Management: The system uses a priority-based approach for file ID assignment:

First Priority: If you provide a file_id in the file_metadata, that value is used

Auto-Generation: If no file_id is provided, the system will automatically generate a unique identifier
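The same priority order can be mirrored on the client when you want to know the ID before uploading. This is a sketch only: `resolve_file_id` is a hypothetical helper, and the `doc_` plus UUID fallback is an assumed format, not the one the service generates.

```python
import uuid


def resolve_file_id(metadata):
    """Prefer an explicit file_id; otherwise fall back to a generated one."""
    file_id = (metadata or {}).get("file_id")
    if file_id:
        return file_id
    # Assumed fallback format for this sketch only.
    return f"doc_{uuid.uuid4().hex}"


explicit = resolve_file_id({"file_id": "invoice_2024_001"})
generated = resolve_file_id({})
```

Supplying your own file_id, as in the first call, keeps later references and re-uploads predictable.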

Duplicate File ID Behavior

When you upload a document with a file_id that already exists in your tenant:

Overwrite Behavior: The existing document with the same file_id will be completely replaced with the new document
Processing: The new document will go through the full processing pipeline (content extraction, chunking, embedding generation, indexing)
Search Results: Previous search results and embeddings from the old document will be replaced with the new document’s content
Idempotency: Uploading the same document with the same file_id multiple times is safe and will result in the same final state

Important: When overwriting an existing document, all previous chunks, embeddings, and search indexes associated with that file_id will be permanently removed and replaced. This action cannot be undone.

Processing Status & Monitoring

After uploading, you can monitor your document’s processing status:

Immediate Response

Upon successful upload, you’ll receive:

{
  "filename": "file_abc.pdf",
  "source_id": "doc_123456",
  "status": "queued"
}

Processing States

Your document will progress through these states:

queued: Document is in the processing queue, waiting to be processed
in_progress: Document is actively being processed (includes content extraction, chunking, embedding generation, and indexing)
success: Document is fully processed and searchable
errored: Processing encountered an error (rare occurrence)

In-Progress Details: While the status shows in_progress, the system is actually performing multiple steps: content extraction, document parsing, intelligent chunking, embedding generation, and database indexing. These happen sequentially but are all part of the single in_progress state.
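The states above suggest a simple polling loop that stops at success or errored. The sketch below assumes a caller-supplied `fetch_status` function that returns one of the four state strings for a `source_id`; the status endpoint itself is not documented on this page, so that function, the 10-second interval, and the helper name are all hypothetical. The 900-second timeout reflects the 15-minute upper bound mentioned for large documents.

```python
import time

TERMINAL_STATES = {"success", "errored"}


def wait_until_processed(fetch_status, source_id, interval=10.0,
                         timeout=900.0, sleep=time.sleep):
    """Poll until the document reaches a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(source_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{source_id} still {status!r} after {timeout} seconds")
        sleep(interval)


# Usage with a stand-in status source instead of a real API call.
_states = iter(["queued", "in_progress", "in_progress", "success"])
final_status = wait_until_processed(
    lambda source_id: next(_states), "doc_123456", sleep=lambda seconds: None
)
```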

When Your Document is Ready

Once processing is complete, your document will be:

✅ Searchable via semantic search and Q&A endpoints
✅ Available for AI-powered applications
✅ Indexed for fast query performance

Important: Don’t attempt to search or retrieve your document immediately after upload. Wait for processing to complete (typically 1-5 minutes) to ensure optimal results.

Best Practices

Document Preparation

File Size: Documents up to 50MB are processed efficiently
Content Quality: Clear, well-structured documents produce better embeddings
Metadata: Include rich metadata for better filtering and organization

Processing Optimization

Batch Uploads: For multiple documents, consider using our batch upload endpoint
Metadata Consistency: Use consistent metadata schemas across your organization
File Naming: Descriptive filenames help with document identification

Troubleshooting

Document Not Appearing in Search?

Wait for processing to complete (typically 1-5 minutes, up to 15 minutes for large documents)
Check if the document status is errored (rare occurrence)
Verify your search query and filters

Slow Processing?

Large documents (100+ pages) take longer to process
Complex formatting may require additional processing time
High system load may temporarily slow processing

Processing Failures?

If status shows errored, ensure your document isn’t corrupted or password-protected
Check that the file format is supported (see Supported File Formats section above)
Verify your API key has sufficient permissions
For unsupported formats, you’ll receive a 400 error with the message: "Unsupported file format: [filename]. Please check our supported file formats documentation."

Need Help? If a document fails to process or you’re experiencing issues, contact our support team with the file_id for assistance.

Error Responses

All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.

Authorizations

`Authorization` (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body (multipart/form-data)

`files` (file[], required)

Files to be uploaded.

`tenant_id` (string, required)

Unique identifier for the tenant/organization.
Example: "tenant_1234"

`sub_tenant_id` (string, default: "")

Optional sub-tenant identifier used to organize data within a tenant. If omitted, the default sub-tenant created during tenant setup will be used.
Example: "sub_tenant_4567"

`upsert` (boolean, default: true)

Example: true

`file_metadata` (string | null)

JSON array of file metadata objects.

Response

Successful Response

`success` (boolean, default: true)

Example: true

`message` (string, default: "Upload initiated successfully")

`results` (SourceUploadResultItem object[])

List of upload results for each source.
Example: []

`success_count` (integer, default: 0)

Number of sources successfully queued.
Example: 1

`failed_count` (integer, default: 0)

Number of sources that failed to upload.
Example: 1
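Since a single request can queue some files and fail others, it is worth checking the counts rather than only the top-level `success` flag. The sketch below reads the response fields listed above from an already-decoded JSON dictionary. It assumes each item in `results` has the shape shown under Immediate Response (`filename`, `source_id`, `status`); the child attributes are not listed here, so that shape is an assumption, and `summarize_upload` is a hypothetical helper.

```python
def summarize_upload(response):
    """Condense an upload response into counts and the queued source IDs."""
    failed = response.get("failed_count", 0)
    results = response.get("results", [])
    return {
        # Treat the upload as fully OK only if nothing failed.
        "ok": bool(response.get("success", True)) and failed == 0,
        "queued": response.get("success_count", 0),
        "failed": failed,
        "source_ids": [item["source_id"] for item in results if "source_id" in item],
    }


summary = summarize_upload({
    "success": True,
    "message": "Upload initiated successfully",
    "results": [{"filename": "a.pdf", "source_id": "doc_a", "status": "queued"}],
    "success_count": 1,
    "failed_count": 1,
})
```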
