Upload App Sources

Sample

curl --request POST \
  --url 'https://api.usecortex.ai/upload/upload_app_sources?tenant_id=tenant_1234&sub_tenant_id=sub_tenant_4567' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "id": "", "title": "", "type": "", "description": "", "note": "", "url": "", "timestamp": "",
    "content": {
      "text": "<string>", "html_base64": "<string>", "csv_base64": "<string>", "markdown": "<string>", "files": [{}], "layout": [{}]
    },
    "tenant_metadata": {},
    "document_metadata": {},
    "meta": {},
    "attachments": [
      {
        "id": "",
        "url": "",
        "title": "",
        "content_type": "",
        "content_url": "",
        "misc": {},
        "content": {
          "text": "<string>",
          "html_base64": "<string>",
          "csv_base64": "<string>",
          "markdown": "<string>",
          "files": [
            {}
          ],
          "layout": [
            {}
          ]
        }
      }
    ]
  }
]'

SDK Examples

TypeScript

const result = await client.upload.uploadAppSources({
  tenant_id: "tenant_1234",
  sub_tenant_id: "sub_tenant_4567",
  body: [
    {
      id: "user-guide-1",
      title: "Feature X Guide",
      content: { text: "How to use feature X" },
      document_metadata: { source: "database" }
    },
    {
      id: "user-guide-2", 
      title: "Feature Y Guide",
      content: { text: "How to use feature Y" },
      document_metadata: { source: "api" }
    }
  ]
});

This endpoint works similarly to the upload endpoint, but it is designed to upload multiple app sources (e.g., Gmail, Slack, Notion) in a single request for processing and indexing. Each app source is handled by a specialized pipeline inside Cortex and can include various content types with rich metadata.

Supported Apps

The following apps are currently supported for app source uploads:

File Storage & Cloud Services:

drive - Google Drive
dropbox - Dropbox Business
dropboxpersonal - Dropbox Personal
onedrive - Microsoft OneDrive
sharepoint - Microsoft SharePoint

CRM & Sales:

intercom - Intercom
salesforce - Salesforce
hubspot - HubSpot

Communication & Collaboration:

msteams - Microsoft Teams
gmail - Gmail
slack - Slack
outlook - Microsoft Outlook

Project Management:

jira - Atlassian Jira
confluence - Atlassian Confluence
shortcut - Shortcut
linear - Linear
asana - Asana

Productivity & Organization:

notion - Notion
googlecalendar - Google Calendar
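
To catch typos before a request is sent, the identifiers listed above can be kept in one place and checked on the client. This is a minimal sketch: the names `SUPPORTED_APPS`, `SupportedApp`, and `isSupportedApp` are hypothetical, and this page does not state which request field carries the app identifier, so the check is shown on a bare string only.

```typescript
// Identifiers copied from the "Supported Apps" list, in the same grouping.
const SUPPORTED_APPS = [
  "drive", "dropbox", "dropboxpersonal", "onedrive", "sharepoint",
  "intercom", "salesforce", "hubspot",
  "msteams", "gmail", "slack", "outlook",
  "jira", "confluence", "shortcut", "linear", "asana",
  "notion", "googlecalendar",
] as const;

// Union of the literal identifiers above (hypothetical helper type).
type SupportedApp = (typeof SUPPORTED_APPS)[number];

// Type guard: narrows an arbitrary string to a known app identifier.
// The comparison is case-sensitive, matching the lowercase identifiers listed.
function isSupportedApp(value: string): value is SupportedApp {
  return (SUPPORTED_APPS as readonly string[]).indexOf(value) !== -1;
}
```

The list is hard-coded from this page, so it would need updating whenever the set of supported apps changes.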

App Source Processing Pipeline

When you upload app sources, each source goes through specialized processing pipelines tailored to the specific app type:

1. Immediate Upload & App Detection

All app sources are immediately accepted and stored securely
App type is automatically detected (Gmail, Slack, Notion, etc.)
Each source is routed to its specialized processing pipeline
You receive a confirmation response with individual file_ids for tracking

2. App-Specific Processing Phase

Each app source is processed using specialized pipelines:

Gmail: Email parsing, thread reconstruction, attachment handling
Slack: Message threading, channel context, user mentions
Notion: Page hierarchy, block structure, database relationships
Documents: Format-specific parsing (PDF, DOCX, etc.)
Custom Apps: Configurable parsing based on app metadata

3. Content Extraction & Normalization

Multi-format Support: Text, HTML, CSV, Markdown, and file attachments
Context Preservation: Maintaining app-specific context and relationships
Metadata Enrichment: Extracting app-specific metadata and timestamps
Content Cleaning: Normalizing content while preserving structure

4. Intelligent Chunking

App-aware chunking strategies preserve context and relationships
Thread-based chunking for Gmail and Slack conversations
Hierarchical chunking for Notion pages and databases
Metadata is preserved and associated with each chunk

5. Embedding Generation

Each chunk is converted into high-dimensional vector embeddings
Embeddings capture semantic meaning and app-specific context
Vectors are optimized for similarity search and retrieval
Cross-app relationship embeddings for related content

6. Indexing & Database Updates

Embeddings are stored in our vector database for fast similarity search
Full-text search indexes are created for keyword-based queries
App-specific metadata is indexed for filtering and faceted search
Cross-references are established between related app sources

7. Quality Assurance

App-specific quality checks ensure processing accuracy
Content validation verifies extracted text completeness
Relationship validation ensures proper context preservation
Embedding quality is assessed for optimal retrieval performance

Processing Time: App sources are processed in parallel using specialized pipelines. Most sources are fully processed and searchable within 2-5 minutes. Complex sources with multiple attachments may take up to 10 minutes. You can check processing status using the individual document IDs returned in the response.

Recommended: For optimal performance, send at most 20 app sources per request. When you have more, split them across multiple requests and wait 1 second between consecutive requests.
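
The batching recommendation can be sketched as a small helper that splits a list into groups of 20 and pauses one second between requests. The helper names (`toBatches`, `sleep`, `uploadInBatches`) and the `send` callback are hypothetical; `send` stands in for whatever performs the actual upload, such as the SDK call shown in the SDK Examples section.

```typescript
// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => { setTimeout(resolve, ms); });

// Send batches one at a time, pausing between requests (not before the first).
// `send` is a hypothetical callback supplied by the caller.
async function uploadInBatches<T, R>(
  sources: T[],
  send: (batch: T[]) => Promise<R>,
  batchSize = 20,
  pauseMs = 1000,
): Promise<R[]> {
  const results: R[] = [];
  let first = true;
  for (const batch of toBatches(sources, batchSize)) {
    if (!first) await sleep(pauseMs);
    first = false;
    results.push(await send(batch));
  }
  return results;
}

// Possible usage with the SDK call from the example above:
// await uploadInBatches(allSources, (batch) =>
//   client.upload.uploadAppSources({ tenant_id: "tenant_1234", body: batch }));
```

Requests are sent sequentially rather than in parallel so that the one-second interval is actually respected.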

File ID Management: When you provide a file_id as a key in the document_metadata object, that specific ID will be used to identify your content. If no file_id is provided in the document_metadata, the system will automatically generate a unique identifier for you. This allows you to maintain consistent references to your content across your application while ensuring every piece of content has a unique identifier.
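
As an illustration of the paragraph above, the sketch below attaches a caller-chosen `file_id` to a source's `document_metadata` without mutating the original object. The `AppSource` interface is a simplified, hypothetical subset of the request body, and `withFileId` is a hypothetical helper name.

```typescript
// Simplified, hypothetical subset of one item in the request body.
interface AppSource {
  id?: string;
  title?: string;
  content?: { text?: string };
  document_metadata?: Record<string, unknown>;
}

// Return a copy of `source` whose document_metadata carries the given file_id.
// Existing metadata keys are preserved; an existing file_id is replaced.
function withFileId(source: AppSource, fileId: string): AppSource {
  return {
    ...source,
    document_metadata: { ...source.document_metadata, file_id: fileId },
  };
}

const guide = withFileId(
  { id: "user-guide-1", title: "Feature X Guide", document_metadata: { source: "database" } },
  "user-guide-1-v1",
);
```

Omitting the `file_id` key entirely leaves identifier generation to the system, as described above.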

Duplicate File ID Behavior

When you upload app sources with file_ids that already exist in your tenant:

Overwrite Behavior: Each existing app source with a matching file_id will be completely replaced with the new source
Processing: Each new app source will go through its specialized processing pipeline independently
Search Results: Previous search results and embeddings from old app sources will be replaced with the new sources’ content
Idempotency: Uploading the same app sources with the same file_ids multiple times is safe and will result in the same final state

Important: When overwriting existing app sources, all previous chunks, embeddings, and search indexes associated with those file_ids will be permanently removed and replaced. This action cannot be undone.

Example Success Response for Duplicate File IDs in App Upload:

{
  "message": "App sources uploaded successfully. Sources with existing file_ids have been overwritten.",
  "document_ids": ["gmail_123456", "slack_789012", "notion_345678", "drive_901234"],
  "overwritten_file_ids": ["gmail_123456", "slack_789012"],
  "status": "success"
}
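
A client may want to log which sources replaced existing content. The sketch below separates newly created IDs from overwritten ones, assuming the response has the shape of the example above; the interface and function names are hypothetical, and the actual response may contain different or additional fields.

```typescript
// Mirrors the example response above; treated as an assumption, not a schema.
interface DuplicateAwareResponse {
  message: string;
  document_ids: string[];
  overwritten_file_ids?: string[];
  status: string;
}

// Partition document_ids into those that were new and those that were overwritten.
function splitByOverwrite(
  res: DuplicateAwareResponse,
): { created: string[]; overwritten: string[] } {
  const overwrittenSet = new Set(res.overwritten_file_ids ?? []);
  return {
    created: res.document_ids.filter((id) => !overwrittenSet.has(id)),
    overwritten: res.document_ids.filter((id) => overwrittenSet.has(id)),
  };
}
```

Because overwrites are permanent, surfacing the `overwritten` list in logs gives a record of which content was replaced.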

Attachments Field Structure

The attachments field allows you to include additional files, documents, or content alongside your main app source. Each attachment supports multiple content formats and can contain nested structures for complex documents.

Attachment Object Structure

{
  "attachments": [
    {
      "id": "unique_attachment_id",
      "url": "https://example.com/document.pdf",
      "title": "Document Title",
      "content_type": "application/pdf",
      "content_url": "https://api.example.com/content/123",
      "misc": {
        "custom_field": "value"
      },
      "content": {
        "text": "Plain text content",
        "html_base64": "base64_encoded_html",
        "csv_base64": "base64_encoded_csv",
        "markdown": "# Markdown content",
        "files": [{"name": "file.pdf", "data": "base64_data"}],
        "layout": [{"type": "section", "content": "..."}]
      }
    }
  ]
}

When to Use Each Field

Core Identification Fields:

id (optional): Unique identifier for the attachment. If not provided, the system generates one automatically.
title (optional): Human-readable name for the attachment.
url (optional): External URL where the attachment can be accessed.
content_type (optional): MIME type of the attachment (e.g., “application/pdf”, “text/plain”).
content_url (optional): API endpoint URL for retrieving attachment content.

Content Storage Fields:

Use these fields to store different types of content directly in the attachment:

content.text: Use for plain text content. Best for simple text documents, notes, or extracted text from other formats.
content.html_base64: Use for HTML content encoded in base64. Ideal for web pages, rich text documents, or formatted content that needs to preserve HTML structure.
content.csv_base64: Use for CSV data encoded in base64. Perfect for tabular data, spreadsheets, or structured data exports.
content.markdown: Use for Markdown-formatted content. Great for documentation, README files, or any content that uses Markdown syntax.
content.files: Use for binary file attachments as an array of file objects. Each file object should contain at least a name and data field (base64 encoded).
content.layout: Use for structured document layouts as an array of layout objects. Useful for complex documents with sections, headers, or custom formatting.

Metadata Field:

misc (optional): Dictionary for storing custom metadata, additional properties, or app-specific information about the attachment.
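
The field descriptions above can be captured as TypeScript types so that malformed attachments are caught at compile time. This is a sketch derived only from the fields listed on this page: the interface names and the `populatedContentFields` helper are hypothetical, and the shape of `layout` entries is left open because it is not specified here.

```typescript
// One entry in content.files; `data` is base64-encoded.
interface AttachmentFile {
  name: string;
  data: string;
}

interface AttachmentContent {
  text?: string;
  html_base64?: string;
  csv_base64?: string;
  markdown?: string;
  files?: AttachmentFile[];
  layout?: Record<string, unknown>[]; // entry shape not specified on this page
}

interface Attachment {
  id?: string;
  url?: string;
  title?: string;
  content_type?: string;
  content_url?: string;
  misc?: Record<string, unknown>;
  content?: AttachmentContent;
}

const CONTENT_KEYS = [
  "text", "html_base64", "csv_base64", "markdown", "files", "layout",
] as const;

// List which content fields of an attachment actually hold data
// (non-empty strings or non-empty arrays).
function populatedContentFields(attachment: Attachment): string[] {
  const content = attachment.content;
  if (!content) return [];
  return CONTENT_KEYS.filter((key) => {
    const value = content[key];
    if (Array.isArray(value)) return value.length > 0;
    return typeof value === "string" && value !== "";
  });
}
```

A helper like this makes it easy to warn when an attachment is sent with no content at all.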

Content Format Guidelines

For Text Content:

{
  "content": {
    "text": "This is plain text content that will be processed and indexed."
  }
}

For HTML Content:

{
  "content": {
    "html_base64": "PGgxPkhlbGxvIFdvcmxkPC9oMT4="
  }
}

For CSV Data:

{
  "content": {
    "csv_base64": "TmFtZSxBbW91bnQKSm9obiwxMDAKSmFuZSwyMDA="
  }
}

For Markdown:

{
  "content": {
    "markdown": "# Document Title\n\nThis is **markdown** content with formatting."
  }
}

For File Attachments:

{
  "content": {
    "files": [
      {
        "name": "document.pdf",
        "data": "JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PAovVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9NZWRpYUJveCBbMCAwIDU5NSA4NDJdCi9SZXNvdXJjZXMgPDwKL0ZvbnQgPDwKL0YxIDIgMCBSCj4+Cj4+Ci9Db250ZW50cyA0IDAgUgo+PgplbmRvYmoK..."
      }
    ]
  }
}
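
The `html_base64` and `csv_base64` values above are the base64 encodings of ordinary strings. A minimal sketch of producing them is shown below; `toBase64` is a hypothetical helper, and it relies on the `TextEncoder` and `btoa` globals available in current browsers and recent Node.js versions.

```typescript
// Encode a string as UTF-8 bytes, then as base64.
// btoa only accepts characters in the 0-255 range, so the string is first
// converted to bytes and each byte is mapped to a single character.
function toBase64(input: string): string {
  const bytes = new TextEncoder().encode(input);
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Content objects matching the HTML and CSV examples above.
const htmlContent = { html_base64: toBase64("<h1>Hello World</h1>") };
const csvContent = { csv_base64: toBase64("Name,Amount\nJohn,100\nJane,200") };
```

Encoding through UTF-8 bytes first keeps non-ASCII characters intact, which a direct `btoa(input)` call would reject.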

Best Practices

Choose the Right Format: Use the content field that best matches your data type for optimal processing.
Base64 Encoding: Always base64-encode the values of html_base64, csv_base64, and the data field of each entry in files.
File Size Limits: Keep individual attachments under 10 MB for optimal processing performance.
Metadata Usage: Use the misc field to store app-specific metadata that might be useful for filtering or organization.
Content Type Specification: Always specify content_type when possible to help with proper content processing.
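
The size guideline can be checked on the client before uploading. The sketch below computes the decoded size of a base64 string from its length and padding. It assumes standard padded base64 with no line breaks, and it interprets 10 MB as 10,000,000 bytes, which is an assumption because this page does not define the unit precisely. All names are hypothetical.

```typescript
// Assumed interpretation of the 10 MB guideline (decimal megabytes).
const MAX_ATTACHMENT_BYTES = 10 * 1000 * 1000;

// Decoded byte length of a padded base64 string without whitespace:
// every 4 characters carry 3 bytes, minus one byte per '=' padding character.
function decodedByteLength(base64: string): number {
  if (base64.length % 4 !== 0) {
    throw new RangeError("expected padded base64 (length divisible by 4)");
  }
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

// True when the decoded payload stays under the assumed size guideline.
function isWithinSizeGuideline(base64: string): boolean {
  return decodedByteLength(base64) < MAX_ATTACHMENT_BYTES;
}
```

Working from the string length avoids decoding large payloads just to measure them.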

Error Responses

All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.

Authorizations

Authorization (string, header, required)

Query Parameters

tenant_id (string, required)
sub_tenant_id (string, default: "")

Body

application/json · SourceModel · object[]

id (string, default: "")
title (string, default: "")
type (string, default: "")
description (string, default: "")
note (string, default: "")
url (string, default: "")
timestamp (string, default: "")
content (object)
tenant_metadata (object)
document_metadata (object)

Response

uploaded (FileUploadResult · object[], required)
message (string, required)
success (boolean, default: true)
