Supported Apps
The following apps are currently supported for app source uploads:

File Storage & Cloud Services:
- drive: Google Drive
- dropbox: Dropbox Business
- dropboxpersonal: Dropbox Personal
- onedrive: Microsoft OneDrive
- sharepoint: Microsoft SharePoint

- intercom: Intercom
- salesforce: Salesforce
- hubspot: HubSpot

- msteams: Microsoft Teams
- gmail: Gmail
- slack: Slack
- outlook: Microsoft Outlook

- jira: Atlassian Jira
- confluence: Atlassian Confluence
- shortcut: Shortcut
- linear: Linear
- asana: Asana

- notion: Notion
- googlecalendar: Google Calendar
App Source Processing Pipeline
When you upload app sources, each source goes through a specialized processing pipeline tailored to its app type:
1. Immediate Upload & App Detection
- All app sources are immediately accepted and stored securely
- App type is automatically detected (Gmail, Slack, Notion, etc.)
- Each source is routed to its specialized processing pipeline
- You receive a confirmation response with individual file_ids for tracking
2. App-Specific Processing Phase
Each app source is processed using specialized pipelines:
- Gmail: Email parsing, thread reconstruction, attachment handling
- Slack: Message threading, channel context, user mentions
- Notion: Page hierarchy, block structure, database relationships
- Documents: Format-specific parsing (PDF, DOCX, etc.)
- Custom Apps: Configurable parsing based on app metadata
3. Content Extraction & Normalization
- Multi-format Support: Text, HTML, CSV, Markdown, and file attachments
- Context Preservation: Maintaining app-specific context and relationships
- Metadata Enrichment: Extracting app-specific metadata and timestamps
- Content Cleaning: Normalizing content while preserving structure
4. Intelligent Chunking
- App-aware chunking strategies preserve context and relationships
- Thread-based chunking for Gmail and Slack conversations
- Hierarchical chunking for Notion pages and databases
- Metadata is preserved and associated with each chunk
5. Embedding Generation
- Each chunk is converted into high-dimensional vector embeddings
- Embeddings capture semantic meaning and app-specific context
- Vectors are optimized for similarity search and retrieval
- Cross-app relationship embeddings for related content
6. Indexing & Database Updates
- Embeddings are stored in our vector database for fast similarity search
- Full-text search indexes are created for keyword-based queries
- App-specific metadata is indexed for filtering and faceted search
- Cross-references are established between related app sources
7. Quality Assurance
- App-specific quality checks ensure processing accuracy
- Content validation verifies extracted text completeness
- Relationship validation ensures proper context preservation
- Embedding quality is assessed for optimal retrieval performance
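To make the thread-based chunking in step 4 more concrete, here is a toy sketch. It is an illustration only and not the service's implementation: the message fields (`thread_id`, `ts`, `user`, `text`) and the chunk layout are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List


def chunk_by_thread(messages: List[dict]) -> List[dict]:
    """Group messages into one chunk per thread, keeping metadata with each chunk.

    Illustrative only: field names are hypothetical, not the service's schema.
    """
    threads: Dict[str, List[dict]] = defaultdict(list)
    for message in messages:
        threads[message["thread_id"]].append(message)

    chunks = []
    for thread_id, items in threads.items():
        items.sort(key=lambda m: m["ts"])  # chronological order within the thread
        chunks.append({
            "text": "\n".join(f'{m["user"]}: {m["text"]}' for m in items),
            "metadata": {
                "thread_id": thread_id,
                "message_count": len(items),
                "first_ts": items[0]["ts"],
                "last_ts": items[-1]["ts"],
            },
        })
    return chunks
```

Keeping a whole thread in one chunk means a reply is never separated from the message it answers, which is the context-preservation goal the pipeline description names.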
Processing Time: App sources are processed in parallel using specialized pipelines. Most sources are fully processed and searchable within 2-5 minutes. Complex sources with multiple attachments may take up to 10 minutes. You can check processing status using the individual file_ids returned in the response.
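Because processing can take several minutes, a client will usually poll rather than block. The sketch below shows one way to do that. The `get_status` callable and the `"completed"` status string are hypothetical placeholders for whatever status lookup your client exposes; only the 10-minute upper bound comes from the text above.

```python
import time
from typing import Callable, Iterable, Set


def wait_until_processed(
    file_ids: Iterable[str],
    get_status: Callable[[str], str],   # hypothetical: returns a status string per ID
    timeout_s: float = 600.0,           # 10 minutes, the documented worst case
    poll_interval_s: float = 15.0,      # assumed polling cadence
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Set[str]:
    """Poll until every ID reports "completed" or the timeout passes.

    Returns the set of IDs that were still pending when polling stopped.
    """
    pending = set(file_ids)
    deadline = clock() + timeout_s
    while pending:
        pending = {fid for fid in pending if get_status(fid) != "completed"}
        if not pending or clock() >= deadline:
            break
        sleep(poll_interval_s)
    return pending
```

An empty return value means everything finished; any remaining IDs can be retried or reported.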
Recommended: For optimal performance, limit each batch to a maximum of 20 app sources per request. Send multiple batch requests with an interval of 1 second between each request.
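The batching recommendation can be sketched as follows. The `send_batch` callable is a hypothetical wrapper around your actual upload call; the batch size of 20 and the 1-second interval are the values recommended above.

```python
import time
from typing import Callable, Iterator, List, Sequence

MAX_BATCH_SIZE = 20      # recommended maximum sources per request
BATCH_INTERVAL_S = 1.0   # recommended pause between requests


def batched(sources: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` sources."""
    for start in range(0, len(sources), size):
        yield list(sources[start:start + size])


def upload_in_batches(
    sources: Sequence[dict],
    send_batch: Callable[[List[dict]], object],  # hypothetical: wraps your upload call
    interval_s: float = BATCH_INTERVAL_S,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Send sources in batches, pausing between consecutive requests."""
    responses = []
    for index, batch in enumerate(batched(sources)):
        if index > 0:
            sleep(interval_s)  # wait before every request except the first
        responses.append(send_batch(batch))
    return responses
```

With 45 sources this produces three requests of 20, 20, and 5 sources, with a pause before the second and third.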
File ID Management: When you provide a file_id as a key in the document_metadata object, that specific ID will be used to identify your content. If no file_id is provided in the document_metadata, the system will automatically generate a unique identifier for you. This allows you to maintain consistent references to your content across your application while ensuring every piece of content has a unique identifier.
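One way to keep references consistent is to derive the file_id from data you already have. The sketch below assumes each source is a dictionary with a `document_metadata` key; the helper names and the idea of hashing the app name with the item's ID in that app are illustrative choices, not part of the API.

```python
import uuid
from typing import Optional


def stable_file_id(app: str, external_id: str) -> str:
    """Derive a deterministic ID from the app name and the item's ID in that app."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{app}:{external_id}"))


def set_file_id(source: dict, file_id: Optional[str] = None) -> dict:
    """Return a copy of `source` whose document_metadata carries `file_id`.

    If `file_id` is None the metadata is left without one, so the service
    generates an identifier itself.
    """
    metadata = dict(source.get("document_metadata") or {})
    if file_id is not None:
        metadata["file_id"] = file_id
    return {**source, "document_metadata": metadata}
```

A deterministic ID means re-uploading the same item always targets the same file_id, so your application never has to store the mapping separately.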
Duplicate File ID Behavior
When you upload app sources with file_ids that already exist in your tenant:
- Overwrite Behavior: Each existing app source with a matching file_id will be completely replaced with the new source
- Processing: Each new app source will go through its specialized processing pipeline independently
- Search Results: Previous search results and embeddings from old app sources will be replaced with the new sources' content
- Idempotency: Uploading the same app sources with the same file_ids multiple times is safe and will result in the same final state
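The overwrite and idempotency rules can be modeled with a small in-memory dictionary. This is a mental model of the documented behavior, not the service's storage layer, and it assumes the file_id sits under `document_metadata` in each source.

```python
from typing import Dict, Iterable


def apply_upload(store: Dict[str, dict], sources: Iterable[dict]) -> Dict[str, dict]:
    """Model the documented rule: a matching file_id replaces the old source."""
    updated = dict(store)
    for source in sources:
        file_id = source["document_metadata"]["file_id"]
        updated[file_id] = source  # full replacement, not a field-by-field merge
    return updated
```

Applying the same upload twice leaves the store unchanged after the first call, which is what makes retries safe.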
Attachments Field Structure
The attachments field allows you to include additional files, documents, or content alongside your main app source. Each attachment supports multiple content formats and can contain nested structures for complex documents.
Attachment Object Structure
When to Use Each Field
Core Identification Fields:
- id (optional): Unique identifier for the attachment. If not provided, the system generates one automatically.
- title (optional): Human-readable name for the attachment.
- url (optional): External URL where the attachment can be accessed.
- content_type (optional): MIME type of the attachment (e.g., "application/pdf", "text/plain").
- content_url (optional): API endpoint URL for retrieving attachment content.

- content.text: Use for plain text content. Best for simple text documents, notes, or extracted text from other formats.
- content.html_base64: Use for HTML content encoded in base64. Ideal for web pages, rich text documents, or formatted content that needs to preserve HTML structure.
- content.csv_base64: Use for CSV data encoded in base64. Suited to tabular data, spreadsheets, or structured data exports.
- content.markdown: Use for Markdown-formatted content. Suited to documentation, README files, or any content that uses Markdown syntax.
- content.files: Use for binary file attachments as an array of file objects. Each file object should contain at least a name and a data field (base64 encoded).
- content.layout: Use for structured document layouts as an array of layout objects. Useful for complex documents with sections, headers, or custom formatting.

- misc (optional): Dictionary for storing custom metadata, additional properties, or app-specific information about the attachment.
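As a rough sketch of how these fields fit together, the helper below assembles one attachment with HTML content and an optional binary file. The field names come from the descriptions above; the nesting of the `content.*` fields under a `content` object, the helper names, and the sample values are assumptions.

```python
import base64
from typing import Optional


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string and return it as ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_attachment(title: str, html: str, pdf_bytes: Optional[bytes] = None) -> dict:
    """Assemble an attachment dict (hypothetical helper, assumed nesting)."""
    attachment = {
        "id": "att-001",                 # optional; generated by the system if omitted
        "title": title,
        "content_type": "text/html",
        "content": {"html_base64": b64(html)},
        "misc": {"origin": "example"},   # free-form custom metadata
    }
    if pdf_bytes is not None:
        # Each file object carries at least a name and base64-encoded data.
        attachment["content"]["files"] = [
            {"name": "report.pdf", "data": base64.b64encode(pdf_bytes).decode("ascii")}
        ]
    return attachment
```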
Content Format Guidelines
For Text Content:
Best Practices
- Choose the Right Format: Use the content field that best matches your data type for optimal processing.
- Base64 Encoding: Always encode binary data (HTML, CSV, files) in base64 format.
- File Size Limits: Keep individual attachments under 10MB for optimal processing performance.
- Metadata Usage: Use the misc field to store app-specific metadata that might be useful for filtering or organization.
- Content Type Specification: Always specify content_type when possible to help with proper content processing.
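The encoding and size guidelines in the list above can be checked on the client before sending. This sketch serializes rows to CSV, enforces the 10MB guideline, and base64-encodes the result. It assumes the limit applies to the raw bytes before encoding, which the text does not specify; the function name is hypothetical.

```python
import base64
import csv
import io
from typing import Iterable, Sequence

MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024  # 10MB guideline; assumed to mean raw size


def csv_to_base64(
    rows: Iterable[Sequence[object]],
    max_bytes: int = MAX_ATTACHMENT_BYTES,
) -> str:
    """Serialize rows as CSV and return base64 text for content.csv_base64."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    raw = buffer.getvalue().encode("utf-8")
    if len(raw) > max_bytes:
        raise ValueError(f"attachment is {len(raw)} bytes, over the {max_bytes}-byte limit")
    return base64.b64encode(raw).decode("ascii")
```

Failing early on oversized content avoids sending a request that is likely to process slowly.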
Error Responses
All endpoints return consistent error responses following the standard format. For detailed error information, see our Error Responses documentation.
Authorizations
Body
application/json · SourceModel · object[]