Working with Files

Clarity agents can analyze, interpret, and search uploaded files, and can also generate new, downloadable files. This guide outlines best practices for preparing files for upload and writing clear instructions to achieve the best results.

Uploading Files

Files can be uploaded in Clarity either during a conversation or to a Custom Agent’s knowledge source. The following restrictions apply in both cases:

  • Total files: No limit on the total number of files uploaded to a conversation or Custom Agent
  • Per prompt: Up to 10 files can be uploaded at once
  • Individual file size limit: 512MB per file
  • Extracted text content limit: 10,000,000 characters per file
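
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of Clarity: the function names are hypothetical, and it assumes that "512MB" means 512 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Limits copied from the list above.
MAX_FILES_PER_PROMPT = 10
MAX_FILE_BYTES = 512 * 1024 * 1024  # assumption: "512MB" read as 512 * 1024 * 1024 bytes
MAX_EXTRACTED_CHARS = 10_000_000


def sizes_from_paths(paths):
    """Map each path (as a string) to its size on disk in bytes."""
    return {str(p): Path(p).stat().st_size for p in paths}


def check_upload_batch(sizes_by_name):
    """Return a list of problems for one prompt's batch; an empty list means it looks fine."""
    problems = []
    if len(sizes_by_name) > MAX_FILES_PER_PROMPT:
        problems.append(
            f"{len(sizes_by_name)} files selected; at most "
            f"{MAX_FILES_PER_PROMPT} can be uploaded per prompt"
        )
    for name, size in sizes_by_name.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the per-file size limit")
    return problems


def exceeds_text_limit(text):
    """Check already-extracted text against the per-file character limit."""
    return len(text) > MAX_EXTRACTED_CHARS
```

For files on disk, call check_upload_batch(sizes_from_paths(paths)) and split the batch or the files if any problems are reported.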

Analysis and Interpretation of Files

Clarity supports a wide range of file types for analysis and interpretation.

Clarity can search within the following file types:

  • Documents: .doc, .docx, .pdf, .pptx, .txt, .md, .tex
  • Data and Spreadsheets: .json
  • Code: .c, .cpp, .cs, .css, .html, .java, .js, .php, .py, .rb, .sh, .ts

Clarity can use code to interpret the following file types:

  • Documents: .doc, .docx, .pdf, .pptx, .txt, .md, .tex
  • Data and Spreadsheets: .csv, .xlsx, .json, .xml
  • Code: .c, .cpp, .cs, .css, .html, .java, .js, .php, .py, .rb, .sh, .ts
  • Images: .gif, .jpeg, .jpg, .png
  • Archives: .tar, .zip

For best results, specify the file type in the prompt, for example: “Please use code to summarize the uploaded .xlsx file.”
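
The two lists above differ, so it can help to check which mode accepts a file before writing the prompt. The following is a rough sketch with the extensions copied from those lists; the function name and the mode labels "search" and "code" are illustrative, not Clarity terminology.

```python
from pathlib import Path

# Extension lists copied from the two lists above.
DOCUMENTS = {".doc", ".docx", ".pdf", ".pptx", ".txt", ".md", ".tex"}
CODE = {".c", ".cpp", ".cs", ".css", ".html", ".java", ".js", ".php", ".py", ".rb", ".sh", ".ts"}

SEARCHABLE = DOCUMENTS | CODE | {".json"}
CODE_INTERPRETABLE = (
    DOCUMENTS
    | CODE
    | {".csv", ".xlsx", ".json", ".xml"}  # data and spreadsheets
    | {".gif", ".jpeg", ".jpg", ".png"}   # images
    | {".tar", ".zip"}                    # archives
)


def supported_modes(filename):
    """Return which modes ("search", "code") accept this file's extension."""
    ext = Path(filename).suffix.lower()
    modes = []
    if ext in SEARCHABLE:
        modes.append("search")
    if ext in CODE_INTERPRETABLE:
        modes.append("code")
    return modes
```

For example, a .xlsx file appears only in the code-interpretation list, so a prompt about it should ask Clarity to use code.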

Best Practices for Formatting Your Files

Whether uploading files to an agent in the Clarity User Portal or adding files to a Custom Agent’s knowledge source, well-structured files help Large Language Models (LLMs) understand the content hierarchy and context, leading to better responses.

Key guidelines for formatting documents:

  • Structure:
    • Use clear headings and subheadings
    • Keep paragraphs focused on single topics
    • Use consistent formatting 
    • Include a table of contents in long documents
  • Readability: 
    • Break content into digestible sections with focused paragraphs 
    • Use standard fonts
    • Ensure adequate contrast and left-align the text
  • Tables: 
    • Use the word processor’s built-in table formatting instead of relying on tabs or spaces
    • Include clear column headers
    • Keep tables simple, as overly complex tables may not parse well
    • For critical data, consider restating key information in text as well
  • Visuals: 
    • Add captions or alt text to images and charts
    • Place explanatory text near related visuals, as LLMs may have limited ability to extract text from complex images
    • For graphs and charts, consider including the underlying data in text or table form
  • Technical: 
    • Use searchable PDFs (not scanned images of text) 
    • Avoid password protection or encryption
    • Use standard formats like .pdf and .docx 
  • Content Organization: 
    • Prioritize the most important information at the beginning of the document
    • Use bullet points and numbered lists appropriately
    • Include all necessary context, as the agent may not have access to other documents
  • Web Links:
    • If your Word document contains hyperlinks hidden behind display text, convert it to Markdown (.md) format before uploading it to a Custom Agent or to one of the agents in Clarity. The conversion makes every link URL visible and accessible.

Key guidelines for formatting spreadsheets:

  • Structure:
    • Use clear and descriptive column headers
  • Clean the Data:
    • Remove any extra titles, notes, or blank rows from the top of the file so that the headers are in the very first row
  • Avoid Hidden Data: 
    • Unhide any data the agent should analyze 
    • Instead of hiding data, consider removing it entirely
  • Keep it Simple: 
    • If the Excel file has multiple sheets but only one needs to be analyzed, consider exporting that single sheet as a CSV file and using that instead
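
The "Clean the Data" step can be sketched for a sheet that has already been exported to CSV. This is only an illustration: the function name is hypothetical, and it relies on a heuristic assumption that the header is the first full-width row in which every cell is non-empty.

```python
import csv
import io


def strip_preamble(csv_text):
    """Drop blank rows and leading title or note rows so the header lands on the first row.

    Heuristic (an assumption, not a Clarity rule): the header is the first row
    that is as wide as the widest row and has no empty cells.
    """
    # Blank rows are dropped throughout the file, not only at the top.
    rows = [
        row for row in csv.reader(io.StringIO(csv_text))
        if any(cell.strip() for cell in row)
    ]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    for index, row in enumerate(rows):
        if len(row) == width and all(cell.strip() for cell in row):
            rows = rows[index:]  # discard everything above the header
            break
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

If no row matches the heuristic, the text is returned with only the blank rows removed, so the result is worth checking by eye before uploading.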

When to Use JSON

Clarity can use CSV or XLSX documents for tabular data. CSV is the most reliable choice because its simple, widely supported format is easier for LLMs to parse. For data with a complex or nested structure, JSON may be the better option.

Choose JSON format when:

  • Data is nested or hierarchical, or includes arrays or lists within records.
  • Exact data types need to be preserved (like numbers, text, and Booleans). 
  • Working with API responses or configuration data. 
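
The criteria above can be illustrated with a small made-up record. In this sketch the order data and the helper names are hypothetical. It shows that JSON keeps nesting and data types through a round trip, while the same record written to CSV must be flattened and comes back as plain strings.

```python
import csv
import io
import json

# Hypothetical record: a nested object, a list of items, and mixed data types.
order = {
    "order_id": 1042,
    "paid": True,
    "customer": {"name": "Ada", "tier": "gold"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.5},
        {"sku": "B-7", "qty": 1, "price": 20.0},
    ],
}

# JSON round trip: nesting, numbers, and Booleans are preserved.
restored = json.loads(json.dumps(order, indent=2))


def flatten_for_csv(record):
    """One row per item; parent fields are repeated on every row."""
    return [
        {
            "order_id": record["order_id"],
            "paid": record["paid"],
            "customer_name": record["customer"]["name"],
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for item in record["items"]
    ]


def csv_round_trip(rows):
    """Write rows to CSV text and read them back; every value returns as a string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


flat_rows = csv_round_trip(flatten_for_csv(order))
```

Here restored equals the original record, whereas flat_rows holds two rows in which qty and paid are strings rather than a number and a Boolean.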

Creating and Downloading Files

Clarity can generate downloadable files for the following file types:

  • Documents: .pdf, .docx, .pptx, .odt, .odp, .txt, .md, .rtf, .tex
  • Data and Spreadsheets: .csv, .ods, .xlsx, .xls 
  • Code: .py, .java, .c, .cpp, .js, .html, .css, .sh, .r, .tsv
  • Archives: .zip, .tar, .gz, .7z 
  • Databases: .db, .json, .sql, .xml

Specify the desired format in your prompt when asking Clarity to generate a file: “Please generate a PowerPoint presentation using the answer you generated.”