Introduction
Properly preparing your files before uploading them to Clarity helps large language models (LLMs) generate more accurate and useful results.
Well-structured, clearly formatted files help LLMs understand your content’s hierarchy, relationships, and context, which leads to better responses and fewer errors. Each file type has its own formatting guidelines, and following them helps you get the most value from these tools.
Below are best practices for preparing text-based documents and structured data files for LLM analysis.
Documents
Key formatting tips for documents processed by LLMs:
Text Structure
- Use clear headings and subheadings to create hierarchy.
- Keep paragraphs focused on single topics.
- Use consistent formatting throughout (don’t switch between styles arbitrarily).
- Include a table of contents for longer documents to help the LLM understand the document structure.
Readability
- Avoid dense walls of text—break content into digestible sections.
- Use standard fonts (avoid decorative or handwritten fonts).
- Ensure adequate contrast and readable font sizes.
- Left-align text rather than justifying it, which can introduce irregular spacing.
Tables and Data
- Use actual table formatting rather than aligning columns with tabs or spaces.
- Include clear column headers.
- Keep tables simple when possible—overly complex tables may not parse well.
- For critical data, consider restating key information in text as well.
Visual Elements
- Add alt text or captions to images, charts, and diagrams.
- Place explanatory text near related visuals (LLMs may have limited ability to extract text from complex images).
- For graphs/charts, consider including the underlying data in text or table form.
Technical Considerations
- Use searchable PDFs (not scanned images) when possible.
- Avoid password protection or encryption.
- Use standard formats (PDF, DOCX) rather than proprietary formats.
- Ensure that PDFs have a proper text layer (text you can select and copy), not just an image of each page.
Content Organization
- Put the most important information first.
- Use bullet points and numbered lists appropriately (but not excessively).
- Include context—don’t assume the LLM has access to other documents.
- Be explicit rather than relying on visual cues alone.
Tabular data
CSV files are a good format for spreadsheet data, but you can also upload tabular data in XLSX format. In some cases, JSON does a better job of preserving data structure, hierarchy, and data types.
CSV (Comma-Separated Values) – This is the most reliable choice. CSV files are simple, universally compatible, and easy for LLMs to parse. They work well for most tabular data.
XLSX (Excel) – Many LLMs can also read Excel files directly, which is useful if you have multiple sheets or want to preserve formatting.
Key considerations:
- Don’t forget headers: Make sure your first row contains clear column headers. LLMs use these headers to understand your data structure. Keep the header names handy, too: referring to them in your prompts makes it clear which data you want Clarity to analyze.
- Clean up your data: Remove any extra rows at the top (like titles, notes, or dates) that aren’t part of the actual data table. The cleaner your data, the easier it is to work with.
- Multiple sheets: If you have an Excel file with multiple sheets and only need one, consider exporting just that sheet as a CSV to keep things simple.
- Don’t include hidden data: LLMs struggle with hidden rows, columns, and sheets in spreadsheets. Before uploading a file to Clarity, unhide this data, or delete it if it is truly not needed for your task.
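The cleanup steps above can also be scripted before upload. The following is a minimal sketch using only Python’s standard library: it drops title and note rows above the header, trims stray whitespace, and skips blank rows in a CSV export. The function names `clean_csv` and `to_csv`, the `first_header` parameter, and the sample export are hypothetical illustrations, not part of Clarity. Hidden rows, columns, and sheets in an XLSX file are not covered here; those still need to be unhidden or deleted in the spreadsheet itself.

```python
import csv
import io


def clean_csv(text: str, first_header: str) -> list[dict[str, str]]:
    """Hypothetical helper: drop title/note rows above the header row,
    trim whitespace in every cell, and skip rows that are entirely blank."""
    # skipinitialspace=True ignores spaces right after each comma.
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    # Locate the header row by the name of its first column.
    start = next(
        (i for i, row in enumerate(rows) if row and row[0].strip() == first_header),
        None,
    )
    if start is None:
        raise ValueError(f"header starting with {first_header!r} not found")
    header = [cell.strip() for cell in rows[start]]
    records = []
    for row in rows[start + 1:]:
        cells = [cell.strip() for cell in row]
        if not any(cells):  # empty line or a row of empty cells
            continue
        records.append(dict(zip(header, cells)))
    return records


def to_csv(records: list[dict[str, str]]) -> str:
    """Write cleaned records back out as CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# A hypothetical export with a title row, a note row, and blank rows.
export = """Headcount report
Note: salaries are annual
,,,
Name, Age, Department, Salary
John Smith, 32, Engineering, 95000
,,,
Jane Doe, 28, Marketing, 75000
"""

cleaned = to_csv(clean_csv(export, "Name"))
print(cleaned, end="")
# Name,Age,Department,Salary
# John Smith,32,Engineering,95000
# Jane Doe,28,Marketing,75000
```

Locating the header by its first column name is one simple heuristic; if your exports vary, you may prefer to pass the number of rows to skip instead.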
Sample CSV structure:
```csv
Name,Age,Department,Salary
John Smith,32,Engineering,95000
Jane Doe,28,Marketing,75000
```
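Spaces after commas are a small but common source of trouble, because a typical CSV parser treats them as part of the value. The minimal sketch below uses Python’s built-in `csv` module to read the same sample rows, written with a space after each comma, and shows that the header row becomes the field names. It illustrates how a common parser behaves; it is not a description of how Clarity reads uploaded files.

```python
import csv
import io

# The sample rows, written with a space after each comma.
SAMPLE = """Name, Age, Department, Salary
John Smith, 32, Engineering, 95000
Jane Doe, 28, Marketing, 75000
"""

# By default the space after each comma stays in the value,
# so the second header is " Age" rather than "Age".
raw = list(csv.DictReader(io.StringIO(SAMPLE)))
print(list(raw[0].keys()))  # ['Name', ' Age', ' Department', ' Salary']

# skipinitialspace=True drops whitespace that directly follows a delimiter.
clean = list(csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True))
print(list(clean[0].keys()))  # ['Name', 'Age', 'Department', 'Salary']

# CSV carries no type information: every value is text until converted.
total_salary = sum(int(row["Salary"]) for row in clean)
print(total_salary)  # 170000
```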
After you upload a file, you can ask the agent to analyze it, create visualizations, or perform calculations.
What about JSON?
JSON is sometimes the best choice for structured data.
Advantages of JSON:
- Preserves structure: Unlike CSV, JSON can accommodate nested data, arrays, and complex hierarchical relationships.
- Data types: JSON maintains proper data types (numbers, Booleans, null values) without ambiguity.
- Fewer parsing issues: CSV can be tricky when values contain commas, quotes, or special characters. JSON has one strict syntax for quoting and escaping, so these cases are unambiguous.
- Multiple related datasets: You can include multiple arrays or objects in one file.
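To illustrate the data-types point, the minimal sketch below (Python standard library only) round-trips one hypothetical record through JSON and through CSV. Python’s `csv` module does quote the embedded comma and quotes correctly, but every value comes back as a string and `None` becomes an empty string, whereas JSON restores the original number, Boolean, and null. This shows how typical parsers behave, not how Clarity processes files internally.

```python
import csv
import io
import json

# A hypothetical record with a comma, embedded quotes, a number,
# a Boolean, and a null value.
record = {
    "name": "Smith, John",
    "age": 32,
    "active": True,
    "manager": None,
    "note": 'Says "hi"',
}

# JSON round trip: the number, Boolean, and null keep their types.
restored = json.loads(json.dumps(record))
print(restored == record)  # True

# CSV round trip: the comma and quotes survive because the csv module
# quotes those fields, but every value is read back as a string.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
row = next(csv.DictReader(io.StringIO(buf.getvalue())))
print(repr(row["age"]), repr(row["active"]), repr(row["manager"]))  # '32' 'True' ''
```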
Choose JSON when:
- You have nested or hierarchical data.
- Your data includes arrays or lists within records.
- You need to preserve exact data types.
- You’re working with API responses or configuration data.
Example JSON structure:
```json
{
  "employees": [
    {
      "name": "John Smith",
      "age": 32,
      "department": "Engineering",
      "skills": ["Python", "JavaScript", "SQL"]
    },
    {
      "name": "Jane Doe",
      "age": 28,
      "department": "Marketing",
      "skills": ["SEO", "Content Strategy"]
    }
  ]
}
```
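The following minimal Python sketch (standard library only) parses this example and then flattens it to CSV, to show what the nested `skills` arrays turn into when forced into a table. The `"; "` separator used to join each list into one cell is an arbitrary choice for illustration; nothing in Clarity requires it.

```python
import csv
import io
import json

# The example structure, with straight quotes so it is valid JSON.
DATA = """
{
  "employees": [
    {"name": "John Smith", "age": 32, "department": "Engineering",
     "skills": ["Python", "JavaScript", "SQL"]},
    {"name": "Jane Doe", "age": 28, "department": "Marketing",
     "skills": ["SEO", "Content Strategy"]}
  ]
}
"""

employees = json.loads(DATA)["employees"]

# After parsing, ages are numbers and skills are real lists.
print(employees[0]["age"] + employees[1]["age"])  # 60

# Flattening to CSV forces each list into a single cell, so the list
# structure survives only by convention (here, the "; " separator).
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["name", "age", "department", "skills"])
for emp in employees:
    writer.writerow([emp["name"], emp["age"], emp["department"], "; ".join(emp["skills"])])
print(buf.getvalue(), end="")
# name,age,department,skills
# John Smith,32,Engineering,Python; JavaScript; SQL
# Jane Doe,28,Marketing,SEO; Content Strategy
```

Anyone reading the CSV version has to know that `; ` separates skills, which is exactly the kind of implicit structure JSON makes explicit.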