Analyse AI Bot Traffic with AWS Athena and CloudFront Logs


🔍 From Raw Logs to Business Intelligence: How We Track the Rise of AI Bot Traffic at Certible

The data doesn’t lie: AI is everywhere, and we have the logs to prove it!

The Challenge: With increasing bot traffic hitting our llms.txt endpoint, we needed a scalable way to analyze CloudFront access patterns without breaking the bank on log analysis costs.

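Our Solution: AWS Athena + Partitioned CloudFront Logs = Cost-Effective Analytics Magic. Athena bills by the amount of data each query scans, so partitioning the logs (for example by domain, year and month) lets a query read only the slice it filters on instead of the full log history.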

Here’s how we architected it:
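One way to set up such a partitioned table is sketched below. It is a minimal sketch, not our exact setup: it assumes the CloudFront standard log files sit in S3 under Hive-style prefixes such as domain=www.certible.com/year=2025/month=01/, and the bucket name, prefix and column subset are hypothetical. Only the first twelve tab-separated fields are declared (Athena ignores trailing fields), with cs-uri-stem renamed to uri so it matches the query further down.

```sql
-- Hypothetical table over CloudFront standard logs; bucket and layout are assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
  `date` DATE,
  `time` STRING,
  x_edge_location STRING,
  sc_bytes BIGINT,
  c_ip STRING,
  cs_method STRING,
  cs_host STRING,
  uri STRING,              -- cs-uri-stem, renamed to match the trend query
  sc_status INT,
  cs_referrer STRING,
  cs_user_agent STRING,
  cs_uri_query STRING
)
PARTITIONED BY (domain STRING, year STRING, month STRING)  -- month assumed zero-padded: '01' ... '12'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://example-log-bucket/cloudfront/'
TBLPROPERTIES ('skip.header.line.count' = '2');  -- skip the #Version and #Fields header lines

-- Register the existing domain=.../year=.../month=... prefixes as partitions
MSCK REPAIR TABLE default.cloudfront_logs;
```

MSCK REPAIR TABLE only registers partitions that exist when it runs, so it has to be rerun as new months arrive; ALTER TABLE ... ADD PARTITION or Athena partition projection are alternatives that avoid the repeated scan.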

The Results Speak for Themselves:

Our data shows a steady increase in llms.txt requests throughout 2025, which supports what we suspected: AI agents are actively discovering and indexing content, and at a growing rate.

Here is a sample query that computes the monthly trend and its month-over-month change:

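-- Monthly llms.txt requests for www.certible.com in 2025, plus month-over-month % change
WITH monthly_accesses AS (
  SELECT month, COUNT(*) AS access_count
  FROM default.cloudfront_logs
  WHERE uri LIKE '/llms.txt%'
    AND domain = 'www.certible.com'
    AND year = '2025'  -- if domain and year are partition keys, Athena scans only this slice
  GROUP BY month
)
SELECT month, access_count,
  -- LAG returns the previous row's count (NULL for the first month).
  -- Months with zero requests produce no row, so a gap compares against the last month with traffic.
  -- ORDER BY month assumes zero-padded month strings ('01' ... '12').
  ROUND((access_count - LAG(access_count) OVER (ORDER BY month))
    * 100.0 / LAG(access_count) OVER (ORDER BY month), 2) AS percent_change
FROM monthly_accesses
ORDER BY month;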
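The monthly count includes every client that fetches llms.txt, browsers and crawlers alike. To see which bots drive the trend, a follow-up query can group the same requests by user agent. This is a hedged sketch: it assumes the table also exposes CloudFront's cs(User-Agent) field under the hypothetical column name cs_user_agent. CloudFront writes that field URL-encoded (spaces appear as %20), so the strings are readable but not pretty.

```sql
-- Hypothetical follow-up: top user agents requesting llms.txt in 2025.
-- Assumes a cs_user_agent column holding CloudFront's cs(User-Agent) field.
SELECT cs_user_agent,
       COUNT(*) AS requests
FROM default.cloudfront_logs
WHERE uri LIKE '/llms.txt%'
  AND domain = 'www.certible.com'
  AND year = '2025'
GROUP BY cs_user_agent
ORDER BY requests DESC
LIMIT 20;
```

Crawler names such as GPTBot or ClaudeBot typically show up in these strings, but user agents are self-reported and can be spoofed, so the breakdown is indicative rather than proof of who is calling.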

Key Takeaways:

This infrastructure doesn’t just give us cool charts; it helps us understand how AI is interacting with our platform so we can optimize accordingly.