🔍 From Raw Logs to Business Intelligence: How We Track the Rise of AI Bot Traffic at Certible
The data doesn’t lie: AI is everywhere, and we have the logs to prove it!
The Challenge: With bot traffic to our llms.txt file steadily increasing, we needed a scalable way to analyze CloudFront access patterns without runaway log-analysis costs.
Our Solution: AWS Athena + Partitioned CloudFront Logs = Cost-Effective Analytics Magic
Here’s how we architected it:
- CloudFront logs partitioned by year/month/day, so each query reads only the days it filters on
- Athena for serverless SQL queries: you pay only for the data you scan
- Time-series analysis to track month-over-month growth
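If you want to reproduce the year/month/day layout, one common approach is a small job (for example, one triggered by S3 object-created events) that copies each new log file under Hive-style year=/month=/day= prefixes. Below is a minimal sketch of the key-mapping step only. It assumes CloudFront standard logs, whose file names follow <distribution-ID>.YYYY-MM-DD-HH.<unique-ID>.gz. The function partitioned_key and the cloudfront-logs destination prefix are hypothetical names for illustration, and the S3 copy itself is left out.

```python
import re
from typing import Optional

# CloudFront standard log files are named:
#   <optional-prefix>/<distribution-ID>.YYYY-MM-DD-HH.<unique-ID>.gz
# This regex matches the file-name part and captures the date fields.
_LOG_NAME = re.compile(
    r"^(?P<dist>[A-Z0-9]+)\."
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})-(?P<hour>\d{2})\."
    r"(?P<uid>[^.]+)\.gz$"
)


def partitioned_key(source_key: str, dest_prefix: str = "cloudfront-logs") -> Optional[str]:
    """Map a raw CloudFront log key to a Hive-style year=/month=/day= key.

    Returns None for objects that don't look like CloudFront log files.
    dest_prefix is a hypothetical destination prefix.
    """
    # Drop any source prefix and keep only the file name.
    file_name = source_key.rsplit("/", 1)[-1]
    match = _LOG_NAME.match(file_name)
    if match is None:
        return None
    # Zero-padded month/day come straight from the file name.
    return (
        f"{dest_prefix}/year={match['year']}/month={match['month']}/"
        f"day={match['day']}/{file_name}"
    )
```

Hive-style key=value prefixes let Athena's MSCK REPAIR TABLE discover new partitions. Zero-padded months also keep ORDER BY month in calendar order if month is stored as a string, as the year = '2025' filter in the query suggests.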
The Results Speak for Themselves:
Our data shows a steady increase in llms.txt requests throughout 2025, confirming what we all suspected: AI agents are actively discovering and indexing content, and at a growing rate.
Sample query that reveals the trend:
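WITH monthly_accesses AS (
    -- Count llms.txt requests per month; the year filter prunes partitions
    SELECT month, COUNT(*) AS access_count
    FROM default.cloudfront_logs
    WHERE uri LIKE '/llms.txt%'
      AND domain = 'www.certible.com'
      AND year = '2025'
    GROUP BY month
)
SELECT month,
       access_count,
       -- Month-over-month change in percent; NULL for the first month
       ROUND((access_count - LAG(access_count) OVER (ORDER BY month))
             * 100.0 / LAG(access_count) OVER (ORDER BY month), 2) AS percent_change
FROM monthly_accesses
-- The window's ORDER BY does not sort the output, so sort explicitly
-- (assumes zero-padded month values such as '01'..'12')
ORDER BY month;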
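To make the window-function logic concrete, here is a small Python sketch that performs the same calculation as the query on already-aggregated (month, count) pairs. It sorts by month, compares each month with the one before it, and leaves the first month empty, just as LAG() returns NULL for the first row. The function name month_over_month is hypothetical. It is meant for checking the arithmetic locally, not as a replacement for running the query in Athena.

```python
from typing import List, Optional, Tuple


def month_over_month(
    monthly_counts: List[Tuple[str, int]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Percent change of each month's count versus the previous month.

    Mirrors LAG(access_count) OVER (ORDER BY month): the first month has
    no predecessor, so its change is None (SQL NULL).
    """
    # String sort, like ORDER BY month on zero-padded '01'..'12'.
    rows = sorted(monthly_counts)
    result = []
    previous = None
    for month, count in rows:
        if previous is None or previous == 0:
            # No prior month (or a zero count, which GROUP BY counts never produce).
            change = None
        else:
            change = round((count - previous) * 100.0 / previous, 2)
        result.append((month, count, change))
        previous = count
    return result
```

Feeding it the query's monthly counts should reproduce the percent_change column, except possibly on exact rounding ties, where Python's round() and SQL ROUND can differ.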
Key Takeaways:
- ✅ Partitioning saves money: Athena scans only the partitions a query filters on
- ✅ Athena scales effortlessly: the same queries work on GBs or TBs of logs, with no servers to manage
- ✅ Business insights emerge from infrastructure data
- ✅ AI adoption is measurable through access patterns
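A quick back-of-the-envelope calculation shows why partitioning saves money. The sketch below estimates the scan charge for one query from the bytes it scanned. It assumes Athena's on-demand pricing model: a flat price per TB scanned, rounded up to the nearest megabyte, with a 10 MB minimum per query. The $5 per TB default is the list price in many regions, and the binary MB/TB units are an assumption. Check the current Athena pricing page for your region. The function name athena_scan_cost_usd is hypothetical.

```python
BYTES_PER_MB = 1024 ** 2   # assumption: binary megabytes
MB_PER_TB = 1024 ** 2      # assumption: binary terabytes
MIN_BILLED_MB = 10         # Athena bills at least 10 MB per query


def athena_scan_cost_usd(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate the data-scanned charge for a single Athena query.

    price_per_tb defaults to $5, an assumed list price; it varies by region.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Round up to whole megabytes (integer ceiling division), then apply the minimum.
    billed_mb = max(MIN_BILLED_MB, -(-bytes_scanned // BYTES_PER_MB))
    return billed_mb / MB_PER_TB * price_per_tb
```

Because the charge is linear in bytes scanned, a query that prunes down to one month of daily partitions pays roughly that month's share of what a full-year scan would cost (above the 10 MB floor). For real numbers, Athena reports the bytes each query scanned in DataScannedInBytes. This value is part of the query statistics returned by GetQueryExecution.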
This infrastructure doesn’t just give us cool charts. It helps us understand how AI agents interact with our platform, so we can optimize accordingly.