A simple worker for testing Cloudflare's AI models with open-source LLMs.

## Endpoints

### Info

Get information about supported models and usage examples.

### `/chat`

Chat with AI models. Send JSON with:
- `model` (optional): model name; defaults to `llama-3.1-8b-instruct`
- `messages` (required): array of chat messages
- `max_tokens` (optional): maximum tokens to generate; defaults to 256 (capped at 512 on the free tier)

Example with curl:

```sh
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/meta/llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello! Tell me about yourself."}
    ],
    "max_tokens": 100
  }' \
  https://your-worker.your-subdomain.workers.dev/chat
```
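Server-side, the defaulting and capping described above could be implemented roughly like this (a sketch; the function name is illustrative and not part of the worker's actual code):

```javascript
// Hypothetical helper that normalizes a /chat request body, applying the
// defaults and the free-tier cap documented above.
const DEFAULT_MODEL = '@cf/meta/llama-3.1-8b-instruct';
const FREE_TIER_TOKEN_CAP = 512;

function normalizeChatRequest(body) {
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    throw new Error('messages (non-empty array) is required');
  }
  return {
    model: body.model || DEFAULT_MODEL,
    messages: body.messages,
    // Default to 256 tokens; never exceed the free-tier cap of 512.
    max_tokens: Math.min(body.max_tokens ?? 256, FREE_TIER_TOKEN_CAP),
  };
}
```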
From JavaScript:

```js
fetch('/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: '@cf/microsoft/phi-2',
    messages: [
      { role: 'user', content: 'Explain quantum computing in simple terms' }
    ],
    max_tokens: 200
  })
}).then(r => r.json()).then(console.log);
```
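For repeated calls, the fetch above can be wrapped in a small helper with basic error handling (a sketch; the function name and thrown error message are illustrative, and the endpoint path comes from the examples above):

```javascript
// Hypothetical client-side wrapper around the /chat endpoint.
async function chat(messages, options = {}) {
  const res = await fetch('/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // `options` may carry `model` and `max_tokens`, as in the examples above.
    body: JSON.stringify({ messages, ...options }),
  });
  if (!res.ok) {
    throw new Error(`chat request failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```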
Tips:

- `phi-2` and `tinyllama` use fewer resources.
- Keep `max_tokens` low to conserve quota.
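Putting the pieces together, the worker's request handler might look roughly like this. This is a sketch under assumptions: the Workers AI binding is assumed to be named `AI`, and the result of `env.AI.run()` is passed through as-is (in a real worker, the handler object would be the module's `export default`):

```javascript
// Sketch of the /chat handler: defaults, free-tier cap, and the AI call.
const worker = {
  async fetch(request, env) {
    const {
      model = '@cf/meta/llama-3.1-8b-instruct', // default model
      messages,
      max_tokens = 256, // default token budget
    } = await request.json();

    if (!Array.isArray(messages)) {
      return new Response(JSON.stringify({ error: 'messages is required' }), {
        status: 400,
        headers: { 'Content-Type': 'application/json' },
      });
    }

    // Cap max_tokens at 512 to stay inside the free tier.
    const result = await env.AI.run(model, {
      messages,
      max_tokens: Math.min(max_tokens, 512),
    });

    return new Response(JSON.stringify(result), {
      headers: { 'Content-Type': 'application/json' },
    });
  },
};
```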