I’ve been exploring website scraping solutions for a while now, and after a nicely productive weekend with them, I felt like posting about it.
Let’s start with scraping website data using Cloudflare’s own solution, the Browser Rendering API.
I’ve been playing around with Cloudflare Browser Rendering for quite a while now, and I’ve been using it in tools for a few weeks. It’s very good! I love that it supports /markdown right away, and it’s perfect for stringing workflows together with LLMs in n8n.
I liked it so much that today, I decided to apply the same simple flow as a tool call in Open WebUI, and I published it as a Tool that you can add to your own setup:
Cloudflare Browser Rendering Tool • Open WebUI Community
You can also view it as a GitHub Gist here: https://gist.github.com/jerieljan/36cbb9bf71b9061efa122b41c902bdf2
Here’s how it looks in action:
Only the part where I typed the prompt was sped up. The actual Cloudflare portion and Groq speedily returning Kimi K2’s response are both in real time.
Originally, I didn’t want to do this because Open WebUI can already pull in web page content with #<URL> when you prompt it. For some reason, though, newer versions could not read any of my content, so why not put Browser Rendering straight into it?
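For a rough idea of what a call to the /markdown endpoint involves, here’s a minimal Python sketch. It is not the code from the Gist above. It assumes the REST endpoint lives at /accounts/<account_id>/browser-rendering/markdown under the Cloudflare API, takes a JSON body with a url field, and answers with a success flag plus the Markdown in result. The function names, and the account ID and API token you pass in, are placeholders of mine, so double-check everything against Cloudflare’s docs.

```python
import json
import urllib.request

# Assumed base URL of the Cloudflare REST API.
API_BASE = "https://api.cloudflare.com/client/v4"


def build_markdown_request(account_id: str, api_token: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /markdown endpoint."""
    endpoint = f"{API_BASE}/accounts/{account_id}/browser-rendering/markdown"
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def fetch_markdown(account_id: str, api_token: str, target_url: str, timeout: float = 60.0) -> str:
    """Send the request and return the page as Markdown (assumed to be in 'result')."""
    request = build_markdown_request(account_id, api_token, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"Browser Rendering call failed: {payload.get('errors')}")
    return payload["result"]
```

Calling fetch_markdown("<account-id>", "<api-token>", "https://example.com") should hand back a string you can pass straight to an LLM, which is the whole appeal.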
While doing all that stuff above, I also stumbled on the Jina AI service when I was about to upload my code to the Open WebUI portal. Looks like I wasn’t alone with this problem, and people have discovered another promising alternative to what Cloudflare Browser Rendering was doing.
Their Reader API is fascinating because it is drop-dead easy to use! It’s literally just https://r.jina.ai/<your-url> and you immediately get a decent scrape of a site.
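To show how little there is to it, here’s a small Python sketch of that prefix trick. The helper names are mine, and the optional API key sent as a Bearer token is an assumption about how authenticated usage works, so treat it as a starting point rather than the official client.

```python
import urllib.request
from typing import Optional

READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Prefix the target URL with the Reader endpoint, as described above."""
    return READER_PREFIX + target_url


def fetch_with_reader(target_url: str, api_key: Optional[str] = None, timeout: float = 60.0) -> str:
    """Fetch a page through the Reader API and return the response body as text."""
    headers = {}
    if api_key:
        # Assumption: an API key, if you have one, goes in a Bearer header.
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(reader_url(target_url), headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

So fetch_with_reader("https://example.com") is all it takes to try it out from a script.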
I felt this was still great to use because Cloudflare Browser Rendering has a painful limit of 6 requests per minute on the free tier, and Jina AI’s token-based usage is a decent alternative to how other scraping services generally charge.
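If you do stay on the free tier, one way to live with 6 requests per minute is to space calls about 10 seconds apart on the client side. Here’s a minimal Python sketch of that idea. The class name is made up, and it assumes that evenly spaced requests are enough to stay under the limit, which may not match how Cloudflare actually counts them.

```python
import time


class MinIntervalThrottle:
    """Space calls evenly so no more than N happen per minute."""

    def __init__(self, requests_per_minute: int = 6, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute  # 10 seconds for 6 per minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request made yet

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Create one throttle, call throttle.wait() right before each scrape, and a batch of URLs will trickle through slowly instead of failing once the limit kicks in.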
jina.ai’s pricing is token-driven. The more complex the site, the more tokens you’ll burn. A lengthy article in Japanese, for example, cost me around 12,000 tokens.
By comparison, other services usually work on a subscription: you pay for a particular tier that you might not even use up, and bumping up a tier isn’t appealing for hobbyists such as myself. They also favor American IT business prices, which is NOT a good fit for what I’m willing to pay.
Browserless has a generous free tier, sure, but I can see myself getting blocked easily with only 1 concurrent browser, and the next tiers up are overkill for what I need.
My third best option besides Jina AI and Cloudflare Browser Rendering would be Scraping Fish.