Simon Willison
@simonw
Apr 6 • 5 tweets

Any OCR models out there with LLM-like capabilities - like the ability to "guess" partial words based on context - but that don't follow extra instructions or apply safety filters of any kind?

I want reliable OCR that can't be prompt-injected and that won't sometimes refuse to return text.

Multimodal models like GPT-4o, Claude 3 Opus, and Google Gemini seem great for OCR at first, but they're no good if they're going to refuse to return text because the content conflicts with their content policies, or skip text labeled "ignore this text:" in the document!

This is not a theoretical concern: here's Claude 3 Opus refusing to extract JSON from a campaign finance report document because "... that would involve extracting and structuring private details about the individual"!

Tweet image 1

This is currently my strongest argument in favor of "uncensored" models: sometimes you just want to be able to do something useful - like OCR - against an arbitrary document.

Especially relevant to journalism, which often involves handling content from unsavory sources!

Anyone tried Microsoft's Kosmos-2.5?

Looks promising: "a multimodal literate model for machine reading of text-intensive images" - but the README does warn: "Since this is a generative model, there is a risk of hallucination during the generation process."

@pagilgukey
@simonw sounds like you want something like https://t.co/d6KLcOH5Rs
