GLM-OCR: Accurate × Fast × Comprehensive
- coder543 - 27269 seconds ago
There are a bunch of new OCR models.
I’ve also heard very good things about these two in particular:
- LightOnOCR-2-1B: https://huggingface.co/lightonai/LightOnOCR-2-1B
- PaddleOCR-VL-1.5: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5
The OCR leaderboards I’ve seen leave a lot to be desired.
With the rapid release of so many of these models, I wish there were a better way to know which ones are actually the best.
I also feel like most/all of these models don’t handle charts, other than to maybe include a link to a cropped image. It would be nice for the OCR model to also convert charts into markdown tables, but this is obviously challenging.
- alaanor - 23799 seconds ago
There have been so many OCR models released in the past few months, all VLMs, and yet none of them handle Korean well. Every time I try with a random screenshot (not an A4 document), they just fail at a "simple" task. Funnily enough, Qwen3 8B VL is the model that usually gets it right (although I couldn't get the bboxes quite right). Even funnier, whatever runs locally on an iPhone's CPU is insanely good, and the same goes for Google's OCR API. I don't know why we don't get more of the traditional OCR stuff; PaddlePaddle v5 is the closest I could find. At this point, I feel like I might be doing something wrong with those VLMs.
- aliljet - 27649 seconds ago
This is actually the thing I really desperately need. I'm routinely analyzing contracts that were faxed to me, scanned at monstrously poor resolution, wet-signed, all kinds of shit. The big LLM providers choke on this raw input, and I burn through the entire context window for 30 pages of text. Understandable evals of the quality of these OCR systems (which are moving wicked fast) would be helpful...
And here's the kicker: I can't afford mistakes. Missing or misinterpreting a single character could be catastrophic. 4 units vacant? 10 days to respond? Signature missing? These are incredibly critical things. I can't find an eval that gives me confidence around this.
- mikae1 - 20059 seconds ago
Text me back when there's a working PDF-to-EPUB conversion tool. I've been waiting (and searching for one) long enough. :D
EDIT: https://github.com/overcuriousity/pdf2epub looks interesting.
- surfacedamage - 8055 seconds ago
This might be a niche question, but does GLM-OCR (or other libraries) have the ability to extract/interpret QR code data?
- ks2048 - 22018 seconds ago
I've been trying different OCR models on what should be very simple: subtitles (simple machine-rendered text). While all the models do very well (95%+ accuracy), I haven't seen one that doesn't occasionally make very obvious mistakes. Maybe it will take a different approach to get the last 1%...
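(A side note on measuring that last 1%: the standard metric here is character error rate, i.e. edit distance divided by reference length. A minimal pure-Python sketch — function names are my own, not from any OCR toolkit:)

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed / reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# One wrong character out of 13 -> CER ~0.077, i.e. "95+% accuracy"
# can still mean a glaring typo in every other subtitle line.
print(cer("subtitle text", "subt1tle text"))
```

Running this over a corpus of ground-truth subtitle lines makes "occasional obvious mistakes" quantifiable instead of anecdotal.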
- rdos - 25027 seconds ago
Is it possible for such a small model to outperform Gemini 3, or is this a case of benchmarks not reflecting reality? I would love to be hopeful, but so far no open-source model has ever been better than a closed one, even when the benchmarks said it was.
- sinandrei - 21126 seconds ago
Has anyone experimented with using VLMs to detect "marks"? I'm thinking of pen/pencil markings like underlines, circles, checkmarks... Can these models do it?
- ThrowawayTestr - 3308 seconds ago
What's the current SOTA for Japanese and Korean OCR? BalloonsTranslator has a great workflow, but the models are pretty old.
- bugglebeetle - 22672 seconds ago
I tested this pretty extensively, and it has a common failure mode that prevents me from using it: extracting footnotes and similar material from the full text of academic works. For some reason, many of these models are trained in a way that excludes these sections, despite their often containing important details and context. Both versions of DeepSeek-OCR have the same problem. Of the others I've tested, dot-ocr in layout mode works best (but is slow), followed by Datalab's Chandra model (which is larger and has bad license constraints).