Only tesseract.js can extract the text from an image; other libraries can't

I came across a strange issue. I wrote one program in C# and one in JS to extract the numbers from an image, but only the JS program, which uses the tesseract.js library, successfully gets the text. The image given to both programs is identical, and both use the same model: I took the model from the tesseract.js GitHub repository to make sure of that. The model can be found here.
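One way to double-check that both programs really load the same model is to compare the two model files by checksum. The following is only a minimal sketch for Node.js: the helper names `sha256Hex` and `sameFile` and the file paths in the final comment are hypothetical and not part of either library. It also assumes both copies are stored in the same form; if one copy is gzip-compressed, it would need to be decompressed before comparing.

```javascript
// Sketch: check whether two model files are byte-identical by comparing SHA-256 digests.
const crypto = require('node:crypto');
const fs = require('node:fs');

// Return the hex SHA-256 digest of a Buffer.
function sha256Hex(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Return true when the two files have identical contents.
function sameFile(pathA, pathB) {
  return sha256Hex(fs.readFileSync(pathA)) === sha256Hex(fs.readFileSync(pathB));
}

// Hypothetical paths; point them at wherever each program loads its model from.
// console.log(sameFile('js/eng.traineddata', 'csharp/tessdata/eng.traineddata'));
```

If the digests differ, the two programs are not using the same model after all, which would be the first thing to rule out.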

I suspected that tesseract.js might be altering the image in some way, so I looked through its source code, but I didn't find anything.

JS library: tesseract.js

C# library: Tesseract.Net.SDK

Image I used:

Here is the image I gave each program

C# code:

using System.Drawing;
using Patagames.Ocr;

using var objOcr = OcrApi.Create();

objOcr.SetVariable("tessedit_char_whitelist", "0123456789");

objOcr.Init(Patagames.Ocr.Enums.Languages.English);

Bitmap image = new Bitmap("image.png");
var text = objOcr.GetTextFromImage(image);

JS code:

import Tesseract from 'tesseract.js';

Tesseract.recognize(
  'image.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
});
about 3 years ago · Juan Pablo Isaza