Converting Images and PDFs to Text using JavaScript and Apps Script with OCR Technology

Posted by Alfalfa on November 13, 2023

OCR – Image and PDF to Text using JavaScript and Apps Script

Optical Character Recognition (OCR) is a technology that converts scanned paper documents, PDF files, and images captured by a digital camera into editable, searchable text. In this article, we will explore how to use JavaScript and Google Apps Script to extract text from images and PDFs.

Using JavaScript for OCR

JavaScript can perform OCR on images and PDFs with the help of OCR libraries and APIs. One popular library is Tesseract.js, which runs a WebAssembly build of the Tesseract OCR engine in the browser or in Node.js. With Tesseract.js, you can extract text from images entirely on the client side. PDF pages have to be rendered to images first (for example with PDF.js), because the library does not read PDF files directly.

Example code using Tesseract.js

      
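// Assumes Tesseract.js v5 or later is loaded globally (e.g. via a <script> tag).
// Older releases (v2/v3) used a synchronous createWorker() followed by
// worker.load(), worker.loadLanguage('eng') and worker.initialize('eng').
const { createWorker } = Tesseract;

(async () => {
  // Create a worker with the English model; the logger reports progress.
  const worker = await createWorker('eng', 1, {
    logger: m => console.log(m)
  });
  // Recognize the text in the sample image and print it.
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  // Release the worker's resources once we are done.
  await worker.terminate();
})();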
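
When several images need to be processed, it is usually cheaper to reuse one worker than to create a new one per image. The following is a minimal sketch of that idea, assuming a worker object that exposes the same recognize method as in the example above. The helper name recognizeAll is hypothetical and not part of Tesseract.js.

```javascript
// Recognize several images one after another with a single shared worker.
// `worker` is assumed to expose recognize(image) -> Promise<{ data: { text } }>,
// the same shape used in the example above.
async function recognizeAll(worker, images) {
  const results = [];
  for (const image of images) {
    const { data: { text } } = await worker.recognize(image);
    // Trim the trailing newline OCR engines commonly append.
    results.push({ image, text: text.trim() });
  }
  return results;
}
```

The images are handled sequentially on purpose: a single worker processes one job at a time, so firing all requests at once would not make it faster.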

Using Apps Script for OCR

Google Apps Script is a JavaScript-based scripting platform developed by Google for automating tasks in Google Workspace applications such as Google Sheets, Docs, and Drive. With Apps Script, you can also perform OCR on images and PDFs stored in Google Drive by sending them to Google's Cloud Vision API, which has to be enabled in a Google Cloud project.

Example code using Apps Script and Cloud Vision API

      
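function imageToText() {
  // Replace both placeholders with your own Drive file ID and Vision API key.
  var file = DriveApp.getFileById('YOUR_IMAGE_FILE_ID');
  var apiKey = 'YOUR_API_KEY';
  var imageBlob = file.getBlob();
  var req = {
    requests: [
      {
        image: {
          // The Vision API expects the image bytes as a base64 string.
          content: Utilities.base64Encode(imageBlob.getBytes())
        },
        features: [
          {
            type: 'TEXT_DETECTION'
          }
        ]
      }
    ]
  };
  // Apps Script has no built-in CloudVision service, so call the REST endpoint.
  var response = UrlFetchApp.fetch(
    'https://vision.googleapis.com/v1/images:annotate?key=' + apiKey,
    {
      method: 'post',
      contentType: 'application/json',
      payload: JSON.stringify(req)
    }
  );
  var results = JSON.parse(response.getContentText());
  // The first annotation holds the full detected text; it is missing when no text is found.
  var annotations = results.responses[0].textAnnotations;
  var text = annotations && annotations.length ? annotations[0].description : '';
  Logger.log(text);
}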
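
The request building and the response handling can also be split into small plain-JavaScript helpers, which makes them easier to check outside the Apps Script editor. This is only a sketch: buildVisionRequest and extractText are hypothetical names, and the field names follow the request shown above and the response format of the Vision API's images:annotate method.

```javascript
// Build the JSON body for a Vision API images:annotate call.
// `base64Content` is the image encoded as a base64 string.
function buildVisionRequest(base64Content) {
  return {
    requests: [
      {
        image: { content: base64Content },
        features: [{ type: 'TEXT_DETECTION' }]
      }
    ]
  };
}

// Pull the detected text out of a parsed images:annotate reply.
// Returns '' when no text was found and throws if the API reported an error.
function extractText(result) {
  const first = (result.responses || [])[0] || {};
  if (first.error) {
    throw new Error('Vision API error: ' + first.error.message);
  }
  const annotations = first.textAnnotations || [];
  return annotations.length > 0 ? annotations[0].description : '';
}
```

In Apps Script, the base64 string could come from Utilities.base64Encode(file.getBlob().getBytes()), and the parsed reply from JSON.parse applied to the HTTP response body.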

Conclusion

OCR is a powerful technology that can be used from JavaScript and Apps Script to extract text from images and PDFs. Whether you need the text for processing, analysis, or search, these tools provide a straightforward way to get it. By incorporating OCR into your web and app development projects, you can unlock new possibilities for working with textual content.
