AWS

Generative AI powered Intelligent Document Processing

Steps to use Amazon Bedrock to get process Documents.


Over the past few months, Metal Toad has received multiple sales leads from clients wanting to automate the extraction of information from documents.  Upon learning that AWS was going to host a webinar on Generative AI Powered Intelligent Document Processing using AWS Bedrock, I eagerly seized the opportunity to observe how AWS recommends the process, allowing Metal Toad to refine our own procedures.

At its simplest, level a document processing has a few main components. 

  1. File Input. 
  2. Text extraction: Textract
  3. Generative AI Query to prompt the data
  4. Result Output
  5. Present results to the user. 



File Input:

Acquiring a file is the initial step in any document processing workflow. In most of Metal Toad's workflows, the process begins with S3.  Files may enter S3 through a web portal or services. Once in S3, a Step function is triggered to orchestrate steps 2-4.

 

Text Extraction:

Text extraction is straightforward. AWS provides a service called Textract that performs document-to-text conversion. One of the key advantages of Textract is its variety of analyses that optimize the process for your business. 

  • Tabular data is employed for extracting tables of data from the document. 
  • Layout Based can be used if the data you need is consistently located in the same place on documents. 
  • Signatures will identify signatures in the document. 
  • Raw text is used to obtain a dump of all text.
  • Expense, ID, and Lending are tailor-made for their respective use cases. 
  • Queries allow you to ask a question in plain English and receive an answer. It uses a visual and text-based ML to identify the requested information. 

Textract Queries doesn't use a LLM but may be what you need. It offers a way to train a custom model for you with very little data. However, the downside is that the query is expensive compared to Raw text or Layout-based options. 

 

Generative AI Query:

After obtaining the text output, there are several ways to query it with a BedRock LLM. It can be passed straight from the results, preprocessed, or loaded into a vector database. Additionally, you need to choose the Bedrock model that suits your use case and provides the required accuracy.

To accomplish this, Metal Toad recommends activating an Amazon SageMaker notebook instance. This provides you with a fully managed Jupyter notebook environment with AWS credentials. This provides you with a fully managed Jupyter notebook environment with AWS credentials.

Once you have one you like, you can integrate it into your step function to process the data in real time. 

 

Result Output:

After obtaining all text, the prompt, the results, and confidence ratings, what's next? The next step is to identify the appropriate data store for your results.The specific output depends on the last step, Result Presentation, but generally, using S3 for raw results and DynamoDB for results is a good starting point. There is no restriction on using any datastore you need. Metal Toad has collaborated with clients to push results to their API or database. 

 

Result Presentation:

Most users won't find a JSON representation of data appealing. It won't mean anything to them, and they won't know how to retrieve it from the database. For that reason, you need a way to visualize the results. This can vary depending on the use case. It could be an image of the document with the highlighted text identified, or it could be as simple as confidence ratings over time. The key is to monitor how your platform is performing over time, both to leverage your data and to detect if it drifts and requires retraining.

 

I hope you enjoyed this high-level outline of a Generative AI-powered Intelligent Document Processing system. Document processing is a fascinating problem that Metal Toad has worked with at various times over the years. With Generative AI, it's becoming easier to process files with more inconsistent formats to extract data that can appear in multiple different formats.

Similar posts

Get notified on new marketing insights

Be the first to know about new B2B SaaS Marketing insights to build or refine your marketing function with the tools and knowledge of today’s industry.