DocAI

Introduction

DocAI is a tool for automating document processing and extraction, helping you streamline workflows and improve efficiency. Because AI services evolve rapidly, this feature is currently labeled experimental; please use it with caution, and report any issues you encounter. The underlying AI services are provided by Microsoft Azure: specifically, Azure OpenAI and the Azure Document Intelligence API. We recommend reviewing Azure's privacy policy and terms of service before using this feature. To use it, you must have an Azure account and set up the required services.

You need to deploy your own Azure OpenAI and Document Intelligence resources, which gives you complete control over your data and usage. By deploying the services in an EU region, you can keep your data within the EU and comply with data residency requirements. Please refer to the official Azure documentation for guidance on setting up these services and to verify compliance with your requirements.

Configuration

The configuration of the necessary services is done in the Tenant settings.

Azure Document Intelligence API Settings

The Azure Document Intelligence API requires the following settings:

  • DocumentIntelligenceApiEndpoint: The endpoint URL for the Azure Document Intelligence API, typically in the format https://<your-resource-name>.cognitiveservices.azure.com/.

  • DocumentIntelligenceApiKey: The API key for accessing the Azure Document Intelligence service. Typically found in the Azure Portal under the "Keys and Endpoint" section of your resource.

  • DocumentIntelligenceDocumentModel: The document model to be used by the Azure Document Intelligence API. Reserved for future use; at this moment, only the default model 'prebuilt-layout' is supported.
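Taken together, the three settings above might look like this. This is a hedged sketch: the key names come from the list above, but the surrounding structure and the placeholder values are assumptions, not an exact tenant export.

```typescript
// Hypothetical tenant-settings fragment; replace the placeholder values
// with the values from your own Azure resource.
const documentIntelligenceSettings = {
  DocumentIntelligenceApiEndpoint:
    "https://my-resource.cognitiveservices.azure.com/",
  // Found under "Keys and Endpoint" in the Azure Portal:
  DocumentIntelligenceApiKey: "<your-api-key>",
  // Currently the only supported model:
  DocumentIntelligenceDocumentModel: "prebuilt-layout",
};
```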

Azure OpenAI Settings

The Azure OpenAI service requires the following settings:

  • AzureOpenAIEndpoint: The endpoint URL for the Azure OpenAI service, typically in the format https://<your-resource-name>.openai.azure.com/.

  • AzureOpenAIDefaultDeployment: The default deployment name of the model you want to use. This is configured when you create a deployment in the Azure Portal.

  • AzureOpenAIApiVersion: The API version to use for the Azure OpenAI service. Check the Azure documentation for the latest supported versions.

  • AzureOpenAIKey: The API key for accessing the Azure OpenAI service. Typically found in the Azure Portal under the "Keys and Endpoint" section of your resource.
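As with the Document Intelligence settings, these four values might be grouped as follows. Again a hedged sketch: the deployment name and API version shown are examples only; use the deployment you created and an API version the Azure documentation lists as supported.

```typescript
// Hypothetical tenant-settings fragment; all values are placeholders.
const azureOpenAISettings = {
  AzureOpenAIEndpoint: "https://my-resource.openai.azure.com/",
  // The deployment name you chose when creating a deployment in the Azure Portal:
  AzureOpenAIDefaultDeployment: "gpt-4o",
  // Example only; check the Azure documentation for supported versions:
  AzureOpenAIApiVersion: "2024-02-01",
  // Found under "Keys and Endpoint" in the Azure Portal:
  AzureOpenAIKey: "<your-api-key>",
};
```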

DocAI methods

The DocAI feature offers several methods for interacting with documents and images utilizing AI capabilities. Below are the available DocAI methods along with their descriptions and parameters.

DocAI.chat()

The chat() method allows you to interact with the underlying LLM for general information, not related to any document. You provide a prompt, and the model responds based on its training data. This can be useful for getting insights, explanations, or general knowledge. Parameters include the prompt and, optionally, the AI model to use. Results include the response text, along with metadata about the model used and the token usage.

The chat() method is similar to the standard chat completions available in Azure OpenAI, but it is integrated into the DocAI feature for convenience. This way, you can use both document-related and general AI capabilities within the same context.
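A minimal usage sketch follows. The parameter and result-field names (response, model, tokenUsage) are inferred from the description above, not taken from official API documentation; in a real tenant the DocAI object is provided by the platform, so the local stub below exists only to make the sketch self-contained.

```typescript
// Hypothetical result shape for chat(): the response text plus
// metadata about the model used and the token usage.
interface ChatResult {
  response: string;
  model: string;
  tokenUsage: { promptTokens: number; completionTokens: number };
}

// Local stub standing in for the platform-provided DocAI object.
// Assumed signature: a prompt plus an optional model name.
const DocAI = {
  chat(prompt: string, model?: string): ChatResult {
    return {
      response: `Stubbed answer to: ${prompt}`,
      model: model ?? "default-deployment",
      tokenUsage: { promptTokens: 8, completionTokens: 12 },
    };
  },
};

const result = DocAI.chat("Summarize what OCR is in one sentence.");
console.log(result.response);
```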

DocAI.ask()

The ask() method enables you to query a specific document or image for information. The document is processed (including OCR for images) to extract text, and then the LLM is used to answer your question based on that text. This is particularly useful for extracting specific details from documents. Parameters include the document, the prompt/question, and, optionally, the AI model to use.
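The call shape might look like the sketch below. The document type and parameter order are assumptions inferred from the description; the stub only mimics the two-step behavior (text extraction, then LLM answering).

```typescript
// Hypothetical document reference; the real platform type may differ.
type DocumentRef = { name: string };

// Local stub standing in for the platform-provided DocAI object.
const DocAI = {
  ask(doc: DocumentRef, prompt: string, model?: string): string {
    // Real behavior: extract text (OCR for images), then answer
    // the question based on that text. Here we just echo a stub.
    return `Stubbed answer about ${doc.name}: ${prompt}`;
  },
};

const invoice: DocumentRef = { name: "invoice.pdf" };
const answer = DocAI.ask(invoice, "What is the total amount due?");
```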

DocAI.classify()

The classify() method is used to categorize a document or image into one of several predefined types. This is useful for organizing documents and automating workflows based on document type. Parameters include the document and the list of possible types; the method returns the most likely type for the document. The result consists of the predicted type, a confidence score, a reasoning explanation, and some metadata regarding the model used and the token usage.
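The result fields described above might map to a shape like this. Field names are assumptions; the stub always picks the first candidate type so the sketch runs standalone.

```typescript
// Hypothetical result shape for classify(), inferred from the description.
interface ClassifyResult {
  type: string;       // the most likely of the provided types
  confidence: number; // e.g. a score between 0 and 1
  reasoning: string;
  model: string;
}

// Local stub standing in for the platform-provided DocAI object.
const DocAI = {
  classify(doc: { name: string }, types: string[]): ClassifyResult {
    // Stub behavior: always return the first candidate type.
    return { type: types[0], confidence: 0.97, reasoning: "stubbed", model: "default" };
  },
};

const r = DocAI.classify({ name: "scan.pdf" }, ["invoice", "receipt", "contract"]);
```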

DocAI.extract()

The extract() method allows you to extract specific information from a document or image based on a provided example JSON format. This is useful for pulling out structured data from unstructured documents. Parameters include the document and an example format that defines the structure of the data you want to extract. The result is an object containing the extracted data in the specified format, plus some metadata regarding the model used and the token usage.
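The key idea is that the example format doubles as the output schema. In the sketch below, the field names in the example object are illustrative (they are not mandated by DocAI), and the stub simply echoes the example back so the sketch is runnable.

```typescript
// Hypothetical example format: a JSON object whose keys and sample
// values define the structure you want the extraction to return.
const exampleFormat = {
  invoiceNumber: "INV-0001",
  issueDate: "2024-01-31",
  totalAmount: 0.0,
  lineItems: [{ description: "item", quantity: 1, price: 0.0 }],
};

// Local stub standing in for the platform-provided DocAI object.
// Assumed result: data in the example's shape plus model/token metadata.
const DocAI = {
  extract<T>(doc: { name: string }, example: T): { data: T; model: string; totalTokens: number } {
    // Stub behavior: echo the example back as the "extracted" data.
    return { data: example, model: "default", totalTokens: 42 };
  },
};

const result = DocAI.extract({ name: "invoice.pdf" }, exampleFormat);
```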

DocAI.extractMarkdown()

The extractMarkdown() method converts a document or image into markdown format. This is useful for transforming various document types into a more readable format for further processing. This method supports both an internal extraction algorithm and Azure Document Intelligence for the conversion. Parameters include the document and, optionally, the markdown extraction method; the default extraction method is 'internal'.
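A sketch of the assumed call shape follows. The source names only 'internal' as a method value, so the identifier for the Azure Document Intelligence option below is a guess, as is the optional-parameter signature.

```typescript
// 'internal' comes from the description; 'documentIntelligence' is an
// assumed name for the Azure Document Intelligence option.
type MarkdownMethod = "internal" | "documentIntelligence";

// Local stub standing in for the platform-provided DocAI object.
const DocAI = {
  extractMarkdown(doc: { name: string }, method: MarkdownMethod = "internal"): string {
    // Stub behavior: return a tiny markdown document.
    return `# ${doc.name}\n\n(stubbed markdown via ${method})`;
  },
};

const md = DocAI.extractMarkdown({ name: "report.pdf" }); // default: 'internal'
```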

DocAI.extractStructuredData()

The extractStructuredData() method is designed to extract structured data from documents or images using pretrained models. Supported models include ID documents, invoices, receipts, and contracts. This method is particularly useful for automating data entry and processing tasks. Parameters include the document, the model type, and, optionally, the output format (structured JSON or raw response). The result is an object containing the extracted structured data based on the selected model.
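A sketch under stated assumptions: the model-type identifiers and output-format values below are guesses derived from the list of supported models, not the platform's actual values.

```typescript
// Assumed identifiers for the supported pretrained models:
type PretrainedModel = "idDocument" | "invoice" | "receipt" | "contract";
// Assumed names for the two output formats described above:
type OutputFormat = "structured" | "raw";

// Local stub standing in for the platform-provided DocAI object.
const DocAI = {
  extractStructuredData(
    doc: { name: string },
    model: PretrainedModel,
    output: OutputFormat = "structured",
  ): { data: Record<string, unknown>; format: OutputFormat } {
    // Stub behavior: return a minimal object tagged with the chosen format.
    return { data: { documentName: doc.name }, format: output };
  },
};

const res = DocAI.extractStructuredData({ name: "receipt.jpg" }, "receipt");
```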

The key difference from the extract() method is that extractStructuredData() uses pretrained models for specific document types, while in extract() you define the structure of the data you want to extract yourself. For the supported document types, extraction accuracy may be higher with extractStructuredData(), as the models are trained explicitly for those use cases. If the document type is not supported by extractStructuredData(), you can use the more flexible extract() method to define your own data structure; however, the accuracy may vary depending on the document's complexity and the example format provided.
