DocAI

Collection of DocAI functions. These functions require several underlying services to be configured (see Tenant settings).

Methods

classify

classify(document: string, possibleTypes: string[])

Method to classify a document/image.

Parameters

document string

A base64-encoded document or image that you want to classify.

possibleTypes string[]

The possible types you expect for the document.

Return type

{ result: { type: string, confidence: number, reasoning: string }, model: string, usage: { promptTokens: number, completionTokens: number, totalTokens: number } }

Examples

// Classify a PDF document.
DocAI.classify('base64encodedPDFDocument', [
  "appraisal report",
  "passport",
  "salary slip"
]);
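The returned confidence can be used to decide whether a classification is trusted or sent for manual review. The following is a minimal sketch, not part of DocAI: acceptClassification is a hypothetical helper, the sample response is made up to match the return type above, and the 0 to 1 confidence scale is an assumption.

```javascript
// Hypothetical helper (not part of DocAI): accept a classification only when
// the reported confidence reaches a caller-chosen minimum.
// Assumption: confidence is reported on a 0..1 scale.
function acceptClassification(response, minConfidence) {
  const { type, confidence } = response.result;
  return confidence >= minConfidence ? type : null;
}

// Made-up response with the shape listed under "Return type".
const sampleClassification = {
  result: {
    type: "passport",
    confidence: 0.93,
    reasoning: "Contains a machine-readable zone."
  },
  model: "example-model",
  usage: { promptTokens: 1200, completionTokens: 40, totalTokens: 1240 }
};

acceptClassification(sampleClassification, 0.8);  // "passport"
acceptClassification(sampleClassification, 0.95); // null
```

In a real script, the response would come from DocAI.classify instead of the sample object.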

extract

extract(document: string, exampleFormat: { [key: string]: any; })

Method to extract information from a document/image.

Parameters

document string

A base64-encoded document or image from which you want to extract information.

exampleFormat { [key: string]: any; }

The format you want the result to have.

Return type

{ result: { [key: string]: any; }, model: string, usage: { promptTokens: number, completionTokens: number, totalTokens: number } }

Examples

// Extract information from a PDF document.
DocAI.extract('base64encodedPDFDocument', {
  address: {
    street: "Dorpsstraat",
    number: "123",
    postcode: "1234 BB",
    city: "The Hague"
  }
});
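Because the result is meant to follow the example format, it can be useful to check which requested fields actually came back. The sketch below is a hypothetical helper, not part of DocAI. It assumes that fields the model could not find are either absent or null, which this page does not confirm. The sample result is made up.

```javascript
// Hypothetical helper (not part of DocAI): list the dotted paths that appear
// in the example format but are absent (or null) in the extracted result.
function missingPaths(exampleFormat, result, prefix = "") {
  const missing = [];
  for (const key of Object.keys(exampleFormat)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const expected = exampleFormat[key];
    const actual = result == null ? undefined : result[key];
    if (actual === undefined || actual === null) {
      missing.push(path);
    } else if (expected !== null && typeof expected === "object" && !Array.isArray(expected)) {
      // Descend into nested objects such as "address".
      missing.push(...missingPaths(expected, actual, path));
    }
  }
  return missing;
}

const addressFormat = {
  address: { street: "Dorpsstraat", number: "123", postcode: "1234 BB", city: "The Hague" }
};

// Made-up extraction result in which the postcode was not found.
const sampleExtraction = {
  address: { street: "Kerkstraat", number: "7", city: "Utrecht" }
};

missingPaths(addressFormat, sampleExtraction); // ["address.postcode"]
```

In a real script, the second argument would be the result property of the object returned by DocAI.extract.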

extractMarkdown

extractMarkdown(document: string, markdownMethod?: 'internal' | 'azure')

Converts a document to markdown. The markdown is returned as a string in the result property. The document can be a PDF, an image, or another document type. Images are OCRed before conversion.

Parameters

document string

A base64-encoded document or image that you want to convert to markdown.

markdownMethod 'internal' | 'azure'

(optional) The method to use for markdown extraction. 'internal' uses an internal algorithm, and 'azure' uses Azure Document Intelligence. The internal method may be more accurate for some documents, but its implementation is still experimental and subject to change in future versions.

Return type

{ result: string }

Examples

// Convert a document or image to markdown.
DocAI.extractMarkdown('base64encodedDocument');
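Since the internal method is described as experimental, one possible pattern is to try a preferred method and fall back to the other one when nothing usable comes back. The sketch below is an illustration under assumptions: documentToMarkdown and the stub are hypothetical, the call is assumed to return its result object directly (as the example above suggests), and treating an empty string as "no usable output" is a choice made here, not documented behavior.

```javascript
// Hypothetical helper (not part of DocAI). `docAI` is passed in so the sketch
// can be tried with a stub; in a real script, pass the DocAI object.
function documentToMarkdown(docAI, document, preferred = "internal") {
  const fallback = preferred === "internal" ? "azure" : "internal";
  const first = docAI.extractMarkdown(document, preferred).result;
  // Assumption: an empty or whitespace-only string means the method produced nothing usable.
  if (typeof first === "string" && first.trim() !== "") {
    return { markdown: first, method: preferred };
  }
  const second = docAI.extractMarkdown(document, fallback).result;
  return { markdown: second, method: fallback };
}

// Stub standing in for DocAI: 'internal' yields nothing, 'azure' yields a heading.
const stubMarkdownDocAI = {
  extractMarkdown(document, markdownMethod) {
    return { result: markdownMethod === "azure" ? "# Title" : "" };
  }
};

documentToMarkdown(stubMarkdownDocAI, "base64encodedDocument");
// { markdown: "# Title", method: "azure" }
```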

ask

ask(document: string, prompt: string, options?: { 
        /**
         * The LLM model to use for this DocAI method. The LLM should be up and running in your environment.
         */
        model?: string,
    })

Method to ask for information from a document/image. Images are OCRed before querying. This method uses the underlying LLM to answer the question based on the content of the document. It cannot describe images; it answers only from the text in the document.

Parameters

document string

A base64-encoded document or image that you want to query for information.

prompt string

The prompt/question you want to ask about the document.

options { model?: string }

(optional) The options for the DocAI service. If not specified, the default options will be used. model is the LLM model to use for this DocAI method; the LLM should be up and running in your environment.

Return type

{ result: any, model: string, usage: { promptTokens: number, completionTokens: number, totalTokens: number } }

Examples

// Ask a question about a PDF document.
DocAI.ask('base64encodedPDFDocument',
  "Did the valuation reveal any special features?"
);
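The optional model setting and the returned usage information can be combined in a small wrapper. This is a sketch under assumptions: askWithModel, the stub, and the model names are hypothetical, and the call is assumed to return its result object directly, as the example above suggests.

```javascript
// Hypothetical wrapper (not part of DocAI). `docAI` is passed in so the sketch
// can be tried with a stub; in a real script, pass the DocAI object.
function askWithModel(docAI, document, prompt, model) {
  // Only send options when a model name was given, so the defaults apply otherwise.
  const options = model ? { model } : undefined;
  const response = docAI.ask(document, prompt, options);
  return {
    answer: response.result,
    model: response.model,
    totalTokens: response.usage.totalTokens
  };
}

// Stub standing in for DocAI; the model names and token counts are made up.
const stubAskDocAI = {
  ask(document, prompt, options) {
    return {
      result: "No special features were reported.",
      model: (options && options.model) || "default-llm",
      usage: { promptTokens: 900, completionTokens: 12, totalTokens: 912 }
    };
  }
};

askWithModel(stubAskDocAI, "base64encodedPDFDocument",
  "Did the valuation reveal any special features?", "example-llm");
// { answer: "No special features were reported.", model: "example-llm", totalTokens: 912 }
```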

chat

chat(prompt: string, options?: { 
        /**
         * The LLM model to use for this DocAI method. The LLM should be up and running in your environment.
         */
        model?: string,
    })

Method to ask the underlying LLM for general information that is not related to any document. It returns the response from the LLM along with model and usage information. The result can be either text or an object.

Parameters

prompt string

The prompt/question you want to ask.

options { model?: string }

(optional) The options for the DocAI service. If not specified, the default options will be used. model is the LLM model to use for this DocAI method; the LLM should be up and running in your environment.

Return type

{ result: any, model: string, usage: { promptTokens: number, completionTokens: number, totalTokens: number } }

Examples

// Ask a question.
DocAI.chat("What to look for in a valuation report?");
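Because the result can be either text or an object, calling code may want to normalize it before displaying or logging it. A minimal sketch, where resultToText is a hypothetical helper and not part of DocAI:

```javascript
// Hypothetical helper (not part of DocAI): turn a chat result into display text.
// Strings pass through unchanged; anything else is serialized as indented JSON.
function resultToText(result) {
  return typeof result === "string" ? result : JSON.stringify(result, null, 2);
}

resultToText("Check the valuation date.");            // "Check the valuation date."
resultToText({ checks: ["valuation date", "value"] }); // multi-line JSON string
```

In a real script, the argument would be the result property of the object returned by DocAI.chat.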

extractStructuredData

extractStructuredData(document: string, model: 'prebuilt-idDocument' | 'prebuilt-invoice' | 'prebuilt-receipt' | 'prebuilt-contract', options?: { 
        /**
         * Confidence threshold for extracted data. Defaults to 0.5. If the confidence of a field is below this threshold, it will be omitted from the result.
         */
        confidenceThreshold?: number,
        /**
         * Whether to use Azure's markdown conversion. Defaults to false. Only valid in combination with model prebuilt-layout. If true, the document will be converted to markdown using Azure's PDF to Markdown service. If false, Rulecube's custom extraction will be used.
         */
        useAzureMarkdown?: boolean
    })

Method to extract structured data from a document/image using a pretrained model. The following models are supported: ID documents, invoices, receipts, and contracts.

Parameters

document string

A base64-encoded document or image from which you want to extract structured data.

model 'prebuilt-idDocument' | 'prebuilt-invoice' | 'prebuilt-receipt' | 'prebuilt-contract'

The model defining the structured data to extract from the document.

options { confidenceThreshold?: number, useAzureMarkdown?: boolean }

(optional) The options for structured data extraction. confidenceThreshold is the confidence threshold for extracted data and defaults to 0.5; a field whose confidence is below this threshold is omitted from the result. useAzureMarkdown controls whether to use Azure's markdown conversion and defaults to false; it is only valid in combination with model prebuilt-layout. If true, the document is converted to markdown using Azure's PDF to Markdown service. If false, Rulecube's custom extraction is used.

Return type

{ result: { [key: string]: any; } }

Examples

// Extract structured data from a passport.
DocAI.extractStructuredData('base64encodedPassportImage', 'prebuilt-idDocument');
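The confidenceThreshold option controls which fields survive in the result. The sketch below shows one way to pass it. It rests on assumptions: extractIdFields and the stub are hypothetical, the field names and confidence values are made up, the threshold is assumed to lie between 0 and 1 (suggested by the 0.5 default), and the call is assumed to return its result object directly.

```javascript
// Hypothetical wrapper (not part of DocAI). `docAI` is passed in so the sketch
// can be tried with a stub; in a real script, pass the DocAI object.
function extractIdFields(docAI, document, confidenceThreshold = 0.5) {
  // Assumption: thresholds are expressed on a 0..1 scale.
  if (confidenceThreshold < 0 || confidenceThreshold > 1) {
    throw new RangeError("confidenceThreshold must be between 0 and 1");
  }
  return docAI.extractStructuredData(document, "prebuilt-idDocument", {
    confidenceThreshold
  }).result;
}

// Stub standing in for DocAI: it imitates the documented behavior of omitting
// fields whose confidence is below the threshold. Field names are made up.
const stubStructuredDocAI = {
  extractStructuredData(document, model, options) {
    const fields = [
      { name: "firstName", value: "Anna", confidence: 0.97 },
      { name: "documentNumber", value: "X1234567", confidence: 0.62 }
    ];
    const result = {};
    for (const field of fields) {
      if (field.confidence >= options.confidenceThreshold) {
        result[field.name] = field.value;
      }
    }
    return { result };
  }
};

extractIdFields(stubStructuredDocAI, "base64encodedPassportImage", 0.8);
// { firstName: "Anna" }
```

A stricter threshold trades completeness for reliability: with the default of 0.5, the stub above would also return documentNumber.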
