Import Document AI Webservice

Overview

The Import Document AI Webservice component is an AI-powered web service that extracts structured threat intelligence from documents. The extraction service uses NLP to turn raw documents into STIX bundles and relies on an AI model trained by Filigran.

For more information about this feature, see the official OpenCTI documentation.

Enabling the Service

By default, the Import Document AI Webservice is disabled. To enable it, configure the following values in your values.yaml file:

importDocumentAiWebservice:
  enabled: true
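After updating values.yaml, apply the change with a Helm upgrade. The release name, chart reference, and namespace below are placeholders; substitute the ones used in your deployment:

```shell
# Apply the updated values to the existing release.
# "opencti" (release name), "opencti/opencti" (chart), and the
# namespace are placeholders -- replace them with your own.
helm upgrade --install opencti opencti/opencti \
  --namespace opencti \
  -f values.yaml
```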

Basic Configuration

Docker Image

The service uses the official Filigran image:

importDocumentAiWebservice:
  enabled: true
  image:
    registry: "docker.io"
    repository: "filigran/import-document-ai-webservice"
    version: "latest"
    pullPolicy: "IfNotPresent"

Kubernetes Service

The service is exposed internally via a Kubernetes service:

importDocumentAiWebservice:
  service:
    type: ClusterIP
    port: 80
    targetPort: 8000
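Inside the cluster, other pods can reach the service through its DNS name on the service port, without going through an ingress. Assuming the default service name and an opencti namespace (both may differ in your deployment; check with kubectl get svc), an in-cluster URL would look like:

```yaml
# Hypothetical in-cluster URL for a connector running in the same cluster.
# Service name and namespace are assumptions -- verify with `kubectl get svc`.
CONNECTOR_WEB_SERVICE_URL: "http://import-document-ai-webservice.opencti.svc.cluster.local:80"
```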

Advanced Configuration

Resources

Define resource limits and requests for the container:

importDocumentAiWebservice:
  resources:
    limits:
      cpu: "2"
      memory: 16Gi
    requests:
      cpu: "1"
      memory: 4Gi

GPU Support

The Import Document AI Webservice can leverage GPU acceleration for improved performance. To enable GPU support:

  1. Ensure your Kubernetes cluster has GPU nodes with the NVIDIA device plugin installed (see the NVIDIA GPU Operator documentation).

  2. Request GPU resources in your configuration:

importDocumentAiWebservice:
  resources:
    limits:
      cpu: "2"
      memory: 16Gi
      nvidia.com/gpu: 1  # Request 1 GPU
    requests:
      cpu: "1"
      memory: 4Gi

  3. Optionally, use a node selector to target GPU-enabled nodes:

importDocumentAiWebservice:
  nodeSelector:
    accelerator: nvidia-gpu
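GPU nodes are often tainted so that only GPU workloads schedule onto them. If your cluster does this, a toleration may be needed in addition to the node selector. This is a sketch assuming the chart exposes a tolerations value and that your nodes use the common nvidia.com/gpu taint key; verify both against your chart and node configuration:

```yaml
importDocumentAiWebservice:
  tolerations:
    - key: "nvidia.com/gpu"   # common convention -- check your actual node taints
      operator: "Exists"
      effect: "NoSchedule"
```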

Note: GPU support is optional. The service will work without GPU but may have reduced performance for large documents.

Environment Variables

Add custom environment variables if needed:

importDocumentAiWebservice:
  env:
    LOG_LEVEL: "info"
    # Add other variables as needed

Health Probes

The deployment includes configurable health probes:

Readiness Probe:

importDocumentAiWebservice:
  readinessProbe:
    failureThreshold: 10
    initialDelaySeconds: 10
    periodSeconds: 5
    successThreshold: 1
    tcpSocket:
      port: 8000

Liveness Probe:

importDocumentAiWebservice:
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /health
      port: 8000
    initialDelaySeconds: 30
    periodSeconds: 10

Security

Image Pull Secrets

If you are using a private registry:

importDocumentAiWebservice:
  imagePullSecrets:
    - name: my-registry-secret
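The referenced secret can be created with kubectl. The registry server, credentials, and namespace below are placeholders:

```shell
# Create a docker-registry secret for pulling images from a private registry.
# Replace the server, credentials, and namespace with your own values.
kubectl create secret docker-registry my-registry-secret \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=changeme \
  --namespace opencti
```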

External Exposure via Ingress

The Import Document AI webservice can be exposed outside the cluster using an ingress. There are two configuration options:

Option 1: Explicit Host Configuration

Define specific hosts and paths for the ingress:

importDocumentAiWebservice:
  ingress:
    enabled: true
    className: "ChangeMe"
    annotations:
      cert-manager.io/cluster-issuer: "ChangeMe"
    hosts:
      - host: import-ai.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: import-ai-tls
        hosts:
          - import-ai.example.com

This configuration gives you full control over the hostname and paths.

Option 2: Automatic Subdomain Configuration

If you don't specify hosts, the ingress will automatically create a subdomain based on the OpenCTI front host:

importDocumentAiWebservice:
  ingress:
    enabled: true
    className: "ChangeMe"
    annotations:
      cert-manager.io/cluster-issuer: "ChangeMe"
    # No hosts specified - will use automatic subdomain

opencti:
  front:
    ingress:
      hosts:
        - host: opencti.example.com
          paths:
            - path: /
              pathType: Prefix

With this configuration, the Import Document AI Webservice will be automatically accessible at:

import-document-ai-webservice.opencti.example.com

Note: The automatic subdomain is created by prefixing import-document-ai-webservice. to the first host defined in opencti.front.ingress.hosts.

Integration with OpenCTI

For the OpenCTI import-document connector to use this service, configure it with the service URL:

opencti:
  connector:
    connectors:
      - name: import-document
        enabled: true
        env:
          CONNECTOR_WEB_SERVICE_URL: "https://import-document-ai-webservice.opencti.example.com"
          # Other configurations...

Complete Configuration Example

Here is a complete configuration example:

importDocumentAiWebservice:
  enabled: true

  image:
    registry: "docker.io"
    repository: "filigran/import-document-ai-webservice"
    version: "latest"
    pullPolicy: "IfNotPresent"

  resources:
    limits:
      cpu: "4"
      memory: 16Gi
      #nvidia.com/gpu: 1
    requests:
      cpu: "2"
      memory: 4Gi

  ingress:
    enabled: true
    className: "ChangeMe"
    annotations:
      cert-manager.io/cluster-issuer: "ChangeMe"
    hosts:
      - host: import-ai.opencti.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: import-ai-opencti-tls
        hosts:
          - import-ai.opencti.example.com

  env:
    LOG_LEVEL: "info"

Configuration of the import-document connector to use the service

opencti:
  connector:
    connectors:
      - name: import-document
        enabled: true
        image:
          registry: "docker.io"
          repository: "opencti/connector-import-document"
          pullPolicy: "IfNotPresent"
        env:
          OPENCTI_TOKEN: "ChangeMe"
          CONNECTOR_ID: "ChangeMe"
          CONNECTOR_TYPE: INTERNAL_IMPORT_FILE
          CONNECTOR_NAME: ImportDocument
          CONNECTOR_SCOPE: application/pdf,text/plain,text/html,text/markdown
          CONNECTOR_VALIDATE_BEFORE_IMPORT: 'true'
          CONNECTOR_AUTO: 'false'
          CONNECTOR_LOG_LEVEL: error
          CONNECTOR_WEB_SERVICE_URL: "https://import-document-ai-webservice.opencti.example.com"

Deployment Verification

After deployment, verify that the service is working correctly:

# Check that the pod is running
kubectl get pods -l app=import-document-ai-webservice

# Check the logs
kubectl logs -l app=import-document-ai-webservice

# Test the health endpoint
kubectl port-forward svc/import-document-ai-webservice 8000:80
curl http://localhost:8000/health
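If you exposed the service via ingress, you can also check it from outside the cluster. The hostname below matches the automatic-subdomain example earlier; your actual host and ingress labels may differ:

```shell
# Inspect the ingress object and test the health endpoint externally.
# The label selector and hostname are assumptions -- adjust to your setup.
kubectl get ingress -l app=import-document-ai-webservice
curl https://import-document-ai-webservice.opencti.example.com/health
```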
