# Import Document AI Webservice

## Overview

The Import Document AI Webservice is an AI-powered web service. It performs NLP-based extraction, turning raw documents into STIX bundles, and relies on an AI model trained by Filigran.

For more information about this feature, see the official OpenCTI documentation.
## Enabling the Service

By default, the Import Document AI Webservice is disabled. To enable it, set the following values in your `values.yaml` file:
### Basic Configuration

#### Docker Image

The service uses the official Filigran image:

```yaml
importDocumentAiWebservice:
  enabled: true
  image:
    registry: "docker.io"
    repository: "filigran/import-document-ai-webservice"
    version: "latest"
    pullPolicy: "IfNotPresent"
```
#### Kubernetes Service

The service is exposed internally via a Kubernetes service:
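A minimal sketch of the service-related values, assuming the chart follows common Helm conventions (the `service.type` and `service.port` keys are assumptions; port 80 matches the `kubectl port-forward` example later on this page):

```yaml
importDocumentAiWebservice:
  service:
    type: ClusterIP   # internal-only exposure (assumed default)
    port: 80          # service port, forwarding to the container's port 8000
```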
## Advanced Configuration

### Resources

Define resource limits and requests for the container:
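For example (the CPU and memory figures below mirror the GPU example in the next section; adjust them to your workload):

```yaml
importDocumentAiWebservice:
  resources:
    limits:
      cpu: "2"
      memory: 16Gi
    requests:
      cpu: "1"
      memory: 4Gi
```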
### GPU Support

The Import Document AI Webservice can leverage GPU acceleration for improved performance. To enable GPU support, you need to:

- Ensure your Kubernetes cluster has GPU nodes with the NVIDIA device plugin installed (ref: NVIDIA GPU Operator)
- Request GPU resources in your configuration:
```yaml
importDocumentAiWebservice:
  resources:
    limits:
      cpu: "2"
      memory: 16Gi
      nvidia.com/gpu: 1 # Request 1 GPU
    requests:
      cpu: "1"
      memory: 4Gi
```
- Optional: Use node selector to target GPU-enabled nodes:
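A sketch assuming the chart exposes a standard `nodeSelector` value; the `nvidia.com/gpu.present` label is applied by the NVIDIA GPU Operator's node discovery and is an assumption about how your GPU nodes are labeled:

```yaml
importDocumentAiWebservice:
  nodeSelector:
    nvidia.com/gpu.present: "true"   # assumed label; check your cluster's GPU node labels
```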
Note: GPU support is optional. The service will work without GPU but may have reduced performance for large documents.
### Environment Variables

Add custom environment variables if needed:
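For example, to set the log level (the `env` map and the `LOG_LEVEL` variable also appear in the complete configuration example below):

```yaml
importDocumentAiWebservice:
  env:
    LOG_LEVEL: "info"
```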
### Health Probes

The deployment includes configurable health probes:

**Readiness Probe:**
```yaml
importDocumentAiWebservice:
  readinessProbe:
    failureThreshold: 10
    initialDelaySeconds: 10
    periodSeconds: 5
    successThreshold: 1
    tcpSocket:
      port: 8000
```
**Liveness Probe:**
```yaml
importDocumentAiWebservice:
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /health
      port: 8000
    initialDelaySeconds: 30
    periodSeconds: 10
```
## Security

### Image Pull Secrets

If you are using a private registry:
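A sketch assuming the chart exposes a standard `imagePullSecrets` list; `regcred` is a hypothetical secret name that must already exist in the namespace:

```yaml
importDocumentAiWebservice:
  imagePullSecrets:
    - name: regcred   # hypothetical secret, created e.g. with `kubectl create secret docker-registry`
```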
## External Exposure via Ingress

The Import Document AI Webservice can be exposed outside the cluster using an ingress. There are two configuration options:

### Option 1: Explicit Host Configuration

Define specific hosts and paths for the ingress:
```yaml
importDocumentAiWebservice:
  ingress:
    enabled: true
    className: "ChangeMe"
    annotations:
      cert-manager.io/cluster-issuer: "ChangeMe"
    hosts:
      - host: import-ai.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: import-ai-tls
        hosts:
          - import-ai.example.com
```
This configuration gives you full control over the hostname and paths.
### Option 2: Automatic Subdomain Configuration

If you don't specify hosts, the ingress will automatically create a subdomain based on the OpenCTI front host:
```yaml
importDocumentAiWebservice:
  ingress:
    enabled: true
    className: "ChangeMe"
    annotations:
      cert-manager.io/cluster-issuer: "ChangeMe"
    # No hosts specified - will use automatic subdomain

opencti:
  front:
    ingress:
      hosts:
        - host: opencti.example.com
          paths:
            - path: /
              pathType: Prefix
```
With this configuration, the Import Document AI Webservice will be automatically accessible at `https://import-document-ai-webservice.opencti.example.com`.

Note: The automatic subdomain is created by prefixing `import-document-ai-webservice.` to the first host defined in `opencti.front.ingress.hosts`.
## Integration with OpenCTI

For the OpenCTI import-document connector to use this service, configure it with the service URL:
```yaml
opencti:
  connector:
    connectors:
      - name: import-document
        enabled: true
        env:
          CONNECTOR_WEB_SERVICE_URL: "https://import-document-ai-webservice.opencti.example.com"
        # Other configurations...
```
## Complete Configuration Example

Here is a complete configuration example:
```yaml
importDocumentAiWebservice:
  enabled: true
  image:
    registry: "docker.io"
    repository: "filigran/import-document-ai-webservice"
    version: "latest"
    pullPolicy: "IfNotPresent"
  resources:
    limits:
      cpu: "4"
      memory: 16Gi
      # nvidia.com/gpu: 1
    requests:
      cpu: "2"
      memory: 4Gi
  ingress:
    enabled: true
    className: "ChangeMe"
    annotations:
      cert-manager.io/cluster-issuer: "ChangeMe"
    hosts:
      - host: import-ai.opencti.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: import-ai-opencti-tls
        hosts:
          - import-ai.opencti.example.com
  env:
    LOG_LEVEL: "info"
```
```yaml
# Configuration of the import-document connector to use the service
opencti:
  connector:
    connectors:
      - name: import-document
        enabled: true
        image:
          registry: "docker.io"
          repository: "opencti/connector-import-document"
          pullPolicy: "IfNotPresent"
        env:
          OPENCTI_TOKEN: "ChangeMe"
          CONNECTOR_ID: "ChangeMe"
          CONNECTOR_TYPE: INTERNAL_IMPORT_FILE
          CONNECTOR_NAME: ImportDocument
          CONNECTOR_SCOPE: application/pdf,text/plain,text/html,text/markdown
          CONNECTOR_VALIDATE_BEFORE_IMPORT: 'true'
          CONNECTOR_AUTO: 'false'
          CONNECTOR_LOG_LEVEL: error
          CONNECTOR_WEB_SERVICE_URL: "https://import-document-ai-webservice.opencti.example.com"
```
## Deployment Verification

After deployment, verify that the service is working correctly:

```shell
# Check that the pod is running
kubectl get pods -l app=import-document-ai-webservice

# Check the logs
kubectl logs -l app=import-document-ai-webservice

# Test the health endpoint
kubectl port-forward svc/import-document-ai-webservice 8000:80
curl http://localhost:8000/health
```