Add environment variables for Unstructured to control OCR quality and OCR languages
@@ -8,3 +8,20 @@ ENABLE_UNSTRUCTURED_PARSING=true
# Unstructured API endpoint (default for docker-compose setup)
UNSTRUCTURED_API_URL=http://unstructured:8000

# Parsing strategy for the Unstructured service
# Valid values: auto, fast, hi_res
#   - auto: Automatically choose the best strategy based on document type
#   - fast: Fast parsing without OCR - best for simple text documents
#   - hi_res: High-resolution parsing with OCR - best for scanned documents, images, and complex layouts (default)
UNSTRUCTURED_STRATEGY=hi_res

# Languages for OCR and document parsing (comma-separated ISO 639-3 language codes)
# Default: eng,deu (English and German)
# Common language codes:
#   eng = English    deu = German     fra = French
#   spa = Spanish    ita = Italian    por = Portuguese
#   rus = Russian    ara = Arabic     zho = Chinese
#   jpn = Japanese   kor = Korean
# Example for English, German, and French: UNSTRUCTURED_LANGUAGES=eng,deu,fra
UNSTRUCTURED_LANGUAGES=eng,deu
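
For reviewers, here is a minimal sketch of how application code might read the three variables in this hunk. It is not taken from this commit: the function name `load_unstructured_settings` and the returned dictionary keys are hypothetical, and the fallback values simply mirror the defaults documented in the comments above. The set of accepted strategies is limited to the values this file lists.

```python
import os

# Strategies documented in the .env comments above (assumption: no others are accepted).
VALID_STRATEGIES = {"auto", "fast", "hi_res"}


def load_unstructured_settings(env=None):
    """Hypothetical helper: read and validate the Unstructured settings.

    `env` is any mapping of variable names to strings; it defaults to os.environ.
    """
    env = os.environ if env is None else env

    # Fall back to the default documented in the file, then normalize case.
    strategy = env.get("UNSTRUCTURED_STRATEGY", "hi_res").strip().lower()
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"UNSTRUCTURED_STRATEGY must be one of {sorted(VALID_STRATEGIES)}, got {strategy!r}"
        )

    # Split the comma-separated list, tolerating spaces and empty entries.
    raw_languages = env.get("UNSTRUCTURED_LANGUAGES", "eng,deu")
    languages = [code.strip().lower() for code in raw_languages.split(",") if code.strip()]
    if not languages:
        languages = ["eng", "deu"]

    return {
        "api_url": env.get("UNSTRUCTURED_API_URL", "http://unstructured:8000"),
        "strategy": strategy,
        "languages": languages,
    }
```

Calling `load_unstructured_settings({"UNSTRUCTURED_LANGUAGES": "eng, deu ,fra"})` would yield the language list `["eng", "deu", "fra"]` with the strategy left at `hi_res`. Failing fast on an unknown strategy is a design choice of this sketch; the actual service may instead fall back silently.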