Integrating finetuned local models such as Nemotron-nano-4b into/with Docling #3492

Romaiz · 2026-05-22T12:02:35Z

Romaiz
May 22, 2026

Hello, I want to use docling for a custom agentic pipeline for document classification, extraction and external api tool calling. I was wondering if docling has the built in mechanisms for this task or if I could replace (what I assume is the backend model) qwen that docling uses with a finetuned Nemotron-nano-4b and if so, how?

Answered by dosubot[bot]

May 22, 2026

Hi @Romaiz! I'm Dosu and I’m helping the docling team.

Hey @Romaiz, great question! Docling does support integrating custom/finetuned local models, and you have several options for using something like Nemotron-nano-4b.

Quick clarification: Docling's default model is actually Granite-Docling-258M (IBM's lightweight model), not Qwen — though Qwen2.5-VL-3B is available as a preset [1].

Integrating a Custom Model

Option 1: Via OpenAI-compatible API (recommended for local serving)

Run your finetuned Nemotron-nano-4b with vLLM or Ollama, then point Docling to it:

from docling.datamodel.pipeline_options import VlmPipelineOptions, VlmConvertOptions, ApiVlmEngineOptions, VlmEngineType

vlm_options

View full answer

@Romaiz · 2026-05-22T12:13:09Z

dosubot[bot]
Bot May 22, 2026

Hi @Romaiz! I'm Dosu and I’m helping the docling team.

Hey @Romaiz, great question! Docling does support integrating custom/finetuned local models, and you have several options for using something like Nemotron-nano-4b.

Quick clarification: Docling's default model is actually Granite-Docling-258M (IBM's lightweight model), not Qwen — though Qwen2.5-VL-3B is available as a preset [1].

Integrating a Custom Model

Option 1: Via OpenAI-compatible API (recommended for local serving)

Run your finetuned Nemotron-nano-4b with vLLM or Ollama, then point Docling to it:

from docling.datamodel.pipeline_options import VlmPipelineOptions, VlmConvertOptions, ApiVlmEngineOptions, VlmEngineType

vlm_options = VlmConvertOptions.from_preset(
    "granite_docling",
    engine_options=ApiVlmEngineOptions(
        runtime_type=VlmEngineType.API,
        url="http://localhost:8000/v1/chat/completions",
        params={"model": "nemotron-nano-4b", "temperature": 0.0, "max_tokens": 8192},
        timeout=90,
    ),
)
pipeline_options = VlmPipelineOptions(vlm_options=vlm_options, enable_remote_services=True)

[2]

Option 2: Direct HuggingFace integration

If your model is on HuggingFace, you can load it directly via Transformers:

from docling.datamodel.pipeline_options_vlm_model import InlineVlmOptions, InferenceFramework

vlm_options = InlineVlmOptions(
    repo_id="your-org/nemotron-nano-4b-finetuned",
    inference_framework=InferenceFramework.TRANSFORMERS,
)

[3]

Note: the CLI --vlm-model flag only supports predefined presets — custom models require the Python API [3].

Built-in Capabilities

Regarding your agentic pipeline goals:

Document extraction: The VlmPipeline handles converting documents to structured formats (Markdown, DocTags, etc.) [4]
Schema-based extraction: The ExtractionVlmPipeline extracts structured data according to custom templates (Pydantic models, dicts, etc.) [5]
Picture classification: Built-in DocumentPictureClassifier classifies images into 15+ categories with confidence thresholds [6]

However, Docling doesn't have a built-in document-level classification system or tool-calling/agentic orchestration framework — you'd need to build that layer yourself around Docling's extraction and conversion capabilities.

Key Consideration

Docling requires transformers >=4.46.0,<5.0.0 [7], so ensure your model is compatible with that version range if using direct Transformers inference. If not, the API-based approach (vLLM/Ollama) avoids this constraint entirely.

To reply, just mention @dosu.

Share context across your team and agents. Try Dosu.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrating finetuned local models such as Nemotron-nano-4b into/with Docling #3492

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Integrating finetuned local models such as Nemotron-nano-4b into/with Docling #3492

Uh oh!

Romaiz May 22, 2026

Integrating a Custom Model

Replies: 1 comment

Uh oh!

dosubot[bot] Bot May 22, 2026

Integrating a Custom Model

Built-in Capabilities

Key Consideration

Romaiz
May 22, 2026

dosubot[bot]
Bot May 22, 2026