DocumentWriter
Write documents to a DocumentStore.
DocumentWriter is used in indexing pipelines to persist processed documents into a DocumentStore so they can be retrieved later by query pipelines.
Key Features
- Writes documents to any compatible DocumentStore.
- Configurable duplicate handling: NONE, SKIP, OVERWRITE, or FAIL.
- Returns the number of documents successfully written.
Configuration
- Drag the DocumentWriter component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Configure the document_store to specify where the documents should be written.
  - Set the policy to control how duplicate documents are handled.
- Go to the Advanced tab if you need to override the policy at run time.
Connections
DocumentWriter receives a list of Document objects through its documents input, typically from a preprocessor, splitter, or embedder. It outputs the number of documents written through its documents_written output. It is usually the last component in an indexing pipeline.
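To make the input and output contract concrete, here is a minimal sketch in plain Python. It does not use the real component: `Doc`, `split_into_docs`, and `ToyWriter` are hypothetical stand-ins, and returning the count inside a dictionary keyed by the output name is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document object."""
    id: str
    content: str


def split_into_docs(text: str) -> List[Doc]:
    # Hypothetical upstream step: one document per non-empty line.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [Doc(id=f"doc-{i}", content=line) for i, line in enumerate(lines)]


class ToyWriter:
    """Final step: takes `documents` and reports a `documents_written` count."""

    def __init__(self) -> None:
        self.store: Dict[str, Doc] = {}

    def run(self, documents: List[Doc]) -> Dict[str, int]:
        for doc in documents:
            self.store[doc.id] = doc
        return {"documents_written": len(documents)}


writer = ToyWriter()
result = writer.run(documents=split_into_docs("first passage\n\nsecond passage\n"))
print(result)  # {'documents_written': 2}
```

The writer produces only a count, so nothing downstream needs the documents themselves, which is why this step sits at the end of the pipeline.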
Source Code
To check this component's source code, open document_writer.py in the Haystack repository.
Usage Examples
Basic Configuration
DocumentWriter:
  type: components.writers.document_writer.DocumentWriter
  init_parameters: {}

components:
  DocumentWriter:
    type: components.writers.document_writer.DocumentWriter
    init_parameters:
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | A list of documents to write to the document store. |
| policy | Optional[DuplicatePolicy] | The policy to use when encountering duplicate documents. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents_written | int | Number of documents written to the document store. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | DocumentStore | | The instance of the document store where you want to store your documents. |
| policy | DuplicatePolicy | DuplicatePolicy.NONE | The policy to apply when a Document with the same ID already exists in the DocumentStore. DuplicatePolicy.NONE: Default policy, relies on the DocumentStore settings. DuplicatePolicy.SKIP: Skips documents with the same ID. DuplicatePolicy.OVERWRITE: Overwrites documents with the same ID. DuplicatePolicy.FAIL: Raises an error if a Document with the same ID is already in the DocumentStore. |
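The four policies are easier to compare side by side in code. The sketch below models them against a plain dictionary. It is an illustration of the semantics described in the table, not the library's implementation: the `DuplicatePolicy` enum, `DuplicateDocumentError`, and `write_documents` are hypothetical stand-ins defined here, and the `store_default` argument is an assumption representing the DocumentStore settings that NONE defers to.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    # Toy stand-in for the policy values named in the table.
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateDocumentError(Exception):
    """Hypothetical error raised by this sketch under the FAIL policy."""


def write_documents(store, documents, policy, store_default=DuplicatePolicy.FAIL):
    """Write (doc_id, content) pairs into a dict; return how many were written.

    store_default is an assumption: it stands in for the store's own setting.
    """
    if policy is DuplicatePolicy.NONE:
        policy = store_default
    written = 0
    for doc_id, content in documents:
        if doc_id in store:
            if policy is DuplicatePolicy.SKIP:
                continue  # keep the existing document, do not count it
            if policy is DuplicatePolicy.FAIL:
                raise DuplicateDocumentError(f"ID {doc_id!r} already exists")
            # OVERWRITE falls through and replaces the stored content.
        store[doc_id] = content
        written += 1
    return written


store = {"a": "old text"}
incoming = [("a", "new text"), ("b", "other text")]

skipped = write_documents(dict(store), incoming, DuplicatePolicy.SKIP)
overwritten_store = dict(store)
overwritten = write_documents(overwritten_store, incoming, DuplicatePolicy.OVERWRITE)
print(skipped, overwritten, overwritten_store["a"])  # 1 2 new text
```

In this sketch, SKIP reports one written document because the duplicate is left untouched, while OVERWRITE reports two. How a real store counts skipped or overwritten documents may differ.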
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to write to the document store. |
| policy | Optional[DuplicatePolicy] | None | The policy to use when encountering duplicate documents. |
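Because the run-time policy defaults to None while the init policy defaults to DuplicatePolicy.NONE, it helps to see how the two might combine. The following sketch assumes that a run-time value of None means "fall back to the policy set at init"; `resolve_policy` and the enum are hypothetical names defined only for this illustration.

```python
from enum import Enum
from typing import Optional


class DuplicatePolicy(Enum):
    # Toy stand-in for the policy values used by the component.
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def resolve_policy(
    init_policy: DuplicatePolicy,
    run_policy: Optional[DuplicatePolicy] = None,
) -> DuplicatePolicy:
    # Assumption: None at run time means "use the policy configured at init".
    return init_policy if run_policy is None else run_policy


print(resolve_policy(DuplicatePolicy.SKIP).name)  # SKIP
print(resolve_policy(DuplicatePolicy.SKIP, DuplicatePolicy.OVERWRITE).name)  # OVERWRITE
```

Note that None (no override supplied) and DuplicatePolicy.NONE (defer to the store's settings) are different values, so passing DuplicatePolicy.NONE at run time would count as an explicit override under this assumption.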