Module files
Sync client for the files workflow.
upload
def upload(paths: List[Path],
api_key: Optional[str] = None,
api_url: Optional[str] = None,
workspace_name: str = DEFAULT_WORKSPACE_NAME,
write_mode: WriteMode = WriteMode.KEEP,
blocking: bool = True,
timeout_s: Optional[int] = None,
show_progress: bool = True,
recursive: bool = False,
desired_file_types: Optional[List[str]] = None,
enable_parallel_processing: bool = False,
safe_mode: bool = False) -> S3UploadSummary
Upload a folder to deepset AI Platform.
Arguments:
- paths: A list of paths to the folders to upload. If a folder contains unsupported file types, they're skipped. deepset AI Platform supports csv, docx, html, json, md, txt, pdf, pptx, xlsx, xml.
- api_key: deepset API key to use for authentication.
- api_url: API URL to use for authentication.
- workspace_name: Name of the workspace to upload the files to. It uses the workspace from the .ENV file by default.
- write_mode: Specifies what to do when a file with the same name already exists in the workspace. Possible options are: KEEP - uploads the file with the same name and keeps both files in the workspace. OVERWRITE - overwrites the file that is in the workspace. FAIL - fails to upload the file with the same name.
- blocking: Whether to wait for the files to be uploaded and displayed in deepset AI Platform.
- timeout_s: Timeout in seconds for the `blocking` parameter.
- show_progress: Shows the upload progress.
- recursive: Uploads files from subfolders as well.
- desired_file_types: A list of allowed file types to upload. If not provided, all files are uploaded.
- enable_parallel_processing: If `True`, deepset AI Platform ingests files in parallel. Use this to speed up the upload process. Make sure you are not running concurrent uploads for the same files.
- safe_mode: If `True`, disables ingesting files in parallel.
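Example:
A minimal sketch based on the signature above; the folder path, API key, and workspace name are placeholders.
from pathlib import Path
from deepset_cloud_sdk.workflows.sync_client.files import upload

upload(
    paths=[Path("./data")],  # folder to upload; unsupported file types are skipped
    api_key="<deepsetCloud_API_key>",
    workspace_name="<default_workspace>",  # optional, defaults to the DEFAULT_WORKSPACE_NAME environment variable
    recursive=True,  # also upload files from subfolders
    blocking=True,  # wait until the files are displayed in deepset AI Platform
    timeout_s=300,
)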
download
def download(workspace_name: str = DEFAULT_WORKSPACE_NAME,
file_dir: Optional[Union[Path, str]] = None,
name: Optional[str] = None,
odata_filter: Optional[str] = None,
include_meta: bool = True,
batch_size: int = 50,
api_key: Optional[str] = None,
api_url: Optional[str] = None,
show_progress: bool = True,
timeout_s: Optional[int] = None,
safe_mode: bool = False) -> None
Download files from deepset AI Platform.
Downloads all files from a workspace to a local folder.
Arguments:
- workspace_name: Name of the workspace to download the files from. It uses the workspace from the .ENV file by default.
- file_dir: Path to the local folder to download the files to.
- name: Name of the file to filter by.
- odata_filter: OData filter to apply to the file metadata.
- include_meta: Whether to include the file metadata in the folder.
- batch_size: Batch size for the listing.
- api_key: API key to use for authentication.
- api_url: API URL to use for authentication.
- show_progress: Shows the download progress.
- timeout_s: Timeout in seconds for the API requests.
- safe_mode: If `True`, disables ingesting files in parallel.
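Example:
A minimal sketch based on the signature above; the workspace name, target folder, and filter are placeholders.
from pathlib import Path
from deepset_cloud_sdk.workflows.sync_client.files import download

download(
    workspace_name="<default_workspace>",  # optional, defaults to the DEFAULT_WORKSPACE_NAME environment variable
    file_dir=Path("./downloaded_files"),  # local folder to download the files into
    odata_filter="category eq 'news'",  # optional, download only files matching this metadata filter
    include_meta=True,  # also write each file's metadata alongside it
)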
upload_texts
def upload_texts(files: List[DeepsetCloudFile],
api_key: Optional[str] = None,
api_url: Optional[str] = None,
workspace_name: str = DEFAULT_WORKSPACE_NAME,
write_mode: WriteMode = WriteMode.KEEP,
blocking: bool = True,
timeout_s: Optional[int] = None,
show_progress: bool = True,
enable_parallel_processing: bool = False) -> S3UploadSummary
Upload texts to deepset AI Platform.
Arguments:
- files: List of DeepsetCloudFiles to upload.
- api_key: deepset API key to use for authentication.
- api_url: API URL to use for authentication.
- workspace_name: Name of the workspace to upload the files to. It uses the workspace from the .ENV file by default.
- write_mode: Specifies what to do when a file with the same name already exists in the workspace. Possible options are: KEEP - uploads the file with the same name and keeps both files in the workspace. OVERWRITE - overwrites the file that is in the workspace. FAIL - fails to upload the file with the same name.
- blocking: Whether to wait for the files to be uploaded and listed in deepset AI Platform.
- timeout_s: Timeout in seconds for the `blocking` parameter.
- show_progress: Shows the upload progress.
- enable_parallel_processing: If `True`, deepset AI Platform ingests files in parallel. Use this to speed up the upload process. Make sure you are not running concurrent uploads for the same files.
Example:
from deepset_cloud_sdk.workflows.sync_client.files import upload_texts, DeepsetCloudFile
upload_texts(
    api_key="<deepsetCloud_API_key>",
    workspace_name="<default_workspace>",  # optional, by default the environment variable "DEFAULT_WORKSPACE_NAME" is used
    files=[
        DeepsetCloudFile(
            name="example.txt",
            text="this is text",
            meta={"key": "value"},  # optional
        )
    ],
    blocking=True,  # optional, by default True
    timeout_s=300,  # optional, by default 300
)
upload_bytes
def upload_bytes(files: List[DeepsetCloudFileBytes],
api_key: Optional[str] = None,
api_url: Optional[str] = None,
workspace_name: str = DEFAULT_WORKSPACE_NAME,
write_mode: WriteMode = WriteMode.KEEP,
blocking: bool = True,
timeout_s: Optional[int] = None,
show_progress: bool = True,
enable_parallel_processing: bool = False) -> S3UploadSummary
Upload any supported file types to deepset AI Platform. These include .csv, .docx, .html, .json, .md, .txt, .pdf, .pptx, .xlsx and .xml.
Arguments:
- files: List of DeepsetCloudFileBytes to upload.
- api_key: deepset API key to use for authentication.
- api_url: API URL to use for authentication.
- workspace_name: Name of the workspace to upload the files to. It uses the workspace from the .ENV file by default.
- write_mode: Specifies what to do when a file with the same name already exists in the workspace. Possible options are: KEEP - uploads the file with the same name and keeps both files in the workspace. OVERWRITE - overwrites the file that is in the workspace. FAIL - fails to upload the file with the same name.
- blocking: Whether to wait for the files to be uploaded and listed in deepset AI Platform.
- timeout_s: Timeout in seconds for the `blocking` parameter.
- show_progress: Shows the upload progress.
- enable_parallel_processing: If `True`, deepset AI Platform ingests files in parallel. Use this to speed up the upload process. Make sure you are not running concurrent uploads for the same files.
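Example:
A minimal sketch; it assumes DeepsetCloudFileBytes mirrors DeepsetCloudFile with name, file_bytes, and optional meta fields, and that it's importable from the same module as upload_bytes.
from deepset_cloud_sdk.workflows.sync_client.files import upload_bytes, DeepsetCloudFileBytes

with open("example.pdf", "rb") as f:  # read a local PDF into memory
    raw_bytes = f.read()

upload_bytes(
    files=[
        DeepsetCloudFileBytes(
            name="example.pdf",
            file_bytes=raw_bytes,  # assumed field name
            meta={"source": "local"},  # optional
        )
    ],
    api_key="<deepsetCloud_API_key>",
    workspace_name="<default_workspace>",
)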
get_upload_session
def get_upload_session(
session_id: UUID,
api_key: Optional[str] = None,
api_url: Optional[str] = None,
workspace_name: str = DEFAULT_WORKSPACE_NAME) -> UploadSessionStatus
Get the status of an upload session.
Arguments:
- session_id: ID of the upload session to get the status for.
- api_key: deepset API key to use for authentication.
- api_url: API URL to use for authentication.
- workspace_name: Name of the workspace the upload session belongs to.
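Example:
A minimal sketch; the session ID below is a placeholder.
from uuid import UUID
from deepset_cloud_sdk.workflows.sync_client.files import get_upload_session

status = get_upload_session(
    session_id=UUID("00000000-0000-0000-0000-000000000000"),  # placeholder session ID
    api_key="<deepsetCloud_API_key>",
    workspace_name="<default_workspace>",
)
print(status)  # UploadSessionStatus with the session's ingestion progress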
list_files
def list_files(
api_key: Optional[str] = None,
api_url: Optional[str] = None,
workspace_name: str = DEFAULT_WORKSPACE_NAME,
name: Optional[str] = None,
odata_filter: Optional[str] = None,
batch_size: int = 100,
timeout_s: Optional[int] = None) -> Generator[List[File], None, None]
List files in a deepset workspace.
Arguments:
- api_key: deepset API key to use for authentication.
- api_url: API URL to use for authentication.
- workspace_name: Name of the workspace to list the files from. It uses the workspace from the .ENV file by default.
- name: Name of the file to filter for.
- odata_filter: OData filter to apply to the file list. For example, odata_filter="category eq 'news'" lists files with this metadata: {"meta": {"category": "news"}}
- batch_size: Batch size to use for the file list.
- timeout_s: Timeout in seconds for the API requests.
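Example:
A minimal sketch; note that list_files yields batches (lists) of File objects rather than single files. The name attribute is assumed for illustration.
from deepset_cloud_sdk.workflows.sync_client.files import list_files

for file_batch in list_files(
    workspace_name="<default_workspace>",
    odata_filter="category eq 'news'",  # optional metadata filter
    batch_size=100,
):
    for file in file_batch:
        print(file.name)  # assumed attribute on File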
list_upload_sessions
def list_upload_sessions(
api_key: Optional[str] = None,
api_url: Optional[str] = None,
workspace_name: str = DEFAULT_WORKSPACE_NAME,
is_expired: Optional[bool] = False,
batch_size: int = 100,
timeout_s: Optional[int] = None
) -> Generator[List[UploadSessionDetail], None, None]
List the details of all upload sessions, including the closed ones.
Arguments:
- api_key: deepset API key to use for authentication.
- api_url: API URL to use for authentication.
- workspace_name: Name of the workspace whose sessions you want to list. It uses the workspace from the .ENV file by default.
- is_expired: Lists expired sessions.
- batch_size: Batch size to use for the session list.
- timeout_s: Timeout in seconds for the API request.
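Example:
A minimal sketch; list_upload_sessions yields batches (lists) of UploadSessionDetail objects.
from deepset_cloud_sdk.workflows.sync_client.files import list_upload_sessions

for session_batch in list_upload_sessions(
    workspace_name="<default_workspace>",
    is_expired=False,  # set to True to list expired sessions instead
):
    for session in session_batch:
        print(session)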
Module pipeline_client
Sync pipeline client for importing pipelines and indexes to deepset AI Platform.
PipelineClient
class PipelineClient()
Sync client for importing Haystack pipelines and indexes to deepset AI platform.
Example for importing a Haystack pipeline or index to deepset AI platform:
from deepset_cloud_sdk import (
    PipelineClient,
    PipelineConfig,
    PipelineInputs,
    PipelineOutputs,
    IndexConfig,
    IndexInputs,
)
from haystack import Pipeline

# Initialize the client with configuration from environment variables (after running `deepset-cloud login`)
client = PipelineClient()

# Or initialize the client with explicit configuration
client = PipelineClient(
    api_key="your-api-key",
    workspace_name="your-workspace",
    api_url="https://api.cloud.deepset.ai/api/v1",
)

# Configure your pipeline
pipeline = Pipeline()

# Configure the import
# If importing a pipeline, use PipelineConfig
config = PipelineConfig(
    name="my-pipeline",
    inputs=PipelineInputs(
        query=["prompt_builder.query"],
        filters=["bm25_retriever.filters", "embedding_retriever.filters"],
    ),
    outputs=PipelineOutputs(
        answers="answers_builder.answers",
        documents="ranker.documents",
    ),
    strict_validation=False,  # if True, fail on validation errors; by default only warnings are shown
    overwrite=False,  # if True, overwrite an existing pipeline with the same name, creating it if it doesn't exist
)

# If importing an index, use IndexConfig
config = IndexConfig(
    name="my-index",
    inputs=IndexInputs(files=["file_type_router.sources"]),
    strict_validation=False,  # if True, fail on validation errors; by default only warnings are shown
    overwrite=False,  # if True, overwrite an existing index with the same name, creating it if it doesn't exist
)

# Sync execution
client.import_into_deepset(pipeline, config)
PipelineClient.__init__
def __init__(api_key: str | None = None,
workspace_name: str | None = None,
api_url: str | None = None) -> None
Initialize the Pipeline Client.
The client can be configured in two ways:
- Using environment variables (recommended): run `deepset-cloud login` to set up the following environment variables:
  - `API_KEY`: Your deepset AI platform API key.
  - `API_URL`: The URL of the deepset AI platform API.
  - `DEFAULT_WORKSPACE_NAME`: The workspace name to use.
- Using explicit parameters: provide the values directly to this constructor. Any missing parameters fall back to environment variables.
Arguments:
- api_key: Your deepset AI platform API key. Falls back to the `API_KEY` environment variable.
- workspace_name: The workspace to use. Falls back to the `DEFAULT_WORKSPACE_NAME` environment variable.
- api_url: The URL of the deepset AI platform API. Falls back to the `API_URL` environment variable.
Raises:
ValueError: If no API key or workspace name is provided and `API_KEY` or `DEFAULT_WORKSPACE_NAME` is not set in the environment.
PipelineClient.import_into_deepset
def import_into_deepset(pipeline: PipelineProtocol,
config: IndexConfig | PipelineConfig) -> None
Import a Haystack Pipeline or AsyncPipeline into deepset AI Platform synchronously.
The pipeline must be imported as either an index or a pipeline:
- An index: Processes files and stores them in a document store, making them available for pipelines to search.
- A pipeline: For other use cases, for example, searching through documents stored by index pipelines.
Arguments:
- pipeline: The Haystack `Pipeline` or `AsyncPipeline` to import.
- config: Configuration for importing into deepset. Use `IndexConfig` when importing an index and `PipelineConfig` when importing a pipeline.