servicex package

Subpackages

Submodules

servicex.configuration module

pydantic model servicex.configuration.Configuration[source]

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Validators:
field api_endpoints: List[Endpoint] [Required]
field default_endpoint: str | None = None (alias 'default-endpoint')
field cache_path: str | None = None
field shortened_downloaded_filename: bool | None = False
validator expand_cache_path  »  all fields[source]

Expand the cache path to a full path, creating it if it doesn’t exist. ${USER} is expanded to the user name on the system; this works on Windows, too.
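A standalone sketch of this expansion using os.path.expandvars; the USERNAME fallback for Windows is an assumption about how the real validator handles that platform:

```python
import os
from pathlib import Path

def expand_cache_path(v: str) -> str:
    # On Windows the user name usually lives in USERNAME rather than USER;
    # mirror it so that ${USER} expands on both platforms (assumption).
    if "USER" not in os.environ and "USERNAME" in os.environ:
        os.environ["USER"] = os.environ["USERNAME"]
    # Expand ${USER} (and any other environment variables) plus a leading ~.
    expanded = Path(os.path.expandvars(v)).expanduser()
    # Create the directory if it doesn't already exist.
    expanded.mkdir(parents=True, exist_ok=True)
    return str(expanded)
```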

endpoint_dict() Dict[str, Endpoint][source]
classmethod read(config_path: str | None = None)[source]

Read configuration from a .servicex or servicex.yaml file.

Parameters:

config_path – If provided, use this as the path to the .servicex file. Otherwise, search for one starting from the current working directory and moving up through enclosing directories.

Returns:

Populated configuration object
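The upward search can be sketched standalone. The candidate filenames come from the description above; the order in which the two names are tried is an assumption:

```python
from pathlib import Path
from typing import Optional

def find_config_file(start: Optional[str] = None) -> Optional[Path]:
    # Walk from the starting directory up through enclosing directories,
    # returning the first .servicex or servicex.yaml file found.
    directory = Path(start or Path.cwd()).resolve()
    candidates = [".servicex", "servicex.yaml"]  # try order is an assumption
    while True:
        for name in candidates:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        if directory.parent == directory:  # reached the filesystem root
            return None
        directory = directory.parent
```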

pydantic model servicex.configuration.Endpoint[source]

Bases: BaseModel

field endpoint: str [Required]
field name: str [Required]
field token: str | None = ''

servicex.databinder_models module

pydantic model servicex.databinder_models.General[source]

Bases: DocStringBaseModel

Represents a group of samples to be transformed together.

Parameters:
  • Codegen – (typing.Optional[str]) Code generator name to be applied across all of the samples, if applicable. Generally users don’t need to specify this. It is implied by the query class

  • OutputFormat – (OutputFormatEnum) Output format for the transform request.

  • Delivery – (DeliveryEnum) Specifies the delivery method for the output files.

  • OutputDirectory – (typing.Optional[str]) Directory to output a yaml file describing the output files.

  • OutFilesetName – (str) Name of the yaml file that will be created in the output directory.

  • IgnoreLocalCache – (bool) Flag to ignore local cache for all samples.

class OutputFormatEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Specifies the output format for the transform request.

parquet = 'parquet'

Save the output as a parquet file https://parquet.apache.org/

root_ttree = 'root-ttree'

Save the output as a ROOT TTree https://root.cern.ch/doc/master/classTTree.html

to_ResultFormat() ResultFormat[source]

Converts the OutputFormatEnum enum to the ResultFormat enum, which is what is actually used in the TransformRequest. This allows the two enum classes to use different string values while maintaining backend compatibility.
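The string values of the two enums differ for the ROOT member ('root-ttree' on this page versus 'root-file' under ResultFormat below), so the conversion has to map by member name rather than by value. A standalone sketch, with the values taken from this page (the real method lives on OutputFormatEnum; the free function is illustrative):

```python
from enum import Enum

class OutputFormatEnum(str, Enum):
    parquet = "parquet"
    root_ttree = "root-ttree"   # user-facing spelling

class ResultFormat(str, Enum):
    parquet = "parquet"
    root_ttree = "root-file"    # spelling the backend expects

def to_result_format(fmt: OutputFormatEnum) -> ResultFormat:
    # Map by member name, not by value, since the string values differ.
    return ResultFormat[fmt.name]
```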

class DeliveryEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

LocalCache = 'LocalCache'

Download the files to the local computer and store them in the cache. Transform requests will return paths to these files in the cache

URLs = 'URLs'

Return URLs to the files stored in the ServiceX object store

field Codegen: str | None = None

Code generator name to be applied across all of the samples, if applicable. Generally users don’t need to specify this. It is implied by the query class

field OutputFormat: OutputFormatEnum = OutputFormatEnum.root_ttree

Output format for the transform request.

field Delivery: DeliveryEnum = DeliveryEnum.LocalCache

Specifies the delivery method for the output files.

field OutputDirectory: str | None = None

Directory to output a yaml file describing the output files.

field OutFilesetName: str = 'servicex_fileset'

Name of the yaml file that will be created in the output directory.

field IgnoreLocalCache: bool = False

Flag to ignore local cache for all samples.

pydantic model servicex.databinder_models.Sample[source]

Bases: DocStringBaseModel

Represents a single transform request within a larger submission.

Parameters:
  • Name – (str)

  • Dataset – (typing.Optional[servicex.dataset_identifier.DataSetIdentifier])

  • NFiles – (typing.Optional[int])

  • Query – (typing.Union[str, servicex.query_core.QueryStringGenerator, NoneType])

  • IgnoreLocalCache – (bool)

  • Codegen – (typing.Optional[str])

  • RucioDID – (typing.Optional[str])

  • XRootDFiles – (typing.Union[str, typing.List[str], NoneType])

Validators:
field Name: str [Required]

The name of the sample. This makes it easier to identify the sample in the output.

field Dataset: DataSetIdentifier | None = None

Dataset identifier for the sample

field NFiles: int | None = None

Limit the number of files to be used in the sample. The DID finder guarantees that the same files will be returned on each invocation. Set to None to use all files.

field Query: str | QueryStringGenerator | None = None

Query string or query generator for the sample.

field IgnoreLocalCache: bool = False

Flag to ignore local cache.

field Codegen: str | None = None

Code generator name, if applicable. Generally users don’t need to specify this. It is implied by the query class

field RucioDID: str | None = None
Rucio Dataset Identifier, if applicable.

Deprecated: Use ‘Dataset’ instead.

field XRootDFiles: str | List[str] | None = None
XRootD file(s) associated with the sample.

Deprecated: Use ‘Dataset’ instead.

property dataset_identifier: DataSetIdentifier

Access the dataset identifier for the sample.

validator validate_did_xor_file  »  all fields[source]

Ensure that exactly one of Dataset, XRootDFiles, or RucioDID is specified.
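The exclusivity check can be sketched as a plain function over the field values. The field names are taken from this model; the error message is illustrative:

```python
def validate_did_xor_file(values: dict) -> dict:
    # Exactly one of the dataset-specifying fields may be set.
    specified = [k for k in ("Dataset", "RucioDID", "XRootDFiles")
                 if values.get(k) is not None]
    if len(specified) != 1:
        raise ValueError(
            f"Exactly one of Dataset, RucioDID, or XRootDFiles "
            f"must be specified; got {specified}"
        )
    return values
```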

validator validate_nfiles_is_not_zero  »  all fields[source]

Ensure that NFiles is not set to zero

validator truncate_long_sample_name  »  Name[source]

Truncate the sample name to 128 characters if it exceeds that length, and print a warning message.
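A standalone sketch of the truncation. The 128-character limit is stated above; the warning text is illustrative:

```python
import warnings

MAX_SAMPLE_NAME_LEN = 128  # limit stated in the validator's description

def truncate_long_sample_name(name: str) -> str:
    # Truncate over-long sample names and warn, rather than failing validation.
    if len(name) > MAX_SAMPLE_NAME_LEN:
        warnings.warn(
            f"Sample name longer than {MAX_SAMPLE_NAME_LEN} characters; truncating."
        )
        return name[:MAX_SAMPLE_NAME_LEN]
    return name
```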

property hash
pydantic model servicex.databinder_models.ServiceXSpec[source]

Bases: DocStringBaseModel

ServiceX Submission Specification - pass this into the ServiceX deliver function

Parameters:
  • General – (General) General settings for the transform request

  • Sample – (typing.List[servicex.databinder_models.Sample]) List of samples to be transformed

  • Definition – (typing.Optional[typing.List]) Any reusable definitions that are needed for the transform request

Validators:
field General: General = General(Codegen=None, OutputFormat=<OutputFormatEnum.root_ttree: 'root-ttree'>, Delivery=<DeliveryEnum.LocalCache: 'LocalCache'>, OutputDirectory=None, OutFilesetName='servicex_fileset', IgnoreLocalCache=False)

General settings for the transform request

field Sample: List[Sample] [Required]

List of samples to be transformed

field Definition: List | None = None

Any reusable definitions that are needed for the transform request

validator validate_unique_sample  »  Sample[source]

servicex.dataset_group module

class servicex.dataset_group.DatasetGroup(datasets: List[Query])[source]

Bases: object

A group of datasets that are to be transformed together. This is a convenience class to allow you to submit multiple datasets to a ServiceX instance and then wait for all of them to complete.

Parameters:

datasets – List of transform request as dataset instances

as_files(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException]
async as_files_async(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException][source]
as_signed_urls(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException]
async as_signed_urls_async(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException][source]
set_result_format(result_format: ResultFormat)[source]

Set the result format for all the datasets in the group.

Parameters:

result_format – ResultFormat instance

servicex.dataset_identifier module

class servicex.dataset_identifier.CERNOpenDataDatasetIdentifier(dataset: int, num_files: int | None = None)[source]

Bases: DataSetIdentifier

CERN Open Data Dataset - this will be looked up using the CERN Open Data DID finder.

Parameters:
  • dataset – The dataset ID - this is an integer.

  • num_files – Maximum number of files to return. This is useful during development to perform quick runs. ServiceX is careful to make sure it always returns the same subset of files.

classmethod from_yaml(_, node)[source]
yaml_tag = '!CERNOpenData'
class servicex.dataset_identifier.DataSetIdentifier(scheme: str, dataset: str, num_files: int | None = None)[source]

Bases: object

Base class for specifying the dataset to transform. This can either be a list of XRootD URIs or a Rucio DID.

property did
property hash
populate_transform_request(transform_request: TransformRequest) None[source]
class servicex.dataset_identifier.FileListDataset(files: List[str] | str)[source]

Bases: DataSetIdentifier

Dataset specified as a list of XRootD URIs.

Parameters:

files – Either a list of URIs or a single URI string

property did
files: List[str]
classmethod from_yaml(constructor, node)[source]
property hash
num_files: int | None
populate_transform_request(transform_request: TransformRequest) None[source]
yaml_tag = '!FileList'
class servicex.dataset_identifier.RucioDatasetIdentifier(dataset: str, num_files: int | None = None)[source]

Bases: DataSetIdentifier

Rucio Dataset - this will be looked up using the Rucio data management service.

Parameters:
  • dataset – The rucio DID - this can be a dataset or a container of datasets.

  • num_files – Maximum number of files to return. This is useful during development to perform quick runs. ServiceX is careful to make sure it always returns the same subset of files.

classmethod from_yaml(_, node)[source]
yaml_tag = '!Rucio'
class servicex.dataset_identifier.XRootDDatasetIdentifier(pattern: str, num_files: int | None = None)[source]

Bases: DataSetIdentifier

XRootD dataset - files will be located by resolving the URL pattern with the XRootD DID finder.

Parameters:
  • pattern – The XRootD URL pattern identifying the files; may contain wildcards.

  • num_files – Maximum number of files to return. This is useful during development to perform quick runs. ServiceX is careful to make sure it always returns the same subset of files.

classmethod from_yaml(_, node)[source]
yaml_tag = '!XRootD'

servicex.expandable_progress module

class servicex.expandable_progress.ExpandableProgress(display_progress: bool = True, provided_progress: Progress | ExpandableProgress | None = None, overall_progress: bool = False)[source]

Bases: object

We want to be able to use rich progress bars in the async code, but there are some situations where the user doesn’t want them. Also, we might be running several simultaneous progress bars, and we want to be able to control that.

We still want to keep the context manager interface, so this class implements the context manager, but if display_progress is False it does nothing. If provided_progress is set then we just use that; otherwise we create a new progress bar.

Parameters:
  • display_progress

  • provided_progress
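The pattern described above can be sketched without rich itself: honor the context-manager interface, do nothing when display is disabled, and reuse a caller-provided progress object when one is given. This is illustrative; the real class wraps rich.progress.Progress:

```python
from contextlib import AbstractContextManager

class OptionalProgress(AbstractContextManager):
    # Sketch of the ExpandableProgress pattern (illustrative names).
    def __init__(self, display_progress=True, provided_progress=None):
        self.display_progress = display_progress
        self.progress = provided_progress
        self.created_new = False

    def __enter__(self):
        # Only create a new progress bar if display is on and none was provided.
        if self.display_progress and self.progress is None:
            self.progress = {}  # stand-in for a newly created progress bar
            self.created_new = True
        return self

    def __exit__(self, *exc_info):
        return False  # never suppress exceptions
```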

add_task(param, start, total)[source]
advance(task_id, task_type)[source]
refresh()[source]
start_task(task_id, task_type)[source]
update(task_id, task_type, total=None, completed=None, **fields)[source]
class servicex.expandable_progress.ProgressCounts(description: str, task_id: TaskID, start: int | None = None, total: int | None = None, completed: int | None = None)[source]

Bases: object

class servicex.expandable_progress.TranformStatusProgress(*columns: str | ProgressColumn, console: Console | None = None, auto_refresh: bool = True, refresh_per_second: float = 10, speed_estimate_period: float = 30.0, transient: bool = False, redirect_stdout: bool = True, redirect_stderr: bool = True, get_time: Callable[[], float] | None = None, disable: bool = False, expand: bool = False)[source]

Bases: Progress

get_renderables()[source]

Get a number of renderables for the progress display.

servicex.minio_adapter module

class servicex.minio_adapter.MinioAdapter(endpoint_host: str, secure: bool, access_key: str, secret_key: str, bucket: str)[source]

Bases: object

MAX_PATH_LEN = 60
async download_file(object_name: str, local_dir: str, shorten_filename: bool = False) Path[source]
classmethod for_transform(transform: TransformStatus)[source]
async get_signed_url(object_name: str) str[source]
classmethod hash_path(file_name)[source]

Make the path safe for object store or POSIX by keeping the length less than MAX_PATH_LEN. Replaces the leading (less interesting) characters with a forty-character hash.

Parameters:

file_name – Input filename

Returns:

Safe path string
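A standalone sketch of the shortening scheme. The forty-character length suggests a SHA-1 hex digest, but the actual hash function used is an assumption:

```python
import hashlib

MAX_PATH_LEN = 60

def hash_path(file_name: str) -> str:
    # Short names pass through unchanged.
    if len(file_name) <= MAX_PATH_LEN:
        return file_name
    # Replace the leading (least interesting) characters with a 40-character
    # hash of the full name; SHA-1 is an assumption here.
    digest = hashlib.sha1(file_name.encode()).hexdigest()  # 40 hex chars
    keep = MAX_PATH_LEN - len(digest)  # trailing characters to preserve
    return digest + file_name[-keep:]
```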

async list_bucket() List[ResultFile][source]

servicex.models module

pydantic model servicex.models.CachedDataset[source]

Bases: BaseModel

Model for a cached dataset held by ServiceX server

field id: int [Required]
field name: str [Required]
field did_finder: str [Required]
field n_files: int [Required]
field size: int [Required]
field events: int [Required]
field last_used: datetime [Required]
field last_updated: datetime [Required]
field lookup_status: str [Required]
field is_stale: bool [Required]
field files: List[DatasetFile] | None = []
pydantic model servicex.models.DatasetFile[source]

Bases: BaseModel

Model for a file in a cached dataset

field id: int [Required]
field adler32: str | None [Required]
field file_size: int [Required]
field file_events: int [Required]
field paths: str [Required]
pydantic model servicex.models.DocStringBaseModel[source]

Bases: BaseModel

Class to autogenerate a docstring for a Pydantic model

class servicex.models.ResultDestination(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Direct the output to object store or posix volume

object_store = 'object-store'
volume = 'volume'
pydantic model servicex.models.ResultFile[source]

Bases: DocStringBaseModel

Record reporting the properties of a transformed file result

Parameters:
  • filename – (str)

  • size – (int)

  • extension – (str)

field filename: str [Required]
field size: int [Required]
field extension: str [Required]
class servicex.models.ResultFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Specify the file format for the generated output

parquet = 'parquet'
root_ttree = 'root-file'
class servicex.models.Status(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Status of a submitted transform

canceled = 'Canceled'
complete = 'Complete'
fatal = 'Fatal'
looking = 'Lookup'
pending = 'Pending Lookup'
running = 'Running'
submitted = 'Submitted'
pydantic model servicex.models.TransformRequest[source]

Bases: DocStringBaseModel

Transform request sent to ServiceX

Parameters:
  • title – (typing.Optional[str])

  • did – (typing.Optional[str])

  • file_list – (typing.Optional[typing.List[str]])

  • selection – (str)

  • image – (typing.Optional[str])

  • codegen – (str)

  • tree_name – (typing.Optional[str])

  • result_destination – (ResultDestination)

  • result_format – (ResultFormat)

field title: str | None = None
field did: str | None = None
field file_list: List[str] | None = None (alias 'file-list')
field selection: str [Required]
field image: str | None = None
field codegen: str [Required]
field tree_name: str | None = None (alias 'tree-name')
field result_destination: ResultDestination [Required]
field result_format: ResultFormat [Required]
compute_hash()[source]

Compute a hash for this submission. Only include properties that impact the result so we have maximal ability to reuse transforms

Returns:

SHA256 hash of request
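The idea can be sketched standalone: hash only result-affecting properties via canonical JSON, so cosmetic fields like title never defeat cache reuse. The exact field set below is an assumption, not the library's definitive list:

```python
import hashlib
import json

def compute_hash(request: dict) -> str:
    # Hash only the properties that affect the transform's output, so that
    # cosmetic fields (like title) don't prevent cache reuse.  The field
    # set here is illustrative.
    relevant = {k: request.get(k)
                for k in ("did", "file-list", "selection", "codegen",
                          "tree-name", "result-format")}
    canonical = json.dumps(relevant, sort_keys=True)  # stable key order
    return hashlib.sha256(canonical.encode()).hexdigest()
```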

pydantic model servicex.models.TransformStatus[source]

Bases: DocStringBaseModel

Status object returned by servicex

Parameters:
  • request_id – (str)

  • did – (str)

  • title – (typing.Optional[str])

  • selection – (str)

  • tree_name – (typing.Optional[str])

  • image – (str)

  • result_destination – (ResultDestination)

  • result_format – (ResultFormat)

  • generated_code_cm – (str)

  • status – (Status)

  • app_version – (str)

  • files – (int)

  • files_completed – (int)

  • files_failed – (int)

  • files_remaining – (typing.Optional[int])

  • submit_time – (datetime)

  • finish_time – (typing.Optional[datetime.datetime])

  • minio_endpoint – (typing.Optional[str])

  • minio_secured – (typing.Optional[bool])

  • minio_access_key – (typing.Optional[str])

  • minio_secret_key – (typing.Optional[str])

  • log_url – (typing.Optional[str])

Validators:
field request_id: str [Required]
field did: str [Required]
field title: str | None = None
field selection: str [Required]
field tree_name: str | None [Required]
field image: str [Required]
field result_destination: ResultDestination [Required]
field result_format: ResultFormat [Required]
field generated_code_cm: str [Required]
field status: Status [Required]
field app_version: str [Required]
field files: int [Required]
field files_completed: int [Required]
field files_failed: int [Required]
field files_remaining: int | None = 0
field submit_time: datetime = None
field finish_time: datetime | None = None
field minio_endpoint: str | None = None
field minio_secured: bool | None = None
field minio_access_key: str | None = None
field minio_secret_key: str | None = None
field log_url: str | None = None
validator parse_finish_time  »  finish_time[source]
pydantic model servicex.models.TransformedResults[source]

Bases: DocStringBaseModel

Returned for a submission. Gives you everything you need to know about a completed transform.

Parameters:
  • hash – (str) Unique hash for transformation (used to look up results in cache)

  • title – (str) Title of transformation request

  • codegen – (str) Code generator used (internal ServiceX information related to query type)

  • request_id – (str) Associated request ID from the ServiceX server

  • submit_time – (datetime) Time of submission

  • data_dir – (str) Local directory for output

  • file_list – (typing.List[str]) List of downloaded files on local disk

  • signed_url_list – (typing.List[str]) List of URLs to retrieve output from remote ServiceX object store

  • files – (int) Number of files in result

  • result_format – (ResultFormat) File format for results

  • log_url – (typing.Optional[str]) URL for looking up logs on the ServiceX server

field hash: str [Required]

Unique hash for transformation (used to look up results in cache)

field title: str [Required]

Title of transformation request

field codegen: str [Required]

Code generator used (internal ServiceX information related to query type)

field request_id: str [Required]

Associated request ID from the ServiceX server

field submit_time: datetime [Required]

Time of submission

field data_dir: str [Required]

Local directory for output

field file_list: List[str] [Required]

List of downloaded files on local disk

field signed_url_list: List[str] [Required]

List of URLs to retrieve output from remote ServiceX object store

field files: int [Required]

Number of files in result

field result_format: ResultFormat [Required]

File format for results

field log_url: str | None = None

URL for looking up logs on the ServiceX server

servicex.python_dataset module

class servicex.python_dataset.PythonFunction(python_function: str | Callable | None = None)[source]

Bases: QueryStringGenerator

default_codegen: str | None = 'python'
classmethod from_yaml(_, node)[source]
generate_selection_string() str[source]

Override with the selection string to send to ServiceX.

with_uproot_function(f: str | Callable) Self[source]
yaml_tag = '!PythonFunction'

servicex.query module

servicex.query_cache module

exception servicex.query_cache.CacheException[source]

Bases: Exception

class servicex.query_cache.QueryCache(config: Configuration)[source]

Bases: object

cache_path_for_transform(transform_status: TransformStatus) Path[source]
cache_transform(record: TransformedResults)[source]
cached_queries() List[TransformedResults][source]
close()[source]
contains_hash(hash: str) bool[source]

Check if the cache has completed records for a hash

delete_codegen_by_backend(backend: str)[source]
delete_record_by_hash(hash: str)[source]
delete_record_by_request_id(request_id: str)[source]
get_codegen_by_backend(backend: str) dict | None[source]
get_transform_by_hash(hash: str) TransformedResults | None[source]

Returns completed transformations by hash

get_transform_by_request_id(request_id: str) TransformedResults | None[source]

Returns completed transformed results using a request id

get_transform_request_id(hash_value: str) str | None[source]

Return the request id of cached record

is_transform_request_submitted(hash_value: str) bool[source]

Returns True if the request has been submitted; returns False if the request is not in the cache at all or has not been submitted.

transformed_results(transform: TransformRequest, completed_status: TransformStatus, data_dir: str, file_list: List[str], signed_urls) TransformedResults[source]
update_codegen_by_backend(backend: str, codegen_list: list)[source]
update_record(record: TransformedResults)[source]
update_transform_request_id(hash_value: str, request_id: str) None[source]

Update the cached record request id

update_transform_status(hash_value: str, status: str) None[source]

Update the cached record status

servicex.servicex_adapter module

exception servicex.servicex_adapter.AuthorizationError[source]

Bases: BaseException

class servicex.servicex_adapter.ServiceXAdapter(url: str, refresh_token: str | None = None)[source]

Bases: object

async cancel_transform(transform_id=None)[source]
async delete_dataset(dataset_id=None) bool[source]
async delete_transform(transform_id=None)[source]
get_code_generators()[source]
async get_dataset(dataset_id=None) CachedDataset[source]
async get_datasets(did_finder=None, show_deleted=False) List[CachedDataset][source]
async get_transform_status(request_id: str) TransformStatus[source]
async get_transforms() List[TransformStatus][source]
async submit_transform(transform_request: TransformRequest) str[source]

servicex.servicex_client module

class servicex.servicex_client.GuardList(data: Sequence | Exception)[source]

Bases: Sequence

valid() bool[source]
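The guard pattern can be sketched standalone: wrap either a successful result sequence or the exception that replaced it, and refuse element access when invalid. The error raised here is illustrative (the library defines ReturnValueException for this purpose):

```python
from collections.abc import Sequence

class GuardList(Sequence):
    # Wraps either a result sequence or the exception that prevented one.
    # valid() tells you which, before you index into it.  (Sketch only; the
    # real class's error behavior may differ.)
    def __init__(self, data):
        self._data = data

    def valid(self) -> bool:
        return not isinstance(self._data, Exception)

    def _check(self):
        if not self.valid():
            raise RuntimeError("result unavailable") from self._data

    def __getitem__(self, index):
        self._check()
        return self._data[index]

    def __len__(self):
        self._check()
        return len(self._data)
```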
exception servicex.servicex_client.ReturnValueException(exc)[source]

Bases: Exception

An exception occurred at some point while obtaining this result from ServiceX

class servicex.servicex_client.ServiceXClient(backend=None, url=None, config_path=None)[source]

Bases: object

Connection to a ServiceX deployment. Instances of this class can retrieve deployment data from the service and also interact with previously run transformations. Instances of this class are factories for Datasets.

If both backend and url are unspecified then it will attempt to pick up the default backend from .servicex

Parameters:
  • backend – Name of a deployment from the .servicex file

  • url – Direct URL of a serviceX deployment instead of using .servicex. Can only work with hosts without auth, or the token is found in a file pointed to by the environment variable BEARER_TOKEN_FILE

  • config_path – Optional path to the .servicex file. If not specified, will search in the local directory and up through enclosing directories

cancel_transform(transform_id)[source]

Cancel a Transform by its request ID :return: A Query object

delete_dataset(dataset_id)[source]

Delete a dataset by its ID :return: A Query object

delete_transform(transform_id)[source]

Delete a Transform by its request ID :return: A Query object

delete_transform_from_cache(transform_id: str)[source]
generic_query(dataset_identifier: DataSetIdentifier | FileListDataset, query: str | QueryStringGenerator, codegen: str | None = None, title: str = 'ServiceX Client', result_format: ResultFormat = ResultFormat.parquet, ignore_cache: bool = False, fail_if_incomplete: bool = True) Query[source]

Generate a Query object for a generic codegen specification

Parameters:
  • dataset_identifier – The dataset identifier or filelist to be the source of files

  • title – Title to be applied to the transform. This is also useful for relating transform results.

  • codegen – Name of the code generator to use with this transform

  • result_format – Do you want Parquet or ROOT? This can also be set later with the set_result_format method

  • ignore_cache – Ignore the query cache and always run the query

Returns:

A Query object

get_code_generators(backend=None)[source]

Retrieve the code generators deployed with the ServiceX instance :return: The list of code generators as a JSON dictionary

get_dataset(dataset_id)[source]

Retrieve a dataset by its ID :return: A Query object

get_datasets(did_finder=None, show_deleted=False)[source]

Retrieve all datasets you have run on the server :return: List of Query objects

get_transform_status(transform_id) TransformStatus

Get the status of a given transform :param transform_id: The uuid of the transform :return: The current status for the transform

async get_transform_status_async(transform_id) TransformStatus[source]

Get the status of a given transform :param transform_id: The uuid of the transform :return: The current status for the transform

get_transforms() List[TransformStatus]

Retrieve all transforms you have run on the server :return: List of Transform status objects

async get_transforms_async() List[TransformStatus][source]

Retrieve all transforms you have run on the server :return: List of Transform status objects

servicex.servicex_client.deliver(config: ServiceXSpec | Mapping[str, Any] | str | Path, config_path: str | None = None, servicex_name: str | None = None, return_exceptions: bool = True, fail_if_incomplete: bool = True, ignore_local_cache: bool = False)[source]
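Since deliver accepts a plain mapping as well as a ServiceXSpec, a submission can be sketched as a dict mirroring the fields documented above. All names, URIs, and the query string below are placeholders, not a real dataset or backend:

```python
# A submission spec in the plain-mapping form that deliver() accepts.
# Every name, URI, and query value here is a placeholder for illustration.
spec = {
    "General": {
        "OutputFormat": "root-ttree",   # OutputFormatEnum value
        "Delivery": "LocalCache",       # DeliveryEnum value
    },
    "Sample": [
        {
            "Name": "demo_sample",
            "XRootDFiles": "root://example.invalid//demo.root",  # placeholder
            "Query": "placeholder query",
        }
    ],
}

# In real use (needs a deployed ServiceX endpoint and a .servicex config):
#     from servicex import deliver
#     results = deliver(spec, servicex_name="my-backend")
```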

servicex.types module

Module contents

servicex.Delivery

alias of DeliveryEnum

pydantic model servicex.General[source]

Bases: DocStringBaseModel

Represents a group of samples to be transformed together.

Parameters:
  • Codegen – (typing.Optional[str]) Code generator name to be applied across all of the samples, if applicable. Generally users don’t need to specify this. It is implied by the query class

  • OutputFormat – (OutputFormatEnum) Output format for the transform request.

  • Delivery – (DeliveryEnum) Specifies the delivery method for the output files.

  • OutputDirectory – (typing.Optional[str]) Directory to output a yaml file describing the output files.

  • OutFilesetName – (str) Name of the yaml file that will be created in the output directory.

  • IgnoreLocalCache – (bool) Flag to ignore local cache for all samples.

class OutputFormatEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Specifies the output format for the transform request.

parquet = 'parquet'

Save the output as a parquet file https://parquet.apache.org/

root_ttree = 'root-ttree'

Save the output as a ROOT TTree https://root.cern.ch/doc/master/classTTree.html

to_ResultFormat() ResultFormat[source]

Converts the OutputFormatEnum enum to the ResultFormat enum, which is what is actually used in the TransformRequest. This allows the two enum classes to use different string values while maintaining backend compatibility.

class DeliveryEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

LocalCache = 'LocalCache'

Download the files to the local computer and store them in the cache. Transform requests will return paths to these files in the cache

URLs = 'URLs'

Return URLs to the files stored in the ServiceX object store

field Codegen: str | None = None

Code generator name to be applied across all of the samples, if applicable. Generally users don’t need to specify this. It is implied by the query class

field OutputFormat: OutputFormatEnum = OutputFormatEnum.root_ttree

Output format for the transform request.

field Delivery: DeliveryEnum = DeliveryEnum.LocalCache

Specifies the delivery method for the output files.

field OutputDirectory: str | None = None

Directory to output a yaml file describing the output files.

field OutFilesetName: str = 'servicex_fileset'

Name of the yaml file that will be created in the output directory.

field IgnoreLocalCache: bool = False

Flag to ignore local cache for all samples.
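Taken together, the fields above correspond to the General block of a ServiceX submission YAML file. A minimal sketch, using the documented defaults (the OutputDirectory path is a hypothetical example):

```yaml
General:
  OutputFormat: root-ttree        # or parquet
  Delivery: LocalCache            # or URLs
  OutputDirectory: ./servicex-out # optional; hypothetical path
  OutFilesetName: servicex_fileset
  IgnoreLocalCache: false
```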

servicex.OutputFormat

alias of OutputFormatEnum

class servicex.ResultDestination(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Direct the output to object store or posix volume

object_store = 'object-store'
volume = 'volume'
pydantic model servicex.Sample[source]

Bases: DocStringBaseModel

Represents a single transform request within a larger submission.

Parameters:
  • Name – (str)

  • Dataset – (typing.Optional[servicex.dataset_identifier.DataSetIdentifier])

  • NFiles – (typing.Optional[int])

  • Query – (typing.Union[str, servicex.query_core.QueryStringGenerator, NoneType])

  • IgnoreLocalCache – (bool)

  • Codegen – (typing.Optional[str])

  • RucioDID – (typing.Optional[str])

  • XRootDFiles – (typing.Union[str, typing.List[str], NoneType])

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Validators:
field Name: str [Required]

The name of the sample. This makes it easier to identify the sample in the output.

field Dataset: DataSetIdentifier | None = None

Dataset identifier for the sample

field NFiles: int | None = None

Limit the number of files to be used in the sample. The DID Finder will guarantee that the same files are returned between invocations. Set to None to use all files.

field Query: str | QueryStringGenerator | None = None

Query string or query generator for the sample.

field IgnoreLocalCache: bool = False

Flag to ignore local cache.

field Codegen: str | None = None

Code generator name, if applicable. Generally users don’t need to specify this. It is implied by the query class

field RucioDID: str | None = None
Rucio Dataset Identifier, if applicable.

Deprecated: Use ‘Dataset’ instead.

field XRootDFiles: str | List[str] | None = None
XRootD file(s) associated with the sample.

Deprecated: Use ‘Dataset’ instead.

property dataset_identifier: DataSetIdentifier

Access the dataset identifier for the sample.

validator validate_did_xor_file  »  all fields[source]

Ensure that exactly one of Dataset, XRootDFiles, or RucioDID is specified.

validator validate_nfiles_is_not_zero  »  all fields[source]

Ensure that NFiles is not set to zero

validator truncate_long_sample_name  »  Name[source]

Truncate the sample name to 128 characters if it exceeds that length, and print a warning message.

property hash
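A Sample entry in the YAML form of a submission might look like the sketch below. The DID and query string are hypothetical placeholders; note that RucioDID is deprecated in favour of Dataset:

```yaml
Sample:
  - Name: ttbar_sample
    RucioDID: user.example:user.example.ttbar   # deprecated; prefer 'Dataset'
    NFiles: 5        # limit for testing; omit to use all files
    Query: "Select(lambda e: {'pt': e['pt']})"  # hypothetical query string
    IgnoreLocalCache: false
```

Per the validators above, exactly one dataset source may be given, NFiles must not be zero, and a Name longer than 128 characters is truncated with a warning.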
pydantic model servicex.ServiceXSpec[source]

Bases: DocStringBaseModel

ServiceX Submission Specification - pass this into the ServiceX deliver function

Parameters:
  • General – (General) General settings for the transform request

  • Sample – (typing.List[servicex.databinder_models.Sample]) List of samples to be transformed

  • Definition – (typing.Optional[typing.List]) Any reusable definitions that are needed for the transform request

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Validators:
field General: General = General(Codegen=None, OutputFormat=<OutputFormatEnum.root_ttree: 'root-ttree'>, Delivery=<DeliveryEnum.LocalCache: 'LocalCache'>, OutputDirectory=None, OutFilesetName='servicex_fileset', IgnoreLocalCache=False)

General settings for the transform request

field Sample: List[Sample] [Required]

List of samples to be transformed

field Definition: List | None = None

Any reusable definitions that are needed for the transform request

validator validate_unique_sample  »  Sample[source]

Ensure that each sample in the submission has a unique name.
servicex.deliver(config: ServiceXSpec | Mapping[str, Any] | str | Path, config_path: str | None = None, servicex_name: str | None = None, return_exceptions: bool = True, fail_if_incomplete: bool = True, ignore_local_cache: bool = False)[source]
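Per the signature above, deliver accepts a constructed ServiceXSpec, a plain mapping, or a path to a YAML file combining the General and Sample blocks described earlier. A minimal complete file might look like this sketch (DID and query are hypothetical placeholders):

```yaml
General:
  OutputFormat: root-ttree
  Delivery: LocalCache
Sample:
  - Name: demo
    RucioDID: user.example:user.example.demo   # deprecated; prefer 'Dataset'
    Query: "Select(lambda e: {'run': e['run']})"
```

Calling deliver with the path to such a file (e.g. `deliver("spec.yaml")`) would then be expected to return, per sample name, local file paths (with Delivery: LocalCache) or object-store URLs (with Delivery: URLs).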