hip_data_tools.etl package

Submodules

hip_data_tools.etl.adwords_to_athena module

Module to deal with data transfer from Adwords to Athena

class hip_data_tools.etl.adwords_to_athena.AdWordsReportsToAthena(settings: hip_data_tools.etl.adwords_to_athena.AdWordsReportsToAthenaSettings)

Bases: hip_data_tools.etl.adwords_to_s3.AdWordsReportsToS3

ETL Class to handle the transfer of data from adwords based on AWQL to S3 as parquet files :param settings: the etl settings to be used :type settings: AdWordsToS3Settings

add_partitions()

Add the current Data Transfer’s partition to Athena’s Metadata Returns: None

create_athena_table() → None

Creates an athena table on top of the transferred data Returns: None

get_target_prefix_with_partition_dirs() → str

Return the target s3 key prefix which includes partition directories Returns: modified target key prefix string

class hip_data_tools.etl.adwords_to_athena.AdWordsReportsToAthenaSettings(source_query: googleads.adwords.ReportQuery, source_include_zero_impressions: bool, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, transformation_field_type_mask: Optional[Dict[str, numpy.dtype]], target_database: str, target_table: str, target_table_ddl_progress: bool, is_partitioned_table: bool, partition_values: Optional[List[Tuple[str, Any]]])

Bases: hip_data_tools.etl.adwords_to_s3.AdWordsReportToS3Settings

Settings container for Adwords to Athena ETL

class hip_data_tools.etl.adwords_to_athena.AdWordsToAthena(settings: hip_data_tools.etl.adwords_to_athena.AdWordsToAthenaSettings)

Bases: hip_data_tools.etl.adwords_to_s3.AdWordsToS3

ETL Class to handle the transfer of data from adwords based on AWQL to S3 as parquet files :param settings: the etl settings to be used :type settings: AdWordsToS3Settings

create_athena_table() → None

Creates an athena table on top of the transferred data Returns: None

class hip_data_tools.etl.adwords_to_athena.AdWordsToAthenaSettings(source_query_fragment: googleads.adwords.ServiceQueryBuilder, source_service: str, source_service_version: str, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, target_database: str, target_table: str, target_table_ddl_progress: bool, is_partitioned_table: bool, partition_values: Optional[List[Tuple[str, Any]]])

Bases: hip_data_tools.etl.adwords_to_s3.AdWordsToS3Settings

Settings container for Adwords to Athena ETL

hip_data_tools.etl.adwords_to_athena.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.adwords_to_s3 module

Module to deal with data transfer from Adwords to S3

class hip_data_tools.etl.adwords_to_s3.AdWordsReportToS3Settings(source_query: googleads.adwords.ReportQuery, source_include_zero_impressions: bool, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, transformation_field_type_mask: Optional[Dict[str, numpy.dtype]])

Bases: object

S3 to Cassandra ETL settings

class hip_data_tools.etl.adwords_to_s3.AdWordsReportsToS3(settings: hip_data_tools.etl.adwords_to_s3.AdWordsReportToS3Settings)

Bases: object

ETL Class to handle the transfer of data from adwords reports based on AWQL to S3 as parquet :param settings: the etl settings to be used :type settings: AdWordsToS3Settings

transfer(**kwargs)

Transfer the entire report to s3 in parquet format Returns: None

class hip_data_tools.etl.adwords_to_s3.AdWordsToS3(settings: hip_data_tools.etl.adwords_to_s3.AdWordsToS3Settings)

Bases: object

ETL Class to handle the transfer of data from adwords based on AWQL to S3 as parquet files :param settings: the etl settings to be used :type settings: AdWordsToS3Settings

build_query(start_index: int, page_size: int, num_iterations: int) → None

Builds the query based on the query fragment in settings, to be able to load data in parallel from the given start index for a number of iterations :param start_index: the start index to offset the beginning of query paging :type start_index: int :param page_size: number of elements in each page/api call :type page_size: int :param num_iterations: total number of pages required for transfer of entire data :type num_iterations: int

Returns: None

get_parallel_payloads(page_size: int, number_of_workers: int) → List[dict]

gives a list of dicts that contain start index, page size, and number of iterations :param page_size: number of elements in each page / api call :type page_size: int :param number_of_workers: total number of parallel workers for which the payload needs :type number_of_workers: int :param to be distributed:

Returns: List[dict] eg: [

{‘number_of_pages’: 393, ‘page_size’: 1000, ‘start_index’: 0, ‘worker’: 0}, {‘number_of_pages’: 393, ‘page_size’: 1000, ‘start_index’: 393000, ‘worker’: 1}, {‘number_of_pages’: 393, ‘page_size’: 1000, ‘start_index’: 786000, ‘worker’: 2},

]

transfer_all() → None

Iteratively transfer all pages of data Returns: None

transfer_next_iteration() → bool

Transfers the next page of data Returns: bool true if the data transfer succeeded, False if reached end of iterations

class hip_data_tools.etl.adwords_to_s3.AdWordsToS3Settings(source_query_fragment: googleads.adwords.ServiceQueryBuilder, source_service: str, source_service_version: str, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)

Bases: object

S3 to Cassandra ETL settings

hip_data_tools.etl.adwords_to_s3.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.athena_to_adwords module

handle ETL of data from Athena to Cassandra

class hip_data_tools.etl.athena_to_adwords.AthenaToAdWordsOfflineConversion(settings: hip_data_tools.etl.athena_to_adwords.AthenaToAdWordsOfflineConversionSettings)

Bases: hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrame

Class to transfer parquet data from s3 to Cassandra :param settings: the settings around the etl to be executed :type settings: AthenaToCassandraSettings

upload_all() → List[dict]

Upload all files from the Athena table onto AdWords offline conversion Returns List[dict]: a list of issues in the format [

{
“error”: ” some error”, “data”: { … original data body of the data that caused issues }

}.

]

upload_next() → List[dict]

Upload the next file in line from the athena table onto AdWords offline conversion Returns List[dict]: a list of issues in the format [

{
“error”: ” some error”, “data”: { … original data body of the data that caused issues }

},

]

class hip_data_tools.etl.athena_to_adwords.AthenaToAdWordsOfflineConversionSettings(source_database: str, source_table: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, transformation_column_mapping: dict, etl_identifier: str, etl_state_manager_keyspace: str, etl_state_manager_connection: hip_data_tools.apache.cassandra.CassandraConnectionSettings, destination_batch_size: int, destination_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings)

Bases: hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrameSettings

S3 to Cassandra ETL settings

hip_data_tools.etl.athena_to_adwords.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.athena_to_athena module

handle ETL of data from Athena to Athena

class hip_data_tools.etl.athena_to_athena.AthenaToAthena(settings: hip_data_tools.etl.athena_to_athena.AthenaToAthenaSettings)

Bases: object

ETL To transfer data from an Athena SQL into an Athena Table :param settings: Settings for the ETL :type settings: AthenaToAthenaSettings

execute() → None

Execute the ETL by running an Athena CTAS statement Returns: None

generate_create_table_statement() → str

Generates an Athena compliant ctas sql statement for the ETL Returns: str

class hip_data_tools.etl.athena_to_athena.AthenaToAthenaSettings(source_sql: str, source_database: str, target_database: str, target_table: str, target_data_format: str, target_s3_bucket: str, target_s3_dir: str, target_partition_columns: Optional[List[str]], connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)

Bases: object

Athena To Athena ETL settings

hip_data_tools.etl.athena_to_athena.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.athena_to_cassandra module

handle ETL of data from Athena to Cassandra

class hip_data_tools.etl.athena_to_cassandra.AthenaToCassandra(settings: hip_data_tools.etl.athena_to_cassandra.AthenaToCassandraSettings)

Bases: hip_data_tools.etl.s3_to_cassandra.S3ToCassandra

Class to transfer parquet data from s3 to Cassandra :param settings: the settings around the etl to be executed :type settings: AthenaToCassandraSettings

class hip_data_tools.etl.athena_to_cassandra.AthenaToCassandraSettings(source_database: str, source_table: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, destination_keyspace: str, destination_table: str, destination_table_primary_keys: list, destination_connection_settings: hip_data_tools.apache.cassandra.CassandraConnectionSettings, destination_table_options_statement: str = '', destination_batch_size: int = 1)

Bases: object

S3 to Cassandra ETL settings

hip_data_tools.etl.athena_to_cassandra.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.athena_to_dataframe module

handle ETL of data from Athena to Cassandra

class hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrame(settings: hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrameSettings)

Bases: hip_data_tools.etl.s3_to_dataframe.S3ToDataFrame

Class to transfer parquet data from s3 to Cassandra :param settings: the settings around the etl to be executed :type settings: AthenaToCassandraSettings

class hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrameSettings(source_database: str, source_table: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)

Bases: object

S3 to Cassandra ETL settings

hip_data_tools.etl.athena_to_dataframe.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.athena_to_s3 module

handle ETL of data from Athena to S3

class hip_data_tools.etl.athena_to_s3.AthenaToS3(settings: hip_data_tools.etl.athena_to_s3.AthenaToS3Settings)

Bases: object

ETL To transfer data from an Athena sql to an s3 location :param settings: Settings for the etl :type settings: AthenaToS3Settings

execute() → None

Execute the ETL to transfer data from Athena to S3 Returns: None

class hip_data_tools.etl.athena_to_s3.AthenaToS3Settings(source_sql: str, source_database: str, temporary_database: Optional[str], temporary_table: Optional[str], target_data_format: str, target_s3_bucket: str, target_s3_dir: str, target_partition_columns: Optional[List[str]], connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)

Bases: object

Athena To S3 ETL settings

hip_data_tools.etl.athena_to_s3.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.common module

Common ETL specific utilities and methods

class hip_data_tools.etl.common.DataFrameTransformer

Bases: hip_data_tools.etl.common.Transformer

Abstract base class to handle DataFrame to DataFrame transformation

transform(data: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transform the data element provided

Parameters:data (Any) – A data Element

Returns: Any

class hip_data_tools.etl.common.DictTransformer

Bases: hip_data_tools.etl.common.Transformer

Abstract base class to handle dict to dict transformation

transform(data: Dict[KT, VT]) → Dict[KT, VT]

Transform the data element provided

Parameters:data (Any) – A data Element

Returns: Any

class hip_data_tools.etl.common.ETL(extractor: hip_data_tools.etl.common.Extractor, loader: hip_data_tools.etl.common.Loader, transformers: Optional[List[hip_data_tools.etl.common.Transformer]] = None)

Bases: abc.ABC

Base ETL class that defines the interaction of the three components of an ETL :param extractor: source data extractor :type extractor: Extractor :param transformers: series of transformation :type transformers: List[Transformer] :param loader: loader to write data into a sink :type loader: Loader

execute_all() → None

Execute the Extraction, transformation and loading of all data element from the source

Returns: None

execute_next() → None

Execute the Extraction, transformation and loading of the next data element from the source

Returns: None

has_next() → bool

Does the Source have any more data elements

Returns (bool): True if data exists

reset_source() → None

Reset state of the source Loader, this will cause the loader to forget the items that have been loaded already and start afresh

Returns:

class hip_data_tools.etl.common.EtlSinkRecordState(**values)

Bases: cassandra.cqlengine.models.Model

Cassandra ORM model for the Etl Sink States

exception DoesNotExist

Bases: cassandra.cqlengine.models.DoesNotExist

exception MultipleObjectsReturned

Bases: cassandra.cqlengine.models.MultipleObjectsReturned

etl_signature = <cassandra.cqlengine.models.ColumnQueryEvaluator object>
pk = <cassandra.cqlengine.models.ColumnQueryEvaluator object>
record_identifier = <cassandra.cqlengine.models.ColumnQueryEvaluator object>
record_state = <cassandra.cqlengine.models.ColumnQueryEvaluator object>
state_created = <cassandra.cqlengine.models.ColumnQueryEvaluator object>
state_last_updated = <cassandra.cqlengine.models.ColumnQueryEvaluator object>
class hip_data_tools.etl.common.EtlSinkRecordStateManager(record_identifier: str, etl_signature: str)

Bases: object

The Generic ETL Sink State manager that manages and persists the state of a record :param record_identifier: A unique Identifier string to identify the sink record :type record_identifier: str :param etl_signature: The Unique ETL Signature to identify the ETL :type etl_signature: str

current_state() → hip_data_tools.etl.common.EtlStates

Get current state of the sid record Returns (EtlStates): current state

failed() → None

Mark Record as failed Returns: None

processing() → None

Mark Record as processing Returns: None

ready() → None

Mark Record as ready Returns: None

succeeded() → None

Mark Record as succeeded Returns: None

class hip_data_tools.etl.common.EtlStates

Bases: enum.Enum

Enumerator for the possible states of an ETL

Failed = 'failed'
Processing = 'processing'
Ready = 'ready'
Succeeded = 'succeeded'
class hip_data_tools.etl.common.Extractor(settings: hip_data_tools.etl.common.SourceSettings)

Bases: abc.ABC

Abstract base class to define the functionalities of an Extractor

Parameters:settings (SourceSettings) – settings used to connect to a source
extract_next() → Any

Extracts a single datapoint

Returns: Any

has_next() → bool

Checks if the extractor has any more data points

Returns: bool

reset() → None

Reset the state of the Extractor, usually reverting the state to its initial position

Returns: None

class hip_data_tools.etl.common.Loader(settings: hip_data_tools.etl.common.SinkSettings)

Bases: abc.ABC

Abstract Base class for defining Loaders that write data to sinks

Parameters:settings (SinkSettings) – setting to connect to the sink
load(data: Any) → None

Load a given data point onto the sink

Parameters:data (Any) – Any single data point

Returns: None

class hip_data_tools.etl.common.SinkSettings

Bases: object

Dataclass to encapsulate settings to connect to and write to a data sink

class hip_data_tools.etl.common.SourceSettings

Bases: object

Abstract base dataclass for source settings

class hip_data_tools.etl.common.Transformer

Bases: abc.ABC, typing.Generic

Abstract Base class for handling data transformations

transform(data: FromDataElementType) → ToDataElementType

Transform the data element provided

Parameters:data (Any) – A data Element

Returns: Any

hip_data_tools.etl.common.current_epoch() → int

Get the current epoch to millisecond precision Returns: int

hip_data_tools.etl.common.get_random_string(length: int) → str

A random ascii lowercase string of certain length :param length: length of the random string :type length: int

Returns: str

hip_data_tools.etl.common.random() → x in the interval [0, 1).
hip_data_tools.etl.common.sync_etl_state_table()

Utility method to sync (Create) the table as per ORM model Returns: None

hip_data_tools.etl.google_sheet_to_athena module

Module to deal with data transfer from Google sheets to Athena

class hip_data_tools.etl.google_sheet_to_athena.GoogleSheetToAthena(settings: hip_data_tools.etl.google_sheet_to_athena.GoogleSheetsToAthenaSettings)

Bases: hip_data_tools.etl.google_sheet_to_s3.GoogleSheetToS3

Class to transfer data from google sheet to athena :param settings: the settings around the etl to be executed :type settings: GoogleSheetsToAthenaSettings

load_sheet_to_athena()

Load google sheet into Athena :return: None

class hip_data_tools.etl.google_sheet_to_athena.GoogleSheetsToAthenaSettings(source_workbook_url: str, source_sheet: str, source_row_range: str, source_field_names_row_number: int, source_field_types_row_number: int, source_data_start_row_number: int, source_connection_settings: hip_data_tools.google.common.GoogleApiConnectionSettings, manual_partition_key_value: dict, target_s3_bucket: str, target_s3_dir: str, target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, target_database: str, target_table_name: str, target_table_ddl_progress: bool)

Bases: hip_data_tools.etl.google_sheet_to_s3.GoogleSheetsToS3Settings

Google sheets to Athena ETL settings :param source_workbook_url: the url of the workbook

Parameters:
  • source_sheet – name of the google sheet (eg: sheet1)
  • source_row_range – range of rows (eg: ‘2:5’)
  • source_field_names_row_number – row number of the field names (eg: 4). Assumes the data starts at first column and there is no gaps. There should not be 2 fields with the same name.
  • source_field_types_row_number – row number of the field types (eg: 5)
  • source_data_start_row_number – starting row number of the actual data
  • source_connection_settings – GoogleApiConnectionSettings with google api keys dictionary object
  • manual_partition_key_value – a dictionary with partition column name and value. Only one partition key can be used and this value need to be string (eg: {“column”: “start_date”, “value”: “2020-03-08”})
  • target_database – name of the athena database (eg: dev)
  • target_table_name – name of the athena table (eg: ‘sheet_table’)
  • target_s3_bucket – s3 bucket to store the files (eg: au-test-bucket)
  • target_s3_dir – s3 directory to store the files (eg: sheets/new)
  • target_connection_settings – aws connection settings
  • target_table_ddl_progress – if this is true, the target table will be dropped and recreated
hip_data_tools.etl.google_sheet_to_athena.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.google_sheet_to_s3 module

Module to deal with data transfer from Google sheets to S3

class hip_data_tools.etl.google_sheet_to_s3.GoogleSheetToS3(settings: hip_data_tools.etl.google_sheet_to_s3.GoogleSheetsToS3Settings)

Bases: object

Class to transfer data from google sheet to athena :param settings: the settings around the etl to be executed :type settings: GoogleSheetsToAthenaSettings

write_sheet_data_to_s3()

Write the data frame into S3 :param s3_key: s3 key :type s3_key: string

Returns:None
class hip_data_tools.etl.google_sheet_to_s3.GoogleSheetsToS3Settings(source_workbook_url: str, source_sheet: str, source_row_range: str, source_field_names_row_number: int, source_field_types_row_number: int, source_data_start_row_number: int, source_connection_settings: hip_data_tools.google.common.GoogleApiConnectionSettings, manual_partition_key_value: dict, target_s3_bucket: str, target_s3_dir: str, target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)

Bases: object

Google sheets to Athena ETL settings :param source_workbook_url: str :param source_sheet: str :param source_row_range: str :param source_field_names_row_number: int :param source_field_types_row_number: int :param source_data_start_row_number: int :param source_connection_settings: GoogleApiConnectionSettings :param manual_partition_key_value: dict :param target_s3_bucket: str :param target_s3_dir: str :param target_connection_settings: AwsConnectionSettings

hip_data_tools.etl.google_sheet_to_s3.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.s3_to_cassandra module

Module to deal with data transfer from S3 to Cassandra

class hip_data_tools.etl.s3_to_cassandra.S3ToCassandra(settings: hip_data_tools.etl.s3_to_cassandra.S3ToCassandraSettings)

Bases: hip_data_tools.etl.s3_to_dataframe.S3ToDataFrame

Class to transfer parquet data from s3 to Cassandra :param settings: the settings around the etl to be executed :type settings: S3ToCassandraSettings

create_and_upsert_all() → List[List[cassandra.datastax.graph.query.Result]]

First creates the table and then upsert all s3 files to the table Returns: None

create_table()

Creates the destination cassandra table if not exists Returns: None

upsert_all_files() → List[List[cassandra.datastax.graph.query.Result]]

Upsert all files from s3 sequentially into cassandra Returns: None

upsert_file(key: str) → List[cassandra.datastax.graph.query.Result]

Read a parquet file from s3 and upsert the records to Cassandra :param key: s3 key for the parquet file

Returns: None

class hip_data_tools.etl.s3_to_cassandra.S3ToCassandraSettings(source_bucket: str, source_key_prefix: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, destination_keyspace: str, destination_table: str, destination_table_primary_keys: list, destination_connection_settings: hip_data_tools.apache.cassandra.CassandraConnectionSettings, destination_table_options_statement: str = '', destination_batch_size: int = 1)

Bases: hip_data_tools.etl.s3_to_dataframe.S3ToDataFrameSettings

S3 to Cassandra ETL settings

hip_data_tools.etl.s3_to_cassandra.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.s3_to_dataframe module

Module to deal with data transfer from S3 to Cassandra

class hip_data_tools.etl.s3_to_dataframe.S3ToDataFrame(settings: hip_data_tools.etl.s3_to_dataframe.S3ToDataFrameSettings)

Bases: object

Class to transfer parquet data from s3 to Pandas DataFrame :param settings: the settings around the etl to be executed :type settings: S3ToDataFrameSettings

get_all_files_as_data_frame() → pandas.core.frame.DataFrame

Downloads and collates all files in a given s3 dir and returns a single DataFrame Returns: DataFrame

get_data_frame(key: str) → pandas.core.frame.DataFrame

Read a parquet file from s3 and convert it to a parquet DataFrame :param key: s3 key for the parquet file

Returns: None

list_source_files() → List[str]

Lists all the files that are encompassed under the s3 location in settings Returns: list[str]

next() → pandas.core.frame.DataFrame

Gets the next DataFrame from the next file on s3.

Please note, if you are trying to run window functions or operations on the data set that spans multiple rows, then using this method may result in incorrect or inaccurate results. For such use cases, use get_all_files_as_data_frame() Returns: DataFrame

class hip_data_tools.etl.s3_to_dataframe.S3ToDataFrameSettings(source_bucket: str, source_key_prefix: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)

Bases: object

S3 to Cassandra ETL settings

hip_data_tools.etl.s3_to_dataframe.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)

A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.

Parameters:
  • these (dict of str to attr.ib) –

    A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.

    If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.

    If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.

  • repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
  • auto_detect (bool) –

    Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).

    So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won’t create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).

    Warning

    If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.

    Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.

    auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.

  • repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes..
  • str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
  • eq (Optional[bool]) –

    If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.

    They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!

  • order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
  • cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
  • hash (Optional[bool]) –

    If None (default), the __hash__ method is generated according how eq and frozen are set.

    1. If both are True, attrs will generate a __hash__ for you.
    2. If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
    3. If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).

    Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.

    See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.

  • init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
  • slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
  • frozen (bool) –

    Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.

    Please note:

    1. This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
    2. True immutability is impossible in Python.
    3. This does have a minor a runtime performance impact <how-frozen> when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
    4. If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
    5. Subclasses of a frozen class are frozen too.
  • weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
  • auto_attribs (bool) –

    If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.

    In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.

    If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.

    Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.

  • kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
  • cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
  • auto_exc (bool) –

    If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:

    • the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
    • all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
    • the value of str is ignored leaving __str__ to base classes.
  • collect_by_mro (bool) –

    Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.

    See issue #428 for more details.

  • getstate_setstate (Optional[bool]) –

    Note

    This is usually only interesting for slotted classes and you should probably just set auto_detect to True.

    If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.

    If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).

  • on_setattr

    A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.

    If no exception is raised, the attribute is set to the return value of the callable.

    If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq and __ne__ never tried to compared subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

hip_data_tools.etl.s3_to_s3 module

Module to deal with data transfer from S3 to Cassandra

class hip_data_tools.etl.s3_to_s3.S3ToS3FileCopy(source: hip_data_tools.etl.s3.S3SourceSettings, sink: hip_data_tools.etl.s3.S3SinkSettings, transformers: Optional[List[hip_data_tools.etl.s3.S3FileNameTransformer]] = None)

Bases: hip_data_tools.etl.common.ETL

Class to transfer objects from s3 to s3 :param source: :type source: S3FileLocationExtractor :param sink: :type sink: S3FileCopyLoader

Eg: >>> etl = S3ToS3FileCopy( … source = S3SourceSettings( … bucket=”MY_SOURCE_BUCKET_NAME”, … key_prefix=”foo/bar/”, … suffix=”parquet”, … connection_settings=aws_setting, … ), … sink = S3SinkSettings( … bucket=”MY_TARGET_BUCKET_NAME”, … connection_settings=aws_setting, … ), … transformers = [AddTargetS3KeyTransformer(target_key_prefix=”bar/baz/”)], … ) … >>> >>> etl.execute_next() …

Parameters:
  • source (S3SourceSettings) – Settings for source s3 files
  • sink (S3SinkSettings) – Settings for target s3 directory
  • transformers (List[S3FileNameTransformer]) – Transformers to change file names
list_source_files() → List[NewType.<locals>.new_type]

List the source files as per the source settings

Returns: List[S3Key]

Module contents