hip_data_tools.etl package
Submodules
hip_data_tools.etl.adwords_to_athena module
Module to deal with data transfer from Adwords to Athena
class hip_data_tools.etl.adwords_to_athena.AdWordsReportsToAthena(settings: hip_data_tools.etl.adwords_to_athena.AdWordsReportsToAthenaSettings)
Bases: hip_data_tools.etl.adwords_to_s3.AdWordsReportsToS3
ETL class to handle the transfer of data from AdWords reports based on AWQL to S3 as parquet files.
Parameters: settings (AdWordsReportsToAthenaSettings) – the ETL settings to be used
- add_partitions()
  Add the current data transfer's partition to Athena's metadata.
  Returns: None
- create_athena_table() → None
  Create an Athena table on top of the transferred data.
  Returns: None
- get_target_prefix_with_partition_dirs() → str
  Return the target S3 key prefix, including the partition directories.
  Returns: the modified target key prefix string
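As a rough illustration of what a key prefix "with partition directories" can look like, the sketch below appends Hive-style name=value directories, a layout Athena can read as partitions. The function name prefix_with_partition_dirs and the exact layout are assumptions made for this example; the library's own implementation may differ.

```python
from typing import Any, List, Optional, Tuple


def prefix_with_partition_dirs(
    target_key_prefix: str,
    partition_values: Optional[List[Tuple[str, Any]]],
) -> str:
    """Append Hive-style ``name=value`` directories to an S3 key prefix.

    Hypothetical helper: it mirrors the shape of the ``target_key_prefix`` and
    ``partition_values`` settings but is not taken from hip_data_tools.
    """
    prefix = target_key_prefix.rstrip("/")
    if not partition_values:
        # Unpartitioned tables keep the plain prefix.
        return prefix
    dirs = "/".join(f"{name}={value}" for name, value in partition_values)
    return f"{prefix}/{dirs}"


example_prefix = prefix_with_partition_dirs(
    "adwords/reports",
    [("report_date", "2020-06-01"), ("account", 42)],
)
print(example_prefix)  # adwords/reports/report_date=2020-06-01/account=42
```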
class hip_data_tools.etl.adwords_to_athena.AdWordsReportsToAthenaSettings(source_query: googleads.adwords.ReportQuery, source_include_zero_impressions: bool, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, transformation_field_type_mask: Optional[Dict[str, numpy.dtype]], target_database: str, target_table: str, target_table_ddl_progress: bool, is_partitioned_table: bool, partition_values: Optional[List[Tuple[str, Any]]])
Bases: hip_data_tools.etl.adwords_to_s3.AdWordsReportToS3Settings
Settings container for the AdWords reports to Athena ETL.
class hip_data_tools.etl.adwords_to_athena.AdWordsToAthena(settings: hip_data_tools.etl.adwords_to_athena.AdWordsToAthenaSettings)
Bases: hip_data_tools.etl.adwords_to_s3.AdWordsToS3
ETL class to handle the transfer of data from AdWords based on AWQL to S3 as parquet files.
Parameters: settings (AdWordsToAthenaSettings) – the ETL settings to be used
- create_athena_table() → None
  Create an Athena table on top of the transferred data.
  Returns: None
class hip_data_tools.etl.adwords_to_athena.AdWordsToAthenaSettings(source_query_fragment: googleads.adwords.ServiceQueryBuilder, source_service: str, source_service_version: str, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, target_database: str, target_table: str, target_table_ddl_progress: bool, is_partitioned_table: bool, partition_values: Optional[List[Tuple[str, Any]]])
Bases: hip_data_tools.etl.adwords_to_s3.AdWordsToS3Settings
Settings container for the AdWords to Athena ETL.
hip_data_tools.etl.adwords_to_athena.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)
A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
Parameters:
- these (dict of str to attr.ib) – A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can't (e.g. if you want to add __repr__ methods to Django models) or don't want to.
  If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.
  If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.
- repr_ns (str) – When using nested classes, there's no way in Python 2 to automatically detect that. Therefore it's possible to set the namespace explicitly for a more meaningful repr output.
- auto_detect (bool) – Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).
  So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and won't create either __eq__ or __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).
  Warning: If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.
  Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.
  auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.
- repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes.
- str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
- eq (Optional[bool]) – If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.
  They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!
- order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
- cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
- hash (Optional[bool]) – If None (default), the __hash__ method is generated according to how eq and frozen are set.
  - If both are True, attrs will generate a __hash__ for you.
  - If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
  - If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing).
  Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn't freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.
  See our documentation on hashing, Python's documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.
- init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
- slots (bool) – Create a slotted class that's more memory-efficient.
- frozen (bool) – Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.
  Please note:
  - This is achieved by installing a custom __setattr__ method on your class, so you can't implement your own.
  - True immutability is impossible in Python.
  - This does have a minor runtime performance impact when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
  - If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
  - Subclasses of a frozen class are frozen too.
- weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
- auto_attribs (bool) – If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.
  In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don't want to set a type.
  If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.
  Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.
- kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
- cache_hash (bool) – Ensure that the object's hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object's hash code is undefined.
- auto_exc (bool) – If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:
  - the values for eq, order, and hash are ignored and the instances compare and hash by the instance's ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won't add own ones.),
  - all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
  - the value of str is ignored leaving __str__ to base classes.
- collect_by_mro (bool) – Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatibility.
  See issue #428 for more details.
- getstate_setstate (Optional[bool]) – Note: This is usually only interesting for slotted classes and you should probably just set auto_detect to True.
  If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it's True by default for slotted classes and False for dict classes.
  If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).
- on_setattr – A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.
  If no exception is raised, the attribute is set to the return value of the callable.
  If a list of callables is passed, they're automatically wrapped in an attr.setters.pipe.
New in version 16.0.0: slots
New in version 16.1.0: frozen
New in version 16.3.0: str
New in version 16.3.0: Support for __attrs_post_init__.
Changed in version 17.1.0: hash supports None as value which is also the default now.
New in version 17.3.0: auto_attribs
Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.
Changed in version 18.1.0: If these is ordered, the order is retained.
New in version 18.2.0: weakref_slot
Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq__ and __ne__ never tried to compare subclasses to each other.
Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.
New in version 18.2.0: kw_only
New in version 18.2.0: cache_hash
New in version 19.1.0: auto_exc
Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.
New in version 19.2.0: eq and order
New in version 20.1.0: auto_detect
New in version 20.1.0: collect_by_mro
New in version 20.1.0: getstate_setstate
New in version 20.1.0: on_setattr
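The interaction between eq, frozen and hash described for this decorator can be seen in a few lines. The sketch below deliberately uses the standard library's dataclasses module as a stand-in, since its default hashing rules follow the same three cases; it is not a demonstration of attrs itself, and the class names are made up for the example. It also shows the object.__setattr__ workaround for frozen classes, applied in __post_init__ (the standard-library counterpart of __attrs_post_init__).

```python
import dataclasses


@dataclasses.dataclass(eq=True, frozen=True)
class FrozenPoint:
    # eq and frozen both True: a value-based __hash__ is generated.
    x: int
    y: int


@dataclasses.dataclass(eq=True)  # frozen defaults to False
class MutablePoint:
    # eq True, frozen False: __hash__ is set to None (unhashable).
    x: int
    y: int


@dataclasses.dataclass(eq=False)
class IdentityPoint:
    # eq False: __hash__ is left untouched, so object's id-based hash applies.
    x: int
    y: int


@dataclasses.dataclass(frozen=True)
class NormalisedName:
    name: str

    def __post_init__(self) -> None:
        # Plain assignment would raise FrozenInstanceError here, so the
        # attribute is set through object.__setattr__ instead.
        object.__setattr__(self, "name", self.name.strip().lower())


frozen_hashes_by_value = hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2))
mutable_is_unhashable = MutablePoint.__hash__ is None
identity_uses_object_hash = IdentityPoint.__hash__ is object.__hash__
normalised = NormalisedName("  AdWords ").name

try:
    FrozenPoint(1, 2).x = 5
    frozen_rejects_assignment = False
except dataclasses.FrozenInstanceError:
    frozen_rejects_assignment = True
```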
hip_data_tools.etl.adwords_to_s3 module
Module to deal with data transfer from Adwords to S3
class hip_data_tools.etl.adwords_to_s3.AdWordsReportToS3Settings(source_query: googleads.adwords.ReportQuery, source_include_zero_impressions: bool, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, transformation_field_type_mask: Optional[Dict[str, numpy.dtype]])
Bases: object
AdWords report to S3 ETL settings.
class hip_data_tools.etl.adwords_to_s3.AdWordsReportsToS3(settings: hip_data_tools.etl.adwords_to_s3.AdWordsReportToS3Settings)
Bases: object
ETL class to handle the transfer of data from AdWords reports based on AWQL to S3 as parquet files.
Parameters: settings (AdWordsReportToS3Settings) – the ETL settings to be used
- transfer(**kwargs)
  Transfer the entire report to S3 in parquet format.
  Returns: None
class hip_data_tools.etl.adwords_to_s3.AdWordsToS3(settings: hip_data_tools.etl.adwords_to_s3.AdWordsToS3Settings)
Bases: object
ETL class to handle the transfer of data from AdWords based on AWQL to S3 as parquet files.
Parameters: settings (AdWordsToS3Settings) – the ETL settings to be used
- build_query(start_index: int, page_size: int, num_iterations: int) → None
  Build the query from the query fragment in the settings, so that data can be loaded in parallel from the given start index for a number of iterations.
  Parameters:
  - start_index (int) – the start index used to offset the beginning of query paging
  - page_size (int) – number of elements in each page / API call
  - num_iterations (int) – total number of pages required to transfer the entire data
  Returns: None
- get_parallel_payloads(page_size: int, number_of_workers: int) → List[dict]
  Give a list of dicts that contain the start index, page size, and number of iterations for each worker.
  Parameters:
  - page_size (int) – number of elements in each page / API call
  - number_of_workers (int) – total number of parallel workers across which the payload needs to be distributed
  Returns: List[dict], e.g.
    [
        {'number_of_pages': 393, 'page_size': 1000, 'start_index': 0, 'worker': 0},
        {'number_of_pages': 393, 'page_size': 1000, 'start_index': 393000, 'worker': 1},
        {'number_of_pages': 393, 'page_size': 1000, 'start_index': 786000, 'worker': 2},
    ]
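The arithmetic behind a payload list like the one returned by get_parallel_payloads can be sketched as follows. This is an illustration only: the function parallel_payloads and its total_entries argument are hypothetical (the documented method takes no total and presumably obtains it elsewhere), and the total of 1,179,000 entries was chosen simply because it reproduces the numbers in the example output.

```python
import math
from typing import Dict, List


def parallel_payloads(
    total_entries: int, page_size: int, number_of_workers: int
) -> List[Dict[str, int]]:
    """Split ``total_entries`` into contiguous, equally sized page ranges.

    Hypothetical helper, not part of hip_data_tools.
    """
    total_pages = math.ceil(total_entries / page_size)
    # Round up so that the workers together cover every page.
    pages_per_worker = math.ceil(total_pages / number_of_workers)
    return [
        {
            "number_of_pages": pages_per_worker,
            "page_size": page_size,
            # Each worker starts where the previous worker's range ends.
            "start_index": worker * pages_per_worker * page_size,
            "worker": worker,
        }
        for worker in range(number_of_workers)
    ]


payloads = parallel_payloads(
    total_entries=1_179_000, page_size=1000, number_of_workers=3
)
for payload in payloads:
    print(payload)
```

Each payload can then be handed to a separate worker, which pages through its own range independently of the others.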
- transfer_all() → None
  Iteratively transfer all pages of data.
  Returns: None
- transfer_next_iteration() → bool
  Transfer the next page of data.
  Returns: bool – True if the data transfer succeeded, False if the end of the iterations was reached
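The contract between transfer_next_iteration and transfer_all suggests a simple loop: keep transferring pages until the per-page call reports that nothing is left. The class below is a hypothetical in-memory stand-in written for this sketch; it does not call AdWords or S3 and is not the library's implementation.

```python
from typing import List


class PagedTransferSketch:
    """Hypothetical stand-in that 'transfers' pages held in memory."""

    def __init__(self, pages: List[List[dict]]) -> None:
        self._pages = pages
        self._next_page = 0
        self.written: List[List[dict]] = []

    def transfer_next_iteration(self) -> bool:
        """Transfer one page; return False once all pages are done."""
        if self._next_page >= len(self._pages):
            return False
        self.written.append(self._pages[self._next_page])
        self._next_page += 1
        return True

    def transfer_all(self) -> None:
        """Repeat the single-page transfer until it reports the end."""
        while self.transfer_next_iteration():
            pass


etl = PagedTransferSketch([[{"id": 1}], [{"id": 2}], [{"id": 3}]])
etl.transfer_all()
pages_written = len(etl.written)
print(pages_written)  # 3
```

Calling transfer_next_iteration directly is useful when the caller wants to log, checkpoint, or stop between pages rather than run the whole transfer in one go.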
class hip_data_tools.etl.adwords_to_s3.AdWordsToS3Settings(source_query_fragment: googleads.adwords.ServiceQueryBuilder, source_service: str, source_service_version: str, source_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings, target_bucket: str, target_key_prefix: str, target_file_prefix: Optional[str], target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)
Bases: object
AdWords to S3 ETL settings.
hip_data_tools.etl.adwords_to_s3.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)
A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument. The parameters and version notes are identical to those listed for hip_data_tools.etl.adwords_to_athena.dataclass.
hip_data_tools.etl.athena_to_adwords module
Handle ETL of data from Athena to AdWords
class hip_data_tools.etl.athena_to_adwords.AthenaToAdWordsOfflineConversion(settings: hip_data_tools.etl.athena_to_adwords.AthenaToAdWordsOfflineConversionSettings)
Bases: hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrame
Class to upload data from an Athena table to AdWords as offline conversions.
Parameters: settings (AthenaToAdWordsOfflineConversionSettings) – the settings around the ETL to be executed
- upload_all() → List[dict]
  Upload all files from the Athena table to AdWords offline conversion.
  Returns: List[dict] – a list of issues in the format
    [
        {
            "error": "some error",
            "data": { … original data body of the data that caused issues }
        },
    ]
- upload_next() → List[dict]
  Upload the next file in line from the Athena table to AdWords offline conversion.
  Returns: List[dict] – a list of issues in the same format as upload_all()
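Because both upload methods return a plain list of issue dicts rather than raising, the caller decides what to do with failures. A minimal sketch of post-processing such a list is shown below; the helper names and the sample issues are invented for illustration, and only the "error" and "data" keys come from the documented return format.

```python
from collections import Counter
from typing import Dict, List


def summarise_issues(issues: List[dict]) -> Dict[str, int]:
    """Count how often each error message occurs in an issue list."""
    return dict(Counter(issue["error"] for issue in issues))


def rows_to_retry(issues: List[dict]) -> List[dict]:
    """Collect the original data bodies so they can be inspected or resent."""
    return [issue["data"] for issue in issues]


# Made-up sample in the documented {"error": ..., "data": ...} shape.
sample_issues = [
    {"error": "some error", "data": {"row": 1}},
    {"error": "another error", "data": {"row": 2}},
    {"error": "some error", "data": {"row": 3}},
]

summary = summarise_issues(sample_issues)
retry = rows_to_retry(sample_issues)
print(summary)
print(retry)
```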
class hip_data_tools.etl.athena_to_adwords.AthenaToAdWordsOfflineConversionSettings(source_database: str, source_table: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, transformation_column_mapping: dict, etl_identifier: str, etl_state_manager_keyspace: str, etl_state_manager_connection: hip_data_tools.apache.cassandra.CassandraConnectionSettings, destination_batch_size: int, destination_connection_settings: hip_data_tools.google.adwords.GoogleAdWordsConnectionSettings)
Bases: hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrameSettings
Athena to AdWords offline conversion ETL settings.
hip_data_tools.etl.athena_to_adwords.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)
A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument. The parameters and version notes are identical to those listed for hip_data_tools.etl.adwords_to_athena.dataclass.
- getstate_setstate (Optional[bool]) –
Note
This is usually only interesting for slotted classes and you should probably just set auto_detect to True.
If True,
__getstate__and__setstate__are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes andFalsefor dict classes.If auto_detect is True, and getstate_setstate is left None, and either
__getstate__or__setstate__is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want). - on_setattr –
A callable that is run whenever the user attempts to set an attribute (either by assignment like
i.x = 42or by using setattr likesetattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.If no exception is raised, the attribute is set to the return value of the callable.
If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.
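The notes on frozen describe a mechanism: a custom __setattr__ blocks assignment, and object.__setattr__ is the escape hatch during initialization. The following is a minimal hand-written sketch of that idea in plain Python. It is not the attrs implementation; FrozenError and FrozenPoint are hypothetical names introduced only for illustration.

```python
class FrozenError(AttributeError):
    """Hypothetical stand-in for attr.exceptions.FrozenInstanceError."""


class FrozenPoint:
    """Hand-written class that mimics the frozen behaviour described above."""

    def __init__(self, x, y):
        # Plain assignment (self.x = x) would hit the custom __setattr__
        # below, so initialisation goes through object.__setattr__ instead.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

    def __setattr__(self, name, value):
        # Every ordinary attribute assignment ends up here and is rejected.
        raise FrozenError(f"cannot assign to {name!r} on a frozen instance")


p = FrozenPoint(1, 2)

try:
    p.x = 10
    blocked = False
except FrozenError:
    blocked = True

# The same escape hatch works from outside the class, which is why the
# notes say that true immutability is impossible in Python.
object.__setattr__(p, "x", 10)
```

Because initialization has to go through object.__setattr__ for every attribute, this sketch also hints at why __init__ is slightly slower for frozen classes.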
New in version 16.0.0: slots
New in version 16.1.0: frozen
New in version 16.3.0: str
New in version 16.3.0: Support for __attrs_post_init__.
Changed in version 17.1.0: hash supports None as value which is also the default now.
New in version 17.3.0: auto_attribs
Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.
Changed in version 18.1.0: If these is ordered, the order is retained.
New in version 18.2.0: weakref_slot
Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq__ and __ne__ never tried to compare subclasses to each other.
Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.
New in version 18.2.0: kw_only
New in version 18.2.0: cache_hash
New in version 19.1.0: auto_exc
Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.
New in version 19.2.0: eq and order
New in version 20.1.0: auto_detect
New in version 20.1.0: collect_by_mro
New in version 20.1.0: getstate_setstate
New in version 20.1.0: on_setattr
hip_data_tools.etl.athena_to_athena module¶
handle ETL of data from Athena to Athena
-
class
hip_data_tools.etl.athena_to_athena.AthenaToAthena(settings: hip_data_tools.etl.athena_to_athena.AthenaToAthenaSettings)¶ Bases:
object
ETL to transfer data from an Athena SQL query into an Athena table.
Parameters: - settings (AthenaToAthenaSettings) – Settings for the ETL
-
execute() → None¶ Execute the ETL by running an Athena CTAS statement Returns: None
-
generate_create_table_statement() → str¶ Generates an Athena compliant ctas sql statement for the ETL Returns: str
-
-
class
hip_data_tools.etl.athena_to_athena.AthenaToAthenaSettings(source_sql: str, source_database: str, target_database: str, target_table: str, target_data_format: str, target_s3_bucket: str, target_s3_dir: str, target_partition_columns: Optional[List[str]], connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)¶ Bases:
object
Athena to Athena ETL settings
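To make the relationship between the settings fields and the CTAS statement more concrete, here is a minimal, self-contained sketch of how such a statement could be assembled. It is an illustration only and not the library’s generate_create_table_statement implementation: CtasSettings and build_ctas_statement are hypothetical names, connection settings are left out, and the table properties used (format, external_location, partitioned_by) are the standard Athena CTAS properties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CtasSettings:
    """Hypothetical stand-in carrying a subset of the AthenaToAthenaSettings fields."""
    source_sql: str
    target_database: str
    target_table: str
    target_data_format: str
    target_s3_bucket: str
    target_s3_dir: str
    target_partition_columns: Optional[List[str]] = None


def build_ctas_statement(settings: CtasSettings) -> str:
    """Assemble an Athena CREATE TABLE AS SELECT statement from the settings."""
    properties = [
        f"format = '{settings.target_data_format}'",
        f"external_location = 's3://{settings.target_s3_bucket}/{settings.target_s3_dir}'",
    ]
    # Partition columns are optional; only emit the property when they are given.
    if settings.target_partition_columns:
        columns = ", ".join(f"'{c}'" for c in settings.target_partition_columns)
        properties.append(f"partitioned_by = ARRAY[{columns}]")
    with_clause = ",\n    ".join(properties)
    return (
        f"CREATE TABLE {settings.target_database}.{settings.target_table}\n"
        f"WITH (\n    {with_clause}\n)\n"
        f"AS {settings.source_sql}"
    )


statement = build_ctas_statement(CtasSettings(
    source_sql="SELECT id, name, dt FROM events",
    target_database="analytics",
    target_table="events_copy",
    target_data_format="PARQUET",
    target_s3_bucket="example-bucket",
    target_s3_dir="tables/events_copy",
    target_partition_columns=["dt"],
))
print(statement)
```

Note that Athena expects partition columns to come last in the SELECT list, which is why dt is the final column in the example query.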
-
hip_data_tools.etl.athena_to_athena.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
Parameters:
- these (dict of str to attr.ib) – A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.
  If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.
  If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.
- repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
- auto_detect (bool) – Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).
  So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and will create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).
  Warning: If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.
  Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.
  auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.
- repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes.
- str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
- eq (Optional[bool]) – If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.
  They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!
- order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
- cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
- hash (Optional[bool]) – If None (default), the __hash__ method is generated according to how eq and frozen are set.
  - If both are True, attrs will generate a __hash__ for you.
  - If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
  - If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).
  Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True, or keep it from creating one by passing False. Both of these cases are rather special and should be used carefully.
  See the attrs documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.
- init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
- slots (bool) – Create a slotted class that’s more memory-efficient.
- frozen (bool) – Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.
  Please note:
  - This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
  - True immutability is impossible in Python.
  - This does have a minor runtime performance impact when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
  - If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
  - Subclasses of a frozen class are frozen too.
- weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
- auto_attribs (bool) – If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.
  In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.
  If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value as if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.
  Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.
- kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
- cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
- auto_exc (bool) – If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exception class:
  - the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add its own.),
  - all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
  - the value of str is ignored leaving __str__ to base classes.
- collect_by_mro (bool) – Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatibility.
  See issue #428 for more details.
- getstate_setstate (Optional[bool]) –
  Note: This is usually only interesting for slotted classes and you should probably just set auto_detect to True.
  If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.
  If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).
- on_setattr – A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same arguments as validators: the instance, the attribute that is being modified, and the new value.
  If no exception is raised, the attribute is set to the return value of the callable.
  If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.
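The three hash cases listed under the hash parameter can be observed directly. The sketch below deliberately uses the standard library’s dataclasses.dataclass as a stand-in rather than attr.dataclass, so that it runs without attrs installed; the standard library applies the same eq/frozen rule when no explicit hash option is given. The class names and the is_hashable helper are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class FrozenPoint:
    """eq and frozen both True: a value-based __hash__ is generated."""
    x: int
    y: int


@dataclass(eq=True, frozen=False)
class MutablePoint:
    """eq True, frozen False: __hash__ is set to None (unhashable)."""
    x: int
    y: int


@dataclass(eq=False)
class IdentityPoint:
    """eq False: __hash__ is left untouched and inherited from object."""
    x: int
    y: int


def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Equal frozen instances hash equally because the hash is built from the fields.
frozen_equal = hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2))
# Mutable instances with value-based equality cannot be hashed at all.
mutable_hashable = is_hashable(MutablePoint(1, 2))
# Without generated equality the class keeps object's id-based hash.
identity_uses_object_hash = IdentityPoint.__hash__ is object.__hash__
```

With attrs itself, the equivalent switches are the eq and frozen arguments described above, with hash left at its default of None.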
New in version 16.0.0: slots
New in version 16.1.0: frozen
New in version 16.3.0: str
New in version 16.3.0: Support for __attrs_post_init__.
Changed in version 17.1.0: hash supports None as value which is also the default now.
New in version 17.3.0: auto_attribs
Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.
Changed in version 18.1.0: If these is ordered, the order is retained.
New in version 18.2.0: weakref_slot
Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq__ and __ne__ never tried to compare subclasses to each other.
Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.
New in version 18.2.0: kw_only
New in version 18.2.0: cache_hash
New in version 19.1.0: auto_exc
Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.
New in version 19.2.0: eq and order
New in version 20.1.0: auto_detect
New in version 20.1.0: collect_by_mro
New in version 20.1.0: getstate_setstate
New in version 20.1.0: on_setattr
hip_data_tools.etl.athena_to_cassandra module¶
handle ETL of data from Athena to Cassandra
-
class
hip_data_tools.etl.athena_to_cassandra.AthenaToCassandra(settings: hip_data_tools.etl.athena_to_cassandra.AthenaToCassandraSettings)¶ Bases:
hip_data_tools.etl.s3_to_cassandra.S3ToCassandra
Class to transfer parquet data from S3 to Cassandra.
Parameters: - settings (AthenaToCassandraSettings) – the settings around the ETL to be executed
-
class
hip_data_tools.etl.athena_to_cassandra.AthenaToCassandraSettings(source_database: str, source_table: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, destination_keyspace: str, destination_table: str, destination_table_primary_keys: list, destination_connection_settings: hip_data_tools.apache.cassandra.CassandraConnectionSettings, destination_table_options_statement: str = '', destination_batch_size: int = 1)¶ Bases:
object
Athena to Cassandra ETL settings
-
hip_data_tools.etl.athena_to_cassandra.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
hip_data_tools.etl.athena_to_dataframe module¶
handle ETL of data from Athena to a DataFrame
-
class
hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrame(settings: hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrameSettings)¶ Bases:
hip_data_tools.etl.s3_to_dataframe.S3ToDataFrame
Class to transfer parquet data from S3 to a DataFrame.
Parameters: - settings (AthenaToDataFrameSettings) – the settings around the ETL to be executed
-
class
hip_data_tools.etl.athena_to_dataframe.AthenaToDataFrameSettings(source_database: str, source_table: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)¶ Bases:
object
Athena to DataFrame ETL settings
-
hip_data_tools.etl.athena_to_dataframe.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
Parameters: - these (dict of str to attr.ib) –
A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add
__repr__methods to Django models) or don’t want to.If these is not
None,attrswill not search the class body for attributes and will not remove any attributes from it.If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.
- repr_ns (str) – When using nested classes, there’s no way in Python 2
to automatically detect that. Therefore it’s possible to set the
namespace explicitly for a more meaningful
reproutput. - auto_detect (bool) –
Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to
Trueunless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).So for example by implementing
__eq__on a class yourself,attrswill deduceeq=Falseand won’t create neither__eq__nor__ne__(but Python classes come with a sensible__ne__by default, so it should be enough to only implement__eq__in most cases).Warning
If you prevent
attrsfrom creating the ordering methods for you (order=False, e.g. by implementing__le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.Passing
TrueorFalseto init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.auto_detect requires Python 3. Setting it
Trueon Python 2 raises a PythonTooOldError. - repr (bool) – Create a
__repr__method with a human readable representation ofattrsattributes.. - str (bool) – Create a
__str__method that is identical to__repr__. This is usually not necessary except for Exceptions. - eq (Optional[bool]) –
If
TrueorNone(default), add__eq__and__ne__methods that check two instances for equality.They compare the instances as if they were tuples of their
attrsattributes if and only if the types of both classes are identical! - order (Optional[bool]) – If
True, add__lt__,__le__,__gt__, and__ge__methods that behave like eq above and allow instances to be ordered. IfNone(default) mirror value of eq. - cmp (Optional[bool]) – Setting to
Trueis equivalent to settingeq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order. - hash (Optional[bool]) –
If
None(default), the__hash__method is generated according how eq and frozen are set.- If both are True,
attrswill generate a__hash__for you. - If eq is True and frozen is False,
__hash__will be set to None, marking it unhashable (which it is). - If eq is False,
__hash__will be left untouched meaning the__hash__method of the base class will be used (if base class isobject, this means it will fall back to id-based hashing.).
Although not recommended, you can decide for yourself and force
attrsto create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passingTrueor not. Both of these cases are rather special and should be used carefully.See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.
- If both are True,
- init (bool) – Create a
__init__method that initializes theattrsattributes. Leading underscores are stripped for the argument name. If a__attrs_post_init__method exists on the class, it will be called after the class is fully initialized. - slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
- frozen (bool) –
Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.
Please note:
- This is achieved by installing a custom
__setattr__method on your class, so you can’t implement your own. - True immutability is impossible in Python.
- This does have a minor a runtime performance impact
<how-frozen> when initializing new instances. In other words:
__init__is slightly slower withfrozen=True. - If a class is frozen, you cannot modify
selfin__attrs_post_init__or a self-written__init__. You can circumvent that limitation by usingobject.__setattr__(self, "attribute_name", value). - Subclasses of a frozen class are frozen too.
- This is achieved by installing a custom
- weakref_slot (bool) – Make instances weak-referenceable. This has no
effect unless
slotsis also enabled. - auto_attribs (bool) –
If
True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.In this case, you must annotate every field. If
attrsencounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Usefield_name: typing.Any = attr.ib(...)if you don’t want to set a type.If you assign a value to those attributes (e.g.
x: int = 42), that value becomes the default value like if it were passed usingattr.ib(default=42). Passing an instance of Factory also works as expected.Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.
- kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
- cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
- auto_exc (bool) –
If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:
  - the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add its own ones.),
  - all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
  - the value of str is ignored leaving __str__ to base classes.
- collect_by_mro (bool) –
Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatibility.
See issue #428 for more details.
- getstate_setstate (Optional[bool]) –
Note: This is usually only interesting for slotted classes and you should probably just set auto_detect to True.
If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.
If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).
- on_setattr –
A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same arguments as validators: the instance, the attribute that is being modified, and the new value.
If no exception is raised, the attribute is set to the return value of the callable.
If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.
New in version 16.0.0: slots
New in version 16.1.0: frozen
New in version 16.3.0: str
New in version 16.3.0: Support for __attrs_post_init__.
Changed in version 17.1.0: hash supports None as value which is also the default now.
New in version 17.3.0: auto_attribs
Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.
Changed in version 18.1.0: If these is ordered, the order is retained.
New in version 18.2.0: weakref_slot
Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq__ and __ne__ never tried to compare subclasses to each other.
Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.
New in version 18.2.0: kw_only
New in version 18.2.0: cache_hash
New in version 19.1.0: auto_exc
Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.
New in version 19.2.0: eq and order
New in version 20.1.0: auto_detect
New in version 20.1.0: collect_by_mro
New in version 20.1.0: getstate_setstate
New in version 20.1.0: on_setattr
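The frozen notes above (assignment raises, and object.__setattr__ is the escape hatch during initialization) can be tried out without installing attrs. The sketch below deliberately swaps in the standard library's dataclasses module as a stand-in, since it follows the same pattern; it is not the attrs implementation, the exception raised is dataclasses.FrozenInstanceError rather than attr.exceptions.FrozenInstanceError, and the Point class is hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, field


# Standard-library stand-in for a frozen attrs class (an assumption made
# for illustration; attrs itself is not used here).
@dataclass(frozen=True)
class Point:
    x: int
    y: int
    # Derived value that is filled in after the generated __init__ has run.
    norm_sq: int = field(init=False)

    def __post_init__(self) -> None:
        # A plain "self.norm_sq = ..." would raise on a frozen instance,
        # so the custom __setattr__ is bypassed explicitly.
        object.__setattr__(self, "norm_sq", self.x ** 2 + self.y ** 2)


p = Point(3, 4)

try:
    p.x = 10  # attempted mutation of a frozen instance
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because equality is generated and the class is frozen, instances of this stand-in are also hashable, which mirrors the hash rule for the "eq and frozen both True" case.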
hip_data_tools.etl.athena_to_s3 module¶
Handle ETL of data from Athena to S3
-
class
hip_data_tools.etl.athena_to_s3.AthenaToS3(settings: hip_data_tools.etl.athena_to_s3.AthenaToS3Settings)¶ Bases:
object
ETL to transfer data from an Athena SQL query to an S3 location
Parameters: settings (AthenaToS3Settings) – Settings for the ETL
-
execute() → None¶ Execute the ETL to transfer data from Athena to S3 Returns: None
-
-
class
hip_data_tools.etl.athena_to_s3.AthenaToS3Settings(source_sql: str, source_database: str, temporary_database: Optional[str], temporary_table: Optional[str], target_data_format: str, target_s3_bucket: str, target_s3_dir: str, target_partition_columns: Optional[List[str]], connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)¶ Bases:
object
Athena to S3 ETL settings
-
hip_data_tools.etl.athena_to_s3.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
hip_data_tools.etl.common module¶
Common ETL specific utilities and methods
-
class
hip_data_tools.etl.common.DataFrameTransformer¶ Bases:
hip_data_tools.etl.common.Transformer
Abstract base class to handle DataFrame to DataFrame transformation
-
transform(data: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame¶ Transform the data element provided
Parameters: data (Any) – A data Element Returns: Any
-
-
class
hip_data_tools.etl.common.DictTransformer¶ Bases:
hip_data_tools.etl.common.Transformer
Abstract base class to handle dict to dict transformation
-
transform(data: Dict[KT, VT]) → Dict[KT, VT]¶ Transform the data element provided
Parameters: data (Any) – A data Element Returns: Any
-
-
class
hip_data_tools.etl.common.ETL(extractor: hip_data_tools.etl.common.Extractor, loader: hip_data_tools.etl.common.Loader, transformers: Optional[List[hip_data_tools.etl.common.Transformer]] = None)¶ Bases:
abc.ABC
Base ETL class that defines the interaction of the three components of an ETL
Parameters: - extractor (Extractor) – source data extractor
- transformers (List[Transformer]) – series of transformations
- loader (Loader) – loader to write data into a sink
-
execute_all() → None¶ Execute the Extraction, transformation and loading of all data element from the source
Returns: None
-
execute_next() → None¶ Execute the Extraction, transformation and loading of the next data element from the source
Returns: None
-
has_next() → bool¶ Does the Source have any more data elements
Returns (bool): True if data exists
-
reset_source() → None¶ Reset the state of the source. This causes it to forget the items that have already been loaded and start afresh
Returns: None
-
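How the three components interact can be sketched without the library installed. Everything below is a hypothetical stand-in: ListExtractor, UppercaseNames, ListLoader and SketchETL only mirror the method names documented above (has_next, extract_next, reset, transform, load, execute_next, execute_all, reset_source), and the real base classes may behave differently in detail.

```python
from typing import Any, Dict, List, Optional


class ListExtractor:
    """Hypothetical extractor that serves data points from an in-memory list."""

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self._items = items
        self._position = 0

    def has_next(self) -> bool:
        return self._position < len(self._items)

    def extract_next(self) -> Dict[str, Any]:
        item = self._items[self._position]
        self._position += 1
        return item

    def reset(self) -> None:
        # Forget what has been served and start from the first item again.
        self._position = 0


class UppercaseNames:
    """Hypothetical dict-to-dict transformer."""

    def transform(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {**data, "name": data["name"].upper()}


class ListLoader:
    """Hypothetical loader that appends every data point to a list sink."""

    def __init__(self) -> None:
        self.sink: List[Dict[str, Any]] = []

    def load(self, data: Dict[str, Any]) -> None:
        self.sink.append(data)


class SketchETL:
    """Stand-in for the ETL base class, assuming a simple pull loop."""

    def __init__(self, extractor: ListExtractor, loader: ListLoader,
                 transformers: Optional[List[UppercaseNames]] = None) -> None:
        self.extractor = extractor
        self.loader = loader
        self.transformers = transformers or []

    def has_next(self) -> bool:
        return self.extractor.has_next()

    def execute_next(self) -> None:
        # Extract one element, run it through each transformer in order,
        # then hand the result to the loader.
        data = self.extractor.extract_next()
        for transformer in self.transformers:
            data = transformer.transform(data)
        self.loader.load(data)

    def execute_all(self) -> None:
        while self.has_next():
            self.execute_next()

    def reset_source(self) -> None:
        self.extractor.reset()


etl = SketchETL(
    extractor=ListExtractor([{"name": "ada"}, {"name": "grace"}]),
    loader=ListLoader(),
    transformers=[UppercaseNames()],
)
etl.execute_all()
```

Keeping extraction, transformation and loading behind separate objects means a source or sink can be swapped without touching the loop that drives them.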
-
class
hip_data_tools.etl.common.EtlSinkRecordState(**values)¶ Bases:
cassandra.cqlengine.models.Model
Cassandra ORM model for the ETL sink states
-
exception
DoesNotExist¶ Bases:
cassandra.cqlengine.models.DoesNotExist
-
exception
MultipleObjectsReturned¶ Bases:
cassandra.cqlengine.models.MultipleObjectsReturned
-
etl_signature= <cassandra.cqlengine.models.ColumnQueryEvaluator object>¶
-
pk= <cassandra.cqlengine.models.ColumnQueryEvaluator object>¶
-
record_identifier= <cassandra.cqlengine.models.ColumnQueryEvaluator object>¶
-
record_state= <cassandra.cqlengine.models.ColumnQueryEvaluator object>¶
-
state_created= <cassandra.cqlengine.models.ColumnQueryEvaluator object>¶
-
state_last_updated= <cassandra.cqlengine.models.ColumnQueryEvaluator object>¶
-
class
hip_data_tools.etl.common.EtlSinkRecordStateManager(record_identifier: str, etl_signature: str)¶ Bases:
object
The generic ETL sink state manager that manages and persists the state of a record
Parameters: - record_identifier (str) – A unique identifier string to identify the sink record
- etl_signature (str) – The unique ETL signature to identify the ETL
-
current_state() → hip_data_tools.etl.common.EtlStates¶ Get the current state of the sink record
Returns (EtlStates): current state
-
failed() → None¶ Mark Record as failed Returns: None
-
processing() → None¶ Mark Record as processing Returns: None
-
ready() → None¶ Mark Record as ready Returns: None
-
succeeded() → None¶ Mark Record as succeeded Returns: None
-
-
class
hip_data_tools.etl.common.EtlStates¶ Bases:
enum.Enum
Enumeration of the possible states of an ETL
-
Failed= 'failed'¶
-
Processing= 'processing'¶
-
Ready= 'ready'¶
-
Succeeded= 'succeeded'¶
-
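The state manager and the states enumeration above suggest a simple lifecycle per sink record: ready, processing, then succeeded or failed. The sketch below illustrates that flow under loud assumptions: SketchEtlStates copies only the documented member names and values, InMemoryStateManager keeps state in a module-level dict instead of Cassandra, and the process helper and the "sketch_etl" signature are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class SketchEtlStates(Enum):
    """Stand-in enum using the documented member names and values."""
    Ready = "ready"
    Processing = "processing"
    Succeeded = "succeeded"
    Failed = "failed"


# Hypothetical in-memory replacement for the persisted state table.
_STATE_STORE: Dict[Tuple[str, str], SketchEtlStates] = {}


class InMemoryStateManager:
    """Mirrors the documented method names; persistence is only simulated."""

    def __init__(self, record_identifier: str, etl_signature: str) -> None:
        self._key = (etl_signature, record_identifier)

    def current_state(self) -> SketchEtlStates:
        return _STATE_STORE[self._key]

    def ready(self) -> None:
        _STATE_STORE[self._key] = SketchEtlStates.Ready

    def processing(self) -> None:
        _STATE_STORE[self._key] = SketchEtlStates.Processing

    def succeeded(self) -> None:
        _STATE_STORE[self._key] = SketchEtlStates.Succeeded

    def failed(self) -> None:
        _STATE_STORE[self._key] = SketchEtlStates.Failed


def process(record_identifier: str, work: Callable[[], None]) -> SketchEtlStates:
    """Run one unit of work and record its outcome for the given record."""
    manager = InMemoryStateManager(record_identifier, "sketch_etl")
    manager.ready()
    manager.processing()
    try:
        work()
        manager.succeeded()
    except Exception:
        manager.failed()
    return manager.current_state()


ok_state = process("record-1", lambda: None)
bad_state = process("record-2", lambda: 1 / 0)  # the division error is caught
```

Since the members carry string values, a stored value such as "ready" can be turned back into a member with SketchEtlStates("ready").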
-
class
hip_data_tools.etl.common.Extractor(settings: hip_data_tools.etl.common.SourceSettings)¶ Bases:
abc.ABC
Abstract base class to define the functionalities of an Extractor
Parameters: settings (SourceSettings) – settings used to connect to a source
-
extract_next() → Any¶ Extracts a single datapoint
Returns: Any
-
has_next() → bool¶ Checks if the extractor has any more data points
Returns: bool
-
reset() → None¶ Reset the state of the Extractor, usually reverting the state to its initial position
Returns: None
-
-
class
hip_data_tools.etl.common.Loader(settings: hip_data_tools.etl.common.SinkSettings)¶ Bases:
abc.ABC
Abstract base class for defining Loaders that write data to sinks
Parameters: settings (SinkSettings) – settings to connect to the sink
-
load(data: Any) → None¶ Load a given data point onto the sink
Parameters: data (Any) – Any single data point Returns: None
-
-
class
hip_data_tools.etl.common.SinkSettings¶ Bases:
object
Dataclass to encapsulate settings to connect to and write to a data sink
-
class
hip_data_tools.etl.common.SourceSettings¶ Bases:
object
Abstract base dataclass for source settings
-
class
hip_data_tools.etl.common.Transformer¶ Bases:
abc.ABC, typing.Generic
Abstract base class for handling data transformations
-
transform(data: FromDataElementType) → ToDataElementType¶ Transform the data element provided
Parameters: data (Any) – A data Element Returns: Any
-
-
hip_data_tools.etl.common.current_epoch() → int¶ Get the current epoch to millisecond precision Returns: int
-
hip_data_tools.etl.common.get_random_string(length: int) → str¶ A random ASCII lowercase string of a certain length
Parameters: length (int) – length of the random string
Returns: str
-
hip_data_tools.etl.common.random() → x in the interval [0, 1).¶
-
hip_data_tools.etl.common.sync_etl_state_table()¶ Utility method to sync (Create) the table as per ORM model Returns: None
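The two small helpers documented above, current_epoch and get_random_string, can be approximated with the standard library. This is a minimal sketch of what such helpers might look like, not the library's source: the function names current_epoch_ms and random_lowercase_string are hypothetical, and combining them into a temporary table name is an assumed use case.

```python
import random
import string
import time


def current_epoch_ms() -> int:
    """Milliseconds since the Unix epoch, truncated to an integer."""
    return int(time.time() * 1000)


def random_lowercase_string(length: int) -> str:
    """Random string of the given length drawn from the letters a-z."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


# Assumed use: a name that is unlikely to collide with an existing table.
suffix = random_lowercase_string(8)
temp_table = f"temp_{current_epoch_ms()}_{suffix}"
```

The random module is not suitable for secrets; for collision avoidance in scratch names it is usually sufficient.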
hip_data_tools.etl.google_sheet_to_athena module¶
Module to deal with data transfer from Google sheets to Athena
-
class
hip_data_tools.etl.google_sheet_to_athena.GoogleSheetToAthena(settings: hip_data_tools.etl.google_sheet_to_athena.GoogleSheetsToAthenaSettings)¶ Bases:
hip_data_tools.etl.google_sheet_to_s3.GoogleSheetToS3
Class to transfer data from google sheet to athena
Parameters: settings (GoogleSheetsToAthenaSettings) – the settings around the etl to be executed
-
load_sheet_to_athena()¶ Load google sheet into Athena
Returns: None
-
-
class
hip_data_tools.etl.google_sheet_to_athena.GoogleSheetsToAthenaSettings(source_workbook_url: str, source_sheet: str, source_row_range: str, source_field_names_row_number: int, source_field_types_row_number: int, source_data_start_row_number: int, source_connection_settings: hip_data_tools.google.common.GoogleApiConnectionSettings, manual_partition_key_value: dict, target_s3_bucket: str, target_s3_dir: str, target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, target_database: str, target_table_name: str, target_table_ddl_progress: bool)¶ Bases:
hip_data_tools.etl.google_sheet_to_s3.GoogleSheetsToS3Settings
Google sheets to Athena ETL settings
Parameters: - source_workbook_url – the url of the workbook
- source_sheet – name of the google sheet (eg: sheet1)
- source_row_range – range of rows (eg: ‘2:5’)
- source_field_names_row_number – row number of the field names (eg: 4). Assumes the data starts at the first column and there are no gaps. No two fields may share the same name.
- source_field_types_row_number – row number of the field types (eg: 5)
- source_data_start_row_number – starting row number of the actual data
- source_connection_settings – GoogleApiConnectionSettings with google api keys dictionary object
- manual_partition_key_value – a dictionary with the partition column name and value. Only one partition key can be used, and its value needs to be a string (eg: {"column": "start_date", "value": "2020-03-08"})
- target_database – name of the athena database (eg: dev)
- target_table_name – name of the athena table (eg: ‘sheet_table’)
- target_s3_bucket – s3 bucket to store the files (eg: au-test-bucket)
- target_s3_dir – s3 directory to store the files (eg: sheets/new)
- target_connection_settings – aws connection settings
- target_table_ddl_progress – if this is true, the target table will be dropped and recreated
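The parameter list above can be collected into a plain dictionary of keyword arguments before a settings object is built. The sketch below uses the example values from the list; the connection settings and workbook URL are omitted, and check_manual_partition is a hypothetical helper that only encodes the documented constraint on manual_partition_key_value (one partition column, string value). It is not part of the library.

```python
from typing import Any, Dict


def check_manual_partition(partition: Dict[str, Any]) -> None:
    """Hypothetical check: raise ValueError unless the dict has the documented shape."""
    if set(partition) != {"column", "value"}:
        raise ValueError("expected exactly the keys 'column' and 'value'")
    if not isinstance(partition["value"], str):
        raise ValueError("the partition value must be a string")


# Example values taken from the parameter descriptions; the connection
# settings objects and the workbook URL are intentionally left out.
settings_kwargs: Dict[str, Any] = {
    "source_sheet": "sheet1",
    "source_row_range": "2:5",
    "source_field_names_row_number": 4,
    "source_field_types_row_number": 5,
    "manual_partition_key_value": {"column": "start_date", "value": "2020-03-08"},
    "target_database": "dev",
    "target_table_name": "sheet_table",
    "target_s3_bucket": "au-test-bucket",
    "target_s3_dir": "sheets/new",
    "target_table_ddl_progress": False,
}

check_manual_partition(settings_kwargs["manual_partition_key_value"])
```

Validating the partition dictionary up front surfaces a malformed value before any data is written to S3.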
-
hip_data_tools.etl.google_sheet_to_athena.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
hip_data_tools.etl.google_sheet_to_s3 module¶
Module to deal with data transfer from Google sheets to S3
-
class
hip_data_tools.etl.google_sheet_to_s3.GoogleSheetToS3(settings: hip_data_tools.etl.google_sheet_to_s3.GoogleSheetsToS3Settings)¶ Bases:
object
Class to transfer data from google sheet to S3
Parameters: settings (GoogleSheetsToS3Settings) – the settings around the etl to be executed
-
write_sheet_data_to_s3()¶ Write the data frame into S3
Parameters: - s3_key (string) – s3 key
Returns: None
-
-
class
hip_data_tools.etl.google_sheet_to_s3.GoogleSheetsToS3Settings(source_workbook_url: str, source_sheet: str, source_row_range: str, source_field_names_row_number: int, source_field_types_row_number: int, source_data_start_row_number: int, source_connection_settings: hip_data_tools.google.common.GoogleApiConnectionSettings, manual_partition_key_value: dict, target_s3_bucket: str, target_s3_dir: str, target_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)¶ Bases:
object
Google Sheets to S3 ETL settings
Parameters: - source_workbook_url (str)
- source_sheet (str)
- source_row_range (str)
- source_field_names_row_number (int)
- source_field_types_row_number (int)
- source_data_start_row_number (int)
- source_connection_settings (GoogleApiConnectionSettings)
- manual_partition_key_value (dict)
- target_s3_bucket (str)
- target_s3_dir (str)
- target_connection_settings (AwsConnectionSettings)
-
hip_data_tools.etl.google_sheet_to_s3.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
Parameters: - these (dict of str to attr.ib) –
A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add __repr__ methods to Django models) or don’t want to.
If these is not None, attrs will not search the class body for attributes and will not remove any attributes from it.
If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.
- repr_ns (str) – When using nested classes, there’s no way in Python 2 to automatically detect that. Therefore it’s possible to set the namespace explicitly for a more meaningful repr output.
- auto_detect (bool) –
Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to True unless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).
So for example by implementing __eq__ on a class yourself, attrs will deduce eq=False and will create neither __eq__ nor __ne__ (but Python classes come with a sensible __ne__ by default, so it should be enough to only implement __eq__ in most cases).
Warning
If you prevent attrs from creating the ordering methods for you (order=False, e.g. by implementing __le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.
Passing True or False to init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.
auto_detect requires Python 3. Setting it True on Python 2 raises a PythonTooOldError.
- repr (bool) – Create a __repr__ method with a human readable representation of attrs attributes.
- str (bool) – Create a __str__ method that is identical to __repr__. This is usually not necessary except for Exceptions.
- eq (Optional[bool]) –
If True or None (default), add __eq__ and __ne__ methods that check two instances for equality.
They compare the instances as if they were tuples of their attrs attributes if and only if the types of both classes are identical!
- order (Optional[bool]) – If True, add __lt__, __le__, __gt__, and __ge__ methods that behave like eq above and allow instances to be ordered. If None (default) mirror value of eq.
- cmp (Optional[bool]) – Setting to True is equivalent to setting eq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order.
- hash (Optional[bool]) –
If None (default), the __hash__ method is generated according to how eq and frozen are set.
  - If both are True, attrs will generate a __hash__ for you.
  - If eq is True and frozen is False, __hash__ will be set to None, marking it unhashable (which it is).
  - If eq is False, __hash__ will be left untouched meaning the __hash__ method of the base class will be used (if base class is object, this means it will fall back to id-based hashing.).
Although not recommended, you can decide for yourself and force attrs to create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passing True or not. Both of these cases are rather special and should be used carefully.
See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.
- init (bool) – Create a __init__ method that initializes the attrs attributes. Leading underscores are stripped for the argument name. If a __attrs_post_init__ method exists on the class, it will be called after the class is fully initialized.
- slots (bool) – Create a slotted class that’s more memory-efficient.
- frozen (bool) –
Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.
Please note:
  - This is achieved by installing a custom __setattr__ method on your class, so you can’t implement your own.
  - True immutability is impossible in Python.
  - This does have a minor runtime performance impact when initializing new instances. In other words: __init__ is slightly slower with frozen=True.
  - If a class is frozen, you cannot modify self in __attrs_post_init__ or a self-written __init__. You can circumvent that limitation by using object.__setattr__(self, "attribute_name", value).
  - Subclasses of a frozen class are frozen too.
- weakref_slot (bool) – Make instances weak-referenceable. This has no effect unless slots is also enabled.
- auto_attribs (bool) –
If True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.
In this case, you must annotate every field. If attrs encounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Use field_name: typing.Any = attr.ib(...) if you don’t want to set a type.
If you assign a value to those attributes (e.g. x: int = 42), that value becomes the default value like if it were passed using attr.ib(default=42). Passing an instance of Factory also works as expected.
Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.
- kw_only (bool) – Make all attributes keyword-only (Python 3+) in the generated __init__ (if init is False, this parameter is ignored).
- cache_hash (bool) – Ensure that the object’s hash code is computed only once and stored on the object. If this is set to True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined.
- auto_exc (bool) –
If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:
  - the values for eq, order, and hash are ignored and the instances compare and hash by the instance’s ids (N.B. attrs will not remove existing implementations of __hash__ or the equality methods. It just won’t add own ones.),
  - all attributes that are either passed into __init__ or have a default value are additionally available as a tuple in the args attribute,
  - the value of str is ignored leaving __str__ to base classes.
- collect_by_mro (bool) –
Setting this to True fixes the way attrs collects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatibility.
See issue #428 for more details.
- getstate_setstate (Optional[bool]) –
Note
This is usually only interesting for slotted classes and you should probably just set auto_detect to True.
If True, __getstate__ and __setstate__ are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes and False for dict classes.
If auto_detect is True, and getstate_setstate is left None, and either __getstate__ or __setstate__ is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want).
- on_setattr –
A callable that is run whenever the user attempts to set an attribute (either by assignment like i.x = 42 or by using setattr like setattr(i, "x", 42)). It receives the same arguments as validators: the instance, the attribute that is being modified, and the new value.
If no exception is raised, the attribute is set to the return value of the callable.
If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.
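The three default cases listed under the hash parameter can be tried out with the standard-library dataclasses module, which applies the same rules for these combinations of eq and frozen. This is a stand-in sketch and does not use the decorator documented here; the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class FrozenPoint:
    x: int


@dataclass(eq=True)  # frozen defaults to False
class MutablePoint:
    x: int


@dataclass(eq=False)
class IdentityPoint:
    x: int


# eq and frozen both True: a value-based __hash__ is generated.
same_hash = hash(FrozenPoint(1)) == hash(FrozenPoint(1))

# eq True, frozen False: __hash__ is set to None, so instances are unhashable.
mutable_is_unhashable = MutablePoint.__hash__ is None

# eq False: __hash__ is left untouched, so the one inherited from object is used.
identity_hash_inherited = IdentityPoint.__hash__ is object.__hash__
```

The second case is the one that most often surprises: a mutable class with value equality cannot be used as a dict key or set member unless hashing is explicitly enabled.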
New in version 16.0.0: slots
New in version 16.1.0: frozen
New in version 16.3.0: str
New in version 16.3.0: Support for __attrs_post_init__.
Changed in version 17.1.0: hash supports None as value which is also the default now.
New in version 17.3.0: auto_attribs
Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.
Changed in version 18.1.0: If these is ordered, the order is retained.
New in version 18.2.0: weakref_slot
Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq__ and __ne__ never tried to compare subclasses to each other.
Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.
New in version 18.2.0: kw_only
New in version 18.2.0: cache_hash
New in version 19.1.0: auto_exc
Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.
New in version 19.2.0: eq and order
New in version 20.1.0: auto_detect
New in version 20.1.0: collect_by_mro
New in version 20.1.0: getstate_setstate
New in version 20.1.0: on_setattr
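The frozen notes in the parameter list above mention that a frozen instance cannot assign to self after construction, and that object.__setattr__ circumvents this. A minimal sketch of that pattern follows, using the standard-library dataclasses module as a stand-in (its __post_init__ plays the role of __attrs_post_init__, and it raises dataclasses.FrozenInstanceError rather than the attr exception). The Rectangle class is hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Rectangle:
    width: int
    height: int
    area: int = field(init=False)  # derived value, not an __init__ argument

    def __post_init__(self):
        # A plain `self.area = ...` would raise here because the instance is frozen,
        # so the derived value is written through object.__setattr__ instead.
        object.__setattr__(self, "area", self.width * self.height)


r = Rectangle(2, 3)

try:
    r.width = 10  # ordinary assignment is rejected on a frozen instance
    mutated = True
except FrozenInstanceError:
    mutated = False
```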
hip_data_tools.etl.s3_to_cassandra module¶
Module to deal with data transfer from S3 to Cassandra
-
class
hip_data_tools.etl.s3_to_cassandra.S3ToCassandra(settings: hip_data_tools.etl.s3_to_cassandra.S3ToCassandraSettings)¶ Bases:
hip_data_tools.etl.s3_to_dataframe.S3ToDataFrame
Class to transfer parquet data from S3 to Cassandra
Parameters: - settings (S3ToCassandraSettings) – the settings around the ETL to be executed
-
create_and_upsert_all() → List[List[cassandra.datastax.graph.query.Result]]¶ First creates the table and then upserts all S3 files into it
Returns: the upsert results, as List[List[Result]]
-
create_table()¶ Creates the destination Cassandra table if it does not exist
Returns: None
-
upsert_all_files() → List[List[cassandra.datastax.graph.query.Result]]¶ Upserts all files from S3 sequentially into Cassandra
Returns: the upsert results, as List[List[Result]]
-
upsert_file(key: str) → List[cassandra.datastax.graph.query.Result]¶ Read a parquet file from S3 and upsert its records into Cassandra
Parameters: - key (str) – s3 key for the parquet file
Returns: the upsert results for the file, as List[Result]
-
-
class
hip_data_tools.etl.s3_to_cassandra.S3ToCassandraSettings(source_bucket: str, source_key_prefix: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings, destination_keyspace: str, destination_table: str, destination_table_primary_keys: list, destination_connection_settings: hip_data_tools.apache.cassandra.CassandraConnectionSettings, destination_table_options_statement: str = '', destination_batch_size: int = 1)¶ Bases:
hip_data_tools.etl.s3_to_dataframe.S3ToDataFrameSettings
S3 to Cassandra ETL settings
-
hip_data_tools.etl.s3_to_cassandra.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
hip_data_tools.etl.s3_to_dataframe module¶
Module to deal with data transfer from S3 to a Pandas DataFrame
-
class
hip_data_tools.etl.s3_to_dataframe.S3ToDataFrame(settings: hip_data_tools.etl.s3_to_dataframe.S3ToDataFrameSettings)¶ Bases:
object
Class to transfer parquet data from S3 to a Pandas DataFrame
Parameters: - settings (S3ToDataFrameSettings) – the settings around the ETL to be executed
-
get_all_files_as_data_frame() → pandas.core.frame.DataFrame¶ Downloads and collates all files in a given s3 dir and returns a single DataFrame
Returns: DataFrame
-
get_data_frame(key: str) → pandas.core.frame.DataFrame¶ Read a parquet file from S3 and convert it to a Pandas DataFrame
Parameters: - key (str) – s3 key for the parquet file
Returns: DataFrame
-
list_source_files() → List[str]¶ Lists all the files under the s3 location given in the settings
Returns: list[str]
-
next() → pandas.core.frame.DataFrame¶ Gets the next DataFrame from the next file on s3.
Please note, if you are running window functions or other operations that span multiple rows of the data set, this method may produce incorrect results, because each call returns the data of a single file only. For such use cases, use get_all_files_as_data_frame().
Returns: DataFrame
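The caveat on next() can be illustrated without S3 or pandas. The sketch below uses plain lists as hypothetical stand-ins for the rows of two files under the same prefix, and a running total as the multi-row operation. It does not call this class.

```python
from itertools import accumulate

# Hypothetical contents of two files under the same key prefix (one list per file).
files = [[1, 2, 3], [4, 5]]

# File-by-file processing: the running total restarts at every file boundary.
per_file = [list(accumulate(rows)) for rows in files]

# Whole-data-set processing: collate all rows first, then compute once.
all_rows = [value for rows in files for value in rows]
combined = list(accumulate(all_rows))
```

Here per_file is [[1, 3, 6], [4, 9]] while combined is [1, 3, 6, 10, 15]. The totals for the second file differ because the file-by-file version never sees the earlier rows.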
-
-
class
hip_data_tools.etl.s3_to_dataframe.S3ToDataFrameSettings(source_bucket: str, source_key_prefix: str, source_connection_settings: hip_data_tools.aws.common.AwsConnectionSettings)¶ Bases:
object
S3 to DataFrame ETL settings
-
hip_data_tools.etl.s3_to_dataframe.dataclass(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, *, auto_attribs=True, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None)¶ A class decorator that adds dunder-methods according to the specified attributes using attr.ib or the these argument.
Parameters: - these (dict of str to attr.ib) –
A dictionary of name to attr.ib mappings. This is useful to avoid the definition of your attributes within the class body because you can’t (e.g. if you want to add
__repr__methods to Django models) or don’t want to.If these is not
None,attrswill not search the class body for attributes and will not remove any attributes from it.If these is an ordered dict (dict on Python 3.6+, collections.OrderedDict otherwise), the order is deduced from the order of the attributes inside these. Otherwise the order of the definition of the attributes is used.
- repr_ns (str) – When using nested classes, there’s no way in Python 2
to automatically detect that. Therefore it’s possible to set the
namespace explicitly for a more meaningful
reproutput. - auto_detect (bool) –
Instead of setting the init, repr, eq, order, and hash arguments explicitly, assume they are set to
Trueunless any of the involved methods for one of the arguments is implemented in the current class (i.e. it is not inherited from some base class).So for example by implementing
__eq__on a class yourself,attrswill deduceeq=Falseand won’t create neither__eq__nor__ne__(but Python classes come with a sensible__ne__by default, so it should be enough to only implement__eq__in most cases).Warning
If you prevent
attrsfrom creating the ordering methods for you (order=False, e.g. by implementing__le__), it becomes your responsibility to make sure its ordering is sound. The best way is to use the functools.total_ordering decorator.Passing
TrueorFalseto init, repr, eq, order, cmp, or hash overrides whatever auto_detect would determine.auto_detect requires Python 3. Setting it
Trueon Python 2 raises a PythonTooOldError. - repr (bool) – Create a
__repr__method with a human readable representation ofattrsattributes.. - str (bool) – Create a
__str__method that is identical to__repr__. This is usually not necessary except for Exceptions. - eq (Optional[bool]) –
If
TrueorNone(default), add__eq__and__ne__methods that check two instances for equality.They compare the instances as if they were tuples of their
attrsattributes if and only if the types of both classes are identical! - order (Optional[bool]) – If
True, add__lt__,__le__,__gt__, and__ge__methods that behave like eq above and allow instances to be ordered. IfNone(default) mirror value of eq. - cmp (Optional[bool]) – Setting to
Trueis equivalent to settingeq=True, order=True. Deprecated in favor of eq and order, has precedence over them for backward-compatibility though. Must not be mixed with eq or order. - hash (Optional[bool]) –
If
None(default), the__hash__method is generated according how eq and frozen are set.- If both are True,
attrswill generate a__hash__for you. - If eq is True and frozen is False,
__hash__will be set to None, marking it unhashable (which it is). - If eq is False,
__hash__will be left untouched meaning the__hash__method of the base class will be used (if base class isobject, this means it will fall back to id-based hashing.).
Although not recommended, you can decide for yourself and force
attrsto create one (e.g. if the class is immutable even though you didn’t freeze it programmatically) by passingTrueor not. Both of these cases are rather special and should be used carefully.See our documentation on hashing, Python’s documentation on object.__hash__, and the GitHub issue that led to the default behavior for more details.
- If both are True,
- init (bool) – Create a
__init__method that initializes theattrsattributes. Leading underscores are stripped for the argument name. If a__attrs_post_init__method exists on the class, it will be called after the class is fully initialized. - slots (bool) – Create a slotted class <slotted classes> that’s more memory-efficient.
- frozen (bool) –
Make instances immutable after initialization. If someone attempts to modify a frozen instance, attr.exceptions.FrozenInstanceError is raised.
Please note:
- This is achieved by installing a custom
__setattr__method on your class, so you can’t implement your own. - True immutability is impossible in Python.
- This does have a minor a runtime performance impact
<how-frozen> when initializing new instances. In other words:
__init__is slightly slower withfrozen=True. - If a class is frozen, you cannot modify
selfin__attrs_post_init__or a self-written__init__. You can circumvent that limitation by usingobject.__setattr__(self, "attribute_name", value). - Subclasses of a frozen class are frozen too.
- This is achieved by installing a custom
- weakref_slot (bool) – Make instances weak-referenceable. This has no
effect unless
slotsis also enabled. - auto_attribs (bool) –
If
True, collect PEP 526-annotated attributes (Python 3.6 and later only) from the class body.In this case, you must annotate every field. If
attrsencounters a field that is set to an attr.ib but lacks a type annotation, an attr.exceptions.UnannotatedAttributeError is raised. Usefield_name: typing.Any = attr.ib(...)if you don’t want to set a type.If you assign a value to those attributes (e.g.
x: int = 42), that value becomes the default value like if it were passed usingattr.ib(default=42). Passing an instance of Factory also works as expected.Attributes annotated as typing.ClassVar, and attributes that are neither annotated nor set to an attr.ib are ignored.
- kw_only (bool) – Make all attributes keyword-only (Python 3+)
in the generated
__init__(ifinitisFalse, this parameter is ignored). - cache_hash (bool) – Ensure that the object’s hash code is computed
only once and stored on the object. If this is set to
True, hashing must be either explicitly or implicitly enabled for this class. If the hash code is cached, avoid any reassignments of fields involved in hash code computation or mutations of the objects those fields point to after object creation. If such changes occur, the behavior of the object’s hash code is undefined. - auto_exc (bool) –
If the class subclasses BaseException (which implicitly includes any subclass of any exception), the following happens to behave like a well-behaved Python exceptions class:
- the values for eq, order, and hash are ignored and the
instances compare and hash by the instance’s ids (N.B.
attrswill not remove existing implementations of__hash__or the equality methods. It just won’t add own ones.), - all attributes that are either passed into
__init__or have a default value are additionally available as a tuple in theargsattribute, - the value of str is ignored leaving
__str__to base classes.
- the values for eq, order, and hash are ignored and the
instances compare and hash by the instance’s ids (N.B.
- collect_by_mro (bool) –
Setting this to True fixes the way
attrscollects attributes from base classes. The default behavior is incorrect in certain cases of multiple inheritance. It should be on by default but is kept off for backward-compatability.See issue #428 for more details.
- getstate_setstate (Optional[bool]) –
Note
This is usually only interesting for slotted classes and you should probably just set auto_detect to True.
If True,
__getstate__and__setstate__are generated and attached to the class. This is necessary for slotted classes to be pickleable. If left None, it’s True by default for slotted classes andFalsefor dict classes.If auto_detect is True, and getstate_setstate is left None, and either
__getstate__or__setstate__is detected directly on the class (i.e. not inherited), it is set to False (this is usually what you want). - on_setattr –
A callable that is run whenever the user attempts to set an attribute (either by assignment like
i.x = 42or by using setattr likesetattr(i, "x", 42)). It receives the same argument as validators: the instance, the attribute that is being modified, and the new value.If no exception is raised, the attribute is set to the return value of the callable.
If a list of callables is passed, they’re automatically wrapped in an attr.setters.pipe.
New in version 16.0.0: slots
New in version 16.1.0: frozen
New in version 16.3.0: str
New in version 16.3.0: Support for
__attrs_post_init__.Changed in version 17.1.0: hash supports
Noneas value which is also the default now.New in version 17.3.0: auto_attribs
Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.
Changed in version 18.1.0: If these is ordered, the order is retained.
New in version 18.2.0: weakref_slot
Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq__ and __ne__ never tried to compare subclasses to each other.
Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.
New in version 18.2.0: kw_only
New in version 18.2.0: cache_hash
New in version 19.1.0: auto_exc
Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.
New in version 19.2.0: eq and order
New in version 20.1.0: auto_detect
New in version 20.1.0: collect_by_mro
New in version 20.1.0: getstate_setstate
New in version 20.1.0: on_setattr
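As a rough check of the auto_exc and getstate_setstate parameters documented above, the sketch below assumes attrs 20.1.0 or later; QueryError and Point are hypothetical classes invented for the illustration. The state round trip calls the generated methods directly instead of going through pickle, which would use them in the same way.

```python
import attr


@attr.s(auto_exc=True)
class QueryError(Exception):
    message = attr.ib()
    code = attr.ib(default=500)


@attr.s(slots=True)
class Point:
    x = attr.ib()
    y = attr.ib()


err = QueryError("timeout")
same_values = QueryError("timeout")
exc_args = err.args                      # init attributes, defaults included
equal_by_value = err == same_values      # identity comparison is kept

p = Point(1, 2)
# For slotted classes the methods are generated on the class itself by default.
has_own_getstate = "__getstate__" in Point.__dict__
state = p.__getstate__()
clone = Point.__new__(Point)             # bare instance, slots not yet filled
clone.__setstate__(state)
round_trip_ok = clone == p
```

Here exc_args should contain both the passed message and the default code, while two exceptions with identical values still compare unequal because auto_exc leaves comparison by identity in place.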
hip_data_tools.etl.s3_to_s3 module¶
Module to deal with data transfer from S3 to S3
-
class
hip_data_tools.etl.s3_to_s3.S3ToS3FileCopy(source: hip_data_tools.etl.s3.S3SourceSettings, sink: hip_data_tools.etl.s3.S3SinkSettings, transformers: Optional[List[hip_data_tools.etl.s3.S3FileNameTransformer]] = None)¶ Bases:
hip_data_tools.etl.common.ETL
Class to transfer objects from S3 to S3
Eg:
>>> etl = S3ToS3FileCopy(
...     source=S3SourceSettings(
...         bucket="MY_SOURCE_BUCKET_NAME",
...         key_prefix="foo/bar/",
...         suffix="parquet",
...         connection_settings=aws_setting,
...     ),
...     sink=S3SinkSettings(
...         bucket="MY_TARGET_BUCKET_NAME",
...         connection_settings=aws_setting,
...     ),
...     transformers=[AddTargetS3KeyTransformer(target_key_prefix="bar/baz/")],
... )
>>> etl.execute_next()
Parameters: - source (S3SourceSettings) – Settings for source s3 files
- sink (S3SinkSettings) – Settings for target s3 directory
- transformers (List[S3FileNameTransformer]) – Transformers to change file names
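To make the role of the source settings more concrete, here is a plain-Python sketch of how a key_prefix and suffix pair could narrow a bucket listing down to the files that get copied. The helper select_source_keys is hypothetical and not part of hip_data_tools, and the matching rule (prefix and suffix compared literally against the full key) is an assumption rather than documented behavior.

```python
from typing import List


def select_source_keys(keys: List[str], key_prefix: str, suffix: str) -> List[str]:
    """Hypothetical helper: keep keys that start with key_prefix and end with suffix."""
    return [k for k in keys if k.startswith(key_prefix) and k.endswith(suffix)]


# Made-up object keys standing in for a bucket listing.
all_keys = [
    "foo/bar/part-0.parquet",
    "foo/bar/_SUCCESS",
    "foo/other/part-1.parquet",
    "foo/bar/nested/part-2.parquet",
]

selected = select_source_keys(all_keys, key_prefix="foo/bar/", suffix="parquet")
```

Under these assumptions, only the two parquet objects below foo/bar/ are selected; the marker file and the object under foo/other/ are skipped.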
-
list_source_files() → List[NewType.<locals>.new_type]¶ List the source files as per the source settings
Returns: List[S3Key]