datopy._examples

Description

A home for one-off tests and data-generating routines.

Warning

The contents of this module will be moved in a future release.

class MediaQuery(
title: str,
artist: str | None = None,
)

Bases: NamedTuple

Query object types for media metadata retrieval.

title: str

Alias for field number 0

artist: str | None

Alias for field number 1

class DataModel(
obj,
schema,
json_schema,
serialized,
normalized,
)

Bases: tuple

Custom data model return type.

json_schema

Alias for field number 2

normalized

Alias for field number 4

obj

Alias for field number 0

schema

Alias for field number 1

serialized

Alias for field number 3
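`Bases: tuple` together with the field-number aliases suggests that DataModel is built with `collections.namedtuple`. Assuming that, the sketch below recreates a tuple type with the same field order to show that attribute access and positional access refer to the same values. The field values are placeholders, not real output of extract_datamodel.

```python
from collections import namedtuple

# Assumed re-creation of DataModel; the field order follows the aliases above
# (obj=0, schema=1, json_schema=2, serialized=3, normalized=4).
DataModel = namedtuple(
    "DataModel", ["obj", "schema", "json_schema", "serialized", "normalized"]
)

# Placeholder values only; they do not reflect real extract_datamodel output.
dm = DataModel(
    obj={"title": "Kid A"},
    schema={"title": "str"},
    json_schema="placeholder",
    serialized="placeholder",
    normalized=None,
)

# Named and positional access return the same object.
print(dm.schema is dm[1])       # True
print(dm[2] == dm.json_schema)  # True

# Unpacking follows the same field order.
obj, schema, json_schema, serialized, normalized = dm
print(obj["title"])  # Kid A
```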

class Film(
title: str,
artist: str | None = None,
)

Bases: MediaQuery

class Album(
title: str,
artist: str | None = None,
)

Bases: MediaQuery

class Book(
title: str,
artist: str | None = None,
)

Bases: MediaQuery
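Film, Album and Book share MediaQuery's fields and differ only in type. A minimal sketch of that pattern, assuming MediaQuery is declared with `typing.NamedTuple` as its `Bases: NamedTuple` line indicates (the classes below are local stand-ins, not imports from datopy, and use `Optional[str]` in place of `str | None` so they also run on Python versions before 3.10):

```python
from typing import NamedTuple, Optional


class MediaQuery(NamedTuple):
    """Stand-in for datopy._examples.MediaQuery."""

    title: str
    artist: Optional[str] = None


# Subclassing the NamedTuple class gives distinct types with the same fields,
# so retrieval code can tell query kinds apart with isinstance().
class Film(MediaQuery):
    pass


class Album(MediaQuery):
    pass


class Book(MediaQuery):
    pass


film = Film("eternal sunshine of the spotless mind")
album = Album("kid A", "radiohead")

print(film.artist)    # None: artist defaults to None
print(album._fields)  # ('title', 'artist')
print(isinstance(album, MediaQuery), isinstance(album, Film))  # True False
```

Because these types inherit tuple equality, `Album("kid A", "radiohead") == Book("kid A", "radiohead")` evaluates to `True`; code that must distinguish query kinds should check the type with `isinstance` rather than compare values.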

find_project_root()

Obtain an absolute path to the project root for saving and loading.

Notes

To set your project root explicitly, define the PROJECT_ROOT environment variable, for example from Python:

import os
os.environ["PROJECT_ROOT"] = "/path/to/src/pkg"
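The real fallback logic of find_project_root is not documented here. The sketch below only illustrates the environment-variable override described above, using a hypothetical `find_project_root_sketch` that prefers PROJECT_ROOT and otherwise falls back to a supplied directory or the current working directory (the fallback is an assumption).

```python
import os
import pathlib
import tempfile
from typing import Optional


def find_project_root_sketch(fallback: Optional[str] = None) -> pathlib.Path:
    """Hypothetical stand-in for find_project_root, not its real logic.

    Prefers the PROJECT_ROOT environment variable; otherwise uses
    ``fallback`` (or the current working directory).
    """
    env_root = os.environ.get("PROJECT_ROOT")
    if env_root:
        return pathlib.Path(env_root).resolve()
    return pathlib.Path(fallback or os.getcwd()).resolve()


# Point PROJECT_ROOT at a temporary directory and derive input/output paths.
with tempfile.TemporaryDirectory() as tmp:
    os.environ["PROJECT_ROOT"] = tmp
    root = find_project_root_sketch()
    print(root / "input")
    print(root / "output")
    del os.environ["PROJECT_ROOT"]
```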

Examples

>>> from datopy._examples import find_project_root
>>> import pathlib
>>> project_root = find_project_root()
>>> input_dir = pathlib.Path(project_root, "input")
>>> output_dir = pathlib.Path(project_root, "output")
>>> pathlib.Path(*input_dir.parts[-3:])
PosixPath('src/datopy/input')
>>> pathlib.Path(*output_dir.parts[-3:])
PosixPath('src/datopy/output')

spotify_album_retrieve(
album: Album,
) -> dict

Retrieve metadata for a given musical album via Spotify.
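How the Spotify lookup is phrased is not documented here. The Spotify Web API search endpoint accepts field filters such as `album:` and `artist:` in its `q` parameter, so one plausible approach, sketched with a hypothetical helper `build_spotify_album_query`, is to turn an Album into such a query string. The string could then be passed to a client such as spotipy's `Spotify.search(q=..., type="album")`. Whether spotify_album_retrieve works this way is an assumption.

```python
from typing import NamedTuple, Optional


class Album(NamedTuple):
    # Local stand-in for datopy._examples.Album, with the same fields.
    title: str
    artist: Optional[str] = None


def build_spotify_album_query(album: Album) -> str:
    """Hypothetical helper: build a Spotify search string with field filters."""
    parts = [f"album:{album.title}"]
    if album.artist:  # the artist filter is only added when an artist is given
        parts.append(f"artist:{album.artist}")
    return " ".join(parts)


print(build_spotify_album_query(Album("kid A", "radiohead")))
# album:kid A artist:radiohead
```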

imdb_film_retrieve(
film: Film,
) -> dict

IMDb film metadata retrieval routine.

wiki_metadata_retrieve(
query: Film | Album | Book,
) -> dict

Extract metadata for the supplied work.

Parameters:

query (Film | Album | Book) – The work to be indexed.

Returns:

A dictionary containing metadata retrieved from the Wikipedia infobox.

Return type:

dict
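Infobox fields in Wikipedia's wikitext are written as `| key = value` lines inside a template, and the examples for run_auto_datamodel_example show values kept as raw wikitext (e.g. a `[[...]]` link for the author). A simplified sketch of turning such lines into a dictionary, using a hypothetical `parse_infobox_sketch` and a made-up, abbreviated infobox snippet; the real function may use a dedicated parser, and multi-line values (such as bulleted genre lists) are not handled here.

```python
import re

# Matches "| key = value" lines inside an infobox template (simplified).
_FIELD = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")


def parse_infobox_sketch(wikitext: str) -> dict:
    """Hypothetical: collect single-line ``| key = value`` fields into a dict.

    Values are kept as raw wikitext, e.g. ``[[Harper Lee]]``.
    """
    fields = {}
    for line in wikitext.splitlines():
        match = _FIELD.match(line)
        if match:
            key, value = match.groups()
            fields[key] = value
    return fields


# Made-up, abbreviated snippet for illustration only.
sample = """{{Infobox book
| name = To Kill a Mockingbird
| author = [[Harper Lee]]
| pages = 281
}}"""
print(parse_infobox_sketch(sample))
# {'name': 'To Kill a Mockingbird', 'author': '[[Harper Lee]]', 'pages': '281'}
```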

extract_datamodel(
obj,
verbose: bool = False,
) -> DataModel

Construct a data model from a scraped data structure.

The constructed objects include dictionary elements, a JSON-style schema of (key, type)/(key, value) pairs, and a DataFrame entry.

Parameters:
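  • obj – The scraped data structure (e.g., the metadata returned by one of the retrieval routines) from which to construct the data model.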

  • verbose (bool, default=False) – An option to enable/disable printing of outputs.

Returns:

A DataModel named tuple with the fields obj, schema, json_schema, serialized, and normalized.

Return type:

DataModel
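The `genres` schema in the run_auto_datamodel_example examples, `{1: 'str', 2: 'str', 3: 'str'}`, suggests that list items are keyed by 1-based position and scalars are replaced by their type name. A minimal recursive sketch of that kind of (key, type) schema inference, using a hypothetical `infer_schema`; it is not the library's implementation and ignores the other DataModel fields.

```python
def infer_schema(value):
    """Hypothetical: map a nested structure to type names.

    Dicts are mapped key by key, list/tuple items are keyed by 1-based
    position, and any other value becomes its type name.
    """
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return {i: infer_schema(item) for i, item in enumerate(value, start=1)}
    return type(value).__name__


scraped = {
    "title": "Eternal Sunshine of the Spotless Mind",
    "genres": ["Drama", "Romance", "Sci-Fi"],
    "year": 2004,
}
print(infer_schema(scraped))
# {'title': 'str', 'genres': {1: 'str', 2: 'str', 3: 'str'}, 'year': 'int'}
```

If such a schema is written out with `json.dumps`, the integer positions become string keys (`"1"`, `"2"`, ...), since JSON object keys are always strings.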

save_datamodel(
schema: dict,
json_schema: dict,
obj_serialized: dict,
obj_normalized: DataFrame,
source: str,
search_terms: Film | Album | Book,
) -> None

Save a JSON-style schema of (key, type)/(key, value) pairs and a DataFrame.
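The file names and layout used by save_datamodel are not documented here. A minimal sketch of the JSON part of such a routine, using a hypothetical `save_datamodel_sketch` and a made-up `<stem>_<part>.json` naming scheme; a DataFrame such as `obj_normalized` could be written alongside it with pandas' `DataFrame.to_csv`.

```python
import json
import pathlib
import tempfile
from typing import List, Union


def save_datamodel_sketch(
    schema: dict,
    serialized: dict,
    output_dir: Union[str, pathlib.Path],
    stem: str,
) -> List[pathlib.Path]:
    """Hypothetical: write the schema and serialized object as JSON files."""
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for part, payload in (("schema", schema), ("serialized", serialized)):
        path = output_dir / f"{stem}_{part}.json"
        # default=str converts values json cannot encode (e.g. dates) to strings.
        path.write_text(json.dumps(payload, indent=2, default=str), encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = save_datamodel_sketch({"title": "str"}, {"title": "Kid A"}, tmp, "wiki_kid_a")
    print([p.name for p in paths])
    # ['wiki_kid_a_schema.json', 'wiki_kid_a_serialized.json']
```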

run_auto_datamodel_example(
source: Literal['imdb', 'spotify', 'wiki'],
search_terms: Film | Album | Book,
verbose: bool = False,
do_save: bool = False,
) -> DataModel

Generate an exemplar data model from an API-extracted data structure.

Parameters:
  • source (Literal['imdb', 'spotify', 'wiki']) – The source from which to retrieve data about the requested topic.

  • search_terms (Film | Album | Book) – A namedtuple of required properties (e.g., title) for the topic query.

  • verbose (bool, default=False) – Option to enable printouts of the retrieved data and schema.

  • do_save (bool, default=False) – Option to enable saving of the retrieved data and schema.

Returns:

The output of extract_datamodel.

Return type:

DataModel
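The `source` parameter selects one of three retrieval routines before the result is passed to extract_datamodel. A minimal sketch of that dispatch step, assuming a lookup table keyed by the allowed `Literal` values; `retrieve` and `stub_retrievers` are hypothetical, and the stubs stand in for imdb_film_retrieve, spotify_album_retrieve, and wiki_metadata_retrieve so the example runs offline.

```python
from typing import Callable, Dict, Literal, get_args

Source = Literal["imdb", "spotify", "wiki"]


def retrieve(source: str, search_terms, retrievers: Dict[str, Callable]) -> dict:
    """Hypothetical: validate ``source`` and call the matching retriever."""
    allowed = get_args(Source)  # ('imdb', 'spotify', 'wiki')
    if source not in allowed:
        raise ValueError(f"source must be one of {allowed}, got {source!r}")
    return retrievers[source](search_terms)


# Offline stand-ins that only echo the query.
stub_retrievers = {
    "imdb": lambda q: {"source": "imdb", "query": q},
    "spotify": lambda q: {"source": "spotify", "query": q},
    "wiki": lambda q: {"source": "wiki", "query": q},
}

print(retrieve("wiki", "to kill a mockingbird", stub_retrievers))
# {'source': 'wiki', 'query': 'to kill a mockingbird'}
```

Deriving the allowed values with `typing.get_args` keeps the runtime check in sync with the type annotation.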

Examples

Setup

>>> import re
>>> from datopy._examples import run_auto_datamodel_example
>>> from datopy.etl import omit_string_patterns
>>> from datopy._examples import Album, Book, Film
>>> do_save = False

IMDb film

>>> film = Film("eternal sunshine of the spotless mind")
>>> datamodel = run_auto_datamodel_example(
...     source="imdb", search_terms=film,
...     verbose=False, do_save=do_save)
>>> dict(datamodel.obj)['genres']
['Drama', 'Romance', 'Sci-Fi']
>>> datamodel.schema['genres']
{1: 'str', 2: 'str', 3: 'str'}
>>> datamodel.normalized['original air date'][0]
'19 Mar 2004 (USA)'

Spotify album

This example is currently commented out and is not run:
    # >>> album = Album("kid A", "radiohead")
    # >>> datamodel = run_auto_datamodel_example(
    # ...     source="spotify", search_terms=album, do_save=do_save)
    # >>> datamodel.obj['total_tracks']
    # 11
    # >>> datamodel.schema['total_tracks']
    # 'int'
    # >>> datamodel.normalized['id'][0]
    # '6GjwtEZcfenmOf6l18N7T7'

Wikipedia novel

>>> book = Book("to kill a mockingbird")
>>> outputs = run_auto_datamodel_example(
...    source="wiki", search_terms=book, do_save=do_save)
>>> re.search(r'\[\[(.*?)\]\]', outputs.obj['author']).group(1)
'Harper Lee'
>>> outputs.schema['author']
'str'
>>> outputs.normalized['pages'][0]
'281'

Wikipedia film

>>> film = Film("eternal sunshine of the spotless mind")
>>> outputs = run_auto_datamodel_example(
...    source="wiki", search_terms=film, do_save=do_save)
>>> re.search(r'\[\[(.*?)\]\]', outputs.obj['director']).group(1)
'Michel Gondry'
>>> outputs.schema['director']
'str'
>>> outputs.normalized['budget'][0]
'$20 million'

Wikipedia album

>>> album = Album("kid A", "radiohead")
>>> outputs = run_auto_datamodel_example(
...    source="wiki", search_terms=album, do_save=do_save)
>>> genres_raw = outputs.obj['genre']
>>> patterns_to_omit = ["[[", "* ", " * ", "\n", "{{nowrap|", "}}"]
>>> genres_processed = omit_string_patterns(
...     genres_raw, patterns_to_omit)
>>> print(genres_processed.replace("]]", ", ").rstrip(", "))
Experimental rock, post-rock, art rock, electronica
>>> outputs.schema['genre']
'str'
>>> outputs.normalized['type'][0]
'studio'
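The Wikipedia album example strips wikitext markup from the raw `genre` value with omit_string_patterns. Its implementation is not shown here; the sketch below assumes it simply deletes every occurrence of each pattern in turn, using a hypothetical `omit_patterns_sketch` and a made-up raw genre string, and then applies the same `]]` replacement and `rstrip` as the example.

```python
from typing import Iterable


def omit_patterns_sketch(text: str, patterns: Iterable[str]) -> str:
    """Hypothetical: delete every occurrence of each pattern, in order."""
    for pattern in patterns:
        text = text.replace(pattern, "")
    return text


# Made-up raw wikitext for illustration only.
genres_raw = "* [[Experimental rock]]\n* [[post-rock]]\n* [[art rock]]\n* [[electronica]]"
patterns_to_omit = ["[[", "* ", " * ", "\n", "{{nowrap|", "}}"]

stripped = omit_patterns_sketch(genres_raw, patterns_to_omit)
# "]]" now separates items; turn it into ", " and trim the trailing separator.
cleaned = stripped.replace("]]", ", ").rstrip(", ")
print(cleaned)  # Experimental rock, post-rock, art rock, electronica
```

Note that `rstrip(", ")` removes any trailing run of commas and spaces, not the literal suffix `", "`.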

Classes

Album(title[, artist])

Book(title[, artist])

DataModel(obj, schema, json_schema, ...) – Custom data model return type.

Film(title[, artist])

MediaQuery(title[, artist]) – Query object types for media metadata retrieval.

Functions

extract_datamodel(obj[, verbose]) – Construct a data model from a scraped data structure.

find_project_root() – Obtain an absolute path to the project root for saving and loading.

imdb_film_retrieve(film) – IMDb film metadata retrieval routine.

run_auto_datamodel_example(source, search_terms) – Generate an exemplar data model from an API-extracted data structure.

save_datamodel(schema, json_schema, ...) – Save a JSON-style schema of (key, type)/(key, value) pairs and a DataFrame.

spotify_album_retrieve(album) – Retrieve metadata for a given musical album via Spotify.

wiki_metadata_retrieve(query) – Extract metadata for the supplied work.