datopy._examples
Description
A home for one-off tests and data-generating routines.
Warning
The contents of this module will be moved in a future release.
- class MediaQuery( )
Bases: NamedTuple
Query object types for media metadata retrieval.
- class DataModel(obj, schema, json_schema, serialized, normalized)
Bases: tuple
Custom data model return type.
- json_schema
Alias for field number 2
- normalized
Alias for field number 4
- obj
Alias for field number 0
- schema
Alias for field number 1
- serialized
Alias for field number 3
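The field numbers above follow the positional order of the tuple. As a minimal sketch, assuming a `typing.NamedTuple` stand-in (the name `DataModelSketch` and the placeholder values are hypothetical, and the real class may be defined differently), named and positional access resolve to the same element:

```python
from typing import Any, NamedTuple


class DataModelSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""

    obj: Any          # field number 0
    schema: Any       # field number 1
    json_schema: Any  # field number 2
    serialized: Any   # field number 3
    normalized: Any   # field number 4


# Placeholder values, chosen only to make the fields distinguishable.
model = DataModelSketch(
    obj={"title": "Kid A"},
    schema={"title": "str"},
    json_schema={"title": {"type": "string"}},
    serialized={"title": "Kid A"},
    normalized=[{"title": "Kid A"}],
)

# Named access and positional access return the same object.
assert model.schema is model[1]
assert model.normalized is model[4]

# Tuple unpacking follows the same field order.
obj, schema, json_schema, serialized, normalized = model
```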
- class Film( )
Bases: MediaQuery
- class Album( )
Bases: MediaQuery
- class Book( )
Bases: MediaQuery
- find_project_root()
Obtain an absolute path to the project root for saving and loading.
Notes
To set your project root explicitly as an environment variable, run:
os.environ["PROJECT_ROOT"] = "/path/to/src/pkg"
Examples
>>> from datopy._examples import find_project_root
>>> import pathlib

>>> project_root = find_project_root()
>>> input_dir = pathlib.Path(project_root, "input")
>>> output_dir = pathlib.Path(project_root, "output")
>>> pathlib.Path(*input_dir.parts[-3:])
PosixPath('src/datopy/input')
>>> pathlib.Path(*output_dir.parts[-3:])
PosixPath('src/datopy/output')
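The note about `PROJECT_ROOT` suggests that an explicit environment variable takes precedence over any automatic lookup. The sketch below illustrates that precedence only; `resolve_project_root` is a hypothetical helper, and falling back to the current working directory is an assumption rather than the documented behavior of `find_project_root`.

```python
import os
from pathlib import Path


def resolve_project_root(env_var: str = "PROJECT_ROOT") -> Path:
    """Hypothetical helper: prefer the environment variable, else use the cwd."""
    explicit = os.environ.get(env_var)
    # Assumption: with no explicit setting, the working directory is the root.
    base = Path(explicit) if explicit else Path.cwd()
    return base.resolve()


# Set the root explicitly, as described in the note above.
os.environ["PROJECT_ROOT"] = "/path/to/src/pkg"
root = resolve_project_root()

# Build input and output directories relative to the resolved root.
input_dir = root / "input"
output_dir = root / "output"
print(input_dir.parts[-2:])  # ('pkg', 'input')
```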
- spotify_album_retrieve(album: Album)
Retrieve metadata for a given musical album via Spotify.
- wiki_metadata_retrieve( ) → dict
Extract metadata for the supplied work.
- Parameters:
query (Film | Album | Book) – The work to be indexed.
- Returns:
A dictionary containing metadata retrieved from the Wikipedia infobox.
- Return type:
dict
- extract_datamodel(obj, verbose: bool = False)
Construct a data model from a scraped data structure.
The constructed objects include dictionary elements, a json-style schema of (key, type)/(key, value) pairs, and a dataframe entry.
- Parameters:
obj (__type__) – __description__.
verbose (bool, default=False) – An option to enable/disable printing of outputs.
- Returns:
A dictionary containing the fields schema, json_schema, obj_serialized, and obj_normalized.
- Return type:
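To make the (key, type) schema concrete, here is a minimal sketch of how such a mapping could be derived from a nested structure. The helper `infer_schema` is hypothetical and is not the implementation used by `extract_datamodel`; keying lists by 1-based position is an assumption taken from the `genres` output in the examples on this page, and the sample record is made up by combining values shown there.

```python
from typing import Any


def infer_schema(obj: Any) -> Any:
    """Hypothetical helper: map each value to the name of its type."""
    if isinstance(obj, dict):
        return {key: infer_schema(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Assumption: sequences are keyed by 1-based position.
        return {i: infer_schema(value) for i, value in enumerate(obj, start=1)}
    return type(obj).__name__


# Made-up record combining values from the documented examples.
record = {"genres": ["Drama", "Romance", "Sci-Fi"], "total_tracks": 11}
schema = infer_schema(record)

print(schema["genres"])        # {1: 'str', 2: 'str', 3: 'str'}
print(schema["total_tracks"])  # int
```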
- save_datamodel(
  schema: dict,
  json_schema: dict,
  obj_serialized: dict,
  obj_normalized: DataFrame,
  source: str,
  search_terms: Film | Album | Book,
  )
Save the JSON-style schema of (key, type)/(key, value) pairs and a DataFrame.
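The on-disk layout used by `save_datamodel` is not documented here. As a rough sketch of the general idea, the hypothetical helper below writes each dictionary to its own JSON file; the function name, the file-naming scheme, and the use of a temporary directory are all assumptions, and the DataFrame output is left out to keep the example dependency-free.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_json_artifacts(output_dir: Path, stem: str, **artifacts: dict) -> List[Path]:
    """Hypothetical helper: write each dict to '<stem>_<name>.json'."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, payload in artifacts.items():
        path = output_dir / f"{stem}_{name}.json"
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        written.append(path)
    return written


# Write into a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_json_artifacts(
        Path(tmp) / "output",
        "wiki_book",  # hypothetical stem built from source and query type
        schema={"author": "str"},
        obj_serialized={"author": "Harper Lee"},
    )
    # Read the first file back to confirm the round trip.
    reloaded = json.loads(paths[0].read_text(encoding="utf-8"))
```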
- run_auto_datamodel_example(
  source: Literal['imdb', 'spotify', 'wiki'],
  search_terms: Film | Album | Book,
  verbose: bool = False,
  do_save: bool = False,
  )
Generate an exemplar data model from an API-extracted data structure.
- Parameters:
source (Literal['imdb', 'spotify', 'wiki']) – The source from which to retrieve data about the requested topic.
search_terms (Film | Album | Book) – A namedtuple of required properties (e.g., title) for the topic query.
verbose (bool, default=False) – Option to enable printouts of the retrieved data and schema.
do_save (bool, default=False) – Option to enable saving of the retrieved data and schema.
- Returns:
The output of extract_datamodel.
- Return type:
Examples
Setup
>>> import re
>>> from datopy._examples import run_auto_datamodel_example
>>> from datopy.etl import omit_string_patterns
>>> from datopy._examples import Album, Book, Film

>>> do_save = False
IMDb film
>>> film = Film("eternal sunshine of the spotless mind")
>>> datamodel = run_auto_datamodel_example(
...     source="imdb", search_terms=film,
...     verbose=False, do_save=do_save)
>>> dict(datamodel.obj)['genres']
['Drama', 'Romance', 'Sci-Fi']
>>> datamodel.schema['genres']
{1: 'str', 2: 'str', 3: 'str'}
>>> datamodel.normalized['original air date'][0]
'19 Mar 2004 (USA)'
Spotify album
# >>> album = Album("kid A", "radiohead")
# >>> datamodel = run_auto_datamodel_example(
# ...     source="spotify", search_terms=album, do_save=do_save)
# >>> datamodel.obj['total_tracks']
# 11
# >>> datamodel.schema['total_tracks']
# 'int'
# >>> datamodel.normalized['id'][0]
# '6GjwtEZcfenmOf6l18N7T7'
Wikipedia novel
>>> book = Book("to kill a mockingbird")
>>> outputs = run_auto_datamodel_example(
...     source="wiki", search_terms=book, do_save=do_save)
>>> re.search(r'\[\[(.*?)\]\]', outputs.obj['author']).group(1)
'Harper Lee'
>>> outputs.schema['author']
'str'
>>> outputs.normalized['pages'][0]
'281'
Wikipedia film
>>> film = Film("eternal sunshine of the spotless mind")
>>> outputs = run_auto_datamodel_example(
...     source="wiki", search_terms=film, do_save=do_save)
>>> re.search(r'\[\[(.*?) \]\]', outputs.obj['director']).group(1)
'Michel Gondry'
>>> outputs.schema['director']
'str'
>>> outputs.normalized['budget'][0]
'$20 million'
Wikipedia album
>>> album = Album("kid A", "radiohead")
>>> outputs = run_auto_datamodel_example(
...     source="wiki", search_terms=album, do_save=do_save)
>>> genres_raw = outputs.obj['genre']
>>> patterns_to_omit = ["[[", "* ", " * ", "\n", "{{nowrap|", "}}"]
>>> genres_processed = omit_string_patterns(
...     genres_raw, patterns_to_omit)
>>> print(genres_processed.replace("]]", ", ").rstrip(", "))
Experimental rock, post-rock, art rock, electronica
>>> outputs.schema['genre']
'str'
>>> outputs.normalized['type'][0]
'studio'
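The Wikipedia examples above post-process raw infobox strings that still carry wiki markup. The sketch below reproduces that cleanup without network access. `strip_patterns` is a hypothetical stand-in written for illustration, not the implementation of `omit_string_patterns`, and both raw strings are made-up samples in the style of infobox values rather than actual retrieved data.

```python
import re
from typing import List


def strip_patterns(text: str, patterns: List[str]) -> str:
    """Hypothetical stand-in: remove each literal pattern, in order."""
    for pattern in patterns:
        text = text.replace(pattern, "")
    return text


# Made-up samples imitating wiki-link markup in infobox fields.
author_raw = "[[Harper Lee]]"
genres_raw = "[[Experimental rock]]\n* [[post-rock]]\n* [[art rock]]\n* [[electronica]]"

# Pull the link target out of a single [[...]] wiki link.
author = re.search(r"\[\[(.*?)\]\]", author_raw).group(1)

# Drop list bullets and opening brackets, then turn closing brackets into commas.
patterns_to_omit = ["[[", "* ", " * ", "\n", "{{nowrap|", "}}"]
genres = strip_patterns(genres_raw, patterns_to_omit)
genres = genres.replace("]]", ", ").rstrip(", ")

print(author)  # Harper Lee
print(genres)  # Experimental rock, post-rock, art rock, electronica
```

Literal replacement is order-sensitive: the opening brackets are removed first so that the remaining `]]` markers can serve as item separators.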
Classes
Album |
Book |
DataModel | Custom data model return type.
Film |
MediaQuery | Query object types for media metadata retrieval.
Functions
extract_datamodel | Construct a data model from a scraped data structure.
find_project_root | Obtain an absolute path to the project root for saving and loading.
 | IMDb film metadata retrieval routine.
run_auto_datamodel_example | Generate an exemplar data model from an API-extracted data structure.
save_datamodel | Save json-style schema of (key, type)/(key, value) pairs and a df.
spotify_album_retrieve | Retrieve metadata for a given musical album via Spotify.
wiki_metadata_retrieve | Extract metadata for the supplied work.