datopy._examples.run_auto_datamodel_example#

run_auto_datamodel_example(
source: Literal['imdb', 'spotify', 'wiki'],
search_terms: Film | Album | Book,
verbose: bool = False,
do_save: bool = False,
) DataModel[source]#

Generate an exemplar data model from an API-extracted data structure.

Parameters:
  • source (Literal[‘imdb’, ‘spotify’, ‘wiki’])) – The source from which to retrieve data about the requested topic.

  • search_terms (Film | Album | Book) – A namedtuple of required properties (e.g., title) for the topic query.

  • verbose (bool, default=False) – Option to enable printouts of the retrieved data and schema.

  • do_save (bool, default=False) – Option to enable saving of the retrieved data and schema.

Returns:

The output of extract_datamodel.

Return type:

DataModel

Examples

Setup

>>> import re
>>> from datopy._examples import run_auto_datamodel_example
>>> from datopy.etl import omit_string_patterns
>>> from datopy._examples import Album, Book, Film
>>> do_save=False

IMDb film

>>> film = Film("eternal sunshine of the spotless mind")
>>> datamodel = run_auto_datamodel_example(
...     source="imdb", search_terms=film,
...     verbose=False, do_save=do_save)
>>> dict(datamodel.obj)['genres']
['Drama', 'Romance', 'Sci-Fi']
>>> datamodel.schema['genres']
{1: 'str', 2: 'str', 3: 'str'}
>>> datamodel.normalized['original air date'][0]
'19 Mar 2004 (USA)'

Spotify album

..
    # >>> album = Album("kid A", "radiohead")
    # >>> datamodel = run_auto_datamodel_example(
    # ...     source="spotify", search_terms=album, do_save=do_save)
    # >>> datamodel.obj['total_tracks']
    # 11
    # >>> datamodel.schema['total_tracks']
    # 'int'
    # >>> datamodel.normalized['id'][0]
    # '6GjwtEZcfenmOf6l18N7T7'

Wikipedia novel

>>> book = Book("to kill a mockingbird")
>>> outputs = run_auto_datamodel_example(
...    source="wiki", search_terms=book, do_save=do_save)
>>> re.search(r'\[\[(.*?)\]\]', outputs.obj['author']).group(1)
'Harper Lee'
>>> outputs.schema['author']
'str'
>>> outputs.normalized['pages'][0]
'281'

Wikipedia film

>>> film = Film("eternal sunshine of the spotless mind")
>>> outputs = run_auto_datamodel_example(
...    source="wiki", search_terms=film, do_save=do_save)
>>> re.search(r'\[\[(.*?) \]\]', outputs.obj['director']).group(1)
'Michel Gondry'
>>> outputs.schema['director']
'str'
>>> outputs.normalized['budget'][0]
'$20 million'

Wikipedia album

>>> album = Album("kid A", "radiohead")
>>> outputs = run_auto_datamodel_example(
...    source="wiki", search_terms=album, do_save=do_save)
>>> genres_raw = outputs.obj['genre']
>>> patterns_to_omit = ["[[", "* ", " * ", "\n", "{{nowrap|", "}}"]
>>> genres_processed = omit_string_patterns(
...     genres_raw, patterns_to_omit)
>>> print(genres_processed.replace("]]", ", ").rstrip(", "))
Experimental rock, post-rock, art rock, electronica
>>> outputs.schema['genre']
'str'
>>> outputs.normalized['type'][0]
'studio'