datopy.models.media
Description
Data models, validators, and ETL tools for scraped media data.
Includes support for film reviews (via IMDb), music albums (via Spotify), and related information (via Wikipedia).
Note
This module is a work in progress (WIP).
Overview
Data models
IMDbFilm | Data model for processed IMDb metadata.
SpotifyAlbum | Data model for processed Spotify metadata.
API
- class MediaQuery
Bases: NamedTuple
Query object types for media metadata retrieval.
- class Film
Bases: MediaQuery
- class Album
Bases: MediaQuery
- class Book
Bases: MediaQuery
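A minimal sketch of how a `NamedTuple` base with empty subclasses can serve as typed query tokens, so that `Film('spirited away')` can be routed differently from an `Album` with the same title. The field name `title` and the `SOURCES` routing table are assumptions for illustration (the documented signatures are empty); the source choices only follow the module description.

```python
from typing import NamedTuple


class MediaQuery(NamedTuple):
    """Query object types for media metadata retrieval (sketch)."""

    title: str  # hypothetical field name; the real fields are not documented here


# Subclasses inherit the tuple fields but give each media kind its own type.
class Film(MediaQuery):
    pass


class Album(MediaQuery):
    pass


class Book(MediaQuery):
    pass


# Hypothetical routing table based on the module description
# (IMDb for films, Spotify for albums, Wikipedia for books).
SOURCES = {Film: "imdb", Album: "spotify", Book: "wikipedia"}


def route(query: MediaQuery) -> str:
    """Return '<source>:<title>' by dispatching on the query's concrete type."""
    return f"{SOURCES[type(query)]}:{query.title}"


print(route(Film("spirited away")))  # imdb:spirited away
print(Film("x") == Album("x"))       # True: equality compares tuple contents only
```

Because the subclasses are still tuples, `Film('x') == Album('x')` is `True`; dispatching on `type(query)` rather than on equality keeps the media kinds apart.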
- pydantic model IMDbFilm
Bases: BaseModel
Data model for processed IMDb metadata.
Examples
>>> from pydantic import ValidationError
>>> from datopy.models.media import IMDbFilm
>>> from datopy._examples import imdb_film_retrieve
Valid film
>>> valid_film = IMDbFilm(
...     title='name 10!', imdb_id='tt1234567', kind='movie',
...     year=1990, rating=7.2, votes=122,
...     genres='romantic comedy, thriller', cast='mrs smith,mr smith',
...     plot='alas! once upon a time, ...',
...     budget_mil=1123929)
Invalid film
>>> invalid_film = dict(
...     title='name', imdb_id='tt12', year=1975, votes=-2, rating=5.0)
>>> try:
...     IMDbFilm(**invalid_film)
... except ValidationError as e:
...     print(e)  # use pprint.pp(e.errors()) for easy-to-read list
3 validation errors for IMDbFilm
imdb_id
  String should match pattern '^tt.*\d{7}$' [type=string_pattern_mismatch, input_value='tt12', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/string_pattern_mismatch
kind
  Field required [type=missing, input_value={'title': 'name', 'imdb_i...tes': -2, 'rating': 5.0}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing
votes
  Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-2, input_type=int]
    For further information visit https://errors.pydantic.dev/2.8/v/greater_than_equal
Survey available fields and types
>>> import pprint
>>> from datopy.models.media import Film
>>> from datopy._examples import imdb_film_retrieve
>>> from datopy.modeling import apply_recursive
>>> film = imdb_film_retrieve(Film('spirited away'))
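The survey step above imports `datopy.modeling.apply_recursive`, and the model docstring (reproduced in the JSON schema description) calls it as `apply_recursive(lambda x: type(x).__name__, film)` to map a retrieved record to the types of its values. The function below is a hypothetical stand-in written only to illustrate that idea: it walks nested dicts, lists, and tuples and applies a function to each leaf. datopy's actual implementation may differ, and the sample record is made up.

```python
from typing import Any, Callable


def apply_recursive_sketch(func: Callable[[Any], Any], obj: Any) -> Any:
    """Apply ``func`` to every leaf of nested dicts, lists, and tuples.

    Hypothetical stand-in for datopy.modeling.apply_recursive; the real
    function may differ in signature and behavior.
    """
    if isinstance(obj, dict):
        # Keep the keys, recurse into the values
        return {key: apply_recursive_sketch(func, value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [apply_recursive_sketch(func, item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(apply_recursive_sketch(func, item) for item in obj)
    # Anything else is treated as a leaf
    return func(obj)


# Made-up record shaped loosely like retrieved film metadata
record = {"title": "some film", "year": 2001, "rating": 7.5,
          "genres": ["drama", "fantasy"], "box office": {"budget": "$1,000"}}
print(apply_recursive_sketch(lambda x: type(x).__name__, record))
# {'title': 'str', 'year': 'int', 'rating': 'float', 'genres': ['str', 'str'], 'box office': {'budget': 'str'}}
```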
JSON schema
{
  "title": "IMDbFilm",
  "description": "Data model for processed imdb metadata.\n\nExamples\n--------\n>>> from pydantic import ValidationError\n>>> from datopy.models.media import IMDbFilm\n>>> from datopy._examples import imdb_film_retrieve\n\nValid film\n\n>>> valid_film = IMDbFilm(\n... title='name 10!', imdb_id='tt1234567', kind='movie',\n... year=1990, rating=7.2, votes=122,\n... genres='romantic comedy, thriller', cast='mrs smith,mr smith',\n... plot='alas! once upon a time, ...',\n... budget_mil=1123929)\n\nInvalid film\n\n>>> invalid_film = dict(\n... title='name', imdb_id='tt12', year=1975, votes=-2, rating=5.0)\n>>> try:\n... IMDbFilm(**invalid_film)\n... except ValidationError as e:\n... print(e) # use pprint.pp(e.errors()) for easy-to-read list\n3 validation errors for IMDbFilm\nimdb_id\n String should match pattern '^tt.*\\d{7}$' [type=string_pattern_mismatch, input_value='tt12', input_type=str]\n For further information visit https://errors.pydantic.dev/2.8/v/string_pattern_mismatch\nkind\n Field required [type=missing, input_value={'title': 'name', 'imdb_i...tes': -2, 'rating': 5.0}, input_type=dict]\n For further information visit https://errors.pydantic.dev/2.8/v/missing\nvotes\n Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-2, input_type=int]\n For further information visit https://errors.pydantic.dev/2.8/v/greater_than_equal\n\nSurvey available fields and types\n\n>>> import pprint\n>>> from datopy.models.media import Film\n>>> from datopy._examples import imdb_film_retrieve\n>>> from datopy.modeling import apply_recursive\n>>> film = imdb_film_retrieve(Film('spirited away'))\n\n..\n # >>> film.keys()\n # >>> pprint.pp(apply_recursive(lambda x: type(x).__name__, film), depth=3)",
  "type": "object",
  "properties": {
    "title": {"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVnumstr``", "pattern": "^[a-z0-9,.! ]+$", "title": "Title", "type": "string"},
    "imdb_id": {"description": "Unique 7-digit IMDb tt identifier", "pattern": "^tt.*\\d{7}$", "title": "Imdb Id", "type": "string"},
    "kind": {"description": "Retrieved from: `type`", "examples": ["movie", "tv series"], "pattern": "^[a-z0-9,.! ]+$", "title": "Kind", "type": "string"},
    "year": {"maximum": 3000, "minimum": 1880, "title": "Year", "type": "integer"},
    "rating": {"maximum": 10.0, "minimum": 0.0, "title": "Rating", "type": "number"},
    "votes": {"minimum": 0, "title": "Votes", "type": "integer"},
    "runtime_mins": {"anyOf": [{"exclusiveMinimum": 0.0, "type": "number"}, {"type": "null"}], "default": null, "title": "Runtime Mins"},
    "genres": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVstr``", "pattern": "^[a-z, ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Genres"},
    "countries": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVstr``", "pattern": "^[a-z, ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Countries"},
    "director": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVstr``", "pattern": "^[a-z, ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Director"},
    "writer": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVstr``", "pattern": "^[a-z, ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Writer"},
    "composer": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVstr``", "pattern": "^[a-z, ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Composer"},
    "cast": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVstr``", "pattern": "^[a-z, ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Cast"},
    "plot": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVnumsent``", "pattern": "^[a-z0-9,.! ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Plot"},
    "synopsis": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVnumsent``", "pattern": "^[a-z0-9,.! ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Synopsis"},
    "plot_outline": {"anyOf": [{"description": ":attr:`~datopy.modeling.CustomTypes` : ``CSVnumsent``", "pattern": "^[a-z0-9,.! ]+$", "type": "string"}, {"type": "null"}], "default": null, "title": "Plot Outline"},
    "budget_mil": {"anyOf": [{"minimum": 0.0, "type": "number"}, {"type": "null"}], "default": null, "description": "Strip $/, & text after first space", "title": "Budget Mil"},
    "opening_weekend_gross_mil": {"anyOf": [{"minimum": 0.0, "type": "number"}, {"type": "null"}], "default": null, "title": "Opening Weekend Gross Mil"},
    "cumulative_worldwide_gross_mil": {"anyOf": [{"minimum": 0.0, "type": "number"}, {"type": "null"}], "default": null, "title": "Cumulative Worldwide Gross Mil"}
  },
  "required": ["title", "imdb_id", "kind", "year", "rating", "votes"]
}
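- Fields: title, imdb_id, kind, year, rating, votes, runtime_mins, genres, countries, director, writer, composer, cast, plot, synopsis, plot_outline, budget_mil, opening_weekend_gross_mil, cumulative_worldwide_gross_mil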
- Validators:
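The `budget_mil` description in the JSON schema, "Strip $/, & text after first space", implies a cleaning step for raw money strings. A sketch of one reading of that rule, assuming raw values shaped like `'$1,500,000 (estimated)'`: keep only the text before the first space, then remove `$` and `,` and convert to a number. The name `parse_money` is hypothetical, and whether datopy also rescales the value to millions (as the `_mil` suffix suggests) is not stated, so the sketch does not.

```python
def parse_money(raw: str) -> float:
    """Hypothetical cleaner for raw money strings such as '$1,500,000 (estimated)'.

    Follows one reading of the documented rule "Strip $/, & text after first space".
    """
    first_token = raw.strip().split(" ", 1)[0]              # drop text after the first space
    digits = first_token.replace("$", "").replace(",", "")  # remove '$' and ','
    return float(digits)                                     # ValueError if other characters remain


print(parse_money("$1,500,000 (estimated)"))  # 1500000.0
```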
- field title: str [Required]
CustomTypes: CSVnumstr
- Constraints:
pattern = ^[a-z0-9,.! ]+$
- field imdb_id: str [Required]
Unique 7-digit IMDb tt identifier
- Constraints:
pattern = ^tt.*\d{7}$
- Validated by:
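The `imdb_id` pattern only requires a leading `tt` and seven trailing digits. Because `.*` matches anything in between, it also accepts strings such as `tt12345678` or `ttx1234567`, so "7-digit" is not strictly enforced by the pattern alone. A quick check with Python's `re` module (pydantic v2 evaluates patterns with its own Rust-based regex engine by default, but these inputs behave the same way in both):

```python
import re

# Pattern copied from the documented imdb_id constraint
IMDB_ID_PATTERN = re.compile(r"^tt.*\d{7}$")


def matches_imdb_id(value: str) -> bool:
    """Return True if value satisfies the documented imdb_id pattern."""
    return IMDB_ID_PATTERN.search(value) is not None


for candidate in ["tt1234567", "tt12", "tt12345678", "ttx1234567"]:
    print(candidate, matches_imdb_id(candidate))
# tt1234567 True
# tt12 False
# tt12345678 True
# ttx1234567 True
```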
- field kind: str (CustomTypes: CSVnumstr) [Required]
Retrieved from: `type`
- Constraints:
pattern = ^[a-z0-9,.! ]+$
- Validated by:
- field genres: str | None = None (CustomTypes: CSVstr; pattern = ^[a-z, ]+$)
- field countries: str | None = None (CustomTypes: CSVstr; pattern = ^[a-z, ]+$)
- field director: str | None = None (CustomTypes: CSVstr; pattern = ^[a-z, ]+$)
- field writer: str | None = None (CustomTypes: CSVstr; pattern = ^[a-z, ]+$)
- field composer: str | None = None (CustomTypes: CSVstr; pattern = ^[a-z, ]+$)
- field cast: str | None = None (CustomTypes: CSVstr; pattern = ^[a-z, ]+$)
- field plot: str | None = None (CustomTypes: CSVnumsent; pattern = ^[a-z0-9,.! ]+$)
- field synopsis: str | None = None (CustomTypes: CSVnumsent; pattern = ^[a-z0-9,.! ]+$)
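A minimal sketch of how constraints like the ones listed for `IMDbFilm` can be declared in pydantic v2: `Annotated[str, Field(pattern=...)]` for the custom string types, and `ge`/`le` bounds taken from the JSON schema. `FilmSketch` and these `CSVstr`/`CSVnumstr` definitions are hypothetical re-creations, not datopy's code. The invalid input is the one from the examples above, and it is expected to fail on the same three fields.

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field, ValidationError

# Hypothetical re-creations of the custom string types (datopy.modeling defines
# the real ones, which also carry descriptions).
CSVstr = Annotated[str, Field(pattern=r"^[a-z, ]+$")]
CSVnumstr = Annotated[str, Field(pattern=r"^[a-z0-9,.! ]+$")]


class FilmSketch(BaseModel):
    """Cut-down stand-in for IMDbFilm using constraints from its JSON schema."""

    title: CSVnumstr
    imdb_id: str = Field(pattern=r"^tt.*\d{7}$")
    kind: CSVnumstr
    year: int = Field(ge=1880, le=3000)
    rating: float = Field(ge=0.0, le=10.0)
    votes: int = Field(ge=0)
    genres: Optional[CSVstr] = None  # optional CSV string, None when absent


try:
    FilmSketch(title="name", imdb_id="tt12", year=1975, votes=-2, rating=5.0)
except ValidationError as exc:
    # One (field, error type) pair per failed constraint
    print([(err["loc"][0], err["type"]) for err in exc.errors()])
```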
- pydantic model SpotifyAlbum
Bases: BaseModel
Data model for processed Spotify metadata.
Raw data schema reference: ‘datopy/output/spotify_album_schema.json’.
JSON schema
{ "title": "SpotifyAlbum", "description": "Data model for processed Spotify metadata.\n\nRaw data schema reference: 'datopy/output/spotify_album_schema.json'.", "type": "object", "properties": { "title": { "title": "Title", "type": "string" }, "album_type": { "title": "Album Type", "type": "string" } }, "required": [ "title", "album_type" ] }
- Fields: title, album_type
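How raw Spotify data maps onto these two fields is not documented here. A sketch under the assumption that the raw payload follows the Spotify Web API album object, which carries the album name under `name` and its type (for example `album`, `single`, or `compilation`) under `album_type`. The helper name is hypothetical.

```python
from typing import Any


def to_spotify_album_fields(raw: dict[str, Any]) -> dict[str, str]:
    """Pick SpotifyAlbum's fields out of a raw Spotify album object (sketch).

    Assumes the Spotify Web API album layout: the album's name under "name"
    and its type under "album_type". Other keys are ignored.
    """
    return {"title": raw["name"], "album_type": raw["album_type"]}


raw_album = {"name": "Some Album", "album_type": "album", "total_tracks": 10}
print(to_spotify_album_fields(raw_album))  # {'title': 'Some Album', 'album_type': 'album'}
```

Given the schema above (two required string fields), the resulting dict has the shape `SpotifyAlbum(**fields)` expects.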
- pydantic model WikiBook
Bases: BaseModel
Data model for processed Wikipedia novel metadata.
Raw data schema reference: ‘output/wiki_book_schema.json’.
JSON schema
{ "title": "WikiBook", "description": "Data model for processed Wikipedia novel metadata.\n\nRaw data schema reference: 'output/wiki_book_schema.json'.", "type": "object", "properties": { "title": { "title": "Title", "type": "string" } }, "required": [ "title" ] }
- Fields: title
- pydantic model WikiFilm
Bases: BaseModel
Data model for processed Wikipedia film metadata.
Raw data schema reference: ‘datopy/output/wiki_film_schema.json’.
JSON schema
{ "title": "WikiFilm", "description": "Data model for processed Wikipedia film metadata.\n\nRaw data schema reference: 'datopy/output/wiki_film_schema.json'.", "type": "object", "properties": { "title": { "title": "Title", "type": "string" } }, "required": [ "title" ] }
- Fields: title
- pydantic model WikiAlbum
Bases: BaseModel
Data model for processed Wikipedia album metadata.
Raw data schema reference: ‘datopy/output/wiki_album_schema.json’.
JSON schema
{ "title": "WikiAlbum", "description": "Data model for processed Wikipedia album metadata.\n\nRaw data schema reference: 'datopy/output/wiki_album_schema.json'.", "type": "object", "properties": { "title": { "title": "Title", "type": "string" } }, "required": [ "title" ] }
- Fields: title
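The JSON schemas shown for these models are what pydantic v2's `BaseModel.model_json_schema()` produces: `title` comes from the class name, `description` from the docstring, and `properties`/`required` from the fields. A sketch with a stand-in class named `WikiBookSketch` (hypothetical, so its schema `title` differs from the one shown for `WikiBook`):

```python
from pydantic import BaseModel


class WikiBookSketch(BaseModel):
    """Data model for processed Wikipedia novel metadata."""

    title: str


# Build the JSON schema from the class name, docstring, and fields
schema = WikiBookSketch.model_json_schema()
print(schema)
```

Because `WikiBook`, `WikiFilm`, and `WikiAlbum` each have the single required `title` field, their schemas differ only in `title` and `description`.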
- class IMDbFilmProcessor(model: BaseModel, query: NamedTuple)
Bases: BaseProcessor
Processor that retrieves and prepares IMDb film metadata.
Methods
process() | Prepare (extract/clean) the retrieved data.
retrieve() | Extract data for the query from the API of the supplied model.
- retrieve()
Extract data for the query from the API of the supplied model.
- Raises:
NotImplementedError
- process()
Prepare (extract/clean) the retrieved data.
- Raises:
NotImplementedError
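`IMDbFilmProcessor` currently raises `NotImplementedError` from both methods, and `BaseProcessor` is not documented on this page. A minimal sketch of the retrieve-then-process flow that the method descriptions suggest, using `abc` so that a processor missing either step cannot be instantiated. Every name here (`BaseProcessorSketch`, `FakeFilmProcessor`, `FilmRecord`, `Query`) is hypothetical, an in-memory dict stands in for the IMDb API, and a dataclass stands in for a pydantic model.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, NamedTuple


class BaseProcessorSketch(ABC):
    """Hypothetical base: store a model class and a query, then retrieve -> process."""

    def __init__(self, model: type, query: NamedTuple) -> None:
        self.model = model  # class used to build/validate the cleaned record
        self.query = query  # e.g. a Film(...) query tuple
        self.raw: dict[str, Any] = {}

    @abstractmethod
    def retrieve(self) -> None:
        """Extract data for the query from the source API into self.raw."""

    @abstractmethod
    def process(self) -> Any:
        """Clean self.raw and return an instance of self.model."""


class FakeFilmProcessor(BaseProcessorSketch):
    """Concrete sketch backed by an in-memory dict instead of a real IMDb client."""

    _FAKE_API = {"some film": {"title": "Some Film", "kind": "Movie"}}

    def retrieve(self) -> None:
        self.raw = dict(self._FAKE_API[self.query.title])

    def process(self) -> Any:
        # Lower-case string values, in line with the lower-case patterns in IMDbFilm
        cleaned = {key: value.lower() for key, value in self.raw.items()}
        return self.model(**cleaned)


@dataclass
class FilmRecord:  # stand-in for a pydantic model such as IMDbFilm
    title: str
    kind: str


class Query(NamedTuple):  # stand-in for MediaQuery
    title: str


processor = FakeFilmProcessor(FilmRecord, Query("some film"))
processor.retrieve()
print(processor.process())  # FilmRecord(title='some film', kind='movie')
```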
Classes
IMDbFilmProcessor | Processor that retrieves and prepares IMDb film metadata.
MediaQuery | Query object types for media metadata retrieval.