datopy._media_scrape

Description

Data models and retrieval/processing tools for scraping metadata for movies and movie reviews (via IMDb), music albums (via Spotify), and related topics (via Wikipedia).

get_film_metadata(
movie_title: str,
) -> DataFrame

Retrieves IMDb metadata for a film or TV show as a pandas DataFrame.

Parameters:

movie_title (str) – Title of a film or TV show (sensitive to spelling but not case).

Returns:

film_df – Metadata for the matched title, with fields such as title, imdbID, type, year, genres, cast, and rating.

Return type:

pd.DataFrame

Examples

Setup

>>> from datopy._media_scrape import get_film_metadata
>>> title = 'donnie darko'
>>> film_df = get_film_metadata(title)

..
    # >>> film_df.T[0]
    # title                                                 Donnie Darko
    # imdbID                                                     0246578
    # type                                                         movie
    # year                                                          2001
    # genres                            Drama, Mystery, Sci-Fi, Thriller
    # writers                                              Richard Kelly
    # countries                                            United States
    # runtime (min)                                                  113
    # directors                                            Richard Kelly
    # composer                                           Michael Andrews
    # cast             Jake Gyllenhaal, Holmes Osborne, Maggie Gyllen...
    # rating                                                         8.0
    # Votes                                                       847582
    # Plot Outline     Donnie Darko doesn't get along too well with h...
    # Plot             After narrowly escaping a bizarre accident, a ...
    # Synopsis         Donnie Darko (Jake Gyllenhall) is a troubled t...
    # Name: 0, dtype: object
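The imdbID value in the commented output above carries no "tt" prefix, whereas get_imdb_id returns a prefixed string such as 'tt0111161'. The following is a minimal sketch of a hypothetical helper, to_tt_id, that normalizes either form. It is not part of datopy, and the zero-padding to seven digits is an assumption about IMDb's identifier convention.

```python
def to_tt_id(imdb_id: str) -> str:
    """Return imdb_id in 'tt'-prefixed form (hypothetical helper, not in datopy)."""
    digits = imdb_id.strip().lower()
    # Drop an existing prefix so both '0246578' and 'tt0246578' are accepted.
    if digits.startswith("tt"):
        digits = digits[2:]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a numeric IMDb identifier: {imdb_id!r}")
    # Assumption: identifiers are left-padded with zeros to at least 7 digits.
    return "tt" + digits.zfill(7)
```

With this helper, the imdbID field of film_df could be passed to functions that expect a tt identifier.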
get_imdb_id(
movie_title: str,
) -> str | None

Retrieves the unique IMDb identifier associated with a film or TV show.

Parameters:

movie_title (str) – Title of a film or TV show (sensitive to spelling but not case).

Returns:

imdb_id – The unique IMDb tt identifier associated with the show.

Return type:

str | None

Examples

>>> from datopy._media_scrape import get_imdb_id

>>> movie_title = "the shawshank redemption"
>>> tt_id = get_imdb_id(movie_title)
>>> tt_id
'tt0111161'

>>> movie_title = "ths shukshank redumption"
>>> tt_id = get_imdb_id(movie_title)
>>> tt_id
"No IMDb Identifier found for 'ths shukshank redumption'."
get_imdb_reviews(
movie_id: str,
num_reviews: int = 5,
) -> List[str] | None

Retrieves the text of user reviews for a film or TV show from IMDb.

Parameters:
  • movie_id (str) – The unique IMDb tt identifier supplied by get_imdb_id.

  • num_reviews (int, default=5) – Number of reviews to retrieve.
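The second get_imdb_id example returns a message string for an unmatched title, and the signature also allows None, so a caller may want to check movie_id before passing it on. The sketch below uses a hypothetical helper, looks_like_tt_id, which is not part of datopy. The pattern (the letters "tt" followed by seven or more digits) is an assumption about IMDb's identifier format.

```python
import re
from typing import Optional

# Assumption: a tt identifier is 'tt' followed by at least seven ASCII digits.
_TT_ID = re.compile(r"tt[0-9]{7,}")


def looks_like_tt_id(value: Optional[str]) -> bool:
    """Return True only if value has the shape of an IMDb tt identifier."""
    return isinstance(value, str) and _TT_ID.fullmatch(value) is not None
```

A caller could then skip the review lookup whenever looks_like_tt_id(movie_id) is False.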

Returns:

reviews – The text of up to num_reviews reviews for the given title.

Return type:

List[str] | None

Examples

>>> import textwrap
>>> from datopy._media_scrape import get_imdb_reviews, get_imdb_id

>>> movie_title = "finding nemo"
>>> movie_id = get_imdb_id(movie_title)
>>> movie_reviews = get_imdb_reviews(movie_id, num_reviews=2)
>>> for i, review in enumerate(movie_reviews, start=1):
...     print(f"Review {i}:\n{textwrap.fill(review[:50], 79)} ...\n")
Review 1:
I have enjoyed most of the computer-animated films ...

Review 2:
I'll be totally honest and confirm to you that eve ...
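The preview loop above assumes that a list came back. Because the signature also allows None, the same formatting could be wrapped in a small function that tolerates a missing result. This is a sketch only: format_review_previews is a hypothetical name, not part of datopy, and unlike the loop above it appends the ellipsis only when a review is actually truncated.

```python
import textwrap
from typing import List, Optional


def format_review_previews(
    reviews: Optional[List[str]],
    max_chars: int = 50,
    width: int = 79,
) -> List[str]:
    """Build one short, numbered preview per review (hypothetical helper)."""
    if not reviews:
        # Covers both None and an empty list.
        return []
    previews = []
    for i, review in enumerate(reviews, start=1):
        snippet = textwrap.fill(review[:max_chars], width)
        # Mark the preview as cut off only if text was dropped.
        suffix = " ..." if len(review) > max_chars else ""
        previews.append(f"Review {i}:\n{snippet}{suffix}")
    return previews
```

Printing each element of the returned list gives output in the same shape as the example above.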

Functions

get_film_metadata(movie_title)

Retrieves IMDb metadata for a film or TV show as a pandas DataFrame.

get_imdb_id(movie_title)

Retrieves the unique IMDb identifier associated with a film or TV show.

get_imdb_reviews(movie_id[, num_reviews])

Retrieves the text of user reviews for a film or TV show from IMDb.