datopy.etl
Description
Tools for efficient web-based data retrieval, data processing, table creation, and populating empty metadata fields.
- omit_string_patterns(input_string: str, patterns: List[str]) → str
Helper to prune multiple character patterns from a string at once.
- Parameters:
input_string (str) – The string to clean.
patterns (List[str]) – A list of patterns to omit from the string.
- Returns:
The input string with the supplied patterns omitted.
- Return type:
str
Examples
>>> from datopy.etl import omit_string_patterns
>>> input_string = "[[A \\\\ messy * string * with undesirable /patterns]]"
>>> print(input_string)
[[A \\ messy * string * with undesirable /patterns]]
>>> patterns_to_omit = ["[[", "]]", "* ", "\\\\ ", "/", "messy ", "un"]
>>> output_string = omit_string_patterns(input_string, patterns_to_omit)
>>> print(output_string)
A string with desirable patterns
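The example above is consistent with removing each pattern in list order with `str.replace`. The following is a minimal sketch under that assumption; `omit_string_patterns_sketch` is a hypothetical stand-in, not datopy's actual implementation. It also shows why pattern order can matter: removing one pattern may create or destroy occurrences of a later one.

```python
from typing import List


def omit_string_patterns_sketch(input_string: str, patterns: List[str]) -> str:
    """Hypothetical re-implementation: strip each pattern, applied in list order."""
    output = input_string
    for pattern in patterns:
        # str.replace removes every non-overlapping occurrence of this pattern
        output = output.replace(pattern, "")
    return output


if __name__ == "__main__":
    messy = "[[A \\\\ messy * string * with undesirable /patterns]]"
    to_omit = ["[[", "]]", "* ", "\\\\ ", "/", "messy ", "un"]
    print(omit_string_patterns_sketch(messy, to_omit))  # A string with desirable patterns

    # Order sensitivity: "ab" first leaves "a", which the next pattern then removes
    print(repr(omit_string_patterns_sketch("aab", ["ab", "a"])))  # ''
    print(repr(omit_string_patterns_sketch("aab", ["a", "ab"])))  # 'b'
```

If the real function behaves this way, callers should list longer or more specific patterns before shorter ones they contain.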
- retrieve_wiki_topics(listing_page: str, verbose: bool = True) → List[str]
Retrieve the topics (by article name) listed on a Wikipedia listing page.
Notes
Only hyperlinked topics (those with a Wikipedia page) are retrieved. Search Wikipedia’s catalogue of listing pages here: https://en.wikipedia.org/wiki/List_of_lists_of_lists
- Parameters:
listing_page (str) – The title of a Wikipedia article containing topics to be retrieved.
verbose (bool, default=True) – Option to enable/disable printouts.
- Returns:
target_pages – A list of topics (by article name) extracted from the listing page.
- Return type:
List[str]
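The Notes above state that only hyperlinked topics are retrieved. The sketch below illustrates that filtering idea offline on a page's raw wikitext. How datopy actually fetches and parses the listing page is not documented here, so `extract_linked_topics`, the link regex, and the namespace prefix list are all hypothetical assumptions.

```python
import re
from typing import List

# Hypothetical: matches [[Target]], [[Target|label]] and [[Target#Section|label]],
# capturing only the article title before any "#" or "|".
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Namespace prefixes treated as non-article links (illustrative, not exhaustive).
NON_ARTICLE_PREFIXES = ("File:", "Image:", "Category:", "Wikipedia:", "Template:", "Help:", "Portal:")


def extract_linked_topics(wikitext: str) -> List[str]:
    """Return unique hyperlinked article titles in order of first appearance."""
    topics: List[str] = []
    seen = set()
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        # Skip empty targets and links into non-article namespaces
        if not title or title.startswith(NON_ARTICLE_PREFIXES):
            continue
        if title not in seen:
            seen.add(title)
            topics.append(title)
    return topics


if __name__ == "__main__":
    sample = (
        "* [[Python (programming language)|Python]]\n"
        "* [[Haskell]]\n"
        "* Java (plain text, no link)\n"
        "* [[Python (programming language)]]\n"
        "[[Category:Lists of programming languages]]"
    )
    print(extract_linked_topics(sample))
    # ['Python (programming language)', 'Haskell']
```

Plain-text entries such as "Java" are ignored because they carry no link, mirroring the "hyperlinked topics only" rule; duplicates and category links are dropped as well.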
Functions
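| Function | Description |
| --- | --- |
| omit_string_patterns(input_string, patterns) | Helper to prune multiple character patterns from a string at once. |
| retrieve_wiki_topics(listing_page[, verbose]) | Retrieve the topics (by article name) listed on a Wikipedia listing page. |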