Inputs
- class textClustPy.Input(textclust, preprocessor, timeformat='%Y-%m-%d %H:%M:%S', timeprecision='seconds', config=None, callback=None)
Abstract input class
- Parameters
textclust (textClustPy.textclust) – A textclust instance of type textClustPy.textclust
preprocessor (textClustPy.Preprocessor) – Preprocessor instance of type textClustPy.Preprocessor
timeformat (string) – Specifies the time format, described as strftime directives (see https://strftime.org). Default is: %Y-%m-%d %H:%M:%S
timeprecision (string) – If realtimefading is enabled, timeprecision specifies on which time unit the fading factor is applied (seconds/minutes/hours). Default = “seconds”
config (string) – Relative path/name of config file
callback (function) – Callback function that is called for each incoming observation. The callback function expects four parameters: ID, time, text and an Observation object.
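As a rough illustration of the timeformat, timeprecision and callback parameters, the sketch below parses two timestamps with the default format, expresses the gap between them in the chosen precision, and defines a callback with the four documented parameters. It uses only the Python standard library. The names `elapsed_units`, `log_observation` and `SECONDS_PER_UNIT` are hypothetical, and how textclust itself applies the fading factor is not shown here.

```python
from datetime import datetime

TIMEFORMAT = "%Y-%m-%d %H:%M:%S"  # documented default of `timeformat`

# Hypothetical lookup: length of each documented `timeprecision` unit in seconds.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600}


def elapsed_units(start, end, timeprecision="seconds", timeformat=TIMEFORMAT):
    """Return the time between two timestamp strings in the given unit."""
    t0 = datetime.strptime(start, timeformat)
    t1 = datetime.strptime(end, timeformat)
    return (t1 - t0).total_seconds() / SECONDS_PER_UNIT[timeprecision]


seen = []


def log_observation(obs_id, time, text, observation):
    """Callback with the four documented parameters: ID, time, text, Observation."""
    seen.append((obs_id, time, text))


print(elapsed_units("2020-01-01 12:00:00", "2020-01-01 12:30:00", "minutes"))
```

A function such as `log_observation` would then be handed to an input class as `callback=log_observation`.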
- class textClustPy.CSVInput(csvfile=None, delimiter='|', quotechar=';', newline='\n', col_id=1, col_time=1, col_text=2, col_label=3, **kwargs)
This class implements a CSV input
- Parameters
csvfile (string) – Relative path and filename of the csv document
delimiter (char) – Delimiter that separates different columns
quotechar (char) – Character that is used for quotes
newline (char) – Character indicating a new line.
col_id (int) – Column index that contains the text id
col_time (int) – Column index that contains the time
col_text (int) – Column index that contains the text
col_label (int) – Column index that contains the true cluster label
- run()
Update the textclust algorithm with the complete data in the data frame
- update(n)
Update the textclust algorithm on new observations
- Parameters
n (int) – Number of observations that should be used by textclust
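To make the default delimiter and quotechar concrete, the sketch below reads a small made-up sample with Python's standard `csv` module using the same settings. The sample rows and their column layout are invented for illustration, and treating the column indices as zero-based is an assumption, not something stated above.

```python
import csv
import io

# Made-up sample data; assumed layout: row number | timestamp | text | label
sample = (
    "0|2020-01-01 12:00:00|;storm warning | coast;|weather\n"
    "1|2020-01-01 12:00:05|new phone released|tech\n"
)

# Same defaults as CSVInput: '|' separates columns, ';' quotes a field.
reader = csv.reader(io.StringIO(sample), delimiter="|", quotechar=";")
rows = list(reader)

# Assumption: column indices are zero-based, so the documented defaults
# col_time=1, col_text=2 and col_label=3 would pick these fields.
for row in rows:
    print(row[1], row[2], row[3])
```

Note that the quoted text field in the first row may contain the delimiter itself. A file with this layout is the kind of document the `csvfile` parameter would point to.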
- class textClustPy.InMemInput(pdframe, col_id=1, col_time=1, col_text=2, col_label=None, **kwargs)
- Parameters
pdframe (DataFrame) – Pandas data frame that serves as stream input
col_id (int) – Column index that contains the text id
col_time (int) – Column index that contains the time
col_text (int) – Column index that contains the text
col_label (int) – Column index that contains the true cluster label
- run()
Update the textclust algorithm with the complete data in the data frame
- update(n)
Update the textclust algorithm on new observations
- Parameters
n (int) – Number of observations that should be used by textclust
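The difference between run() and update(n) can be pictured as consuming a stream all at once versus at most n observations at a time. The sketch below mimics that with a plain Python iterator. `FakeStream` is a hypothetical stand-in and not part of textClustPy, and the idea that successive update(n) calls continue where the previous call stopped is an assumption.

```python
from itertools import islice


class FakeStream:
    """Hypothetical stand-in for an input; it only records what it consumed."""

    def __init__(self, observations):
        self._iter = iter(observations)
        self.processed = []

    def update(self, n):
        """Consume at most the next n observations."""
        self.processed.extend(islice(self._iter, n))

    def run(self):
        """Consume everything that is left."""
        self.processed.extend(self._iter)


stream = FakeStream(["a", "b", "c", "d", "e"])
stream.update(2)  # processes "a" and "b"
stream.run()      # processes the remaining three observations
print(stream.processed)
```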
- class textClustPy.TwitterInput(api_key, api_secret, access_token, access_secret, terms, languages=['en'], conf=None, **kwargs)
A Twitter input accesses the Twitter stream and directly applies textclust to the incoming data.
- Parameters
api_key (string) – Twitter API key
api_secret (string) – Twitter API secret
access_token (string) – Twitter access token
access_secret (string) – Twitter access secret
terms (List of strings) – List of search terms/hashtags that should be monitored on Twitter
languages (List of strings) – Filter tweets by language
callback (Function) – Callback function that expects one parameter of Tweepy type Status (see http://docs.tweepy.org/en/latest/)
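A callback for this input receives a single Tweepy Status object. The following sketch shows what such a function might look like. It reads the `id`, `lang` and `text` attributes that Tweepy Status objects carry, and uses a `SimpleNamespace` stand-in so it runs without Twitter credentials. The names `on_tweet`, `collected` and `fake_status` are hypothetical.

```python
from types import SimpleNamespace

collected = []


def on_tweet(status):
    """Callback taking one Tweepy Status-like object."""
    # Keep only a few fields of the incoming tweet.
    collected.append({"id": status.id, "lang": status.lang, "text": status.text})


# Stand-in object (not a real Tweepy Status) so the sketch is self-contained.
fake_status = SimpleNamespace(id=1, lang="en", text="#datascience is trending")
on_tweet(fake_status)
print(collected)
```

Such a function would be passed as `callback=on_tweet`, alongside a `terms` list like `["#datascience"]` and `languages=['en']`.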