Core

lda_over_time.lda_over_time

LdaOverTime is a framework that makes it easier to run topic modeling analysis over time and to visualize the results.

In brief, topic modeling is a technique that finds the topics covered by each document in a collection. By adding time to the analysis, we can study how much, and why, a given topic is discussed more or less in a particular time slice.

class lda_over_time.lda_over_time.LdaOverTime(model: DtmModelInterface)

Bases: object

LdaOverTime provides an easier way to take a pre-processed set of documents, choose a DTM model, and get an analysis of the topics’ evolution over time.

Choose a model to work with, create an instance of it with the appropriate parameters, and then instantiate LdaOverTime by passing it that model instance.

Parameters: model (DtmModelInterface) – instance of the chosen model.
Returns: Nothing
Return type: None

get_results() → DataFrame

Get the model’s results as a table.

In this table, each row represents one time slice.

The date column holds the time slices’ timestamps, and the remaining n_topics columns, indexed from 1 to n_topics, hold the proportion of each topic in each time slice.

You can get each topic’s main words by calling get_topic_words. For example, to get the top 10 words of topic 3 for the first row of this table, call get_topic_words(topic_id=3, timeslice=1, n=10).

Returns: table with results
Return type: pandas.DataFrame
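
To make the table layout concrete, here is a minimal sketch in plain Python that builds rows of the same shape: one row per time slice, a date entry, and one proportion per topic keyed from 1 to n_topics. It does not call the library. The function name proportions_by_slice, the input format, and the normalization step are assumptions made for illustration only.

```python
from collections import defaultdict


def proportions_by_slice(docs, n_topics):
    """Aggregate per-document topic weights into one row per time slice.

    docs is a hypothetical input: an iterable of (date_label, weights)
    pairs, where weights has one value per topic.
    """
    totals = defaultdict(lambda: [0.0] * n_topics)
    for date, weights in docs:
        for i, weight in enumerate(weights):
            totals[date][i] += weight

    rows = []
    for date in sorted(totals):  # chronological order for sortable labels
        slice_total = sum(totals[date])
        row = {"date": date}
        for i, weight in enumerate(totals[date]):
            # Topic columns are keyed from 1 to n_topics, as in the table above.
            row[i + 1] = weight / slice_total if slice_total else 0.0
        rows.append(row)
    return rows


docs = [
    ("2020-01", [0.75, 0.25]),
    ("2020-01", [0.25, 0.75]),
    ("2020-02", [1.0, 0.0]),
]
table = proportions_by_slice(docs, n_topics=2)
```

Each dictionary in table corresponds to one row of the results table, so the first row here describes the oldest time slice.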

get_topic_words(topic_id: int, timeslice: int, n: int = 10) → List[str]

Get the top n words of a specific topic in the chosen time slice.

Parameters

topic_id (int) – The id of the desired topic.
timeslice (int) – The position of the desired time slice in chronological order; the first (oldest) time slice has index 1.
n (int) – The number of words to return, chosen from those that best describe the topic in the given time slice.

Returns

A list of the top n words that best describe the requested topic in the given time slice.

Return type

list[str]
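
The 1-based timeslice index is easy to get wrong, so the following minimal sketch shows how such a lookup could work on plain data. It is not the library's implementation: the function name top_words and the word-weight dictionaries are hypothetical.

```python
def top_words(word_weights_by_slice, timeslice, n=10):
    """Return the n highest-weighted words for a 1-based time slice.

    word_weights_by_slice is a hypothetical list with one
    {word: weight} dictionary per time slice, oldest first.
    """
    if not 1 <= timeslice <= len(word_weights_by_slice):
        raise IndexError(f"timeslice must be between 1 and {len(word_weights_by_slice)}")
    # Convert the 1-based position into a 0-based list index.
    weights = word_weights_by_slice[timeslice - 1]
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


slices = [
    {"vaccine": 0.4, "virus": 0.3, "mask": 0.2, "school": 0.1},
    {"school": 0.5, "vaccine": 0.3, "mask": 0.15, "virus": 0.05},
]
first = top_words(slices, timeslice=1, n=2)
second = top_words(slices, timeslice=2, n=3)
```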

classmethod load(file_path: str) → LdaOverTime

Load your last work.

Parameters: file_path (str) – Location where you saved your last work.
Returns: Last saved work.
Return type: LdaOverTime

plot(title: str, legend_title: Optional[str] = None, path_to_save: Optional[str] = None, rotation: int = 90, mode: str = 'line', display: bool = True, date_format: Optional[str] = None)

Plot the evolution of topics over time.

To rename the topics, use the rename_topics method.

Parameters

title (str) – title of the plot.
legend_title (str, optional) – title of the legend.
path_to_save (str, optional) – path where the graph should be saved. By default, the graph is not saved.
rotation (int, optional) – rotation, in degrees, of the horizontal axis labels. Default is 90.
mode (str, optional) – type of plot: either a simple line plot or a stack plot. Default is line.
display (bool, optional) – set it to False to suppress displaying the graph. By default, the graph is displayed.
date_format (str, optional) – format in which dates are displayed.

Returns

Nothing

Return type

None
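
The difference between the two plotting modes can be illustrated without any plotting library. In a line plot, each topic's proportion is drawn as is; in a stack plot, each topic is drawn on top of the previous ones, so the upper edge of each band is a running sum. The following minimal sketch computes those upper edges for one time slice. The function name stack_edges and the sample values are hypothetical, and the library's own drawing code may differ.

```python
from itertools import accumulate


def stack_edges(proportions):
    """Return the upper edge of each stacked band for one time slice."""
    return list(accumulate(proportions))


# Hypothetical proportions of three topics in a single time slice.
row = [0.5, 0.25, 0.25]
edges = stack_edges(row)
```

When the proportions of a time slice sum to 1, the last edge is 1, which is why a stack plot of proportions fills the whole vertical range.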

rename_topics(new_names: List[str])

Rename the topics using the given list of new names.

Names are applied in the given order: the first name replaces the first topic’s name, the second replaces the second topic’s name, and so on.

The list’s length must be equal to the number of topics; otherwise, ValueError is raised.

Parameters: new_names (list[str]) – List of new names that overwrite the topics’ names.
Returns: Nothing
Return type: None
Raises: ValueError – when the given list’s length does not match the number of topics.
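
The positional renaming and the length check described above can be sketched as follows. This is an illustration on plain lists, not the library's code; the function name rename_positionally and the sample names are hypothetical.

```python
def rename_positionally(current_names, new_names):
    """Map each current topic name to the new name at the same position."""
    if len(new_names) != len(current_names):
        raise ValueError(
            f"expected {len(current_names)} names, got {len(new_names)}"
        )
    return dict(zip(current_names, new_names))


# Topics are assumed here to start out named 1, 2, 3.
mapping = rename_positionally([1, 2, 3], ["health", "economy", "sports"])
```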

save(file_path: str) → None

Save your current work to the location file_path. You can reload your work later by calling load with the same file_path.

Parameters: file_path (str) – Location to save your current work.
Returns: Nothing
Return type: None
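
The save and load pair follows a common round-trip pattern: an instance method writes the state to a file, and a class method rebuilds an object from that file. The sketch below shows the pattern with a hypothetical Workspace class and JSON as the file format. The on-disk format actually used by LdaOverTime is not described here, so JSON is purely an assumption for illustration.

```python
import json
import os
import tempfile


class Workspace:
    """Hypothetical stand-in with save/load; not the library class."""

    def __init__(self, topic_names):
        self.topic_names = list(topic_names)

    def save(self, file_path):
        # Write the object's state to file_path.
        with open(file_path, "w", encoding="utf-8") as handle:
            json.dump({"topic_names": self.topic_names}, handle)

    @classmethod
    def load(cls, file_path):
        # Rebuild an object from the state stored at file_path.
        with open(file_path, encoding="utf-8") as handle:
            state = json.load(handle)
        return cls(state["topic_names"])


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "work.json")
    Workspace(["politics", "sports"]).save(path)
    restored = Workspace.load(path)
```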

showvis(time_id: int)

Show the pyLDAvis analysis of your model for a specific time slice. It is useful for evaluating how good your model is.

This method is only available inside Jupyter notebooks.

Parameters: time_id (int) – Position of the time slice, from 1 to n_timeslices, in chronological order.
Returns: Nothing
Return type: None