Core

lda_over_time.lda_over_time

LdaOverTime is a framework that makes it easier to run topic modeling analysis over time and to visualize the results.

In brief, topic modeling is a technique that finds the topics covered by each document in a collection. By adding time to this equation, we can study how much, and why, a given topic is discussed more or less in a given time slice.

class lda_over_time.lda_over_time.LdaOverTime(model: DtmModelInterface)[source]

Bases: object

LdaOverTime provides an easier way of taking a pre-processed set of documents, choosing a DTM model, and getting an analysis of the topics’ evolution over time.

Choose a model to work with, create an instance of it with the appropriate parameters, and then instantiate LdaOverTime by passing it that model instance.

Parameters

model (DtmModelInterface) – instance of the chosen model.

Returns

Nothing

Return type

None

get_results() DataFrame[source]

Get the model’s results as a table.

In this table, each row represents one time slice.

The date column holds each time slice’s timestamp, and the remaining n_topics columns, indexed from 1 to n_topics, hold the proportion of each topic in that time slice.

You can get each topic’s main words by calling get_topic_words. For example, to get the top 10 words of topic 3 in the first row of this table, call get_topic_words(topic_id=3, timeslice=1, n=10).

Returns

table with results

Return type

pandas.DataFrame
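
As a rough illustration of the table layout described above, the sketch below models two rows as plain Python dictionaries rather than a pandas.DataFrame. All values, the integer column keys, and the helper names timeslice_for_row and dominant_topic are assumptions made for illustration; none of them come from the library.

```python
from datetime import date

# Hypothetical results for 3 topics over 2 time slices. Each dict stands in
# for one row of the table: a "date" entry plus one proportion per topic,
# keyed from 1 to n_topics (the integer keys are an assumption).
rows = [
    {"date": date(2020, 1, 1), 1: 0.50, 2: 0.30, 3: 0.20},
    {"date": date(2020, 2, 1), 1: 0.25, 2: 0.35, 3: 0.40},
]


def timeslice_for_row(row_position: int) -> int:
    """Convert a 0-based row position into the 1-based timeslice argument."""
    return row_position + 1


def dominant_topic(row: dict) -> int:
    """Return the topic id with the largest proportion in one row."""
    proportions = {key: value for key, value in row.items() if key != "date"}
    return max(proportions, key=proportions.get)


for position, row in enumerate(rows):
    print(timeslice_for_row(position), row["date"], dominant_topic(row))
```

The topic id and time slice found this way are the kind of values you would pass as topic_id and timeslice to get_topic_words.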

get_topic_words(topic_id: int, timeslice: int, n: int = 10) List[str][source]

Get the top n words of a specific topic in the chosen time slice.

Parameters
  • topic_id (int) – The id of the desired topic.

  • timeslice (int) – The position of the desired time slice in chronological order. The first (oldest) time slice has index 1.

  • n (int) – How many of the words that best describe the topic at the given time slice to return. Default is 10.

Returns

A list of the top n words that best describe the requested topic at the given time slice.

Return type

list[str]
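
The 1-based time slice indexing and the top-n selection can be sketched as follows. This is not the library’s implementation: the word weights, the nested data layout, and the function top_words are hypothetical stand-ins.

```python
from typing import Dict, List

# Hypothetical word weights: one entry per time slice (oldest first), each
# mapping a topic id to its {word: weight} table.
WEIGHTS: List[Dict[int, Dict[str, float]]] = [
    {1: {"vote": 0.4, "party": 0.3, "poll": 0.2, "law": 0.1}},
    {1: {"poll": 0.5, "vote": 0.3, "law": 0.15, "party": 0.05}},
]


def top_words(topic_id: int, timeslice: int, n: int = 10) -> List[str]:
    """Return up to n words of a topic, heaviest first, for a 1-based time slice."""
    if not 1 <= timeslice <= len(WEIGHTS):
        raise IndexError("timeslice must be between 1 and the number of time slices")
    word_weights = WEIGHTS[timeslice - 1][topic_id]  # shift to 0-based storage
    ranked = sorted(word_weights, key=word_weights.get, reverse=True)
    return ranked[:n]


print(top_words(topic_id=1, timeslice=1, n=2))  # ['vote', 'party']
print(top_words(topic_id=1, timeslice=2, n=2))  # ['poll', 'vote']
```

The same topic can surface different words in different time slices, which is why both topic_id and timeslice are required.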

classmethod load(file_path: str) LdaOverTime[source]

Load your last work.

Parameters

file_path (str) – Location where you saved your last work.

Returns

Last saved work.

Return type

LdaOverTime

plot(title: str, legend_title: Optional[str] = None, path_to_save: Optional[str] = None, rotation: int = 90, mode: str = 'line', display: bool = True, date_format: Optional[str] = None)[source]

Plot the evolution of topics over time.

To rename the topics, use the rename_topics method.

Parameters
  • title (str) – title of plot

  • legend_title (str, optional) – legend’s title

  • path_to_save (str, optional) – set it with path to save the graph. Default behaviour does not save the graph.

  • rotation (int, optional) – value in degrees to rotate horizontal labels. Default is 90.

  • mode (str, optional) – type of plotting. It can be either a simple line plot or stack plot. Default is line.

  • display (bool, optional) – set it to False to not display graph. Default behaviour is to display.

  • date_format (str, optional) – date format to be displayed

Returns

Nothing

Return type

None
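
To make the difference between the two plotting modes concrete, the sketch below computes, for hypothetical proportions, the heights a stack plot would draw: each topic’s band sits on top of the bands before it, whereas a line plot draws the raw values. The data and the function stacked_heights are illustrative only; the code does not call the library or any plotting backend.

```python
from itertools import accumulate
from typing import List

# Hypothetical topic proportions, one inner list per time slice.
proportions_per_slice = [
    [0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50],
]


def stacked_heights(proportions: List[float]) -> List[float]:
    """Return the upper edge of each topic's band in a stack plot."""
    return list(accumulate(proportions))


for proportions in proportions_per_slice:
    # A line plot would use `proportions` directly; a stack plot uses the
    # running totals, so the last band ends at the sum of all topics.
    print(proportions, stacked_heights(proportions))
```

When the proportions in a time slice sum to 1, the top of the stack stays flat at 1 and only the band widths change.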

rename_topics(new_names: List[str])[source]

Rename the topics using the given list of new names.

Names are applied in order: the first name overwrites the first topic, the second name overwrites the second topic, and so on.

The list’s length must equal the number of topics; otherwise ValueError is raised.

Parameters

new_names (list[str]) – List with new names to overwrite the topics’ names

Returns

Nothing

Return type

None

Raises

ValueError – when the given list’s length does not match the number of topics.
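
The length check and order-based renaming described for rename_topics can be sketched like this. The function rename_in_order and the sample names are hypothetical; the real method works on the LdaOverTime instance and its error message may differ.

```python
from typing import List


def rename_in_order(current_names: List[str], new_names: List[str]) -> List[str]:
    """Return the new topic names, positionally replacing the current ones."""
    if len(new_names) != len(current_names):
        raise ValueError(
            f"expected {len(current_names)} names, got {len(new_names)}"
        )
    # The first new name replaces the first topic, and so on.
    return list(new_names)


topics = ["1", "2", "3"]  # hypothetical default names
print(rename_in_order(topics, ["economy", "health", "sports"]))

try:
    rename_in_order(topics, ["economy", "health"])
except ValueError as error:
    print("rejected:", error)
```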

save(file_path: str) None[source]

Save your current work in the location file_path. You can reload your work later by calling load with the same file_path.

Parameters

file_path (str) – Location to save your current work.

Returns

Nothing

Return type

None
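
The save and load pair follows a common round-trip pattern: an instance method writes the state to file_path, and a class method rebuilds an instance from the same path. The sketch below shows that pattern with a hypothetical stand-in class, TopicWork. The serialization format used by LdaOverTime is not stated here, so the use of JSON is purely an assumption chosen for the example.

```python
import json
import os
import tempfile
from typing import List


class TopicWork:
    """Hypothetical stand-in for illustration; not the real LdaOverTime class."""

    def __init__(self, topic_names: List[str]) -> None:
        self.topic_names = topic_names

    def save(self, file_path: str) -> None:
        """Write the current state to file_path."""
        with open(file_path, "w", encoding="utf-8") as handle:
            json.dump({"topic_names": self.topic_names}, handle)

    @classmethod
    def load(cls, file_path: str) -> "TopicWork":
        """Rebuild an instance from a file written by save."""
        with open(file_path, "r", encoding="utf-8") as handle:
            state = json.load(handle)
        return cls(state["topic_names"])


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "work.json")
    TopicWork(["economy", "health"]).save(path)
    restored = TopicWork.load(path)

print(restored.topic_names)  # ['economy', 'health']
```

Because load is a class method, it is called on the class itself rather than on an existing instance.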

showvis(time_id: int)[source]

Show the pyLDAvis analysis of your model for a specific time slice. It is useful for evaluating how good your model is.

This method is only available inside Jupyter notebooks.

Parameters

time_id (int) – Position of the time slice in chronological order, from 1 to n_timeslices.

Returns

Nothing

Return type

None