Core
lda_over_time.lda_over_time
LdaOverTime is a framework that makes it easier to perform topic modeling analysis over time and to visualize the results.
In brief, topic modeling is a technique that finds the topics covered by each document in a collection. By adding time to this equation, we can study how much, and why, a given topic is discussed more or less in each time slice.
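To make the idea concrete: if a topic model gives each document a vector of topic proportions, one simple way to measure how much a topic is discussed in a time slice is to average those vectors over the documents in that slice. The sketch below performs only that aggregation step in plain Python; the input format is an assumption, and it illustrates the concept rather than how LdaOverTime or any particular DTM computes its results.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def topic_share_per_slice(
    docs: Sequence[Tuple[str, Sequence[float]]],
) -> Dict[str, List[float]]:
    """Average the topic proportions of all documents in each time slice.

    `docs` holds (time_slice_label, topic_proportions) pairs; every
    proportions vector covers the same n_topics topics.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for label, props in docs:
        if label not in sums:
            sums[label] = [0.0] * len(props)
        for k, p in enumerate(props):
            sums[label][k] += p
        counts[label] += 1
    # Chronological order assumes labels sort correctly as strings (e.g. ISO dates).
    return {label: [s / counts[label] for s in sums[label]] for label in sorted(sums)}


docs = [
    ("2020", [0.75, 0.25]),
    ("2020", [0.25, 0.75]),
    ("2021", [1.0, 0.0]),
]
print(topic_share_per_slice(docs))  # {'2020': [0.5, 0.5], '2021': [1.0, 0.0]}
```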
- class lda_over_time.lda_over_time.LdaOverTime(model: DtmModelInterface)
Bases: object
LdaOverTime provides an easier way to take a pre-processed set of documents, choose a DTM (dynamic topic model), and get an analysis of how the topics evolve over time.
First choose a model to work with and create an instance of it with the appropriate parameters; then instantiate LdaOverTime by passing that model instance.
- Parameters
model (DtmModelInterface) – instance of the chosen model.
- Returns
Nothing
- Return type
None
- get_results() → DataFrame
Get the model’s results as a table.
In this table, each row represents one time slice.
The date column holds the time slices’ timestamps, and the remaining n_topics columns, indexed from 1 to n_topics, hold the proportion of each topic in each time slice.
You can get each topic’s main words by calling get_topic_words. For example, to get the top 10 words of topic 3 in the first row of this table, call get_topic_words(topic_id=3, timeslice=1, n=10).
- Returns
table with results
- Return type
pandas.DataFrame
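The table layout can be illustrated without the library. The sketch below stands in for the pandas DataFrame with a plain list of row dictionaries (an assumption made so the example runs with only the standard library) and finds the largest topic in each time slice, converting the 0-based row position into the 1-based timeslice that get_topic_words expects.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical stand-in for one get_results() row: a "date" entry plus
# topic columns 1..n_topics holding that time slice's topic proportions.
Row = Dict[Union[str, int], Union[str, float]]


def dominant_topics(rows: List[Row]) -> List[Tuple[int, Union[str, int]]]:
    """Return (timeslice, topic_id) for the largest topic in each row.

    Row position 0 maps to timeslice 1, the convention get_topic_words uses.
    """
    pairs = []
    for position, row in enumerate(rows):
        topic_cols = [key for key in row if key != "date"]
        best = max(topic_cols, key=lambda key: row[key])
        pairs.append((position + 1, best))
    return pairs


rows = [
    {"date": "2020-01", 1: 0.2, 2: 0.5, 3: 0.3},
    {"date": "2020-02", 1: 0.1, 2: 0.3, 3: 0.6},
]
print(dominant_topics(rows))  # [(1, 2), (2, 3)]
```

Each pair could then be passed as get_topic_words(topic_id=topic, timeslice=ts). With the real DataFrame, and assuming the column is literally named date, pandas' results.drop(columns="date").idxmax(axis=1) gives the same per-row topic labels; the timeslice is still the row position plus 1.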
- get_topic_words(topic_id: int, timeslice: int, n: int = 10) → List[str]
Get the top n words of a specific topic in the chosen time slice.
- Parameters
topic_id (int) – The id of the desired topic.
timeslice (int) – The position of the desired time slice in chronological order; the first (oldest) time slice has index 1.
n (int) – How many words that best describe the topic at the given time slice should be returned. Default is 10.
- Returns
A list of the top n words that best describe the requested topic in the given time slice.
- Return type
list[str]
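Conceptually, a topic's top words at a time slice are the vocabulary words with the highest probability under that topic at that slice. The sketch below assumes one topic's per-slice word distributions are available as plain dictionaries (a hypothetical input format; this page does not say how LdaOverTime stores them) and shows the ranking together with the 1-based timeslice convention.

```python
from typing import Dict, List, Sequence


def top_n_words(
    word_probs: Sequence[Dict[str, float]],
    timeslice: int,
    n: int = 10,
) -> List[str]:
    """Return the n most probable words of one topic at a 1-based time slice.

    word_probs[t] maps each vocabulary word to its probability under the
    topic in time slice t + 1 (chronological order, oldest first).
    """
    if not 1 <= timeslice <= len(word_probs):
        raise IndexError(f"timeslice must be in 1..{len(word_probs)}, got {timeslice}")
    probs = word_probs[timeslice - 1]
    # Highest probability first; ties broken alphabetically for stable output.
    ranked = sorted(probs, key=lambda word: (-probs[word], word))
    return ranked[:n]


topic_over_time = [
    {"virus": 0.4, "vaccine": 0.1, "lockdown": 0.5},  # time slice 1
    {"virus": 0.3, "vaccine": 0.6, "lockdown": 0.1},  # time slice 2
]
print(top_n_words(topic_over_time, timeslice=2, n=2))  # ['vaccine', 'virus']
```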
- classmethod load(file_path: str) → LdaOverTime
Load previously saved work.
- Parameters
file_path (str) – Location where the work was saved with save.
- Returns
Last saved work.
- Return type
LdaOverTime
- plot(title: str, legend_title: Optional[str] = None, path_to_save: Optional[str] = None, rotation: int = 90, mode: str = 'line', display: bool = True, date_format: Optional[str] = None)
Plot the evolution of topics over time.
To rename the topics, use the method rename_topics.
- Parameters
title (str) – Title of the plot.
legend_title (str, optional) – Title of the legend.
path_to_save (str, optional) – Path where the graph should be saved. By default, the graph is not saved.
rotation (int, optional) – Rotation, in degrees, of the horizontal-axis labels. Default is 90.
mode (str, optional) – Type of plot: either a simple line plot or a stack plot. Default is 'line'.
display (bool, optional) – Set to False to not display the graph. By default, the graph is displayed.
date_format (str, optional) – Format in which dates are displayed.
- Returns
Nothing
- Return type
None
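The two plotting modes differ in what the vertical axis shows. In a line plot each topic's proportion is drawn independently; in a stack plot the topics are layered, so each band's upper edge is the cumulative share of the topics below it plus its own. The sketch below computes those edges in plain Python to illustrate stacking in general (matplotlib's stackplot stacks values the same way by default); it does not reproduce LdaOverTime's plotting code.

```python
from itertools import accumulate
from typing import List, Sequence


def stack_edges(proportions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Upper edge of each topic's band in a stack plot.

    proportions[t][k] is the share of topic k in time slice t. A line plot
    draws each value as-is; a stack plot draws topic k on top of topics
    0..k-1, so its upper edge is the running sum up to k.
    """
    return [list(accumulate(row)) for row in proportions]


shares = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
print(stack_edges(shares))  # [[0.5, 0.75, 1.0], [0.25, 0.5, 1.0]]
```

When each time slice's proportions sum to 1, the topmost band always ends at 1.0, which is why a stack plot makes shifts in relative share easy to see.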
- rename_topics(new_names: List[str])
Rename the topics using the given list of new names.
Names are assigned in order: the first name replaces the first topic’s name, the second replaces the second topic’s name, and so on.
The list’s length must equal the number of topics; otherwise, a ValueError is raised.
- Parameters
new_names (list[str]) – List of new names for the topics, in topic order.
- Returns
Nothing
- Return type
None
- Raises
ValueError – when the given list’s length does not match the number of topics.
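The positional renaming rule and its length check can be sketched in a few lines. The helper below is hypothetical (not part of lda_over_time): it maps topic ids 1..n_topics to the new names in order and raises ValueError on a length mismatch, mirroring the behaviour documented for rename_topics.

```python
from typing import Dict, List


def topic_rename_map(n_topics: int, new_names: List[str]) -> Dict[int, str]:
    """Map topic ids 1..n_topics to new names, in the given order.

    Mirrors the documented rule: the i-th name goes to the i-th topic, and a
    length mismatch raises ValueError.
    """
    if len(new_names) != n_topics:
        raise ValueError(f"expected {n_topics} names, got {len(new_names)}")
    return {topic_id: name for topic_id, name in enumerate(new_names, start=1)}


print(topic_rename_map(3, ["health", "economy", "sports"]))
# {1: 'health', 2: 'economy', 3: 'sports'}
```

Such a mapping could, for instance, be passed to pandas' DataFrame.rename(columns=...) on a copy of the get_results() table; whether rename_topics itself changes those column labels is not stated here.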
- save(file_path: str) → None
Save your current work to the location file_path. You can reload it later by calling load with the same file_path.
- Parameters
file_path (str) – Location to save your current work.
- Returns
Nothing
- Return type
None
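The save/load pair follows a common pattern: an instance method writes state to a path, and a classmethod reads that path back and returns a new instance. The sketch below shows the pattern with a hypothetical Workspace class and pickle as the serializer; the file format lda_over_time actually uses is not documented on this page.

```python
import os
import pickle
import tempfile


class Workspace:
    """Hypothetical stand-in for an object with save/load like LdaOverTime's."""

    def __init__(self, results: dict) -> None:
        self.results = results

    def save(self, file_path: str) -> None:
        # Persist plain data only, so loading does not depend on class import paths.
        with open(file_path, "wb") as fh:
            pickle.dump({"results": self.results}, fh)

    @classmethod
    def load(cls, file_path: str) -> "Workspace":
        # Only unpickle files you trust: pickle can run arbitrary code on load.
        with open(file_path, "rb") as fh:
            state = pickle.load(fh)
        return cls(state["results"])


def demo_round_trip() -> Workspace:
    """Save to a temporary file, then load it back with the same path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "work.pkl")
        Workspace({"n_topics": 3}).save(path)
        return Workspace.load(path)


print(demo_round_trip().results)  # {'n_topics': 3}
```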
- showvis(time_id: int)
Show the pyLDAvis visualization of your model for a specific time slice. This is useful for evaluating the quality of your model.
This method is only available inside Jupyter notebooks.
- Parameters
time_id (int) – Position of the time slice from 1 to n_timeslices in chronological order
- Returns
Nothing
- Return type
None