Topic Modelling

Topic Modelling Tool

Topic Modelling Tool "is a simple GUI-based application for topic modeling that uses the popular MALLET toolkit for the back-end". Topic modelling is a "way to analyze large volumes of unlabeled text" by generating topics: "clusters of words that frequently occur together". Topic Modelling Tool uses contextual clues to connect words with similar meanings and to differentiate words with multiple meanings. In its basic mode, the user supplies the input data and specifies the number of topics to generate.
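The workflow of supplying documents and a fixed number of topics can be illustrated with a minimal sketch. The code below is not the implementation used by Topic Modelling Tool or MALLET; it is a toy collapsed Gibbs sampler for latent Dirichlet allocation, written with the Python standard library only. The function name `fit_toy_lda`, its parameters, and the sample documents are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def fit_toy_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01,
                top_n=5, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    Returns (topics, doc_topic): the top words assigned to each topic and,
    for each document, the number of its tokens assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight each topic by its fit to this document and this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    topics = [
        [w for w, count in topic_word[k].most_common(top_n) if count > 0]
        for k in range(num_topics)
    ]
    return topics, doc_topic


# Hypothetical sample data: each document is a list of lowercase tokens.
sample_docs = [
    "river bank water flow river".split(),
    "bank loan money interest bank".split(),
    "water stream river fish".split(),
    "money loan credit interest".split(),
]
topics, doc_topic = fit_toy_lda(sample_docs, num_topics=2)
for k, words in enumerate(topics):
    print(f"topic {k}: {' '.join(words)}")
```

Because the sampler is stochastic, the word lists depend on the seed and on the number of iterations. The only modelling choice the caller makes is `num_topics`, which corresponds to the constraint described above.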


Gensim

Gensim began as "a collection of various Python scripts for the Czech Digital Mathematics Library in 2008, where it served to generate a short list of the most similar articles to a given article". Gensim was created to address the challenges of efficiency, scalability, and computational power in this library system. Its developers describe Gensim as "the most robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text".
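The original task of producing a short list of the most similar articles to a given article can be sketched as follows. This is not Gensim's code or API; it is a minimal illustration that represents each article as a bag of word counts and ranks the other articles by cosine similarity, using only the Python standard library. The function names and the sample articles are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, articles, top_n=3):
    """Rank all other articles by similarity to the article `query_id`."""
    # Bag-of-words representation: lowercase, whitespace-separated tokens.
    vectors = {aid: Counter(text.lower().split()) for aid, text in articles.items()}
    query = vectors[query_id]
    scored = [
        (aid, cosine_similarity(query, vec))
        for aid, vec in vectors.items()
        if aid != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical sample articles keyed by an identifier.
articles = {
    "a": "prime numbers and number theory",
    "b": "number theory of prime gaps",
    "c": "fluid dynamics and turbulence",
}
for article_id, score in most_similar("a", articles, top_n=2):
    print(article_id, round(score, 3))
```

Raw word counts let common words such as "and" contribute to the score; a practical system would typically reweight or filter such words before comparing documents.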

Stanford Topic Modeling Toolbox

Stanford Topic Modeling Toolbox (TMT) is a resource developed by the Stanford Natural Language Processing Group. TMT "brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component". TMT features the ability to import and manipulate texts, train topic models to create textual summaries, and generate compatible "outputs for tracking word usage across topics, time, and other groupings of data". TMT was written in 2009-2010 and uses an old version of Scala.


Mallet

Mallet, or the MAchine Learning for LanguagE Toolkit, "is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text". Mallet tools are optimized for five functions: importing data, classifying documents, sequence tagging, topic modelling, and numerical optimization. Mallet also offers an add-on package, GRMM, that extends the toolkit with support for general graphical models.