The topics package enables Differential Language Analysis using words, phrases, and topics.
Please cite our tutorial article when using the package:
Language visualisation methods for psychological assessments, and
Ackermann L., Zhuojun G. & Kjell O.N.E. (2024). An R-package for visualizing text in topics. https://github.com/theharmonylab/topics. DOI: zenodo.org/records/11165378.
This Getting Started tutorial walks through the most central functions of the topics package.
In an example where the topics are used to predict the PHQ-9 score, the pipeline can be run as follows:
1. Data Preprocessing
To preprocess the data, run the following command:
library(topics)
#>
#> This is topics: your text's new best friend (version 0.40.2).
#> Please note that the topics package requires you to download and install java from www.java.com.
#>
#> For more information about the topics package see www.r-topics.org and www.r-text.org.
# Build the document-term matrix from the free-text responses
dtm <- topicsDtm(
  data = dep_wor_data$Depword)

# Check the results from the dtm and refine stopwords and removal rates if necessary
dtm_evaluation <- topicsDtmEval(
  dtm)
dtm_evaluation$frequency_plot
2. Model Training
To train the LDA model, run the following command:
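A minimal sketch of the training step, assuming `topicsModel()` takes the document-term matrix from step 1 through its `dtm` argument and that `num_topics` and `num_iterations` control the number of topics and sampling iterations. The values 20 and 1000 are illustrative assumptions, not recommendations; the result is stored as `model` because the later steps refer to it under that name.

```r
library(topics)

# Repeated from step 1 so that this block runs on its own
dtm <- topicsDtm(
  data = dep_wor_data$Depword)

# Train the LDA topic model (argument values are illustrative assumptions)
model <- topicsModel(
  dtm = dtm,
  num_topics = 20,
  num_iterations = 1000)
```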
3. Model Inference
To infer the topic-term distribution of the documents, run the following command:
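A minimal sketch of the inference step, assuming `topicsPreds()` takes the trained model through `model` and the same text column that was used to build the document-term matrix through `data`. The training settings are illustrative assumptions; the result is stored as `preds` because the statistical analysis in step 4 refers to it under that name.

```r
library(topics)

# Repeated from the earlier steps so that this block runs on its own
dtm <- topicsDtm(
  data = dep_wor_data$Depword)
model <- topicsModel(
  dtm = dtm,
  num_topics = 20,        # illustrative value
  num_iterations = 1000)  # illustrative value

# Infer the topic distribution for each document
preds <- topicsPreds(
  model = model,
  data = dep_wor_data$Depword)
```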
4. Statistical Analysis
To analyze the relationship between the topics and the prediction variable, run the following command:
# Relate each topic to the PHQ-9 total score, controlling for age
test <- topicsTest(
  data = dep_wor_data,
  model = model,
  preds = preds,
  x_variable = "PHQ9tot",
  controls = c("Age"),
  test_method = "linear_regression")
5. Visualization
To visualize the significant topics as wordclouds, run the following command:
plot_list <- topicsPlot(
  model = model,
  test = test,
  figure_format = "png")

# Show some of the plots
plot_list$square1
#> $t_5
Differentiating balance and harmony through natural language analysis: A cross-national exploration of two understudied wellbeing-related concepts
The list below consists of papers that analyze human language in a way similar to what is possible with topics.
Methods Articles
Gaining insights from social media language: Methodologies and challenges. Kern et al. (2016). Psychological Methods.
Computer Science: Python Software
DLATK: Differential language analysis toolkit. Schwartz, H. A., Giorgi, et al. (2017). In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations.