NEWS

topics 0.72

topicsGrams(): Fixed incorrect normalization of freq_per_user. Values were previously column-normalized (each cell = a document's share of the corpus-wide count for that n-gram), which produced statistically uninterpretable predictors in topicsTest(). Values are now row-normalized (each cell = relative frequency of the n-gram within that document's own token count), which is the correct definition of per-user relative frequency.
making n-gram overview plot layout symmetrical
adding parameter to number of prevalent topics plotted in overview plot

topics 0.71

adding check_matrix_size in topicsModel() with advice on how to avoid matrix size to explode.

topics 0.70 (2026-02-16)

adding compatability with textTopicsWordCloud().
updated parameter defaults top_frequent = NULL and ngram_select = "estimate".

Performance & Robustness

topicsGrams() speed-up: Rebuilt the n-gram and per-document frequency computation using a single sparse-matrix pass with quanteda, replacing the slow per-n-gram regex counting loop (major runtime improvement on medium/large datasets).
Memory-safe output: freq_per_user now avoids accidental sparse → dense coercion (the “allocating GiB” warning). It supports auto wide/long output, returning long format when wide would be too large.

Harmonization with topicsDtm()

Aligned settings & reproducibility: topicsGrams() now mirrors topicsDtm() preprocessing controls (e.g., lower, punctuation/numbers removal, removalword, shuffle, seed, threads, optional stemming/lemmatization hook) and returns a saved settings list in the output.

topics 0.65

New Functions

topicsTutorialData(): New utility function to download and prepare long-text essay data directly from Hugging Face. Supports custom sample_size, min_word_count, max_word_count, and seed.
topicsPlotOverview(): Introduced a high-level plotting function for structured overviews. Supports side-by-side comparisons (ngrams), 1D layouts, and 2-D 3x3 grids with a central distribution plot.

Improvements to topicsTest()

Categorical Variable Support: x_variable and y_variable now fully support Factors and Character vectors.
Intelligent Method Detection: The test_method is now assigned per-variable. The package automatically detects binary data (0/1 or 2-level factors) to apply logistic_regression while using linear_regression for continuous data.
Baseline Reporting: Logistic regression results now include a logistic_level string in the output list to clarify the Baseline (0) vs. Target (1) mapping.

Enhancements & Aliases

Function Aliasing: topicsPreds() can now be accessed via descriptive aliases:
- topicsPredict()
- topicsAssess()
- topicsClassify()
Visual Refinement: Updated default color palettes in topicsPlot() for better aesthetic consistency.

topics 0.60 (2025-07-22)

ready for CRAN and installation-harmonized with the text-package.

topics 0.54

topicsGrams() now uses exact word boundary matching for n-grams (e.g., "lack" is matched as a standalone word, excluding partial matches like "black" or "lacking").
added ability to handle NAs in topicsTest().

topics 0.51

adding function to plot circles in the scatter legend.
fixing where non-significant plots were the same.
improving the structure of the creat_plot help function.
moving rJava to suggest to enable compatibility with the text-package.

topics 0.40.6

addting scatter_legend_dots_alpha and scatter_legend_bg_dots_alpha parameters for the topicsPlot() function.
adding setting for having the dot sizes according to their prevalence.

topics 0.40.5

Fixing bug when plotting test based on logistic_regression.

topics 0.40.4

added occurance_rate to topicsGrams()
added removal_mode, removal_rate_most and removal_rate_least to topicsGrams()
ngram_window = c(1) now supported by topicsDtm()
legend added to topicsPlot() with ngrams
The size in the dot legend will be based on prevalence if scatter_legend_dot_size = "prevalence". And the popouts are not transparent.
Fix the issues of tick and label of the x-axis in 1-dim dot legend.
Able to save the pop-out grey topics in the target folder.
Fix the bugs of rounding in generate_scatter_plot.
The default value of highlight_topic_words is set to NULL in the topicsPlot() function.

topics 0.40.2

changed some behaviours in topicsGrams(), including removing top_n and treating n-grams type differently.
added stopwords function to topicsGrams().
fixed the pmi calculation.
fixed the ngrams_max parameter in `topicsPlot()```.

topics 0.40.1

adding allowed_word_overlap in topicsPlot() for plotting the most prevalence.
improving help texts
highlight_topic_words parameter to add different colours for a word list.
added stopwords removal for topicsGram().
added ngrams_max functionality to topicsPlot().

topics 0.40

removing save_dir and load_dir from all function; only topicsPlot() now has the save_dir as an option.
size of the dots in distributions can be plotted according to prevalence.
adding p_adjust_method to topicsPlots().

topics 0.30.5

plots are not added as a list (and not only saved to the folder)
added scatter_show_axis_values to the topcisPlot().
adding feature to plot the n_most_prevalent_topics.

topics 0.30.4

scaling controls with scale instead of manually resulting in slightly different estimates. (but still same p-value and t-values)
removed ridge regression, t-test and correlation codes since they did not work
removed automatic removal of NAs in the topics predictions (this should be handled explicitly).
topicsTest() default to linear_regression if not the variable only contains 0s and 1s; i.e., now different tests can be applied to different axes.

topics 0.30.3

saving settings in dtm for downstream use in other functions.
adding parameters in the topicsPred() function including num_iteration, sampling_interval, burn_in.
implemented create_new_dtm for creating a new dtm for new data
adding test for using topics dimension for training using textTrainRegression().
removing forcing user to set save_dir on most functions (only need to do it for topics functions).

topics 0.30.2

fixing coherence bug
showing prevalence and coherence for in results
restructuring the files

topics 0.30.0

Harmonizing parameters in topicsTest() incl. x_variable, y_variable and controls
fixing error that variable names cannot be names with 1 underscore.

topics 0.22.1

added pmi_threshold (experimental) to topicsDtm()
removed the saving of raw data and the split procedure in the topicsDtm()
adding function that name emphasized topics so the file name starts with 0_.
add a parameter to turn off the shuffling of the data in topicsDtm()

topics 0.22

change p_threshold to p_alpha
moved p_alpha from the topicsTest() function to the topicsPlots() function
removed unnecessary list items from topicsTest()

topics 0.21

Changes related to compatability with the text-package

topics 0.20

Cleaning up code and ensuring improved compatibility across platforms.
Started the journey of improving documentation.

topics 0.10.1

Change

Removing dim and grid_plot arguments in topicsPlot().
Fixing the color bugs.
Adding possibility for the user to use gradient colors in all plots.
Adding a stop warning when the variable name contains an underscore in topicsTest().