Changes in version 0.72

- topicsGrams(): Fixed incorrect normalization of freq_per_user. Values were previously column-normalized (each cell = a document's share of the corpus-wide count for that n-gram), which produced statistically uninterpretable predictors in topicsTest(). Values are now row-normalized (each cell = relative frequency of the n-gram within that document's own token count), which is the correct definition of per-user relative frequency.
- making the n-gram overview plot layout symmetrical
- adding a parameter for the number of prevalent topics plotted in the overview plot

Changes in version 0.71

- adding check_matrix_size in topicsModel() with advice on how to keep the matrix size from exploding.

Changes in version 0.70 (2026-02-16)

- adding compatibility with textTopicsWordCloud().
- updated parameter defaults top_frequent = NULL and ngram_select = "estimate".

Performance & Robustness

- topicsGrams() speed-up: Rebuilt the n-gram and per-document frequency computation using a single sparse-matrix pass with quanteda, replacing the slow per-n-gram regex counting loop (major runtime improvement on medium/large datasets).
- Memory-safe output: freq_per_user now avoids accidental sparse → dense coercion (the “allocating GiB” warning). It supports auto wide/long output, returning long format when wide would be too large.

Harmonization with topicsDtm()

- Aligned settings & reproducibility: topicsGrams() now mirrors topicsDtm() preprocessing controls (e.g., lower, punctuation/numbers removal, removalword, shuffle, seed, threads, optional stemming/lemmatization hook) and returns a saved settings list in the output.

Changes in version 0.65

New Functions

- topicsTutorialData(): New utility function to download and prepare long-text essay data directly from Hugging Face. Supports custom sample_size, min_word_count, max_word_count, and seed.
- topicsPlotOverview(): Introduced a high-level plotting function for structured overviews.
  Supports side-by-side comparisons (ngrams), 1-D layouts, and 2-D 3x3 grids with a central distribution plot.

Improvements to topicsTest()

- Categorical Variable Support: x_variable and y_variable now fully support factors and character vectors.
- Intelligent Method Detection: The test_method is now assigned per variable. The package automatically detects binary data (0/1 or 2-level factors) to apply logistic_regression while using linear_regression for continuous data.
- Baseline Reporting: Logistic regression results now include a logistic_level string in the output list to clarify the Baseline (0) vs. Target (1) mapping.

Enhancements & Aliases

- Function Aliasing: topicsPreds() can now be accessed via descriptive aliases:
  - topicsPredict()
  - topicsAssess()
  - topicsClassify()
- Visual Refinement: Updated default color palettes in topicsPlot() for better aesthetic consistency.

Changes in version 0.60 (2025-07-22)

- ready for CRAN and installation-harmonized with the text-package.

Changes in version 0.54

- topicsGrams() now uses exact word boundary matching for n-grams (e.g., "lack" is matched as a standalone word, excluding partial matches like "black" or "lacking").
- added ability to handle NAs in topicsTest().

Changes in version 0.51

- adding function to plot circles in the scatter legend.
- fixing an issue where non-significant plots were the same.
- improving the structure of the creat_plot help function.
- moving rJava to Suggests to enable compatibility with the text-package.

Changes in version 0.40.6

- adding scatter_legend_dots_alpha and scatter_legend_bg_dots_alpha parameters for the topicsPlot() function.
- adding a setting for sizing the dots according to their prevalence.

Changes in version 0.40.5

- Fixing bug when plotting test based on logistic_regression.
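
The exact word-boundary matching described under version 0.54 can be illustrated with a small base-R sketch. This is not the package's implementation: count_whole_word() is a hypothetical helper, and it assumes the n-gram contains only letters and spaces (no regex metacharacters).

```r
# Hypothetical helper, not part of the package: count whole-word
# occurrences of an n-gram in each text.
# Assumption: `ngram` contains only letters and spaces.
count_whole_word <- function(texts, ngram) {
  # \b anchors the match at word boundaries on both sides.
  pattern <- paste0("\\b", ngram, "\\b")
  matches <- gregexpr(pattern, texts, perl = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions.
  vapply(matches, function(m) sum(m > 0), integer(1))
}

texts <- c("a lack of sleep", "the black cat", "lacking focus", "lack, then more lack")

# Whole-word counts: "black" and "lacking" are not counted.
whole <- count_whole_word(texts, "lack")

# Substring counts for comparison: every text contains "lack" somewhere.
partial <- vapply(gregexpr("lack", texts, fixed = TRUE),
                  function(m) sum(m > 0), integer(1))
```

Here `whole` is 1, 0, 0, 2 while `partial` is 1, 1, 1, 2, which shows the kind of partial match the boundary anchors exclude.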

Changes in version 0.40.4

- added occurance_rate to topicsGrams()
- added removal_mode, removal_rate_most and removal_rate_least to topicsGrams()
- ngram_window = c(1) now supported by topicsDtm()
- legend added to topicsPlot() with ngrams
- The size in the dot legend will be based on prevalence if scatter_legend_dot_size = "prevalence". The popouts are not transparent.
- Fixed the tick and label issues on the x-axis in the 1-dim dot legend.
- Able to save the pop-out grey topics in the target folder.
- Fixed the rounding bugs in generate_scatter_plot.
- The default value of highlight_topic_words is set to NULL in the topicsPlot() function.

Changes in version 0.40.2

- changed some behaviours in topicsGrams(), including removing top_n and treating n-gram types differently.
- added stopwords function to topicsGrams().
- fixed the pmi calculation.
- fixed the ngrams_max parameter in topicsPlot().

Changes in version 0.40.1

- adding allowed_word_overlap in topicsPlot() for plotting the most prevalent topics.
- improving help texts
- highlight_topic_words parameter to add different colours for a word list.
- added stopwords removal for topicsGrams().
- added ngrams_max functionality to topicsPlot().

Changes in version 0.40

- removing save_dir and load_dir from all functions; only topicsPlot() now has save_dir as an option.
- size of the dots in distributions can be plotted according to prevalence.
- adding p_adjust_method to topicsPlot().

Changes in version 0.30.5

- plots are now added as a list (and not only saved to the folder)
- added scatter_show_axis_values to topicsPlot().
- adding feature to plot the n_most_prevalent_topics.

Changes in version 0.30.4

- scaling controls with scale instead of manually, resulting in slightly different estimates (but still the same p-values and t-values)
- removed ridge regression, t-test and correlation code since it did not work
- removed automatic removal of NAs in the topics predictions (this should be handled explicitly).
- topicsTest() defaults to linear_regression unless the variable contains only 0s and 1s; i.e., different tests can now be applied to different axes.

Changes in version 0.30.3

- saving settings in dtm for downstream use in other functions.
- adding parameters in the topicsPreds() function including num_iteration, sampling_interval, burn_in.
- implemented create_new_dtm for creating a new dtm for new data
- adding test for using topics dimension for training using textTrainRegression().
- no longer forcing the user to set save_dir on most functions (only needed for topics functions).

Changes in version 0.30.2

- fixing coherence bug
- showing prevalence and coherence in results
- restructuring the files

Changes in version 0.30.0

- Harmonizing parameters in topicsTest() incl. x_variable, y_variable and controls
- fixing the error that variable names could not contain 1 underscore.

Changes in version 0.22.1

- added pmi_threshold (experimental) to topicsDtm()
- removed the saving of raw data and the split procedure in topicsDtm()
- adding a function that names emphasized topics so that the file name starts with 0_.
- added a parameter to turn off the shuffling of the data in topicsDtm()

Changes in version 0.22

- changed p_threshold to p_alpha
- moved p_alpha from the topicsTest() function to the topicsPlot() function
- removed unnecessary list items from topicsTest()

Changes in version 0.21

- Changes related to compatibility with the text-package

Changes in version 0.20

- Cleaning up code and ensuring improved compatibility across platforms.
- Started the journey of improving documentation.

Changes in version 0.10.1

Change

- Removing dim and grid_plot arguments in topicsPlot().
- Fixing the color bugs.
- Adding possibility for the user to use gradient colors in all plots.
- Adding a stop warning when the variable name contains an underscore in topicsTest().
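
The per-variable method selection described for topicsTest() (versions 0.30.4 and 0.65) can be sketched roughly as follows. This is an illustration only, not the package's actual code: detect_test_method() is a hypothetical helper, and treating any non-numeric variable with exactly two observed values as binary is an assumption of this sketch.

```r
# Hypothetical helper, not part of the package: pick a test method
# for a single variable based on its observed (non-missing) values.
detect_test_method <- function(x) {
  values <- unique(x[!is.na(x)])
  if (is.numeric(x)) {
    # Numeric data count as binary only when every observed value is 0 or 1.
    is_binary <- length(values) > 0 && all(values %in% c(0, 1))
  } else {
    # Assumption: factors/characters with exactly two observed values are binary.
    # Multi-level categorical variables are outside the scope of this sketch.
    if (length(values) != 2) {
      stop("this sketch only handles two-level categorical variables")
    }
    is_binary <- TRUE
  }
  if (is_binary) "logistic_regression" else "linear_regression"
}

dat <- data.frame(
  age = c(23, 35, 41, 29),                # continuous
  treated = c(0, 1, 1, 0),                # 0/1 coded
  group = factor(c("a", "b", "a", "b"))   # two-level factor
)

# One method per variable, so different axes can use different tests.
methods <- vapply(dat, detect_test_method, character(1))
```

In this toy data frame, `age` maps to linear_regression while `treated` and `group` map to logistic_regression.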