topicsGrams(): Fixed incorrect normalization of freq_per_user. Values were
previously column-normalized (each cell = a document's share of the corpus-wide
count for that n-gram), which produced statistically uninterpretable predictors
in topicsTest(). Values are now row-normalized (each cell = relative frequency
of the n-gram within that document's own token count), which is the correct
definition of per-user relative frequency.
textTopicsWordCloud().
top_frequent = NULL and ngram_select = "estimate".

Performance & Robustness
topicsGrams() speed-up: Rebuilt the n-gram and per-document frequency computation using a single sparse-matrix pass with quanteda, replacing the slow per-n-gram regex counting loop (major runtime improvement on medium/large datasets).
freq_per_user now avoids accidental sparse → dense coercion (the “allocating GiB” warning). It supports auto wide/long output, returning long format when wide would be too large.

Harmonization with topicsDtm()
topicsGrams() now mirrors topicsDtm() preprocessing controls (e.g., lower, punctuation/numbers removal, removalword, shuffle, seed, threads, optional stemming/lemmatization hook) and returns a saved settings list in the output.
topicsTutorialData(): New utility function to download and prepare long-text essay data directly from Hugging Face. Supports custom sample_size, min_word_count, max_word_count, and seed.
topicsPlotOverview(): Introduced a high-level plotting function for structured overviews. Supports side-by-side comparisons (ngrams), 1-D layouts, and 2-D 3x3 grids with a central distribution plot.
x_variable and y_variable now fully support Factors and Character vectors.
test_method is now assigned per-variable. The package automatically detects binary data (0/1 or 2-level factors) to apply logistic_regression while using linear_regression for continuous data.
logistic_level string in the output list to clarify the Baseline (0) vs. Target (1) mapping.
topicsPreds() can now be accessed via descriptive aliases:
topicsPredict()
topicsAssess()
topicsClassify()

topicsPlot() for better aesthetic consistency.
text-package.
topicsGrams() now uses exact word boundary matching for n-grams (e.g., "lack" is matched
as a standalone word, excluding partial matches like "black" or "lacking").
topicsTest().
creat_plot help function.
rJava to suggest to enable compatibility with the text-package.
scatter_legend_dots_alpha and scatter_legend_bg_dots_alpha parameters for the topicsPlot() function.
logistic_regression.
occurance_rate to topicsGrams()
removal_mode, removal_rate_most and removal_rate_least to topicsGrams()
ngram_window = c(1) now supported by topicsDtm()
topicsPlot() with ngrams
size in the dot legend will be based on prevalence if scatter_legend_dot_size = "prevalence". And the popouts are not transparent.
generate_scatter_plot.
highlight_topic_words is set to NULL in the topicsPlot() function.
topicsGrams(), including removing top_n and treating
n-gram types differently.
stopwords function to topicsGrams().
pmi calculation.
ngrams_max parameter in topicsPlot().
allowed_word_overlap in topicsPlot() for plotting the most prevalence.
highlight_topic_words parameter to add different colours for a word list.
stopwords removal for topicsGrams().
ngrams_max functionality to topicsPlot().
save_dir and load_dir from all functions; only topicsPlot() now has save_dir as an option.
prevalence.
p_adjust_method to topicsPlot().
scatter_show_axis_values to topicsPlot().
n_most_prevalent_topics.
defaults to linear_regression unless the variable contains only 0s and 1s; i.e., different tests can now be applied to different axes.
dtm for downstream use in other functions.
topicsPreds() function including num_iteration, sampling_interval, burn_in.
create_new_dtm for creating a new dtm for new data
topics dimension for training using textTrainRegression().
topicsTest() incl. x_variable, y_variable and controls
pmi_threshold (experimental) to topicsDtm()
split procedure in the topicsDtm()
topicsDtm()
p_threshold to p_alpha
p_alpha from the topicsTest() function to the topicsPlot() function
topicsTest()
text-package
topicsPlot().
topicsTest().