IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 15, No. 1, February 2026, pp. 129-139
ISSN: 2252-8938, DOI: 10.11591/ijai.v15.i1.pp129-139

Automated data exploration with mutual information in natural language to visualization

Hue Luong-Thi-Minh 1, Vinh-The Nguyen 1, Van-Viet Nguyen 1, Kim-Son Nguyen 1, Huu-Khanh Nguyen 2
1 Faculty of Information Technology, Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Viet Nam
2 Distance Learning Center, Thai Nguyen University, Thai Nguyen, Viet Nam

Article Info
Article history:
Received Sep 22, 2025
Revised Nov 13, 2025
Accepted Jan 10, 2026

Keywords:
Evaluation and benchmarking
Feature selection
Information theory
Mutual information
Natural language to visualization

ABSTRACT
Transcribing natural language to visualization (NL2VIS) has been investigated for years but still suffers from several fundamental limitations (e.g., feature selection). Although large language models (LLMs) are good candidates, they incur computational cost and their decisions are hard to trace. To alleviate this problem, we introduced an alternative information-theoretic framework that utilizes mutual information (MI) to quantify the statistical relationship between utterances and database features. In our approach, kernel density estimation (KDE) and neural estimation techniques were utilized to estimate MI and to optimize a diversity-promoting objective that balances feature relevance and redundancy. We also introduced the information coverage ratio (ICR) to quantify the amount of information content preserved in feature selection decisions. In our experiments, we found that the proposed approach improved information-theoretic metrics, with an F1-score of 0.863 and an ICR of 0.891.
We observed that these improvements did not come at the cost of traditional benchmarks: validity reached 88.9%, legality 85.2%, and chart-type accuracy 87.6%. Moreover, significance tests (p < 0.001) and large effect sizes (Cohen's d > 0.8) further supported that these improvements were meaningful for feature selection. Thus, this study provides a mathematical framework for applications requiring analytical validity that extends beyond NL2VIS to other machine learning contexts.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Vinh The Nguyen
Faculty of Information Technology, Thai Nguyen University of Information and Communication Technology
Thai Nguyen, Viet Nam
Email: vinhnt@ictu.edu.vn

1. INTRODUCTION
In the era of big data, consuming a large amount of information plays a crucial role in the decision-making process, and data visualization (VIS) is a viable solution [1]-[3]. Traditional VIS tools relied on rules, heuristics, and probability, creating a barrier for non-technical users [4]-[6]. Recently, natural language to data visualization (NL2VIS) has emerged as one of the most promising approaches, allowing users to generate visualizations (e.g., bar charts, line graphs, scatter plots, and heat maps) using only simple conversational utterances [7], [8]. For instance, instead of writing a computer language such as "SELECT region, SUM(revenue) FROM sales WHERE date >= '2023-01-01' GROUP BY region ORDER BY SUM(revenue) DESC", a user may use natural language: "show me total sales by region this year". The system then interprets the request and builds a corresponding visualization, as illustrated in Figure 1, which

Journal homepage: http://ijai.iaescore.com
presents an example of the NL2VIS problem. Thus, this idea fundamentally shortens the gap between domain experts and ordinary users in data analysis workflows [2], [5].

Figure 1. Example of the NL2VIS problem

In formal terms, the NL2VIS problem can be seen as a transformation ψ : (Q, D) → V, where a natural-language query Q and dataset D are transformed into an appropriate visualization V* = arg max_V P(V | Q, D). In practice, this mapping is rarely straightforward. The core challenge lies in feature selection, which identifies the subset of dataset attributes F* ⊆ F that best expresses the user's analytical intent I(Q). Earlier approaches mostly treated this as a similarity-matching problem. However, such approaches often failed to capture the probability distribution P(F | Q), which describes how relevant each feature is to a given query. As an illustration, traditional dependency models relied on Pearson's coefficient to approximate statistical relationships, as indicated in (1).

r_xy = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ( √(Σ_{i=1}^{n} (x_i − x̄)²) · √(Σ_{i=1}^{n} (y_i − ȳ)²) )   (1)

This measure only captures linear relationships and fails to detect complex, non-linear dependencies between the query intent I(Q) and features F. Recent efforts [9]-[11] utilized machine learning approaches that learn non-linear relationships through neural networks. This formulation is expressed in (2).

P(F | Q, D) = softmax(f_θ(e_Q, e_D))   (2)

where f_θ : R^{d_Q + d_D} → R^{|F|} is a neural network with parameters θ, e_Q ∈ R^{d_Q} is the query embedding, and e_D ∈ R^{d_D} is the dataset embedding. Due to the lack of training data, especially for understanding users' intention, this approach has been advanced by modern tools such as LIDA [12] or VizAgent [7], which employ large language models (LLMs) like GPT-4 to automate the visualization generation task.
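As a concrete illustration of the limitation of equation (1), the following numpy sketch (our illustration, not code from any of the cited systems) compares Pearson's coefficient with a simple histogram-based MI estimate on the purely non-linear relationship y = x²:

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation, as in equation (1)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum())

def discrete_mi(x, y, bins=10):
    """Histogram-based estimate of mutual information (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # 0 * log 0 is taken as 0
    return (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()

x = np.linspace(-1.0, 1.0, 2001)
y = x**2  # deterministic but non-linear dependence

r = pearson(x, y)       # near zero: Pearson misses the relationship
mi = discrete_mi(x, y)  # clearly positive: MI detects it
```

Here the dependence is perfect (y is a function of x), yet Pearson's coefficient is essentially zero because the relationship is symmetric around the mean, while the MI estimate is well above zero.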
The primary limitation of utilizing LLMs is the need for sophisticated prompt engineering and extensive token consumption, which consequently incurs substantial computational costs [13]. As such, it presents a significant barrier for researchers with constrained financial resources to iteratively conduct experiments [14]. Furthermore, LLMs offer more knowledge (trained on a vast amount of data from the internet) than this problem needs, so the research question is: "can we tackle the same issue with an affordable approach?". From the aforementioned pain points, there is a need for an alternative solution that could balance the learned capabilities of LLMs with computational efficiency and accessibility [15], [16]. The sparking idea is to leverage the state-of-the-art semantic understanding capabilities of pre-trained models while remaining lightweight and cost-effective in applications. This thought motivates us to develop methods that can capture complex query-feature dependencies through principled mathematical frameworks, without the overhead associated with large-scale language model deployment.
Thus, the current study proposes a unique information-theoretic approach for feature selection, particularly in NL2VIS systems. The proposed framework provides mathematically grounded principles that move beyond simple existing similarity measures. Building on prior surveys [3], [15], [16], we position NL2VIS as presented in Table 1, which combines results reported in our experiments with qualitative properties from prior work.

Table 1. Comparative positioning of NL2VIS approaches (taxonomy)

Criterion | Rule-based | Similarity-based | Neural ranking | LLM-based | MI-based (Ours)
Principle | Heuristic | Similarity | Learned similarity | Generative reasoning | Information-theoretic
Typical methods | Grammars / rules | Cosine; TF-IDF + Corr | Contrastive ranking | GPT-4 prompting; LIDA; VizAgent | KDE + MINE
Interpretability | High | Medium | Low | Low-medium | High
Compute cost | Low | Low | Medium | High (token-dependent) | Medium (+31.7% time)
Accuracy | N/A (task-specific) | Val 82.3-85.7; Leg 74.1-78.9; F1 0.62-0.69; ICR 0.72-0.76 | Val 87.8; Leg 84.1; F1 0.782; ICR 0.847 | Val 93.4; Leg 89.7; F1 0.758; ICR 0.834 | Val 88.9; Leg 85.2; F1 0.863; ICR 0.891
Notes | Transparent rules; brittle in open domains | Simple; struggles with non-linear intent | Learns non-linear patterns | Strong UX/aesthetics; higher cost | Principled, redundancy-aware selection

2. METHOD
2.1. Research design
To address the limitations identified in existing NL2VIS systems, we propose a unique application of mutual information (MI) theory to the NL2VIS domain. While MI is a well-established concept in information theory [17], [18], its systematic application in NL2VIS systems remains unexplored [15], [19]. For two discrete random variables X and Y, MI is expressed as (3).

I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]   (3)

where p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions.
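Equation (3) can be checked directly on small joint-probability tables. The following sketch (an illustration added here, not the paper's implementation) computes I(X; Y) in nats from a joint table:

```python
import numpy as np

def mutual_information(pxy):
    """Mutual information I(X;Y) in nats from a joint probability table,
    following equation (3): sum of p(x,y) * log(p(x,y) / (p(x)p(y)))."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # 0 * log 0 is taken as 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Independent variables: the joint factorizes, so I(X;Y) = 0.
indep = np.outer([0.5, 0.5], [0.3, 0.7])
# Perfectly dependent binary variables: I(X;Y) = H(X) = log 2.
dep = np.array([[0.5, 0.0], [0.0, 0.5]])
```

The two boundary cases match the theory: an independent joint gives exactly zero MI, and a perfectly dependent binary pair gives log 2 nats.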
Alternatively, MI can also be expressed in terms of entropy as in (4).

I(X; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X)   (4)

where H(X) = −Σ_x p(x) log p(x) is the Shannon entropy [20] of X, and the conditional entropy of X given Y is defined as (5).

H(X | Y) = −Σ_y p(y) Σ_x p(x | y) log p(x | y)   (5)

Our framework consists of four main components: query intent extraction, feature representation, MI computation, and optimization-based feature selection. First, we transform natural language queries (so-called utterances) into higher-dimensional spaces using pretrained language models. Mathematically, given a query Q ∈ Q, we extract its semantic representation as v_Q ∈ R^d, where d is the number of dimensions of the continuous embedding space. For feature representation, each database feature f_i is encoded as a multi-dimensional vector v_{f_i} that contains semantic, statistical, and structural information. The semantic component utilizes properties such as the feature name and metadata to create embeddings [21], [22], while the statistical component captures data distribution characteristics such as cardinality, skewness, and data type. The structural component encodes relational information, including primary/foreign key relationships and table hierarchies [6]. The ultimate purpose of the transformation is to let features interact with each other. Formally, we define it as (6).

v_{f_i} = [v_sem(f_i); v_stat(f_i); v_struct(f_i)]   (6)

where [;] represents vector concatenation. The core idea of our approach is to compute MI between continuous vector representations. Since MI is originally defined for discrete variables, we employ kernel density estimation
132 ISSN: 2252-8938 (KDE) to estimate probability densities for continuous v ectors [23], [24]. F or v ectors v Q and v f i , we estimate their joint density ˆ p ( v Q , v f i ) and mar ginal densities ˆ p ( v Q ) and ˆ p ( v f i ) using Gaussian k ernels as (7). ˆ p ( v ) = 1 n n X j =1 K h ( v v j ) (7) Where K h is a Gaussian k ernel with bandwidth h , and n is number of samples. The MI estimate becomes (8). ˆ I ( v q ; v f i ) = Z ˆ p ( v q , v f i ) log ˆ p ( v q , v f i ) ˆ p ( v q ) ˆ p ( v f i ) d v q d v f i (8) T o handle the computational comple xity of high-dimensional MI esti mation, we also in v estig ate neural estimation approaches. W e emplo y the mutual information neural estimation (MINE) frame w ork [25], which uses neural netw orks to approximate the K ullback-Lei bler di v er gence between the joint and product distrib utions. The MINE estimator is dened as (9). ˆ I M I N E ( X ; Y ) = sup θ E p ( x,y ) [ T θ ( x, y )] log E p ( x ) p ( y ) [ e T θ ( x,y ) ] (9) Where T θ is a neural netw ork parameterized by θ , and the supremum is tak en o v er all possible netw ork parameters. Our feature selection optimization objecti v e aims to identify the subset of features F that maximizes the total MI with the query intent while maintaining di v ersity am o ng selected features. W e formulate this as (10). F = arg max S ⊆F , |S |≤ k X f i ∈S I ( v q ; v f i ) λ X f i ,f j ∈S ,i ̸ = j I ( v f i ; v f j ) (10) Where k is the maximum number of features to select, and λ is a re gularization parameter that penalizes redundanc y between selected features. The rst term encourages selection of fea tures highly rele v ant to the query , while the s econd term promotes di v ersity by penalizing features that are highly correlated with each other . Since this optimization problem is NP-hard, we emplo y a greedy approximation algori thm that iterati v ely selects features based on their mar ginal contrib ution to the objecti v e function. 
At each step, we compute the incremental gain of adding each remaining feature and select the one that maximizes it, as in (11).

Δ(f_i | S) = I(v_q; v_{f_i}) − λ Σ_{f_j∈S} I(v_{f_i}; v_{f_j})   (11)

where S is the set of currently selected features. Once the good candidate features are identified, we proceed to the next stage of generating the visualization. First, appropriate chart types and encoding assignments are determined. We treat this task as a classification problem, where the input contains the selected features and the query intent representation. For the encoding assignment, we use a constraint satisfaction approach that ensures visual encoding principles are respected while maximizing the utilization of the information content provided by the selected features. The whole pipeline of our proposed approach is presented in Figure 2.

2.2. Evaluation
To assess the effectiveness of our proposed approach, we conducted comprehensive experiments with the VisEval benchmark dataset [26]. We also compared the current method against state-of-the-art NL2VIS systems. In the domain of visualization, there is a scarcity of datasets; thus, Microsoft Research curated VisEval as a comprehensive benchmark for NL2VIS. In general, this dataset provides standardized items across diverse domains such as business intelligence, healthcare, social media, and financial analytics. Each domain covers challenges in feature complexity, semantic interpretation, and visualization requirements. The core value of this benchmark is that each item was curated and annotated by domain experts with ground-truth feature selections and optimal visualization specifications. Overall, it assesses three critical dimensions: validity - whether the generated code can run and render figures; legality - whether the rendered figure meets query requirements; and readability - whether the visualization can convey information to users.
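The greedy procedure built around equation (11) can be sketched as follows; the relevance and redundancy values are hypothetical stand-ins for the MI estimates described above:

```python
def greedy_select(relevance, redundancy, k, lam):
    """Greedy approximation of the objective in equation (10).
    relevance[i]     ~ I(v_q; v_{f_i})
    redundancy[i][j] ~ I(v_{f_i}; v_{f_j})
    At each step, pick the feature with the largest marginal gain from
    equation (11): relevance minus lam * redundancy with already-selected."""
    selected = []
    remaining = set(range(len(relevance)))
    while remaining and len(selected) < k:
        def gain(i):
            return relevance[i] - lam * sum(redundancy[i][j] for j in selected)
        best = max(remaining, key=gain)
        if gain(best) <= 0:   # stop if no feature still adds information
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: f0 and f1 are both relevant but highly redundant;
# f2 is less relevant but carries independent information.
rel = [0.90, 0.85, 0.30]
red = [[0.0, 0.8, 0.0],
       [0.8, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
picked = greedy_select(rel, red, k=2, lam=1.0)   # picks f0, then f2
```

With λ = 1 the redundancy penalty steers the second pick away from f1 toward the less relevant but non-redundant f2; with λ = 0 the selection degenerates to ranking by relevance alone, which is the ablation discussed in the results.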
The rst tw o metrics were computed by the program while the latter metric w as conducted with 12 e xperts (rating the charts using Int J Artif Intell, V ol. 15, No. 1, February 2026: 129–139 Evaluation Warning : The document was created with Spire.PDF for Python.
a 5-point Likert scale). This standardized and curated benchmark has been widely used recently to evaluate the performance of newly developed NL2VIS approaches. In terms of performance, we compared our proposed approach with several baseline methods that have been reported in the niche field of NL2VIS so far. The first baseline used the cosine similarity method to estimate the direct correlation between query embeddings and feature embeddings, as implemented in systems like Data2Vis [10]. The second baseline utilized term frequency-inverse document frequency (TF-IDF) weighted keyword matching combined with statistical feature ranking; this method was based on Pearson correlation coefficients. The third baseline used an LLM (in this case, GPT-4, the latest at the time of our experiments) with carefully designed prompts to perform feature selection. Here, we reproduced the experiment of the existing LIDA framework [12] but with a more advanced LLM. The fourth baseline implemented a neural ranking model trained on query-feature pairs using contrastive learning. To the best of our knowledge, these baselines represent recent advances in feature selection for NL2VIS. As previously mentioned, our evaluation followed the VisEval framework in terms of validity, legality, and readability. In addition, we also employed accuracy and F1-score to evaluate the structural accuracy of feature selection. Formally, the F1-score is expressed as (12).

F1 = 2TP / (2TP + FP + FN)   (12)

However, this metric treats all features equally, regardless of how each one's information contributes to the query intent. To alleviate this issue, we introduced a new metric called the information coverage ratio (ICR). ICR quantifies the information-theoretic quality of feature selection decisions based on (13).
ICR = Σ_{f_i∈F*} I(v_q; v_{f_i}) / Σ_{f_j∈F_gt} I(v_q; v_{f_j})   (13)

where F* represents the predicted feature subset and F_gt denotes the ground-truth feature subset. Conceptually, ICR quantifies how much of the total query-related information content in the optimal feature selection is preserved by the predicted selection. Compared to the F1-score, which only reflects binary correctness, the ICR score provides a continuous, information-weighted assessment that captures the degree of analytical completeness.

Figure 2. The pipeline of our proposed approach
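The difference between equations (12) and (13) can be illustrated with a toy example (the feature names and per-feature MI values below are hypothetical):

```python
def f1_score(pred, gt):
    """Set-based F1 over selected features, equation (12)."""
    tp = len(pred & gt)
    fp, fn = len(pred - gt), len(gt - pred)
    return 2 * tp / (2 * tp + fp + fn)

def icr(pred, gt, mi):
    """Information coverage ratio, equation (13): MI mass of the predicted
    features divided by MI mass of the ground-truth features.
    mi[f] stands in for the estimate of I(v_q; v_f)."""
    return sum(mi[f] for f in pred) / sum(mi[f] for f in gt)

# Hypothetical per-feature MI with the query intent.
mi = {"region": 0.50, "revenue": 0.30, "row_id": 0.05}
gt = {"region", "revenue", "row_id"}
pred = {"region", "revenue"}        # misses only the low-information feature

f1 = f1_score(pred, gt)   # 0.8: the miss is penalized uniformly
cov = icr(pred, gt, mi)   # ~0.94: the missed feature carried little information
```

Missing a near-uninformative feature costs a full false negative under F1 but barely moves ICR, which is exactly the "information-weighted" behavior the metric is designed to capture.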
3. RESULTS
Table 2 presents a snapshot of the performance comparison across the two key VisEval evaluation dimensions, together with F1-score and ICR. Experimental results on the VisEval benchmark demonstrated the higher performance of our information-theoretic approach across all evaluation dimensions in terms of F1-score and ICR. Figure 3 depicts the approximately linear relationship between F1-score and ICR across all evaluated NL2VIS models. This near-linear trend conveys the insight that, while both metrics are aligned, ICR tends to yield higher values by weighting features according to their information contribution. This reinforces our assumption that the MI-based approach can capture not only structural correctness (as reflected by F1) but also the depth of analytical information preserved in the selected feature subsets.

Table 2. Performance comparison on the VisEval benchmark

Method | Validity (%) | Legality (%) | F1-score | ICR
Cosine similarity | 82.3 | 74.1 | 0.623 | 0.721
TF-IDF + correlation | 85.7 | 78.9 | 0.691 | 0.759
GPT-4 prompting | 93.4 | 89.7 | 0.758 | 0.834
Neural ranking | 87.8 | 84.1 | 0.782 | 0.847
Our method | 88.9 | 85.2 | 0.863 | 0.891

Figure 3. Correlation between F1-score and ICR across NL2VIS models

Returning to Table 2, some interesting patterns were revealed between our information-theoretic approach and the other methods. In terms of validity and legality, GPT-4 prompting achieved the highest performance, with 93.4% and 89.7% respectively. This is not surprising, because this recent model was trained on a vast amount of data including code and thus demonstrates superior natural language understanding for interpreting user intent and producing code that generates visualizations. On the other hand, our method excelled in the specialized information-theoretic measures, with an F1-score of 0.863 and an ICR of 0.891.
This performance pattern reveals a fundamental distinction: GPT-4 demonstrated superior semantic comprehension and visualization generation, but our approach provided more mathematically principled and statistically sound feature selection that ensures analytical correctness and explainability. Figure 4 compares the performance of the different methods with respect to readability and chart accuracy. In terms of chart accuracy (compared to the ground truth), results revealed that GPT-4 prompting took the lead with 91.2%, compared to our method's 87.6%. This indicates that, when pretrained on massive data, it can capture the relationship between user intent and commonly used chart types accordingly. However, our approach also demonstrated competitive performance while offering advantages in analytical soundness and computational cost-effectiveness. Readability scores showed GPT-4 achieving 4.35/5.0 compared to our 4.18/5.0, which indicates that while advanced language models produce more visually intuitive visualizations, our approach maintains high user satisfaction while providing stronger guarantees of analytical correctness.
Figure 4. Comparison of chart accuracy and readability across NL2VIS methods

Table 3 reports the encoding appropriateness, processing time (sec), and time overhead of our proposed method against the best baseline, measured on our device. When the scores were normalized, the encoding appropriateness reached 0.824. This implies that MI demonstrated superior technical quality in systematic feature-to-encoding mappings that ensure statistical validity. In terms of computational efficiency, results showed that our approach incurred a computational trade-off for improved feature selection quality. That is, the average processing time for feature selection ranges from 3.4 to 11.2 seconds per query, compared to 2.1 to 8.9 seconds for the best baseline methods, resulting in a +31.7% time overhead. This computational cost reflects the mathematical complexity of MI estimation but delivers better encoding appropriateness (0.824 vs 0.789). The KDE-based MI estimation accounted for approximately 60% of the total computation time, while our neural MINE estimation approach reduced this overhead by 35% with minimal accuracy trade-offs (average F1-score reduction of 0.023). We also employed statistical significance testing to deepen our understanding of the metrics. Results from t-tests showed statistically significant differences for both the F1-score and ICR metrics (p < 0.001). While traditional metrics show mixed results, with GPT-4 prompting leading in user experience measures, our information-theoretic metrics demonstrated substantial gains. Effect size analysis using Cohen's d revealed large effect sizes (d > 0.8) for the F1-score and ICR comparisons. This indicates that the improvements in feature selection quality are practically meaningful for real-world NL2VIS applications.
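For reference, Cohen's d with a pooled standard deviation can be computed as below; the per-query scores shown are hypothetical placeholders, not the study's data:

```python
import math

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation: the standardized mean
    difference used in the effect size analysis (d > 0.8 counts as large)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Hypothetical per-query F1 scores for two methods.
ours = [0.88, 0.84, 0.90, 0.82, 0.87]
base = [0.76, 0.72, 0.79, 0.70, 0.75]
d = cohens_d(ours, base)   # well above the 0.8 "large effect" threshold
```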
These consistent performance gains in the benchmark scenarios demonstrate the robustness of our MI approach for analytical feature selection. In addition, we performed ablation studies to investigate the contribution of the different components. Removing the diversity regularization term (setting λ = 0 in equation (10)) resulted in an average F1-score decrease of 0.067, implying the importance of promoting feature diversity. Replacing the multi-modal feature representation with purely semantic embeddings reduced performance by 0.089 F1-score, suggesting the value of incorporating statistical and structural information. Using only KDE-based MI estimation without the neural MINE alternative increased computation time by 43% without improving accuracy, which supports our hybrid approach. Error analysis revealed that the most challenging cases for our method involved queries with highly ambiguous intent or domains with unconventional feature naming conventions. For instance, a user utterance such as "show me interesting patterns in the data" lacks specific analytical direction, making it difficult for any automated method to identify relevant features. Similarly, datasets with cryptic column names (e.g., "col A", "var 123") that provide no semantic information pose challenges for the semantic component of our feature representation.

Table 3. Computational performance analysis

Metric | Our method | Best baseline
Encoding appropriateness | 0.824 | 0.789
Processing time (sec) | 3.4-11.2 | 2.1-8.9
Time overhead | +31.7% | -
4. DISCUSSION
The comprehensive evaluation on the VisEval benchmark demonstrated that our proposed solution is reasonable for addressing fundamental challenges in NL2VIS systems. Similar to prior research [3], [15], we found that analytical frameworks can outperform LLMs on domain-specific analytical correctness. The consistent performance improvements in the specialized information-theoretic measures (F1-score of 0.863 and ICR of 0.891) suggest that MI provides an explainable mathematical foundation for quantifying the relationship between user intent and data features. The ICR complements precision-recall metrics by directly measuring information content preservation in feature selection [25]. In our experiments, GPT-4 still achieved higher scores in many facets (93.4% validity and 89.7% legality compared to our method's 88.9% and 85.2%, respectively). This is explainable because it was trained on a massive amount of data. Previous GPT versions were mainly trained on text from the internet and covered only a small portion of code, so their performance was not expected to be strong; even so, they still achieved higher scores than conventional approaches. Recently, GPT-4 was trained more extensively on code; thus, in a recent benchmark for text-to-visualization, GPT-4 achieved the highest pass rate [26]. For casual users without prompting techniques, GPT-4 acts like a black box, because the results are inconsistent: the same query may give different charts. Therefore, this gap highlights a fundamental trade-off: while GPT-4 demonstrates good understanding of user intent and produces runnable code (4.35/5.0 readability and 91.2% chart-type accuracy) [19], our approach emphasizes explainable decisions via mathematical rigor and analytical correctness in feature selection.
This distinction represents an insight for the field: the choice between conversational fluency and statistical soundness depends on the specific application requirements. Our approach is similar in spirit to the Kolmogorov-Arnold networks idea [27], where computation is sacrificed for explainability. In a broader context, the current study contributes beyond the immediate application to NL2VIS systems. The framework for computing MI between continuous (real-valued) vectors addresses a fundamental challenge in high-dimensional data [25] while maintaining the accuracy of MI estimation. In addition, the optimization objective with a regularization parameter enables a principled selection of the feature set, ensuring both query relevance and the avoidance of duplicated information between features. This intuition can be extended to many other feature selection problems [28]. Finally, we attempted to use as much information as possible in the given dataset to represent a feature (combining semantic, statistical, and structural information). This representation can also be useful for other domains that require a deeper understanding of the relationship between data structure and semantic meaning [29]. There are several limitations in the current work that should be acknowledged. First, we relied on only a single LLM (GPT-4) to perform the experiments, due to the computational cost incurred when using a proprietary API. Interested researchers could reproduce the work with different models. Second, our method utilized small pre-trained models such as bidirectional encoder representations from transformers (BERT) or the robustly optimized BERT pretraining approach (RoBERTa), which sometimes do not capture domain-specific terms. Thus, unlike GPT-4, our approach was constrained by the semantic boundaries of the selected embedding models.
Furthermore, while GPT-4 can handle ambiguous queries such as "show me something interesting" through the creative interpretation embedded in the model, our framework requires more specific analytical direction to perform effective feature selection. Despite these limitations, we hope the MI framework remains useful beyond NL2VIS, particularly in explainable AI and biomedical analytics, where understanding feature relevance and redundancy is essential for transparent decision-making. In another facet, our method performed efficiently on datasets with small numbers of features; however, the computational cost of MI estimation would grow rapidly at larger scales. As shown in the results section, the runtime increased by roughly 31.7% compared with baseline methods. Furthermore, the current framework is dedicated to structured data and is thus less flexible than GPT-4 when dealing with diverse data types or visualization settings. Even so, the MI-based selection mechanism would be promising beyond NL2VIS, as it can help identify key (continuous) sensor signals for monitoring or fault detection. In addition, its lightweight computation would fit well with embedded or edge-level dashboards.

5. CONCLUSION
In summary, the current study introduced an MI framework for NL2VIS systems. The proposed method relied on MI to select features and improved analytical accuracy. Specifically, the model achieved an
F1-score of 0.863 and an ICR of 0.891. It also maintained visualization quality, with 87.6% chart-type accuracy. Moreover, the approach emphasized mathematical rigor instead of conversational fluency. We defined a new metric to measure information coverage and optimized for feature diversity. In addition, the current work can be extended to other machine-learning domains that require transparent feature selection. The computational cost remains a practical concern. Finally, future research plans to balance analytical precision with aesthetic quality through hybrid models that combine explainable AI and LLMs.

ACKNOWLEDGMENTS
This research was supported by the DH2025-TN07-05 project conducted at the Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Vietnam, with additional support from the AI&SE Lab.

FUNDING INFORMATION
Authors state no funding involved.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.

Name of Author | C | M | So | Va | Fo | I | R | D | O | E | Vi | Su | P | Fu
Hue Luong-Thi-Minh
Vinh-The Nguyen
Van-Viet Nguyen
Kim-Son Nguyen
Huu-Khanh Nguyen

C: Conceptualization, M: Methodology, So: Software, Va: Validation, Fo: Formal Analysis, I: Investigation, R: Resources, D: Data Curation, O: Writing - Original Draft, E: Writing - Review & Editing, Vi: Visualization, Su: Supervision, P: Project Administration, Fu: Funding Acquisition

CONFLICT OF INTEREST STATEMENT
The authors have no financial, personal, or professional relationships that could inappropriately influence the research presented in this paper. The authors state no conflict of interest.

INFORMED CONSENT
We have obtained informed consent from all individuals included in this study.
ETHICAL APPROVAL
This research does not require ethical approval as it does not involve human participants, animal subjects, or sensitive data.

DATA AVAILABILITY
No new data were created or analyzed in this study. Results are based on the publicly available VisEval benchmark dataset. Implementation code is available upon reasonable request and will be released publicly after 24 months from publication, subject to project policies.
REFERENCES
[1] V. T. Nguyen, K. Jung, and V. Gupta, "Examining data visualization pitfalls in scientific publications," Visual Computing for Industry, Biomedicine, and Art, vol. 4, no. 1, Dec. 2021, doi: 10.1186/s42492-021-00092-y.
[2] S. Park, B. Bekemeier, A. Flaxman, and M. Schultz, "Impact of data visualization on decision-making and its implications for public health practice: a systematic literature review," Informatics for Health and Social Care, vol. 47, no. 2, pp. 175–193, Apr. 2022, doi: 10.1080/17538157.2021.1982949.
[3] A. Wu et al., "AI4VIS: survey on artificial intelligence approaches for data visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 28, no. 12, pp. 5049–5070, Dec. 2022, doi: 10.1109/TVCG.2021.3099002.
[4] T.-V. Nguyen and T.-N. Phung, "Enhanced literature review visualization: a novel sorted stream graphs with integrated word elements," in Advances in Information and Communication Technology (ICTA 2024), Cham, Switzerland: Springer, 2024, pp. 159–168, doi: 10.1007/978-3-031-80943-9_17.
[5] E. Hoque and M. S. Islam, "Natural language generation for visualizations: state of the art, challenges and future directions," Computer Graphics Forum, vol. 44, no. 1, Feb. 2025, doi: 10.1111/cgf.15266.
[6] K. Zhou, Z. Liu, R. Chen, L. Li, S.-H. Choi, and X. Hu, "Table2Graph: transforming tabular data to unified weighted graph," in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022, pp. 2420–2426, doi: 10.24963/ijcai.2022/336.
[7] H. L. T. Minh, V. N. The, and T. Q. Xuan, "VizAgent: towards an intelligent and versatile data visualization framework powered by large language models," in Advances in Information and Communication Technology (ICTA 2024), Cham, Switzerland: Springer, 2024, pp. 89–97, doi: 10.1007/978-3-031-80943-9_10.
[8] N. V. Viet et al., "Revolutionizing education: an extensive analysis of large language models integration," International Research Journal of Science, Technology, Education, and Management, vol. 4, no. 4, pp. 10–21, 2024, doi: 10.5281/zenodo.00000000.
[9] Y. Luo, X. Qin, N. Tang, and G. Li, "DeepEye: towards automatic data visualization," in 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris: IEEE, Apr. 2018, pp. 101–112, doi: 10.1109/ICDE.2018.00019.
[10] V. Dibia and C. Demiralp, "Data2Vis: automatic generation of data visualizations using sequence-to-sequence recurrent neural networks," IEEE Computer Graphics and Applications, vol. 39, no. 5, pp. 33–46, Sep. 2019, doi: 10.1109/MCG.2019.2924636.
[11] R. Tabalba et al., "Articulate+: an always-listening natural language interface for creating data visualizations," in Proceedings of the 4th Conference on Conversational User Interfaces, 2022, pp. 1–6, doi: 10.1145/3543829.3544534.
[12] V. Dibia, "LIDA: a tool for automatic generation of grammar-agnostic visualizations and infographics using large language models," arXiv:2303.02927, 2023.
[13] G. Kusano, K. Akimoto, and K. Takeoka, "Revisiting prompt engineering: a comprehensive evaluation for LLM-based personalized recommendation," in Proceedings of the Nineteenth ACM Conference on Recommender Systems, Prague, Czech Republic: ACM, Sep. 2025, pp. 832–841, doi: 10.1145/3705328.3748159.
[14] B. Chen, Z. Zhang, N. Langrené, and S. Zhu, "Unleashing the potential of prompt engineering for large language models," Patterns, vol. 6, no. 6, Jun. 2025, doi: 10.1016/j.patter.2025.101260.
[15] L. Shen et al., "Towards natural language interfaces for data visualization: a survey," IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 6, pp. 3121–3144, Jun. 2023, doi: 10.1109/TVCG.2022.3148007.
[16] W. Yang, M. Liu, Z. Wang, and S. Liu, "Foundation models meet visualizations: challenges and opportunities," Computational Visual Media, vol. 10, no. 3, pp. 399–424, Jun. 2024, doi: 10.1007/s41095-023-0393-x.
[17] S. Liu and M. Motani, "Improving mutual information based feature selection by boosting unique relevance," Journal of Artificial Intelligence Research, vol. 82, pp. 1267–1292, Mar. 2025, doi: 10.1613/jair.1.17219.
[18] J. Tang, Y. Luo, M. Ouzzani, G. Li, and H. Chen, "Sevi: speech-to-visualization through neural machine translation," in Proceedings of the 2022 International Conference on Management of Data, 2022, pp. 2353–2356, doi: 10.1145/3514221.3520150.
[19] P. Maddigan and T. Susnjak, "Chat2VIS: generating data visualizations via natural language using ChatGPT, Codex and GPT-3 large language models," IEEE Access, vol. 11, pp. 45181–45193, 2023, doi: 10.1109/ACCESS.2023.3274199.
[20] P. Saraiva, "On Shannon entropy and its applications," Kuwait Journal of Science, vol. 50, no. 3, pp. 194–199, Jul. 2023, doi: 10.1016/j.kjs.2023.05.004.
[21] N. M. Gardazi, A. Daud, M. K. Malik, A. Bukhari, T. Alsahfi, and B. Alshemaimri, "BERT applications in natural language processing: a review," Artificial Intelligence Review, vol. 58, no. 6, Mar. 2025, doi: 10.1007/s10462-025-11162-5.
[22] H. Man, N. T. Ngo, V. D. Lai, R. A. Rossi, F. Dernoncourt, and T. H. Nguyen, "LUSIFER: language universal space integration for enhanced representation in multilingual text embedding models," in Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy: ACM, Jul. 2025, pp. 1360–1370, doi: 10.1145/3726302.3730029.
[23] Y. Ning et al., "A mutual information theory-based approach for assessing uncertainties in deterministic multi-category precipitation forecasts," Water Resources Research, vol. 58, no. 11, Nov. 2022, doi: 10.1029/2022WR032631.
[24] A. Moreo, P. González, and J. J. D. Coz, "Kernel density estimation for multiclass quantification," Machine Learning, vol. 114, no. 4, Apr. 2025, doi: 10.1007/s10994-024-06726-5.
[25] M. I. Belghazi et al., "Mutual information neural estimation," in Proceedings of the 35th International Conference on Machine Learning, PMLR, Jul. 2018, pp. 531–540.
[26] N. Chen, Y. Zhang, J. Xu, K. Ren, and Y. Yang, "VisEval: a benchmark for data visualization in the era of large language models," IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 1, pp. 1301–1311, Jan. 2025, doi: 10.1109/TVCG.2024.3456320.
[27] S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, "A survey on Kolmogorov-Arnold network," ACM Computing Surveys, vol. 58, no. 2, pp. 1–35, Jan. 2026, doi: 10.1145/3743128.
[28] H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, Aug. 2005, doi: 10.1109/TPAMI.2005.159.
[29] K. Hu, M. A. Bakker, S. Li, T. Kraska, and C. Hidalgo, "VizML: a machine learning approach to visualization recommendation," in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–12, doi: 10.1145/3290605.3300358.