IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 15, No. 1, February 2026, pp. 129∼139
ISSN: 2252-8938, DOI: 10.11591/ijai.v15.i1.pp129-139
Automated data exploration with mutual information in natural language to visualization

Hue Luong-Thi-Minh1, Vinh-The Nguyen1, Van-Viet Nguyen1, Kim-Son Nguyen1, Huu-Khanh Nguyen2

1Faculty of Information Technology, Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Viet Nam
2Distance Learning Center, Thai Nguyen University, Thai Nguyen, Viet Nam
Article Info

Article history:
Received Sep 22, 2025
Revised Nov 13, 2025
Accepted Jan 10, 2026

Keywords:
Evaluation and benchmarking
Feature selection
Information theory
Mutual information
Natural language to visualization
ABSTRACT

Transcribing natural language to visualization (NL2VIS) has been investigated for years but still suffers from several fundamental limitations (e.g., feature selection). Although large language models (LLMs) are good candidates, they incur computational costs and their decisions are hard to trace. To alleviate this problem, we introduced an alternative information-theoretic framework that utilized mutual information (MI) to quantify the statistical relationship between utterances and database features. In our approach, kernel density estimation (KDE) and neural estimation techniques were utilized to estimate MI and to optimize a diversity-promoting objective balancing feature relevance and redundancy. We also introduced the information coverage ratio (ICR) to quantify the amount of information content preserved in feature selection decisions. In our experiments, we found that the proposed approach improved information-theoretic metrics, with an F1-score of 0.863 and an ICR of 0.891. We observed that these improvements did not come at the cost of traditional benchmarks: validity reached 88.9%, legality 85.2%, and chart-type accuracy 87.6%. Moreover, significance tests (p < 0.001) and large effect sizes (Cohen's d > 0.8) further supported that these improvements were meaningful for feature selection. Thus, this study provides a mathematical framework for applications requiring analytical validity that extends beyond NL2VIS to other machine learning contexts.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Vinh The Nguyen
Faculty of Information Technology, Thai Nguyen University of Information and Communication Technology
Thai Nguyen, Viet Nam
Email: vinhnt@ictu.edu.vn
1. INTRODUCTION

In the era of big data, consuming a large amount of information plays a crucial role in the decision-making process, and data visualization (VIS) is a viable solution [1]–[3]. Traditional VIS tools relied on rules, heuristics, and probability, creating a barrier for non-technical users [4]–[6]. Recently, natural language to data visualization (NL2VIS) has emerged as one of the most promising approaches, allowing users to generate visualizations (e.g., bar charts, line graphs, scatter plots, and heat maps) using only simple conversational utterances [7], [8]. For instance, instead of writing a computer language such as "SELECT region, SUM(revenue) FROM sales WHERE date >= '2023-01-01' GROUP BY region ORDER BY SUM(revenue) DESC", a user may use natural language: "show me total sales by region this year". The system then interprets the request and builds a corresponding visualization, as illustrated in Figure 1, which
Journal homepage: http://ijai.iaescore.com
presents an example of the NL2VIS problem. Thus, this idea fundamentally shortens the gap between domain experts and normal users in data analysis workflows [2], [5].
Figure 1. Example of the NL2VIS problem
In formal terms, the NL2VIS problem can be seen as a transformation ψ : (Q, D) → V, where a natural-language query Q and a dataset D are transformed into an appropriate visualization V* = arg max_V P(V | Q, D). In practice, this mapping is rarely straightforward. The core challenge lies in feature selection, which identifies the subset of dataset attributes F* ⊆ F that best expresses the user's analytical intent I(Q). Earlier approaches mostly treated this as a similarity-matching problem. However, such approaches often failed to capture the probability distribution P(F | Q), which describes how relevant each feature is to a given query. As an illustration, traditional dependency models relied on Pearson's coefficient to approximate statistical relationships, as indicated in (1).
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}    (1)
This measure only captures linear relationships and fails to detect complex, non-linear dependencies between the query intent I(Q) and the features F.
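To make this limitation concrete, the following sketch (with illustrative synthetic data, not data from our experiments) computes the coefficient in (1) for a linear and a purely quadratic dependence; Pearson's r is near 1 for the former but near 0 for the latter, even though y is fully determined by x in both cases.

```python
import math
import random

def pearson(xs, ys):
    # Sample Pearson correlation coefficient, as in equation (1)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
linear = [2.0 * x for x in xs]        # purely linear dependence
quadratic = [x * x for x in xs]       # strong but non-linear dependence

print(pearson(xs, linear))     # close to 1.0
print(pearson(xs, quadratic))  # close to 0.0 despite full dependence
```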
Recent efforts [9]–[11] utilized machine learning approaches that learn non-linear relationships through neural networks. This formulation is expressed in (2).
P(F^* \mid Q, D) = \mathrm{softmax}(f_\theta(e_Q, e_D))    (2)
where f_θ : ℝ^{d_Q + d_D} → ℝ^{|F|} is a neural network with parameters θ, e_Q ∈ ℝ^{d_Q} is the query embedding, and e_D ∈ ℝ^{d_D} is the dataset embedding.
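A minimal sketch of the scoring step in (2) follows, with hypothetical embedding sizes (d_Q = d_D = 4, |F| = 3) and an untrained random linear layer standing in for f_θ; a real system would learn θ from query-feature pairs.

```python
import math
import random

def softmax(zs):
    # Numerically stable softmax over a list of logits
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

random.seed(1)
d_in, n_features = 8, 3  # d_Q + d_D = 8, |F| = 3 (hypothetical sizes)
# f_theta sketched as one random linear layer; real weights would be learned.
W = [[random.gauss(0.0, 0.5) for _ in range(d_in)] for _ in range(n_features)]

e_Q = [0.2, -0.1, 0.4, 0.3]   # query embedding (illustrative values)
e_D = [0.1, 0.0, -0.2, 0.5]   # dataset embedding (illustrative values)
x = e_Q + e_D                  # concatenation, a vector in R^{d_Q + d_D}

logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
probs = softmax(logits)        # P(F* | Q, D): one probability per feature
print(probs)
```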
Due to the lack of training data, especially for understanding users' intentions, this approach has been advanced by modern tools such as LIDA [12] or Vizagent [7], which employ large language models (LLMs) like GPT-4 to automate the visualization generation task. The primary limitation of utilizing LLMs is the need for sophisticated prompt engineering and extensive token consumption, which consequently incurs substantial computational costs [13]. As such, it presents a significant barrier for researchers with constrained financial resources who must iteratively conduct experiments [14]. Furthermore, LLMs offer more knowledge (trained on a vast amount of internet data) than this problem requires, so the research question is: "can we tackle the same issue with an affordable approach?"
From the aforementioned pain points, there is a need for an alternative solution that could balance the learned capabilities of LLMs with computational efficiency and accessibility [15], [16]. The sparking idea is to leverage the state-of-the-art semantic understanding capabilities of pre-trained models while keeping applications lightweight and cost-effective. This thought motivates us to develop methods that can capture complex query-feature dependencies through principled mathematical frameworks, without the overhead associated with large-scale language model deployment.
Int J Artif Intell, Vol. 15, No. 1, February 2026: 129–139
Thus, the current study proposed a unique information-theoretic approach for feature selection, particularly in NL2VIS systems. The proposed framework provided mathematically grounded principles that move beyond simple existing similarity measures. Building on prior surveys [3], [15], [16], we position NL2VIS as presented in Table 1, which combines results reported in our experiments with qualitative properties from prior work.
Table 1. Comparative positioning of NL2VIS approaches (taxonomy)
Criterion | Rule-based | Similarity-based | Neural ranking | LLM-based | MI-based (Ours)
Principle | Heuristic | Similarity | Learned similarity | Generative reasoning | Information-theoretic
Typical methods | Grammars / Rules | Cosine; TF-IDF + Corr | Contrastive ranking | GPT-4 prompting; LIDA; VizAgent | KDE + MINE
Interpretability | High | Medium | Low | Low–medium | High
Compute cost | Low | Low | Medium | High (token-dependent) | Medium (+31.7% time)
Accuracy | N/A (task-specific) | Val 82.3–85.7; Leg 74.1–78.9; F1 0.62–0.69; ICR 0.72–0.76 | Val 87.8; Leg 84.1; F1 0.782; ICR 0.847 | Val 93.4; Leg 89.7; F1 0.758; ICR 0.834 | Val 88.9; Leg 85.2; F1 0.863; ICR 0.891
Notes | Transparent rules; brittle in open domains | Simple; struggles with non-linear intent | Learns non-linear patterns | Strong UX/aesthetics; higher cost | Principled, redundancy-aware selection
2. METHOD
2.1. Research design
To address the limitations identified in existing NL2VIS systems, we propose a unique application of mutual information (MI) theory to the NL2VIS domain. While MI is a well-established concept in information theory [17], [18], its systematic application in NL2VIS systems remains unexplored [15], [19]. For two discrete random variables X and Y, MI is expressed as (3).
I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}    (3)
where p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions. Alternatively, MI can also be expressed in terms of entropy as in (4).
I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)    (4)
where H(X) = -\sum_x p(x) \log p(x) is the Shannon entropy [20] of X, and the conditional entropy of X given Y is defined as (5).
H(X \mid Y) = -\sum_y p(y) \sum_x p(x \mid y) \log p(x \mid y)    (5)
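As a quick numerical check of (3)-(5), the following sketch computes MI for an illustrative joint distribution over two binary variables, both directly from the definition in (3) and via the entropy identity in (4); the two routes agree.

```python
import math

# Illustrative joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions p(x) and p(y).
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Equation (3): direct definition of mutual information.
mi = sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in joint.items())

# Equation (4): I(X;Y) = H(X) - H(X|Y), with H(X|Y) from equation (5).
hx = -sum(p * math.log(p) for p in px.values())
hxy = -sum(joint[(x, y)] * math.log(joint[(x, y)] / py[y]) for (x, y) in joint)

print(mi, hx - hxy)  # the two routes give the same value
```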
Our framework consists of four main components: query intent extraction, feature representation, MI computation, and optimization-based feature selection. First, we transformed natural language queries (so-called utterances) into higher-dimensional spaces using pretrained language models. Mathematically, given a query Q ∈ 𝒬, we extracted its semantic representation as v_Q ∈ ℝ^d, where d is the number of dimensions of the continuous embedding space. For feature representation, each feature f_i in the database is encoded as a multi-dimensional vector v_{f_i} that contains semantic, statistical, and structural information. The semantic component utilizes properties such as the feature name and metadata to create embeddings [21], [22], while the statistical component captures data distribution characteristics such as cardinality, skewness, and data type. The structural component encodes relational information, including primary/foreign key relationships and table hierarchies [6]. The ultimate purpose of the transformation is to let features interact with each other. Formally, we define (6).
v_{f_i} = [\, v_{\mathrm{sem}}(f_i);\; v_{\mathrm{stat}}(f_i);\; v_{\mathrm{struct}}(f_i) \,]    (6)

where [;] represents vector concatenation.
The core idea of our approach is to compute MI between continuous vector representations. Since MI is originally defined for discrete variables, we employ kernel density estimation
(KDE) to estimate probability densities for continuous vectors [23], [24]. For vectors v_Q and v_{f_i}, we estimate their joint density p̂(v_Q, v_{f_i}) and marginal densities p̂(v_Q) and p̂(v_{f_i}) using Gaussian kernels as (7).
\hat{p}(v) = \frac{1}{n} \sum_{j=1}^{n} K_h(v - v_j)    (7)
where K_h is a Gaussian kernel with bandwidth h, and n is the number of samples. The MI estimate becomes (8).
\hat{I}(v_q; v_{f_i}) = \int \hat{p}(v_q, v_{f_i}) \log \frac{\hat{p}(v_q, v_{f_i})}{\hat{p}(v_q)\, \hat{p}(v_{f_i})} \, dv_q \, dv_{f_i}    (8)
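A minimal sketch of the KDE route in (7)-(8) for one-dimensional variables follows, using an illustrative bandwidth and a resubstitution approximation of the integral in (8) (averaging the log density ratio at the observed samples instead of integrating). The estimate is clearly positive for strongly correlated data and near zero for independent data.

```python
import math
import random

def gauss_kde_1d(points, h):
    # Equation (7): p̂(v) = (1/n) Σ_j K_h(v - v_j) with a Gaussian kernel
    n, c = len(points), 1.0 / (h * math.sqrt(2.0 * math.pi))
    return lambda v: sum(c * math.exp(-((v - p) ** 2) / (2.0 * h * h)) for p in points) / n

def gauss_kde_2d(pairs, h):
    # Joint density estimate with an isotropic 2-D Gaussian kernel
    n, c = len(pairs), 1.0 / (2.0 * math.pi * h * h)
    return lambda u, v: sum(
        c * math.exp(-(((u - a) ** 2 + (v - b) ** 2) / (2.0 * h * h))) for a, b in pairs
    ) / n

def mi_kde(xs, ys, h):
    # Resubstitution form of equation (8): average the log density ratio
    # over the observed samples rather than integrating numerically.
    px, py = gauss_kde_1d(xs, h), gauss_kde_1d(ys, h)
    pxy = gauss_kde_2d(list(zip(xs, ys)), h)
    return sum(math.log(pxy(x, y) / (px(x) * py(y))) for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
n, h = 400, 0.3  # illustrative sample size and bandwidth
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
dep = [0.9 * x + math.sqrt(1.0 - 0.81) * random.gauss(0.0, 1.0) for x in xs]  # corr 0.9
ind = [random.gauss(0.0, 1.0) for _ in range(n)]                              # independent

mi_dep = mi_kde(xs, dep, h)  # clearly positive for correlated data
mi_ind = mi_kde(xs, ind, h)  # close to zero for independent data
print(mi_dep, mi_ind)
```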
To handle the computational complexity of high-dimensional MI estimation, we also investigate neural estimation approaches. We employ the mutual information neural estimation (MINE) framework [25], which uses neural networks to approximate the Kullback-Leibler divergence between the joint and product distributions. The MINE estimator is defined as (9).
\hat{I}_{\mathrm{MINE}}(X; Y) = \sup_{\theta} \; \mathbb{E}_{p(x,y)}[T_\theta(x, y)] - \log \mathbb{E}_{p(x)p(y)}[e^{T_\theta(x, y)}]    (9)
where T_θ is a neural network parameterized by θ, and the supremum is taken over all possible network parameters.
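The following sketch evaluates the Donsker-Varadhan bound inside (9) for a fixed, hand-picked critic on synthetic bivariate Gaussian data. MINE would instead train T_θ by gradient ascent; this only illustrates the bound itself, and the critic, correlation, and sample size are assumptions made for the example.

```python
import math
import random

def dv_bound(pairs, xs_shuffled, critic):
    # Donsker-Varadhan form inside equation (9) for a FIXED critic T:
    #   E_{p(x,y)}[T(x,y)] - log E_{p(x)p(y)}[exp(T(x,y))]
    joint_term = sum(critic(x, y) for x, y in pairs) / len(pairs)
    # Pairing shuffled x's with the original y's approximates draws from p(x)p(y).
    marg_mean = sum(math.exp(critic(x, y)) for x, (_, y) in zip(xs_shuffled, pairs)) / len(pairs)
    return joint_term - math.log(marg_mean)

random.seed(0)
n, rho = 5000, 0.8
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [rho * x + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0) for x in xs]
pairs = list(zip(xs, ys))
xs_shuffled = xs[:]
random.shuffle(xs_shuffled)

# A hand-picked bilinear critic T(x, y) = 0.3 * x * y (an assumption for
# illustration) already yields a useful lower bound on the true MI here.
bound = dv_bound(pairs, xs_shuffled, lambda x, y: 0.3 * x * y)
true_mi = -0.5 * math.log(1.0 - rho * rho)  # closed form for bivariate Gaussians
print(bound, true_mi)  # the bound is positive and below the true MI
```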
Our feature selection optimization objective aims to identify the subset of features F* that maximizes the total MI with the query intent while maintaining diversity among the selected features. We formulate this as (10).
F^* = \arg\max_{S \subseteq F,\, |S| \le k} \; \sum_{f_i \in S} I(v_q; v_{f_i}) - \lambda \sum_{f_i, f_j \in S,\, i \ne j} I(v_{f_i}; v_{f_j})    (10)
where k is the maximum number of features to select, and λ is a regularization parameter that penalizes redundancy between selected features. The first term encourages the selection of features highly relevant to the query, while the second term promotes diversity by penalizing features that are highly correlated with each other. Since this optimization problem is NP-hard, we employ a greedy approximation algorithm that iteratively selects features based on their marginal contribution to the objective function. At each step, we compute the incremental gain of adding each remaining feature and select the one that maximizes it, as in (11).
\Delta(f_i \mid S) = I(v_q; v_{f_i}) - \lambda \sum_{f_j \in S} I(v_{f_i}; v_{f_j})    (11)
where S is the set of currently selected features.
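The greedy procedure in (11) can be sketched as follows, with illustrative MI values standing in for estimates obtained from (8) or (9).

```python
def greedy_select(relevance, redundancy, k, lam):
    # Greedy approximation of equation (10): at each step pick the feature
    # with the largest marginal gain from equation (11).
    selected = []
    remaining = set(range(len(relevance)))
    while remaining and len(selected) < k:
        def gain(i):
            return relevance[i] - lam * sum(redundancy[i][j] for j in selected)
        best = max(remaining, key=gain)
        if gain(best) <= 0:  # stop when no remaining feature adds information
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Illustrative MI values for 4 features: I(v_q; v_fi) and pairwise I(v_fi; v_fj).
relevance = [0.9, 0.85, 0.4, 0.1]
redundancy = [
    [0.0, 0.8, 0.1, 0.0],  # features 0 and 1 are nearly duplicates
    [0.8, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(greedy_select(relevance, redundancy, k=2, lam=1.0))  # → [0, 2]
```

Here features 0 and 1 are both highly relevant but redundant with each other, so after picking feature 0 the diversity penalty steers the second pick to the less relevant but non-redundant feature 2.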
Once good candidate features were identified, we proceeded to the next stage of generating the visualization. First, appropriate chart types and encoding assignments were determined. We treated this task as a classification problem, where the input contains the selected features and the query intent representation. For the encoding assignment, we used a constraint satisfaction approach that ensures visual encoding principles are respected while maximizing the utilization of the information content provided by the selected features. The whole pipeline of our proposed approach is presented in Figure 2.
2.2. Evaluation
To assess the effectiveness of our proposed approach, we conducted comprehensive experiments with the VisEval benchmark dataset [26]. We also compared the current method against state-of-the-art NL2VIS systems. In the domain of visualization, there is a scarcity of datasets. Thus, Microsoft Research curated VisEval as a comprehensive benchmark for NL2VIS. In general, this dataset provided standardized items across diverse domains such as business intelligence, healthcare, social media, and financial analytics. Here, each domain covers challenges in feature complexity, semantic interpretation, and visualization requirements. The core value of this benchmark is that each item was curated and annotated by domain experts with ground-truth feature selections and optimal visualization specifications. Overall, it assesses three critical dimensions: validity - whether the generated code can run and render figures; legality - whether the rendered figure meets the query requirements; and readability - whether the visualization can convey information to users. The first two metrics were computed by the program, while the latter metric was conducted with 12 experts (rating the charts using
a 5-point Likert scale). This standardized and curated benchmark has been widely used recently to evaluate the performance of newly developed NL2VIS approaches.
In terms of performance, we compared our proposed approach with several baseline methods that have been reported in the niche field of NL2VIS so far. The first baseline used the cosine similarity method to estimate the direct correlation between query embeddings and feature embeddings, as implemented in systems like Data2Vis [10]. The second baseline utilized term frequency-inverse document frequency (TF-IDF) weighted keyword matching combined with statistical feature ranking; this method was based on Pearson correlation coefficients. The third baseline used an LLM (in this case, the latest GPT-4) with carefully designed prompts to perform feature selection; here, we reproduced the experiment of the existing LIDA framework [12] but with a more advanced LLM. The fourth baseline implemented a neural ranking model trained on query-feature pairs using contrastive learning. To the best of our knowledge, these baselines represent recent advances in feature selection for NL2VIS. As previously mentioned, our evaluation followed the VisEval framework in terms of validity, legality, and readability. In addition, we also employed accuracy and the F1-score to evaluate the structural accuracy of feature selection. Formally, the F1-score is expressed as (12).
\mathrm{F1} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}    (12)
However, this metric treats all features equally, no matter how much each individual feature contributes information to the query intent. To alleviate this issue, we introduced a new metric called the information coverage ratio (ICR). ICR quantifies the information-theoretic quality of feature selection decisions based on (13).
\mathrm{ICR} = \frac{\sum_{f_i \in F^*} I(v_q; v_{f_i})}{\sum_{f_j \in F_{gt}} I(v_q; v_{f_j})}    (13)
where F* represents the predicted feature subset and F_{gt} denotes the ground-truth feature subset. Conceptually, ICR quantifies how much of the total query-related information content in the optimal feature selection is preserved by the predicted selection. Compared to the F1-score, which only reflects binary correctness, the ICR score provides a continuous, information-weighted assessment that captures the degree of analytical completeness.
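A minimal sketch of (13) with illustrative per-feature MI scores (the feature names and values are hypothetical), showing how ICR discounts the omission of a weakly informative feature where F1 would not:

```python
def icr(predicted, ground_truth, mi_scores):
    # Equation (13): information retained by the predicted subset relative
    # to the ground-truth subset, using per-feature MI scores I(v_q; v_fi).
    return sum(mi_scores[f] for f in predicted) / sum(mi_scores[f] for f in ground_truth)

# Hypothetical MI scores per feature name.
mi_scores = {"region": 0.8, "revenue": 0.7, "date": 0.3, "store_id": 0.05}
ground_truth = ["region", "revenue", "date"]

# Missing the weakly informative 'date' only drops ICR to 1.5/1.8 ≈ 0.833,
# while set-based F1 against the same ground truth would drop to 0.8,
# treating 'date' as equally important as 'region'.
print(icr(["region", "revenue"], ground_truth, mi_scores))
```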
Figure 2. The pipeline of our proposed approach
3. RESULTS
Table 2 presents a snapshot of the performance comparison on the VisEval benchmark. Experimental results demonstrated the superior performance of our information-theoretic approach on the two key evaluation dimensions of F1-score and ICR.
Figure 3 depicts the approximately linear relationship between F1-score and ICR across all evaluated NL2VIS models. This near-linear trend conveys the insight that, while both metrics are aligned, ICR tends to yield higher values by weighting features according to their information contribution. This reinforces our assumption that the MI-based approach can capture not only structural correctness (as reflected by F1) but also the depth of analytical information preserved in the selected feature subsets.
Table 2. Performance comparison on the VisEval benchmark
Method | Validity (%) | Legality (%) | F1 Score | ICR
Cosine similarity | 82.3 | 74.1 | 0.623 | 0.721
TF-IDF + correlation | 85.7 | 78.9 | 0.691 | 0.759
GPT-4 prompting | 93.4 | 89.7 | 0.758 | 0.834
Neural ranking | 87.8 | 84.1 | 0.782 | 0.847
Our method | 88.9 | 85.2 | 0.863 | 0.891
Figure 3. Correlation between F1-score and ICR across NL2VIS models
Returning to Table 2, some interesting patterns emerged between our information-theoretic approach and the other methods. In terms of validity and legality, GPT-4 prompting achieved the highest performance, with 93.4% and 89.7%, respectively. This is not uncommon: this recent model was trained on a vast amount of data, including code, and thus, not surprisingly, demonstrates superior natural language understanding capabilities for interpreting user intent and producing code for generating visualizations. On the other hand, our method excelled in the specialized information-theoretic measures: an F1-score of 0.863 and an ICR of 0.891. This performance pattern revealed a fundamental distinction: GPT-4 demonstrated superior semantic comprehension and visualization generation, but our approach provided more mathematically principled and statistically sound feature selection that ensures analytical correctness and explainability.
Figure 4 compares the performance of the different methods with respect to readability and chart accuracy. In terms of chart accuracy (compared to ground truth), the results revealed that GPT-4 prompting took the lead with 91.2%, compared to 87.6% for our method. This indicated that, being pretrained on massive data, it can capture the relationship between user intent and commonly used charts accordingly. However, our approach also demonstrated competitive performance while offering advantages in analytical soundness and computational cost-effectiveness. Readability scores showed GPT-4 achieving 4.35/5.0 compared to our 4.18/5.0, which indicated that while advanced language models produce more visually intuitive visualizations, our approach maintains high user satisfaction levels while providing stronger guarantees of analytical correctness.
Figure 4. Comparison of chart accuracy and readability across NL2VIS methods
Table 3 reports the encoding appropriateness, processing time (sec), and time overhead of our proposed method against the best baseline, measured on our device. When the scores were normalized, the encoding appropriateness reached 0.824. This implied that MI demonstrated superior technical quality in systematic feature-to-encoding mappings that ensure statistical validity. In terms of computational efficiency, the results showed that our approach incurred a computational trade-off for improved feature selection quality. That is, the average processing time for feature selection ranged from 3.4 to 11.2 seconds per query, compared to 2.1 to 8.9 seconds for the best baseline methods, which resulted in a +31.7% time overhead. This computational cost reflects the mathematical complexity of MI estimation but delivers better encoding appropriateness (0.824 vs 0.789). The KDE-based MI estimation accounted for approximately 60% of the total computation time, while our neural MINE estimation approach reduced this overhead by 35% with minimal accuracy trade-offs (an average F1-score reduction of 0.023).
We also employed statistical significance tests to deepen our understanding of the metrics. Results from t-tests showed statistically significant differences (p < 0.001 for both the F1-score and ICR metrics). While traditional metrics showed mixed results, with GPT-4 prompting leading in user experience measures, our information-theoretic metrics demonstrated substantial gains. Effect size analysis using Cohen's d revealed large effect sizes (d > 0.8) for the F1-score and ICR comparisons. This indicated that the improvements in feature selection quality are practically meaningful for real-world NL2VIS applications.
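For reference, Cohen's d with a pooled standard deviation can be computed as below; the per-query scores here are hypothetical stand-ins, since only aggregate results are reported above.

```python
import math

def cohens_d(a, b):
    # Cohen's d with pooled standard deviation (sample variances, n - 1).
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Hypothetical per-query F1 scores for our method vs. the best baseline.
ours = [0.88, 0.85, 0.87, 0.84, 0.86, 0.89, 0.83, 0.87]
base = [0.79, 0.77, 0.80, 0.76, 0.78, 0.81, 0.75, 0.79]
print(cohens_d(ours, base))  # well above the 0.8 'large effect' threshold
```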
These consistent performance gains across the benchmark scenarios demonstrated the robustness of our MI approach for analytical feature selection.
In addition, we performed ablation studies that investigated the contribution of the different components. Removing the diversity regularization term (setting λ = 0 in equation (10)) resulted in an average F1-score decrease of 0.067, which implies the importance of promoting feature diversity. Replacing the multi-modal feature representation with purely semantic embeddings reduced performance by 0.089 F1-score, which suggests the value of incorporating statistical and structural information. Using only KDE-based MI estimation without the neural MINE alternative increased the computation time by 43% without improving accuracy, which supports our hybrid approach.
Error analysis revealed that the most challenging cases for our method involved queries with highly ambiguous intent or domains with unconventional feature naming conventions. For instance, a user utterance such as "show me interesting patterns in the data" lacks a specific analytical direction, making it difficult for any automated method to identify relevant features. Similarly, datasets with cryptic column names (e.g., "colA", "var123") that provide no semantic information pose challenges for the semantic component of our feature representation.
Table 3. Computational performance analysis
Metric | Our method | Best baseline
Encoding appropriateness | 0.824 | 0.789
Processing time (sec) | 3.4–11.2 | 2.1–8.9
Time overhead | +31.7% | -
4. DISCUSSION
The comprehensive evaluation on the VisEval benchmark demonstrated that our proposed solution is a reasonable way to address fundamental challenges in NL2VIS systems. Similar to prior research [3], [15], we found that analytical frameworks can outperform LLMs on domain-specific analytical correctness. The consistent performance improvements in the specialized information-theoretic measures (F1-score of 0.863 and ICR of 0.891) suggested that MI provided an explainable mathematical foundation for quantifying the relationship between user intent and data features. The ICR complements precision-recall metrics by directly measuring information content preservation in feature selection [25].
In our experiments, GPT-4 still achieved higher scores in many facets (93.4% validity and 89.7% legality compared to our method's 88.9% and 85.2%, respectively). This is explainable because it was trained on a massive amount of data. Previous GPT versions were mainly trained on text from the internet and covered only a small portion of code, so their performance fell short of expectations; even so, they still achieved higher scores than conventional approaches. Recently, GPT-4 was trained more heavily on code; thus, in a recent benchmark for text-to-visualization, GPT-4 achieved the highest pass rate [26].
For casual users without prompting techniques, GPT-4 acts like a black box because its results are inconsistent, meaning that the same query may give different charts. Therefore, this gap highlights a fundamental trade-off: while GPT-4 demonstrates good understanding of user intent and produces runnable code (4.35/5.0 readability and 91.2% chart-type accuracy) [19], our approach emphasizes explainable decisions via mathematical rigor and analytical correctness in feature selection. This distinction represents an insight for the field: the choice between conversational fluency and statistical soundness depends on the specific application requirements.
Our approach is similar in spirit to the Kolmogorov-Arnold networks idea [27], where computation is sacrificed for explainability. In a broader context, the current study contributes beyond the immediate application to NL2VIS systems. The framework addresses the fundamental challenge of computing MI between real-valued vectors in high-dimensional data [25] while maintaining the accuracy of MI estimation. In addition, the optimization function with a regularization parameter enables a principled selection of the feature set, ensuring both query relevance and the avoidance of information duplication between features. This intuition can be extended to many other feature selection problems [28].
Finally, we attempted to use as much information as possible from the given dataset to represent a feature (combining semantic, statistical, and structural information). This representation can also be useful for other domains that require a deeper understanding of the relationship between data structure and semantic meaning [29].
There are several limitations in the current work that should be acknowledged. First, we relied on only a single LLM (GPT-4) to perform the experiment, due to the computational cost incurred when using a proprietary API. This implies that interested researchers could reproduce the work with different models. Second, our method utilized small pre-trained models such as bidirectional encoder representations from transformers (BERT) or the robustly optimized BERT pretraining approach (RoBERTa), which sometimes do not capture domain-specific terms. Thus, unlike GPT-4, our approach was constrained by the semantic boundaries of the selected embedding models. Furthermore, while GPT-4 can handle ambiguous queries such as "show me something interesting" through the creative interpretation embedded in the model, our framework required a more specific analytical direction to perform effective feature selection.
Despite these limitations, we hope the MI framework remains useful beyond NL2VIS, particularly in explainable AI and biomedical analytics, where understanding feature relevance and redundancy is essential for transparent decision-making. In another facet, our method performed efficiently on datasets with small numbers of features. However, the computational cost of MI estimation would grow rapidly at larger scales. As shown in the results section, the runtime increased by roughly 31.7% compared with the baseline methods. Furthermore, the current framework is dedicated to structured data; thus, it is less flexible than GPT-4 when dealing with diverse data types or visualization settings. Even so, the MI-based selection mechanism would be promising beyond NL2VIS, as it can help identify key (continuous) sensor signals for monitoring or fault detection. In addition, its lightweight computations would fit well with embedded or edge-level dashboards.
5. CONCLUSION
In summary, the current study introduced an MI framework for NL2VIS systems. The proposed method relied on MI to select features and improved analytical accuracy. Specifically, the model achieved an
F1-score of 0.863 and an ICR of 0.891. It also maintained visualization quality with 87.6% chart-type accuracy. Moreover, the approach emphasized mathematical rigor instead of conversational fluency. We defined a new metric to measure information coverage and optimized for feature diversity. In addition, the current work can be extended to other machine-learning domains that require transparent feature selection. The computational cost remained a practical concern. Finally, future research plans to balance analytical precision with aesthetic quality through hybrid models that combine explainable AI and LLMs.
ACKNOWLEDGMENTS
This research was supported by the DH2025-TN07-05 project conducted at the Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Vietnam, with additional support from the AI&SE Lab.
FUNDING INFORMATION
Authors state no funding involved.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.

Name of Author: C M So Va Fo I R D O E Vi Su P Fu
Hue Luong-Thi-Minh: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Vinh-The Nguyen: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Van-Viet Nguyen: ✓ ✓ ✓ ✓ ✓ ✓
Kim-Son Nguyen: ✓ ✓ ✓ ✓ ✓ ✓ ✓
Huu-Khanh Nguyen: ✓ ✓ ✓ ✓ ✓ ✓ ✓

C: Conceptualization, M: Methodology, So: Software, Va: Validation, Fo: Formal Analysis, I: Investigation, R: Resources, D: Data Curation, O: Writing - Original Draft, E: Writing - Review & Editing, Vi: Visualization, Su: Supervision, P: Project Administration, Fu: Funding Acquisition
CONFLICT OF INTEREST STATEMENT
The authors have no financial, personal, or professional relationships that could inappropriately influence the research presented in this paper. The authors state no conflict of interest.
INFORMED CONSENT
We have obtained informed consent from all individuals included in this study.
ETHICAL APPROVAL
This research does not require ethical approval as it does not involve human participants, animal subjects, or sensitive data.
DATA AVAILABILITY
No new data were created or analyzed in this study. Results are based on the publicly available VisEval benchmark dataset. Implementation code is available upon reasonable request and will be released publicly 24 months after publication, subject to project policies.
REFERENCES
[1] V. T. Nguyen, K. Jung, and V. Gupta, "Examining data visualization pitfalls in scientific publications," Visual Computing for Industry, Biomedicine, and Art, vol. 4, no. 1, Dec. 2021, doi: 10.1186/s42492-021-00092-y.
[2] S. Park, B. Bekemeier, A. Flaxman, and M. Schultz, "Impact of data visualization on decision-making and its implications for public health practice: a systematic literature review," Informatics for Health and Social Care, vol. 47, no. 2, pp. 175–193, Apr. 2022, doi: 10.1080/17538157.2021.1982949.
[3] A. Wu et al., "AI4VIS: survey on artificial intelligence approaches for data visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 28, no. 12, pp. 5049–5070, Dec. 2022, doi: 10.1109/TVCG.2021.3099002.
[4] T.-V. Nguyen and T.-N. Phung, "Enhanced literature review visualization: a novel sorted stream graphs with integrated word elements," in Advances in Information and Communication Technology (ICTA 2024), Cham, Switzerland: Springer, 2024, pp. 159–168, doi: 10.1007/978-3-031-80943-9_17.
[5] E. Hoque and M. S. Islam, "Natural language generation for visualizations: state of the art, challenges and future directions," Computer Graphics Forum, vol. 44, no. 1, Feb. 2025, doi: 10.1111/cgf.15266.
[6] K. Zhou, Z. Liu, R. Chen, L. Li, S.-H. Choi, and X. Hu, "Table2Graph: transforming tabular data to unified weighted graph," in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022, pp. 2420–2426, doi: 10.24963/ijcai.2022/336.
[7] H. L. T. Minh, V. N. The, and T. Q. Xuan, "VizAgent: towards an intelligent and versatile data visualization framework powered by large language models," in Advances in Information and Communication Technology (ICTA 2024), Cham, Switzerland: Springer, 2024, pp. 89–97, doi: 10.1007/978-3-031-80943-9_10.
[8] N. V. Viet et al., "Revolutionizing education: an extensive analysis of large language models integration," International Research Journal of Science, Technology, Education, and Management, vol. 4, no. 4, pp. 10–21, 2024, doi: 10.5281/zenodo.00000000.
[9] Y. Luo, X. Qin, N. Tang, and G. Li, "DeepEye: towards automatic data visualization," in 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris: IEEE, Apr. 2018, pp. 101–112, doi: 10.1109/ICDE.2018.00019.
[10] V. Dibia and C. Demiralp, "Data2Vis: automatic generation of data visualizations using sequence-to-sequence recurrent neural networks," IEEE Computer Graphics and Applications, vol. 39, no. 5, pp. 33–46, Sep. 2019, doi: 10.1109/MCG.2019.2924636.
[11] R. Tabalba et al., "Articulate+: an always-listening natural language interface for creating data visualizations," in Proceedings of the 4th Conference on Conversational User Interfaces, 2022, pp. 1–6, doi: 10.1145/3543829.3544534.
[12] V. Dibia, "LIDA: a tool for automatic generation of grammar-agnostic visualizations and infographics using large language models," arXiv:2303.02927, 2023.
[13] G. Kusano, K. Akimoto, and K. Takeoka, "Revisiting prompt engineering: a comprehensive evaluation for LLM-based personalized recommendation," in Proceedings of the Nineteenth ACM Conference on Recommender Systems, Prague, Czech Republic: ACM, Sep. 2025, pp. 832–841, doi: 10.1145/3705328.3748159.
[14] B. Chen, Z. Zhang, N. Langrené, and S. Zhu, "Unleashing the potential of prompt engineering for large language models," Patterns, vol. 6, no. 6, Jun. 2025, doi: 10.1016/j.patter.2025.101260.
[15] L. Shen et al., "Towards natural language interfaces for data visualization: a survey," IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 6, pp. 3121–3144, Jun. 2023, doi: 10.1109/TVCG.2022.3148007.
[16] W. Yang, M. Liu, Z. Wang, and S. Liu, "Foundation models meet visualizations: challenges and opportunities," Computational Visual Media, vol. 10, no. 3, pp. 399–424, Jun. 2024, doi: 10.1007/s41095-023-0393-x.
[17] S. Liu and M. Motani, "Improving mutual information based feature selection by boosting unique relevance," Journal of Artificial Intelligence Research, vol. 82, pp. 1267–1292, Mar. 2025, doi: 10.1613/jair.1.17219.
[18] J. Tang, Y. Luo, M. Ouzzani, G. Li, and H. Chen, "Sevi: speech-to-visualization through neural machine translation," in Proceedings of the 2022 International Conference on Management of Data, 2022, pp. 2353–2356, doi: 10.1145/3514221.3520150.
[19] P. Maddigan and T. Susnjak, "Chat2VIS: generating data visualizations via natural language using ChatGPT, Codex and GPT-3 large language models," IEEE Access, vol. 11, pp. 45181–45193, 2023, doi: 10.1109/ACCESS.2023.3274199.
[20] P. Saraiva, "On Shannon entropy and its applications," Kuwait Journal of Science, vol. 50, no. 3, pp. 194–199, Jul. 2023, doi: 10.1016/j.kjs.2023.05.004.
[21] N. M. Gardazi, A. Daud, M. K. Malik, A. Bukhari, T. Alsahfi, and B. Alshemaimri, "BERT applications in natural language processing: a review," Artificial Intelligence Review, vol. 58, no. 6, Mar. 2025, doi: 10.1007/s10462-025-11162-5.
[22] H. Man, N. T. Ngo, V. D. Lai, R. A. Rossi, F. Dernoncourt, and T. H. Nguyen, "LUSIFER: language universal space integration for enhanced representation in multilingual text embedding models," in Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy: ACM, Jul. 2025, pp. 1360–1370, doi: 10.1145/3726302.3730029.
[23] Y. Ning et al., "A mutual information theory-based approach for assessing uncertainties in deterministic multi-category precipitation forecasts," Water Resources Research, vol. 58, no. 11, Nov. 2022, doi: 10.1029/2022WR032631.
[24] A. Moreo, P. González, and J. J. D. Coz, "Kernel density estimation for multiclass quantification," Machine Learning, vol. 114, no. 4, Apr. 2025, doi: 10.1007/s10994-024-06726-5.
[25] M. I. Belghazi et al., "Mutual information neural estimation," in Proceedings of the 35th International Conference on Machine Learning, PMLR, Jul. 2018, pp. 531–540.
[26] N. Chen, Y. Zhang, J. Xu, K. Ren, and Y. Yang, "VisEval: a benchmark for data visualization in the era of large language models," IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 1, pp. 1301–1311, Jan. 2025, doi: 10.1109/TVCG.2024.3456320.
[27] S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, "A survey on Kolmogorov-Arnold network," ACM Computing Surveys, vol. 58, no. 2, pp. 1–35, Jan. 2026, doi: 10.1145/3743128.
[28] H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, Aug. 2005, doi: 10.1109/TPAMI.2005.159.
[29] K. Hu, M. A. Bakker, S. Li, T. Kraska, and C. Hidalgo, "VizML: a machine learning approach to visualization recommendation," in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–12, doi: 10.1145/3290605.3300358.