International Journal of Electrical and Computer Engineering (IJECE)
Vol. 15, No. 5, October 2025, pp. 4542~4554
ISSN: 2088-8708, DOI: 10.11591/ijece.v15i5.pp4542-4554
Journal homepage: http://ijece.iaescore.com

Discount factor-based data-driven reinforcement learning cascade control structure for unmanned aerial vehicle systems

Ngo Trung Dang, Quynh Nga Duong
Faculty of Electrical Engineering, Thai Nguyen University of Technology, Thai Nguyen, Vietnam

Article Info
Article history: Received Oct 25, 2024; Revised Jun 18, 2025; Accepted Jul 12, 2025

ABSTRACT
This article investigates the discount factor-based data-driven reinforcement learning control (DDRLC) algorithm for completely uncertain unmanned aerial vehicle (UAV) quadrotors. The proposed cascade control structure of the UAV comprises two control loops, for the attitude and position sub-systems, each of which is built on the proposed discount factor-based DDRLC algorithm. Through the analysis of the Bellman function's time derivative from two perspectives, a revised Hamilton-Jacobi-Bellman (HJB) equation including a discount factor is developed.
Then, from an off-policy viewpoint, an equation is formulated to solve simultaneously for the approximate Bellman function and the approximate optimal control law in the proposed DDRLC algorithm with guaranteed convergence. Using a modified state variable vector, the discount factor-based DDRLC algorithm in each control loop is implemented indirectly by transforming the time-varying tracking error model into a time-invariant system. Finally, a simulation study of the proposed discount factor-based DDRLC algorithm is provided to validate its effectiveness. To validate the tracking performance of the quadrotor, four performance indices are considered, taking the values 3.0527, 0.1175, 1.8408, and 0.0144, where the subscripts of the indices distinguish the position tracking error from the attitude tracking error.

Keywords: Approximate/adaptive dynamic programming; Data reinforcement learning; Model-free based control; Quadrotor; Unmanned aerial vehicles

This is an open access article under the CC BY-SA license.
Corresponding Author:
Ngo Trung Dang
Faculty of Electrical Engineering, Thai Nguyen University of Technology
3-2 Street, Tich Luong Commune, Thai Nguyen City, Vietnam
Email: trungcsktd@tnut.edu.vn

1. INTRODUCTION
In recent decades, unmanned aerial vehicles (UAVs) have been increasingly used to perform various tasks, such as surveillance, military operations, air traffic control, and agricultural management [1] [3]. To perform a task effectively, it is often necessary to address both the trajectory tracking problem and optimal control performance. In practical applications, these two control requirements must be met in the presence of external disturbances and dynamic uncertainties. Due to the complexity of the UAV model, which has a large number of variables, a model separation approach with rotational and translational sub-systems is considered [1] [7]. In study [6], the control designs for the position sub-system and the attitude sub-system were implemented in the same way by the sliding mode control (SMC) technique, and a state observer and neural networks (NNs) were added to handle external disturbances and dynamic uncertainties.
Some extensions were developed for a multi-rotor UAV model with unknown bounded time-varying disturbances by means of an augmented disturbance observer (DO) based controller, which was implemented under the appointed-time prescribed performance (ATPP) technique [5]. In [2], an adaptive trajectory tracking control was proposed for UAV experimental systems after estimating the necessary variables from image and inertial measurements. Moreover, for the general robotics control design studied in [8], an output feedback law with a state observer was presented for surface vessels (SVs) according to an event-triggered rule. To further handle backlash-like hysteresis and external disturbances, an adaptive fuzzy dynamic memory-event-triggered mechanism was studied for a six-rotor UAV within a backstepping recursive framework with a first-order filtering technique [2]. However, to the best of our knowledge, optimal control of UAV systems has received little research attention.

Given the complexity of the UAV model and the diversity of practical tasks, it is difficult to achieve complex control objectives by relying on a single UAV agent. Hence, UAV research has put forward the concept of multi-agent systems (MASs), which involves two research hotspots, the consensus and formation control problems [1] [3] [7] [9] [12]. In [9], a consensus control law was developed for multiple UAV systems with time delay and a cascade model.
However, the Kronecker product and linear matrix inequalities (LMIs) could be applied in [9] only because the UAV model was simplified. The research conducted in [13] is concerned with a consensus controller containing the sign function. Hence, the stability analysis requires Filippov theory. Additionally, a bearing persistence of excitation (PE) based leader-follower formation control strategy was proposed for multiple double-integrators in three-dimensional (3D) space using the projection of a vector on the plane orthogonal to the 2-sphere [14]. When each agent was modeled as a more complicated Euler-Lagrange system, the state representation could be used to obtain an event-triggered consensus controller with the Kronecker product [15]. The fault-tolerant consensus control problem for nonstrict-feedback nonlinear MASs with intermittent actuator faults was investigated using a state observer and the backstepping technique [16]. Moreover, the formation control of multiple UAVs was also considered by model predictive control (MPC) with an affine tracking error model [17] [19]. Despite this, studies [17] [19] did not examine the stability properties of the closed-loop system when operating under the MPC framework.
For the formation tracking control problem, addressing the time-varying formation (TVF) is also crucial for meeting application requirements [1] [7] [10] [11]. Based on a linear UAV model, TVF tracking control was investigated using the Kronecker product and the LMI technique [7]. Although a cost function was mentioned in [7], the optimal control law was not studied in that work. On the other hand, an extended state observer (ESO) based backstepping controller was proposed for the second-order attitude sub-system [1]. Furthermore, the yaw angle of the virtual leader was estimated in connection with a time-varying communication topology, and distributed formation tracking control was addressed in the position sub-system [1]. Based on the linear model of fixed-wing UAVs, TVF tracking control was discussed by employing the solution of the Riccati equation [10]. Notably, [11] tackled TVF tracking control for multiple linear systems by extending the event-triggered mechanism.
Although there has been some research on distributed control schemes for MASs, especially consensus and formation systems, most recent references have focused on simple UAV models and have rarely considered the cascade UAV structure or an optimization-based control formulation. Implementing the optimal control law in real-world systems requires iterative algorithms to compute solutions of the Hamilton-Jacobi-Bellman (HJB) equations for nonlinear systems or the Riccati equations for linear systems, since analytical solutions are typically not feasible. To advance the implementation of optimal control in robotic systems, it is essential to incorporate reinforcement learning control (RLC) in conjunction with methods from approximate and adaptive dynamic programming (ADP), as highlighted in studies [12] [20] [27]. In [12] [20] [22], the actor/critic structure was realized via neural network (NN) approximation methods, with learning algorithms for weight adaptation proposed alongside optimization strategies, enabling the closed-loop system to satisfy both tracking performance and optimality requirements.
However, external disturbances and dynamic uncertainties in the practical model must still be eliminated, and in [12] [20] [22] they are handled by a traditional robust control design. A different approach, which handles the external disturbances and dynamic uncertainties directly in the optimal control law, is found in zero-sum and non-zero-sum game methods [28] [30]. On the other hand, in contrast to the simultaneous learning of the actor/critic framework in [12] [20] [22], the authors of [31] [32] developed a sequential-learning value iteration (VI) algorithm to obtain the Bellman function and the optimal control law. Some researchers have focused on using data-driven RL to obtain optimal control strategies for uncertain systems [6] [22] [28] [30] [33] [37]. Using data collected over a time interval, the approximate optimal function can be computed from the approximate optimal control input without knowledge of the model. However, to handle complete uncertainty in the inverse direction, an off-policy technique or Q-learning must additionally be considered [2] [36] [37].
A data-driven reinforcement learning control strategy was recently introduced for quadrotors, demonstrating the capability to achieve optimal control while ensuring trajectory tracking, which is closely related to the focus of this article [37]. However, the data-driven RL approach in [37] was applied solely to the attitude subsystem of the UAV, and the associated cost function did not incorporate a discount factor. Building on the above results, we further explore the cascade UAV control structure, which involves two data-driven RL algorithms with a discount factor-based performance index; this is a further focus of this study.
This study investigates a cascade control architecture for a fully uncertain quadrotor UAV by employing two data-driven RL algorithms based on a performance index with a discount factor. By constructing a data set tailored to this general class of affine continuous-time systems and integrating an RL strategy that uses an off-policy algorithm, a control framework is formulated for UAVs with unknown dynamics. The contributions of this study are summarized as follows:
a. Based on the optimal control scheme with a discount factor-based performance index, we introduce an RL algorithm for an affine continuous-time system that guarantees a finite value of the integral cost function over an infinite horizon.
b. We propose a novel data-driven RL based cascade control structure, applied to both sub-systems, for completely uncertain UAVs by an off-policy method. Compared with the existing result [37], which considers the RL algorithm only for the attitude sub-system and without a discount factor, a data-driven RL based cascade control structure with a discount factor-based performance index is proposed for completely uncertain UAVs for the first time.
Finally, simulation results are presented to validate the effectiveness of the proposed model-free, data-driven RL algorithm.

2. CONTROLLER METHODOLOGY FOR QUADROTOR
As shown in Figure 1, the Earth-fixed frame and the body-fixed frame are established to describe the dynamic model of the quadrotor. The movements of the quadrotor shown in Figure 1 are produced by changes in the four lift forces, which are generated by adjusting the angular velocities of the four rotors. Vertical movement is obtained by varying the sum of the four lift forces on the four rotors. The yaw movement results from the difference between the counter-torques produced by the rotor pair (Rotor 1 and Rotor 3) and the rotor pair (Rotor 2 and Rotor 4). Additionally, the pitch and roll movements are generated by changing the lift forces within each pair, which results in the longitudinal motion and the lateral motion, as shown in Figure 1. The position of the UAV quadrotor and the quadrotor attitude are given as $p = [p_x, p_y, p_z]^T \in \mathbb{R}^3$ and $\Theta = [\phi, \theta, \psi]^T \in \mathbb{R}^3$, respectively.
It is worth noting that the roll-pitch-yaw Euler angles satisfy the bound conditions $-\pi/2 < \phi < \pi/2$, $-\pi/2 < \theta < \pi/2$ and $-\pi < \psi < \pi$. Moreover, the UAV quadrotor parameters are listed in Table 1.

Figure 1. Quadrotor model in North-East-Down (NED) coordinate

Table 1. UAV parameters and variables
Symbol | Description
$m$ | Weight of the quadrotor
$g$ | Acceleration of gravity
$\omega_{1,2,3,4}$ | Angular velocity of each rotor
$l$ | Arm length
$J = \mathrm{diag}\{J_x, J_y, J_z\} \in \mathbb{R}^{3 \times 3}$ | Inertia matrix, symmetric and positive definite
Coefficients in (4), (5) | Positive parameters

The rotation matrix $R \in SO(3)$ relating the Earth-fixed frame and the body-fixed coordinate system is given as (1):
$$ R = \begin{bmatrix} c_\theta c_\psi & s_\phi s_\theta c_\psi - c_\phi s_\psi & c_\phi s_\theta c_\psi + s_\phi s_\psi \\ c_\theta s_\psi & s_\phi s_\theta s_\psi + c_\phi c_\psi & c_\phi s_\theta s_\psi - s_\phi c_\psi \\ -s_\theta & s_\phi c_\theta & c_\phi c_\theta \end{bmatrix} \tag{1} $$

where $c_{(\cdot)} = \cos(\cdot)$, $s_{(\cdot)} = \sin(\cdot)$. In view of [1], the complete quadrotor dynamic model can be represented as (2):

$$ m\ddot{p} = F, \qquad J\ddot{\Theta} = -C(\Theta, \dot{\Theta})\dot{\Theta} + \tau \tag{2} $$

where the parameters are given in Table 1 and the Coriolis matrix $C(\Theta, \dot{\Theta}) \in \mathbb{R}^{3 \times 3}$ is described in [2]. Additionally, the force $F \in \mathbb{R}^{3 \times 1}$, expressed relative to the body-fixed frame of the quadrotor, can be obtained as (3):

$$ F = R\,[0 \;\; 0 \;\; T]^T - [0 \;\; 0 \;\; mg]^T \tag{3} $$

where the lifting force $T$ and the torque $\tau = [\tau_\phi \;\; \tau_\theta \;\; \tau_\psi]^T \in \mathbb{R}^3$ are given as (4), (5), with positive coefficients $k_w$ and $k_t$:

$$ T = k_w(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) \tag{4} $$

$$ \tau_\phi = l k_w(\omega_2^2 - \omega_4^2), \quad \tau_\theta = l k_w(\omega_1^2 - \omega_3^2), \quad \tau_\psi = k_t(\omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2) \tag{5} $$

Here, the control signals of the quadrotor (2) are defined as (6):

$$ u_z = \omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2, \quad u_\phi = \omega_2^2 - \omega_4^2, \quad u_\theta = \omega_1^2 - \omega_3^2, \quad u_\psi = \omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2. \tag{6} $$

The control objective of this paper is to develop a data-driven RL algorithm based on the optimal control scheme to obtain an optimized tracking control law for a quadrotor, enabling the quadrotor to track the desired trajectory with high accuracy. The optimal control signal ensures trajectory tracking while simultaneously achieving approximate optimality by minimizing the objective function.
Additionally, the data-driven RL-based optimal control law is developed not only for the position sub-system but also for the attitude sub-system, without knowledge of the UAV model.

Remark 1. Unlike the conventional trajectory tracking control objective in UAV control systems [1] [3] [6] [7] [11], the control objective in this paper considers both the trajectory tracking performance and the optimal control problem. In addition, both subsystems shown in Figure 2 achieve a unified framework of optimal control and stability, which is typically difficult to attain due to the time-varying dynamics of the closed-loop systems.

Figure 2. The quadrotor control schematic
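To make the model relations (1) and (4)-(6) concrete, the following Python sketch evaluates a rotation matrix and the thrust and torques produced by a set of rotor speeds. It is a minimal illustration under stated assumptions, not the authors' implementation: the ZYX (yaw-pitch-roll) convention, the rotor numbering, and the coefficient names `k_w`, `k_t` and `arm` are hypothetical choices made here for illustration.

```python
import math


def rotation_matrix(phi, theta, psi):
    """ZYX Euler-angle rotation matrix (assumed convention), as nested lists."""
    c, s = math.cos, math.sin
    return [
        [c(theta) * c(psi),
         s(phi) * s(theta) * c(psi) - c(phi) * s(psi),
         c(phi) * s(theta) * c(psi) + s(phi) * s(psi)],
        [c(theta) * s(psi),
         s(phi) * s(theta) * s(psi) + c(phi) * c(psi),
         c(phi) * s(theta) * s(psi) - s(phi) * c(psi)],
        [-s(theta), s(phi) * c(theta), c(phi) * c(theta)],
    ]


def rotor_mixing(omega, k_w, k_t, arm):
    """Map four rotor speeds to (thrust, (tau_phi, tau_theta, tau_psi)).

    k_w, k_t and arm are hypothetical names for the thrust coefficient,
    the drag-torque coefficient and the arm length.
    """
    w1, w2, w3, w4 = (w * w for w in omega)  # squared rotor speeds
    thrust = k_w * (w1 + w2 + w3 + w4)
    tau_phi = arm * k_w * (w2 - w4)          # roll: rotor 2 against rotor 4
    tau_theta = arm * k_w * (w1 - w3)        # pitch: rotor 1 against rotor 3
    tau_psi = k_t * (w1 - w2 + w3 - w4)      # yaw: pair (1, 3) against pair (2, 4)
    return thrust, (tau_phi, tau_theta, tau_psi)
```

With equal rotor speeds, the three torques vanish and only the thrust remains, which matches the description of pure vertical motion in the text.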
In this section, a data-driven reinforcement learning approach is introduced to address the trade-off between tracking performance and optimality within the quadrotor control system. The control architecture illustrated in Figure 2 integrates both position and attitude control strategies under the application of a discount factor. These controllers are updated concurrently using the collected data to handle system uncertainties effectively.

2.1. Discount factor-based RL control design for augmented quadrotor system
First, we consider a nonlinear affine system as (7):

$$ \dot{x}(t) = f(x(t)) + g(x(t))u(t), \tag{7} $$

and the associated cost function is defined by (8):

$$ V(x(t), u(t)) = \int_t^{\infty} \left[ x^T(\tau) Q x(\tau) + u^T(\tau) R u(\tau) \right] d\tau, \tag{8} $$

where $Q \in \mathbb{R}^{n \times n} > 0$, $R \in \mathbb{R}^{m \times m} > 0$ are both symmetric positive definite matrices. The tracking error model of the nonlinear affine system (7) with the desired trajectory $x_d(t)$, which is generated by a command generator model $\dot{x}_d(t) = h_d(x_d(t))$, $h_d(0) = 0$, can be formulated as (9):

$$ \dot{e}(t) = f(x(t)) - h_d(x_d(t)) + g(x(t))u(t), \tag{9} $$

where $e(t) = x(t) - x_d(t)$ and $h_d(x_d(t))$ is an unknown function.
Hence, according to the tracking error model (9) and the command generator model $\dot{x}_d(t) = h_d(x_d(t))$, we obtain the following augmented system:

$$ \dot{X}(t) = F(X(t)) + G(X(t))u(t), \tag{10} $$

where

$$ X(t) = \begin{bmatrix} e(t) \\ x_d(t) \end{bmatrix}, \quad F(X(t)) = \begin{bmatrix} f(e(t) + x_d(t)) - h_d(x_d(t)) \\ h_d(x_d(t)) \end{bmatrix}, \quad G(X(t)) = \begin{bmatrix} g(e(t) + x_d(t)) \\ 0 \end{bmatrix}. \tag{11} $$

The optimal control law $u^*(t)$ is designed to minimize the discounted cost function associated with the augmented system (10):

$$ V(X(t), u(t)) = \int_t^{\infty} e^{-\lambda(\tau - t)} r(X(\tau), u(\tau)) \, d\tau, \tag{12} $$

where $\lambda > 0$ is the discount factor, $r(X(\tau), u(\tau)) \triangleq X^T(\tau) Q_T X(\tau) + u^T(\tau) R u(\tau)$, $Q_T = \begin{bmatrix} Q & 0 \\ 0 & 0 \end{bmatrix}$ and $R = R^T$. Adding the discount factor $\lambda$ to the cost function (12) guarantees that it takes a finite value even though the upper limit of the integral is infinite. Therefore, it is unnecessary to explicitly define the admissible control set, as discussed in [2]. The set $\Upsilon(\Omega)$ is defined as the constraint set of control inputs $u(t)$ such that the cost function (12) is finite. Based on the dynamic programming principle, the tracking Bellman function for the augmented system (10) can be expressed as the following static function:

$$ V^*(X(t)) = \min_{u \in \Upsilon(\Omega)} \int_t^{\infty} e^{-\lambda(\tau - t)} r(X(\tau), u(X(\tau))) \, d\tau. \tag{13} $$

Based on two approaches for computing the time derivative of the Bellman function $V^*(X(t))$ in (13), the associated Hamiltonian function under the discount factor $\lambda > 0$ is formulated. The first approach is direct computation:

$$ \frac{dV^*(X(t))}{dt} = \left( \frac{\partial V^*}{\partial X} \right)^T \dot{X}(t) = \left( \frac{\partial V^*}{\partial X} \right)^T \left( F(X(t)) + G(X(t)) u^*(t) \right), \tag{14} $$

where $u^*(t)$ denotes the optimal control input. According to the Bellman principle, a second approach for computing the time derivative of the Bellman function $V^*(X(t))$ is formulated by utilizing the static Bellman function in (13):
$$ \begin{aligned} V^*(X(t)) &= \int_t^{t+\Delta t} e^{-\lambda(\tau - t)} r(X(\tau), u^*(\tau)) \, d\tau + e^{-\lambda \Delta t} \int_{t+\Delta t}^{\infty} e^{-\lambda(\tau - (t + \Delta t))} r(X(\tau), u^*(\tau)) \, d\tau \\ &= \int_t^{t+\Delta t} e^{-\lambda(\tau - t)} r(X(\tau), u^*(\tau)) \, d\tau + e^{-\lambda \Delta t} V^*(X(t + \Delta t)) \end{aligned} \tag{15} $$

Rearranging (15) and dividing by $\Delta t$ gives:

$$ \frac{V^*(X(t)) - V^*(X(t + \Delta t))}{\Delta t} = \frac{1}{\Delta t} \int_t^{t+\Delta t} e^{-\lambda(\tau - t)} r(X(\tau), u^*(\tau)) \, d\tau + \frac{e^{-\lambda \Delta t} - 1}{\Delta t} V^*(X(t + \Delta t)). \tag{16} $$

As $\Delta t \to 0$, the left-hand side of (16) tends to $-dV^*(X(t))/dt$, the first term on the right tends to $r(X(t), u^*(t))$, and the second tends to $-\lambda V^*(X(t))$. In view of (16) and (14), the static Bellman function $V^*(X(t))$ can therefore be solved from the optimal control signal $u^*(t)$ using the following partial differential equation (17):

$$ r(X(t), u^*(t)) - \lambda V^*(X(t)) + \left( \frac{\partial V^*}{\partial X} \right)^T \left( F(X(t)) + G(X(t)) u^*(t) \right) = 0. \tag{17} $$

Conversely, to determine the optimal control input $u^*(t)$ using the static Bellman function $V^*(X(t))$ and based on the Bellman principle, the corresponding optimization problem can be formulated as (18):

$$ V^*(X(t)) = \min_{u \in \Upsilon(\Omega)} \left( \int_t^{t+\Delta t} e^{-\lambda(\tau - t)} r(X(\tau), u(\tau)) \, d\tau + e^{-\lambda \Delta t} V^*(X(t + \Delta t)) \right). \tag{18} $$

As $\Delta t \to 0^+$, (18) leads to the corresponding optimization problem (19):

$$ \min_{u(t) \in \Upsilon(\Omega)} \left[ r(X(t), u(t)) - \lambda V^*(X(t)) + \left( \frac{\partial V^*}{\partial X} \right)^T \left( F(X(t)) + G(X(t)) u(t) \right) \right] = 0. \tag{19} $$

The modified Hamiltonian function in the presence of the discount factor $\lambda > 0$ is defined as (20):

$$ H(X, u(t), \nabla V^*, V^*) = X^T(t) Q_T X(t) + u^T(t) R u(t) - \lambda V^*(X(t)) + \nabla V^{*T}(X(t)) \left( F(X(t)) + G(X(t)) u(t) \right), \tag{20} $$

where $\nabla V^*(X) \triangleq \partial V^*(X) / \partial X$. It follows from (19) that the optimal control solution is obtained as (21):

$$ u^*(X(t)) = \arg\min_{u(t) \in \Upsilon(\Omega)} \left[ H(X, u(t), \nabla V^*, V^*) \right] = -\frac{1}{2} R^{-1} G^T(X(t)) \nabla V^*(X(t)). \tag{21} $$

Additionally, substituting the optimal control law $u^*(X(t))$ in (21) into (19) yields the partial differential equation (PDE) (22):

$$ H(X(t), u^*(t), \nabla V^*, V^*) = X^T(t) Q_T X(t) - \frac{1}{4} \nabla V^{*T}(X(t)) G(X(t)) R^{-1} G^T(X(t)) \nabla V^*(X(t)) - \lambda V^*(X(t)) + \nabla V^{*T}(X(t)) F(X(t)) = 0. \tag{22} $$

Remark 2. Including a positive discount factor $\lambda > 0$ ensures that the cost function in (12) remains finite, even when the state variable $X(t)$ does not converge to zero as $t \to \infty$. This leads to the appearance of the term $-\lambda V^*(X(t))$ in (19), which requires adjustments within the discount factor-based RL control framework described in sections 2.2 and 2.3.

2.2. Data-driven proportional-integral position controller
In this section, a cascade control framework for a quadrotor UAV, as shown in Figure 2, is formulated following the model separation in (2), where each subsystem applies a discount factor-based optimal control approach.
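As a side illustration of the discounted HJB equation (22) and the control law (21) from section 2.1, the Python sketch below treats a scalar linear special case in which the equation can be solved in closed form. This is a hypothetical example, not part of the proposed algorithm: the system $\dot{X} = aX + bu$, the running cost $qX^2 + ru^2$, the quadratic candidate $V^*(X) = pX^2$, and all names (`a`, `b`, `q`, `r`, `lam`) are assumptions introduced here. Under them, (22) reduces to $q - (b^2/r)p^2 - \lambda p + 2ap = 0$ and (21) to $u^* = -(bp/r)X$.

```python
import math


def discounted_scalar_hjb(a, b, q, r, lam):
    """Positive root p of q - (b^2/r) p^2 - lam*p + 2*a*p = 0 and the gain b*p/r."""
    k = b * b / r
    s = 2.0 * a - lam
    p = (s + math.sqrt(s * s + 4.0 * k * q)) / (2.0 * k)
    gain = b * p / r  # optimal feedback is u = -gain * X
    return p, gain


def hjb_residual(a, b, q, r, lam, p, x):
    """Left-hand side of the discounted HJB equation at state x for V(X) = p X^2."""
    grad = 2.0 * p * x                   # dV/dX
    u = -0.5 * (1.0 / r) * b * grad      # scalar form of the control law (21)
    return q * x * x + r * u * u - lam * p * x * x + grad * (a * x + b * u)


# Hypothetical numbers: an unstable open-loop system with a mild discount.
p, gain = discounted_scalar_hjb(a=1.0, b=1.0, q=1.0, r=1.0, lam=0.5)
residual = hjb_residual(1.0, 1.0, 1.0, 1.0, 0.5, p, x=2.0)
closed_loop_rate = 1.0 - 1.0 * gain      # a - b * gain
```

In this special case the closed-loop rate always satisfies $a - b^2 p / r < \lambda/2$, which is exactly the condition for the discounted integral of the running cost to converge; it does not by itself guarantee a negative closed-loop rate, so a large discount factor can trade stability margin for a finite cost.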
However, due to the inherent uncertainties and nonlinearities present in (22), obtaining a direct analytical solution is infeasible. As a result, a data-driven RL algorithm is employed to estimate the static Bellman function $V^*(\cdot)$ corresponding to the optimal control policy $u^*(\cdot)$ for each subsystem. The dynamic model of the position subsystem (2) can be modified as (23):

$$\ddot{p} = \frac{1}{m}\, T_p R \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T - g \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T = \frac{1}{m}\, u_p \tag{23}$$
where $u_p = T_p R \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T - m g \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$. To develop the control design of the position subsystem (23), the tracking error model must be brought into the form of the time-invariant model shown in (7). Therefore, the state vector $x_p = (p_x, \dot{p}_x, p_y, \dot{p}_y, p_z, \dot{p}_z)^T \in \mathbb{R}^6$ is used to reduce the order of (23). Hence, the model (23) can be transformed into the first-order system (24):

$$\dot{x}_p = A_p x_p + B_p u_p \tag{24}$$

where

$$A_p = \mathrm{diag}(a_p, a_p, a_p) \in \mathbb{R}^{6\times 6}, \quad a_p = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B_p = \frac{1}{m}\begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}^T$$

Moreover, because the desired trajectory $p_d(t) = [p_{xd}(t), p_{yd}(t), p_{zd}(t)]^T \in \mathbb{R}^3$ is time-varying, transforming the tracking error model of the position subsystem (24) into the time-invariant model (7) requires the following assumptions:

Assumption 1. The desired trajectory $p_d(t) = [p_{xd}(t), p_{yd}(t), p_{zd}(t)]^T \in \mathbb{R}^3$ is bounded and its time derivative $\dot{p}_d(t)$ is a Lipschitz function.

Assumption 2. The reference vector $x_d = [p_{xd}, \dot{p}_{xd}, p_{yd}, \dot{p}_{yd}, p_{zd}, \dot{p}_{zd}]^T \in \mathbb{R}^6$ can be completely expressed as (25),

$$\dot{x}_d(t) = A_d\, x_d(t) \tag{25}$$

Therefore, in view of (24) and (25), the time-invariant model (7) is obtained as:

$$\dot{X}_p = \begin{bmatrix} \dot{e}_p \\ \dot{x}_d \end{bmatrix} = \begin{bmatrix} A_p & A_p - A_d \\ 0_{6,6} & A_d \end{bmatrix} X_p + \begin{bmatrix} B_p \\ 0_{6,3} \end{bmatrix} u_p, \quad \text{where } e_p = x_p - x_d, \; X_p = \begin{bmatrix} e_p^T & x_d^T \end{bmatrix}^T \tag{26}$$

The tracking cost function is modified as (27):

$$V_p(X_p(t)) = \int_t^{\infty} e^{-\lambda(\tau-t)} \times \big[ X_p^T(\tau)\, \bar{Q}_p\, X_p(\tau) + u_p^T(\tau)\, R_p\, u_p(\tau) \big]\, d\tau \tag{27}$$

where $\bar{Q}_p = \begin{bmatrix} Q_p & 0_{6,6} \\ 0_{6,6} & 0_{6,6} \end{bmatrix}$ and $Q_p \in \mathbb{R}^{6\times 6}$, $R_p \in \mathbb{R}^{3\times 3}$ are symmetric positive definite matrices. Note that the term $e^{-\lambda(\tau-t)}$ is added to (27) to keep the cost function finite even though $X_p = [e_p^T, x_d^T]^T$ does not converge to zero as time approaches infinity. According to (17)-(21) and the off-policy technique [3], the data-driven algorithm is proposed to develop the position controller as follows:

Algorithm 1. Data-driven algorithm for position controller
1. Initialization: Employ the stabilizing policy $u_p^0(X_p)$ and the additional noise $e_p(t)$ to satisfy the PE condition. Collect the input-output data of the quadrotor system and set the threshold $\varepsilon_p$.
2. Policy evaluation: Based on the control input $u_p(t) = \hat{u}_p^i(X_p) + e_p$ and the control policy $\hat{u}_p^i(X_p)$, solve (28) to find $V_p^{i+1}(X_p)$ and $\hat{u}_p^{i+1}(X_p)$ simultaneously:

$$e^{-\lambda \Delta t}\, V_p^{i+1}(X_p(t+\Delta t)) - V_p^{i+1}(X_p(t)) = -\int_t^{t+\Delta t} e^{-\lambda(\tau-t)} \Big( X_p^T \bar{Q}_p X_p + (\hat{u}_p^i)^T R_p\, \hat{u}_p^i + 2\,(\hat{u}_p^{i+1})^T R_p\, e_p \Big)\, d\tau; \quad u_p(t) = \hat{u}_p^i(X_p) + e_p(t) \tag{28}$$
3. Policy improvement: Update the control policy $u_p^i(X_p) \leftarrow \hat{u}_p^{i+1}(X_p)$, set $i \leftarrow i+1$, and go to step 2 until $\|V_p^{i+1} - V_p^i\| < \varepsilon_p$.

In Algorithm 1, the solution of the Bellman equation (28) is obtained from the collected data through the following modification:

$$V_p^{i+1}(X_p(t+\Delta t)) - V_p^{i+1}(X_p(t)) = -\int_t^{t+\Delta t} \Big( X_p^T \bar{Q}_p X_p + \big(u_p^i(X_p)\big)^T R_p\, u_p^i(X_p) \Big)\, d\tau + \lambda \int_t^{t+\Delta t} V_p^{i+1}(X_p)\, d\tau + 2\int_t^{t+\Delta t} \big(u_p^{i+1}(X_p)\big)^T R_p \big( -u_p + u_p^i(X_p) \big)\, d\tau \tag{29}$$

After obtaining the position control signal $u_p$ in the quadrotor control structure shown in Figure 2, we proceed to compute the reference of the attitude control scheme $[\phi_d, \theta_d, \psi_d]^T$ as follows. According to $u_p = T_p R \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T - m g \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$, it follows that:

$$u_p + m g \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = T_p \begin{bmatrix} \cos(\psi_d)\sin(\theta_d)\cos(\phi_d) + \sin(\psi_d)\sin(\phi_d) \\ \sin(\psi_d)\sin(\theta_d)\cos(\phi_d) - \cos(\psi_d)\sin(\phi_d) \\ \cos(\theta_d)\cos(\phi_d) \end{bmatrix} \tag{30}$$

By setting the yaw angle reference $\psi_d(t)$ to a constant, as is done for synchronization in practical applications, and writing $u_p = [u_{px}, u_{py}, u_{pz}]^T$, the desired $T_p$, $\phi_d$, $\theta_d$ follow from (30) as (31):

$$T_p = \frac{u_{pz} + m g}{\cos(\phi_d)\cos(\theta_d)} = \big\| u_p + m g\,[0, 0, 1]^T \big\|, \quad \phi_d = \arcsin\!\left( \frac{u_{px}\sin(\psi_d) - u_{py}\cos(\psi_d)}{T_p} \right), \quad \theta_d = \arctan\!\left( \frac{u_{px}\cos(\psi_d) + u_{py}\sin(\psi_d)}{u_{pz} + m g} \right). \tag{31}$$

2.3.  Data-driven RL based attitude controller
In this part, a data-driven RL-based attitude control law is designed in the same way as above to obtain the input signals $\tau$ that achieve optimal tracking of the desired trajectory (31).
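The reference extraction in (30) and (31) can be checked numerically with a round trip: build $u_p$ from known angles and thrust, then recover them. The following is a minimal sketch under stated assumptions: the thrust direction is taken as the third column of a yaw-pitch-roll rotation matrix, the thrust is computed as a vector norm, and `thrust_direction` and `attitude_reference` are hypothetical helper names that do not come from the paper.

```python
import math


def thrust_direction(phi, theta, psi):
    """Unit thrust direction, assumed to be the third column of a
    yaw-pitch-roll rotation matrix (roll phi, pitch theta, yaw psi)."""
    c, s = math.cos, math.sin
    return (c(psi) * s(theta) * c(phi) + s(psi) * s(phi),
            s(psi) * s(theta) * c(phi) - c(psi) * s(phi),
            c(theta) * c(phi))


def attitude_reference(u_p, psi_d, m, g):
    """Recover (thrust, phi_d, theta_d) from the position control u_p.

    Valid while u_p[2] + m*g > 0 and |phi_d| < pi/2, i.e. the vehicle is
    not asked to thrust downward (an assumption of this sketch).
    """
    ux, uy, uz = u_p
    az = uz + m * g                                  # vertical component of T_p * direction
    thrust = math.sqrt(ux * ux + uy * uy + az * az)  # norm of u_p + m g e3
    phi_d = math.asin((ux * math.sin(psi_d) - uy * math.cos(psi_d)) / thrust)
    theta_d = math.atan2(ux * math.cos(psi_d) + uy * math.sin(psi_d), az)
    return thrust, phi_d, theta_d


# Round trip with hypothetical values (m and g as in the simulation section).
m, g = 2.0, 9.8
d = thrust_direction(0.1, -0.2, 0.3)
u_p = (25.0 * d[0], 25.0 * d[1], 25.0 * d[2] - m * g)
thrust, phi_d, theta_d = attitude_reference(u_p, 0.3, m, g)
```

Using `atan2` for the pitch reference avoids a division when the vertical term is small, although the sketch is only meaningful while that term stays positive.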
The attitude dynamic model (2) can be rewritten as (32):

$$\ddot{\Theta} = J^{-1}\tau - J^{-1} C(\Theta, \dot{\Theta})\, \dot{\Theta} \tag{32}$$

By considering the attitude state vector $x_\Theta = [\phi, \dot{\phi}, \theta, \dot{\theta}, \psi, \dot{\psi}]^T$ and referring to the attitude control structure illustrated in Figure 2, the design approach mirrors the position control strategy described in subsection 2.2. Based on (32), the augmented attitude dynamics can be reformulated as (33):

$$\dot{X}_\Theta = \begin{bmatrix} \dot{e}_\Theta \\ \dot{x}_{\Theta d} \end{bmatrix} = \begin{bmatrix} A_\Theta & A_\Theta - A_{\Theta d} \\ 0_{6,6} & A_{\Theta d} \end{bmatrix} X_\Theta + \begin{bmatrix} B_\Theta \\ 0_{6,3} \end{bmatrix} \tau \tag{33}$$

Accordingly, the attitude control strategy is summarized in Algorithm 2:

Algorithm 2. Data-driven RL based attitude control scheme
1. Initialization: Employ the stabilizing policy $\tau^0(X_\Theta)$ and the additional noise $e_\Theta(t)$ to satisfy the PE condition. Collect the input-output data of the quadrotor system.
2. Policy evaluation: Based on the control signal $\tau(t) = \hat{\tau}^i(X_\Theta) + e_\Theta$ and the control policy $\tau^i(X_\Theta)$, solve (34) to find $V_\Theta^{i+1}(X_\Theta)$ and $\tau^{i+1}(X_\Theta)$ simultaneously:

$$V_\Theta^{i+1}(X_\Theta(t+\Delta t)) - V_\Theta^{i+1}(X_\Theta(t)) = -\int_t^{t+\Delta t} \Big( X_\Theta^T \bar{Q}_\Theta X_\Theta + \big(\tau^i(X_\Theta)\big)^T R_\Theta\, \tau^i(X_\Theta) \Big)\, d\tau' + \lambda \int_t^{t+\Delta t} V_\Theta^{i+1}(X_\Theta)\, d\tau' + 2\int_t^{t+\Delta t} \big(\tau^{i+1}(X_\Theta)\big)^T R_\Theta \big( -\tau + \tau^i(X_\Theta) \big)\, d\tau' \tag{34}$$

3. Policy improvement: Update the control policy $\tau^i(X_\Theta) \leftarrow \tau^{i+1}(X_\Theta)$, set $i \leftarrow i+1$, and go to step 2 until $\|V_\Theta^{i+1} - V_\Theta^i\| < \varepsilon_\Theta$.
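The policy evaluation and policy improvement steps above can be illustrated on a scalar analogue. The sketch below is not the quadrotor implementation: it assumes a scalar plant $\dot{x} = a x + b u$, a quadratic Bellman function $V^{i+1}(x) = p\,x^2$ and a linear policy $u^{i} = -k_i x$ in place of the critic and actor networks, a fixed three-sinusoid probing noise, and an RK4 simulation standing in for measured data. For this case the discount-weighted relation reads $p\,[e^{-\lambda\Delta t}x^2(t+\Delta t) - x^2(t)] = -(q + r k_i^2)\,I_{xx} + 2 r k_{i+1}(I_{xu} + k_i I_{xx})$, where $I_{xx}$ and $I_{xu}$ are discount-weighted integrals of $x^2$ and $xu$ over each window. All function names and numbers are hypothetical.

```python
import math


def collect_data(a, b, k0, lam, delta=0.1, windows=40, steps=100, x0=1.0):
    """Simulate x' = a x + b u under u = -k0 x + e(t) and return, per window,
    (D, Ixx, Ixu): D = exp(-lam*delta) x(t+delta)^2 - x(t)^2, plus the
    discount-weighted integrals of x^2 and x*u over that window."""
    def noise(t):  # fixed-frequency probing noise (assumption of this sketch)
        return 0.5 * (math.sin(t) + math.sin(3.7 * t) + math.sin(7.1 * t))

    def deriv(t, s, t0):
        x = s[0]
        u = -k0 * x + noise(t)
        w = math.exp(-lam * (t - t0))
        return (a * x + b * u, w * x * x, w * x * u)

    h = delta / steps
    data, x, t = [], x0, 0.0
    for _ in range(windows):
        t0, s = t, (x, 0.0, 0.0)
        for _ in range(steps):  # classical RK4 on the augmented state
            k1 = deriv(t, s, t0)
            k2 = deriv(t + h / 2, tuple(si + h / 2 * ki for si, ki in zip(s, k1)), t0)
            k3 = deriv(t + h / 2, tuple(si + h / 2 * ki for si, ki in zip(s, k2)), t0)
            k4 = deriv(t + h, tuple(si + h * ki for si, ki in zip(s, k3)), t0)
            s = tuple(si + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                      for si, d1, d2, d3, d4 in zip(s, k1, k2, k3, k4))
            t += h
        data.append((math.exp(-lam * delta) * s[0] ** 2 - x ** 2, s[1], s[2]))
        x = s[0]
    return data


def off_policy_iteration(data, k0, q, r, tol=1e-8, max_iter=50):
    """Reuse one data set for every iteration; the plant parameters a and b
    are never used here. Returns (p, k) with V = p x^2 and u = -k x."""
    k, p_prev, p = k0, None, 0.0
    for _ in range(max_iter):
        # Least squares over windows: [D, -2r(Ixu + k Ixx)] [p, k_next]^T = -(q + r k^2) Ixx
        s11 = s12 = s22 = t1 = t2 = 0.0
        for d, ixx, ixu in data:
            c1, c2 = d, -2.0 * r * (ixu + k * ixx)
            y = -(q + r * k * k) * ixx
            s11 += c1 * c1
            s12 += c1 * c2
            s22 += c2 * c2
            t1 += c1 * y
            t2 += c2 * y
        det = s11 * s22 - s12 * s12
        p = (s22 * t1 - s12 * t2) / det        # policy evaluation
        k_next = (s11 * t2 - s12 * t1) / det   # policy improvement
        if p_prev is not None and abs(p - p_prev) < tol:
            return p, k_next
        p_prev, k = p, k_next
    return p, k


data = collect_data(a=0.5, b=1.0, k0=2.0, lam=0.01)
p, k = off_policy_iteration(data, k0=2.0, q=1.0, r=1.0)
```

Because the behavior input appears only inside the stored integrals, the same data set serves every iteration, which is the sense in which the scheme is off-policy.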
Remark 3. Two data-driven RL algorithms incorporating the discount factor are proposed for the quadrotor, addressing both the attitude and position subsystems. This work extends the study in [37], which focused solely on RL control for the attitude subsystem without considering the discount factor.

3.  SIMULATION RESULTS
In this section, a quadrotor example is used to illustrate the proposed data-driven RL algorithm with the following parameters:

$$m = 2.0\ (\mathrm{kg}), \quad k_w = 1\ (\mathrm{N s^2}), \quad k_t = 1\ (\mathrm{N s^2}), \quad g = 9.8\ (\mathrm{m/s^2}), \quad l_\tau = 0.2\ (\mathrm{m}), \quad J = 10^{-3}\, \mathrm{diag}(5.1, 5.1, 5.2)\ (\mathrm{kg \cdot m^2})$$

The desired trajectory of the position controller is chosen as $p_d(t) = [0.5, 0.5, 1.5 + t]^T$, so that (25) is guaranteed with the matrix

$$A_d = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

Moreover, the cost function utilizes the weight matrices $Q_p = 100 I_6$, $R_p = I_3$, $Q_\Theta = 100 I_6$, $R_\Theta = I_3$, with $\varepsilon_p = 0.01$, $\varepsilon_\Theta = 0.01$, and a discount factor of $\lambda = 0.01$. During the initial data collection phase [17], two proportional-derivative (PD) controllers are applied to the position and attitude loops to gather data for the learning process.
To ensure the persistence of excitation (PE) conditions required for the proposed algorithms, noise signals defined as $e_p = \sum_{m=0}^{100} 1.01 \sin(\omega_m t)$ and $e_\Theta = \sum_{m=0}^{500} 1.002 \sin(\omega_m t)$, where each $\omega_m$ is randomly selected within [−100, 100], are injected into the position and attitude control inputs, respectively. For the critic and actor neural networks, second-order and first-order polynomial activation functions are employed, respectively. The tracking performance of the proposed data-driven RL-based position and attitude controllers is illustrated in Figures 3 to 7, which show fast convergence: only four iterations are required for the algorithm weights to stabilize. Moreover, the position tracking errors converge to zero within 4 seconds, while the attitude tracking errors reach zero in approximately 0.5 seconds, as illustrated in Figures 3 and 5, respectively. Furthermore, Figure 7 demonstrates the quadrotor's trajectory tracking performance relative to a predefined reference path, showing that the quadrotor's position closely follows the reference trajectory.
Furthermore, to evaluate the effectiveness of the tracking performance, several performance indices, including the integral of absolute error (IAE) and the integral of absolute time-weighted error (IATE), are presented in Table 2.

Figure 3. The position tracking error
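The two indices named above can be evaluated from a sampled error trace with the trapezoidal rule. The sketch below assumes a single scalar error channel sampled at increasing times; how the paper aggregates the three position or attitude channels is not stated here, so that step is left out. The function names are hypothetical, and the time-weighted index is the one more commonly abbreviated ITAE.

```python
def iae(t, e):
    """Integral of absolute error, trapezoidal rule on samples (t[i], e[i])."""
    total = 0.0
    for i in range(1, len(t)):
        total += 0.5 * (abs(e[i - 1]) + abs(e[i])) * (t[i] - t[i - 1])
    return total


def iate(t, e):
    """Integral of time-weighted absolute error, trapezoidal rule."""
    total = 0.0
    for i in range(1, len(t)):
        left = t[i - 1] * abs(e[i - 1])
        right = t[i] * abs(e[i])
        total += 0.5 * (left + right) * (t[i] - t[i - 1])
    return total


# Hypothetical samples: an error of constant magnitude 1 on [0, 2].
times = [0.0, 1.0, 2.0]
errors = [1.0, -1.0, 1.0]
scores = (iae(times, errors), iate(times, errors))
```

The time weighting penalizes error that persists late in the run, so a controller with a fast transient and a small residual error can score well on IAE and still poorly on the time-weighted index.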
Figure 4. The convergence of training weights in position controller

Figure 5. The tracking of orientation angles

Figure 6. The convergence of training weights in attitude controller