Thesis/Dissertation

Advanced Cooperation Algorithms in MARL: From Discrete to Continuous

Ph.D. Dissertation Defense: Yasin Findik - 3/10/2025

Abstract: The rapid advancement of intelligent systems has sparked significant interest in reinforcement learning (RL) due to its potential in enabling autonomous agents to learn optimal behaviors through interactions with their environment. As the complexity of these environments and tasks increases, the need for agents to operate both independently and collaboratively has become more apparent. This has led to the emergence of multi-agent reinforcement learning (MARL), a field focused on developing frameworks that allow multiple agents to cooperate, compete, or coexist to achieve individual or collective objectives. MARL has become increasingly important in real-world applications such as robotics, autonomous vehicles, and finance, where agents must make decisions based on their own actions and those of others.

This thesis explores the challenges and opportunities presented by MARL in real-world applications, where agents must operate under conditions of partial observability, non-stationarity, and the need for coordinated decision-making. Traditional single-agent reinforcement learning (SARL) methods, although effective in isolated environments, fall short when applied to multi-agent settings due to the additional complexity introduced by inter-agent interactions. These complexities necessitate more sophisticated algorithms capable of managing decentralized decision-making, enhancing cooperation, and ensuring robustness. To address these issues, this thesis introduces novel methods aimed at improving the efficiency and effectiveness of MARL systems in both discrete and continuous action domains, leveraging recent advances in deep learning and RL. First, the thesis introduces a novel relational-awareness (RA) based cooperation strategy, which allows agents to work together more effectively by incorporating awareness of the relationships between agents. We evaluate the effectiveness of our proposed approach by conducting fifteen experiments in two different discrete environments. The results demonstrate that our proposed algorithm can influence and shape team behavior, guide cooperation strategies, and expedite agent learning. Therefore, our approach shows promise for use in multi-agent systems, especially when agents have diverse properties.
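
A relational network can be pictured as a weighted directed graph in which each agent's learning signal depends on the rewards of the agents it is connected to. The following is a minimal sketch under that assumption. The helper `relational_rewards` and the nested-dict weight encoding are hypothetical illustrations, not the thesis's implementation.

```python
from typing import Dict, List

def relational_rewards(rewards: List[float], weights: Dict[int, Dict[int, float]]) -> List[float]:
    # weights[i][j] is how much agent i values agent j's reward (a hypothetical
    # encoding of a relational network); agents missing from `weights` receive 0.
    shaped = []
    for i in range(len(rewards)):
        row = weights.get(i, {})
        shaped.append(sum(w * rewards[j] for j, w in row.items()))
    return shaped

# Three agents: agent 0 values itself and agent 1 equally; agents 1 and 2 only value agent 1.
raw = [1.0, 0.0, 2.0]
W = {0: {0: 0.5, 1: 0.5}, 1: {1: 1.0}, 2: {1: 1.0}}
print(relational_rewards(raw, W))  # [0.5, 0.0, 0.0]
```

Here agent 2 earns nothing despite its own raw reward of 2.0, because in this hypothetical network it only values agent 1's outcome. Adjusting such weights is one plausible way a relational network could steer which teammates an agent prioritizes.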

Another key contribution is the development of Mixed Q-Functionals (MQF), a value-based algorithm designed for continuous action domains, which significantly outperforms existing methods and fosters collaboration among agents. We evaluate the efficacy of our algorithm in six cooperative multi-agent scenarios within continuous environments. Our empirical findings reveal that MQF outperforms four variants of Deep Deterministic Policy Gradient through rapid action evaluation and increased sample efficiency. Furthermore, this thesis presents the Collaborative Adaptation (CA) Framework, which leverages relational networks to improve the resilience of multi-agent systems in scenarios involving unexpected failures. Empirical evaluations across both discrete and continuous environments demonstrate that, in scenarios involving unforeseen malfunctions, state-of-the-art algorithms often converge on sub-optimal solutions, whereas the proposed CA framework mitigates failures and recovers more effectively. These results also offer valuable insights into the practical deployment of MARL algorithms in dynamic, real-world applications.

In conclusion, this thesis presents significant advancements in the field of MARL by addressing critical challenges such as partial observability, non-stationarity, and the need for enhanced cooperation in complex environments. Through novel approaches like the RA-based cooperation strategy, the MQF algorithm, and the CA framework, the research demonstrates improvements in both discrete and continuous action domains. Experiments across various multi-agent scenarios validate the effectiveness of these methods in improving collaboration, decision-making, and system resilience. Overall, the contributions of this thesis pave the way for more robust and efficient MARL systems, capable of adapting to real-world complexities and expanding the capabilities of intelligent autonomous agents.
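
As its name suggests, MQF appears to build on Q-functionals, in which a network maps a state to the coefficients of a set of basis functions over the action space, so Q(s, a) for many candidate actions can be evaluated with cheap arithmetic instead of one forward pass per action. The sketch below assumes a 1-D action in [-1, 1], a polynomial basis, hand-picked coefficients standing in for the network output, and a simple additive (VDN-style) mix of per-agent values. The actual MQF mixing scheme may differ, and all function names are hypothetical.

```python
import random
from typing import List, Sequence

def poly_basis(a: float, degree: int) -> List[float]:
    # Polynomial basis [1, a, a^2, ..., a^degree] over a scalar action
    return [a ** k for k in range(degree + 1)]

def q_functional(coeffs: Sequence[float], action: float) -> float:
    # Q(s, a) = sum_k c_k(s) * phi_k(a); `coeffs` stands in for a network's output at state s
    return sum(c * phi for c, phi in zip(coeffs, poly_basis(action, len(coeffs) - 1)))

def greedy_action(coeffs: Sequence[float], num_samples: int = 1000, seed: int = 0) -> float:
    # Approximate the argmax by scoring many sampled actions with the same coefficients
    rng = random.Random(seed)
    samples = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    return max(samples, key=lambda a: q_functional(coeffs, a))

def mixed_q(per_agent_coeffs: Sequence[Sequence[float]], joint_action: Sequence[float]) -> float:
    # Assumed additive (VDN-style) mixing of per-agent values into a team value
    return sum(q_functional(c, a) for c, a in zip(per_agent_coeffs, joint_action))

# Agent 1: Q(s, a) = -(a - 0.5)^2 = -0.25 + a - a^2, so its best action is near 0.5
agent1 = [-0.25, 1.0, -1.0]
# Agent 2: Q(s, a) = 2a, so its best action is near the upper bound 1.0
agent2 = [0.0, 2.0]
print(greedy_action(agent1), greedy_action(agent2))
print(mixed_q([agent1, agent2], [0.5, 1.0]))
```

Because the coefficients are computed once per state, scoring a thousand candidate actions is just a thousand small dot products, which is the kind of rapid action evaluation the abstract credits for MQF's advantage.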

Evaluation of Commercial Quadrupedal Robot Hardware and Controllers for Locomotion and Stable Manipulation on Dynamic Rigid Surfaces

Master's Thesis Defense: Stephen Misenti - 7/28/2024

Abstract: Designing and building legged robots has been a focus of research since the 1960s. Although several robots have been built over the years, designing a system with a high level of stability, particularly in dynamic environments, remains challenging. Locomotion on dynamic rigid surfaces can be useful for robots operating on naval vessels, offshore platforms, trains, and aircraft. This thesis focuses on the evaluation of existing commercial legged robots and contributes methods for evaluating quadruped robots within non-inertial environments. More specifically, we evaluated two platforms, the Ghost Robotics Vision 60 and the Boston Dynamics Spot, to determine their potential loco-manipulation effectiveness and to evaluate system stability. The contributions of the thesis include: (a) development of controlled experiment testing and safety procedures, (b) design and construction of a robot safety harness and experiment apparatuses, (c) identification of shipboard motion equations from IMU data extracted during live experiments, and (d) experimentation and evaluation of two commercial quadruped platforms within non-inertial environments to identify their capabilities, limitations, and performance gaps. In our experiments, the Vision 60 robot was coupled with the Kinova Gen2 manipulator, and Spot used the Spot Arm to facilitate loco-manipulation. Our experimental setup included a controlled environment with varying swaying and rocking motions. During the experiments, the robotic arms were tasked with tracking a marker attached to the base. We performed several experiments with varying motions on the dynamic platform. For the base movement, we used both synthetic motions generated from manually designed equations and motions identified from the movements of an actual ship during field experiments. We also analyzed and compared the collected data between the two robot platforms.
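
Contribution (c) identifies motion equations from IMU data. One simple way to do this, assumed here rather than taken from the thesis, is to model each rotation axis (e.g., roll) as a single sinusoid plus an offset, search over candidate frequencies, and solve a small linear least-squares problem for the amplitude and phase at each frequency. Real ship motion usually contains several frequency components, so this is only a starting point, and `fit_sinusoid` is a hypothetical helper.

```python
import math
from typing import List, Tuple

def fit_sinusoid(t: List[float], y: List[float], freqs: List[float]) -> Tuple[float, float, float, float]:
    """Fit y(t) ~ offset + R*sin(2*pi*f*t + phase) by grid search over f.
    Returns (frequency, amplitude R, phase, offset)."""
    offset = sum(y) / len(y)
    yc = [v - offset for v in y]  # remove the mean so only A and B remain to solve for
    best = None
    for f in freqs:
        s = [math.sin(2 * math.pi * f * ti) for ti in t]
        c = [math.cos(2 * math.pi * f * ti) for ti in t]
        # Normal equations for yc ~ A*s + B*c
        ss = sum(v * v for v in s)
        cc = sum(v * v for v in c)
        sc = sum(a * b for a, b in zip(s, c))
        sy = sum(a * b for a, b in zip(s, yc))
        cy = sum(a * b for a, b in zip(c, yc))
        det = ss * cc - sc * sc
        if abs(det) < 1e-12:
            continue  # degenerate frequency for this sampling grid
        A = (sy * cc - cy * sc) / det
        B = (cy * ss - sy * sc) / det
        resid = sum((yi - A * si - B * ci) ** 2 for yi, si, ci in zip(yc, s, c))
        if best is None or resid < best[0]:
            best = (resid, f, A, B)
    _, f, A, B = best
    # A*sin(x) + B*cos(x) = R*sin(x + phase) with R = hypot(A, B), phase = atan2(B, A)
    return f, math.hypot(A, B), math.atan2(B, A), offset

# Synthetic roll signal: 1 deg offset, 5 deg amplitude, 0.1 Hz, sampled at 10 Hz for 60 s
t = [0.1 * i for i in range(600)]
roll = [1.0 + 5.0 * math.sin(2 * math.pi * 0.1 * ti + 0.3) for ti in t]
print(fit_sinusoid(t, roll, [0.01 * k for k in range(1, 51)]))
```

The fitted equation can then drive a motion platform directly, which is the role the identified ship motions play in the controlled experiments.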
Our results show that Vision 60 and Spot are capable of staying upright on dynamic rigid surfaces, but both need improvement. In field experiments, Spot was able to move about the testing area successfully during most experiments, but had difficulty during wave swells in rougher sea states. During controlled experiments, Vision 60 and Spot experienced significant disturbances, but were able to stay on the testing platform in most experiments. However, heavier motions worsened the results, and in some trials Spot failed to stay safely upright. Overall, Vision 60 showed better results than Spot.

Methods for Measuring and Increasing Robot Autonomy in Dynamic Environments

Master's Thesis Defense: Ryan Donald - 7/26/2024

Abstract: Achieving success in real-world environments hinges significantly on enhancing the level of autonomy of robots performing their tasks. Improving robot autonomy, however, is not straightforward and comes with a number of challenges. Two of these challenges are (a) the development of metrics to quantitatively measure autonomy, and (b) the generation of behaviors that enable robots to handle uncertain situations more effectively. To measure the level of autonomy in robots, accurate and reproducible metrics are essential. In this thesis, we investigate existing metrics for defining and measuring robot autonomy and develop new ones for both non-contextual and contextual autonomy measures. These measures are evaluated with seven small Unmanned Aerial System (sUAS) platforms. For the non-contextual evaluations, each platform's specifications and autonomous behaviors are gathered through datasheets and real-world verification. We then evaluate a number of methods for combining feature-dependent scores into a single score for each sUAS and propose a Non-Contextual Autonomy Potential (NCAP) coordinate to produce a single metric for non-contextual autonomy. In the contextual evaluations, we use a set of preexisting tests and data for each sUAS. We design several cascaded Fuzzy Inference Systems (cFIS), which we then use to score the performance of each sUAS within these tests. We then calculate a predictive score based on the scores of each sUAS within each of the individual tests. The second challenge deals with the development of autonomous behaviors for robots. These behaviors must allow the robot to safely perform its tasks in environments that can change rapidly and may not be as structured as a laboratory environment. Dynamic environments are especially difficult, as they involve moving obstacles, goals, and forces in the inertial frame of the system.
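
To make the score-combination step concrete, here is a minimal sketch of two common ways of collapsing per-feature scores into one number: a normalized weighted sum and a weighted product (the thesis reports the weighted product as the best method). The feature names, scores, and weights are made up for illustration; this is not the NCAP coordinate or the thesis's exact scoring.

```python
import math
from typing import Dict

def weighted_sum(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    # Normalized weighted arithmetic mean of feature scores in [0, 1]
    return sum(weights[k] * scores[k] for k in weights) / sum(weights.values())

def weighted_product(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    # Product of scores raised to their weights; a zero in any feature zeroes the total
    return math.prod(scores[k] ** weights[k] for k in weights)

# Hypothetical feature scores for one sUAS
scores = {"navigation": 0.8, "perception": 0.5}
weights = {"navigation": 1.0, "perception": 1.0}
print(weighted_sum(scores, weights), weighted_product(scores, weights))

# Extending the product with a new metric (score 0.9, weight 2.0) only multiplies in one factor
extended = weighted_product(scores, weights) * 0.9 ** 2.0
print(extended)
```

With unnormalized exponents, adding a metric to an existing weighted-product score needs only the new factor, whereas the normalized weighted sum must be recomputed with a new denominator. This matches the ease-of-extension argument the abstract gives for preferring the weighted product.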
To address this second challenge, we propose a framework for improving robot performance, and hence its autonomy, during an inspection task in dynamic environments. Our framework allows the robot to predict the motion of the environment and make decisions accordingly. It consists of (a) a Learning from Demonstration (LfD) module for encoding the robot's movements and skill reproduction, (b) an Unscented Kalman Filter (UKF) for state estimation, and (c) a Hidden Markov Model (HMM) for high-level decision making. We evaluate the performance of this framework in a number of simulation and real-world experiments and show that it improves the performance of the robot. Additionally, we determine that the weighted product method of combining scores into a single score is the best method, as it allows additional metrics to be easily combined with existing data. Our non-contextual autonomy evaluation shows the benefits and drawbacks of each combination method, presents an evaluation based on the NCAP coordinate, and ranks the systems by their NCAP coordinates. In the contextual autonomy evaluation, we show how our cFIS are able to combine data from multiple experiments to produce an overall autonomy evaluation of a system, while remaining adaptable enough to include various tests. Lastly, we showcase how our framework can improve the autonomy of robots within dynamic environments.
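
The framework uses an HMM for high-level decision making. As a minimal sketch of that idea, and not the thesis's model, the code below runs the HMM forward (filtering) recursion over discretized motion observations and triggers the inspection when the belief in a hypothetical "calm" platform state is high enough. The states, probabilities, observation symbols, and threshold are all assumptions, and the UKF state estimate is replaced by pre-discretized observations.

```python
from typing import Dict, List

STATES = ["calm", "rough"]  # hypothetical hidden platform states
TRANS = {"calm": {"calm": 0.9, "rough": 0.1},
         "rough": {"calm": 0.2, "rough": 0.8}}
EMIT = {"calm": {"small": 0.8, "large": 0.2},   # P(observed motion | state)
        "rough": {"small": 0.3, "large": 0.7}}

def forward_filter(obs: List[str], prior: Dict[str, float]) -> Dict[str, float]:
    # Returns P(state_t | obs_1..t) via the normalized HMM forward recursion
    belief = dict(prior)
    for o in obs:
        # Predict: propagate the belief through the transition model
        predicted = {s: sum(belief[p] * TRANS[p][s] for p in STATES) for s in STATES}
        # Update: weight by the emission likelihood and renormalize
        unnorm = {s: predicted[s] * EMIT[s][o] for s in STATES}
        z = sum(unnorm.values())
        belief = {s: v / z for s, v in unnorm.items()}
    return belief

def decide(belief: Dict[str, float], threshold: float = 0.7) -> str:
    # High-level decision: only attempt the inspection when the platform is likely calm
    return "inspect" if belief["calm"] >= threshold else "wait"

prior = {"calm": 0.5, "rough": 0.5}
print(decide(forward_filter(["small"], prior)))           # inspect
print(decide(forward_filter(["large", "large"], prior)))  # wait
```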

A Multi-Robot Task Assignment Framework for Search and Rescue with Heterogeneous Teams

Master's Thesis Defense: Hamid Osooli - 7/18/2024

Abstract: In post-disaster scenarios, search and rescue operations often require the coordinated efforts of multiple robots and humans to perform a variety of challenging tasks. Existing planners, while effective in certain aspects, frequently overlook critical elements such as information gathering, task assignment, and comprehensive planning. Furthermore, previous works that consider robot capabilities and victim requirements often suffer from high time complexity due to repetitive and inefficient planning steps. This thesis addresses these limitations by proposing a comprehensive framework that encompasses scouting, task assignment, and path-planning steps. The proposed Multi-Stage Multi-Robot Task Assignment framework leverages detailed information about robot capabilities, victim requirements, and the historical performance of robots to optimize task assignments, thereby enhancing the overall success rate of rescue missions. An iterative process is integrated to ensure that primary objectives are achieved while respecting problem constraints. The thesis employs off-the-shelf path-planning methods while introducing a hierarchical game theory-based framework for the scouting step. This innovative approach enables agents to make more informed decisions, optimizing their actions based on the relative information provided by their teammates. The incorporation of game theory significantly accelerates completion of the rescue mission, reducing completion time by 66% compared to scenarios where agents do not communicate their information, and by 58% compared to scenarios where agents only communicate information without suggesting actions to teammates. Additionally, the thesis develops and evaluates an environment for multi-agent search and rescue that leverages multi-agent reinforcement learning (MARL).
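
The task assignment stage matches robot capabilities and historical performance against victim requirements. A much simpler greedy matcher illustrates the basic idea. It is a hypothetical sketch, with made-up robot, capability, and victim names, and it omits the game-theoretic scouting, path planning, and iterative refinement described above.

```python
from typing import Dict, Set

def assign_tasks(robots: Dict[str, Set[str]], history: Dict[str, float],
                 victims: Dict[str, Set[str]]) -> Dict[str, str]:
    """Greedy assignment: each victim gets the free robot whose capabilities cover
    its requirements and whose historical success rate is highest."""
    assignment: Dict[str, str] = {}
    free = set(robots)
    # Serve the most demanding victims first so scarce capabilities are not used up early;
    # the name acts as a deterministic tie-breaker.
    for victim in sorted(victims, key=lambda v: (-len(victims[v]), v)):
        eligible = [r for r in free if victims[victim] <= robots[r]]
        if not eligible:
            continue  # no single free robot covers this victim's requirements
        best = max(eligible, key=lambda r: (history.get(r, 0.0), r))
        assignment[victim] = best
        free.remove(best)
    return assignment

robots = {"uav": {"camera"}, "ugv": {"medkit", "camera"}, "arm": {"lift"}}
history = {"uav": 0.9, "ugv": 0.7, "arm": 0.8}  # hypothetical past success rates
victims = {"v1": {"medkit"}, "v2": {"camera"}, "v3": {"lift", "medkit"}}
print(assign_tasks(robots, history, victims))  # {'v1': 'ugv', 'v2': 'uav'}
```

Victim v3 stays unassigned because no single robot offers both capabilities, which is the kind of case where a multi-robot, iterative planner like the one in the thesis is needed.
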
This MARL-based search-and-rescue environment is designed to enhance the communication and coordination capabilities of the agents, allowing them to effectively locate and rescue victims. The agents are trained using Q-learning and Deep Q-Networks (DQN), and different aspects of this environment are evaluated and detailed in the thesis. The proposed framework is validated through extensive testing on four different maps, where it is compared to a state-of-the-art baseline. The results highlight the superior performance of the proposed task assignment algorithm, which achieves a 97% improvement over the baseline in terms of planning time.
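
The agents in this environment are trained with Q-learning and DQN. For reference, the tabular Q-learning update is Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_b Q(s', b) - Q(s, a)). The sketch below applies it to a hypothetical five-cell corridor with a victim in the last cell, not the thesis's environment; all names and hyperparameters are assumptions.

```python
import random
from typing import Dict, List, Tuple

ACTIONS = (-1, 1)  # move left or right along a 1-D corridor

def train_q_learning(n_states: int = 5, episodes: int = 500, alpha: float = 0.5,
                     gamma: float = 0.9, epsilon: float = 0.2,
                     seed: int = 0) -> Dict[Tuple[int, int], float]:
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    goal = n_states - 1  # hypothetical victim location at the end of the corridor
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s_next = min(max(s + a, 0), goal)  # walls clamp the position
            r = 1.0 if s_next == goal else 0.0
            # Q-learning target: no bootstrapping from the terminal state
            target = r if s_next == goal else r + gamma * max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q

def greedy_policy(q: Dict[Tuple[int, int], float], n_states: int = 5) -> List[int]:
    # Best action in every non-terminal cell
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(n_states - 1)]

print(greedy_policy(train_q_learning()))
```

DQN replaces the table `q` with a neural network trained toward the same target, which is what makes larger maps with richer observations tractable.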

Enhancing Team Performance in Multi-Agent Multi-Armed Bandit through Optimization - Defense session

Master's Thesis Defense: Monish Reddy Kotturu - 7/1/2024

Abstract: The multi-armed bandit (MAB) problem involves sequential decision-making under uncertainty with the goal of maximizing an agent's cumulative reward. In reinforcement learning, the MAB problem provides a foundation for developing algorithms that tackle the exploration-exploitation trade-off. MABs are used in various areas such as recommender systems, online advertising, dynamic pricing, and adaptive experimental design. In a multi-agent setting, the MAB problem can be extended to involve a team of agents cooperating to reach a consensus and maximize team reward. Good decision-making in Multi-Agent Multi-Armed Bandit (MAMAB) scenarios requires effective communication (i.e., sharing information) among agents, and suboptimal communication can reduce team performance. This thesis studies and develops methods to improve team performance in MAMABs through optimization. We define the team structure using a relational network, a graph that dictates how information is exchanged among agents and assigns weights (i.e., importance) to transmitted and received information. In teams governed by relational networks, one important step in achieving effective communication is finding optimal edge weights. The edge weight optimization problem can be formulated as a convex optimization problem whose solution gives the relational weights that expedite consensus formation. We study the effects of various edge weight optimization algorithms in MAMABs. Our results show that in large, communication-constrained networks of agents, the time needed to reach a consensus can be shortened through optimization. A major shortcoming of this experimental setting is that it assumes perfect agents playing a given MAB, which is usually not the case in real-world scenarios. Agents often possess unique abilities that influence their performance on specific tasks. To account for this variability, we focus on the notion of competency.
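
In a relational network, each agent repeatedly replaces its estimate (for example, of an arm's mean reward) with a weighted average of its neighbors' estimates, and the edge weights determine how quickly the team agrees. The sketch below, an assumption rather than the thesis's setup, runs linear average consensus with two hand-built weight choices. The convex edge-weight optimization described above would instead select the weight matrix with a solver, which is not shown, and all helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def metropolis_weights(adj: Dict[int, Set[int]]) -> List[List[float]]:
    # Symmetric, doubly stochastic weights built from node degrees (no solver needed)
    n = len(adj)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            W[i][j] = 1.0 / (1 + max(len(adj[i]), len(adj[j])))
        W[i][i] = 1.0 - sum(W[i])
    return W

def constant_weights(adj: Dict[int, Set[int]], eps: float) -> List[List[float]]:
    # The same weight eps on every edge; requires eps < 1 / max degree
    n = len(adj)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            W[i][j] = eps
        W[i][i] = 1.0 - eps * len(adj[i])
    return W

def run_consensus(W: List[List[float]], x0: List[float], tol: float = 1e-6,
                  max_iters: int = 10_000) -> Tuple[List[float], int]:
    # Iterate x <- W x until every agent's estimate is within tol of the true average
    x = list(x0)
    avg = sum(x0) / len(x0)
    for t in range(max_iters):
        if max(abs(v - avg) for v in x) < tol:
            return x, t
        x = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x, max_iters

# Path graph 0-1-2-3; only agent 0 initially holds a high reward estimate for an arm
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
x0 = [1.0, 0.0, 0.0, 0.0]
print(run_consensus(metropolis_weights(adj), x0)[1])
print(run_consensus(constant_weights(adj, 0.1), x0)[1])
```

On this four-agent path, the Metropolis weights reach the tolerance in fewer iterations than the small constant weight, illustrating why the choice of edge weights, and hence optimizing them, affects the time to consensus.
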
We define competency as an agent's ability to find the optimal arm of a MAB with high probability within a finite time. We simulate agent competency by adding noise to the agents' observations. In our experiments, we formulate a method that represents each agent's competency as a vector describing its performance in different scenarios. In other words, an agent can be highly competent when playing a set of MABs of a certain difficulty while showing lower competency when playing another set. When agents with varying levels of competency work together, it is important to test the network's performance as a whole over a longer period of time and across different problems or missions. We hypothesize that the optimization process can help improve team performance when playing bandits of various difficulty levels over a longer period of time. To validate this hypothesis, we propose a long-term online optimization process in which, in each bandit stage, the team iteratively plays a batch of bandits and optimizes its edge weights based on the resulting team performance. Our results show that team performance can be substantially improved through this approach by limiting the spread of noisy information from less competent agents and feeding helpful information back to them.
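
The competency definition above can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the thesis's noise model: observation noise grows linearly as competency drops, each agent pulls every arm equally often, and competency is stored as a vector keyed by hypothetical difficulty levels. The function names are illustrative.

```python
import random
from typing import List

def noisy_pull(mean: float, competency: float, rng: random.Random) -> float:
    # Hypothetical noise model: observation noise grows as competency (in [0, 1]) drops
    sigma = 2.0 * (1.0 - competency)
    return mean + rng.gauss(0.0, sigma)

def identify_best_arm(arm_means: List[float], competency: float, pulls_per_arm: int,
                      rng: random.Random) -> int:
    # Pull every arm the same number of times and pick the highest sample mean
    estimates = [sum(noisy_pull(m, competency, rng) for _ in range(pulls_per_arm)) / pulls_per_arm
                 for m in arm_means]
    return max(range(len(arm_means)), key=lambda k: estimates[k])

def success_rate(arm_means: List[float], competency: float, pulls_per_arm: int = 20,
                 trials: int = 500, seed: int = 0) -> float:
    # Fraction of trials in which the agent finds the truly best arm
    rng = random.Random(seed)
    best = max(range(len(arm_means)), key=lambda k: arm_means[k])
    hits = sum(identify_best_arm(arm_means, competency, pulls_per_arm, rng) == best
               for _ in range(trials))
    return hits / trials

# A competency vector: one (hypothetical) competency value per bandit difficulty level
competency = {"easy": 0.95, "hard": 0.2}
bandits = {"easy": [0.1, 0.9, 0.4], "hard": [0.5, 0.6, 0.4]}
for level, means in bandits.items():
    print(level, success_rate(means, competency[level]))
```

In a team setting, the noisy estimates produced by a low-competency agent are exactly the information that well-optimized edge weights would down-weight during consensus.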
