The research team's paper has been accepted to the International Conference on Machine Learning (ICML) 2026, which will be held at COEX in Seoul this July. It was selected for an Oral presentation, an honor given to only the top 0.7% (168) of the 23,918 submitted papers. ICML is considered one of the most influential international conferences in AI and machine learning.
AI technology is rapidly evolving beyond generative AI, which writes text and draws pictures, into the era of 'Physical AI,' which moves actual machines and acts in the real world. Representative examples include robots that perform dangerous factory tasks in place of humans, autonomous vehicles that assess road conditions on their own, and medical robots that perform delicate surgeries.
However, one barrier has stood in the way of practical physical AI: learning human-level evaluation criteria for judging whether a machine's actions match human intentions and which actions are more desirable. For example, when a surgical robot performs suturing or an autonomous vehicle passes through a complex intersection, the AI must choose the most appropriate action among numerous options. This requires a 'reward function' that reflects human preferences and judgment criteria. Until now, building one has required humans to directly evaluate thousands to tens of thousands of action data points, at an enormous cost in time and money.
The research team focused on the way humans learn new tasks after seeing just a few demonstrations. VOTP, the method the team developed, helps AI understand human-preferred action patterns on its own from just a few videos of good and bad examples.
Even without humans evaluating a vast amount of data one by one as before, AI can understand human judgment criteria and extend what it learns to a variety of situations. The core idea of this research is that intelligent machines such as robots and autonomous vehicles can quickly grasp human intent from only a small number of videos that convey human preferences. In extensive experiments across various environments and tasks, the algorithm demonstrated both effectiveness and generalization.
This method can significantly reduce the human feedback and data construction costs required for physical AI development. Because robots, autonomous vehicles, and industrial machinery can learn actions that meet human expectations from only a small number of examples, it is expected to drastically shorten development time and lower costs. The technology can be applied not only to robot arm control, humanoid robots, autonomous vehicles, smart factories, drones, and surgical robots, but also to AI agents that directly operate computers. In particular, it is expected to serve as a core foundational technology for physical AI systems that need to learn human intentions and satisfaction.
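To give a feel for how a handful of labeled videos might inform judgments about a much larger pool of unlabeled behavior data, the following is a minimal sketch of label propagation with entropy-regularized optimal transport (the Sinkhorn algorithm). It is not the team's algorithm, whose details are not given here. The feature vectors, the function names `sinkhorn` and `propagate_preferences`, the squared-distance cost, and the uniform weights are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport plan between histograms a and b."""
    K = np.exp(-cost / eps)          # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale so column sums match b
        u = a / (K @ v)              # rescale so row sums match a
    return u[:, None] * K * v[None, :]

def propagate_preferences(labeled_feats, labels, unlabeled_feats, eps=0.1):
    """Soft preference scores for unlabeled clips, given a few labeled clips.

    labeled_feats:   (m, d) hypothetical embeddings of the labeled videos
    labels:          (m,)   1.0 for a good example, 0.0 for a bad example
    unlabeled_feats: (n, d) hypothetical embeddings of unlabeled behavior clips
    """
    # Pairwise squared Euclidean cost, scaled to [0, 1] for numerical stability.
    diff = unlabeled_feats[:, None, :] - labeled_feats[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / cost.max()
    n, m = cost.shape
    # Assumption: every clip carries equal weight on both sides.
    plan = sinkhorn(cost, np.full(n, 1.0 / n), np.full(m, 1.0 / m), eps)
    # Each row of the plan says how one unlabeled clip is matched to the examples.
    weights = plan / plan.sum(axis=1, keepdims=True)
    return weights @ labels

# Toy data: one good and one bad example, four unlabeled clips near one or the other.
labeled = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = np.array([1.0, 0.0])
unlabeled = np.array([[0.1, 0.0], [0.0, 0.2], [4.9, 5.0], [5.0, 5.1]])
scores = propagate_preferences(labeled, labels, unlabeled)
print(scores)  # values near 1 mean "resembles the good example"
```

In a preference-based setting, soft scores like these could stand in for some of the human comparisons used to fit a reward function. One caveat of this simplified version is that uniform weights implicitly assume the unlabeled data is roughly balanced between good and bad behavior.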
Professor Chang D. Yoo said, "The core of physical AI is making machines understand human intentions and choose the correct actions," and added, "Because VOTP can learn human judgment criteria from only a small number of videos, it is a core technology that will accelerate the era of robots making human-like judgments."
This research, with PhD student Tung M. Luu of the School of Electrical Engineering as first author, was selected for an Oral presentation at ICML (International Conference on Machine Learning) 2026.
※ Paper Title: Video-Based Optimal Transport for Feedback-Efficient Offline Preference-Based Reinforcement Learning, Paper File: https://sanctusfactory.com/data/file/publications/202606091714078906.pdf
This research was supported by the Institute for Information & Communication Technology Planning & Evaluation (IITP) and the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT.
20-Sep-2025