IEEE AICAS 2026 Grand Challenge

With the rapid advancement of artificial intelligence, multimodal large models have become a core driving force in the evolution of intelligent systems. Vision-Language Models (VLMs), which combine visual perception with language understanding, have shown great potential in areas such as image recognition, visual question answering, and intelligent interaction. In practice, however, they face two challenges: computationally intensive models must be deployed efficiently on resource-constrained edge devices and dedicated hardware platforms, and their inference must be optimized for specific hardware architectures to make full use of the available compute.

To address these challenges, promote the practical deployment of multimodal large models, explore cutting-edge techniques for hardware-software co-design and optimization, and cultivate interdisciplinary talent with system-level design and optimization skills, the IEEE AICAS 2026 Grand Challenge features two core competition tracks. The competition homepage for each track is linked below, and registration and team formation are handled through the Tianchi platform.

Track 1: VLM Efficient Inference and Optimization for AI Chips: https://tianchi.aliyun.com/competition/entrance/532450
Track 2: FPGA Hardware-Software System Design for On-Device VLM Inference: https://tianchi.aliyun.com/competition/entrance/532451

This competition is primarily aimed at students and researchers from universities and research institutions, and is also open to industry professionals. Each team may include up to three participants and up to two mentors. To ensure fairness, employees of T-Head Semiconductor and Alibaba Tongyi Lab are not eligible for awards. The schedule is as follows:

Preliminary Round Starts: 2026.02.03
FPGA and Remote Platform Training: 2026.02.04
Preliminary Round Ends: 2026.04.18
Competition Workshop: 2026.04.26 (Shanghai)
Final Round Starts: 2026.04.28
Final Round Ends: 2026.05.28
Code and Technical Report Submission: 2026.06.08
Results Announcement: 2026.06.18
AICAS Conference: 2026.09.16

Outstanding teams in each track will receive cash prizes. Teams will also have special opportunities for presentation and networking. The top 10 teams from the preliminary round will be invited to the technical workshop in Shanghai. Outstanding designs may be invited for publication at the IEEE AICAS conference or in partner journals, giving participants a platform for academic exchange and for showcasing their work.

First Place: USD 1300 or CNY 9100 (1 team per track)
Second Place: USD 900 or CNY 6300 (1 team per track)
Third Place: USD 650 or CNY 4550 (1 team per track)
Excellence Award: USD 100 or CNY 700 (7 teams per track)

Track 1: VLM Efficient Inference and Optimization for AI Chips

This track investigates how to deploy advanced multimodal large models efficiently on AI computing platforms. Participants will work with the Tongyi Qianwen Qwen3-VL-2B-Instruct model, optimizing its inference and deployment (on the designated Alibaba Cloud APG server in the final round) and systematically exploring methods to improve its performance on specific hardware.

Participants must propose optimization methods that systematically improve the model's inference performance on the target hardware, including but not limited to operator optimization and fusion, compute scheduling strategies, and memory management. Submissions will be scored according to the testing scheme specified by the competition committee.

In the preliminary round, technical evaluation focuses on two dimensions:

Model Inference Accuracy (Ability) Evaluation
- Accuracy is tested on a dedicated evaluation dataset provided by the organizers, consisting of (image, question, reference answer) triplets. Answers generated by the optimized model are matched against the reference answers using lenient rules: matching ignores case and punctuation and supports partial and keyword matches.
Model Inference Performance (Efficiency) Evaluation
- Time to First Token (TTFT) improvement rate: Measures the improvement in the average time from receiving the input to generating the first token.
- Throughput improvement rate: Measures the improvement in the average number of tokens generated per unit time.
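The organizers' exact matching rules and improvement-rate formulas are not given above, so the following is only a sketch of one plausible reading. It assumes that "partial matching" means the normalized reference appears inside the normalized prediction, that "keyword matching" means every reference word appears among the prediction's words, and that improvement rates are relative changes against an unoptimized baseline. All function names are hypothetical and are not taken from the official evaluation code.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def is_match(prediction: str, reference: str) -> bool:
    """Hypothetical lenient matcher (assumed rules, not the official scorer)."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return False
    # Partial match (assumed): the reference occurs inside the prediction.
    if ref in pred:
        return True
    # Keyword match (assumed): every reference word appears as a prediction word.
    return set(ref.split()) <= set(pred.split())


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of (prediction, reference) pairs that match."""
    hits = sum(is_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


def ttft_improvement(baseline_s: float, optimized_s: float) -> float:
    """Relative TTFT reduction; lower TTFT is better, so a drop is positive."""
    return (baseline_s - optimized_s) / baseline_s


def throughput_improvement(baseline_tps: float, optimized_tps: float) -> float:
    """Relative throughput gain in tokens per second over the baseline."""
    return (optimized_tps - baseline_tps) / baseline_tps
```

For example, cutting the average TTFT from 2.0 s to 1.5 s gives `ttft_improvement(2.0, 1.5) == 0.25`, a 25% improvement under this assumed definition.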

The competition consists of a preliminary round and a final round. In the preliminary round, teams may choose their own hardware platforms to validate their optimization methods and evaluate performance. After the preliminary round, the review committee will select 16 teams whose submissions are valid and pass code review to advance to the final round. In the final round, all teams use the Alibaba Cloud APG server provided by the organizers for hardware-software co-optimization, developing targeted optimizations for inference algorithms and deployment strategies. Final-round performance is evaluated and compared on the APG server using the same model for all teams. The overall competition score is a weighted combination of the preliminary and final round scores.

Teams must submit a detailed technical report describing their VLM application scenario, their hardware-software co-design, the design of their optimization methods, and a before-and-after comparison of the evaluation metrics with a performance summary. Teams must also submit the complete source code implementing their optimization methods. All code will be reproduced and evaluated in the standardized hardware and software environment provided by the organizers to ensure that results are fair and comparable.

Track 2: FPGA Hardware-Software System Design for On-Device VLM Inference

This track requires participants to deploy and run a large model within the hardware constraints of the Kria KV260 platform, using its on-chip CPU and FPGA resources to design and optimize an accelerator architecture for on-device inference of a multimodal vision-language model.

Starting from the SmolVLM2 vision-language model (SmolVLM2-500M-Video-Instruct), participants must design a corresponding hardware acceleration architecture on the FPGA and use the on-chip processors and FPGA resources to optimize the model's deployment. The mandatory core task is accelerator design using the on-chip resources, including the ARM cores and the FPGA fabric's LUTs, DSPs, and other logic. Participants may additionally propose innovations in compute scheduling, data reuse, pipeline design, and hardware-software co-design. The primary goal of the competition is to improve the model's overall inference performance on the target hardware.
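Data reuse on an FPGA usually comes from tiling: a block of each operand is loaded into on-chip memory once and reused for many multiply-accumulates before the next block is fetched. The pure-Python model below only illustrates that loop structure for a matrix multiplication; it is not a KV260 implementation, which would typically be written in HLS C++ or RTL, and the `tile` size is an arbitrary illustrative parameter.

```python
def tiled_matmul(A, B, tile=4):
    """Compute C = A x B (lists of lists) one tile x tile block at a time.

    `tile` is a hypothetical parameter; on an FPGA it would be sized so the
    operand tiles fit in on-chip block RAM.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # rows of the output tile
        for j0 in range(0, m, tile):      # columns of the output tile
            for p0 in range(0, k, tile):  # reduction dimension
                # "Load" one tile of each operand; in hardware these copies
                # would live in on-chip buffers and be reused below.
                a_tile = [row[p0:p0 + tile] for row in A[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in B[p0:p0 + tile]]
                # Each a_tile element is reused across every column of b_tile,
                # and each b_tile element across every row of a_tile.
                for ii, a_row in enumerate(a_tile):
                    for jj in range(len(b_tile[0])):
                        acc = 0.0
                        for pp, a_val in enumerate(a_row):
                            acc += a_val * b_tile[pp][jj]
                        C[i0 + ii][j0 + jj] += acc
    return C
```

Larger tiles increase reuse per off-chip transfer but need more on-chip memory; in an HLS design the innermost loop would typically be pipelined and unrolled across DSP slices, which this sketch does not model.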

In the preliminary round, technical evaluation covers two dimensions:

Model Inference Accuracy (Ability) Evaluation
- Accuracy is tested on 100 samples from the OCRBench test set. Each sample consists of an image, a question, and a reference answer; the model must answer the question based on the input image, and correctness is judged by substring inclusion between the model output and the reference answer.
Model Inference Performance (Efficiency) Evaluation
- Prefill stage throughput improvement rate
- Decoding stage throughput improvement rate
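A minimal sketch of the Track 2 preliminary metrics follows, under stated assumptions: a prediction counts as correct when the lowercased reference answer appears inside the lowercased model output (the usual OCRBench convention; the organizers' exact rule may differ), prefill throughput is prompt tokens divided by prefill time, decode throughput is generated tokens divided by decode time, and an improvement rate is the relative gain over the unoptimized baseline. All function names are hypothetical.

```python
def ocr_correct(prediction: str, reference: str) -> bool:
    """Assumed rule: the reference answer appears inside the model output."""
    ref = reference.strip().lower()
    return bool(ref) and ref in prediction.strip().lower()


def ocr_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of samples judged correct (100 samples in the preliminary round)."""
    hits = sum(ocr_correct(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


def stage_throughputs(prompt_tokens: int, prefill_s: float,
                      generated_tokens: int, decode_s: float) -> tuple[float, float]:
    """Return (prefill tokens/s, decode tokens/s) for one request."""
    # Prefill processes the whole prompt (text plus image tokens) in one pass.
    prefill_tps = prompt_tokens / prefill_s
    # Decode produces output tokens one at a time, autoregressively.
    decode_tps = generated_tokens / decode_s
    return prefill_tps, decode_tps


def improvement_rate(baseline_tps: float, optimized_tps: float) -> float:
    """Relative throughput gain over the unoptimized baseline."""
    return (optimized_tps - baseline_tps) / baseline_tps
```

Reporting the two stages separately matters because prefill is largely compute-bound while decoding is usually limited by memory bandwidth, so an accelerator can improve one without the other.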

Participants will access the Kria KV260 board remotely through an interface provided by the organizers to verify, test, and evaluate their accelerator systems, comparing capability and performance before and after optimization. During the preliminary round, each team's daily access to the cloud platform is limited. After the preliminary round, the organizers will select 16 teams whose submissions are valid and pass code review to advance to the final round. Teams entering the final round will receive a KV260 development board for further development and optimization. The evaluation criteria for the final round will be adjusted as appropriate based on the preliminary round results.

Teams must submit a detailed test report describing the optimized test results for SmolVLM2-500M-Video-Instruct, technical documentation detailing how their optimization methods were implemented, and the complete source code implementing those methods, so that the work is reproducible and verifiable.