2025

CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation

Chihan Huang, Hao Tang

Under Review 2025

Although autoregressive models have dominated language modeling in recent years, there has been growing interest in exploring alternative paradigms to the conventional next-token prediction framework. Diffusion-based language models have emerged as a compelling alternative due to their powerful parallel generation capabilities and inherent editability. However, these models are often constrained by fixed-length generation. A promising direction is to combine the strengths of both paradigms: segmenting sequences into blocks and modeling autoregressive dependencies across blocks, while leveraging discrete diffusion to estimate the conditional distribution within each block given the preceding context. Nevertheless, the practical application of such hybrid models is often hindered by two key limitations: rigid fixed-length outputs and a lack of flexible control mechanisms. In this work, we address the critical limitations of fixed granularity and weak controllability in current large diffusion language models. We propose CtrlDiff, a dynamic and controllable semi-autoregressive framework that adaptively determines the size of each generation block based on local semantics using reinforcement learning. Furthermore, we introduce a classifier-guided control mechanism tailored to discrete diffusion, which significantly reduces computational overhead while facilitating efficient post-hoc conditioning without retraining. Extensive experiments demonstrate that CtrlDiff sets a new standard among hybrid diffusion models, narrows the performance gap to state-of-the-art autoregressive approaches, and enables effective conditional text generation across diverse tasks.
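
The block-wise factorization mentioned in the abstract can be written compactly. The notation below is an assumption made for illustration and is not taken from the paper: a sequence $x$ is split into $B$ blocks $x^{1}, \dots, x^{B}$, and $x^{<b}$ denotes all blocks before block $b$.

```latex
\log p_\theta(x) \;=\; \sum_{b=1}^{B} \log p_\theta\!\left(x^{b} \mid x^{<b}\right)
```

Under this reading, each conditional $p_\theta(x^{b} \mid x^{<b})$ is the term estimated by discrete diffusion within a block, while the sum over $b$ carries the autoregressive dependency across blocks.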

ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models

Chihan Huang, Hao Tang

Under Review 2025

Despite the remarkable success of deep learning across various domains, these models remain vulnerable to adversarial attacks. Although many existing adversarial attack methods achieve high success rates, they typically rely on $\ell_{p}$-norm perturbation constraints, which do not align with human perceptual capabilities. Consequently, researchers have shifted their focus toward generating natural, unrestricted adversarial examples (UAEs). Traditional approaches using GANs suffer from inherent limitations, such as poor image quality due to the instability and mode collapse of GANs. Meanwhile, diffusion models have been employed for UAE generation, but they still predominantly rely on iterative PGD perturbation injection, without fully leveraging the denoising capabilities that are central to the diffusion model. In this paper, we introduce a novel approach for generating UAEs based on diffusion models, named ScoreAdv. This method incorporates an interpretable adversarial guidance mechanism to gradually shift the sampling distribution towards the adversarial distribution, while using an interpretable saliency map technique to inject the visual information of a reference image into the generated samples. Notably, our method is capable of generating an unlimited number of natural adversarial examples and can attack not only image classification models but also image recognition and retrieval models. We conduct extensive experiments on the ImageNet and CelebA datasets, validating the performance of ScoreAdv across ten target models in both black-box and white-box settings. Our results demonstrate that ScoreAdv achieves state-of-the-art attack success rates and image quality. Furthermore, due to the dynamic interplay between denoising and adding adversarial perturbation in the diffusion model, ScoreAdv maintains high performance even when confronted with defense mechanisms, showcasing its robustness.

HUANG: A Robust Diffusion Model-based Targeted Adversarial Attack Against Deep Hashing Retrieval

Chihan Huang, Xiaobo Shen

AAAI Conference on Artificial Intelligence (AAAI) 2025 Poster

Deep hashing models have achieved great success in retrieval tasks due to their powerful representation and strong information compression capabilities. However, they inherit the vulnerability of deep neural networks to adversarial perturbations. Attackers can severely impact the retrieval capability of hashing models by adding subtle, carefully crafted adversarial perturbations to benign images, transforming them into adversarial images. Most existing adversarial attacks target image classification models, with few focusing on retrieval models. We propose HUANG, the first targeted adversarial attack algorithm to leverage a diffusion model for hashing retrieval in black-box scenarios. In our approach, adversarial denoising uses adversarial perturbations and a residual image to guide the shift from the benign to the adversarial distribution. Extensive experiments demonstrate the superiority of HUANG across different datasets, achieving state-of-the-art performance in black-box targeted attacks. Additionally, the dynamic interplay between denoising and adding adversarial perturbations in adversarial denoising endows HUANG with exceptional robustness and transferability.
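
For readers unfamiliar with hashing retrieval, the following is a minimal sketch of how a database is ranked by Hamming distance to a query code, which is the retrieval step a targeted attack tries to steer. It is not the HUANG algorithm; the function name `hamming_rank`, the 4-bit codes, and the "adversarial" code are all made up for illustration.

```python
import numpy as np

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query (codes in {-1, +1})."""
    k = query_code.shape[0]
    # For +/-1 codes, Hamming distance = (k - inner product) / 2.
    dist = (k - database_codes @ query_code) // 2
    return np.argsort(dist, kind="stable"), dist

# Toy database: rows 0-1 resemble the benign query, rows 2-3 play the "target" class.
database = np.array([[ 1,  1,  1,  1],
                     [ 1,  1,  1, -1],
                     [-1, -1, -1, -1],
                     [-1, -1,  1, -1]])
benign_query = np.array([1, 1, 1, 1])
adversarial_query = np.array([-1, -1, -1, 1])  # hypothetical code after an attack

order_benign, _ = hamming_rank(benign_query, database)
order_adv, _ = hamming_rank(adversarial_query, database)
print("benign ranking:     ", order_benign.tolist())
print("adversarial ranking:", order_adv.tolist())
```

In this toy setting a targeted attack counts as successful when the target-class rows move to the top of the ranking for the perturbed query.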

Efficient Multi-branch Black-box Semantic-aware Targeted Attack Against Deep Hashing Retrieval

Chihan Huang, Xiaobo Shen

International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025 Poster

Deep hashing models have achieved exceptional performance in retrieval tasks due to their robust representational capabilities. However, they inherit the vulnerability of deep neural networks to adversarial attacks. These models are susceptible to finely crafted adversarial perturbations that can lead them to return incorrect retrieval results. Although numerous adversarial attack methods have been proposed, little research has focused on targeted black-box attacks against deep hashing models. We introduce the Efficient Multi-branch Black-box Semantic-aware Targeted Attack against Deep Hashing Retrieval (EmbSTar), capable of executing targeted black-box attacks on hashing models. First, we distill the target model to create a knockoff model. We then devise novel Target Fusion and Target Adaptation modules to integrate and enhance the semantic information of the target label and image. The knockoff model is then used to align the adversarial image more closely with the target image semantically. With the knockoff model, we can mount powerful targeted attacks with few queries. Extensive experiments demonstrate that EmbSTar significantly surpasses previous models in its targeted attack capabilities, achieving state-of-the-art (SOTA) performance for targeted black-box attacks.

PoemBERT: A Dynamic Masking Content and Ratio Based Semantic Language Model For Chinese Poem Generation

Chihan Huang, Xiaobo Shen

International Conference on Computational Linguistics (COLING) 2025 Poster

Ancient Chinese poetry stands as a crucial treasure in Chinese culture. To address the absence of pre-trained models for ancient poetry, we introduced PoemBERT, a BERT-based model utilizing a corpus of classical Chinese poetry. Recognizing the unique emotional depth and linguistic precision of poetry, we incorporated sentiment and pinyin embeddings into the model, enhancing its sensitivity to emotional information and addressing challenges posed by the phenomenon of multiple pronunciations for the same Chinese character. Additionally, we proposed Character Importance-based masking and dynamic masking strategies, significantly augmenting the model's capability to extract imagery-related features and handle poetry-specific information. Fine-tuning our PoemBERT model on various downstream tasks, including poem generation and sentiment classification, resulted in state-of-the-art performance in both automatic and manual evaluations. We provided explanations for the selection of the dynamic masking rate strategy and proposed a solution to the issue of a small dataset size.

2024

Progressive Artistic Aesthetic Enhancement For Chinese Ink Painting Style Transfer

Chihan Huang

European Conference on Artificial Intelligence (ECAI) 2024 Poster

The translation of artistic style is a challenging yet crucial task for both computer vision and the arts, and the unique attributes of Chinese ink painting (such as its use of negative space, brushwork, and ink diffusion) present significant challenges to the application of existing style transfer algorithms. In response to these distinctive characteristics, we propose a progressive artistic aesthetic ink painting style transfer method. The progressive multi-scale aesthetic style attention module in the network leverages the complementary benefits of shallow and deep style information to progressively fuse style features across multiple scales. The covariance transform fusion module addresses issues of stylistic disharmony and enhances the aesthetic quality of the style transfer while effectively preserving the content structure. Additionally, we develop an adaptive spatial interpolation module for fine-tuning detailed information. Finally, we conduct comparative experiments with previous studies as well as ablation studies, and invite 30 experts in art and design to perform manual evaluations. The results demonstrate that our method can achieve more aesthetically pleasing Chinese ink painting style transfers, confirming its effectiveness and artistic integrity.

Data-Driven Lightweight Design of Bumper Based on Multi-methods

Chihan Huang

International Journal of Crashworthiness 2024

Reducing the weight of the bumper helps reduce fuel consumption and pollutant emissions. How to choose a suitable method for lightweight design is a problem that researchers urgently need to solve. This paper establishes a finite element model of an automobile bumper and uses LS-DYNA for collision simulation. First, the shape of the bumper is optimised. Then, bumper beam thickness, energy absorber thickness, thickened area width, and thickened area thickness are selected as design variables to construct response surface models, and the $R^2$ of each model is larger than 0.9. Afterwards, NSGA-II with Bayesian optimisation is adopted to realise the lightweight design. Finally, the optimised design is simulated and compared with the original bumper. The results show that the optimised bumper achieves a weight reduction of 53.96% while improving crashworthiness and energy absorption.
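
As a rough illustration of the response surface step, the sketch below fits a full quadratic surface by least squares and reports the coefficient of determination used as the acceptance check in the abstract. The data are synthetic, the four columns only stand in for the four design variables, and the helper names `quadratic_features` and `fit_response_surface` are hypothetical; the paper's actual model form and samples are not given here.

```python
import numpy as np

def quadratic_features(X):
    """Columns [1, x_i, x_i * x_j for i <= j] of a full quadratic response surface."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def fit_response_surface(X, y):
    """Least-squares fit; returns the coefficients and R^2 on the fitted samples."""
    A = quadratic_features(X)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    r2 = 1.0 - np.sum(residual**2) / np.sum((y - y.mean())**2)
    return coef, r2

# Synthetic stand-in for simulation results: 40 runs, 4 design variables (made-up data).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(40, 4))
y = 2.0 + X[:, 0] - 0.5 * X[:, 1] * X[:, 2] + 0.3 * X[:, 3] ** 2 + rng.normal(0, 0.01, 40)

coef, r2 = fit_response_surface(X, y)
print(f"R^2 = {r2:.4f}")
```

Note that this R^2 is computed on the fitting samples; a held-out or cross-validated value would be a stricter check of surrogate quality.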

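Cross-modality Person Re-identification Based on Fused Attention and Feature Enhancement

Chihan Huang, Xiaobo Shen

Journal of Nanjing University of Information Science & Technology 2024

Cross-modality person re-identification is a challenging task that aims to match pedestrian images between the visible and infrared modalities, and it plays an important role in criminal investigation and intelligent video surveillance applications. To address the weak fine-grained feature extraction of existing cross-modality person re-identification methods, this paper proposes a person re-identification model based on fused attention and feature enhancement. First, automatic data augmentation is used to mitigate viewpoint and scale differences between cameras, and a cross-attention multi-scale Vision Transformer processes multi-scale features to produce more discriminative feature representations. Next, channel attention and spatial attention mechanisms are proposed to learn the information that matters for discriminative features when fusing visible and infrared image features. Finally, a loss function is designed that adopts a hard triplet loss with adaptive weights, which strengthens the correlation between samples and improves the ability to distinguish different pedestrians across visible and infrared images. Extensive experiments on the SYSU-MM01 and RegDB datasets show that the proposed method reaches mAP of 68.05% and 85.19%, respectively, improving on previous work, and ablation studies and comparative analysis verify the advancement and effectiveness of the proposed model.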
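
To make the loss term concrete, here is a minimal sketch of the standard batch-hard triplet loss. It is the plain variant, not the adaptive-weight version described in the abstract (whose weighting scheme is not specified here); the function name, the margin value, and the toy embeddings are assumptions for illustration.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, use its farthest positive and nearest negative.

    Assumes every identity in the batch has at least one sample of a different identity.
    """
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1) + 1e-12)
    same = labels[:, None] == labels[None, :]
    # Hardest positive: farthest sample sharing the anchor's identity.
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    # Hardest negative: closest sample with a different identity.
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()

# Toy 2-D embeddings; in the cross-modality setting these would come from visible and infrared images.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]])
loss_separated = batch_hard_triplet_loss(emb, np.array([0, 0, 1, 1]))
loss_mixed = batch_hard_triplet_loss(emb, np.array([0, 1, 0, 1]))
print(loss_separated, loss_mixed)
```

When identities are well separated the loss vanishes, and when samples of different identities sit closer together than samples of the same identity the loss is large, which is the signal that pushes the embedding toward identity-consistent clusters.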

Optimisation Design of Child Seat Based on Chaos Mutation MOPSO

Chihan Huang

International Journal of Crashworthiness 2024

To improve the safety of children in child seats during a vehicle collision, a finite element model of a child safety seat is established, and LS-DYNA is used for crash simulation. Backrest inclination, headrest inclination, and seat belt centre height are selected as design variables to establish response surface models. The application of chaotic mutation theory in particle swarm optimisation (PSO) is proposed, and the optimal child seat parameters are obtained using chaotic mutation based grouped multi-objective particle swarm optimisation (MOPSO). The optimal solution is then simulated again and compared with the original performance. The Head Performance Criterion (HPC) is 47.08% better than that of the original child seat. In addition, rib deformation is reduced by 21.30%, neck shear stress by 55.74%, and thigh stress by 48.55%. The results suggest that a particular parameter combination of the child seat gives the best overall performance in child injury prevention, with significant reductions in injury to all measured body regions of the child.
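
The idea of a chaotic mutation can be sketched as follows. This is only an illustration under stated assumptions: it uses the logistic map as the chaotic sequence (a common choice, but the abstract does not say which map the paper uses), and the bounds, `strength` parameter, and function names are hypothetical.

```python
import numpy as np

def logistic_map(z):
    """One step of the logistic map z <- 4 z (1 - z), chaotic on (0, 1)."""
    return 4.0 * z * (1.0 - z)

def chaotic_mutation(position, lower, upper, z, strength=0.1):
    """Perturb a particle with a chaotic variable and clip it back into the bounds.

    Returns the mutated position and the advanced chaotic state.
    """
    z_next = logistic_map(z)
    # Map z in (0, 1) to a signed step of at most `strength` times the variable range.
    step = strength * (upper - lower) * (2.0 * z_next - 1.0)
    return np.clip(position + step, lower, upper), z_next

# Hypothetical bounds for backrest inclination, headrest inclination, seat belt centre height.
lower = np.array([10.0, 0.0, 300.0])
upper = np.array([30.0, 15.0, 400.0])
position = (lower + upper) / 2.0
z = np.array([0.31, 0.47, 0.73])  # initial chaotic state, avoiding the map's fixed points

mutated, z_next = chaotic_mutation(position, lower, upper, z)
print(mutated, z_next)
```

In a PSO loop such a mutation would typically be applied to some particles each iteration to help the swarm escape local optima, with `z_next` carried over as the state for the next call.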
