A Comparative Study of Text-to-Image Synthesis Techniques Using Generative Adversarial Networks
Faculty of Engineering - Informatics - Al-Wataniya Private University
Abstract:
Text-to-image synthesis using Generative Adversarial Networks (GANs) has become a pivotal area of research, with significant potential in automated content generation and multimodal understanding. This study provides a comparative evaluation of six prominent GAN-based models (the foundational work by Reed et al., StackGAN, AttnGAN, MirrorGAN, MimicGAN, and In-domain GAN Inversion) applied to a standardized dataset under consistent conditions. The analysis focuses on four key performance dimensions: visual quality, semantic alignment between text and image, training stability, and robustness to noise in textual input. The results reveal a clear progression in model capability over time. Early models laid essential groundwork but were limited in resolution and semantic coherence. Subsequent models introduced architectural innovations such as multi-stage generation, attention mechanisms, and semantic feedback loops, which significantly enhanced image fidelity and alignment with textual descriptions. Notably, AttnGAN and MirrorGAN achieved strong alignment performance due to their attention and redescription modules, respectively. MimicGAN demonstrated superior robustness to noisy or ambiguous inputs, addressing a critical gap in earlier approaches. In-domain GAN Inversion, though not a traditional text-to-image method, offered high image quality and valuable insights for latent-space manipulation. Overall, the comparative findings highlight the trade-offs between model complexity and performance gains. Advances in attention, robustness, and semantic feedback have led to more reliable and realistic image synthesis. This study contributes a structured overview of current approaches and identifies pathways for future research aimed at balancing accuracy, interpretability, and generalizability in text-to-image systems.