Current 3D style transfer methods typically rely on a reference image to incorporate style into 3D content. In many practical applications, however, users wish to specify styles through textual descriptions rather than being limited to a fixed reference image. To address this limitation, we introduce M2StyleGS, a novel real-time styling technique. Built on a 3D Gaussian Splatting backbone, it leverages the multimodal knowledge distilled from CLIP to generate a sequence of precisely color-mapped novel views, achieving instant 3D style transfer. M2StyleGS resolves abnormal color transformations through a precise feature-alignment process termed "subdivisive flow," which accurately projects the style domain of the mapped CLIP features onto the style domain of the VGG features. In addition, we introduce an auxiliary observation loss and a suppression loss to enhance visual quality. By integrating these techniques, M2StyleGS can take either text or images as style references and generate a series of style-enhanced novel views. Experimental results show that M2StyleGS surpasses our baseline, ConRF, by up to 25% in consistency and visual quality, as measured by RMSE and LPIPS.