Large Language Models (LLMs) have demonstrated significant potential across various domains, including digital security. Recent studies have explored their ability to assist in mitigating code vulnerabilities; however, their effectiveness in mobile application security analysis remains an open question. This work systematically evaluates the capabilities of LLMs in two key areas: mobile code vulnerability detection and malware family classification. First, we assess the performance of multiple LLMs in identifying vulnerabilities in Android code, using an open dataset of over 100 vulnerable code samples from the Open Worldwide Application Security Project (OWASP). The evaluation focuses on the models’ ability to detect security flaws and to verify whether the permissions declared in an application’s manifest file align with its actual behavior. This analysis provides insights into the accuracy, strengths, and limitations of different LLMs in static code security assessment. Second, we investigate the potential of LLMs for malware family classification. Using a compiled dataset of malware samples labeled by family, we evaluate whether LLMs can accurately categorize malicious applications into their respective families. The study examines each model’s classification accuracy, consistency, and effectiveness in identifying the distinguishing characteristics of malware families. The findings of this research provide a comprehensive understanding of LLMs’ performance in mobile security analysis, highlighting their strengths, limitations, and areas for improvement. These findings also contribute to the growing body of knowledge on AI-driven security solutions and help advance the automation of mobile threat detection.