Phishing Website URL’s Detection Using NLP and Machine Learning Techniques

EasyChair Preprint 14123

13 pages • Date: July 25, 2024

Abstract

Phishing attacks continue to pose a significant threat to internet users, with cybercriminals constantly devising new methods to trick individuals into disclosing sensitive information or installing malware. Traditional approaches to phishing detection, such as blacklists and heuristic-based methods, have proven limited in effectiveness because they struggle to keep pace with phishers' evolving tactics. This paper proposes a novel approach to detecting phishing websites using natural language processing (NLP) and machine learning techniques.

The proposed method involves a comprehensive analysis of URL components, leveraging NLP techniques to extract lexical, semantic, and sentiment-based features from the URLs. These features are then used to train various supervised and unsupervised machine learning models, including logistic regression, support vector machines (SVMs), random forests, and ensemble methods. The performance of the models is evaluated on a large dataset of legitimate and phishing URLs, using metrics such as accuracy, precision, recall, and F1-score.
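As a rough illustration of what lexical feature extraction from a URL can look like, the following is a minimal sketch using only the Python standard library. It is not the paper's feature set: the feature names, the `SUSPICIOUS_TOKENS` keyword list, and the `lexical_features` and `shannon_entropy` helpers are all assumptions made for this example, and the semantic and sentiment-based features mentioned in the abstract are not covered.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical keyword list, chosen for illustration only (not from the paper).
SUSPICIOUS_TOKENS = {"login", "verify", "secure", "account", "update"}


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(url: str) -> dict:
    """Map a URL string to a small dictionary of lexical features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Split the lowercased URL into alphanumeric word tokens.
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        # Raw IPv4 address in place of a domain name.
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "uses_https": parsed.scheme == "https",
        "num_suspicious_tokens": sum(t in SUSPICIOUS_TOKENS for t in tokens),
        "host_entropy": shannon_entropy(host),
    }
```

Calling `lexical_features("http://192.168.0.1/secure-login/verify.php?id=123")` yields one feature dictionary per URL; a list of such dictionaries could then be converted to a numeric matrix and passed to any of the classifiers named above.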

The results demonstrate that the combination of NLP and machine learning outperforms traditional phishing detection methods, achieving an accuracy of over 95% in identifying phishing websites. The analysis of the most informative features reveals that both lexical and semantic aspects of the URL are crucial in distinguishing legitimate from phishing websites. The proposed approach also shows promising results in detecting novel, previously unseen phishing attempts, highlighting its potential as a valuable tool in the ongoing battle against cybercrime.
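For readers unfamiliar with the evaluation metrics named in the abstract, the sketch below shows how accuracy, precision, recall, and F1-score follow from the four confusion-matrix counts for a binary classifier. The function name `classification_metrics` and the label convention (1 = phishing, 0 = legitimate) are assumptions for this example; the paper's own evaluation code is not shown in the abstract.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed convention for this sketch: 1 = phishing, 0 = legitimate.
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Accuracy alone can be misleading when phishing URLs are a small share of the data, which is one reason to report precision and recall alongside it.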

Keyphrases: Cybercriminals, NLP techniques, Traditional Approaches, machine learning

Links:

https://easychair.org/publications/preprint/wqxJ

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:14123,
  author    = {John Owen},
  title     = {Phishing Website URL’s Detection Using NLP and Machine Learning Techniques},
  howpublished = {EasyChair Preprint 14123},
  year      = {EasyChair, 2024}}
