CoNLL'2000 - Abstract

Combining text and heuristics
for cost-sensitive spam filtering

José María Gómez Hidalgo*, Manuel J. Maña López**, and Enrique Puertas Sanz***

* Departamento de Inteligencia Artificial, Universidad Europea de Madrid – CEES (Spain)

email: jmgomez@dinar.esi.uem.es

** Departamento de Lenguajes y Sistemas Informáticos, Universidad de Vigo (Spain)

email: mjlopez@uvigo.es

*** Escuela Superior de Informática, Universidad Europea de Madrid – CEES (Spain)

email: epsilonmail@retemail.es

ABSTRACT

Spam filtering is a text categorization task with special features that make it both interesting and difficult. First, the task has traditionally been performed using domain-specific heuristics. Second, a cost model is required to avoid misclassifying legitimate messages. We present a comparative evaluation of several machine learning algorithms applied to spam filtering, considering both the text of the messages and a set of heuristics for the task. Cost-oriented biasing and evaluation are performed.
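
As an illustration of why a cost model matters, the following minimal sketch shows one common way to bias a probabilistic classifier and score it under asymmetric costs. It is not taken from the paper: the names `spam_threshold`, `classify`, `total_cost` and the parameter `cost_ratio` are hypothetical, and the sketch assumes that blocking a legitimate message is `cost_ratio` times as costly as letting a spam message through.

```python
def spam_threshold(cost_ratio):
    """Probability of spam above which labelling a message as spam has
    lower expected cost than labelling it as legitimate.

    Assumption: a missed spam costs 1 and a blocked legitimate message
    costs cost_ratio. Labelling as spam is cheaper when
    (1 - p) * cost_ratio < p, i.e. p > cost_ratio / (1 + cost_ratio).
    """
    return cost_ratio / (1.0 + cost_ratio)


def classify(p_spam, cost_ratio):
    """Label a message given the classifier's estimated spam probability."""
    return "spam" if p_spam > spam_threshold(cost_ratio) else "legitimate"


def total_cost(true_labels, predicted_labels, cost_ratio):
    """Sum the misclassification costs over a labelled test set."""
    cost = 0.0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == "legitimate" and predicted == "spam":
            cost += cost_ratio  # legitimate message blocked
        elif truth == "spam" and predicted == "legitimate":
            cost += 1.0  # spam message let through
    return cost
```

With equal costs (`cost_ratio` of 1) the threshold is 0.5; as the cost of blocking legitimate mail grows, the threshold moves toward 1 and the filter becomes more conservative.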