### Statistical Learning with Similarity and Dissimilarity Functions

*MPI Series in Biological Cybernetics, Vol. 10*

### Ulrike von Luxburg

ISBN 978-3-8325-0767-1

166 pages, year of publication: 2004

price: 40.50 €

This work explores statistical properties of machine learning algorithms from different perspectives. It investigates questions arising in both supervised and unsupervised learning, covering diverse issues such as the convergence of algorithms, the speed of convergence, generalization bounds, and how statistical properties can be used in practical machine learning applications. All topics covered share one feature: the properties of the similarity or dissimilarity function on the data play an important role.

Learning is the process of inferring general rules from given examples. The examples are instances of some input space (pattern space), and the rules can consist of some general observation about the structure of the input space, or have the form of a functional dependency between the input and some output space. Two types of learning problems are considered: classification and clustering. In both problems, the goal is to divide the input space into several regions such that objects within the same region "belong together" and "are different" from the objects in the other regions. The difference between the two problems is that classification is a supervised learning technique while clustering is unsupervised.

Machine learning algorithms are usually designed to deal with either similarities or dissimilarities. In general it is recommended to choose an algorithm which can deal with the type of data given, but sometimes it may become necessary to convert similarities into dissimilarities or vice versa. In some situations this can be done without losing information, especially if the similarities and distances are defined by a scalar product in a Euclidean space. If this is not the case, several heuristics can be invoked. The general idea is to transform a similarity into a dissimilarity function or vice versa by applying a monotonically decreasing function. This follows the general intuition that a distance is small if the similarity is large, and vice versa. The connection between information theory and learning can be exploited in everyday machine learning applications.
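The conversion described above can be illustrated with a minimal Python sketch. It is not taken from the book: the function names are hypothetical, the scalar-product case uses the standard identity d(x, y)² = s(x, x) + s(y, y) − 2 s(x, y), and the two heuristic maps (`1 - s` for similarities scaled to [0, 1], and a Gaussian-type `exp(-d² / scale)`) are only common examples of monotonically decreasing transformations, chosen here as assumptions.

```python
import math


def dot(x, y):
    """Standard scalar product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def distance_from_scalar_product(x, y, s=dot):
    """Lossless case: distance induced by a scalar product s.

    Uses d(x, y)^2 = s(x, x) + s(y, y) - 2 * s(x, y).
    """
    squared = s(x, x) + s(y, y) - 2.0 * s(x, y)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


def scalar_product_from_distance(x, y, origin, d):
    """Inverse direction: recover the scalar product relative to a chosen origin.

    Uses s(x, y) = (d(x, o)^2 + d(y, o)^2 - d(x, y)^2) / 2.
    """
    return 0.5 * (d(x, origin) ** 2 + d(y, origin) ** 2 - d(x, y) ** 2)


def dissimilarity_from_similarity(s_value):
    """Heuristic (assumed): d = 1 - s for a similarity scaled to [0, 1]."""
    if not 0.0 <= s_value <= 1.0:
        raise ValueError("similarity must lie in [0, 1] for this heuristic")
    return 1.0 - s_value


def similarity_from_distance(d_value, scale=1.0):
    """Heuristic (assumed): Gaussian-type map s = exp(-d^2 / scale)."""
    return math.exp(-(d_value ** 2) / scale)
```

Both heuristics are monotonically decreasing, so a large similarity maps to a small distance and vice versa. Unlike the scalar-product identity, they generally do not preserve all information about the original function, and the `scale` parameter has to be chosen for the data at hand.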