Research


My research centered on Artificial Intelligence, specifically:
  • Statistical Machine Learning, in particular its application to real-world challenges. I have explored statistical machine learning solutions to natural language classification tasks, including spam filtering and document classification.
  • Active Learning is the subset of Artificial Intelligence that examines the specific problem of applying supervised learning algorithms in domains where labelled data is scarce. While supervised learning techniques reduce the overall burden of producing models for classification, there are many domains where the cost of collecting a sufficiently large training set is prohibitive (for example, a domain where the correct label can only be found through expensive tests). Active Learning addresses this problem by allowing the model to select its training data from a pool (or stream) of unlabelled data. Only the selected examples require labelling, which can reduce labelling effort by orders of magnitude. The main benefit is this reduction in labelling effort, but active learning can also be used to track concept drift.
  • Sentiment Analysis is an interesting classification task in which the label we wish to predict is the opinion (positive or negative) expressed in natural language text. This is an emerging research area with many new and interesting applications. I have worked on examining and predicting the polarity of financial blog data. Being able to automatically predict the polarity of text has direct applicability to marketing and advertising campaigns.
  • Dimensionality Reduction is a machine learning technique whereby high-dimensional data is represented by a much smaller set of features that still retains the salient information. High-dimensional data is a particular problem in text classification tasks, where the number of features used to represent documents can range from tens of thousands to hundreds of thousands. Dimensionality reduction is a useful tool for reducing the size of the data, giving significant savings in both the computational and memory requirements of a classification system.
  • Reinforcement Learning is an alternative way of training classification systems whereby, instead of being supplied with labelled training data, the learner is given rewards or punishments based on its actions. I investigated reinforcement learning techniques for use in active learning, where a number of alternative selection strategies were available to the learner in each iteration. The learner utilised reinforcement learning techniques to choose adaptively between the alternative strategies based on their relative performance. The goal was to track, as closely as possible, the optimal strategy in each iteration of active learning.
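
As a rough illustration of the pool-based active learning loop described above, the following Python sketch uses uncertainty sampling, one common selection strategy. It is a toy under stated assumptions rather than the method used in the research itself: the 1-D logistic model, the function names (train_logistic, predict_proba, oracle, active_learn), and the decision boundary at 0.3 are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def train_logistic(examples, epochs=200, lr=0.5):
    """Fit a 1-D logistic regression (weight, bias) by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b


def predict_proba(model, x):
    """Return the model's estimated probability that x belongs to class 1."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def oracle(x):
    """Hypothetical stand-in for the expensive labelling step (assumed boundary at 0.3)."""
    return 1 if x > 0.3 else 0


def active_learn(pool, seed_examples, budget):
    """Repeatedly query the pool example the current model is least certain about."""
    labelled = list(seed_examples)
    pool = list(pool)
    for _ in range(budget):
        model = train_logistic(labelled)
        # Uncertainty sampling: choose the example whose probability is closest to 0.5.
        query = min(pool, key=lambda x: abs(predict_proba(model, x) - 0.5))
        pool.remove(query)
        labelled.append((query, oracle(query)))  # only queried examples are labelled
    return train_logistic(labelled), labelled


random.seed(0)
pool = [random.uniform(-1.0, 1.0) for _ in range(200)]
seed_examples = [(-1.0, 0), (1.0, 1)]
model, labelled = active_learn(pool, seed_examples, budget=10)
```

Here only 10 of the 200 pool examples are ever sent to the oracle; the remaining 190 stay unlabelled. In practice the selection strategy, the model, and the stopping rule would all depend on the domain.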