Theoretical:
Generalization:

Deep learning models have seen enormous success in mining data in a supervised learning paradigm. However, because of the large amounts of data required to train these models, their success is limited to applications for which such datasets are available. The unavailability (or limited availability) of large labeled datasets mainly arises from a combination of: i) the enormous cost, time, and labor required to collect data in large quantities (e.g., acoustic signals to study autism); ii) the lack of domain expertise to label the collected data (e.g., the medical domain); or iii) insufficient time to label the data for a real-world application (e.g., early detection of sepsis, fraud detection, cybersecurity).

Multiple computational solutions have been proposed to work around the bottleneck of obtaining labeled training data, including active learning, semi-supervised learning, and transfer learning. However, all of these learning paradigms still require some amount of labeled training data. Two other possible solutions are:

i) Weak Supervision. In this technique, instead of asking domain experts to manually label data, we ask for descriptions of the relevant patterns/classes. Crowdsourcing or multiple heuristics then provide imprecise/inaccurate labels for the data, which are combined by a probabilistic machine learning model, or another computational model, to produce a single set of clean labels for the originally unlabeled data. The prime purpose here is to develop heuristics for providing labels and computational models for combining them (a minimal sketch follows this list).

ii) Knowledge Incorporation. Is it possible to encode knowledge into machine learning/AI models from the literature or from textbooks, so that they can learn without depending on collected, labeled training data? The prime purpose here is to develop knowledge encodings that a machine can recognize (a second sketch below illustrates one possible encoding).
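
To make the weak supervision idea concrete, here is a minimal sketch in Python. The labeling functions, the note texts, and the URGENT/ROUTINE task are all invented for illustration; the simple majority-vote combiner stands in for the probabilistic label models (e.g., Snorkel's) typically used in practice.

```python
import numpy as np

# Hypothetical task: label short clinical notes as URGENT (1) or ROUTINE (0).
# Each labeling function encodes one domain heuristic and may abstain (-1).
def lf_keywords(note):
    return 1 if any(k in note.lower() for k in ("chest pain", "sepsis")) else -1

def lf_followup(note):
    return 0 if "routine follow-up" in note.lower() else -1

def lf_vitals_flag(note):
    return 1 if "abnormal vitals" in note.lower() else -1

notes = [
    "Patient reports chest pain, abnormal vitals recorded.",
    "Routine follow-up after knee surgery.",
    "Mild headache, no other complaints.",
]

LFS = [lf_keywords, lf_followup, lf_vitals_flag]
votes = np.array([[lf(n) for lf in LFS] for n in notes])  # shape (n_notes, n_lfs)

# Combine the noisy votes by majority vote over non-abstaining LFs; a
# probabilistic label model would additionally estimate each LF's accuracy.
labels = []
for row in votes:
    valid = row[row != -1]
    labels.append(int(round(valid.mean())) if valid.size else None)  # None = no signal
print(labels)  # e.g., [1, 0, None]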
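For knowledge incorporation, one common partial step is to encode a literature-derived rule as a constraint on a model, so that knowledge supplements, and reduces reliance on, labeled data. The sketch below is an assumption-laden illustration, not an established pipeline: a hypothetical clinical rule ("predicted risk should not decrease with age") is enforced as a penalty on a logistic regression's age coefficient, and the data are synthetic.

```python
import numpy as np

# Synthetic setup (for illustration only): predict risk from [age, biomarker].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.zeros(2), 0.0
lr, lam = 0.1, 1.0  # lam weighs the knowledge penalty lam * max(0, -w[0])

for _ in range(500):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)  # gradient of mean cross-entropy
    grad_b = np.mean(p - y)
    # Literature-derived rule: risk is non-decreasing in age, i.e., w[0] >= 0.
    # The penalty's gradient w.r.t. w[0] is -lam whenever w[0] is negative.
    if w[0] < 0:
        grad_w[0] -= lam
    w -= lr * grad_w
    b -= lr * grad_b

print("learned weights:", w)  # the age coefficient w[0] stays non-negative
```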

Interpretability:

In computer science, optimizing an algorithm's run time and memory use is just as important as obtaining robust and accurate results, and this is often achieved at the expense of explainable computation. However, in the current age of interdisciplinary research, explainable computation is becoming necessary to promote the trustworthiness and reliability of computer-aided systems. At the intersection of computer science and other fields, we define comprehensive interpretability as the ability of a computational model to describe its decisions not only to a computational expert or programmer but also to any client who relies on computer-aided systems. Thus, the overall encompassing aim of this lab is to:
i) increase the interpretability of current computational algorithms (especially those used in machine learning and AI),
ii) develop new explainable algorithms (a minimal illustration follows this list), and
iii) develop knowledge-base libraries to improve comprehensive interpretability.
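
As a small illustration of the first two aims, an inherently explainable model can present its decisions as plain if/then rules that a non-expert can read. The sketch below uses scikit-learn's DecisionTreeClassifier and export_text on the Iris dataset; the dataset and the depth cap are illustrative assumptions, not the lab's actual methods.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a small, inherently interpretable model; the depth is capped so the
# resulting rules stay short enough for a non-expert to read.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# export_text renders the fitted tree as nested if/then rules in plain text.
print(export_text(tree, feature_names=list(data.feature_names)))
```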

Applications:

Some application areas that the lab currently focuses on:

i) Clinical Informatics: Using clinical data (heart rate, glucose level, EEG, EMG, etc.) for explainable diagnosis, prediction, prevention, and synthesis

ii) Neuro-Informatics: Using images from PET, MRI, and fMRI scans to study the structure and function of the brain as they relate to various diseases, disorders, etc.

iii) Social Informatics: Predicting and explaining socio-economic factors as they relate to health

iv) Bioinformatics: Protein structure and folding; Genomics