*Mathematical Foundations of Data Science* – My research is on the mathematical foundations of data science, particularly the design and analysis of algorithms that are provably effective at extracting information from the available data under resource constraints. In the big data regime, surprisingly, many of the problems we are interested in are ill-posed in the classical sense, because much of the data we collect is very messy and contains many missing entries.

The number of parameters we want to recover can sometimes be much higher than the number of observations we have. To solve such ill-posed problems, we have to exploit prior information about the data. Fortunately, much high-dimensional data has interesting low-dimensional structure that we can exploit.
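
To make this concrete, here is a minimal sketch in Python of one standard instance of the idea: recovering a sparse vector from fewer linear observations than unknowns. It is an illustration only, not one of the algorithms from the research described here. The recovery method (orthogonal matching pursuit), the random Gaussian measurement matrix, the problem sizes, and all names such as `omp` and `x_true` are assumptions chosen for the example.

```python
import numpy as np


def omp(A, y, k):
    """Greedy recovery of a k-sparse x from y = A @ x
    (orthogonal matching pursuit)."""
    n = A.shape[1]
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick a column twice
        support.append(int(np.argmax(correlations)))
        # Refit by least squares on the selected columns only.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat


rng = np.random.default_rng(0)
n, m, k = 400, 120, 4  # unknowns, observations, nonzeros (assumed sizes)

# Hypothetical measurement model: random Gaussian matrix, no noise.
A = rng.standard_normal((m, n)) / np.sqrt(m)

# Ground truth with only k nonzero entries (the low-dimensional structure).
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
y = A @ x_true

# Without a prior: the minimum-norm solution fits y but is not x_true.
x_min_norm = np.linalg.pinv(A) @ y
# With the sparsity prior: greedy sparse recovery.
x_sparse = omp(A, y, k)

err_min_norm = np.linalg.norm(x_min_norm - x_true)
err_sparse = np.linalg.norm(x_sparse - x_true)
print(f"minimum-norm error:    {err_min_norm:.3e}")
print(f"sparse-recovery error: {err_sparse:.3e}")
```

Both estimates reproduce the observations `y`, so the data alone cannot tell them apart. The sparsity assumption is what singles out the right one, which is the sense in which prior structure makes an ill-posed problem solvable.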

My research is primarily on the design of efficient data representations that let the algorithm exploit such structure. We then try to establish provable performance guarantees for these algorithms. Our research has led to new algorithms for several imaging modalities that are very important in science and engineering applications.

One example is an imaging technology called super-resolution fluorescence microscopy. Together with our collaborators, who are the domain experts, we developed new algorithms that achieve much higher resolution, both in space and in time, using the same available data and fewer computational resources.

The other example is phase retrieval, another interesting imaging modality that has many applications, for example in crystallography and astronomy. We have designed new algorithms for the phase retrieval problem that are much more scalable and much more robust than existing solutions.

Many of the big data algorithms people use in practice are heuristics. They have been very successful, but they lack theoretical understanding. So our goal is to understand why such simple heuristics work so well in practice: can we theoretically analyze why they are so successful? And hopefully, our analysis can shed light on how to design new algorithms that improve on their practical success.

So I imagine data science to be a truly interdisciplinary field. To do this work, we have to collaborate very closely with domain experts in different engineering and science domains, who actually have the data, and also with theoreticians from many different fields: statistics, mathematics, electrical engineering, and computer science.
