# Research profile

I have a strong interest in applications of algebraic topology, and in particular the applications that come through the techniques of persistent homology. Even so, I have a pure mathematics background, with more pure algebra than most of the active researchers in the computational and applied algebraic topology field. As such, I am uniquely poised to work with, in a sense, knowledge transfer.

On the one hand, there is much to be gained applying algebraic insights and techniques on computational topology and even in computational geometry. Several of my currently ongoing projects deal with this, most notably the parallelization work with Primoz Skraba and David Lipsky, and a spin-off from that focused on the concrete use of computational algebra to handle persistent homology of chain complexes with torsion.

On the other hand, a rich source of research that I intend to investigate is the reflow of insights from the applications and computations field back into pure mathematics. A preprint by Simon King and Graham Ellis http://arXiv.org/abs/1006.2237 investigates and evaluates persistent homology of the central subgroup series of finite groups as a group invariant. Similarly, the use of barcodes as a tool in homological algebra seems a fruitful area to mine. For another reflow approach, I am involved in a recently started project jointly with Jon Hauenstein (Texas A&M), Chris Peterson (Colorado State), David Eklund and Martina Scolamiero (KTH), and Primoz Skraba dedicated to using the techniques of persistent homology to compute topological invariants of algebraic varieties.

**Concrete approaches:** A number of immediate approach plans are already evident to me. For the flow from pure to applied, I am deeply interested in approaching the foundational aspects of persistent homology. There are ways to interpret persistent homology as being homology computed with coefficients in the ring *k[t]*, for a field *k*. This interpretation can guide both algorithms and methods in far deeper ways than it currently does. Furthermore, there is an interpretation of persistent homology as being the internal simplicial homology theory of a topos of sheaves over a particular site — and this approach would be deeply interesting to investigate for further results.

In the other direction, some immediate approaches are to pick up on the work by King and Ellis, and investigate filtered objects in homological algebra and the usefulness of the persistence algorithms and methods on these; as well as taking areas of algebraic geometry and topological combinatorics where the objects can reasonably be *sampled*, and using persistent homology to estimate topological features for further use within the respective fields.

**Grant plans:** To fund my research and that of students and
guests, I plan to apply for a number of funding sources. I have
funding already through the project Toposys, funded under the
7th Framework Programme, jointly with Primoz Skraba (Slovenia),
Herbert Edelsbrunner (Austria), Danica Kragic (KTH), Robert Adler
(Israel), and Marion Mrozek (Poland). Furthermore, I expect to put
together and submit a proposal to ERC for a starting researcher grant
— I submitted one proposal in 2010. I have been submitting project grant
proposals yearly since 2010.

**Dissemination plans:** The research intended to apply pure insights on applied themes is most appropritely published in the fields where they will most easily reach their intended audiences. This means, concretely, that the chief publication venues will be ACM Symposium on Computational Geometry, Discrete and Computational Geometry, as well as machine learning venues and even the high profile science journals such as Science, or Proceedings of the National Academy of Science.

As for the application of applied insights on pure themes, these will be most appropriately oriented after the area the insights are applied in. I expect Journal of Pure and Applied Algebra, Geometry and Topology and similar journals to be good starting points.

In addition to the pure publication aspects, there will be a number of highly interesting conferences to disseminate through. The biennial Algebraic Topology: Methods, Computation and Science meetings are a mainstay of the field, as are the theoretical computer science conferences ACM Symposium on Computational Geometry, IEEE Visualization, ACM Symposium on Discrete Algorithms and IEEE Symposium on Foundations of Computer Science.

## Work packages and timelines

Over the next few years, there are several already articulated projects I will work on. With external funding in place, I would build a group of myself, a postdoctoral researcher and 1—2 PhD students, and likely accelerate some of the time estimates below as a result.

**Statistics in TDA** In a project with a CUNY undergraduate student, Alisa Leshchenko, we have
explored statistical methods to study the quality of Mapper models.

From these studies I have also started a research project, joint with Prof. Sayan Mukherjee into the statistics of multiple hypothesis testing in TDA.

**Topological process diagnostics** Jointly with Pär Jonsson and our joint students Mattia de Colle (MSc 2016) and Leo Carlsson (MSc 2015, PhD ongoing), we are studying applications of Mapper to process diagnostics: by including measurement error in the parameters of a Mapper analysis, we produce geometric models for failure modes of industrial predictive algorithmics.

**Point processes and topological methods** My collaboration with Jonathan Peters calls for spatial point process statistics, with data sizes and data structures that fall outside the main track of contemporary spatial statistics. We plan to study potential extensions of classical meth- ods and explore potential applications of persistent homology, alpha-shapes and Mapper to the construction of new statistical approaches and invariants. This builds on recent results by Biscio and Møller.

**Mathematical Software** I have been working extensively at creating and maintaining mathemat- ical software. My software projects include:

- An implementation of Dotsenko and Khoroshkhin’s definition of Operadic Gröbner Bases in Haskell,
- the jPlex and JavaPlex Java-based libraries for persistent homology and topological data anal- ysis,
- modules for the Computer Algebra System Magma that implement A∞ algebra computations and basic persistent homology algorithms,
- modules for the Computer Algebra System GAP that implement persistent homology algorithms.

JavaPlex is overdue for an update - I am currently pursuing funding for an overhaul of the code base.

**Statistics** In a more classical statistics consulting role, I work with Prof. Fumio Someki on statis- tical validation of translated rating scales and diagnostic instruments for ADHD. We have one paper published from our collaboration so far, another paper under review and two more currently being written.

**Commutative algebra in persistent homology** The interpretation
of persistent homology as homology with coordinates in *k[t]* induces new algorithms and new constructions of known results
in the field. was accepted for publication in 2013,
and we see a number of followup directions of research.

**Topos-theory in persistent homology** Persistent homology as internal to a particular topos of sheaves is an idea that requires some work before its implications can be clearly seen. At my current rate of approach, this project will be publishable by 2015.

**Parallelization of persistent homology** The recovery of a parallelization scheme from spectral sequences turned out to be more difficult than expected. Some first results were on the arXiv already in 2011, but a report solving the problem completely jointly with David Lipsky, Dmitriy Morozov, and Primoz Skraba will likely not appear earlier than 2015. Tuning and investigating the algorithms and reaping all follow-up results from this will likely generate work for years to come.

**Persistent homology of algebraic varieties** This project, while already started, generates new questions and the need for deeper research — both on the algebraic geometry side, where a dense sampling paradigm for algebraic varieties has turned out to be a crucial aspect, and on the topology side, where the need for faster and stronger algorithms has been made clear by our initial experiments. The project might yield output during 2015—2016.

**Circle-valued coordinates in signals** Jointly with Primoz Skraba
and Vin de Silva, we are investigating applications of circle-valued
coordinates and persistent cohomology to periodic signals. First
results were presented at a NIPS workshop in 2012, and we are
currently finalizing a more complete report.

This direction has also spawned work jointly with Florian Pokorny, Primoz Skraba, and Danica Kragic on analyzing bipedal locomotion with circular coordinates. A journal paper is currently undergoing peer review.

**Opportunistic signals** Jointly with Robert Ghrist, Michael
Robinson and Justin Curry (all at the University of Pennsylvania),
we investigate topological coordinatization methods using
opportunistic signals — geo-location with wireless network signal
intensities has in initial experiments given very promising
results. We hope to have a detailed technical paper during 2015.

**Topological Machine Learning** I have been increasingly
interested in applications of techniques from topological data
analysis to machine learning. I have been working out potential
applications of Mapper to automatically propose model
spaces in semantics together with Jussi Karlgren and Hedvig
Kjellström at KTH. Furthermore, jointly with Jesse Berwald (IMA,
University of Minnesota) and Marian Gidea (Yeshiva University), I
have been investigating the uses of betti barcodes as features for
analyzing spatial time series and dynamical systems with machine
learning techniques.

Below, I will expand on some of these projects.

## A-infinity algebras

In my doctoral dissertation I studied the A-infinity-algebra structure
induced on the group cohomology ring H(G,k) from the presentation
of H(G) = Ext_{kG}(k,k) = H(hom_{kG}(pk,pk)) for a
projective resolution pk of the trivial G-module k over the
group ring kG for a field k. This presentation demonstrates that
H(G,k) is the homology of a dg-algebra, and thus by a theorem
proven by Kadeishvili (also proven and extended
by several other authors) has an induced A-infinity-algebra
structure. From the proof by Kadeishvili an algorithm for the
computation of an A-infinity-algebra structure extends.

I proved (*Blackbox computation of
A-infinity algebras*) that the resulting
algorithm can be used to compute A-infinity-structures of a certain
class of dg-algebras in finite time, using extra structure on the
dg-algebra used, and that detection of failure for this approach can
be detected during computation. Furthermore, I provide a condition on
whether the entire A-infinity-algebra structure has already been computed from an
initial fragment of the structure computed.

As an example of the technique, I provide an independent proof that
the structure for a canonical A-infinity-structure on H(C_{n},k) as
described by Dag Madsen in his doctoral thesis is the correct
A-infinity-structure.

I proved that the A-infinity-structure induced
by using the Saneblidze—Umble diagonal construction on the known A-infinity-algebra structures
for H(C_{n},k) and H(C_{m},k) to provide an A-infinity-algebra
structure on H(C_{n} x C_{m},k) has non-trivial operations in two
arities further than previously documented. (*A partial A-infinity structure on the cohomology of CnxCm*)

## Persistent cohomology and circular coordinates

In collaboration with Vin de Silva and Dmitriy Morozov, I studied an
extension of the persistence algorithm to compute persistent
cohomology. In *Persistent Cohomology and Circular coordinates*, we use
this, together with the identification of S^{1} as the Eilenberg-Mac
Lane-space K(ℤ; 1) to convert representative cocycles for
degree 1 persistent cohomology of point clouds into circle-valued
continuous functions on the point clouds. Key in this process is to
smooth the resulting circle-valued function after the conversion in
order to get a (persistently) homotopic coordinate function. We
demonstrate that a smoothing may be done by projection from the space
of real cocycles to the space of Hodge cocycles, and that this
projection corresponds to an L_{2}-optimization problem.

As an extension of the work done on computing persistent cohomology,
we discovered a collection of duality functors connecting persistent
absolute and relative homology and cohomology, as well as an algorithm
for computing persistent cohomology faster than existing methods.
*Dualities in Persistent (Co)Homology*

Together with Vin de Silva and Primoz Skraba, I currently study applications of the circular coordinate approaches to the study of periodic signals and systems. A delay embedding of periodic data will, using the right parameters, provide an embedding of the data stream as an image of a circle into the embedding space. We are able to, using circular coordinates of sample data, analyse periodic data with techniques different and potentially more versatile than existing methods for these analyses. We have presented preliminary results at a NIPS Workshop in 2012, and are currently writing up the details.

Computing cohomology for a dataset from a curve, relative to
everything but a small neighbourhood, gives us a linear coordinatization of
small patches of the curve at a time, allowing for an approach to
curve reconstruction with cohomological methods. Together with Bei
Wang, we're trying to find applications of these ideas in questions
appearing in Wang's research in visualization, with datasets from gait
analysis and other application fields. This research was presented at
the selective computational workshop Vis 2011 (*Branching and circular features in high
dimensional data*).

## Parallelization in persistence

Persistent homology has proven a versatile and powerful tool — in topological data analysis as well as in applications in the sciences. The limitations of the use of persistent homology lies in memory and processing power limitation — and none of the available software packages to compute persistent homology are able to easily use more than one processor at a time. I would like to approach the adaptation of persistent homology to parallel, distributed and co-processor based techniques — to use several cores, several computational nodes and the GPU to compute persistence.

In particular, in work jointly with Primož Škraba (Jožef Stefan institute, Slovenia) and David Lipsky (University of Pennsylvania, USA), we are able to use a classical construction of a spectral sequence generalizing the Mayer-Vietoris long exact sequence to provide a parallelization scheme for persistent homology. We have a preprint joint with Dmitriy Morozov http://arxiv.org/abs/1112.1245.

## Hom-complex coordinatization

At the core of data analysis lies the production of coordinate functions for data sets. As an extension of classical linear techniques, and of the circle-valued techniques discussed above, we would want to produce coordinate functions with arbitrary coordinate spaces.

One approach, pioneered by Yi Ding, uses the topological fact

H_{i}(hom(X,Y)) = ⊕_{p}hom(H_{i}(X),
H_{i+p}(Y))

to associate H_{0}(hom(X,Y)) with
⊕_{p}hom(H_{i}(X),H_{i}(Y)). Using this, we can take any
explicit identification between the representative classes of a
persistence barcode for a point cloud and the classes of a suggested
model space and associate to it an equivalence class of actual maps
between the corresponding chain complexes.

To actually generate good coordinatizations out of this leaves a difficult optimization problem in its wake: while we can pick out a representative cycle in hom(X,Y), it remains to find a homologous cycle that is well suited to interpretation as a coordinate function. On this problem, our research is still ongoing.

## Topological data analysis in politics

Data analysis has plentiful applications in political science. One of
the approaches used is to analyze the shape of the collection of
*vote vectors* given as one vector per member of a particular
parliament and session, with entries numeric encodings of that
member's votes in the rollcalls of the parliamentary sessions.

In a project joint with Gunnar Carlsson, Anders Sandberg, Emil Sköldberg and Primoz Skraba, we try to approach this type of data set, as well as other data sets characterizing parliaments, with topological tools. Primarily, we try to use the Mapper algorithm to provide simplicial models of the data that can be analyzed with high dimension reduction.

The work on using Mapper on politics datasets has been
presented at a NIPS Workshop in 2012
and was published in Scientific Reports (*Extracting insights from the shape of complex data using topology*).

## Parallelized Gröbner bases

In a project joint with Emil Sköldberg and Jason Dusek, we use combinatorial structures in homogenous ideals in multigraded polynomial rings to parallelize the Buchberger algorithm. The combinatorial structure we use allows us tighter control over dependencies between Gröbner basis elements, and allows us to pick out subsets of potential basis elements that can be examined independently of each other.

Once extended to Gröbner bases for homogenous multigraded modules, the approach we are taking will automatically work to compute the rank invariant and other invariants in multi-dimensional persistent homology, opening up topological data analysis to a wider spread of problems.

## Operadic Gröbner bases

Building on the work by Dotsenko and Khoroshkhin, I collaborated
with Vladimir Dotsenko to produce a reference implementation of the
Buchberger algorithm for shuffle operads. The work already done is
published in ICMS and in Séminaire et Congrès (*Implementing Gröbner bases for operads*, *Operadic Gröbner bases: an implementation*). From this
point, further work and some research is needed to go from a
first implementation to an efficient and user-friendly
implementation.

## Persistence in homological algebra

Ellis and King approach group homology with techniques developed in persistent homology. Specifically, they use persistence on the filtration given by the lower central series of a group to seek more homological information for finite groups. Their work indicates that the techniques in persistent homology and topological data analysis may well be transported back into pure mathematics. I would be interested in seeking further applications of the persistence paradigm in algebraic topology, homological algebra and algebraic geometry.

## Persistence topoi

Carlsson and Zomorodian identify persistent homology with coefficients
in a field k as the homology of differential
k[t]-modules. A more fine-grained formalism could be constructed by
picking an appropriate Heyting algebra L of *life times* and
re-deriving the relevant parts of linear algebra and algebraic
topology over the topos of sheaves over L.

Through this approach, the time-dependency or filtration dependency of the simplices and homology classes in persistent homology would be anchored all the way down to set theory, and the hope is that from such a more subtle formalism may provide more interesting approaches both to the computation of persistent homology, and to other, derived invariants.

I outlined this approach in a seminar talk at the IMA in 2013.

## Topological Machine Learning

Jointly with Jussi Karlgren and Hedvig Kjellström at KTH I am planning to investigate using persistent homology on Mapper output spaces to automatically pick out features in the output spaces. A persistence barcode can serve as a catalogue of a collection of feature identifications together with a measure of their saliency for further learning algorithm handling. The space and its homology can be used either as a feature for further algorithms or as a model of its own. There are many details to be investigated in this direction, and the idea is one of my newest and least explored directions of research.

With Jesse Berwald (IMA, University of Minnesota) and Marian Gidea (Yeshiva University) I have started looking at tagging topologically distinct behaviour in time series from spatial dynamical systems. We have a preprint on the subject.