The most concise description of my research program is: to interpret and exploit cross-frequency oscillation data, spatio-temporal wave processes at mesoscopic scales, and inter-regional macroscopic dynamics in functional, computing systems terms. My graduate work pioneered the use of non-stationary coupled maps as abstract models of neural field processing. In contrast to key figures in dynamical neuroscience, I explored the ways that coarse graining and symbolic dynamics could build bridges between dynamical systems and traditional concepts in pattern recognition and machine learning. Generally, the advocates of dynamical systems had an anti-representationalist stance.
Coupled map lattices (CML) were introduced in the statistical mechanics literature. In neural network terms, they are essentially 2D arrays of recurrent networks with a diffusive coupling between local neighborhoods. This coupling is implemented as a convolution kernel; asymmetric kernels are essentially like a single channel convNet followed by a highly nonlinear layer. The function at each lattice site is a logistic map, which is taken to be a model of a balanced network. The convolution step is followed by the application of the logistic map. The settings of only a few parameters give rise to very different phase regimes of pattern formation. The operation of coupling and nonlinearity parameters to produce flows between subspaces differs substantially from the weight training process in more well known artificial neural networks.
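As a rough illustration of the update just described, the following minimal sketch applies a diffusive 3x3 convolution and then a logistic map at every site. The function names, the periodic boundary conditions, the equal-weight 8-neighbour kernel, and the values of r and eps are all assumptions chosen for illustration; they are not the kernels or parameters used in the original work.

```python
import numpy as np

def logistic(x, r):
    """Logistic map applied element-wise; r is the bifurcation (nonlinearity) parameter."""
    return r * x * (1.0 - x)

def diffusive_kernel(eps):
    """Hypothetical symmetric 3x3 kernel: weight 1 - eps on the site, eps shared by 8 neighbours."""
    kernel = np.full((3, 3), eps / 8.0)
    kernel[1, 1] = 1.0 - eps
    return kernel

def convolve_periodic(x, kernel):
    """Convolve a 2D lattice with a small odd-sized kernel, assuming periodic boundaries."""
    kh, kw = kernel.shape
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            shift = (i - kh // 2, j - kw // 2)
            out += kernel[i, j] * np.roll(x, shift=shift, axis=(0, 1))
    return out

def cml_step(x, r, eps):
    """One lattice update: the coupling (convolution) step, then the logistic map at each site."""
    return logistic(convolve_periodic(x, diffusive_kernel(eps)), r)

# Illustrative run from random initial conditions in [0, 1].
rng = np.random.default_rng(0)
x = rng.random((32, 32))
for _ in range(10):
    x = cml_step(x, r=3.9, eps=0.3)
```

Because the kernel weights are non-negative and sum to one, and r stays at or below 4, the states remain in [0, 1] at every iteration. An asymmetric kernel could be passed to convolve_periodic in place of the symmetric one.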
If the network parameters are changed over time, the CML dynamics and flow of states between subspaces resemble a recurrent network ‘unrolled in time’; in contrast to the layers with fixed weights in the recent review of deep learning, the weights would vary between layers, and the partitions (histogram bins) of states after a short evolution serve as a layer feeding something like the Siamese network distance computation.
However, in the computer vision work based on this formalism, the networks in the ensemble are not identical. A small set of parameters is discovered by genetic algorithms, with a cross entropy term forcing each network to find solutions spread out over the underlying manifold.
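The 'unrolled in time' reading can be sketched by giving each iteration its own coupling and nonlinearity parameters, much as each layer of an unrolled network would carry its own weights. The schedule values, the 4-neighbour coupling, and the function names below are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def step(x, r, eps):
    """One update: nearest-neighbour diffusive mixing (periodic), then the logistic map."""
    neighbours = (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)
                  + np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1)) / 4.0
    mixed = (1.0 - eps) * x + eps * neighbours
    return r * mixed * (1.0 - mixed)

def run_nonstationary(x0, schedule):
    """Apply one (r, eps) pair per iteration, like per-layer weights in an unrolled network."""
    x = x0
    for r, eps in schedule:
        x = step(x, r, eps)
    return x

# Hypothetical schedule: strong coupling and high nonlinearity first, then relaxation.
schedule = [(3.9, 0.4), (3.9, 0.4), (3.6, 0.2), (3.2, 0.1)]
rng = np.random.default_rng(1)
final_state = run_nonstationary(rng.random((16, 16)), schedule)
```

The length of the schedule plays the role of network depth, and the final state is what would then be coarse grained into histogram bins.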
After 10 years, a good deal of supporting evidence has accumulated for non-stationarity and functional roles for interacting frequencies (theta-gamma), but there is still not a workable interpretive framework explaining how spike or local field phase interactions correlated with task performance actually work within and across regions. One task would be to demonstrate that detailed spike level interactions between columns may be considered as operators in a more abstract lattice dynamics formalism and to devise experiments to verify this. For example, we might seek to demonstrate that phase synchronization from an external region projecting across a spatial array of columns is equivalent to increasing the lateral coupling of those columns, or switching from a local to small world dynamic within a Brodmann area to integrate broader spatial scales of the input over time.
Review of past systems neuroscience work
Based in an engineering department, my work was focused on applications rather than numerical characterization of non-stationary spatially extended systems. Over time I shifted my research and community outreach from the neural network community (the INNS conference) to the computational neuroscience community, and spent more time justifying the modeling formalism as a model of cortical processing. In my Ph.D. work, iterations were considered as ‘slow gamma’ cycles coupling adjacent columns, and the number of iterations permitted to recognize objects was constrained by psychological experiments.
Prior to the vision work described below, I did one computational experiment that used non-stationary coupled map lattices (CML) with multiple layers to model Necker cube perceptual dynamics, attempting to match a variety of spatial psychophysical data while simultaneously giving a theory of saccade target formation (known to be correlated with perceptual transitions).
My applied vision work explored unsupervised shape similarity and self-supervised learning to solve 3D invariance (i.e. recognizing objects rotated in depth). For the shape similarity work, I used “quenched” systems where the dynamics switch from critical or edge of chaos to periodic regimes, with constrained high contrast initial conditions (i.e. Marr’s primal sketch). After several iterations, the distribution of states created by wavefront interactions during their approach to periodic oscillations is measured. The distribution (over a simple histogram or coarse graining) is treated as a vector similar to a feature vector, and distance between the shapes on this manifold is used to assess similarity or ‘confusability’. The process was tested on a set of shapes used in psychophysics experiments. This form of unsupervised learning has some similarities to the similarity preserving hash algorithms that appeared some years later, where the initial states are permuted and the subspaces approached on the manifold act as bins.
In my Ph.D. work, genetic algorithms were used to find non-stationary coupling and bifurcation parameters for which all training views of an object would rapidly evolve to the same instantaneously sampled distribution, after a few iterations of an expanding wavefront from binary images of several views of the object. Each object had an associated “dynamical recognizer” consisting of parameters and the averaged distribution based on training over several views. Recognizing untrained views consisted of running all the recognizers and matching via L2 norm, with the best match winning. When projecting the distributions to 2D, I noticed that topologically similar items (i.e. paperclip objects with the same number of bends) would cluster, so it appears that the learning constraints (high entropy distributions and KL entropies between distributions) created an overall manifold shared by the individual learners.
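The coarse graining and comparison steps can be sketched as follows: lattice states are binned into a normalized histogram, and two shapes are compared through the distance between their histograms. The bin count, the assumption that states lie in [0, 1], and the choice of Euclidean distance are illustrative; the random lattices merely stand in for evolved CML states.

```python
import numpy as np

def state_histogram(lattice, n_bins=16):
    """Coarse grain lattice states (assumed to lie in [0, 1]) into a normalized histogram."""
    counts, _ = np.histogram(lattice, bins=n_bins, range=(0.0, 1.0))
    return counts / counts.sum()

def signature_distance(p, q):
    """Euclidean distance between two histogram signatures (one possible similarity measure)."""
    return float(np.linalg.norm(p - q))

# Stand-ins for the lattice states reached from two different shapes.
rng = np.random.default_rng(2)
lattice_a = rng.random((32, 32))
lattice_b = rng.random((32, 32)) ** 2  # skewed toward low values

sig_a = state_histogram(lattice_a)
sig_b = state_histogram(lattice_b)
d_ab = signature_distance(sig_a, sig_b)
```

A small distance would be read as high similarity or ‘confusability’ between the two shapes. Other measures over distributions could be substituted for the Euclidean norm.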
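The matching stage can be sketched independently of the dynamics. Each recognizer stores the average of its training-view distributions, each recognizer produces its own distribution for an unknown view, and the recognizer with the smallest L2 distance to its stored template wins. The function names, the dictionary layout, and the toy three-bin histograms below are hypothetical; the genetic search for parameters is not shown.

```python
import numpy as np

def train_recognizer(view_histograms):
    """Template for one object: the average of the distributions from its training views."""
    return np.mean(np.stack(view_histograms), axis=0)

def recognize(observed, templates):
    """Pick the recognizer whose output for the unknown view is closest (L2) to its template.

    observed[name] is the distribution produced when recognizer `name`, run with its
    own parameters, is applied to the unknown view.
    """
    distances = {
        name: float(np.linalg.norm(np.asarray(observed[name]) - templates[name]))
        for name in templates
    }
    best = min(distances, key=distances.get)
    return best, distances

# Toy three-bin histograms standing in for measured distributions.
templates = {
    "clip_a": train_recognizer([[0.5, 0.3, 0.2], [0.7, 0.1, 0.2]]),
    "clip_b": train_recognizer([[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]),
}
observed = {
    "clip_a": [0.55, 0.25, 0.20],
    "clip_b": [0.60, 0.30, 0.10],
}
best, distances = recognize(observed, templates)
```

In this toy case the unknown view is assigned to clip_a, whose recognizer output lies much closer to its own template than clip_b's does.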
In retrospect it appears that the learning loss function is very similar to the semi-supervised approach described in https://ai.googleblog.com/2019/07/advancing-semi-supervised-learning-with.html, optimizing KL cross-entropy loss from the previously learned distributions (representations) and ‘consistency loss’ across representations known to be different rotations of the same object. In contrast to this work, because learning is a network per class, there is never the possibility of catastrophic forgetting, only that a contrastive embedding principle may cause similar objects to be placed nearby on the manifold built object by object.
This work could also be considered as ‘few shot learning’, and results were reported for training views ranging from 2-7 per object.
The physicist M. Marder on my committee suggested that the CML field transients are forgetting the differences between views. This is in some sense true and I suspect there is a kind of rate matching between the approach to unstable periodic oscillators and information loss as the shape projections evolve from greater to lesser information content. Other vision algorithms, e.g. geometric heat equation and nonlinear diffusion, exploit information loss over time. However, the effective dynamics discovered by GA were not always a quenching cycle, and there is also a scattering or diffraction like process which is creating information at and behind the wavefront. Perhaps a better explanation might be sought in terms of subspace permutation groups isomorphic to rotation groups, or in analogy to well known locality sensitive hashing algorithms which rely on permutations. When the distributions were projected to 2D via multi-dimensional scaling, the “paper clip” objects clustered according to the number of bends, even though this was not a criterion during learning. This shows that natural metric spaces and clustering may be emergent phenomena arising from solving invariance problems.
The non-stationarities were motivated by asking what role low frequencies play in interaction with gamma oscillations. I interpreted that the results might apply to interactions between hippocampal pulses to IT cortex, which is still considered the site of object identification in localist paradigms. Another result in my work is to show that one can find cells in the lattice where the dynamics are pinned to high values (i.e. representing high ensemble average frequencies or LFP), thus appearing as “grandmother cells” or cells that are members of some assembly with optimal tuning to that stimulus when in fact the statistics over the entire lattice are the decision criterion. So if the mechanisms of object recognition follow this spatiotemporal evolution, it will be easy to find misleading correlations. Other recent work is confirming that better prediction is obtained by considering cells with both elevated and suppressed firing rates.
This work makes some contact with the theory of liquid state machines which emerged around the same time; my work would predict the existence of readout units with the capacity to sense a particular distribution of LFP frequencies over a Brodmann area scale computational unit.
I might mention briefly emerging theories bridging oscillations, coarse graining and an extended view of quantum mechanics; I am particularly interested in this area and am in contact with H. Atmanspacher and P. beim Graben. I have recently become aware of efforts to establish quantum phenomena as emerging from classical systems; some of this work is based on the coupled map formalisms.
I have an extensive background in semiconductor design and nano-scale fabrication methods and techniques, and feel well positioned to participate in and to propose novel architectures for biologically inspired computing based on the theories described above. I have extensive contacts with many semiconductor companies, especially IBM. Much of my work in the last ten years has involved the application of data mining approaches to semiconductor data in support of software engineering and computational lithography. More recently I have been engaged in work on sparse dictionary learning and have some familiarity with this topic from a neuroscience and signal processing perspective.
Dynamical Systems Approaches to Music Generation
I also have a keen interest in music and cognition, and in complex systems approaches to generative music. I am in the development stage of an opera now where I intend to generate portions of the music using statistics of non-stationary coupled map lattices. My goal will be to make music whose pitches derive from high dimensional oscillating networks, and whose dynamics (volume) and timing are adjusted in a way that skilled musicians would recognize as consistent with the pitch material. The theories of Manfred Clynes, operationalized in the Superconductor score rendering software, are a key motivation for this work. I hypothesize that the intrinsic timings and flows of densities between coarse grained states may be responsible for the feeling of correctness associated with well performed music, i.e. that which maximizes the emotional content and which listeners given a parametrized expression system will choose. My CV has an arts section reflecting a long engagement in music performance, composition, and visual arts; the soundcloud link has an early example of music generated from high dimensional dynamics.