The winner of the 2013-2014 Neukom Institute/IQBS CompX Faculty Grants Program for Dartmouth faculty has been announced, with an award of up to $20,000 for a one-year project.
Co-sponsored by the Neukom Institute and the Institute for Quantitative Biomedical Sciences (IQBS), the program is focused on funding computational biomedical research in bioengineering, bioinformatics, biostatistics, biophysics, or other related areas across the campus and professional schools.
Dartmouth College faculty across the undergraduate, graduate, and professional schools were eligible to apply for this competitive grant. This year's winner is:
Hospitals have rapidly adopted electronic medical records (EMRs) for routine management and reporting of patient health care utilization. A comprehensive EMR offers the possibility of routine surveillance of quality measures for patient safety and of measuring hospital performance. However, EMRs remain underutilized for routine patient safety surveillance. To address the gap between unstructured EMR data and current natural language processing (NLP) output, we propose to develop a new NLP framework to extract terms from multiple types of documents and aggregate them into patient-level information that can allow real-time automated surveillance of patient safety and quality reporting. We will develop a novel NLP toolkit to automate surveillance for 30-day hospital readmission and death, training it to abstract semi-structured and unstructured EMR data into prediction models for 30-day readmission and mortality. As a clinical context, we will develop our NLP toolkit for patients hospitalized for an acute myocardial infarction (AMI).
We hypothesize that relevant risk factors and terms can be extracted from complex unstructured and semi-structured data fields in the EMR to inform routine surveillance of 30-day readmission and mortality. We will extend two existing NLP tools, called Topaz and ConText, to conduct automated information extraction from unstructured EMR text fields (admission history and physical, procedural reports, discharge summaries) and semi-structured data from the cardiac catheterization database (Cardiomac). We call our toolkit ReX (Readmission Extraction).
We will develop and validate methods to extract and aggregate information from physicians' notes so that we can classify the risk for 30-day readmission or mortality for each patient hospitalized for an AMI. We will apply Topaz and ConText for our research aims. Our ultimate objective is to aggregate the terms and the values of these contextual properties across multiple patient documents to determine the outcomes (readmission, mortality) for that patient. To develop and validate our ReX approach, we will undertake three phases of document analyses.
Our first step is to select an appropriate training set of physicians' notes that account for three factors. First, because we are focused on readmission and mortality after AMI hospitalization, we will focus on patients hospitalized for AMI. Second, since physicians will have different narrative styles, we will sample notes from all cardiologists. Third, to ensure that we find patients who are hospitalized for AMI, rather than false AMIs, we will select subjects from the validated AMI registry. We will randomly select 50 patients from the AMI registry who were hospitalized between 2011 and 2013 and meet these criteria. In the second step, we create a knowledge base (KB) of lexical features and regular expressions needed for the ReX toolkit. The third step in the NLP tool development cycle is to run Topaz to identify concepts from the KB in the text and to evaluate. In this cyclic development and review of the training set of 50 documents, the team will identify the relevant terms for each patient's readmission or mortality and will make changes to the KB for improved accuracy.
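As a minimal illustration of how a knowledge base of lexical features and regular expressions can drive concept identification, the sketch below matches patterns against a physician's note. The concept names and patterns here are illustrative assumptions, not terms from the actual ReX knowledge base.

```python
import re

# Hypothetical KB: each concept maps to a regular expression.
# These entries are made up for illustration only.
KB = {
    "heart_failure": re.compile(r"\bheart failure\b", re.IGNORECASE),
    "diabetes": re.compile(r"\bdiabet(es|ic)\b", re.IGNORECASE),
    "smoker": re.compile(r"\b(current |former )?smoker\b", re.IGNORECASE),
}

def extract_concepts(note: str) -> set[str]:
    """Return the set of KB concepts whose pattern matches the note."""
    return {concept for concept, pat in KB.items() if pat.search(note)}

note = "Patient is a former smoker with a history of diabetes."
print(sorted(extract_concepts(note)))  # ['diabetes', 'smoker']
```

In the development cycle described above, misses and false matches found during review would feed back into revisions of the KB patterns.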
In the second phase of our method development, we will repeat the cycle on a randomly selected set of another 50 patients from the AMI registry who meet the same criteria as specified in Phase 1. We will apply the modified Topaz to analyze the set of documents for these patients. Blinded to these results, the evaluation team will use eHOST to examine each sentence containing a concept to independently determine the value of the contextual property. We will then determine the recall and precision of the outputs of the tool when compared against the team's results. The team will also classify readmission or mortality outcomes for each patient to develop the overall ReX approach. Additionally, the team will develop methods to aggregate and classify the results of the sentence-by-sentence analysis for each patient into one of fourteen risk factors for 30-day readmission or mortality. Our toolkit will also have to handle the complexities of procedural treatment versus pre-procedural treatment. We address this challenge in developing our ReX approach. By aggregating the results of sentence-level analyses per document and across all documents temporally, we argue that a more comprehensive and accurate prediction of 30-day readmission and mortality can be achieved.
In the third phase, validation of the ReX toolkit will be performed on 200 new sets of unstructured text fields and catheterization fields from AMI patients in Epic and Cardiomac. Sensitivity and specificity analyses will be performed using the AMI prospective registry as the gold standard. Each error will be reviewed by the team to determine whether it arose from misclassification by the ReX tool or from misreporting in the AMI registry. We will compute the F-measure at the concept level, the harmonic mean of precision and recall. The methods above will be repeated for secondary endpoints.
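The F-measure follows directly from the true-positive, false-positive, and false-negative counts produced by the validation; a brief sketch:

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example counts (made up): 40 true positives, 10 false positives,
# 10 false negatives -> precision = recall = F = 0.8
print(f_measure(40, 10, 10))
```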
Proteins are a remarkable class of molecules that perform a wide array of functions in the cell – from sensing and logic, to force generation and movement, to chemical catalysis. Remarkably, this functional diversity is encoded in a seemingly simple way, as proteins are merely chains of amino-acid links, with twenty amino acids commonly occurring in nature. The specific amino-acid sequence of each protein encodes its 3-dimensional (3D) molecular structure, which in turn determines the protein's function. Despite some progress, the fundamental question of how sequence dictates structure has remained an enigma of computational biology.
The central difficulty with tackling the sequence/structure relationship lies in the complexity and the high-dimensionality of protein structure space. The numbers of conformations available to even small proteins are astronomical. On the other hand, folded proteins do not occupy this structure space uniformly, but rather tend to be composed of recurrent local structural motifs. Here we propose that such modularity lends itself to a reductionist approach to understanding the structure/sequence link. Namely, the structure of a protein can be thought of as composed of local 3D structural motifs, such that the structure/sequence relationship of the entire protein is given by the combined preferences of all the constituent motifs. The key idea we put forth here is that sequence preferences of each motif can be deduced by identifying all structurally similar motifs in the rapidly growing database of known protein structures. We aim to take this approach in two main directions:
I. Structure Prediction: Our approach will enable a strong selection filter for the correct structural model on the basis of sequence/structure agreement on the level of individual motifs.
II. Protein Structural Universe: We will elucidate the minimal set of recurrent structural building blocks necessary to describe the protein structural universe as it is now known, shedding light on the hidden architecture of this space and providing immediate impact for the fields of protein design and structure prediction.
The reductionist procedure is akin to deducing individual words from writing in an unknown language and ascribing meaning to each. Such an approach requires 1) a sufficiently large structural database such that for recurrent motifs ("words") sequence preferences can be established ("meaning") and 2) efficient structure search algorithms (finding similar "words").
In the past year, we have demonstrated that it is possible to establish quantitative links between sequence and structure on the level of tertiary structural motifs. The resulting metric, which we term the design score, characterizes the global agreement between the motif and the sequence, in stark contrast with conformational energy-like metrics, which consider only local structure/sequence compatibility. Further, we have explored the use of the design score in structure prediction applications.
We propose to use our design score metric to guide the search for the correct model in de novo structure prediction. We are exploring two approaches to this. In the first approach, we aim to combine the design score with standard scoring functions that evaluate the energy of a given conformation to produce a hybrid measure, which takes into account both very local structural interactions (from physics) and global agreement between sequence and structure (from the design score). This hybrid score would then be used to drive model search in a Monte Carlo simulation. The second approach we propose involves pre-filtering structural motifs to include just those that are strongly compatible with different parts of the input sequence. Prediction of the correct structure then reduces to the problem of most productively combining these motifs, under the guidance of a physics-based scoring function.
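A minimal sketch of the first approach, assuming a simple weighted combination of a physics-based energy and the design score driving a standard Metropolis acceptance step; the weight and temperature are illustrative stand-ins, not the project's calibrated values.

```python
import math
import random

def hybrid_score(e_phys: float, design: float, alpha: float = 0.5) -> float:
    # Lower is better for both terms here; alpha is an illustrative
    # assumption, not a value from the actual method.
    return alpha * e_phys + (1 - alpha) * design

def metropolis_accept(old: float, new: float, temperature: float,
                      rng=random.random) -> bool:
    """Accept a proposed model if its hybrid score improves, or
    probabilistically otherwise (standard Metropolis criterion)."""
    if new <= old:
        return True
    return rng() < math.exp(-(new - old) / temperature)
```

In a Monte Carlo search, each proposed conformation would be scored with `hybrid_score` and kept or discarded by `metropolis_accept`.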
The apparent modularity of protein structures suggests that the entire structural universe, as complex as it appears at first, may be well described with a finite number of local patterns. Knowledge of all such recurrent motifs would constitute a succinct description of this very complex space and would have immediate impact on such fields as protein design and structure prediction. We propose to elucidate all of the elementary structural building blocks by casting the problem as a classical set cover problem. Specifically, we define the set of all contacting residue pairs within all structures of the Protein Data Bank (PDB) and attempt to cover these with the fewest number of local structural motifs. By choosing contacts as the objects to cover, we will unearth tertiary motifs – that is, motifs that explain which residues come together to form the 3D structure. Although analysis of the full PDB will require much larger computational resources, the results from 100 proteins already show that structure space is indeed quite redundant and will be well described by a limited set of motifs.
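The set cover formulation can be attacked with the classical greedy approximation: repeatedly pick the motif that covers the most still-uncovered contacts. A toy sketch with hypothetical motif names (the real problem covers residue contacts across the full PDB):

```python
def greedy_set_cover(universe, motifs):
    """Greedy set cover: repeatedly choose the motif covering the most
    still-uncovered elements (classical ln(n) approximation)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(motifs, key=lambda name: len(motifs[name] & uncovered))
        if not motifs[best] & uncovered:
            break  # remaining contacts cannot be covered
        chosen.append(best)
        uncovered -= motifs[best]
    return chosen

# Toy example: six contacts covered by hypothetical motifs.
contacts = {1, 2, 3, 4, 5, 6}
motifs = {"helix_pair": {1, 2, 3}, "hairpin": {3, 4}, "corner": {4, 5, 6}}
print(greedy_set_cover(contacts, motifs))  # ['helix_pair', 'corner']
```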
Winners of the 2013-2014 Neukom Institute CompX Faculty Grants Program for Dartmouth faculty have been announced with awards of up to $20,000 for one-year projects.
The program seeks to fund both the development of novel computational techniques as well as the application of computational methods to research across the campus and professional schools.
Dartmouth College faculty across the undergraduate, graduate, and professional schools were eligible to apply for these competitive grants. This year's winners are:
Research Assistant Professor
Thayer School of Engineering
Micro- and nano-scale optical devices play an essential and growing role in technologies from telecommunications to graphic displays to biological sensors. Modeling the flow of light in such devices requires solving the full Maxwell partial differential equation (PDE) in three dimensions (3D) with boundary and matching conditions on complicated geometries. We propose to create and test a promising new numerical algorithm for a common class of such devices: ones with cylindrically symmetric structures lying in a doubly-periodic array. We propose to combine a boundary-based numerical method representing fields as a sum of point sources with Fourier methods for the cylindrical symmetry, and with a new scheme for periodizing a single structure into an infinite array. For devices several wavelengths in size, this will be much more efficient than the finite element or finite difference methods commonly used by engineers. This combines expertise from the Barnett and Shubitidze groups in a new collaboration.
There are two modeling application areas we will focus on:
Microsphere magnetic field detectors. The Shubitidze group proposes to design a device to sense, via evanescent wave coupling, the movement of a microsphere resonator attached to a silica substrate by a DNA strand. This would enable magnetic fields to be imaged at the micron scale.
Solar cells. Both our groups interact with Jifeng Liu (Thayer School), whose lab creates and prototypes periodic layered structures for improving the efficiency of thin-film solar photovoltaics, and photonic detectors.
The method of fundamental solutions (MFS, also known as the method of auxiliary sources, MAS) is the core of our algorithm. The basic idea is to represent the unknown field in each homogeneous (dielectric) region as a sum of N point charges (fundamental solutions) with unknown strengths.
Because it leverages analytic solutions, the MFS has a tremendous advantage in accuracy, efficiency, and simplicity over finite element or other volume discretizations.
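To illustrate the idea, the sketch below applies the MFS to the 2D Laplace equation on the unit disk, a deliberately simplified stand-in for the project's full 3D Maxwell problems: point sources placed outside the domain are fit to boundary data by least squares, and the resulting field is evaluated in the interior.

```python
import numpy as np

# MFS sketch for the 2D Laplace equation on the unit disk (a simplified
# illustration; the project targets the 3D Maxwell equations).
# Fundamental solution of the 2D Laplacian: phi(r) = -log|r| / (2*pi).
N = 60
t = 2 * np.pi * np.arange(N) / N
src = 2.0 * np.column_stack([np.cos(t), np.sin(t)])  # sources outside domain
col = np.column_stack([np.cos(t), np.sin(t)])        # boundary collocation pts

def phi(x, y):
    """Fundamental solutions between point sets x (M,2) and y (N,2)."""
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    return -np.log(d) / (2 * np.pi)

exact = lambda p: p[:, 0] ** 2 - p[:, 1] ** 2        # harmonic test field
coeffs, *_ = np.linalg.lstsq(phi(col, src), exact(col), rcond=None)

interior = np.array([[0.3, 0.1], [-0.2, 0.4]])
approx = phi(interior, src) @ coeffs
print(np.max(np.abs(approx - exact(interior))))      # very small error
```

For smooth boundary data on such concentric geometries the error decays exponentially in the number of sources, which is the accuracy advantage noted above.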
The main tasks for the graduate students involved will be to understand and combine the MFS, bodies of revolution (BOR), and periodizing algorithms, then test them in application geometries. We plan to start with spheres and planes, then introduce cylinders and channels using a slight rounding of edges (in that case the MFS is known to work when source and collocation points are concentrated near the edge).
Our goal is to demonstrate that the periodizing scheme can be successfully combined with MFS in bodies of revolution, to create a solver that any optical engineer could use to solve a variety of full Maxwell scattering problems. The result will be new software to solve a large-scale real-world problem with lower computational cost, without sacrificing accuracy. The more rapid modeling time will enable design and optimization of potential devices.
External access to the internal mental processes of dreams, hallucinations, and flashes of creative inspiration has been the subject of science fiction for decades. Recent research in multivariate neural pattern decoding and audio-visual reconstruction shows that the prospect of externalizing such inner thoughts into sounds and moving images has become a tantalizing reality.
Mind2Art (mind-to-art) will explore neuro-imaging as a mediator for creative expression via computational decoding of neural activation patterns, elicited by creative intentions in the auditory and visual pathways, into externalized sounds and images that others may see and hear. The results of neural decoding experiments will become speculative works of art, cast in a framework of established computational neuroscience practice. Subjects in the Mind2Art experiments will be artists and composers ("neuro-artists") who will be invited to imagine novel scenes and sounds for subsequent decoding and audio-visual playback to an audience. The technologies behind Mind2Art are being developed at the Bregman Music and Auditory Research Studio at Dartmouth College in collaboration with the Department of Computer Science and the Neuroscience program in the Department of Psychological and Brain Sciences. Mind2Art will be an investigation of the creative possibilities of the technologies of neuroimaging, machine learning, and neural decoding.
Our goals will be to: 1) decode audio and visual neuroimage patterns as audio and visual media; 2) test generated media by reconstruction of imagined sounds and images from memories of such; 3) disseminate new results via scholarly music, computation, and neuroscience publications; and 4) disseminate neuro-creative decoding on-line (e.g. Vimeo and YouTube) and at major international art-science venues, such as Ars Electronica.
Our proposed outcomes include: 1) public performances and video demonstrations of Mind2Art works; 2) publications in leading conferences and journals in the fields of music (Music Perception), computer science (Neural Information Processing Systems, Machine Learning), and neuroscience (Neuron, Neuroimage); 3) new machine learning methods and publicly available software, extending our public Bregman and Brainspotter software toolkits, for decoding neural images into sound and images; 4) published imaging and decoding data sets to help foster a new community of artists and scientists exploring the intersection of neuroscience, computation, and art; and 5) participation in workshops and conferences to disseminate the findings and creative outcomes of Mind2Art, contributions to multiple art and science disciplines.
While recent work in the area of neural decoding has shown that it is possible to decode perceived music and images from the listening/seeing brain, we seek to address auditory and visual imagination. There is evidence that perception and imagination of music share common processes in the brain. It is known that imagery produced in the auditory imagination has many of the structural and temporal properties of auditory stimuli and that it elicits brain responses similar to those of auditory perception. Brain signals recorded while listening to a piece of music could serve as reference data for a retrieval system, detecting salient elements in the signal that could be expected during imagination as well. It is on this basis that Mind2Art will investigate synthesis of sounds and images from neural patterns representing imagined sound and images (the creative intent). Our methodologies, developed in the context of reconstructing perceived sounds and images, will be able to represent specific features of audio and visual phenomena, so we expect our system to surpass previous results when applied to the imagery paradigm.
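A minimal sketch of the retrieval idea, using synthetic data rather than real neuroimaging (the stimulus labels, pattern dimensions, and noise level are all illustrative assumptions): a pattern recorded during imagination is matched by correlation to reference patterns recorded during perception.

```python
import numpy as np

# Synthetic stand-in for perception-time reference data:
# 5 stimuli x 100 voxels (labels and data are made up).
rng = np.random.default_rng(0)
reference = rng.standard_normal((5, 100))
labels = ["drums", "piano", "voice", "birds", "rain"]

# Simulate an imagined pattern as a noisy copy of the "piano" reference.
imagined = reference[1] + 0.3 * rng.standard_normal(100)

# Retrieve the best-matching stimulus by Pearson correlation.
corr = [np.corrcoef(imagined, r)[0, 1] for r in reference]
print(labels[int(np.argmax(corr))])  # piano
```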
Sherman Fairchild Distinguished Professor,
Tiltfactor Research Laboratory
Myers Family Professor,
Institute of Arctic Studies, Dickey Center for International Understanding
This joint proposal represents a noteworthy art and science collaboration at Dartmouth to use creative games to challenge beliefs, opinions, and possibly behaviors about climate change and the environment. The goal is to use the CompX award to fund pilot research in order to crystallize the creation of a prototype digital app and kickstart a new research partnership. Our aim is to combine our research interests to address a global agenda for climate change through digital games.
Climate change is one of the most urgent issues facing global citizens. Scientists are recording changes in surface temperature, the frequency of extreme weather events including precipitation and wind, declining water supplies from changes in snow and ice, and changes in ecosystems and their biodiversity with disruptions to the essential services that the natural world provides to society.
Responses to the urgency of global climate change have not met this pressing demand: a need to mitigate and adapt to rising tides and unpredictable weather events. Why is it so difficult to inform and adjust human behavior in response to climate change? Social and psychological factors have been cited as the central causes for global citizens' reticence towards engaging in mitigating behaviors that could help alleviate climate change. Indeed, social and psychological factors appear to create a set of motivational barriers that explain – and perpetuate – this imbalanced level of participation in ameliorative action towards climate change.
Psychologists have identified a number of psychological barriers that impede behavioral choices when faced with climate change. Could a game facilitate changes in perception, attitudes, and possibly behaviors related to climate change?
Our research team will create a digital app to change beliefs about climate change. We will:
1) Iterate designs based on this set of key areas for change;
2) Prototype and informally test the game designs for playability;
3) Further develop the most promising game as an App and for Android;
4) Polish the game to a more finished look and translate it into Greenlandic;
5) Assess the game's impact in the US and in Greenland; and
6) Publish the results of our findings. Our team anticipates producing publishable evidence from our studies, and a prototype digital game on which further research funding will be pursued.
In Year One, we will research, prototype, and playtest game solutions to specific issues around climate change, with the aim to produce one finalized game prototype for research. Flanagan and her Tiltfactor team have developed a number of card- and screen-based simulation games (e.g., Pox, Awkward Moment, Metadata Games) that contain features and approaches that will inform the development of climate change games. In Year Two, we will study the effects of the game on local citizens both in the United States and in Greenland. Identifying cultural differences in climate change perception and action is an important objective of our work, and our aim is that this project be a part of an ongoing cross-cultural conversation.
Robert C. Johnson
The cross-border fragmentation of production is a defining feature of the modern international economy. This fragmentation entails 'slicing up' production stages or tasks required to produce output and distributing them across countries to minimize production costs. The result is a 'global production chain' through which inputs are traded, in which a country's imports are used to produce its exports.
The rise of global supply chains has attracted the interest of policymakers and academics alike. An active theoretical literature explores the positive and normative consequences of fragmentation. Production fragmentation may help explain the large expansion of trade following the adoption of global or regional trade agreements. Recent research has also linked production fragmentation (a.k.a. offshoring) to changes in income distribution (i.e., stagnant growth of blue collar wages in the United States).
Quantitative analysis of the rise of global production chains has generally lagged behind theory for two main reasons. First, global production chains are hard to measure, since statistical agencies collect information only on the gross value of goods as they cross borders, not the locations at which value is added in the production process. Second, models with production fragmentation tend to be computationally difficult to take to data.
Our project aims to combine the recent advances in data collection with advanced computational techniques to provide a rigorous quantitative evaluation of existing theories. Our main goal is to estimate a structural model that explains how trade costs and technology differences across countries shape the structure of international production chains. Having developed a workhorse quantitative model, we can then use the model to perform counterfactual analyses. For example, we plan to study the impact of changes in tariffs, transport costs, or productivity on the organization of production chains.
The major computational challenge is that models with production chains typically do not have closed-form solutions. This is a particular problem when one wants to estimate and simulate models with many countries and/or industries. Intuitively, each firm solves a high dimensional discrete choice problem to determine the allocation of production stages across countries. To solve the model, we have to compute the total production cost of all potential configurations of the production chain for each firm. To find a general equilibrium solution, we then have to aggregate up the decisions of these individual firms and impose aggregate market clearing conditions.
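A toy version of the firm's discrete choice illustrates the combinatorics (the stage costs and trade cost below are illustrative assumptions): the firm assigns each production stage to a country to minimize total stage costs plus a trade cost incurred whenever consecutive stages cross a border. With many countries and stages, this enumeration explodes, which is precisely the computational burden described above.

```python
from itertools import product

# Illustrative unit cost of each production stage in each country.
stage_cost = {
    "A": [1.0, 2.0, 1.5],
    "B": [2.0, 1.0, 1.2],
}
countries = ["A", "B"]
trade_cost = 0.4  # cost of shipping between consecutive stages abroad
n_stages = 3

def chain_cost(chain):
    """Total cost of one configuration of the production chain."""
    cost = sum(stage_cost[c][s] for s, c in enumerate(chain))
    cost += trade_cost * sum(chain[i] != chain[i + 1]
                             for i in range(len(chain) - 1))
    return cost

# Brute-force enumeration over all country assignments.
best = min(product(countries, repeat=n_stages), key=chain_cost)
print(best)  # ('A', 'B', 'B')
```

The number of configurations grows as (number of countries)^(number of stages), so realistic models require far more sophisticated solution methods than brute force.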
In order to estimate the structural parameters of this type of model (e.g., the parameters that govern technology, trade costs, or consumer tastes), we have to solve the model many times for alternative parameter values to minimize the distance between model statistics and data. With both global optimization and gradient-based methods we have to solve the model for every guess of parameter values, implying that we typically have to re-solve the model thousands of times. Further, to construct bootstrapped standard errors for our estimated parameters, we have to estimate the model hundreds of times more. Running counterfactuals through the model then requires even more computing time. In sum, our project requires significant computational resources. Our initial work has focused on one industrial sector and 15 countries, but we would like to add more countries, greater geographic detail, and finer sector detail into the model.
In addition to applying more computational resources to solve our problem, we are also working on applying computational techniques that reduce the dimensionality of the problem without compromising the precision of the estimates. This methodological work is likely to attract attention in the literature in its own right, as the type of discrete choice problem we face is present in a class of important related models. We therefore expect to contribute to the growing literature on computational methods in the field of international economics. Additionally, we intend to present our work at several high-profile conferences in the U.S. and Europe.
New digital modeling tools now allow the creation of realistic, interactive digital models of unbuilt architectural designs, as well as physical models digitally fabricated with laser cutters (for topography) and 3D printers (for buildings). These models and visualizations are powerful design tools for architects and planners, and they are useful for educating the public, building consensus, and applying for funding for public projects. This project will apply this computational method to a design for one of the great public spaces in America and will facilitate its development in a way that benefits the many agencies involved, as well as the taxpayers.
21st-century issues including global warming, flooding, seismic risk, security, visitor orientation, public transit access, and parking have transformed the needs of one of America's most beloved spaces, the National Mall in Washington, DC, in ways not foreseen by its designers. Many public agencies are involved in planning for the Mall, each with specific concerns and challenges. The agencies currently involved in various ways with Mall planning include the National Park Service, the Trust for the National Mall, the District of Columbia Planning Office, and the Federal Government.
There has been no long-term planning for the National Mall in 40 years that brings together the diverse concerns of these agencies and the public. Many initiatives by stakeholders overlap, and there is no mechanism to ensure they work in concert to create innovative designs that incorporate the highest level of sustainability while addressing current concerns and budgetary challenges.
The National Coalition to Save Our Mall is a group of citizens, architects, planners, and historians working to create a "3rd Century Mall" Plan. This project will forward their initiative by creating one cohesive long-term plan that embodies their work and develops it. It will complete the unfinished planning for the area around the Washington Monument and extend the vision to create a "3rd Century Mall" Comprehensive Plan which weaves together diverse plans from the various stakeholders while preserving and extending L'Enfant's iconic vision.
We will develop a virtual model and create digitally fabricated models from the digital model. A large interactive built section model (9' long by 3' wide) with replaceable bays and blocks and visibility of underground elements will allow public interaction and critical dialogue about the designs. A series of larger-scale model sections of the elements (such as a Mall Museum below the Washington Monument and the underground parking garages and transit hub) will allow them to be presented in more detail. Walkthroughs and flythroughs will be created from the digital models. All of these models will be displayed at the District Architecture Center and at venues provided by the Coalition, and photographed for publication on-line on the Coalition's website.
Three major goals for the 3rd Century Mall Plan are to: maintain and enhance the historic, grand space and experience of the National Mall while addressing practical current needs; weave the current plans and needs of the National Park Service, District of Columbia, and Federal Government into one symbiotic, coherent plan; and follow the highest standards of sustainability in the process. But without visualization, as both a design and a presentation tool, these designs will go unpromoted and unbuilt.
Some of the designs begin to address parking, flooding and Mall maintenance concerns by combining underground bus parking, which would produce a badly needed revenue stream, irrigation for the lawn for the National Park Service, and flood reservoirs planned by the DC Planning Office, all currently planned for the same site, into one integrated design. The National Park Service has shown interest and support for this design, which would pay for itself in 10 years with parking fees, and save the city millions of dollars by combining the flood reservoirs and irrigation with the parking.
In beginning to investigate existing conditions, we have rediscovered the forgotten "Washington Canal"-L'Enfant's canal created from Goose Creek-later turned into a sewer and buried in the late 19th century-which may account for some of the current flooding in the Federal Triangle and which we can show in the model.
We will also develop the visitor experience and orientation on the Mall by creating plans for the Mall Visitor's Center and Washington Monument Museum at the center of the Mall and base of the Washington Monument, with access to bus parking, public transportation and bike shares.
The opportunity to create 3rd Century Mall Plan digital virtual and fabricated models will encourage new dialogue and alliances among the stakeholders. Visualizations of these designs will provoke discussion by the various stakeholders and help make the plans a reality.
From a social, hydrologic, and geomorphic perspective, Hurricane Irene in August 2011 was a truly catastrophic event, generating widespread erosion, power outages, and overbank flooding. It generated more than 750 km of road closures, with more than 200 bridges damaged or eroded, including 10 historical covered bridges of profound cultural symbolism. The rapid time to peak discharge and unprecedented flood flows spawned major geomorphic changes, including significant changes in channel position and size and the entrainment of very large (> 1 m) channel bed material, transporting this sediment across valley bottoms as much as 4 m above the channel. These geomorphic impacts generated immediate and sustained response from watershed and town planners, including channel riprapping, channel straightening, and levee construction. This response intensified when the 1984 ban on channel gravel mining and channel straightening was suspended. From a geomorphic perspective, as well as from a social perspective, Hurricane Irene can be considered a natural experiment of a known perturbation generally fixed in time and space. In this proposed research we plan to take advantage of the "natural" effects of Hurricane Irene as well as the managed and engineered solutions employed by town planners and other local stakeholders to document the recovery times and processes in stream reaches affected (a) singularly by Irene and (b) by both Irene and subsequent management.
There is a rich tradition in geomorphology of evaluating the role of large floods in landscape change, and there is an equally robust geomorphic literature on the timeframes and processes of stream channel recovery after a catastrophic flood (Costa, 1974). For humid alluvial streams, typical of the eastern US, channel recovery to pre-flood dimensions typically occurs within ~10 years. Because of Irene's unprecedented flood flows, its geomorphic signature may persist longer than ten years, and there is potential for significant "legacy effects" of Irene's impacts.
Through an intensive field campaign, we have been documenting the immediate impacts of Irene across numerous watersheds in Vermont. These baseline data are fundamental for characterizing the impacts of Irene, and they further establish the initial conditions for channel recovery. The limitation of field data, however, is that these efforts are time consuming and spatially limited. To overcome these limitations, we propose to use a combination of remote photogrammetric approaches to generate digital terrain models (DTMs). We propose to acquire 2 m resolution DTMs from stereoscopic analysis of optical imagery from the WorldView-2 satellite, processed to remove tree crowns, buildings, and other features above the terrain surface. Along with visual interpretation of the raw multispectral and panchromatic imagery, these DTMs will be used to map areas of channel change and to serve as a baseline for assessing geomorphic recovery from Irene and/or the impacts of future flooding episodes. We further propose to augment these "coarse" 2 m resolution data with a computationally rich approach to high-resolution field imagery, "Structure from Motion" (SfM) photogrammetry. This process reconstructs real-world scenes by deriving the positions of overlapping photographs in three-dimensional space. Our goal for each method is to establish baseline riverine conditions, which will be followed by repeat coverage via a combination of field measurements and one of the two remotely sensed approaches.
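As a rough illustration of the change-detection step, repeat DTMs can be differenced cell by cell to map erosion and deposition. The toy grids, threshold, and function name below are invented for illustration; real DTM differencing would also account for co-registration error and data gaps.

```python
import numpy as np

def dtm_of_difference(dtm_before, dtm_after, threshold=0.1):
    """Difference two co-registered DTMs and zero out changes smaller
    than a vertical-uncertainty threshold (metres).

    Positive values indicate deposition, negative values erosion.
    """
    diff = np.asarray(dtm_after, dtype=float) - np.asarray(dtm_before, dtype=float)
    # Treat sub-threshold elevation differences as noise, not real change.
    diff[np.abs(diff) < threshold] = 0.0
    return diff

# Toy 3x3 grids standing in for 2 m resolution DTM tiles (elevations in m).
before = np.array([[10.0, 10.2, 10.1],
                   [ 9.8, 10.0, 10.3],
                   [ 9.9, 10.1, 10.0]])
after = np.array([[10.0, 10.2,  9.6],   # 0.5 m of erosion in one cell
                  [ 9.8, 10.0, 10.3],
                  [ 9.9, 10.6, 10.0]])  # 0.5 m of deposition in another
change = dtm_of_difference(before, after)
```

Summing positive and negative cells of such a grid (times the cell area) gives rough deposition and erosion volumes for a reach.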
With its focus on high-resolution data sets and the incorporation of newly developed SfM algorithms, this proposed research corresponds with the Neukom mission to support novel computational techniques. Similarly, results from this interdisciplinary research team will have important policy implications and help advance the Neukom Institute's goal of fostering the use of computational methods to solve problems in the Physical and Social Sciences.
Dale F. Mierke
We aim to develop a novel computational method for identifying the structural features of guest/host complexes. The proposed method casts the guest in four spatial dimensions (x, y, z, w), which allows placement at the center of the host (cast in 3D) with no steric clashes or energetic repulsion. During optimization of the guest-host interaction, the guest can pseudo-tunnel through the host, allowing rapid, efficient sampling of the potential energy landscape to locate the global minimum. This computational method will be particularly useful with the incorporation of experimental data from nuclear magnetic resonance (NMR) spectroscopy. An NMR-based screening center has recently been established at Dartmouth, allowing high-throughput collection and analysis of NMR experiments. We envision that the proposed computational method will become an integral component of the screening center, allowing Dartmouth (and other) researchers in biology, chemistry, medicine, and materials science to obtain high-resolution structural insight into guest/host or ligand/receptor complexes.
NMR is an extremely powerful tool for the constitutional and structural characterization of molecules and molecular complexes, with wide-ranging applications in the biological and physical sciences. A number of recent improvements, including cryogenic probes, higher magnetic fields, and robotic sample preparation/handling, have made screening by NMR very attractive: it can play a major role in the identification, validation, and optimization of hits. One of the most powerful techniques to identify hits (molecules that bind to a target) is saturation transfer difference (STD) spectroscopy.
Although STD is extremely powerful for identifying binders and the topological orientation of the bound ligand, it provides no information on the location of binding. Knowledge of the structural features of the binding site within the host is extremely valuable, allowing for the rational design of improved binders. Here, we propose a novel computational approach that employs a higher-dimensional representation of the guest to allow an unbiased and efficient search for the optimal mode of binding.
To identify the structural features of the guest/host complex, we propose to incorporate a four-spatial-dimension (4D) representation of the ligand into molecular dynamics (MD) simulations. Casting the 3D object into 4D allows quasi-tunneling through potential energy barriers during refinement of the complexes; the 4D guest can occupy the same 3D space as the host without penalty from the non-bonded terms of the force field, and therefore the potential energy barriers present in 3D space can be evaded.
The guest will be converted into 4D using a home-written metric matrix distance geometry program. From the 3D Cartesian coordinates, a holonomic matrix of upper and lower allowed distances is created, a configuration is randomly selected, and the resulting distances are smoothed using the triangle and tetrangle inequalities. The resulting real, symmetric matrix is diagonalized, and the four eigenvectors with the largest eigenvalues are used to calculate the 4D coordinates (x, y, z, w). The guest will be placed at the center of mass of the host using a simple molecular graphics program (e.g., Chimera, VMD). The molecular dynamics simulations will be carried out using the CHARMM force field, for which the laboratory has a license and the source code.
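The metric-matrix embedding step described above follows the classical distance-geometry recipe: double-center the squared interatomic distances to form a Gram matrix, diagonalize it, and build coordinates from the eigenvectors with the largest eigenvalues. A minimal sketch, assuming a complete distance matrix has already been sampled between the bounds (the function name and test geometry are ours, not the laboratory's program):

```python
import numpy as np

def embed_coordinates(dist, ndim=4):
    """Embed a set of points from its pairwise distance matrix into
    `ndim` spatial dimensions via the metric-matrix (Gram) method.

    dist : (n, n) symmetric matrix of interatomic distances.
    Returns an (n, ndim) coordinate array centered on the centroid.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-center the squared distances: G = -1/2 * J D^2 J, where
    # J projects out the centroid. G is the Gram matrix of coordinates.
    J = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * J @ d2 @ J
    # Diagonalize and keep the eigenvectors with the largest eigenvalues.
    vals, vecs = np.linalg.eigh(gram)              # ascending order
    vals = vals[::-1][:ndim]
    vecs = vecs[:, ::-1][:, :ndim]
    # Coordinates are eigenvectors scaled by sqrt(eigenvalues);
    # clip tiny negative eigenvalues arising from numerical noise.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

# Sanity check: a unit square embedded in 4D should reproduce all
# original pairwise distances (the extra dimensions come out flat).
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
coords4d = embed_coordinates(dist, ndim=4)
recon = np.linalg.norm(coords4d[:, None] - coords4d[None, :], axis=-1)
```

For a guest whose smoothed distance bounds admit no exact 3D solution, the same eigendecomposition simply spreads the error into the fourth coordinate, which is what permits the initial clash-free placement inside the host.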
There are a number of metrics and parameters that will have to be optimized for the proposed simulation protocol. To examine these we will first investigate a number of model systems, for which the high-resolution structure of the guest/host complex is known. To test the simulation protocol, we will choose systems which vary in the nature of the binding interaction (deep binding pocket with multiple guest/host interactions to a shallow binding site with minimal inter-molecule interactions). With these test systems, we will establish the optimal procedures and protocols for the simulation with real experimental data.
Upon successful optimization of the calculation protocol, we will develop a web-based user-interface to facilitate the utilization of this novel method. The user will provide the SDF description of the guest and the Cartesian coordinates of the host. This will be incorporated into the NMR-based Screening Center.
Deborah L. Nichols
Ceramics comprise one of the most common classes of material found at archaeological sites. Archaeologists typically recover pottery sherds where they were last used or discarded. Sourcing pottery to its locus of production can therefore elucidate the movement of goods and/or people across the landscape (Glascock 1992; Neff 2000). The chemical and mineral composition of pottery provides a fingerprint representing both the raw materials exploited by potters and the unique way that producers mix the ceramic paste (Arnold et al. 1991; Bishop et al. 1982). Sourcing pottery is essentially a geospatial reconstruction that connects people, places, and geological sources across the landscape. Ironically, the institutions that construct massive ceramic chemical databases typically have not kept pace with recent advances in geographic information systems (GIS). The University of Missouri Research Reactor (MURR) houses one of the largest ceramic chemical databases in the world. Our research proposes to establish a major research collaboration between the Neukom Institute and MURR to integrate these ceramic compositional data into a geodatabase and open up new dimensions of analysis.
The Archaeology Laboratory at MURR has been very influential in advancing pottery (and obsidian) sourcing studies over the past 25 years (Glascock 1992). MURR has analyzed more than 100,000 ceramic specimens through neutron activation analysis (NAA). Over 21,000 of the NAA assays have been run on ceramic artifacts from Mesoamerica, representing regions throughout Mexico, Guatemala, Belize, El Salvador, Honduras, and other parts of Central America. Mesoamerica is a key world region for understanding some of the most important transformations in human history, including the origins of agriculture, the independent development of cities and states, and the encounter between the Aztec and Spanish empires. Mesoamerican research has also been at the forefront of advancing ceramic sourcing studies. MURR possesses one of the largest ceramic chemical databases in the world and has been responsible for some of the most groundbreaking research on ancient exchange networks. Combined with regional archaeological settlement pattern data, this offers an unusually good place to bring together compositional and spatial analyses.
The proposed research will create a relational geodatabase for all of the MURR ceramic (including raw clay samples) chemical data for Mesoamerica. Over 90 percent of the samples in the MURR database currently lack spatial reference coordinates. The first step of this project will be to assign missing spatial reference data through review of publications and collaboration with researchers who have submitted samples. The database must be cleaned and standardized. Cleaning involves a search for errors and misspellings. Standardization will require much more time and labor. The descriptive data originally derive from 85 different researchers using variable taxonomies. To be useful for searching, displaying, and interpreting data, all specimens within each descriptive field must conform to the same nomenclature. The structure of the Mesoamerican ceramic geodatabase will later be applied to other regions and other classes of material (e.g., obsidian tools). Since maintaining and updating the geodatabase will require a trained skill set, Wesley D. Stoner (MURR) has agreed to manage the dataset indefinitely. The dataset will be housed at MURR, an institution that the PI has collaborated with for almost 20 years.
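The standardization step described above amounts to mapping each researcher's free-text labels onto one controlled vocabulary, with unmatched entries flagged for manual review. A minimal sketch; the ceramic type names, synonym table, and function name below are invented for illustration, not drawn from the MURR database:

```python
# Hypothetical controlled vocabulary: several spellings used by
# different researchers map to one canonical term.
SYNONYMS = {
    "aztec iii black on orange": "Aztec III Black-on-Orange",
    "aztec iii black-on-orange": "Aztec III Black-on-Orange",
    "az iii b/o": "Aztec III Black-on-Orange",
}

def standardize(raw_label):
    """Normalize case and whitespace, then look up the controlled
    term. Unrecognized labels are flagged rather than guessed."""
    key = " ".join(raw_label.lower().split())
    return SYNONYMS.get(key, "NEEDS REVIEW: " + raw_label)

# Labels as they might appear in three researchers' submissions.
labels = ["Aztec III  Black on Orange", "az iii b/o", "Mazapan Red"]
cleaned = [standardize(label) for label in labels]
```

In practice each descriptive field (type, ware, vessel form, provenience) would get its own lookup table, built and vetted in collaboration with the submitting researchers.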
The MACGeo project will be presented in a symposium where researchers will synthesize compositional research for each region of Mesoamerica. The project PI and MURR consultant, Stoner, will work with collaborators to generate standardized maps and spatial analyses for each culture region to establish coherency and facilitate comparison. Collaborators will then interpret the spatial analyses according to their own regional archaeological expertise. The symposium will create the foundation for an edited volume featuring a synthesis of three decades of Mesoamerican research at MURR. The MACGeo project has the potential to transform how archaeologists study exchange, interaction, and landscapes.
Thayer School of Engineering
Ultrasonography is a medical imaging method used to visualize soft tissue and internal organs in order to detect cancer, lesions, and other abnormalities; it is also used routinely in obstetrics. An ultrasonic "probe" typically contains multiple piezoelectric transducers that produce pulses of sound at frequencies between 1 and 18 MHz. Each type of tissue has a characteristic density, and thus a differing acoustic impedance, which causes a portion of the sound wave emitted by the probe to be reflected back to the probe as an echo. The echo is sensed by the probe, and the time of travel between source emission and echo detection is used to calculate the depth of the interface between the tissue and surrounding fluid. The depth signals as the tissue is scanned are then used, along with echo strength, to produce a visual image. The higher the frequency of the emitted wave, the better the resolution of features, i.e., more detail can be seen with high-frequency sound pulses. However, there is a tradeoff between frequency and signal attenuation: higher-frequency signals attenuate more strongly and therefore penetrate less deeply.
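The depth calculation described above reduces to half the round-trip travel time multiplied by the speed of sound, conventionally taken as about 1540 m/s averaged over soft tissue. A minimal sketch (the function name is ours):

```python
# Pulse-echo depth estimation: the sound travels to the interface and
# back, so the one-way depth is half the round-trip distance.
SPEED_OF_SOUND = 1540.0  # m/s, conventional soft-tissue average

def echo_depth(round_trip_time_s, c=SPEED_OF_SOUND):
    """Depth (metres) of a reflecting interface, from the time between
    pulse emission and echo detection."""
    return c * round_trip_time_s / 2.0

# A 65 microsecond round trip corresponds to roughly 5 cm of depth.
depth_m = echo_depth(65e-6)
```

The same relation, run in reverse, sets the pulse-repetition rate: the probe must wait long enough for echoes from the deepest structures of interest to return before firing again.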
Besides medical imaging, ultrasonic transducers are used for non-contact distance measurement and non-destructive evaluation (NDE) of structures, i.e., finding incipient fatigue cracks that can lead to structural failure. For distance measurement, a high frequency (inaudible) sound wave is emitted, and distance is determined from the time of arrival of its return echo. NDE uses the same principle of pulse emission and echo detection as medical imaging; an ultrasonic probe detects a flaw in the structure through the differing acoustic impedances of the structure and the small void within a fatigue crack.
Ultrasonography benefits from beamforming, a form of array signal processing where multiple transducers arranged in an array of known geometric shape each emit a signal, which is time-delayed relative to signals emitted from other transducers in the array such that the signal arrives in phase at a desired location, providing amplification of the emitted signal to overcome deficiencies due to natural attenuation. The superposition of signals from each transducer forms a "beam" of ultrasonic energy. Likewise, the return echoes are sensed by an array of ultrasonic transducers and are beamformed to improve signal-to-noise ratio of the return echo.
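The transmit side of the delay-and-sum scheme described above can be sketched as follows: for a chosen focal point, each element's firing is delayed so that all wavefronts arrive at the focus simultaneously. The array geometry and function name below are illustrative, not the investigators' implementation.

```python
import numpy as np

def focus_delays(element_x, focal_point, c=1540.0):
    """Per-element transmit delays (seconds) so that pulses from a
    linear array arrive in phase at `focal_point`.

    element_x   : (n,) x-positions of array elements (m), at depth 0.
    focal_point : (x, z) target location in metres.
    c           : assumed speed of sound (soft-tissue average).
    """
    fx, fz = focal_point
    # Straight-line distance from each element to the focus.
    dist = np.sqrt((np.asarray(element_x, dtype=float) - fx) ** 2 + fz ** 2)
    # The farthest element fires first (zero delay); nearer elements
    # wait, so all wavefronts superpose at the focus.
    return (dist.max() - dist) / c

# 8-element linear array, 0.3 mm pitch, focused 30 mm deep on axis.
elements = np.arange(8) * 0.3e-3
delays = focus_delays(elements, focal_point=(elements.mean(), 30e-3))
```

On receive, the same geometric delays are applied to the sampled echoes before summation, which is why the element count drives the computational load the proposal targets.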
The objective of this investigation is to determine the feasibility of a novel distributed computational approach to beamforming to improve ultrasonic imaging.
While beamforming is a common array signal processing technique, it is computationally intensive, and thus the number of ultrasonic elements used in the head of an ultrasonic probe is limited by computational throughput. Scatter of the beam also results in echoes that originate from multiple locations, not just the location on which the energy of the emitted beam is focused. Previous research in the investigator's lab (sponsored by AFOSR and the National Science Foundation) has developed a scalable, high-throughput distributed computational approach to acoustic beamforming that permits real-time operation and provides streaming audio signals at 100 kHz. The proposed Neukom investigation will extend this computational approach from audible sound to ultrasonography in order to enable medical ultrasound arrays with more, and more tightly spaced, elements. Increasing the number of elements and reducing the spacing between them will improve the ability to focus the emitted beam, improve the resolution of imaged features, and increase the signal-to-noise ratio of the echoed signal, thereby reducing image blur and improving image resolution. This, in turn, is expected to improve the diagnostic capability of ultrasonography.
The proposed work will scale our existing distributed computational system to work in real time for ultrasonic probes operating above 1 MHz, i.e., more than an order-of-magnitude increase in throughput. The system will comprise a Virtex FPGA, data-converter and signal-conditioning electronics, and real-time code for distributed, parallel signal processing. Additionally, we will develop new components of the core signal processing computations that cancel echoes arriving from directions other than where the probe is pointed. Finally, we will investigate the use of the computational architecture to provide additional diagnostic information beyond the ultrasonic image. We will develop a distributed computational approach to adaptive estimation that enables estimation of the acoustic impedance along the ultrasonic signal path. Estimating the acoustic impedance of the signal path may provide additional medical diagnostic information, e.g., a material property, which, when combined with improved image resolution, could enable the detection of smaller tumors or lesions than current ultrasonic imaging technology can detect, as the lesion would represent a local disruption in acoustic impedance relative to surrounding (healthy) tissue.
Psychological and Brain Sciences
We have been developing a new way of determining which neuronal circuits underlie which kinds of mental processing. The brain is in some ways like a muscle: feedback processes change axonal myelination depending on how many action potentials are sent along a given axon. We can measure an indirect indicator of myelination and axonal organization using diffusion tensor imaging (DTI). The complexity of the diffusion tensor (estimated from 32 gradient directions) within each brain voxel can be reduced to a single scalar per voxel called the 'fractional anisotropy' (FA). FA increases with (1) increased myelination, (2) increased parallel alignment of axons, and (3) increased packing of axons. Because of this, we can train people in whatever mental task we like, ideally for many months, collecting one DTI scan a month, allowing us to observe changes in FA across the brain. We can infer that changes in FA indicate changes in axonal organization, and we can infer from this that the changed axons play some role in the mental process that was trained.
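For reference, FA is computed from the three eigenvalues of the 3x3 diffusion tensor in each voxel: it is 0 for perfectly isotropic diffusion and approaches 1 when diffusion is confined to a single axis, as in a tight axon bundle. A minimal sketch of the standard formula (the toy tensors are ours):

```python
import numpy as np

def fractional_anisotropy(tensor):
    """FA of a 3x3 symmetric diffusion tensor:
    FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||,
    computed from the tensor's eigenvalues."""
    lam = np.linalg.eigvalsh(tensor)
    mean = lam.mean()
    return np.sqrt(1.5 * np.sum((lam - mean) ** 2) / np.sum(lam ** 2))

isotropic = np.eye(3)                 # equal diffusion in all directions
fibrous = np.diag([1.0, 0.05, 0.05])  # diffusion mostly along one axis
fa_iso = fractional_anisotropy(isotropic)
fa_fib = fractional_anisotropy(fibrous)
```

This collapse from a full tensor to one scalar per voxel is precisely the information loss the proposal aims to move beyond.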
We now ask, essentially, "What neural circuit underlies mental process x?" In this case: what is the neural basis of human creativity? To collect data, we have run a 6-month creativity training course, in which 25 middle-schoolers from the Upper Valley attended weekly creativity training sessions led by the Creative Education Foundation. In addition, 15 other students, all randomly assigned, do not take part in the training but are scanned. We have collected DTI scans before, during, and after the training for both the test and control groups.
We will have vast amounts of DTI and behavioral data for the 40 participants. The straightforward approach would be to look only at FA and plot voxels that changed with creativity training, as we did in the attached paper on the language students. But FA reduces the full diffusion tensor, measured along 32 directions, to a scalar. We are throwing away vast amounts of information, mainly because we lack the computational power and insight needed to exploit everything in the tensor. The tensors can be explored in interesting ways to try to extract circuits. Researchers in the field are already using probabilistic fiber tracking algorithms to extract axon bundles from a single DTI scan. But we have longitudinal data, and can probably extract much more. Existing algorithms assume that a fiber cannot bend or fan out too suddenly; fibers that cross or interdigitate become intractable. We would like to improve our ability to see that fibers connect two cortical areas of interest. Our hope is not only to do fiber tracking in more sophisticated ways, but also to transcend our present FA-based analysis and use the full tensors to study how fibers develop over months of use. Tensor evolution will almost surely show interesting properties that FA evolution throws away.
Last Updated: 7/7/14