The winner of the 2014-2015 Neukom Institute/IQBS CompX Faculty Grants Program for Dartmouth faculty has been announced, with an award of up to $20,000 for a one-year project.
Co-sponsored by the Neukom Institute and the Institute for Quantitative Biomedical Sciences (IQBS), the program focuses on funding computational biomedical research in bioengineering, bioinformatics, biostatistics, biophysics, and other related areas across the campus and professional schools.
Faculty across Dartmouth College, including the undergraduate, graduate, and professional schools, were eligible to apply for this competitive grant. This year's winner is:
The goal of this project is to develop a model that investigates under what conditions GTA-mediated gene exchange could increase population fitness, and to characterize the gene pool of GTAs in wild bacterial populations.
1. Model possible fitness benefits of DNA acquisition via GTA particles. Regardless of whether GTAs are SGEs or not, there could be beneficial effects of GTAs for the population that in turn could reinforce GTA gene cluster maintenance, now due to population-level selection. Such exaptations, or domestications, have been observed multiple times in various biological systems. Additionally, examination of the genomes of closely related bacteria and genotyping of natural bacterial communities reveal enormous variation in gene content among closely related cells. Therefore, some cells might carry genes that are beneficial (or even vital) under specific environmental conditions that other cells lack. How quickly can a beneficial trait B propagate through the population via GTAs? To answer this question, I will construct a model that examines the change in frequency of the trait B in a population if it is transferred by GTAs at a rate t. Rates of a cell's division and death will depend on the fitness increase associated with the presence of the trait B in the cell's genome. Both random and biased packaging of host DNA into GTA particles will be considered. The model will predict whether a GTA-producing population is able to adapt to changing environmental conditions more rapidly by exchanging useful genes, and therefore whether GTAs are indeed an innovation that could drive bacterial evolution.
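The population dynamics described above can be sketched as a toy simulation. The logistic form, the additive combination of selection and GTA-mediated transfer, and all parameter values below are illustrative assumptions, not the proposed model:

```python
import numpy as np

def simulate_trait_b(s=0.05, t=0.01, f0=0.01, dt=0.1, steps=5000):
    """Track the frequency f of a beneficial trait B over time.

    s  -- fitness advantage of cells carrying B (hypothetical value)
    t  -- rate of GTA-mediated transfer of B (hypothetical value)
    f0 -- initial frequency of B in the population
    """
    f = f0
    history = [f]
    for _ in range(steps):
        # selection: B carriers out-divide non-carriers (logistic form)
        df_sel = s * f * (1 - f)
        # transfer: non-carriers (1 - f) acquire B from GTA particles
        # released by carriers (frequency f) at rate t
        df_gta = t * f * (1 - f)
        f = min(1.0, f + dt * (df_sel + df_gta))
        history.append(f)
    return np.array(history)

traj = simulate_trait_b()
```

Under these assumptions the trait sweeps toward fixation; comparing trajectories with t = 0 against t > 0 isolates the contribution of GTA-mediated exchange.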
2. Collect experimental evidence to refine and validate the model. The proposed model will predict the fraction of GTA particles that need to carry a trait B to provide sufficient benefit to the bacterial population. Therefore, it may matter which genes in wild bacterial populations are packaged inside GTA particles, and at what frequency. To address this question, I plan to team up with Dr. Andrew Lang (Memorial University of Newfoundland, Canada), who experimentally studies GTAs in Rhodobacter capsulatus. We propose to examine the DNA content of GTA particles collected from the coastal waters of Newfoundland. The sampling location is known to contain a diverse population of Rhodobacter relatives (Rhodobacterales), which will ensure the presence of GTA particles in the sampled ocean water.
Specifically, viral concentrates (VCs) will be obtained from marine samples collected at Logy Bay, Newfoundland. Using standard protocols, seawater will be sequentially filtered through 1.6-μm and 0.22-μm filters to remove (most of) the cellular material, and viral particles will be purified and probed for the presence of Rhodobacterales GTAs. DNA from the VCs identified to contain GTAs will be extracted, processed, and sequenced. Sequence data that pass quality control will be taxonomically and functionally classified using both composition and similarity to publicly available marine cellular and viral genomes and metagenomes. Statistical analysis will be performed to assess any overrepresentation of specific genes and gene families in GTA particles in comparison to closely related, completely sequenced GTA-hosting genomes and to marine cellular and viral metagenomes. Since the sampling location is known to contain a diverse Rhodobacterales population (Lang, prelim. data), cellular DNA from the material collected on the 0.22-μm filters will also be sequenced in order to capture Rhodobacterales diversity that would complement available reference genomes.
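As a rough illustration of the kind of overrepresentation test described above, the sketch below applies a one-tailed hypergeometric (Fisher-type) test to hypothetical read counts; the numbers are invented for illustration, not preliminary data:

```python
from math import comb

# Hypothetical read counts: a gene family appears in 30 of 200 reads
# from GTA particles but only 10 of 200 reads from a reference
# metagenome of comparable depth (numbers invented for illustration).
gta_hits, gta_reads = 30, 200
ref_hits, ref_reads = 10, 200

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N reads from a pool of M reads that
    contains n hits: a one-tailed Fisher-type overrepresentation test."""
    return sum(comb(n, x) * comb(M - n, N - x)
               for x in range(k, min(n, N) + 1)) / comb(M, N)

p_value = hypergeom_sf(gta_hits,
                       gta_reads + ref_reads,   # pooled reads
                       gta_hits + ref_hits,     # pooled hits
                       gta_reads)               # reads drawn from GTAs
```

A small p-value flags the gene family as packaged into GTA particles more often than its background abundance would predict.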
Using this metagenome and reference Rhodobacterales genomes, sequences from GTA particles that match well-annotated genes in the reference genomes will be examined for elevated mutation rates. Such sequences may represent novel (and potentially adaptive) alleles of conserved genes. Some bacterial populations are known to contain mutator cells, which can be the source of novel alleles.
The proposed analysis would be an initial step in investigating if GTA-producing cells may similarly have such mutator phenotypes.
The collected data will provide us with a frequency distribution of specific (and potentially beneficial) genes in GTA particles. Finding of elevated mutation rates in this gene pool would support our hypothesis that GTAs may propagate novel alleles through a population.
The proposed work also has a potential to indirectly identify GTA production by marine microorganisms not yet known to do so.
Outlook. The proposed project will produce new insights into the evolution of a fascinating case of a possible virus "domestication" by a population of microorganisms. Cooperative population-level behavior can ultimately lead to advanced multicellular life forms capable of complex intercellular communication. Therefore, the GTA system could become an exemplar of how a social trait emerges and evolves.
Winners of the 2014-2015 Neukom Institute CompX Faculty Grants Program for Dartmouth faculty have been announced, with awards of up to $20,000 for one-year projects.
The program seeks to fund both the development of novel computational techniques as well as the application of computational methods to research across the campus and professional schools.
Faculty across Dartmouth College, including the undergraduate, graduate, and professional schools, were eligible to apply for these competitive grants. This year's winners are:
The object of our investigation is the cartographic masterpiece and pioneering study of ancient Roman topography, The Forma Urbis Romae. Published in 1901 by Italian archeologist Rodolfo Lanciani, this highly complex map sums up the pluri-millennial history of urbanism in the Eternal City. Over a century after its original publication, the Forma Urbis remains the standard archeological reference to Roman topography even though it does not incorporate the host of archeological discoveries that have come to light since its creation. Our project will digitally remaster the Forma Urbis Romae, a process that will unleash myriad potentials that users of the analog original could only imagine: from customizable maps that isolate particular structures according to building type, era of construction, patron or architect, to hyperlinked references to the primary sources used by Lanciani himself to assemble this vast and encyclopedic map. All this will eventually be made freely available to students and scholars of Rome via a website and mobile application.
This task of bringing Lanciani's map into the twenty-first century was initially funded by an ACLS Digital Innovation Fellowship and began as a collaborative effort between scholars at Stanford University and the University of Oregon. Since then, the scope of the project has greatly expanded, in part thanks to the resources and opportunities made available through collaborations with Dartmouth College. The team now wishes to capitalize on this momentum and is seeking a CompX Faculty Grant to assist in the completion of the map. The expected outcomes of the project include the creation of a versatile resource, the digitally remastered map, that will comprise a foundational part of a future website and mobile application. These digital tools will, in turn, revolutionize the visualization and analysis of Rome's built history. Needless to say, these ambitious goals will be accomplished only with additional major grants. A CompX Faculty Grant would provide time and money to work toward these other sources of funding.
To digitally remaster Lanciani's 1901 map of Rome requires a few essential steps:
1) Rasterizing the existing map. In essence, this task is akin to digitally gluing together the forty-six original plates to create a complex composite. This task has already been accomplished in full thanks to previous grants.
2) Vectorizing objects on the existing map. In this case, the basic goal is to redraw the thousands of features on the map with Adobe Illustrator so that each object—theater, stadium, or street, for example—can be separated from other objects around it in order to visualize it in isolation or along with other selected objects. Vectorizing the map will also permit users to zoom in to the level of a step or column without sacrificing crispness and precision. Of the forty-six plates, we have already vectorized thirty of the densest, which is to say about 80% of the vectorization process is done. Completing the remaining 20% will be the first task of the principal investigator, chief collaborator, and pair of student workers funded by a CompX Grant.
3) Geo-rectifying the map: in layman's terms, stretching the map so its components correspond to real-space coordinates. Distortions that exist in Lanciani's original will be undone (or rectified) using a reference map produced by the SARA Nistri surveying company. This map, already in our possession, is the gold standard for accuracy. Thus far, only a dozen plates of the Lanciani map have been stretched into alignment with SARA Nistri, but these serve as a proof of concept for the task ahead. The actual geo-rectification will be accomplished with a program called MAPublisher, which merges our vectorized map (in Adobe Illustrator) with the SARA Nistri map (in ArcGIS, software designed specifically for maps).
4) Mining the analog map for metadata to be hyperlinked to the new digital map. Throughout the original map, Lanciani referenced published excavation reports, manuscripts, and drawings to further inform the users of his map who were curious about a particular site. Tracking these references is cumbersome and requires having a rather rich library at one's disposal. Today, all of the references have entered the public domain and many have become digitally available online. We wish to make it possible for users to hyperlink from our remastered Lanciani map to the primary sources themselves, whether they are stored in digital libraries or on museum websites. This task will require students, skilled in Italian, to methodically go through the plates and link analog metadata with its virtual counterparts. 80% of the map still remains to be mined in this fashion.
Background and significance:
A hallmark of Latin literature is the extent to which authors self-consciously reference earlier works, situating themselves in a rich literary tradition extending from Homer through the poets of the Roman empire. These references create a vast network of "intertextuality" that often has profound consequences for literary meaning. As such, identification and interpretation of intertextual parallels has been a ubiquitous aspect of classical literary criticism for at least the past century. Virtually all critical commentaries on Latin texts contain long lists of putative intertextual references, and many seminal works of criticism (such as Knauer's Die Aeneis und Homer and Nelis' Vergil's Aeneid and the Argonautica of Apollonius Rhodius) are systematic studies of intertextuality between two major authors. Traditional approaches to intertextuality could be described as heroic but crude, involving manual cross-referencing of numerous texts and simple word searches guided by literary understanding. Although no computational method can replace the intuition required to make sense of the thematic implications of specific intertextual relationships, computation holds tremendous promise for enabling the discovery and organization of possible parallels on a grand scale.
The diversity of intertextual references is a major challenge for any method of detection, whether traditional or computational. Intertexts range from the obvious (direct quotation) to the extremely subtle (ironic appropriation of an earlier text or a reference that makes sense only in light of the whole literary tradition). Furthermore, the similarity may be verbal (repeated words), thematic (repeated meaning expressed with different words), or (for poetry) metrical. There have been successful attempts to theorize this diversity, leading to various "typologies" of classical intertextuality (Thomas 1986, Hinds 1998). An ideal computational method would be able to capture references that fall anywhere on this spectrum.
We propose to apply a suite of techniques from computational biology and machine learning to address this problem, including sequence alignment methods for the identification of verbal similarities, pattern matching for metrical features, and clustering for large-scale, unbiased characterizations of intertextuality. The central goal of the project is to develop methods that reveal information of real use to the literary critic, not just linguistic curiosities.
Aim 1—Robust, versatile identification of intertextual parallels in Greek and Latin literature:
The simplest instances of intertextuality involve local repetitions between two texts. The Tesserae project is a collaborative effort headquartered at SUNY Buffalo to develop methods for computational identification of certain types of intertexts (Coffee et al. 2012, Coffee et al. 2013). Their core accomplishment is the implementation of a web-based search tool (http://tesserae.caset.buffalo.edu/) for two-word phrases repeated in multiple Greek or Latin texts. The Tesserae methodology has been used to identify previously unrecognized allusions in Lucan, Statius, and other Latin epic poets (Coffee et al. 2012, Gervais 2014, Bernstein and Berti 2014). Repeated phrases, however, comprise only one (especially overt) class of intertextuality, and there remains a considerable need for search tools that can identify instances from other classes.
We plan to develop three search tools that will enable high-throughput identification of a much broader range of intertextual parallels. Our first tool, based on sequence alignment of n-grams (short combinations of characters or words of arbitrary length), will allow for identification of any short verbal parallel between multiple Latin or multiple Greek texts, not just repeated two-word phrases. Searching for homologous gene or protein sequences using the Basic Local Alignment Search Tool (BLAST) or related algorithms is essential to modern biology research (Altschul et al. 1990). We envision a "BLAST for classicists" that allows users to generate lists of n-grams similar to a phrase of interest, ranked according to a distance metric that measures character-by-character similarity. Our second tool will enable searches for n-grams with similar meaning but no repeated words. We will use the excellent electronic Greek-English and Latin-English dictionaries available through the Perseus Digital Library (http://www.perseus.tufts.edu/hopper/) in combination with nineteenth-century Greek-Latin dictionaries to create a "digital thesaurus" for meaning searches. It will be possible to undertake cross-linguistic searches using this approach, an important advantage given the profound influence of Greek authors on Latin literature. Finally, we will develop a search method for metrical intertextuality. Most Greek and Latin verse is highly regular, and scansion requires application of a small set of rules governing syllable length. We will use simple pattern matching to perform computational scansion of a large corpus of poems and plays, which users will then be able to search for all occurrences of a metrical irregularity or other feature of interest. Used together, these tools should recreate the combinatorial approach that most critics intuitively take to literary intertextuality. They will enable both investigation of specific hypotheses and broad comparisons of whole texts.
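As a minimal illustration of the ranked similarity search envisioned for the first tool, the sketch below scores a handful of hypothetical Latin n-grams against a query phrase, using Python's built-in SequenceMatcher ratio as a stand-in for the proposed alignment-based distance metric:

```python
from difflib import SequenceMatcher

# Hypothetical Latin n-grams standing in for a searchable corpus.
corpus = [
    "arma virumque cano",
    "arma uirumque cano troiae",
    "bella per emathios plus quam ciuilia campos",
    "fraternas acies alternaque regna",
]

def rank_by_similarity(query, ngrams):
    """Rank candidate n-grams by character-level similarity to a query,
    a stand-in for the proposed alignment-based distance metric."""
    scored = [(SequenceMatcher(None, query, g).ratio(), g) for g in ngrams]
    return sorted(scored, reverse=True)

results = rank_by_similarity("arma virumque", corpus)
```

Orthographic variants such as "uirumque" for "virumque" still score highly, which is exactly the tolerance a character-level metric provides over exact phrase search.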
Aim 2—Unbiased profiling of intertextuality using machine learning:
Interpreting the output of numerous local searches is laborious and often dependent on prior assumptions. We propose the use of several machine learning algorithms for unbiased identification and analysis of many intertextual parallels. Clustering is a form of unsupervised learning that divides objects into groups so that the members of a group are more similar (by some metric) to each other than to objects in any other group. As such, clustering by a metric of linguistic or literary relevance has the potential to be a powerful method for large-scale identification of intertextuality. We will apply non-negative matrix factorization (NNMF), a robust, scalable clustering method developed by Lee and Seung that is designed for parts-based representations (decompositions of the input data into clusters with intrinsic meaning, such as clustering pixels from the image of a human face into groups corresponding to the eyes, nose, and mouth). NNMF requires an input matrix, a simple example of which is a word-frequency matrix in which the columns are all texts or sub-texts of interest, the rows are all words appearing in those texts, and an entry a_i,j is the frequency of the ith word in the jth text. This matrix is then decomposed into two matrices of low rank that contain no negative entries: a word-topic matrix in which each column is a "meta-word" consisting of a topic and associated words, and a topic-text matrix in which each column reflects the distribution of meta-words within that text. The words comprising each meta-word are likely to be related intertextually; similarly, charting topics across texts should provide useful information about broad thematic similarities. To this end, we will implement other algorithms optimized for this "topic modeling," including a version of NNMF called probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA), to enable extensive profiling of thematic intertextuality (Yan et al. 2013, Blei et al. 2003).
These thematic profiles will complement the local information that can be assembled from the search tools and from the word-topic matrices. Additionally, we plan to analyze matrices with other frequency data, including functional n-grams (syllable-length n-grams, which often capture aural effects in poetry), metrical features, and word co-occurrences determined using local searches. We selected NNMF because there is evidence that it is more robust and versatile than standard clustering methods such as k-means and hierarchical (Brunet et al. 2004). The proposed research will provide an opportunity to evaluate the efficacy of NNMF relative to other methods for an unusual application, which is of fundamental interest to machine learning research. Several of the primary collaborators have extensive experience with quantitative and computational modeling (including machine learning) for diverse applications (Field et al. 2013, Field et al. 2014, Dexter and Gunawardena 2013, Dexter et al. 2014).
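As a minimal sketch of the decomposition step, the following applies Lee and Seung's multiplicative updates to a toy word-frequency matrix; the word counts are invented, and k = 2 topics is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word-frequency matrix V: rows = words, columns = texts.
# Counts are invented; the first two words dominate texts 1-2,
# the last two dominate texts 3-4.
V = np.array([
    [4, 5, 0, 0],
    [8, 10, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 10, 8],
], dtype=float)

k = 2                                  # number of topics (meta-words)
W = rng.random((V.shape[0], k)) + 0.1  # word-topic matrix
H = rng.random((k, V.shape[1])) + 0.1  # topic-text matrix

# Lee & Seung multiplicative updates minimizing ||V - WH||_F,
# which keep every entry of W and H non-negative
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

reconstruction_error = np.linalg.norm(V - W @ H)
```

Each column of W groups words that co-occur across texts (a candidate meta-word), while each column of H shows how strongly each topic is represented in a given text.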
Aim 3—Applications to Silver Latin epic and reception studies:
Although the methods discussed in Aims 1 and 2 should be applicable to a wide range of problems, initially we plan to explore two applications that are central to the current research interests of the investigator and collaborators and which build upon our past work on intertextuality (Chaudhuri 2013, Dexter 2013). PC has a longstanding interest in the epic poetry of the mid- to late-first century AD ("Silver epic") and has authored a forthcoming book on human-divine interactions in Silver epic (Chaudhuri 2014a). It is well-known among classicists that Silver epics, such as Lucan's Bellum Civile and Statius' Thebaid, draw heavily on the language, imagery, and themes of Vergil's Aeneid and Ovid's Metamorphoses, two epics composed at the end of the first century BC, and Seneca's tragedies, composed around the middle of the first century AD. Recent literary scholarship has tended to promote the influence of individual works (Hardie's The Epic Successors of Virgil is a standard example), and it appears that this approach is likely to continue in future scholarship (Raymond Marks' current project, for instance, is entitled The Epic Successors of Ovid). Traditional approaches cannot provide an unbiased and comprehensive view of intertextual connections across multiple texts, because the field of potential connections far exceeds the expertise of a single critic and the existing scholarship. In contrast, computational methods have the capacity to identify a much greater range of lexical and semantic connections, enabling the literary critic to see hitherto unnoticed trends against multiple contexts and to evaluate the potential literary significance of those results. This expansiveness should free the critic from simply rotating attention from one successor to another, as seems standard practice with Silver epic. 
As part of the proposed research, we plan to investigate the relative influence of the Aeneid, the Metamorphoses, and the ten tragedies attributed to or associated with Seneca on Silver Latin epic. All proposed computational methods will be deployed on this problem. We expect the data to support a move away from viewing Silver epic primarily through a Vergilian lens, and to reveal patterns of intertextuality that center on passages and scenes of particular importance across the earlier tradition.
As a natural extension of this goal, we plan to use our methods to better understand the representation of violence in Silver epic. Classicists typically regard these epics as making use of graphically violent scenes to a greater extent than their predecessors, a characteristic attributed to wider social changes in the first century AD that saw the significant rise of gladiatorial games and public punishments in the arena and to the heightened metaphorical significance of violence (Most 1992, Gervais 2013). By examining the semantics of violence and the relative concentration of such imagery across Greek and Latin epic, we seek to establish whether the traditional view of Silver epic is indeed correct and, if so, to identify the salient patterns that constitute the new aesthetic. Topic modeling should be especially appropriate for charting such thematic evolution.
Classical reception studies involves identifying and theorizing the influence of the classics on postclassical literature and culture. PC and JPD both have strong interests in reception studies (Brockliss et al. 2012, Chaudhuri 2014b, Chaudhuri 2014c, Dexter 2011a, Dexter 2011b), and JPD is currently working on a book project on Roman comedy and its later reception. On some level, reception can be thought of as an extreme form of intertextuality, in which a text is systematically adapted and re-imagined in a new cultural context. Plautus' Amphitryon is notable for having one of the most complex and extensive reception histories of any ancient comedy; the Archive of Performances of Greek and Roman Drama lists nearly 70 adaptations just from 1450 to the present. Some of these adaptations hew closely to the Plautine original, while others diverge radically. Several of the most different ones were influenced more by other post-classical versions (such as those written by Molière and Dryden) than by Plautus. Although several long surveys of the reception of the Amphitryon have been written (Sherro 1956, Lindberger 1956, Margotton and Huby-Gilson 2010), the structure of interactions between the various versions remains poorly understood. We plan to assemble a "phylogenetic profile" (Pagliarini et al. 2008) of adaptations of the Amphitryon using a mix of computational searches (for versions in Latin) and coarser metrics such as the presence or absence of entire scenes (for versions in modern languages). This profile will provide unusually precise information about the evolution of a classical text from antiquity to the present day.
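A toy version of such a phylogenetic profile might look like the sketch below; the scene presence/absence vectors are invented for illustration, and only the choice of adapters (plus Kleist's well-known 1807 version) comes from the reception history discussed above:

```python
# Hypothetical presence/absence profiles: 1 = a given scene of the
# Amphitryon is present in that version (data invented for illustration).
profiles = {
    "Plautus": [1, 1, 1, 1, 1, 1],
    "Moliere": [1, 1, 0, 1, 0, 1],
    "Dryden":  [1, 1, 0, 1, 0, 0],
    "Kleist":  [0, 0, 0, 1, 0, 0],
}

def hamming_similarity(a, b):
    """Fraction of scenes on which two versions agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def nearest(name):
    """The version whose profile most resembles the named one."""
    others = [(hamming_similarity(profiles[name], v), other)
              for other, v in profiles.items() if other != name]
    return max(others)[1]
```

On these toy data, Dryden's closest relative is Molière's version rather than Plautus' original, the kind of structure among post-classical versions that the profile is meant to expose.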
Dissemination of results:
A simple web-based implementation of the core tools will be developed with assistance from the Neukom Institute. The interface will allow users to search for passages paralleling specific phrases of arbitrary length according to linguistic, thematic, or metrical similarity, and to deploy the clustering and topic modeling algorithms on texts of interest. Through collaboration with Neil Coffee at SUNY Buffalo, we hope to integrate these tools with the Tesserae project to provide a comprehensive resource for computational studies of intertextuality. A paper on the core methodology and sample applications will be submitted to a high-profile, interdisciplinary science journal by the end of the grant period. Additionally, we plan to give talks at one or more conferences to introduce classicists to the tools.
Integration with undergraduate education:
Dartmouth students taking Greek 30/Latin 30 (taught by PC) this summer will be asked to perform a small computational research project using one or more of the techniques developed by the investigators. Greek 30/Latin 30 is an advanced undergraduate seminar that focuses on two classical texts with numerous similarities (such as the Greek Argonautica of Apollonius and the Latin Argonautica of Valerius Flaccus). Accordingly, the course gives extensive attention to issues of literary intertextuality and is an ideal setting to introduce undergraduates to appropriate computational methods. Additionally, we plan to use the grant money to hire multiple Dartmouth undergraduates as research assistants for the proposed project (see budget statement for additional information).
Professor of Biological Sciences
Professor in the Ecology and Evolutionary Biology Graduate Program
Biological Sciences at Virginia Tech
Freshwater lakes provide many services fundamental for human well-being, including drinking water, fisheries, and irrigation. However, in many regions globally, lake ecosystem services are threatened by increasing cyanobacterial blooms. Scientists have noted an increase in the duration, frequency, and geographical extent of such blooms in many lakes; these blooms negatively impact food web structure, light penetration, provisioning of water for consumption and recreation, and property values. Moreover, because some cyanobacterial taxa can produce toxins, they can even pose a health risk to humans, pets, and livestock. Cyanobacteria are widely expected to dominate lake phytoplankton communities in deep temperate lakes under the warmer, more strongly stratified conditions predicted to occur due to global climate change. The dominant mechanism of climate change hypothesized to favor cyanobacteria is increased stratification, due to their ability to regulate their buoyancy in the water column using gas vesicles. Buoyancy regulation gives cyanobacteria a competitive advantage over most eukaryotic phytoplankton by allowing them to access both light near the water surface and nutrients at depth. When a stratified water column mixes following storm events, cyanobacteria are expected to decrease because mixing can propel cyanobacteria to depths at which the pressure causes gas vesicles to irreversibly collapse. After gas vesicle collapse, colonies are no longer able to control their buoyancy and can sink out of the water column. In addition, colonies experience severe light limitation and colder temperatures at depth, reducing population growth rate.
However, the paradigm that global climate change – in particular, increased stratification – will translate into larger populations of cyanobacteria in the water column overlooks the fact that in many temperate lakes, cyanobacteria are not present in the water column year-round: there is a period of the life cycle spent on the sediments, typically in dormant stages. Contrary to current expectations, we have recently reported that seasonal recruitment of cyanobacteria from the sediments into the water column tends to be higher in years with greater water column mixing, resulting in higher abundances in the water column. This new finding suggests that mixing events may be coupled to the life cycles and population ecology of cyanobacteria in ways that are far more complex than currently appreciated – and, moreover, that current predictions of increases in cyanobacteria with climate change must take a more complex series of causal pathways into account, especially lake physics.
We are about to submit an NSF pre-proposal that seeks to extend our understanding of how mixing affects cyanobacterial populations in deep stratified lakes by considering the effects of mixing throughout their life history. As part of that project, Bates mathematician Meredith Greer is building an agent-based model of cyanobacterial life history that will be coupled to realistic scenarios of physical mixing within several of our focal study lakes. Here, we propose to begin the background studies needed to generate a working model of physical mixing within our two focal study lakes, which we can then couple to Greer's agent-based model. Specifically, we request a Neukom Institute CompX Faculty Grant in order to parameterize and then calibrate the lake hydrodynamics model IPH-ECO for our two focal study lakes, Lake Sunapee, NH, and Lake Auburn, ME, two lakes that provide substantial ecosystem services, including drinking water. With this model up and running for realistic climate scenarios, we should have a greater chance of securing NSF funding should we be invited to submit a full proposal in August 2014. A calibrated model will also jump-start several other projects in cooperation with our partners at the Lake Sunapee Protective Association (LSPA) and the Lewiston-Auburn Water District.
Lake Ecosystem Model
IPH-ECO is an open-source, coupled hydrodynamic-water quality model that will allow us to manipulate the representation of physics and chemistry within a lake ecosystem, as driven by weather variables like insolation, air temperature, and wind. We chose this model because it is increasingly used within the limnological community and because its spatial complexity can be explicitly manipulated by the modeler as part of the modeling process. For example, we can calibrate for "0-D" (a point model that assumes the lake is completely homogeneous), "1-D" (a model for one site, with many depths at that site), and "3-D" (a spatially explicit model for the entire lake, including many depths at each site) applications. This flexibility provides the ability to make within-lake spatial comparisons, which will be particularly important in Lake Sunapee, which has very complex bathymetry and many isolated coves. The goal of this project is to have a working "1-D" version of the existing hydrodynamic model (IPH-ECO) for both lakes by late summer and, more ambitiously, to be running realistic simulations, ideally in "3-D", by the end of fall 2014.
Importantly, IPH-ECO is open source with a graphical user interface, facilitating use by both programmers and non-programmers. During this Neukom CompX project, we would work with an undergraduate intern (either an off-term enrolled student or a recent A.B. graduate) to create lake-specific versions of IPH-ECO for Lake Sunapee and Lake Auburn. Our goal is to identify a candidate with a programming background but strong interests in ecology or environmental science; experience with fluid dynamics and/or engineering would be a real plus. Victoria Tersigni '14, an engineering-modified-with-biology major, is interested in the position if it were to be funded; her expertise is a good match to what we will need.
We will calibrate IPH-ECO, first at 1-D and then 3-D, for within-year dynamics at both Sunapee and Auburn, using standard methods for configuring and then calibrating for each lake. First, we will configure the model for Lake Sunapee using data from the Lake Sunapee Protective Association buoy, which measures weather variables and lake temperature at ~1 m intervals through the water column at 10-minute intervals; we will use data from the summers of 2007-2011 for calibration and 2012-2014 for validation. Lake Auburn has a similar buoy, but it was just installed this past summer. As such, we will calibrate the model for Lake Auburn using summer 2013 data for calibration and summer 2014 data for validation. During validation, simulated and observed data will be compared using standard goodness-of-fit metrics and system identification techniques. Co-PI Cayelan Carey '06 spent her postdoc working with IPH-ECO for Lake Mendota, WI, and her expertise will be essential as we get this project up and running. As such, we are requesting funds to bring Carey back to campus in early summer 2014 to help the intern and me begin work.
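The goodness-of-fit step during validation might, for example, compare simulated and observed water temperatures using root-mean-square error and Nash-Sutcliffe efficiency, two metrics commonly used in hydrodynamic model calibration; the temperature values below are invented placeholders for buoy data and IPH-ECO output:

```python
from math import sqrt

# Hypothetical observed (buoy) vs simulated (model) temperatures, deg C
observed  = [22.1, 22.4, 21.9, 21.5, 20.8, 20.2]
simulated = [21.8, 22.6, 22.1, 21.2, 21.0, 20.5]

def rmse(obs, sim):
    """Root-mean-square error between observation and simulation."""
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values at or
    below 0 mean the model is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

fit_rmse = rmse(observed, simulated)
fit_nse = nash_sutcliffe(observed, simulated)
```

In practice these metrics would be computed depth-by-depth over the full buoy record rather than on a short illustrative series like this one.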
Once the lake models are successfully calibrated, we will simulate a range of realistic future nutrient and climate scenarios for northern New England lakes. These scenarios will be created from recent predictions for increased air temperatures and altered precipitation patterns; our goal is to simulate how lake stratification and mixing might change under different climate change scenarios. Numerical simulation is a valuable tool often used to predict the effects of climate, land use, and nutrient loading on lakes.
Ultimately, we will couple these simulations of future lake physics to Greer's agent-based model, but as that model is just beginning development, it will likely be a few years before we get there. In the long run, being able to couple realistic lake hydrodynamics with the agent-based model will enable us to provide state-of-the-art predictions about how climate change will affect cyanobacterial blooms in two lakes that provide critical services for their communities.
This project seeks to advance techniques for making animated percussion by creating a computer program that emulates the optical soundtrack's response to images. This program will serve as a tool for experimental composers who wish to research the sonic capabilities of the optical medium by shooting images onto the optical soundtrack portion of the 16mm filmstrip. The application also has potential to serve as a valuable tool in the world of film archiving through the possibility of scanning and synthesizing the sound of shrunken or damaged film that can no longer run through the 16mm projection mechanism. Building upon a longtime tradition of finding an analogue between image and sound, this project fuses the seemingly separate worlds of analog and digital into a new hybrid mode of conducting artistic research.
The final result of this project will be a computer application that simulates the response of the 16mm optical track to changes in contrast. The computer program will allow the user to compose and edit music in time as well as export both sound and video files. After collecting examples of sound films exhibiting a wide range of possibilities and creating a replica of an optical sound head, the team will create a digital library, categorized by simple waves and patterns alongside complex, overlaid (multi-tracking) sequences. The prototyping stage will entail creating a visual-sound synthesis environment in Max/MSP/Jitter to determine how to import images and synthesize sound. Paying specific attention to the transients and bandwidth of these possibilities, the team will create and test plans for a useful interface that can synthesize the image/sound relationship, edit in time, and export audio and video files.
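The core principle of the optical sound head can be sketched in a few lines: the projector's photocell responds to the total light passing through each slice of the track as the film moves, so the mean brightness of each scanned row can stand in for one audio sample. This is a stand-alone illustration in Python rather than Max/MSP/Jitter, and the square-wave strip is a made-up example; a real implementation would also map scan resolution and film speed (24 fps for 16mm sound projection) to an output sample rate.

```python
import numpy as np

def optical_track_to_audio(track):
    """Convert a scanned optical-track strip (2D grayscale array, with rows
    ordered along the length of the film) into an audio signal: each row's
    mean brightness becomes one sample, centered and normalized to [-1, 1]."""
    levels = np.asarray(track, float).mean(axis=1)  # light reaching the photocell per row
    levels -= levels.mean()                         # remove the DC offset
    peak = np.abs(levels).max()
    return levels / peak if peak > 0 else levels

# Hypothetical strip: alternating bands of 8 dark and 8 clear rows,
# which the photocell reads as a square wave.
strip = np.tile(np.repeat([0.0, 1.0], 8), 4).reshape(-1, 1) * np.ones((1, 16))
audio = optical_track_to_audio(strip)
```

The same routine, pointed at a scan of a shrunken or damaged print, illustrates the archival use case: the sound is recovered without the film ever passing through a projector.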
Students in SP15 Handmade Strategies for 16mm: Experiments in Sound and Image will perform helpful research towards this project. Exercises in capturing more data through students' own drawings will grow the sound database, and basic composition assignments in animated 16mm optical sound percussion will test the functionality of the computer program that synthesizes images into sounds. Students will create a collection of potential sounds and then use this bank of sounds to compose a series of animated rhythm exercises. The rhythmic exercises will take the form of 256-frame loops to play and phase rhythms on multiple projectors as an installation: projector drum circles. Students will perform these results at the Green Fish (a graduate student music series taking place in One Wheelock). In addition to conducting experiments in indexical animated percussion, the team will need to seek out information on the systems that others have devised in optical sound technology: 35mm sound cards, handmade sounds, etc. Several lectures, performances, and screenings will take place throughout the 2014/15 academic year under the umbrella of the EYEWASH series, outlining many of the historical examples of experiments in sound and image. The Department of Film Studies and the Hopkins Center for the Arts will fund these events.
Hopkins Center for the Arts:
HOP Film - Let Your Light Shine
Hood Museum of Art:
Film Screening - Posthaste Perennial Patterns: Fabriflicks
Motivation: Protein-protein interactions (PPIs) govern a host of essential cellular activities. They permit the formation of macromolecular assemblies that coordinate complex processes, such as neuronal signaling (at the post-synaptic density) or regulated protein synthesis (at the ribosome) and turnover (at the proteasome). PPIs also help to guide newly synthesized or mobilized proteins to their appropriate locations within the cell. One of the most abundant PPI families exploits modular peptide-binding domains named for three representative members: PSD-95, Dlg1, and ZO-1. PDZ domains bind to peptides, usually at the C-terminus of their partners, and then serve as adaptors, guiding the partner proteins to the appropriate destination or helping them to assemble into a macromolecular scaffold once they arrive. In many cases, these PDZ-mediated interactions are therapeutic targets: a deleterious outcome could be ameliorated if an individual PDZ domain could be selectively inhibited. However, because the target interactions are weak and distributed across a relatively shallow interface, screening for small-molecule inhibitors has typically yielded only very weak inhibitors, unsuitable for further therapeutic development. Building on an extensive suite of structural and biochemical data, we propose to exploit computational approaches to engineer a new class of PPI inhibitors with promise in the treatment of cystic fibrosis (CF).
Opportunity: Over the past several years, we have validated a particular PDZ interaction as a therapeutic target for the treatment of patients with CF, the most common fatal recessive genetic disorder in populations of European origin. In most CF patients, a mutation called ΔF508 inactivates a cellular ion channel (CFTR), perturbing the flow of chloride and bicarbonate and thus dehydrating mucus membranes. In the lung, inhaled bacteria are typically caught in the mucus and swept up towards the throat for clearance. In CF patients, this mechanism becomes clogged, leading to chronic, drug-resistant airway infections. Ultimately, the associated tissue inflammation leads to airway damage and respiratory failure – the proximal cause of death for 80-90% of CF patients. However, the ΔF508 mutation offers a glimmer of hope. It does not completely eliminate CFTR, but renders it unable to fold, and prone to rapid degradation once folded. Drugs to solve the folding problem are in phase 3 clinical trials (VX-809; Vertex Pharmaceuticals), but they provide limited benefit. We have shown that a class of PPI inhibitors, designed specifically to address CFTR degradation, can act in concert with VX-809 to provide an enhanced benefit. When both drugs are coupled with a third, FDA-approved compound, we expect that patients should achieve >40% of wild-type CFTR activity, close to the level seen in heterozygous CF carriers (one copy of mutant CFTR), who are relatively asymptomatic.
Challenge: Our PPI inhibitors block the PDZ domain of the CFTR-Associated Ligand (CAL). Upon binding, CAL leads CFTR into a degradation pathway. We have demonstrated that by blocking this degradation, inhibitors of CAL substantially increase the amount of functional CFTR in the cell. As noted above, inhibitors of the CAL:CFTR interaction act in combination with folding correctors such as VX-809. However, our current inhibitors are peptide-based. Although a few peptides have been developed as therapeutics, they are generally considered to be poor drugs, unable to enter the cell efficiently and subject to proteolytic degradation when they do. Thus, we wish to identify drug-like small-molecule CAL inhibitors. Unfortunately, it has proven difficult to identify small-molecule inhibitors of PPI interfaces, which are generally shallow and have low density of functional groups. We nevertheless established a screening protocol and evaluated >800,000 small compounds for the ability to disrupt the CAL:CFTR PPI. Several hits were obtained, one of which exhibits remarkable properties. PRC1163 displaces CAL more efficiently than a peptide, even though it binds weakly (Kd ~ 50 μM) and at a site distinct from the peptide-binding groove. It also has the ability to covalently modify CAL, attaching itself irreversibly to a single cysteine residue adjacent to the CAL PDZ binding cleft. Interestingly, three of the top ten drugs act covalently, supporting such an approach for CAL. Yet traditional medicinal chemistry offers limited prospects when presented with a weak PPI scaffold.
Approach: Thus, our goal is to utilize state-of-the-art fragment-based computational screening approaches to identify moieties that will modify our scaffold and to test them for simultaneous binding by NMR spectroscopy. This work will build on an established NMR collaboration with Prof. Dale Mierke (see letter). In addition to functional validation through multiple assays, we have obtained both NMR footprinting and crystallographic structural data of PRC1163 in complex with CAL. The proposed studies also build directly on a preliminary computational exploration of the CAL binding site in my laboratory using AutoDock. We validated our computational strategy by modeling inhibitors for which we have NMR footprinting data and observed excellent concordance. These initial experiments highlighted novel binding pockets in the vicinity of the CAL peptide-binding cleft, and we propose here to explore and score them with a stereochemically diverse library of probes, available for screening.
In addition, we propose a de novo collaboration with colleagues at Medical Research Council (MRC) Technology, a UK non-profit that offers collaborative access for target development. With MRC support we will screen a library of commercially available fragments optimized for three-dimensional character as part of the 3D Fragment Consortium. This approach significantly expands upon the libraries available at Dartmouth. At each site, we will employ an iterative workflow. Established, state-of-the-art in silico scoring algorithms (AutoDock; GROW; 3DFIT) will be used to predict the interaction propensity of fragments with the high-resolution crystal structure of the covalently modified complex that has already been determined by X-ray crystallography. As candidates are prioritized by computational scoring, they will be purchased, and their interaction with CAL assessed by NMR HSQC footprinting experiments in the presence and absence of PRC1163. Compounds that show HSQC offsets in the vicinity of the PRC1163 scaffold will be cocrystallized and evaluated for synthetic attachment to the scaffold. Synthetic elaboration, although outside the scope of this application, will be performed with Dr. Alex Pletnev in the Chemical Synthesis Core Facility. Comparison of results, both across platforms and with the results of the experimental screening, will be used to optimize the parameters of subsequent searches, enhancing the probability of success in an iterative paradigm. The project will also cross-fertilize the MRC and Dartmouth computational screening communities and strengthen our understanding of their distinct approaches.
Impact: Ultimately, our goal is to obtain small-molecule inhibitors with an affinity for CAL tighter than 1 μM. The problem is challenging, but our extensive structural data, diverse chemical libraries, and state-of-the-art modeling and scoring approaches provide an unusually robust foundation for the proposed computational analysis. If successful, our studies will demonstrate that computationally based chemical elaboration can overcome a key hurdle that currently limits the affinity of small-molecule PPI inhibitors. Given our existing data validating the CAL:CFTR interaction as a CF target, a high-affinity small molecule inhibitor would represent a major milestone towards realizing a novel treatment paradigm for this devastating disease. Furthermore, it would offer dramatic proof of the principle that PPIs are indeed 'druggable', providing access to a large and untapped source of therapeutic targets that have been implicated in cancer, pain, Alzheimer's disease, and other major public health threats.
Adjusting our behavior in response to reward feedback from the environment is crucial for survival. This feedback is signaled throughout the brain via the neurotransmitter dopamine (DA). Recently, there have been numerous studies on the influence of reward on choice behavior, neural representation of reward, and dopaminergic modulations of neural processes; however, it is still unclear how dopaminergic modulations alter various aspects of our behavior, from how we make decisions to how we perceive the world. Exploring this question is a challenging task for several reasons. Firstly, DA modulates activity in many brain areas and elucidating its effects on behavior requires understanding interactions between those areas.
Secondly, changes in neural activity per se do not inform us about the level of the reward-related modulation that results in behavioral changes, and so we cannot use neural data (and some arbitrary criteria) alone to pinpoint the timing and flow of relevant modulation across brain areas. Finally, whereas there is a specific timescale at which reward influences a given behavior, DA affects neural processing at various timescales (i.e. fast changes in neural activity but slow synaptic changes), leading to great difficulty in linking behavioral effects of reward to dopaminergic modulations at synaptic and cellular levels.
At this point in time, only a concerted system-level computational and experimental effort can move us forward in understanding the connection between dopaminergic modulations of neural processes and modulation of behavior by reward. We now have the computational tools to construct detailed, multi-area network models that can reveal the interaction between multiple brain areas and link dopaminergic modulations of activity in those areas to changes in behavior. Such models can be used to guide and design new experiments to measure timescales at which reward modulates neural activity and behavior. We have used similar tools to investigate how attentional signals are formed across multiple brain areas [Soltani & Koch 2010] and to link dopaminergic modulations in a single brain area to changes in choice behavior [Soltani & Wang 2006, Soltani et al, 2006, Soltani & Wang 2010; Soltani et al 2013].
For the Neukom CompX faculty grants, we are proposing the following specific aims:
Aim 1) To link behavioral effects of reward to dopaminergic modulations at synaptic and cellular levels, we will construct a spiking network model to simulate a task (see aim 2) that involves interactions between visual, cortical, and subcortical areas. The model will comprise a network of areas which includes visual (V1, V2, and V4), prefrontal (frontal eye field, FEF), and parietal (lateral intraparietal, LIP) cortex, as well as superior colliculus (SC). We will use this model to:
a) explore the role of interactions between simulated brain areas in saccadic choice and visual perception
b) find proper methods and criteria to determine the onset and propagation of relevant reward-related modulation (i.e. modulation that results in behavioral changes)
c) elucidate the contribution of various dopaminergic modulations in simulated brain areas to value-guided saccadic choice and reward influence on perception
We will specifically test two alternative hypotheses for the emergence of observed reward-modulated neural activity in primary visual areas (V1,V2, and V4) [Serences 2008] and their possible role in value-guided choice and perception: 1) reward modulation in primary visual areas is merely due to feedback from FEF or other higher cortical areas (e.g. LIP) that carries information about reward to guide the saccadic choice; therefore, such modulation of primary visual areas only mediates the effects of reward on perception, and 2) reward modulation originates in primary visual areas and is amplified in higher cortical areas; therefore, such modulation of primary visual areas is crucial both for value-guided choice and for reward influence on perception.
The reason for focusing on saccadic choice (i.e. choosing between choice options by moving the gaze) is that the saccadic system is one of the most well-characterized systems in the brain, both in terms of the involved brain areas and the connections between them and in terms of the anatomy and distribution of dopamine receptors. This is why the saccadic system has become the standard system for studying decision making in non-human primates [Glimcher 2003]. Consequently, there is a wealth of neural data measured during various saccadic choice tasks (from all brain areas we plan to simulate), data that we will use to construct and constrain our computational model.
Aim 2) To compare the timescales at which reward feedback affects choice behavior and perception, we will perform a human psychophysics experiment to simultaneously measure the effects of reward on choice and perception, using the same task which we plan to simulate (see below). We hypothesize that the timescale at which reward feedback guides saccadic choice reflects a mixture of reward modulation in all involved brain areas, whereas the timescale at which reward modulates perception is mostly related to reward modulation in sensory areas.
The psychophysics experiment will require the subject to select among colored, oriented visual targets (sinusoidal gratings arranged in an array) by making saccadic eye movements. Selection of a target will be rewarded with a certain probability, but on a given block of trials only one of the two target features (color or orientation) carries information about the reward probability (unknown to the subject). The subject has to fixate on the selected target for a short amount of time before moving to the next target, and has a fixed amount of time for exploring all targets in an array (making them likely to fixate on highly valuable targets first). This design provides a naturalistic foraging environment for the subject and enables the subject to learn about the reward-relevant feature over time. By fitting the choice behavior of the subject (using reinforcement learning models) we can estimate the timescale of reward influence on choice. After foraging through an array of visual targets, the subject will perform a color or orientation discrimination task in which she/he has to judge whether the color or orientation (the two features that could predict reward) of a given visual target is closer to one of two alternative colored or oriented patches, respectively. A differential change in the subject's performance on color or orientation discrimination over time enables us to estimate the gradual effects of reward on perception.
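The model-fitting step can be sketched as follows: a delta-rule learner whose learning rate alpha sets the timescale (roughly 1/alpha trials) over which past rewards influence choice, fit by maximizing the likelihood of the observed choices. The two-option task, the fixed softmax temperature, and the grid search are simplifying assumptions for illustration; the actual experiment involves a multi-target array and richer reinforcement learning models.

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_log_lik(alpha, beta, choices, rewards):
    """Negative log-likelihood of a two-option delta-rule learner with softmax
    choice; the learning rate alpha determines how quickly reward feedback
    shifts the learned values, i.e. the timescale of reward influence."""
    V = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * V) / np.exp(beta * V).sum()  # softmax choice probability
        nll -= np.log(p[c])
        V[c] += alpha * (r - V[c])                     # delta-rule value update
    return nll

# Simulate a hypothetical subject with a known learning rate
true_alpha, beta = 0.3, 5.0
V = np.zeros(2)
choices, rewards = [], []
for _ in range(500):
    p = np.exp(beta * V) / np.exp(beta * V).sum()
    c = int(rng.choice(2, p=p))
    r = float(rng.random() < (0.8 if c == 0 else 0.2))  # option 0 pays off more often
    V[c] += true_alpha * (r - V[c])
    choices.append(c)
    rewards.append(r)

# Recover the learning rate by grid search over the likelihood
grid = np.arange(0.05, 0.95, 0.05)
fit_alpha = float(grid[np.argmin([neg_log_lik(a, beta, choices, rewards) for a in grid])])
```

The fitted alpha then converts directly into a timescale of reward influence on choice, which can be compared against the slower drift in discrimination performance to test the hypothesis about sensory versus decision-related reward modulation.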
We expect that in one year we will develop the proposed model (aim 1) and collect the experimental data (aim 2) which will be used to refine the model. The research proposed here will serve as the first stage for developing large scale computational models that can simulate neural activity and behavior in various value-guided tasks. Such models would be invaluable for understanding how cognitive functions are influenced by reward. Moreover, the experimental results can be used for the design of fMRI studies to measure the emergence of reward modulation across brain areas.
We believe that ultimately this concerted computational and experimental effort can link the influence of reward on behavior to dopaminergic modulations at cellular and synaptic levels, and provide a new way of studying the brain as an integrated system. The outcome could also be used for a better understanding of many psychological disorders that involve dopamine and the reward system, such as attention-deficit/hyperactivity disorder (ADHD).
Existing methods for object-class recognition in pictures can be broadly grouped into two categories:
1) whole-image classifiers, which operate on a holistic representation of the input photo, and 2) detectors, which instead decompose the picture into a large number of regions or subwindows individually tested for the presence of the target object. The detection approach provides several obvious benefits over holistic classification, including the ability to localize objects in the photo, as well as robustness to irrelevant visual elements, such as uninformative background, clutter or the presence of other objects. However, while whole-image classifiers can be trained with image examples labeled merely with class information (e.g., "chair" or "pedestrian"), detectors require richer labels consisting of manual selections specifying the region or the bounding box containing the target object in each individual image example. Unfortunately, such detailed annotations are expensive and time-consuming to acquire. This effectively limits the applicability of detectors to scenarios involving only a few categories. Furthermore, these manual selections are often rather subjective and do not provide optimal ground truth regions for training detectors.
In this proposal we introduce self-taught object localization, a new machine learning framework to localize objects with minimal human supervision by leveraging the output of whole-image classifiers, which are trained without object location information. The key-idea is to analyze how the recognition score of these classifiers varies as we artificially "gray-out" or obscure different regions in the image. Intuitively, we expect that when the region containing the object is obscured the classification score will drop significantly. This strategy allows us to compute for each input photo a "heat map" encoding the probability of the object presence at each individual pixel. The heat map can then be used to calculate the region or the bounding box containing the object of interest. Thus, this approach can yield object location information to be used as training data to learn a new object detector, which will focus on recognizing the regions most likely to contain the object, rather than the full picture. By doing so we effectively replace the traditional manually-selected bounding boxes with regions automatically estimated from training images annotated only with object-class labels. We stress again that, unlike manual region annotations, category labels are easy and inexpensive to obtain even for a large number of training images. If successful, this framework would enable scalable training of object detectors at a much reduced human cost, since no manual labeling of regions is needed. Potentially, this may even produce better detection accuracy since the localizer is trained on discriminative regions unspoiled by subjective human labeling biases. Below we describe in further detail the design of our technical approach.
Image classification model. A critical choice of our approach is the whole-image classifier generating the spatial heat maps. For this task we plan to employ the deep convolutional neural network of Krizhevsky et al. This model is well suited to our goal since, although it performs holistic image recognition, it has been shown to be impressively robust to clutter and multiple objects. It outperformed by a large margin all the other entries in the 2012 ImageNet recognition competition. Our proposed approach will allow us to determine, for each input photo, the bounding box that is most likely to contain the object classified by the network. The localizations obtained via region obscuration will then be used to retrain the deep convolutional network as an object detector. This can be viewed as a process of self-taught localization, where the output of the current deep network is fed back to the training procedure in order to learn a new network that is tuned to recognize the subwindows that are most likely to contain the objects of interest.
Localization via obscuration. The building block of our localization approach is a procedure that iteratively obscures different regions of the image and analyzes the corresponding change in the classification score of the whole-image classifier. Obscuration will be achieved by replacing the RGB values of the pixels in the obscuration region with their average color values computed from the entire training set of images. This effectively eliminates image-specific information in the region. In order to reduce the number of neural network evaluations for a given input image, we plan to obscure a small set of candidate regions, the ones deemed most likely to contain objects. We have performed preliminary experiments using a bottom-up hierarchical image segmentation method to produce the candidate regions. This algorithm produces multiple (coarse-to-fine) segmentations of the image, which partition the photo into segments, such that each segment groups together contiguous pixels having similar color appearance. These segments can be used as obscuration regions. For each obscuration region, we compute the difference between the classification score of the original image and the one with obscuration, normalized by the area of the region. These normalized classification scores are stored in the heat map. If a pixel is included in more than one obscuration region, we assign to it the average of the normalized classification scores. Note that in order to produce the heat map, we will consider only the classification score of the network output unit corresponding to the ground-truth object-class label (the neural network involves 1000 output units, one for each of the 1000 object classes of ImageNet). For example, if the training image is labeled as "car", we would average the scores of only the "car" output unit. 
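The obscuration procedure above can be sketched as follows. A toy scoring function stands in for the deep network, and the image, candidate regions, and fill value are all illustrative; in the real pipeline the regions come from the hierarchical segmentation and the score is the output unit for the ground-truth class.

```python
import numpy as np

def obscuration_heat_map(image, regions, score_fn, fill_value):
    """Build a per-pixel heat map: obscure each candidate region with the
    dataset-mean value, measure the drop in the classifier's score for the
    ground-truth class, normalize by region area, and average the drops over
    all regions covering each pixel."""
    base = score_fn(image)
    heat = np.zeros(image.shape[:2])
    counts = np.zeros(image.shape[:2])
    for mask in regions:                      # boolean masks, one per region
        obscured = image.copy()
        obscured[mask] = fill_value           # wipe out image-specific content
        drop = (base - score_fn(obscured)) / mask.sum()
        heat[mask] += drop
        counts[mask] += 1
    return heat / np.maximum(counts, 1)       # average where regions overlap

# Toy example: a bright 4x4 "object" in an 8x8 image, scored by a stand-in
# classifier that responds only to that window.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
score = lambda im: im[2:6, 2:6].mean()

obj = np.zeros((8, 8), bool); obj[2:6, 2:6] = True   # region covering the object
bg = np.zeros((8, 8), bool);  bg[0:2, 0:2] = True    # background region
hm = obscuration_heat_map(img, [obj, bg], score, fill_value=float(img.mean()))
```

As expected, obscuring the object region produces a large score drop and a hot spot in the map, while obscuring the background region leaves the score unchanged.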
The use of class labels to localize objects in training images is critically necessary, especially for photos containing multiple objects, each involving a different region to be localized. Figure 1 shows a sample image, and the heat map produced via our obscuration technique, given the class label ("fish"). Note that once the deep network localizer has been learned, we can apply it to simultaneously localize and recognize objects in new unlabeled images either via an exhaustive sliding window approach or, more efficiently, by evaluating the model on a reduced set of candidate object regions. Then, a region producing high score for one of the 1000 output units will be declared as containing that corresponding class.
In order to retrain the neural network as a detector, we need to compute from each heat map a binary mask or bounding box containing the object of interest. In principle this can be done by thresholding the values in the heat map. However, this may produce several small disconnected fragments and spurious pixels. To overcome this problem we plan to use the pixels with large heat map value as initialization for the GrabCut segmentation algorithm. This would refine this initial labeling using the original RGB pixel information by encouraging the pixels of the foreground segment to have similar color values. The last column of Figure 1 shows the final object segment computed from the heat map in this fashion.
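The thresholding step can be sketched as below. The keep_frac parameter is an illustrative choice, not from the proposal, and in the actual pipeline the retained pixels would seed the GrabCut segmentation algorithm (e.g. OpenCV's cv2.grabCut) rather than being converted directly to a box; the sketch also shows how thresholding at a fraction of the peak discards the spurious low-valued pixels mentioned above.

```python
import numpy as np

def heat_map_to_box(heat, keep_frac=0.25):
    """Threshold the heat map at a fraction of its peak value and return the
    bounding box (row0, row1, col0, col1) of the retained pixels; in the full
    pipeline these pixels would instead initialize GrabCut, which refines the
    foreground mask using the original RGB values."""
    mask = heat >= keep_frac * heat.max()
    rows, cols = np.where(mask)
    return int(rows.min()), int(rows.max()) + 1, int(cols.min()), int(cols.max()) + 1

# Hypothetical heat map: a strong 4x4 response plus one weak spurious pixel
heat = np.zeros((8, 8))
heat[2:6, 3:7] = 1.0
heat[7, 0] = 0.1          # spurious response, removed by the threshold
box = heat_map_to_box(heat)
```

The resulting box (or the GrabCut-refined mask) then serves as the automatically generated "ground truth" region for retraining the network as a detector.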
Conclusions. The successful execution of this proposal will enable object localization with the most accurate image recognition system known today, which currently can only predict the class of the object in the image but not its location. Furthermore, we conjecture that our framework will also yield an improvement in the recognition accuracy of the system, as it will allow the algorithm to factor out irrelevant background and clutter. Most importantly, this proposal describes a strategy to achieve these objectives without the need to acquire additional time-consuming manual annotations. Instead, it exploits the ability of the system to learn from its own output. While a preliminary version of localization via obscuration (given the class label) has been implemented and has shown promising qualitative results, this project requires using the data generated in this fashion to train a neural network of 60 million parameters as a detector on large-scale image datasets, such as ImageNet. As further discussed in the budget description, such a task cannot be performed using the computing infrastructure currently available to the PI, and it requires the acquisition of a new high-performance GPU-based system, for which funds are requested.
• IEEE Winter 2016 presentation/publication: This article will be presented and published at the IEEE Winter Conference on Applications of Computer Vision.
Last Updated: 9/26/16