Sponsors
Amazon Robotics
Entrants

Automatically Generated Digital Memories from Mobile Geolocation Data  Linda Park
The research examines the effectiveness of the Rewind compared to photos through three virtues of memory: memorability (what aspects of the Rewind affect a person's ability to remember a place or trip?), believability (how accurately does the Rewind reflect a person's memory, in the eyes of the participants?), and desirability (what factors drive participants' decisions to preserve artifacts created by the system?). The goal is to investigate users' attitudes and behaviors surrounding digital memories of places visited and trips taken, and to explore what makes the digitally-mediated memories generated by the Rewind system representative and desirable. To understand what makes a digital memory meaningful to a person, a within-subject comparison of sequences of images produced by the Rewind and static street-level images was carried out. By comparing against photos, today's prevalent form of digital memory media, we can identify the features that make the Rewind, and digitally-mediated memories in general, meaningful and representative as a memory of a trip.

(PEOPLE'S CHOICE) Converting Video into a 3D Model  Chris Chen
The goal of my research project is to find a method of converting 2D video into a 3D model. Just as stereo vision (having two cameras in different positions) allows us to perceive depth, so does video: the movement of the camera and of objects in the video lets us see different viewpoints of an object, giving us a sense of its 3D shape.
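The multiple-viewpoint intuition above can be made concrete with classical two-view triangulation. The sketch below is a minimal illustration, not the project's actual method; the camera matrices and point names are hypothetical. Given a point's projections in two frames of a moving camera, linear triangulation (DLT) recovers its 3D position:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from two views by linear triangulation (DLT).

    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image points.
    Each view contributes two rows of a homogeneous system A X = 0,
    which is solved via SVD (smallest right singular vector)."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

With noise-free correspondences the linear system is exact; real video frames would require outlier-robust matching and bundle adjustment on top of this.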

Extracting Useful Information from WebGazer's User Study  Justin Zhang
WebGazer is a browser-based eye-tracking library that uses a computer's built-in webcam to infer the location of a user's gaze. Currently, user studies are being run to collect data on the accuracy of WebGazer. This includes information such as the locations of cursor clicks, videos of each user's face as they complete tasks, and the timestamp of each action. My research deals with extracting the most pertinent information from these data files, performing statistical analysis, and discovering the most useful trends or anomalies in the data.
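To illustrate the kind of extraction involved, the sketch below assumes a hypothetical line-delimited JSON log format (not WebGazer's actual output); it pairs each recorded click with the nearest-in-time gaze estimate and reports the pixel error between them, since clicks are a common ground-truth proxy for gaze position:

```python
import json
import math
from statistics import mean

def gaze_click_errors(log_text):
    """Pair each recorded cursor click with the nearest-in-time gaze
    estimate and return the pixel distances between them.

    Assumes each line of the log is a JSON object like
    {"t": 1234, "type": "click" or "gaze", "x": 100, "y": 200}."""
    clicks, gazes = [], []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        ev = json.loads(line)
        (clicks if ev["type"] == "click" else gazes).append(ev)
    errors = []
    for c in clicks:
        g = min(gazes, key=lambda g: abs(g["t"] - c["t"]))  # nearest timestamp
        errors.append(math.hypot(g["x"] - c["x"], g["y"] - c["y"]))
    return errors
```

Summary statistics such as `mean(errors)` per task then give a per-user accuracy figure that can be compared across conditions.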

Algorithms for Progressive Histograms  Steffani Gomez
Over the years, hardware has advanced to expand memory capacity, and technology can now collect terabytes or even petabytes of data, with datasets growing alongside in size and dimensionality. As these datasets become more prevalent, there is a dire need for visualizations, and the algorithms behind them, to be incremental. Algorithms and their visualizations can no longer accommodate many complete datasets because of the time and space constraints that the characteristics of the dataset impose; on a dataset on the scale of gigabytes, visualizations can take tens of minutes, or even hours, to compute. Incremental visualizations help alleviate this problem by providing an initial visualization and then updating it as more data is streamed in. We focus on histograms, a common visualization that attempts to capture the numerical distribution of a dataset as faithfully as possible, with applications in nearly all domains.
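A minimal version of the incremental idea, assuming fixed, pre-chosen bin edges (real progressive systems also handle adaptive binning and error estimates), might look like:

```python
from bisect import bisect_right

class StreamingHistogram:
    """Fixed-bin histogram updated incrementally as data streams in,
    so a visualization can be refreshed after every chunk instead of
    waiting for the full dataset."""

    def __init__(self, lo, hi, nbins):
        self.edges = [lo + (hi - lo) * i / nbins for i in range(nbins + 1)]
        self.counts = [0] * nbins
        self.n = 0

    def update(self, chunk):
        # O(log nbins) per value; values outside [lo, hi) are ignored
        for x in chunk:
            i = bisect_right(self.edges, x) - 1
            if 0 <= i < len(self.counts):
                self.counts[i] += 1
                self.n += 1

    def density(self):
        # normalized view suitable for rendering at any point in the stream
        return [c / self.n for c in self.counts] if self.n else self.counts[:]
```

After each `update` the current `density()` can be redrawn, giving the user a progressively refined picture of the distribution long before the full dataset has been scanned.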

(FIRST PLACE) De Novo Structure Prediction with Density-Guided Optimization and Particle Belief Propagation  Roshan Rao, Jason Pacheco, Erik Sudderth
3D protein structure prediction is a seminal problem in computational biology. Successful prediction of protein structure could lead to significant improvements in drug design, such as the production of enzymes targeted to perform a specific function. Recent advances in a technology known as Cryo-Electron Microscopy (Cryo-EM) have allowed for the creation of near-atomic resolution electron density maps. These maps have several major advantages over the common alternative of X-ray crystallography. X-ray crystallography requires a crystallized protein; producing one is generally a slow and expensive process, and for some proteins it is impossible. However, Cryo-EM maps have poorer resolution (often worse than 4 Å), which prevents current tools for automatic reconstruction from X-ray crystallographic data from converging to the correct model on Cryo-EM data. The Cryo-EM protein structure prediction problem takes as input a sequence of amino acids and a 3D electron density map, and seeks to output highly accurate locations for every atom in the protein. Current methods for 3D structure reconstruction specifically from Cryo-EM data rely on either hand-constructed initializations or extremely time-consuming preprocessing. This thesis proposes a fast method for determining 3D protein structure from Cryo-EM data based on the Diverse Particle Max-Product (D-PMP) algorithm developed by Pacheco et al. This work extends the Rosetta protein folding library, third-party software produced by UW. We formulate the structure prediction problem as MAP inference on a complete pairwise Markov Random Field (MRF). As D-PMP is an iterative algorithm, we note that at each iteration many of the edges in the MRF are non-informative, and we propose a method for dynamically estimating a sparse subgraph on which to perform inference. Finally, we define three proposal functions for sampling new particles that help to refine the initial estimate.
When tested on protein 4NIB, our algorithm recovers the correct structure with RMSD < 0.5 Å when provided with a 1.3 Å electron density input and a strong initialization (a random walk with large variance around the correct structure). We are in the process of testing with lower resolution inputs and are exploring different methods of initialization.
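For reference, RMSD (root-mean-square deviation) is the standard accuracy measure quoted above; a minimal version for two already-superimposed coordinate sets (a full comparison would first align them, e.g. with the Kabsch algorithm) is:

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two Nx3 atom coordinate
    arrays, assumed to be expressed in the same (aligned) frame."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```

An RMSD below 0.5 Å means each atom sits, on average, within half an angstrom of its ground-truth position.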

Object Detection Algorithm for Robots  Hong Jun Choi
Object delivery is an important problem for service robots. However, to deliver an object, a robot must map between a word, such as "wooden spoon", and an object model capable of detecting, localizing, and ultimately picking up the object. Existing approaches include constructing a 3D mesh of an object using the robot's sensors to find an optimal grasping point [3d_mesh_grasping], or the instance-based object modeling of Oberlin et al. [oberlin15], which enables robots to autonomously collect data until an optimal model with a grasping point is found. However, while these methods successfully create robust methods for object grasping, the task of mapping words to objects, along with localization, remains to be solved.
In this thesis, we propose an object detection algorithm that can be applied to robots, more specifically Baxter. We focus on creating an object detection algorithm that is feasible in a setting with limited labeled data. We combine a category-agnostic object search algorithm with a convolutional neural network to create simple but performant object detection software. Furthermore, in order to develop a robust object detector that does not require thousands of training images, we developed a data augmentation pipeline that creates multitudes of synthetic data. Finally, we propose an automated software pipeline for learning new object categories for detection. We demonstrate that this approach enables a robot to create a model for an object category, and to use that model to detect, localize, and manipulate the object.
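The synthetic-data idea can be illustrated with a toy cut-and-paste compositor; this is a simplified sketch, not the thesis's actual pipeline, and a real version would also vary scale, rotation, lighting, and blending:

```python
import random
import numpy as np

def synthesize(background, obj, rng=None):
    """Paste an object crop at a random location on a background image
    and return the composite plus its bounding-box label (x, y, w, h).

    Images are HxWx3 numpy arrays; the returned box serves as a free
    ground-truth annotation for detector training."""
    rng = rng or random.Random(0)
    bh, bw = background.shape[:2]
    oh, ow = obj.shape[:2]
    x = rng.randrange(bw - ow + 1)   # random top-left corner
    y = rng.randrange(bh - oh + 1)
    out = background.copy()
    out[y:y + oh, x:x + ow] = obj
    return out, (x, y, ow, oh)
```

Because the box label comes for free with every composite, a handful of object crops can be expanded into thousands of labeled training images.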

(THIRD PLACE) Exploring Interaction with Dynamic Interiors  Arielle Chapin
Interactive architecture and interiors present exciting new opportunities for customizable and adaptable spaces, but few explorations of this topic attempt to understand how these spaces should be designed for interaction. This work presents Imprint, a prototype for an Interactive Wall inside a dynamic living space, designed and built with the intention of testing different models of interaction. Participants gave feedback on two iterations of Imprint. Based on observations from an interaction trial and discussions, each interaction mode was found to fit better with certain contexts of an interactive surface. The connections between the interaction modes and contexts help build a greater understanding of interacting with various dynamic spaces, which can be used in designing new kinds of interactive interiors that users can easily adapt to.

Pong  Tyler Devlin
pong is a freely available software package, released by Behr et al. (2016, Bioinformatics), for post-processing output from clustering inference on population genetic data. It combines a network-graphical approach for analyzing and visualizing membership in latent clusters with an interactive D3.js-based visualization. pong outpaces current solutions by more than an order of magnitude in runtime while providing a user-friendly, interactive visualization of population structure that is more accurate than those produced by current tools. Thus, pong enables unprecedented levels of scale and accuracy in the analysis of population structure from multilocus genotype data.

Inferring User Interest from Interactions to Recruit Crowd Editors for Structured Data Upkeep  Abraham Peterkin
Traditional crowdsourcing platforms, such as Amazon Mechanical Turk, support an ecosystem where requesters can post microtasks for a number of use cases, such as data collection and verification. While this is practical for generating an initial dataset, the long-term upkeep of the dataset can prove increasingly difficult. Long-term repeatable tasks for data upkeep through Amazon Mechanical Turk are impractical due to increased cost, time, and accuracy risks. The tasks and costs of employing crowd workers increase as the dataset grows larger. Given this, how can we build a system that scales to maintain large sets of structured data in a cost- and time-effective manner?

Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!  Rajesh Jayaram
In 1975, a breakthrough result of L. Valiant showed that parsing context-free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^ω) for parsing, where ω ≤ 2.373 is the exponent of fast matrix multiplication and n is the string length. Recently, Abboud, Backurs, and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem: it computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and it has received significant attention in both theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^ω) time that can return a nontrivial multiplicative approximation of the language edit distance. Furthermore, combinatorial algorithms with cubic running time, and algorithms that use fast matrix multiplication, are often not desirable in practice. To break this n^ω hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms that obtain a string with minimum edit distance, with performance dependencies on either the number of non-linear productions, k*, or the number of nested non-linear productions, k, used in the optimal derivation. Explicitly, we give an additive O(k*γ) approximation in time O(|G|(n^2 + n^3/γ^3)) and an additive O(kγ) approximation in time O(|G|(n^2 + n^3/γ^2)), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context-free grammars known as ultralinear grammars, for which k and k* are naturally bounded.
Interestingly, we show that the same conditional lower bound for parsing context-free grammars holds for the class of ultralinear grammars as well. These are the first algorithms to break the fast matrix multiplication barrier for designing nontrivial approximation algorithms for the language edit distance problem. Moreover, the algorithms are based on a generic recipe for approximating dynamic programs which we believe will find many applications.
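For context, the exact problem being approximated admits a classical cubic-time dynamic program in the spirit of Aho and Peterson (1972). The sketch below is a didactic exact baseline, not the paper's approximation algorithm, and the toy grammar in the usage note is illustrative. It computes the language edit distance for a grammar in Chomsky normal form, where D[A][i][j] is the cheapest way to edit s[i:j] into a string derivable from nonterminal A:

```python
def language_edit_distance(s, start, terminal_rules, binary_rules):
    """Minimum unit-cost insertions/deletions/substitutions needed to
    turn `s` into some string in the language of a CNF grammar.

    terminal_rules: list of (A, a) for productions A -> a
    binary_rules:   list of (A, B, C) for productions A -> B C"""
    n = len(s)
    INF = float("inf")
    nts = {A for A, _ in terminal_rules} | {A for A, _, _ in binary_rules}
    D = {A: [[INF] * (n + 1) for _ in range(n + 1)] for A in nts}

    for length in range(0, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            span = s[i:j]
            # A -> a: edit distance from s[i:j] to the single terminal a
            for A, a in terminal_rules:
                if length == 0:
                    cost = 1                                  # insert a
                else:
                    cost = length - 1 + (0 if a in span else 1)
                D[A][i][j] = min(D[A][i][j], cost)
            # A -> B C: relax split points to a fixpoint (empty splits
            # make same-span entries depend on each other)
            changed = True
            while changed:
                changed = False
                for A, B, C in binary_rules:
                    for k in range(i, j + 1):
                        cost = D[B][i][k] + D[C][k][j]
                        if cost < D[A][i][j]:
                            D[A][i][j] = cost
                            changed = True
    return D[start][0][n]
```

For example, for the toy CNF grammar of {a^n b^n : n >= 1} (S -> A B | A T, T -> S B, A -> 'a', B -> 'b'), the distance of "ab" is 0 and of "aab" is 1. The O(n^3) split-point loop over all spans is exactly the cubic bottleneck that the additive approximations above trade against the parameter γ.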

(SECOND PLACE) Accurately and Efficiently Interpreting Human-Robot Instructions of Varying Granularities  Siddharth Karamcheti
Humans can ground natural language commands to tasks at both abstract and fine-grained levels of specificity. For instance, a human forklift operator can be instructed to perform a high-level action, like "grab a pallet", or a low-level action, like "tilt back a little bit." While robots are also capable of grounding language commands to tasks, previous methods implicitly assume that all commands and tasks reside at a single, fixed level of abstraction. Additionally, approaches that do not use abstraction suffer from inefficient planning and execution times due to large, intractable state-action spaces that closely resemble real-world complexity. In this work, by grounding commands to all the tasks or subtasks available in a hierarchical planning framework, we arrive at a model capable of interpreting language at multiple levels of specificity, ranging from coarse to granular. We show that the accuracy of the grounding procedure improves when the degree of abstraction in the language used to communicate the task is inferred simultaneously. Leveraging hierarchy also improves efficiency: our proposed approach enables a robot to respond to a command within one second on 90% of our tasks, while baselines take over twenty seconds on half the tasks. Finally, we demonstrate that a real, physical robot can ground commands at multiple levels of abstraction, allowing it to efficiently plan different subtasks within the same planning hierarchy.

Using Biomarker and Conversational Data to Predict Mood  Sachin Pendse

Investigating Abstraction Transfer with Generative Adversarial Networks  Aaron Gokaslan
While recent advances in generative adversarial networks (GANs) have allowed for realistic texture generation and style transfer, geometry-based reconstruction remains an under-researched area. We demonstrate an effective approach to stylizing domain-specific images. By learning cross-domain relations between realistic and stylized images, our GAN learns various traits that drastically change the shape of the corresponding image. Existing techniques fail to transfer style when the geometry changes significantly. Our approach builds on prior work by utilizing a multi-scale structural dissimilarity index as a reconstruction loss function to allow for geometric variations. The loss function takes advantage of the area statistics provided to facilitate shape change. We showcase the results of our network when evaluated on a dataset of human faces and collected anime-style faces. Our deep learning architecture, called GANime, converges where other techniques fail and reduces the noise in the reconstructed version of the image.
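The structural dissimilarity idea can be sketched as follows. This is a single-window illustration under the standard SSIM constants, not the thesis's multi-scale implementation; the multi-scale variant averages SSIM over several downsampled resolutions, and in training the loss would be computed on framework tensors rather than numpy arrays:

```python
import numpy as np

def dssim(x, y, L=1.0):
    """Structural dissimilarity, (1 - SSIM) / 2, between two grayscale
    images with values in [0, L], using one global statistics window.

    Unlike a per-pixel L1/L2 loss, SSIM compares local luminance,
    contrast, and structure, which penalizes blur less harshly and
    tolerates the geometric variation that pixel losses punish."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # standard SSIM constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return (1.0 - ssim) / 2.0
```

Identical images score 0, and the score grows as the structural agreement between the reconstruction and the input degrades.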

Jay Khurana

Shivam Gandhi