REVIEWER #1
Rating: Good
Review Summary

In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to intellectual merit.

This proposal seeks funding to carry out a series of controlled experiments and informal observations to answer the question: to what extent and how do various aspects of display fidelity impact the utility of a virtual reality system for making scientific discoveries? The project will leverage a unique resource available to the PI: an immersive display system called the YURT that is capable of presenting visual stimuli at a resolution that nearly matches the visual acuity limits of human vision. The core concept in the proposed work is that all of the studies will use the same underlying hardware (the YURT), but in some of the tested conditions various aspects of the display fidelity will be artificially reduced, such as: the display resolution (from each pixel subtending one arc-minute, 1', to each pixel subtending 4'), the image contrast (from 500:1 to 100:1), the display area (from 360° to just a few degrees, both in a constant sense relative to the room and relative to the orientation of the head), and the use of stereo vs mono display. In addition, a range of artifacts common to large projection displays will be simulated, such as: the presence of bezels, geometric discontinuities, and image discontinuities. The goal of the planned studies is to help inform the design process for other institutions that seek to install large immersive display systems.
Presently, design decisions such as whether to wallpaper a room with flat panel displays (which provide high brightness and contrast but monoscopic images, and which are separated by bezels but don't require tedious and often problematic calibration to maintain alignment and avoid visual seams) are resolved based on the project designer's intuition in combination with assorted practical considerations, such as whether there is sufficient space to permit rear projection. They are not resolved based on solid science derived from controlled experiments, or even on informal experience, since so few people get a chance to personally experience the use of different types of display systems while working on a particular scientific visualization application, where the practical impacts of various artifacts can be best appreciated.
Additional strengths of the proposal are the PI's strong track record in publishing high quality research results related to the theme of the proposed work, and the PI's uncommonly strong record of collaboration with the domain scientists who are the ultimate user group for large display systems.
The main concerns I have about the proposal are twofold. First, I am not convinced of the importance of testing the impacts of many of the mentioned display characteristics on the usability of such displays for scientific visualization applications. This is for several reasons. One, the impact of any particular factor, such as low contrast, or low resolution, will depend greatly on the particular use case: is the scientist trying to follow thin fiber tracts in the brain, or are they trying to appreciate the layout of an archaeological site? Two, some of these factors interact. It is well-known for example that human contrast sensitivity varies with spatial frequency. This means that these factors really can't be considered in isolation. Three, people will in general seek to acquire the best system they can for the money they have available.
It's hard to argue that low resolution or poor contrast doesn't matter in general - of course higher resolution and higher contrast are desirable, up to the limits of human perception. I can't imagine that anyone would seriously argue otherwise. Studying the impact of artifacts like seams also falls into this category: why would we ever think that seams don't matter - of course we should try to avoid them. Sometimes we have to live with them because we can't figure out how to get rid of them, but it makes no sense to try to justify that somehow. One can always think of cases where decreased contrast or decreased resolution would have a negative impact on visibility. One can of course also think of cases where decreased contrast or decreased resolution or the presence of seams could be tolerated, but again it's so use-case-specific, it's hard to see how a formal or informal study on factors like these could have a significant impact. Likewise stereo: for some types of applications/tasks it may not make much difference if the images are shown in stereo but for other applications/tasks it will make a huge difference. Also, I think we already understand this question pretty well based on experiments that have been done by vision scientists. The question of the impact of corners, and of bezels, seems a little more interesting, but in this case what I find lacking is a clear hypothesis to guide the study design. The PI expresses the intuition that bezels interfere, and I agree. But the proposal contains no further elaboration of the thinking behind this intuition, which I think is a problem. I could try to "channel" the PI and come up with some good examples of when and why bezels will cause problems, which could guide the design of experiments that would reveal those limitations, but that's not appropriate. 
The other thing is that once one appreciates these problems, and expresses them explicitly, they seem in retrospect to be rather obvious and not something one would need to do an experiment to test. My second major concern with the proposed research agenda is that I have doubts about the reliability of the planned comparisons between the YURT and lower fidelity displays simulated using the YURT. I appreciate that in some respects it's easier to focus on isolated display characteristics such as resolution when all other variables are kept constant, but I have a hard time with the idea that one can robustly replicate the full perceptual and cognitive experience of viewing a virtual environment through an HMD using a display like the YURT, for example. Also, it seems like a huge waste of time and effort to try to do so. Why not just purchase an HMD? They are only $600. It would seem to make much more sense to do a head-to-head comparison of usability between the YURT and HMD actually using the YURT and HMD. Also, according to the Facilities document, Brown has a Cave as well as a YURT. Why go to huge effort to simulate the creases in a Cave on a YURT when you can just use the Cave you have? If the Cave is lower resolution and lower brightness, then just do the crease comparison in a matched low-res and low-brightness condition. Also, there are other considerations that probably can't just be ignored. At a distance of 8' to a surface, there may be accommodation cues that are different for a flat display than for a curved display. I would want to see a strong validation of the simulation before trusting the outcome of any experiment based on an assumption of its validity. If you expose people to a Cave, so that they know what one looks like, and then bring them blindfolded into the YURT-simulated Cave, when you take off the blindfold and ask them if they are in a Cave or in a display with curved walls, will everyone immediately say the former and not the latter? 
If you bring people blindfolded into the YURT-simulated HMD, and ask them if they feel like they are wearing an HMD or looking at a large display screen, will the illusion be robust? I have a hard time imagining how it could be. I fear that this proposed research effort is ill-conceived, then, because a lot of time and effort will be devoted to trying to implement these simulations which may not even be valid, and then more time and effort will be spent on making comparisons between what sounds like an amazing display, the YURT, and some bastardized version that is explicitly designed to be inferior, through deliberate efforts to add artifacts and reduce quality, and no one will be surprised when the results show that people prefer the original YURT to the inferior versions. An additional concern that I would be remiss to leave out is that the proposal does not clearly explain the details of the experiments that will be conducted. However, the PI has a proven track record of designing similar good experiments so I have little doubt of a successful implementation in that regard.

In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to broader impacts.

Large displays have the potential to facilitate scientific inquiry and understanding. The proposal aims to improve our understanding of how various characteristics of large displays can affect their usability in scientific research. In that respect, the proposed research could have broader impact by clarifying the design characteristics of large display systems that make them useful for scientific visualization use. The proposal also discusses ways in which the planned research will be integrated with education. The proposal does not discuss any issues related to broader societal impact in terms of increasing the participation in CS research of students from underrepresented groups, etc.

Please evaluate the strengths and weaknesses of the proposal with respect to any additional solicitation-specific review criteria, if applicable.

The facilities description should include a detailed description of the YURT, with photos and technical specifications. The current facilities document is not appropriate and looks like a CS Department template, spending most of the time talking about electronic classrooms, desktop computing resources, and data storage capacity, none of which are relevant to this proposal. There is only a 5-line paragraph at the end that vaguely refers to "a high end visualization facility with ... a fully immersive Cave system, a multi-projector stereo display wall, and a full-surround, stereo, .. virtual display with 100 million pixels driven by 69 HD projectors." This is both a missed opportunity and a serious technical omission for a proposal that depends on the reviewer appreciating the capabilities and limitations of the YURT as a system. For example, one thing that is not clear is what brightness and contrast levels are currently achieved in the YURT. The Data Management plan is equally perfunctory, but sufficient. It is a strength of this PI and his research group that they have a history of making their products freely available to other researchers. The budget request is extremely modest - only 0.7 months of salary for one PI, travel funds sufficient for one person's attendance at about 4 conferences over the 3 years, and minimal supplies, with the bulk of the request seeking support for 1.5 grad students/year.

Summary Statement

I think there are probably better things to study with respect to how a display system like the YURT can be effectively used to advance scientific understanding than what is proposed here.
I appreciate that the YURT has rare characteristics such as supporting a 360 degree field of regard without the sharp corners characteristic of traditional CAVEs that the PI would like to demonstrate are advantageous, but I really don't think that the proposed effort of simulating worse displays in the YURT in order to make that comparison is worth the investment.

REVIEWER #2
Rating: Fair
Review Summary

In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to intellectual merit.

VR is a hot topic at present, and more effort to determine the extent of "how much virtual reality is enough?" is timely. The investigator also appears to have a newly designed system that could be used for this research, and significant experience in this field. That said, the four IM as listed are vague and open-ended. They are not hypothesis driven where any one factor (or the pair-wise combinations) is (are) investigated in sufficient depth, or where one factor would be studied in depth in a particular context. For example, resolution will be varied from 1 to 4 arc-mins per pixel in a single factor experiment on the investigator's display, but it is not clear what task the user needs to be engaged in or exactly what the measurable performance metrics are. How would we really know whether a particular arc-mins-per-pixel setting is better than another, or whether that is generalizable across tasks and across displays? It's not clear that there is a basic science question here, or whether there is a generalizable method that would be created that could be used to test all VR systems. The authors also note that their current system operates at the limits of human perception. It is not clear what exactly that means or how it has been measured and quantified. I would hazard a guess that this VR display is not nearly that of human vision "in the wild" in terms of movement latency, or resolution, for example.
It would take much greater scientific evidence to persuade this reviewer of that.

In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to broader impacts.

The investigator proposes to use the display in work with people from a design school, middle school students, undergraduate students in courses, and researchers from multiple domains. That said, most of the VR work is now being done by the large companies, including Google, Facebook, Microsoft, and others who are pouring massive amounts of funding into their systems. The second task proposed here relates to a certain extent, but it is not clear how this research will impact the major steps being made in society by these companies with their products, which are personal (headset-based) in nature. Also, and somewhat related to the IM above, it is not clear what would be produced that could be disseminated broadly beyond the context of the simulation environment at the investigator's institution.

Please evaluate the strengths and weaknesses of the proposal with respect to any additional solicitation-specific review criteria, if applicable.

Results from the recently completed NSF award (2009-2015) are poor: no peer-reviewed papers have been produced to this point in time. Support from another grant from 2010-2014 resulted in four publications. Also, the data management plan is limited. There is some discussion of software dissemination, and publication, but how the actual data would be archived and shared more broadly is not given.

Summary Statement

The proposed work seeks to use a newly completed virtual reality display room to study the effect of fidelity upon perception along a number of dimensions, including resolution, horizontal and vertical field of view, display bezels, screen corners, etc.
These will be studied in the context of that system and compared to some other commercial systems and compared in the context of scientists in domains such as planetary science, mathematics, cell biology, etc. Outreach is proposed to other schools, including a design school, and generalization to researchers and users in several domains as well as via courses taught. However, the intellectual merit elements as listed are vague and open-ended, and it is not clear if the single and pair-wise factors to be studied will generalize to other VR systems, especially when other systems from many companies are moving at quite a fast pace and are all based on headsets while the present system is based upon a cave/room. There is some mention of mimicking a Cave2, a 4-wall Cave, and an HMD for comparison, but the investigators note that this will only be done if time allows and are not specific about the HMDs. The prior NSF grant resulted in no publications.

REVIEWER #3
Rating: Fair
Review Summary

In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to intellectual merit.

The proposal aims to quantify the effects of elements of display fidelity on task performance, determine trade-offs between Caves and head-mounted displays, and expand the understanding of the virtual reality design space through interviews with scientists using the software. Although the proposal seems to address important challenges, the research plan does not lay out specifics for moving towards answering specific questions. For example, the single-factor experiment description (e.3) walks through the different variations of factors that can be modified, but there is no explanation of how these factors might impact task performance, or what kind of tasks could possibly be performed in the study. This lack of detail makes it hard to determine if varying a single factor (and 'some' dual factors) will support an understanding of display fidelity.
Also not included is how many participants will be recruited, from where, etc. Section e.6 says that the same tasks and visual stimuli will be utilized, but without the prior details, again it's hard to know what is being done in the study. Also in e.6, the proposal says a Cave2 and HMD will be mimicked. Why are these chosen? What will be the comparison? What is possible to learn from this comparison? And what others might be added to the list and why? Also, how will all this data be analyzed? Finally, there is heavy reliance in the proposal on scientific tasks, but no tasks are described. An example of what a particular scientist might do, or build, or want to see built would be helpful.

In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to broader impacts.

The broader impacts suggested are related to creating software that will be disseminated for studying similar systems; however, given the expense of creating such a large system, it's unclear how widely the software will be used. The development of courses lying at the intersection of virtual reality and science (e.g., Interdisciplinary Scientific Visualization and Virtual Reality Design for Science) is also suggested. These courses might be able to reach 12 students per year (~36 over the course of the project). The PI also suggests that people visiting the YURT will be able to engage with the ideas explored in the research, although previous work on the actual space, while it has reached people, has not necessarily been disseminated fully into the research community. It is not clear how the PI intends to make sure that recruitment of visitors as well as scientists will occur, other than previous anecdotal evidence.

Please evaluate the strengths and weaknesses of the proposal with respect to any additional solicitation-specific review criteria, if applicable.

Summary Statement

The PI proposes to use the YURT, a 360 degree projection room used for immersive virtual reality, to study the value of different elements of VR fidelity for scientific analysis applications. The PI has a long history of virtual environment development and study including a large development grant for the YURT room that, I believe, is to be used in this research. There are adequate resources to conduct the research. A data management plan is provided.