Professors Claudio Silva and Juliana Freire in their data analysis lab
Every day the Internet grows to include at least 150,000 new URLs or websites. While generic search engines such as Google and Bing cover a substantial portion of the Web, they have important limitations when it comes to delivering specific information, notably due to their restrictive search interface. "It's not really economical or feasible to crawl 30 billion pages," says Juliana Freire, professor in the department of computer science and engineering at the Polytechnic Institute of New York University (NYU-Poly). To tackle this inefficiency, the computer scientist developed new focused crawling techniques which are the backbone of DeepPeep.org, a search engine specialized in deep-Web sites.
But searching for information on the Web is only one of Freire’s research interests. Together with her husband, Claudio Silva (also a professor in the computer science and engineering department at NYU-Poly), she has created VisTrails, an open-source, workflow-based data exploration and visualization tool. A distinguishing feature of VisTrails is a provenance management system that systematically and transparently creates a detailed history of the steps followed and data derived in the course of an exploratory task. VisTrails has been downloaded thousands of times and is used throughout the world in many different disciplines, including environmental sciences, high-energy physics, molecular modeling, quantum physics, tracking of invasive species, and climate data analysis.
Having joined NYU-Poly last July, they are now establishing a new center of excellence around large-scale data analysis and visualization. One component of the infrastructure they are assembling is a 98-million-pixel display wall: 24 large, flat-screen video monitors that together produce a single, high-resolution image, such as a city street grid or the brain's vascular system. They spent their first semester at the Institute setting up their offices and computer labs, which officially opened last month at 2 MetroTech, the department's new hub.
With the start of classes last month, they taught their first set of students at NYU-Poly, with Silva leading a course on data visualization and Freire providing instruction on advanced databases.
In addition to teaching, Silva and Freire are continuing their research projects. They are still developing VisTrails, now in its 2.0 release. They have been working on the software since 2005, and spinoffs of the current version have already proven successful. One is the Ultrascale Visualization Climate Data Analysis Tool (UV-CDAT), which scientists at the U.S. Department of Energy and NASA use to analyze climate data. Silva hopes the visualization and provenance functionality available in UV-CDAT "will lead to research people can believe," alluding to the often contentious debate among scientists and policymakers about global warming, a claim skeptics attack by challenging the data offered in its support.
Another important area Silva and Freire work on is the development of infrastructure to support reproducibility for computational experiments. "As people use more computational methods to do experiments, they aren't as disciplined about publishing their results," explains Freire. "Unless people publish the whole research compendium, it's not possible for others to reproduce [experiments] because it's very complex analysis with large data."
She and Silva believe VisTrails can help because it records each step a user takes. "At any point in time, you know exactly what you've done to get there," Freire explains.
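For readers curious what "recording each step" can look like in practice, here is a minimal sketch in Python. It is not VisTrails code and does not use the VisTrails API; the `Step` and `ProvenanceLog` names, and the sample actions, are hypothetical and chosen only for illustration. The sketch assumes each recorded step remembers the step it was derived from, so the full path to any result can be replayed later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One recorded action and the step it was derived from (hypothetical type)."""
    step_id: int
    parent_id: Optional[int]
    action: str


@dataclass
class ProvenanceLog:
    """Illustrative history tree: every step points back to its parent."""
    steps: Dict[int, Step] = field(default_factory=dict)
    next_id: int = 1

    def record(self, action: str, parent_id: Optional[int] = None) -> int:
        """Store a new step and return its id."""
        if parent_id is not None and parent_id not in self.steps:
            raise KeyError(f"unknown parent step: {parent_id}")
        step_id = self.next_id
        self.steps[step_id] = Step(step_id, parent_id, action)
        self.next_id += 1
        return step_id

    def history(self, step_id: int) -> List[str]:
        """Return the actions leading to step_id, oldest first."""
        path: List[str] = []
        current: Optional[int] = step_id
        while current is not None:
            step = self.steps[current]
            path.append(step.action)
            current = step.parent_id
        path.reverse()
        return path


# Two explorations branch from the same filtered data set.
log = ProvenanceLog()
load = log.record("load climate data")
filt = log.record("filter to 2000-2010", parent_id=load)
plot_a = log.record("plot temperature map", parent_id=filt)
plot_b = log.record("plot histogram", parent_id=filt)

print(log.history(plot_b))
# ['load climate data', 'filter to 2000-2010', 'plot histogram']
```

Because every step keeps a pointer to its parent, abandoning one line of exploration and branching off an earlier step loses nothing: both branches remain in the log and either can be traced back to the original data.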
Beyond the verification of scientific findings, however, VisTrails offers insight into problem-solving. Because the software records how experts resolve issues, Freire and Silva think those methodologies might be applied or adapted to different problems. The software is also useful to educators, allowing instructors to study which groups of students solved problems a certain way, where in the problem-solving process confusion arises, how many steps students take to arrive at their conclusions and how long each step takes, as well as a host of other variables.
While their students are sure to be instrumental to their research, Freire and Silva are certain the wider community of New York will yield resources they might not have encountered in other settings. "This is a place with a strong emphasis on technology transfer," says Silva, "and we're looking forward to the opportunities this is going to provide and the contribution we can make."