“The Rensselaer IDEA: Announcing The Rensselaer Institute for Data Exploration and Applications”
Shirley Ann Jackson, Ph.D.
President, Rensselaer Polytechnic Institute
Rensselaer Institute for Data Exploration and Applications Launch
EMPAC Studio Two
Rensselaer Polytechnic Institute, Troy, NY
Thursday, June 13, 2013
Good morning and welcome.
There is a significant transformation underway globally in the way we make discoveries, make decisions, make products, make connections, and ultimately make progress.
The transformation is being driven by the extraordinarily rapid expansion in the availability of data from multiple sources, and ever more powerful analytical and computational capacity that is generating new information.
Let me begin with a vignette to illustrate.
Last fall, as “Superstorm Sandy” was beginning to gather steam in the Caribbean, five days before it slammed into New Jersey and New York, US forecasters were predicting a monster storm, but were uncertain of its path. By most indications, the unusually powerful and complex storm would graze the coast, but move back out into the north Atlantic. However, there were steady reports of the “European Model” predicting a sharp left turn into the coast of New Jersey and New York, with potentially devastating consequences.
The U.S. and European models eventually converged. But, the Europeans got it right first, giving more time for those in Sandy’s path to prepare... no doubt saving lives. The difference in the early predictions lay with the strength of the analytical models and the computational power.
As intimated by what happened with the Sandy forecast, newly available data, and how it is accessed and used, will become an ever more vital force that shapes and changes our world.
“Big Data”-driven innovation will be a driver for changes in science and society during the next 50 years in much the same way as quantum science was to technological and economic development in the 20th century.
Moreover, there is a rapidly growing network of networks (the so-called “Internet of Things”) on which our daily lives depend, including power, water, retail, financial, manufacturing, and social networks, all driven by the interplay of data, physical systems, high-performance computing, and analytical models.
Big data and network science are merging, marrying the Internet of data with the Internet of Things in new ways, and this will be world changing.
In the 1800s, there was a shift from electricity as a curiosity to a commodity, made possible by the emergence of electrical engineering. We now are in the midst of a shift in data as a commodity, and importantly, data as a resource, in ways not previously imagined.
If we take full advantage of emerging technologies, new opportunities will be created by the ability, with smart analytics, to anticipate and predict events, thereby making those events easier to manage.
The intersections and interactions are complex. The outcomes can be powerful, and risky.
Powerful, in that we will be able to see connections that we would not have seen otherwise, and we will have better predictive abilities.
Risky, in that the interconnectivity can lead to sometimes abhorrent, and certainly unintended consequences, like “flash crashes.”
As we saw in such devastating ways with energy, communication, and transportation systems in the aftermath of “Superstorm Sandy,” interconnectivity presents enormous intersecting opportunities and intersecting vulnerabilities with cascading consequences.
The level of interdependence of these interconnected networks will grow rapidly, and will present even greater risks, but even greater opportunities, if we are able to take advantage of the ubiquity of data, the interconnectivity of data and things, and powerful new analytical and computational capabilities and immersive technologies to stay ahead of the curve.
New economic and business models will emerge around data-driven information, both data at rest and data in motion, and there will be new opportunities and new tensions around the monetization of data, particularly with respect to ownership, privacy, and security, as these new models take shape. This means that the intersection of science, technology, and public policy is important.
We live in a data-driven, web-enabled, supercomputer-powered, globally interconnected world, interconnected both through the Internet and other technologies and through global, natural phenomena, such as climate-driven weather events. This is a world whose technology Rensselaer has helped to create. It is a world in which Rensselaer is positioned powerfully to help humanity use the remarkable technological tools at its disposal to answer the grand challenges surrounding energy, water, food, and national security; human health; climate change; and the allocation of scarce natural resources.
Therefore, today, I am very pleased to announce that we are creating The Rensselaer Institute for Data Exploration and Applications, or The Rensselaer IDEA. This institute-wide center will receive $60 million in new investments by Rensselaer Polytechnic Institute itself, and over $100 million in total. It will be led by Professor James Hendler, currently head of the Department of Computer Science, and Senior Constellation Professor in the Tetherless World Constellation.
The Rensselaer IDEA will bring together Rensselaer talents and strengths in web science, high-performance computing, cognitive computing, data science and predictive analytics, and immersive technologies, and link them to applications at the interface of engineering and the physical, life, and social sciences. We intend to answer questions in nearly every area of research that never could be answered in the past.
Working across disciplines and sectors, our researchers will apply powerful new tools and technologies to access, aggregate, organize, and analyze data from multiple sources and in multiple formats, in order to address challenges and opportunities across the spectrum, including in basic research, environment and energy, water resources, health care and biomedicine, business and finance, public policy, and national security. Educated in this context, with new approaches and analytical capabilities, our students, the next generation of discoverers, innovators, and entrepreneurs, will be better equipped to truly change the world.
Let me briefly describe our talents and tools, and then say a few words about the ways The Rensselaer IDEA will allow us to use them.
In our Computational Center for Nanotechnology Innovations (CCNI), we already have one of the most powerful university-based supercomputers in the world, whose capabilities are being significantly upgraded this summer and will be astonishing by this fall.
Already, our talents are shining through. This year, scientists at the CCNI, joining forces with the Lawrence Livermore National Laboratory, set a new simulation speed record on the Sequoia Supercomputer, decisively smashing the old record of 12 billion events per second with 504 billion events per second.
Equally astonishing are recent advances in cognitive computing: technologies that are engineered for a human-like ability to use natural language, answer complex questions, and make decisions. In 2011, the IBM computer Watson, a cognitive computing machine, using memory-based artificial intelligence, was able to beat the best human champions at Jeopardy!
This year, Rensselaer became the first university in the nation to receive the Watson technology. IBM chose Rensselaer as the place to send Watson to master a more difficult form of artificial intelligence: knowledge-based artificial intelligence. We will give Watson the power, not merely to answer questions based on the subjects it has been taught, but also to access the tetherless world of information on the Web and to make inferences from what it learns in real- or near real-time.
On the input side of the equation, the growth has been equally explosive, thanks to sources of data that now range from scientific platforms such as the Hubble Space Telescope, to teenagers posting vacation photos on Instagram.
The volume, velocity, and variety of the data collected by social networking sites should not be underestimated. In November of 2012, Facebook said that it was warehousing half a petabyte of new data every single day.
The only reasonable expectation is that data genesis will continue its explosive growth, given that the existing Internet is merging with the Internet of Things as the systems of our physical universe (our electric meters, our cars, our farming equipment, our houses) increasingly are equipped with sensors and networked digitally.
As a consequence, data has been growing at a volume much greater than the tools available to process it. But new tools are being engineered that enable us to take massive amounts of structured and unstructured data, manipulate them, and create useful information. We are developing advances in infrastructure that will allow us to access and reduce the crushing volume of unstructured data from videos, tweets, photos, posts, articles, papers, trades, purchases, and other data from sources of all kinds (including distributed sensor networks, satellites, etc.) by creating knowledge frameworks of data about data, or metadata, that offer useful information about the world around us.
Professor Jim Hendler is one of the originators of the Semantic Web, which uses common data formats for unrelated web-data sets, allowing us a new ability to integrate different sources of information, search across them, and to visualize and analyze them.
During the past three and one-half years, the number of US government data sets available on data sharing sites has grown from 57 to more than 400,000, and to more than 1 million data sets from governments worldwide. Dr. Hendler’s team, collaborating with the White House Data.gov staff, has developed smart interfaces that allow government data from an enormous variety of international sources to be combined in beneficial and unexpected ways. The infrastructure and technologies they have created have made it possible for others to “mash up” data sets to develop more than 1,200 applications that are driving health policy, transportation policy, and much more.
If scientists similarly could share their research data openly with colleagues across the globe, we could speed up innovation radically. Rensselaer Professor Francine Berman is the United States representative to the Research Data Alliance, a new international organization formed to accelerate the ability of scientists everywhere to access, combine, and use each other’s data.
Clearly, in the future, data discovery and exploration will be the foundation of many a career. We feel a strong obligation to prepare our students, in every discipline, to collaborate in the use of high-performance computing, data analytics, the Semantic Web, and other interactive tools, in order to maximize their opportunities to drive discovery and innovation. For example, starting in the fall of 2013, we will be offering a new Master of Science degree in Business Analytics through the Lally School of Management and Technology.
Under The Rensselaer Plan 2024, we are positioning Rensselaer itself for an even greater degree of leadership in this data-driven, web-enabled, supercomputer-powered, globally interconnected world. With The Rensselaer IDEA, we will no longer have a group of strong, but separate, disciplines. We have a full data-driven ecosystem that is going to produce a range of discoveries and applications that will allow us to change the world.
If we make intelligent use of the tools at our disposal, we can better model planetary systems, or global systems relating to, for example, the flow of water or power, allowing for better allocation of scarce resources. We can use predictive analytics to make better decisions in every field, assessing, for example, how a new nanomaterial is likely to behave under the kinetic conditions of manufacturing, or how to design new cities to make them as energy-efficient as possible.
The implications of such data-driven research are particularly important in health care. Our new partnership with the Icahn School of Medicine at Mount Sinai is going to bring together supercomputers at each institute that will allow us to produce sophisticated computer algorithms to analyze genomic data from many, many patients. We will be able to use that data to develop predictive models of disease, to develop safer and more effective drugs, and to make better use of shared healthcare assets and financial resources.
As we grapple with other multi-faceted questions, such as climate change, its implications and solutions, Rensselaer work in Web science and data visualization will help researchers turn ecological, social, and policy questions into Big Data questions, and allow them to discover and visualize information that is both local and truly global in scope.
We also must be aware of the security, social, and public policy challenges which these technologies present. The Rensselaer IDEA will include our Data Science Research Center, led by Dr. Bulent Yener, which is exploring cybersecurity issues ranging from better encryption algorithms to identifying malware via software signature data.
Ultimately, the goal of The Rensselaer IDEA is to find ways to access and aggregate a global storehouse of social, cultural, financial, scientific and engineering information, and then to make it available in a form in which any person, anywhere on earth, can ask important questions and contribute to emergent hypotheses.
At Rensselaer, we change the world for the better. Part of that is shaping the tools and making the breakthroughs that enable others.
Here at Rensselaer, we will address the hard problems, which we are uniquely qualified to address because of our strengths in engineering, science, design, management and entrepreneurship, and the humanities, arts and social sciences. We will continue to leverage our interdisciplinary approaches to problem solving and educating students, using the new tools and technologies of this data-driven, Web-enabled, supercomputer-powered, globally interconnected world.
We will do it because we are Rensselaer Polytechnic Institute, and this is our big IDEA.
Source citations are available from the division of Strategic Communications and External Relations, Rensselaer Polytechnic Institute. Statistical data contained herein were factually accurate at the time it was delivered. Rensselaer Polytechnic Institute assumes no duty to change it to reflect new developments.