Throughout the week, the NetApp Community will focus on a series of questions that highlight how Thomson Reuters followed a path of steady IT evolution that ultimately allowed them to avoid $65 million in costs, reduce power use by 25%, and improve availability, all while searching 50X more data in half the time.
What is it about the storage that helps to enable the Novus architecture?
Hi all - Mike Arndt here. I am a NetApp Systems Engineer and have worked with Thomson Reuters in a variety of roles over the past 6+ years.
The Novus system is a distributed search architecture that uses thousands of SUSE Linux servers, each running proprietary Thomson Reuters software. Each search server is responsible for part of the overall content index, and that part fits in server memory so it can be accessed extremely quickly. When a search is executed, it hits thousands of machines at once. Each machine sends its results back to a controller, which aggregates, sorts, and ranks them and then returns the final result set to the requesting application. Fanning the query out this way is what lets Thomson Reuters deliver subsecond search performance.
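To make the fan-out pattern above concrete, here is a minimal Python sketch of a scatter-gather search. It is only an illustration of the general idea, not the proprietary Novus software: the names `ShardServer` and `scatter_gather`, the term-count scoring, and the sample documents are all assumptions made up for this example, and the "servers" are just in-process objects queried from a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


class ShardServer:
    """Hypothetical search node holding one slice of the index in memory."""

    def __init__(self, docs):
        # Build a tiny inverted index: term -> {doc_id: occurrence count}.
        self.index = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                postings = self.index.setdefault(term, {})
                postings[doc_id] = postings.get(doc_id, 0) + 1

    def search(self, term):
        # Return (score, doc_id) pairs for this shard only.
        postings = self.index.get(term.lower(), {})
        return [(score, doc_id) for doc_id, score in postings.items()]


def scatter_gather(shards, term, top_k=3):
    """Hypothetical controller: query every shard at once, then merge and rank."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard.search(term), shards))
    merged = [hit for partial in partials for hit in partial]
    # Keep the top_k hits, ordered from highest score to lowest.
    return heapq.nlargest(top_k, merged)


# Made-up sample data spread across three shards.
shards = [
    ShardServer({"a1": "patent law patent filing", "a2": "tax law"}),
    ShardServer({"b1": "patent dispute", "b2": "market news"}),
    ShardServer({"c1": "patent patent patent review"}),
]
results = scatter_gather(shards, "patent", top_k=2)
print(results)
```

In this sketch each shard answers only from its own in-memory slice, so no single node has to hold the whole index, and the controller does nothing more than merge and rank the partial results.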
To operate at this scale, Novus requires a high-performance shared filesystem that any search node can access at any time. NetApp storage accessed via the NFS protocol provides this capability. Other components of the Novus architecture use Oracle RAC databases to manage relationships between pieces of content. NetApp storage accessed via NFS is used there as well, providing a high-performance shared filesystem with very fast and efficient backup and recovery capabilities.