The National Diet Library of Japan wanted to make its documents accessible more quickly and to introduce new search features. NTT DATA developed a system that supports several search functions: full-text search, associative search, and aggregated search. The system also provides links to online bookstores that sell items (books or magazines) related to the search query. NTT DATA used Hadoop for full-text search indexing and for bibliographic identification and grouping.
- Processing the National Diet Library's huge volume of data with traditional methods took a long time. Aggregating multiple search results into semantically equivalent groups was also complicated, because a single work can exist in multiple bibliographic forms and publications.
- NTT DATA built a system that uses Hadoop to speed up full-text search indexing and bibliographic identification and grouping.
- The system used more than 30 Hadoop nodes and processed around 5 TB of data (tens of millions of items).
As a result, the National Diet Library significantly reduced the time needed to create a search index of all its documents, allowing readers to access information more efficiently.
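The two Hadoop tasks mentioned above, full-text search indexing and bibliographic grouping, both fit the map, shuffle, and reduce pattern. The following is a minimal sketch of that pattern in plain Python, run in a single process. It is not NTT DATA's implementation and does not use the Hadoop API: the helper `run_mapreduce`, the record fields (`id`, `title`, `author`, `text`), the normalization rule, and the sample records are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in one process (hypothetical helper)."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)  # shuffle: collect values per key
    return {key: reducer(key, values) for key, values in shuffled.items()}


# Full-text indexing: emit (term, doc_id), then build one posting list per term.
def index_mapper(record):
    for term in set(re.findall(r"\w+", record["text"].lower())):
        yield term, record["id"]


def index_reducer(term, doc_ids):
    return sorted(set(doc_ids))


# Bibliographic grouping: records whose normalized title and author match
# are treated as the same work. Real matching rules would be far richer.
def normalize(value):
    value = unicodedata.normalize("NFKC", value).lower()
    return re.sub(r"[^\w]+", " ", value).strip()


def group_mapper(record):
    key = (normalize(record["title"]), normalize(record["author"]))
    yield key, record["id"]


def group_reducer(key, record_ids):
    return sorted(record_ids)


# Made-up sample records, for illustration only.
records = [
    {"id": "b1", "title": "Kokoro", "author": "Natsume, Soseki",
     "text": "A novel about a student and Sensei"},
    {"id": "b2", "title": "KOKORO ", "author": "Natsume Soseki",
     "text": "Paperback edition of the novel"},
    {"id": "m1", "title": "Library Journal", "author": "Various",
     "text": "Magazine about library systems"},
]

index = run_mapreduce(records, index_mapper, index_reducer)
groups = run_mapreduce(records, group_mapper, group_reducer)
```

Here `index` maps each term to the records that contain it, and `groups` maps each normalized (title, author) key to the records judged to be the same work. On a real cluster the mapper and reducer would run in parallel across nodes, which is where the speed-up for tens of millions of items would come from.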
Client : National Diet Library
Location : Tokyo, Japan
Items collected : books, journals, newspapers, electronic archives, manuscripts, official publications, doctoral dissertations, maps, sheet music
Size : 34.7 million items