Xiaotian Chen. Online Information Review. Volume 30, Issue 4. 2006.
Library federated search (or metasearch) engines have become increasingly popular as a way for libraries to improve services. The obvious advantage of the federated search is that users can search multiple library online resources simultaneously from one uniform interface, so that they do not have to try databases on various interfaces one by one. This article aims to describe the features of the library federated search engines MetaLib and WebFeat by comparing them with each other and by highlighting their strengths and weaknesses against Google and Google Scholar. MetaLib and WebFeat are used in this article to represent library federated search engines or metasearch services, since they are among the most popular ones on the market for libraries. Google is used to represent internet search engines.
One of the major reasons the federated search was developed was to imitate Google’s simple one-stop-shopping feature. But can the federated search imitate Google or can it “out-Google Google” (Fryer, 2004)? The author of this article believes that the answer is a mixed yes and no (leaning towards no) if all that libraries want the federated search to do is to imitate Google.
A Comparison of MetaLib and WebFeat
Before we discuss the strengths and weaknesses of the federated search against Google, let us make some comparisons between MetaLib and WebFeat, arguably the two most popular federated search engines.
Although both MetaLib and WebFeat can search multiple library databases simultaneously and can be called library federated search engines, these two products are different in many ways.
First and foremost, the two products search differently. WebFeat transfers the terms in its search box into the search boxes of the databases’ native interfaces and relies on those native interfaces to perform the search. MetaLib uses the Z39.50 or XML gateway protocols, bypassing the native search interface and using its own search to fetch data from the native databases. This key difference has many consequences. The obvious one is that search results on MetaLib have the MetaLib look, while search results on WebFeat eventually take users to the native interface. MetaLib also returns more consistent results across the board because all databases are searched with the same default operator (AND, OR, ADJ) decided by MetaLib. WebFeat, on the other hand, may return widely varying numbers of hits among databases because databases have different default operators and handle phrase searching differently. Another consequence is database ranking on the search result page. After a search is performed on MetaLib or WebFeat, the initial result users see is a list of databases, each with its number of hits. MetaLib’s post-search database ranking is the same as the pre-search ranking determined by the library. The database ranking on WebFeat is somewhat dynamic, because it depends on how quickly each native database returns its results. A given database’s ranking on WebFeat can therefore rise or fall depending on the speed of the return. For example, Applied Science and Technology Abstracts, which begins with the letter A, can be ranked after databases beginning with B or C in WebFeat search results.
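The ranking difference described above can be illustrated with a short simulation. The database names and response delays below are hypothetical, and Python’s standard `concurrent.futures` module merely stands in for the vendors’ actual dispatch machinery; this is only a sketch of the two ordering models, not either product’s implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical databases with simulated response times (seconds).
DATABASES = {
    "Applied Science and Technology Abstracts": 0.30,
    "Biological Abstracts": 0.10,
    "Compendex": 0.05,
}

def search_native(name: str, delay: float) -> str:
    """Stand-in for querying a database's native interface."""
    time.sleep(delay)
    return name

# MetaLib-style: databases listed in the library's fixed, pre-search order.
metalib_order = list(DATABASES)

# WebFeat-style: databases listed in the order each native search returns.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(search_native, n, d) for n, d in DATABASES.items()]
    webfeat_order = [f.result() for f in as_completed(futures)]

print(metalib_order)   # fixed library-defined order, A before B before C
print(webfeat_order)   # fastest database first, regardless of alphabet
```

In the WebFeat-style list the fastest responder ("Compendex" here) surfaces first, which is exactly why an A-titled database can end up below B- and C-titled ones.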
MetaLib has a suggested limit of ten databases that can be searched at the same time, while WebFeat has no limit. As a result, WebFeat offers an “all resources” checkbox for searching every database on WebFeat, while MetaLib has no such search-all option. Therefore, MetaLib does not really offer one-stop shopping, a key idea behind the development of the federated search. It offers only “one-stop window-shopping” – users can go to different subjects or categories to view the databases listed there but cannot search all databases with one search. WebFeat’s one-stop shopping truly allows users to search, with one search, all the databases a library has put on WebFeat, by checking the “all resources” box. Of course, the limit on the number of databases is not always a negative feature. With fewer databases to search, the system searches faster, and users will probably not bother to view all results if there are hundreds or more of them. Large research libraries with hundreds of databases would probably not want all their resources to be searched at the same time by one search, while small-to-medium academic libraries and public libraries with fewer databases may want to leave the one-search-for-all option open for users.
On WebFeat, users can easily group databases on any subjects of their choice by checking the subjects’ boxes and searching their customized grouping of databases with one search. On MetaLib, users do not have this flexibility of customized subject grouping, because libraries predetermine the subject groupings and users can only search one subject at a time. For example, WebFeat users can choose to group humanities databases and science/engineering databases together and search all databases under these two categories with one search, while MetaLib users cannot. In real life, it is not unusual for seemingly unrelated databases to share subjects. Researchers in artificial intelligence and machine translation may find language databases and engineering databases equally important, and may want to search them together. WebFeat offers this flexibility of customized subject grouping while MetaLib does not.
WebFeat is vendor-hosted, while MetaLib can be either vendor-hosted or locally hosted. The advantage of the vendor-hosted model is that libraries do not need to add an extra server for the new system, and the vendor can do most, if not all, system administration jobs for libraries that may not have the staff or expertise to administer the system. On the flip side, a vendor-hosted system may not give libraries access to all system administration privileges. For example, WebFeat libraries have to count on WebFeat staff to add databases or customized images and looks. Another issue is that the vendor-hosted system requires that the federated-search vendor’s IP addresses be added to all databases, so that the federated search servers can reach the databases as “authorized users.” Database vendors, most notably LexisNexis, may be reluctant to accept the federated-search vendor’s IP addresses as authorized IP addresses. It is certainly a plus for libraries to have a choice of vendor host or local host; MetaLib has the edge here in hosting flexibility.
The Interfaces of the Federated Search
Federated search interfaces normally list library databases by subject or category or in alphabetical order. A standard MetaLib package includes a QuickSearch mode and a MultiSearch (or MetaSearch, or a name decided by libraries) mode, among other things. QuickSearch normally has broader subjects or categories such as “science” and “social sciences,” and may look simpler. MultiSearch (or whatever name a library decides on) groups databases into more specific subjects such as “agriculture” and “zoology.” In both QuickSearch and MultiSearch, users have to select or de-select a subject before they perform a search, or the default selection (“general” or “agriculture” at the top of the list) will be searched. A standard WebFeat page looks like a combination of MetaLib’s QuickSearch page and MultiSearch page. It is usually longer than one screen, and users have to scroll down to view everything listed on the page. It starts with search boxes, then a subject or category list, then an alphabetical list of databases. As with MetaLib, users on a standard WebFeat page have to select or de-select databases either by name or by subject before searching. Libraries can redesign the WebFeat standard front page to add tabs, move sub-categories to separate pages, or move the entire subject section to separate subject pages, in order to accommodate more subjects. In addition, both MetaLib and WebFeat offer the option of a simple search box that can be added to library Web pages or instructors’ class websites. Like the regular MetaLib search modes, the MetaLib simple search box can only search one category at a time. Although technically the WebFeat simple search box can be configured to search all resources, the common practice recommended by the WebFeat vendor is to put a few popular general databases there.
Compared with the classical look of Google’s search box, the standard interfaces of MetaLib and WebFeat look more complicated, especially the WebFeat standard interface because it typically requires scrolling down to view all the resources listed.
This is a WebFeat example; MetaLib has a similar option too.
The Search Results on the Federated Search
If the typical interfaces of the federated search do not look as simple as Google’s classical look, the search results of a federated search look even more complicated than Google’s. After the initial search is performed, both MetaLib and WebFeat present search results by databases first. The order of databases is based on the library’s choice (as in the case of MetaLib) or the combination of the library’s choice and the speed of the return of search results (as in the case of WebFeat). Therefore, instead of seeing Google-like search results, users of the federated search will first see a list of databases with the number of hits for each database.
Although the initial search results from MetaLib and WebFeat are both lists of databases with the number of hits next to each database, users follow different paths to view individual records on MetaLib and WebFeat. MetaLib QuickSearch users have to click on “view retrieved” to see records from the various databases. On the “view retrieved” page of MetaLib QuickSearch, the individual records from the various databases can be sorted in several ways: by database, by year, by relevance (indicated by a green bar showing the strength of relevance), or by a couple of other choices. If the sorting is by database, it is hard for users to pick a database of their choice to view records from, because it is not possible to skip databases in MetaLib QuickSearch’s “view retrieved” mode. In MetaLib MultiSearch mode, after a search is performed, instead of a single “view retrieved” link there is a “view” link for each individual database, which users click to view records for the database of their choice. In summary, MetaLib has a single “view retrieved” link in QuickSearch mode and multiple “view” links in MultiSearch mode (one per database). “View retrieved” takes users to a combined results page, where it is hard to pick and choose results by database, while “view” takes users to the results of a specific database, so users can pick the database of their choice by clicking its “view” link.
On the other hand, WebFeat sorts results by database first; then within each database, records are ranked in the way decided by individual databases, usually by year. The number of hits on the first page of each database is also decided by native databases, normally between 10 and 25 hits. Users can either click on a certain database or scroll down to view the first 10 to 25 hits from a database. To view more hits from a certain database, users click on the “next set … “ button. WebFeat does not offer the option of combined search results sorted by year, relevance, etc.
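The two presentation models can be sketched in a few lines. The records below are invented, and the field names (`database`, `year`, `relevance`) are assumptions for illustration only; real MetaLib and WebFeat records are, of course, much richer.

```python
# Hypothetical records fetched from several databases, as a federated
# search might hold them before display.
records = [
    {"database": "ERIC",   "year": 2004, "relevance": 0.7, "title": "A"},
    {"database": "CINAHL", "year": 2006, "relevance": 0.9, "title": "B"},
    {"database": "ERIC",   "year": 2005, "relevance": 0.4, "title": "C"},
]

# WebFeat-style display: group by database, newest first within each one.
by_database = sorted(records, key=lambda r: (r["database"], -r["year"]))

# MetaLib-style combined "view retrieved" sort: strongest relevance first.
by_relevance = sorted(records, key=lambda r: r["relevance"], reverse=True)

print([r["title"] for r in by_database])   # ['B', 'C', 'A']
print([r["title"] for r in by_relevance])  # ['B', 'A', 'C']
```

The same three records come out in different orders, which is why the two products can feel so different even when they have fetched identical hits.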
Compared with viewing Google results, viewing of federated search results is more complicated. One challenge is to determine which database users should pick after initial searching, if they have no knowledge of library resources and research techniques. Should they pick the database that appears at the top of the list or pick the database with the most hits? If users start from the top, they could miss some lower-ranked but more relevant databases. If they start from databases with the most hits, they could miss more relevant databases with fewer and more accurate hits.
One reason why the federated search was originally developed was that the names of databases are as “Greek” to library patrons as the names of the food on the menu of a Greek restaurant (Jacsó, 2004). With a federated search, users are still faced with a menu of names like ABI, CINAHL, ERIC, JSTOR, WorldCat, Xrefer, etc. If nursing majors have no idea what CINAHL is, how could we expect them to choose CINAHL over other, less relevant databases that could be ranked higher or return more, but less relevant, hits on the result list? Similarly, if users do not know the difference between WorldCat and the local OPAC, and find that WorldCat has many more hits than the local OPAC, do users benefit if they choose WorldCat rather than the local OPAC simply because WorldCat has more hits?
It is clear that federated search results are different from Google’s, because federated search initial results are a list of databases with a number of hits for each database. This is especially true for WebFeat, as WebFeat does not have an option for users to view combined records; so the only way for users to view results is by database. Therefore, in a certain sense, the federated search shifts the process (or the burden) of selecting a database from “before” performing searches to “after” performing searches. But in either case, users have to have some knowledge of databases in order to make an informed selection for the best results. Or, they simply pick up some “good enough” or “quick and dirty” results from the top-ranked hits, and end the research.
Because people may get the impression that they can use a federated search as they would use Google, and that they can avoid learning the “Greek menu” of library resources, some librarians who value information literacy education think the federated search “is a step backward, a way of avoiding the learning process” (Frost, 2004). The author of this article believes that the federated search cannot be as simple as Google; it is the nature of the federated search not to be simple. Besides, users of subscription-based resources normally do more advanced or serious research than Google users, who are typically happy with “good-enough” results. On the other hand, the federated search should not be considered a step backward, because it should not replace information literacy education. As Bell points out, education is the only thing that differentiates libraries from Google (Bell, 2005). The initial search result pages of both MetaLib and WebFeat support Bell’s view that education, as a part of library services, is still important.
The Speed of the Federated Search
Google normally takes a split second to complete a search. MetaLib, which recommends that a maximum of ten databases be searched at the same time, normally takes about ten seconds or longer to complete a search; and if some databases are too slow and cannot produce results after a certain period of time, MetaLib spells out “suspended” for those databases, and users will not see any results from them. WebFeat does not have a suggested limit on the number of databases that can be searched, and users can choose to search all databases listed on WebFeat. The more databases to be searched, the longer it takes: for a WebFeat library with 60 databases listed, it can take about one minute or so to complete a search of all 60. On both MetaLib and WebFeat, users can watch the search process and results showing up, database by database. On MetaLib, while users are waiting for a search to complete, the screen shows a moving “S-e-a-r-c-h-i-n-g … “ until the search is completed. Similarly, users on WebFeat watch databases turn out results one after another. Google users never get a chance to see search results showing up one by one because Google is so fast. In other words, a federated search can be dozens of times slower than Google. If users are off-campus, a federated search will take even longer, because they have to go through the process of authentication either before or after they perform searches. There is probably no way that the federated search can compete with Google in speed, because the speed of a federated search depends not only on the speed of its own server, but also on the servers of library databases with various response speeds.
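The “suspended” behaviour described above amounts to a per-database timeout. Here is a minimal sketch, assuming invented database names, delays, and hit counts; Python’s standard `concurrent.futures` again stands in for the real server machinery.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Hypothetical per-database response delays (seconds); the last one is
# too slow and will be marked "suspended", as MetaLib does.
DELAYS = {"FastDB": 0.05, "OkDB": 0.10, "SlowDB": 1.5}
TIMEOUT = 0.3  # how long the federated search waits for each database

def search(name: str) -> str:
    """Stand-in for a search against one database."""
    time.sleep(DELAYS[name])
    return f"{name}: 42 hits"

results = {}
with ThreadPoolExecutor() as pool:
    # All searches are dispatched concurrently...
    futures = {name: pool.submit(search, name) for name in DELAYS}
    # ...and each one that misses the deadline is marked "suspended".
    for name, fut in futures.items():
        try:
            results[name] = fut.result(timeout=TIMEOUT)
        except TimeoutError:
            results[name] = "suspended"

print(results["SlowDB"])  # suspended
```

A design like this keeps one slow vendor from stalling the whole result page, at the cost of silently dropping that vendor’s hits — exactly the trade-off users see on MetaLib.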
Other Issues of the Federated Search
The federated search has some other issues as well. First, it cannot cover all online library resources. The goal of one-stop shopping cannot be achieved completely by any federated search. There are various reasons for this:
- Some databases do not work with any federated search at all, such as SciFinder Scholar. SciFinder Scholar does not use a web browser but rather requires its own internet client. Neither MetaLib nor WebFeat can cover SciFinder Scholar.
- If databases require a login, they will not work with the federated search.
- Some databases work with one federated search product but do not work with the other. MetaLib cannot search LexisNexis databases because LexisNexis does not allow Z39.50 or XML gateway access. WebFeat cannot search databases that do not have a search box on their front page because WebFeat counts on the search box on the native interface to search.
- Many libraries have databases on a pay-per-search basis, and libraries normally do not want them to be searched by a federated search for budgetary reasons.
- Some databases have a limited number of concurrent users. If such databases are included in a federated search, the limited seats are taken immediately whenever someone logs into the federated search, and no other users can use these databases. Libraries normally do not want to include databases with a very limited number of concurrent users in the federated search.
- It may not make sense to add to a federated search menu the very specialized databases that most general users would not be interested in, or the databases that require special software. One example is Inter-university Consortium for Political and Social Research (ICPSR) that requires statistics software such as SPSS to view data.
The next issue with the federated search is that not all search results can stay alive in the results list after a search is performed, because some databases time-out when idle. For example, by default, library OPACs using Endeavor’s WebVoyage time-out after five minutes. If users do not view records from a database within the time allowed after the initial search, they will be timed out. While it may be easy to change the time-out setting for some databases, it may not be for others such as WebVoyage, especially in a consortium environment where the change has to be made at the consortium level.
Ease of use, access, or convenience can be an issue too. A universal disadvantage of libraries’ online resources, including the federated search, is that users have to go through the website of their home library with several clicks before they can start a search. If users are off-campus, they have to know how to be authenticated. Besides, it may not be easy for them to find the link to the federated search on a library website, even if they are aware of it and willing to try it. After users successfully find a link to the federated search, they may encounter some browsing or navigational inconvenience. A common one is a problem using the “back” button on Internet Explorer (IE) on a federated search page: when the IE “back” button is clicked, the page displayed simply flashes and stays put. The author finds this to be the case for WebFeat at the author’s home library, as well as for MetaLib at a library in St Louis, Missouri. Convenience is certainly another area in which the federated search cannot compete with Google.
The Advantages of the Federated Search against Google
So far, when the federated search and Google are compared, what has been discussed seems to be the weaknesses of the federated search as compared with Google. But the federated search certainly has advantages compared with Google, mainly because of the quality and timeliness of library database content and the objectivity of library databases’ search results.
There are some obvious reasons for the superiority of search result content on the federated search: Google carries advertising and cannot cover many quality publications. Examples of quality publications not covered by Google are key newspapers such as the Wall Street Journal and scholarly journals on the “hidden web” or “invisible web” that Google cannot reach. The author of this article would like to point out some not-so-obvious reasons:
– Google results can be manipulated, while there has been no report that search results from library databases and the federated search have been manipulated for commercial, political, ideological or other purposes. A classic example of search result manipulation on Google is a search of “miserable failure,” which returns the “Biography of President George W. Bush” page at the White House as the number one result. This search result has been in existence since 2003 (Hansell, 2003), and the Bush page is still the number one hit as of February 1, 2006, with “Biography of Jimmy Carter” being the number two hit. Similarly, a Google search of “French military victories” once took users to a Google page asking “Did you mean: French military defeats” (Kopytoff, 2004). If Google results can be manipulated for political fights, it is not hard to imagine that they could be manipulated for commercial and other purposes. In contrast, a federated search, like other library tools, returns objective results and displays them objectively.
– New publications get better ranking from library databases and federated searches than from Google. Library databases and the federated search typically rank results in chronological order, starting from most recent publications. That makes sense for most scientific researchers. On the other hand, it normally takes a while before Google can index new web pages or before web page updates can be reflected on Google results, unless web site owners pay Google for faster indexing and retrieval. As for the ranking of search results, if Google results are not manipulated, popularity becomes the main factor for ranking. However, in scholarly research, the older publications normally get cited more frequently and therefore are more “popular,” if we were to apply Google’s popularity standard. So Google tends to give older publications an edge in the ranking of results while newer publications could either be buried far behind or have to wait for a while before Google indexes them at all. Tennant points out that Google’s lag time is “unacceptable” for scientific researchers (Tennant, 2005), though some library databases are also slow in indexing new publications.
– On a Google search results list, what users see may not be what they will get, because web site owners can use some tricks to make their pages look good on a Google results list. Tricks include creating middle pages to re-direct users from seemingly good results to pages that web page owners want users to view.
Federated Search Engines and Google Scholar
While libraries are pressured to imitate Google’s simplicity and one-stop shopping, an interesting development in Google is happening. Google is expanding and starting to look more like library resources: on the classical, simple search page Google has added many categories such as Google Scholar, Google Book Search, Google Maps, Google Directory, Google News, Google Video, etc. In other words, Google is no longer as simple as it used to be, and users can choose to search one of the categories, just like users can choose one of many library databases to search.
Among the new Google categories, Google Scholar is probably closest to the federated search, because it searches scholarly publishers’ archives and databases that the regular Google search may not reach. It has both similarities to and differences from the federated search. It can serve as a kind of one-stop shopping for various databases owned by different publishers. Also like the federated search, Google Scholar cannot search all scholarly databases yet: Tennant reported in July 2005 that Google Scholar was not able to get an agreement to crawl the full-text contents of Elsevier, the American Chemical Society, and the American Psychological Association (Tennant, 2005).
One major difference between the federated search and Google Scholar is the secretive nature of Google: the public has little idea about Google Scholar’s sources. Users of the federated search can find out what sources (databases and their contents) they are searching, because library databases offer not only the sources (titles of publications) they cover but also the years or volumes and issues of coverage, ISSNs, whether the sources have full text, etc. EBSCOhost, Gale Group, OCLC, and ProQuest call this type of information “title lists” and post them on their company websites. LexisNexis calls these “sources” and posts the list on both the database website and the technical support website. None of the library databases covers everything, but they all tell people what they cover and for how many years and volumes, so it is known what is not covered. Google Scholar, however, gives no such specifics as sources, let alone coverage of years and volumes. Without knowing sources and coverage, researchers have no idea how complete their searches will be.
Another Google Scholar issue is its inability to retrieve scholarly publications it is allowed access to. Jacsó finds that Google Scholar fails to retrieve the majority of articles even though it is allowed access to the digital archives of most of the largest academic publishers, preprint servers and repositories. Jacsó tests Google Scholar by searching various scholarly publications such as Nature and Science, and finds that Google Scholar can retrieve only 10-30 percent of the records from the online archives of these publications. “Stunning gaps” and “shallowness of coverage” are terms Jacsó uses to characterize Google Scholar. Jacsó does not elaborate on the reasons why Google Scholar cannot retrieve most of the scholarly records it has access to, but does mention one weakness of Google Scholar: it limits the indexing of collected files to the first 100-120 KB of text, while the majority of scholarly articles are close to or exceed 1 MB in size (Jacsó, 2005).
Of course, Google Scholar may improve and resolve some of its current issues. Some people think that as Google Scholar improves it will replace the need for the library federated search, but others believe that it is much more of an open question whether the generic Google Scholar can serve researchers better than services specially tailored for various research needs (Tennant, 2005).
MetaLib and WebFeat, probably the two most popular federated search engines, have some fundamental differences while serving the same federated search purpose. By allowing users to search multiple databases simultaneously, federated search engines may save some steps in getting results from various library resources, and may also obtain results from databases users otherwise would not try. Even librarians at reference desks sometimes notice a good number of search results from databases they might not expect to have many good hits. But in no way can the federated search compete with Google in Google’s strengths: speed, simplicity, ease of use, and convenience. Nor can the federated search truly serve as one-stop shopping for all library databases as people hoped, because some databases cannot be searched by the federated search for various reasons. Furthermore, the author of this article would like to characterize MetaLib as “one-stop window-shopping” rather than “one-stop shopping,” because MetaLib users can normally search only a maximum of ten databases at a time and do not have a search-all-resources option. The strengths of the library federated search, however, lie in the content it searches and the objective way it retrieves and displays results, rather than in how well it can mimic Google.
The federated search probably cannot replace information literacy education or the learning process either, partly because it cannot make searching as easy as a Google search, and serious research may require selecting various information sources beyond Google results. In a certain sense, the federated search shifts the process of selecting a database from “before” performing searches to “after” performing searches. Users still need to learn the functions of the library OPAC and periodical indexes with names like ABI, CINAHL, ERIC, MLA, etc. It is also very helpful for users to learn other information literacy basics, such as the ability to interpret a bibliography or to tell the difference between books, book chapters, and periodical articles, in order to make better use of the federated search engine offered by their libraries.