The databases listed under Sources of Evidence are called abstracting and indexing databases; they pull, or are provided with, content from journals and journal publishers; specifically, they receive a list of abstracts when a new issue is published. So, when you search, you're generally not searching an article's full text, just its summary, which is often beneficial. They employ the equivalent of editorial boards to review the quality of journals before including them in their index. Their content is relatively static; run a search today and run the same search 6 months from now and the only difference will be content that has been added to the index since you last ran your search. The results returned to you are not dependent on who you are; while you can re-order a result set by relevance, citation count, etc., the pool of results returned to you and your friend(s) will be identical. While they may index millions of articles, they rarely return millions of hits; they are built to cater to specific disciplines and, as a result, the result sets are usually manageable. You also get access to the full set of results.
Google Scholar is pretty amazing, but it works quite differently from the databases in Sources of Evidence. The search platform does work with publishers, but it also crawls sites known to host academic content. It frequently searches the full text of articles, not just the abstracts, so it risks bringing in less relevant content. It ingests more or less whatever it can find; there is no editorial review on the content. What it returns in your results list is dependent on who you are and when you’re running your search; don’t anticipate that running a search at different times, from different machines, or as two different people will ever return the same result set. The database is huge; result sets are consequently also generally overwhelmingly large, but you’re also only provided access to the first 1000 hits. All in, there’s a fair bit happening behind the scenes that you don’t control.
AI-informed citation searching services are also pretty amazing. These discovery tools generally rely on network analyses, using things like citations to draw connections between papers, and on other algorithms to topically cluster papers. The indexes they draw on aren't generally built from publisher feeds or by crawling the web as Google Scholar does, but by leveraging the metadata available through the organizations that, for example, issue DOIs for journal articles.
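To make the "drawing connections between papers via citations" idea concrete, here is a deliberately simplified sketch. The paper IDs and citation pairs are invented for illustration, and real services use far richer signals and algorithms; this just shows how shared citation links can group papers into clusters.

```python
from collections import defaultdict

# Toy data: each pair (a, b) means paper `a` cites paper `b`.
# These IDs are hypothetical, purely for illustration.
citations = [
    ("paper_A", "paper_B"),
    ("paper_B", "paper_C"),
    ("paper_D", "paper_E"),
]

# Treat citations as undirected links between papers.
graph = defaultdict(set)
for citing, cited in citations:
    graph[citing].add(cited)
    graph[cited].add(citing)

def clusters(graph):
    """Group papers into connected components -- a crude
    stand-in for the topical clusters these services surface."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

print(clusters(graph))
# Two clusters: {A, B, C} connected by citations, and {D, E}.
```

Even this toy version shows why such tools surface papers a keyword search would miss: paper_C never shares words with paper_A, but the citation chain links them anyway.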
These three kinds of services suggest three different approaches to the discovery of published evidence. Which you use will be determined by what you're trying to achieve with a review of the published evidence.
The first – those listed under Sources of Evidence – are curated, stable, and reproducible. When being systematic in your approach, or when you need a confined set of literature that is generally accepted within academia, these should be your primary source of evidence. Any bias introduced here comes from publication bias and from the curatorial work of selecting which journals to index.
The second – using a service like Google Scholar – are great when you are already familiar with a subject area and need a quick, topical citation. Don't expect to get a full evidence summary here, but do expect to find evidence complementary to what you've already found, and to find it quickly. The page ranking algorithms used introduce bias. This is somewhat offset by the fact that these systems are rather indiscriminate in what they'll include.
The third – clustering and networking services – are great complements to the above two sources, especially for serendipitous discovery. No search will ever return all relevant results; citation tracking and thematic clustering of abstracts can be hugely beneficial to understanding the scope of published evidence available. For the purposes of evidence synthesis, these tools on their own are insufficient. For the purposes of large evidence syntheses and exploratory efforts, these tools are invaluable. Bias in these systems will largely result from the clustering algorithms used.