Entries in Searching (17)

Friday
Sep 17, 2010

How Does Difficulty Affect a Search?

According to this Google study (pdf), in 5 ways:

  1. more question queries,
  2. more frequent use of advanced operators,
  3. more time spent on the results page,
  4. longer queries appearing in the middle and not at the end of search sessions, and
  5. a larger proportion of time spent on the results page.

None of these surprise me, but query length isn't something I've given much thought to.

(via RWW on Twitter)

Friday
Dec 11, 2009

Understanding PubMed User Behavior

From The Bioinformationista:

 

A recently published article in the journal, Database: the Journal of Biological Databases and Curation, investigates the needs and behavior of PubMed users  through the analysis of log data.  The authors analyzed 23 million user sessions with more than 58 million user queries. [The article: Understanding PubMed user search behavior through log analysis]


For the most part, the results - and the authors have many - don't surprise me.  Users tend to use few words* in their searches.  Result sets vary considerably in size.  Searches begin and end on the first page for a vast majority of searchers.  Etc.  

But there are a few surprises (and of course it's nice to have statistical evidence to support 'intuitions'):

1.  The article's supplemental site includes a table comparing user search data from March '08 and February '09.  According to the data, there are over 50 million searches each month and well over 1 million each day.  Note: the authors discarded several million sessions that seemed to represent atypical PubMed usage, so the chart doesn't represent the full picture.  In reality, the volume is much larger.

From the article's supplemental site. Source: Dogan RI, Murray GC, Névéol A, and Lu Z. 'Understanding PubMed user search behavior through log analysis.' Database: The Journal of Biological Databases and Curation.

2.  Author searches are by far the most frequent PubMed query.  I knew author searching was common, but I didn't realize the extent to which it predominated. 

Figure 4 in the article. Source: Dogan RI, Murray GC, Névéol A, and Lu Z. 'Understanding PubMed user search behavior through log analysis.' Database: The Journal of Biological Databases and Curation.

3. The interface design affects which abstracts are viewed in a set of results:

PubMed users are more likely to click the first and last returned citation of each result page. This suggests that rather than simply following the retrieval order of PubMed, users are influenced by the results page format when selecting returned citations.

As you can see from the chart (again, from the paper), the first result is clicked on the most frequently.  After that there is a steady decline until the third to last article, at which point the number of clicks and abstracts viewed steadily increases.

Source: Dogan RI, Murray GC, Névéol A, and Lu Z. 'Understanding PubMed user search behavior through log analysis.' Database: The Journal of Biological Databases and Curation.

4. Five percent of PubMed searches result in no subsequent action (i.e., clicks) on the part of the searcher.  The authors interpret this as a 5% abandonment rate, which would definitely be a statistical floor, since many abandoned searches do include user 'actions': you search, check a few abstracts, consider them irrelevant, and move on, abandoning your search.  This (common) scenario isn't discussed in the study and perhaps can't be captured in the log data.
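To make the 'no subsequent action' measure concrete, here is a minimal sketch of how it could be computed from a click log. The log format (a chronological list of `(session_id, action)` pairs, where the action is `"query"` or `"click"`) and the function name are my own assumptions for illustration, not the format or method the authors used.

```python
def abandonment_rate(events):
    """Share of queries followed by no click before the next query
    (or the end of the log) in the same session.

    `events` is a hypothetical log: chronological (session_id, action)
    pairs, with action equal to "query" or "click".
    """
    pending = {}   # session_id -> True while the latest query has no click
    total = 0      # number of queries seen
    abandoned = 0  # queries that never received a click

    for session_id, action in events:
        if action == "query":
            # A new query while the previous one is still unclicked
            # means the previous one was abandoned.
            if pending.get(session_id):
                abandoned += 1
            total += 1
            pending[session_id] = True
        elif action == "click":
            pending[session_id] = False

    # Queries still unclicked when the log ends also count as abandoned.
    abandoned += sum(1 for unclicked in pending.values() if unclicked)
    return abandoned / total if total else 0.0


log = [
    ("a", "query"), ("a", "click"), ("a", "query"),
    ("b", "query"), ("b", "click"),
    ("c", "query"),
]
rate = abandonment_rate(log)
```

Note that a searcher who clicks a few abstracts and then gives up is counted as 'not abandoned' here, which is exactly why a figure computed this way can only be a floor.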

5. Not necessarily a surprise, but a 'thing' of interest.  The larger the result set, the more likely the user is to run another query, which corroborates my experience working with students and researchers in all areas of health.  Searchers tend to be put off by large numbers of results in PubMed, unlike in Google, where result counts are ignored.

There are several other interesting observations in the article, so take a look.

---------

* The authors use the word 'tokens' to represent individual search terms.  'Words' would, of course, be misleading since things that aren't words (like proteins, genes, acronyms, etc.) make up a large portion of searches. 

The images all come from Dogan RI, Murray GC, Névéol A, Lu Z.  Understanding PubMed user search behavior through log analysis.  Database: The Journal of Biological Databases and Curation (2009)
