Design, query, and evaluate information retrieval systems.
Introduction
Databases are a key aspect of information science, specifically information retrieval. A database is an organized collection of records. Databases may be classified as “reference” or “source.” “Reference databases lead the users to the source of the information,” while “source databases provide the answer with no need for the user to refer elsewhere” (Chowdhury, 2010, p. 17). Modern-day databases are typically web-based, where “users can use common browsers to search and retrieve records … [and] add or edit records” (p. 20).
Design
Bates (1999) says that the big design question of information science is, “How can access to recorded information be made most rapid and effective?” (p. 1048). She identifies scalability as a fundamental information science problem, since “each time the average collection grew to a new level, a new access method had to be devised … from the development of subject headings to the development of hyperlinks” (Bates, 1999, p. 1048). One model for design is design thinking, where “three phases are essentially objectives to be met—to understand, explore, and materialize” (Tucker, 2024, p. 9-6). There are six stages in design thinking: empathize with users, define users’ problems, ideate and brainstorm creative ideas, prototype representations of the ideas, test with users, and implement tested ideas (Tucker, 2024, p. 9-7).
One significant concept in database design is the controlled vocabulary, “a limited set of terms that must be used to represent the subject matter of documents” (Chowdhury, 2010, p. 155). Controlled vocabularies limit the terms that can be used, which can be a major disadvantage. The major advantage of a controlled vocabulary is “the standardization and predictability provided” (Tucker, 2024, p. 4-6).
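The enforcement of a controlled vocabulary can be sketched in a few lines of Python. The vocabulary terms and candidate index terms below are hypothetical examples, not drawn from any cited source:

```python
# Minimal sketch: checking proposed index terms against a controlled vocabulary.
# The vocabulary and the candidate terms are hypothetical examples.
CONTROLLED_VOCABULARY = {"archives", "cataloging", "information retrieval"}

def validate_terms(terms):
    """Split proposed index terms into accepted and rejected lists."""
    accepted = [t for t in terms if t.lower() in CONTROLLED_VOCABULARY]
    rejected = [t for t in terms if t.lower() not in CONTROLLED_VOCABULARY]
    return accepted, rejected

# "Cataloging" is in the vocabulary; "repositories" is not, so an indexer
# would need to substitute an authorized term.
accepted, rejected = validate_terms(["Cataloging", "repositories"])
```

The rejection of "repositories" illustrates both the standardization a controlled vocabulary provides and the limitation it imposes: the indexer must map the concept onto an authorized term rather than use a familiar synonym.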
Subject indexing is another major concept in database design, where indexing systems are “based on the analysis of the contents of the documents,” which can be analyzed manually or automatically (Chowdhury, 2010, p. 98). Subject indexing systems can use pre- or post-coordination. The Library of Congress describes pre-coordination as “the combining of elements into one heading in anticipation of a search on that heading,” while post-coordination is “the assignment of elements to separate headings, in anticipation of a user combining them at the time [they] look for materials in a catalog, usually through keyword searching” (Library of Congress, 2010).
Querying
One of the most common ways to locate information is through searching, but searching can be difficult, especially if a user isn’t sure what to search for. Research has shown that “users find it easier to recognize what they want than to describe what they want” (Tucker, 2024, p. 5-3). The classic model of information retrieval depicted a search in a static, limiting way, where “a single query … yield[s] a single output set” (Bates, 1989, p. 409). Bates proposed a more realistic model called “berrypicking,” which depicts search queries as evolving rather than static, where searchers “gather information in bits and pieces” using a “wide variety of search techniques” and a “wide variety of sources” (p. 421).
An essential search technique to understand is the use of Boolean operators, also known as logical operators: OR, AND, and NOT (Tucker, 2024, p. 6-4). OR allows a searcher to search for multiple terms, and documents will be returned if they contain any of the terms; this is useful for searching for synonymous or similar concepts, like “archives OR repositories.” AND will only return documents in which all the terms are found, making it useful for narrowing results. NOT will exclude specific terms, so it must be used more carefully than the other operators, as a searcher could accidentally remove useful documents without realizing it. It is important for a searcher to know how the search engine they are using interprets these operators, if it interprets them at all. Google, the dominant Internet search engine today, no longer supports Boolean operators (Shamaeva, 2022).
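In set terms, the three operators correspond to union, intersection, and difference. The following sketch demonstrates this with a hypothetical three-document collection (the documents and terms are invented for illustration):

```python
# Minimal sketch of Boolean retrieval over a toy document set.
# The documents are hypothetical examples.
docs = {
    1: "digital archives and repositories",
    2: "university archives management",
    3: "open access repositories",
}

def matching(term):
    """Return the set of document IDs whose text contains the term."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

# OR = union, AND = intersection, NOT = set difference.
or_results = matching("archives") | matching("repositories")   # docs 1, 2, 3
and_results = matching("archives") & matching("repositories")  # doc 1 only
not_results = matching("archives") - matching("digital")       # doc 2 only
```

The OR query broadens the result set to any document mentioning either term, AND narrows it to the single document containing both, and NOT silently drops document 1, which illustrates how easily a NOT clause can exclude a document a searcher might have wanted.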
Evaluation
Two key factors in evaluating an information retrieval system are precision, “a measurement of discrimination,” and recall, “a measurement of aggregation” (Tucker, 2024, p. 8-9). Precision refers to the relevance of the documents returned by a search, while recall refers to whether the system retrieved all of the relevant documents. It is important to conduct user research to ensure an information retrieval system is actually meeting users’ needs, because relevance is not an objective concept; it is “as perceived by the user of the information” and “inherently subjective” (Tucker, 2024, p. 8-9).
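These two measures reduce to simple ratios: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A brief sketch with hypothetical document sets:

```python
# Minimal sketch: precision and recall for one search.
# The retrieved and relevant sets are hypothetical examples.
retrieved = {"doc1", "doc2", "doc3", "doc4"}   # what the system returned
relevant = {"doc2", "doc3", "doc5"}            # what the user actually needed

true_positives = retrieved & relevant           # doc2 and doc3

precision = len(true_positives) / len(retrieved)  # 2/4 = 0.5
recall = len(true_positives) / len(relevant)      # 2/3 ≈ 0.67
```

Here half of the returned documents were relevant (precision), but the system missed one relevant document, doc5, so recall is only two-thirds, which shows how the two measures can diverge for the same search.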
There are many ways to conduct user research. Interviewing users, distributing surveys, and conducting focus groups are common research methods. Card sorting, where users sort topics and categories using index cards, is a “quick, inexpensive, and reliable method” for information design that “generates an overall structure … as well as suggestions for navigation, menus, and possible taxonomies” (Spencer & Warfel, 2004). This makes card sorting a good tool to use when designing a website’s structure.
Evidence
Artifact 1
Assignment: Redesign Proposal for the Plano Public Library Website
Course: INFO 202, Information Retrieval System Design
Description: This group assignment required us to redesign an organization’s website with a focus on structure, organization, and labeling. The objective was to improve the website so users would have a better experience with its navigation. We chose the Plano Public Library’s website, created a site map of its current state, sketched out a new site map, and wrote a report about our redesign.
This redesign project demonstrates my understanding of design principles that improve users’ experiences. We took an existing website with a messy, uneven structure and created one that would be more balanced and easier for library patrons to understand. The labeling of menu categories did not necessarily match the items within them; for example, “Interlibrary Loan” and “Computers & Printing” were listed in the “About” menu, which would not be an intuitive place for users to find those options. By focusing on clear labeling, we created a site map that library patrons would find much more user-friendly.
Artifact 2
Assignment: Beta Prototype Design Document
Course: INFO 202, Information Retrieval System Design
Description: A major group project in INFO 202 is designing, building, and evaluating a database of objects with at least eight fields. One part of this project is creating a beta prototype design document containing a statement of purpose, a database with working search and submission forms, and rules for each field. Some fields needed to be “hard,” meaning their values could not be entered correctly without a thorough reading of the rules. Each rule needed to include the field’s unit of analysis, whether it is required, whether it can have multiple values or only one, whether it uses a controlled vocabulary, and any definitions or explanations that an indexer would need.
This assignment demonstrates my ability to build a database from scratch while considering user experience throughout the process. Our group made a database for cataloging varieties of cheese in a way that would be useful to both sophisticated and everyday cheese lovers. We chose to include basic fields like cheese type, brand name, country of origin, and price range, as well as options for milk type, age, flavor, rind, and firmness for more intricate searches. By considering the possible use cases for our database, we were able to create a comprehensive cheese database.
Artifact 3
Assignment: Comparative Analysis of Major Subscription Databases: JSTOR and Project MUSE
Course: INFO 210, Reference and Information Services
Description: This group assignment had us compare major databases that many libraries subscribe to from the perspective of a small library with limited financial resources. We needed to consider scope, quality of content, accuracy, currency, authority, ease of use, arrangement, and appropriateness of the databases.
We compared JSTOR and Project MUSE. I wrote the ease of use comparison, as well as contributed to some other writing and editing. This assignment demonstrates my knowledge of usability principles and database searching. I found that Project MUSE and JSTOR use Boolean logic in very different ways, which impacts users’ search behavior; JSTOR allows Booleans in text searches, while Project MUSE only allows Booleans in advanced search using drop-down options. JSTOR loaded searches much faster than Project MUSE. Based on those findings, and the findings of others in my group, we decided that JSTOR was the better database to spend precious funding on.
Conclusion
I was very familiar with using databases before starting the MLIS program, but in the years preceding it, I was mostly using law databases. The MLIS program reacquainted me with the kinds of databases I used frequently in my undergraduate program. I am now much more comfortable learning how specific databases operate. As Google and other Internet search engines become less useful, it is important to have the skills to find information in other places.
References
Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407-424.
Bates, M. J. (1999). The invisible substrate of information science. Journal of the American Society for Information Science, 50(12), 1043-1050.
Chowdhury, G. G. (2010). Introduction to modern information retrieval (3rd ed.). Neal-Schuman.
Chu, S. K.-W., & Law, N. (2005). Development of information search expertise: Research students’ knowledge of databases. Online Information Review, 29(6), 621-642. https://doi.org/10.1108/14684520510638070
Cooey, N., & Phillips, A. (2023). Library of Congress Subject Headings: A post-coordinated future. Cataloging & Classification Quarterly, 61(5-6), 491-505. https://doi.org/10.1080/01639374.2023.2193584
Library of Congress. (2010, May 2). The Policy and Standards Division’s progress on the recommendations made in “Library of Congress Subject Headings pre- vs. post-coordination and related issues”. https://www.loc.gov/catdir/cpso/pre_vs_postupdate.pdf
Shamaeva, I. (2022, November 4). Boolean search is dead. Boolean Strings. https://booleanstrings.com/2022/11/04/boolean-search-is-dead/
Spencer, D., & Warfel, T. (2004, April 7). Card sorting: A definitive guide. Boxes and Arrows. https://boxesandarrows.com/card-sorting-a-definitive-guide/
Tucker, V. M. (2024). Design concepts in information retrieval: Creating user-centered systems, search engines, and sites.