What I’ve done with My Summer Vacation

It has been a busy month at the Institute with all of us traveling and moving in different directions before the start of the semester. I thought it would be useful to recap some of the recent projects I’ve been working on and where they are headed. Some of this information is just an update to previous posts on my blog, some is new. Sorry if it is repetitive, but it helps me put it all in one place.

My projects have revolved around two threads: credibility and digital reference. There are three major projects here:

• Reference Extract: A digital reference search engine
• Story Starters: building a blogging community of answers and questions, and
• IKE: the Inductive Knowledge Engine

Let me break them down:

Reference Extract

Description: A web search engine that searches digital reference materials from library and AskA sites. An expanded search lets you go beyond the digital reference archives to the resources they cite.
Digital Reference Perspective: Demonstrates two things. First, that useful, authored reference products can be created from digital reference knowledge bases in a way that both preserves patron confidentiality and provides very useful service to patrons with minimal time and effort. Second, it introduces the concept of “reference weighting,” where results are ranked by their actual usage in digital reference services.
Credibility Perspective: Builds a testbed to explore how librarian-selected materials (both librarian-created digital reference knowledge bases and the librarian-selected resources in the expanded search) can lead to more credible results than wide-open web searching. It also explores “levels of credibility”: as users expand their search, they “step down” in the expected credibility of the results.
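Here is a rough sketch, just to make these two ideas concrete at ranking time. The field names, the usage boost, and the tier penalty are invented for illustration; this is not the actual Reference Extract code:

# Illustrative sketch only: combining "reference weighting" and "levels of
# credibility" when ranking hits. Field names and the formula are assumptions.
from dataclasses import dataclass

@dataclass
class Hit:
    url: str
    text_score: float   # relevance score from the underlying index (e.g. Nutch)
    ref_uses: int       # times the URL was used in digital reference answers
    tier: int           # 0 = reference archive, 1 = cited resources, 2 = open web

def rank(hits, usage_boost=0.5, tier_penalty=0.25):
    """Boost results by actual reference usage, discount by credibility tier."""
    def score(h: Hit) -> float:
        # square-root damping so one heavily used URL doesn't swamp everything
        boost = usage_boost * (h.ref_uses ** 0.5)
        return (h.text_score + boost) * (1.0 - tier_penalty) ** h.tier
    return sorted(hits, key=score, reverse=True)

if __name__ == "__main__":
    hits = [
        Hit("http://example.org/faq", 1.2, ref_uses=9, tier=0),
        Hit("http://example.com/article", 1.8, ref_uses=0, tier=2),
    ]
    for h in rank(hits):
        print(h.url)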
Current Status: Reference Extract is built on the open source web crawler Nutch, and Lisa has become quite expert at working with it. We have optimized the creation and combination of the different indexes used in both the base and expanded searches. The system is open to try at http://www.digref.org. We are still having problems with noise getting into the expanded search through hijacked URLs and page redirects.
Next Steps: As will become evident when playing with the Story Starters website, we see Reference Extract as a sort of chainable “build your own search engine.” Right now the system shows a controlled digital reference search, followed by a broader cited-reference expanded search. We see these as only two in a series of possible indexes. One could imagine searching blogs as the base search, then the expanded index. Or searching a local library site, then a community site, then the base Reference Extract index, then the expanded search, then Google. The idea is a sort of erector set of indexes (local and hosted at the Institute) from which a person could build their own search system. We are also working on a “Digital Reference Deathmatch,” where you could see the top 25 results from any of these indexes run side by side. Users could then vote on which index did the better job. Stay tuned.
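To make the chaining idea concrete, here is a rough sketch of stepping down through an ordered list of indexes. The SearchIndex interface and everything else here is invented for illustration and is not the Reference Extract code:

# Rough illustration of the "erector set of indexes" idea: step down through
# an ordered chain (local site, community site, reference archive, expanded
# cited-resource index, open web, ...) until enough results are collected.
from typing import Callable, List, Tuple

# Each "index" is just a function: query -> list of (url, score) pairs.
SearchIndex = Callable[[str], List[Tuple[str, float]]]

def chained_search(query: str, chain: List[SearchIndex], want: int = 25):
    """Query each index in order, tagging results with the step they came from."""
    results = []
    for step, index in enumerate(chain):
        for url, score in index(query):
            results.append({"url": url, "score": score, "step": step})
        if len(results) >= want:     # enough results: stop stepping down
            break
    return results[:want]

def deathmatch(query: str, chain: List[SearchIndex], top: int = 25):
    """Run the same indexes independently for a side-by-side comparison."""
    return {step: index(query)[:top] for step, index in enumerate(chain)}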

Story Starters

Description: A blog infrastructure that allows people to create ideas, questions and items that bloggers can write about. It then collects the responses from the distributed blogs and lets you browse and search them.
Digital Reference Perspective: Story Starters is my current vision of the “expanded digital reference environment”: a system not only to ask and answer questions, but to browse the answers, blog about things without being prompted by a question, and rapidly build linkages to other tools like search and collection development. In Story Starters, questions can have multiple answers, answers can come from anyone (librarian, user, expert), and all answers are seen in the larger context of someone’s blog. The person providing the answer is just as transparent as the user asking the question. Digital reference is seen as a community activity rather than a patron-to-librarian interaction. It also shows that open source tools can be used to build digital reference systems, and that the feature set of digital reference is far from settled.
Credibility Perspective: Two aspects of credibility are explored: reliability and context. Reliability is the repetition of results over time and sources: do you get the same answer if you ask five people? Context asks who those five people are. It also looks at answers as coming from a context. In library science we do a lot of looking at the context of the people asking questions (the reference interview), but not much talking about the people answering them. Here, an answer is one entry in a person’s blog; the other entries make up its context.

Context also covers the idea that different answers may be credible at different times. For example, one story starter asked why teens abuse alcohol. There were two answers. One was a short narrative answer about family problems and the pressure to look cool. The other was a long list of citations. When you looked at the blog contexts, one came from a teen and the other from a reference librarian. If you want a teen’s perspective, the first answer is more credible. If you want a broader, research-based perspective, the second is more credible. If we want to build tools that support “credible” information, they must accommodate this situation.
Current Status: The Story Starters prototype is available at http://storystarters.iis.syr.edu/StoryStarters/. You can play with it and see the ideas in practice while the real system is being built. We have also created a WordPress plugin so you can see how Story Starters can be tightly integrated into blog software. If you don’t use WordPress, you can use bookmarklets or cut and paste, so long as your blogging software supports trackback URLs.
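For those curious what the trackback route involves, a ping is just a small form-encoded POST of the standard TrackBack fields (url, title, excerpt, blog_name) to the story starter’s trackback URL. Here is a rough sketch; the endpoint shown is a placeholder, not a real Story Starters address:

# Minimal trackback ping, per the standard TrackBack protocol: a form-encoded
# POST of title, url, excerpt, and blog_name to the story starter's trackback
# URL. The endpoint in the example is a placeholder.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def send_trackback(trackback_url: str, post_url: str, title: str,
                   excerpt: str, blog_name: str) -> str:
    data = urlencode({
        "url": post_url,      # permalink of the blog post answering the starter
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    req = Request(trackback_url, data=data,
                  headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")   # XML reply; <error>0</error> means success

# Example (placeholder trackback URL):
# send_trackback("http://storystarters.example/trackback/123",
#                "http://myblog.example/2006/08/why-teens-drink",
#                "Why do teens abuse alcohol?", "A teen's view...", "My Blog")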
Next Steps: Now that we have the prototype, we are building the final open system that is intended to be used by the whole blogging community. We will then take this code base and modify it specifically for libraries and AskA services as OpenQA, the next version of our QABuilder software.

IKE

Description: The Inductive Knowledge Engine, or IKE, uses concepts from complexity theory to analyze digital reference data (currently the responses in Story Starters). The idea is to use inductive methods to find related information, weed out old information, and find unique combinations of responses. In essence, it is a dynamic data mining application for digital reference. This one is VERY young, so I’ll write more about it as it matures.
Digital Reference Perspective: There have been a lot of comments on the perceived usefulness (or, more often, lack of usefulness) of digital reference knowledge bases. It is true that deductively derived knowledge bases have a limited life span and utility, but there has never been a true test of these uses. Complexity theory gives us a variety of tools and approaches to organize this material and see how it evolves over time. It also works well with very large numbers of objects, so we might be able to manage thousands of digital reference transcripts.
Credibility Perspective: Story Starters allows multiple views on a given question or topic. Can we automatically detect when these views agree? Can we look for non-obvious relations among topics (people who said X about social security also tended to say X about black holes)?
Current Status: I’ve built a very simple system to cluster Story Starter responses. This system will be enhanced to provide more dynamic simulations.
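To give a feel for what “very simple” clustering means here, the sketch below groups responses by bag-of-words cosine similarity. It is an illustration only, not the IKE code itself:

# Illustration only: cluster Story Starter responses by bag-of-words cosine
# similarity using a greedy single pass. Stands in for, and is not, IKE.
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(responses, threshold=0.3):
    """Join each response to the first cluster it resembles, else start a new one."""
    clusters = []   # each cluster is a list of (text, vector) pairs
    for text in responses:
        vec = vectorize(text)
        for group in clusters:
            if any(cosine(vec, v) >= threshold for _, v in group):
                group.append((text, vec))
                break
        else:
            clusters.append([(text, vec)])
    return [[text for text, _ in group] for group in clusters]

if __name__ == "__main__":
    answers = [
        "Family problems and pressure to look cool drive teen drinking.",
        "Peer pressure and family stress are big reasons teens drink.",
        "Here is a list of citations on adolescent alcohol abuse research.",
    ]
    for i, group in enumerate(cluster(answers)):
        print(i, group)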
Next Steps: Add dynamic variables in clustering. Build a better interface to the tools. Try the tools on other data sets. Much more on this to come.
