ClassificationAndRecommendation

John Nowak edited this page May 1, 2014 · 26 revisions

We have three complementary jobs when it comes to presenting books:

  • To classify the books into feeds, such that each book in a feed is related in some way.
  • To show each patron the feeds most likely to interest them.
  • Within each feed, to put the best and most interesting books at the front of the feed.

The first is a job for a classification scheme. The second and third are jobs for a recommendation engine.

A good classification scheme reduces the need for a good recommendation engine, and vice versa. The more precisely you know what kind of book someone likes, the less picky you need to be about the relative ranking of books in that category. But if you could always present someone with the perfect book they want right now, then classification would be irrelevant.

Classification schemes

A lot of schemes have been devised to classify books.

  • BISAC. Sample: "POLITICAL SCIENCE / Public Policy / City Planning & Urban Development"

  • BIC. Sample: "FKC" (Classic horror and ghost stories), child of "FK" (Horror and ghost stories), child of "F" (Fiction). A bidirectional BIC↔BISAC mapping can be obtained from BISG.

  • Dewey Decimal classification. Sample: "188" (Stoic philosophy), child of "18" (Ancient, medieval & eastern philosophy), child of "1" (Philosophy & psychology)

  • Library of Congress classification. Sample: "QE521-545" (Volcanoes and earthquakes), child of "QE" (Geology), child of "Q" (Science)

  • Library of Congress subject headings. Sample: "Fundraising cookbooks"

  • Bookstores like Amazon have their own proprietary classifications, e.g. "Books > Arts & Photography > Architecture > Urban & Land Use Planning". These also show up as Goodreads "genres".

  • Amazon books also come with classifications from the publishers, although these aren't displayed very prominently. Major publishers tend to use LOC subject headings or BISAC classifications. Self-published books effectively use tags.

  • A book's author and the series it belongs to group it with other books in a very basic way.

  • Folksonomic classifications are emergent classifications derived from use. Examples include Goodreads "shelves", Twitter hashtags, and the tags in any system that supports tagging. A tag like "hugo-winner" or "female-protagonist" divides the universe of books into two sets: the books that have a certain feature and the books that lack it. Tags may also express private sentiment ("4-stars").

  • Similarly, any list of books, no matter what other theme it might have, divides the universe of books into two sets: the books on the list and the books not on the list.
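The observation that a tag or a list splits the universe of books into two sets can be sketched in Python. The catalogue, titles, and the helpers `split_by_tag` and `split_by_list` are hypothetical, made up purely for illustration; they do not describe any existing system.

```python
# Hypothetical catalogue: each title maps to the set of tags applied to it.
catalogue = {
    "Book A": {"hugo-winner", "female-protagonist"},
    "Book B": {"female-protagonist"},
    "Book C": {"4-stars"},
}

def split_by_tag(catalogue, tag):
    """Return (titles that carry the tag, titles that lack it)."""
    with_tag = {title for title, tags in catalogue.items() if tag in tags}
    without_tag = set(catalogue) - with_tag
    return with_tag, without_tag

def split_by_list(catalogue, book_list):
    """Return (catalogue titles on the list, catalogue titles not on it)."""
    on_list = set(book_list) & set(catalogue)
    off_list = set(catalogue) - on_list
    return on_list, off_list

have, lack = split_by_tag(catalogue, "female-protagonist")
on_list, off_list = split_by_list(catalogue, ["Book C", "Book A"])
```

Seen this way, a tag and a list are the same kind of object (a membership test), so either could in principle back a feed.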

Recommendation sources

We only acquire a paid ebook if a librarian thinks it's worth stocking, which sets a minimum level of quality on our paid selection. Automated techniques for detecting quality are therefore less useful there. (They may still be useful on our public domain material.)

Recommendations should focus on identifying aspects of a book most likely to appeal to particular audiences, so that we can order the same feeds differently for different audiences.
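As a minimal sketch of ordering the same feed differently for different audiences, assume each book carries a set of "aspects" and each audience has a weight per aspect. The aspect names, the weights, and the `order_feed` function are all assumptions invented for this example, not part of any existing engine.

```python
def order_feed(feed, audience_weights):
    """Return the feed sorted so books whose aspects the audience
    weights most heavily come first. Ties keep their original order."""
    def score(book):
        # Sum the audience's weight for every aspect the book has;
        # aspects the audience has no weight for contribute nothing.
        return sum(audience_weights.get(aspect, 0.0) for aspect in book["aspects"])
    return sorted(feed, key=score, reverse=True)

# Hypothetical feed and audiences.
feed = [
    {"title": "Book A", "aspects": {"fast-paced", "romance"}},
    {"title": "Book B", "aspects": {"historical", "slow-burn"}},
    {"title": "Book C", "aspects": {"fast-paced"}},
]
thriller_fans = {"fast-paced": 1.0, "romance": 0.5}
history_fans = {"historical": 1.0}

for_thriller_fans = [b["title"] for b in order_feed(feed, thriller_fans)]
for_history_fans = [b["title"] for b in order_feed(feed, history_fans)]
```

The feed's membership never changes here; only the ordering does, which keeps classification and recommendation as separate concerns.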

Goodreads

Let users who are members of Goodreads connect to their Goodreads accounts and access the books on their shelves, their ratings, their reviews, and their friends (the social reading graph).

LibraryThing

LibraryThing API

Bibliocommons lists

Reviews (sentiment + subject matter)

Recommendation engines

http://blog.mortardata.com/post/82195614895/giving-away-our-recommendation-engine-for-free
