Omahaku

Omahaku Details

Value is subjective, and so is the value of information. Given enough diversity in people's preferences, no single objective scoring of information can reliably provide value to everyone.

Instead, subjective scoring is needed. Omahaku aims to provide the means to rate information, information sources and other information raters, so that you can fully control your information intake.

Our current reputation and matching algorithms use different forms of memory-based collaborative filtering (CF) techniques. Memory-based CF seems like a good choice when the user needs to have precise control over their recommendations. There are some challenges to this approach though.
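For readers unfamiliar with the term, here is a minimal sketch of user-based, memory-based CF in general. It is not Omahaku's actual algorithm; the rating scale (-1 to 1), the cosine similarity measure and the function names are assumptions chosen for illustration.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical ratings: bob agrees with ann, cat disagrees with her.
ratings = {
    "ann": {"a": 1, "b": -1},
    "bob": {"a": 1, "b": -1, "c": 1},
    "cat": {"a": -1, "b": 1, "c": -1},
}
prediction = predict(ratings, "ann", "c")
```

Because the prediction is computed directly from stored ratings, a user can in principle trace any recommendation back to the ratings that caused it, which is what makes this family of methods controllable.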

The main design problem

Giving the user control of a recommendation system has its drawbacks. Using such a system requires more effort, and the algorithms must be simple enough to be controllable by the user. If the system is too difficult to use, it simply won't work. The UI and the algorithms are strongly coupled to each other, which is not the case with recommendation systems where the user has less control. The main design problem is creating a convenient UI for controlling a useful recommendation algorithm.

Data sparsity, gray sheep, cold start

Recommendations are not based only on the ratings of like-minded users, who may not exist when rating data is sparse. The user also gets recommendations based on 1) topics they've rated, 2) information sources they've rated, and most importantly 3) other users they've rated positively. With these systems, one rating can yield thousands of good recommendations even if rating data is sparse. A single positive rating of a new item is enough to make it reputable in the trusted network (given that there are no other ratings interfering). Gray sheep still get recommendations from their network, and the cold start lasts for only a few ratings.
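The three routes above can be sketched as a simple candidate filter. This is only an illustration under assumed data shapes: the `Item` fields, the `candidates` function and the example URLs are hypothetical, not part of Omahaku.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    url: str
    topic: str
    source: str
    rated_up_by: frozenset  # users who rated the item positively

def candidates(items, liked_topics, liked_sources, trusted_users):
    """Return (url, reasons) for items reachable through any of the three routes."""
    result = []
    for item in items:
        reasons = []
        if item.topic in liked_topics:
            reasons.append("topic")
        if item.source in liked_sources:
            reasons.append("source")
        if item.rated_up_by & trusted_users:
            reasons.append("trusted user")
        if reasons:
            result.append((item.url, reasons))
    return result

items = [
    Item("https://example.org/a", "physics", "example.org", frozenset({"ann"})),
    Item("https://example.net/b", "cooking", "example.net", frozenset({"bob"})),
    Item("https://example.com/c", "sports", "example.com", frozenset({"eve"})),
]
found = candidates(
    items,
    liked_topics={"physics"},
    liked_sources={"example.net"},
    trusted_users={"ann"},
)
```

A single rating of a topic, a source or a user opens up every item attached to it, which is why a handful of ratings can be enough to get past the cold start.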

Shilling attacks

Manufactured recommendations are filtered out by limiting the CF neighborhood to the user's trusted network. An attack can start to take hold only if an authentic user accepts a shilling account into their trusted network. If such infiltration occurs and a user gets a bad recommendation, they can see who the recommender is and rate them as not trustworthy. This removes the shill account from their trusted network and also from the networks of people who trust the ratings of this user. The cost of creating an effective shilling attack becomes high and the cost of shutting it down becomes low.
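A minimal sketch of how a single negative rating could cut a shill out of several networks at once. The propagation rule used here (follow positive ratings up to a fixed depth, and let distrust from a member at the same or a closer distance win) is an assumption made for illustration, as are the function name and the `max_depth` parameter.

```python
def trusted_network(trust, start, max_depth=3):
    """Users reachable from `start` via positive ratings.

    A negative rating given by a member at the same or a closer
    distance keeps the rated user out of the network.
    """
    trusted = {start}
    blocked = set()
    frontier = [start]
    for _ in range(max_depth):
        # Apply the distrust expressed at this distance before any trust.
        for member in frontier:
            blocked.update(u for u, s in trust.get(member, {}).items() if s < 0)
        next_frontier = []
        for member in frontier:
            for other, score in trust.get(member, {}).items():
                if score > 0 and other not in trusted and other not in blocked:
                    trusted.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    trusted.discard(start)
    return trusted

# Hypothetical ratings: "ann" has let a shill into her network.
trust = {"fan": {"me": 1}, "me": {"ann": 1}, "ann": {"shill": 1}}
before_me = trusted_network(trust, "me")
before_fan = trusted_network(trust, "fan")

trust["me"]["shill"] = -1  # one negative rating from "me"
after_me = trusted_network(trust, "me")
after_fan = trusted_network(trust, "fan")
```

In this sketch the shill disappears both from the network of "me" and from the network of "fan", who trusts the ratings of "me", without "fan" having to do anything.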

Scalability

The amount of personalization data (reputations, categorizations) grows with the product of the number of users and the number of items, so it increases much faster than either count alone. This means that scaling the system to millions of users and items while providing everyone unrestricted personalization over the full dataset is not feasible. One option to contain this growth in costs is to put a cap on the size of each personalized dataset. Since the information preferences of the user are known, we can keep the (probably) most valuable items in each user's dataset.
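A capped personalized dataset could be sketched as follows. The estimated value per item is assumed to come from the user's known preferences; the function name and the example scores are hypothetical.

```python
import heapq

def cap_dataset(scored_items, max_size):
    """Keep the `max_size` items with the highest estimated value."""
    top = heapq.nlargest(max_size, scored_items.items(), key=lambda kv: kv[1])
    return dict(top)

# Hypothetical estimated values for one user's items.
scores = {"a": 0.9, "b": 0.1, "c": 0.5}
kept = cap_dataset(scores, 2)
```

With a cap in place, total storage grows with the number of users times the cap rather than with the number of users times the number of items.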

Privacy

The user can choose the sharing settings for each item (its rating and categorization). Currently the service has four settings for rating visibility: private, friends, network and public.
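The four settings map naturally onto an enumeration with a visibility check. The exact meaning of "friends" and "network" is not specified here, so the rules below (friends are a subset of the audience for "network") are assumptions, as are all names in the sketch.

```python
from enum import Enum

class Visibility(Enum):
    PRIVATE = "private"
    FRIENDS = "friends"
    NETWORK = "network"
    PUBLIC = "public"

def can_view(viewer, owner, visibility, friends, network):
    """Decide whether `viewer` may see a rating owned by `owner`.

    `friends` and `network` are the owner's friend set and trusted
    network (assumed semantics).
    """
    if viewer == owner or visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.FRIENDS:
        return viewer in friends
    if visibility is Visibility.NETWORK:
        # Assumption: friends also count as part of the network.
        return viewer in friends or viewer in network
    return False  # PRIVATE: only the owner

friends = {"ann"}
network = {"bob"}
```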

The internet is too large

There's too much data on the internet to be indexed by any search engine. Omahaku, too, must at some point limit how many items it contains. However, for the system to work, anyone should still be able to add a link they think is valuable to the system. One solution is to give each natural person a link budget. The budget multiplied by the number of people in the world would cap how many pages Omahaku indexes. To increase one's allocation, one would buy link space from others in an auction. This way 1) any person could add an important page to the index via their personal link budget, 2) companies could buy large amounts of link space to get their catalogs etc. indexed, and 3) the index size would remain capped, solving the problem.
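The link budget idea can be sketched as a small ledger in which link space is created only when a person registers and auctions merely move it around. The class, its methods and the highest-bid-wins rule are hypothetical; the actual auction format is not specified.

```python
class LinkLedger:
    """Tracks link space; the total only grows when a person registers."""

    def __init__(self, budget_per_person):
        self.budget_per_person = budget_per_person
        self.space = {}  # holder -> link slots owned
        self.used = {}   # holder -> link slots already filled

    def register_person(self, person):
        if person in self.space:
            raise ValueError("already registered")
        self.space[person] = self.budget_per_person
        self.used[person] = 0

    def add_link(self, holder):
        """Fill one slot; returns False when the holder has no free space."""
        if self.used.get(holder, 0) >= self.space.get(holder, 0):
            return False
        self.used[holder] += 1
        return True

    def auction(self, seller, slots, bids):
        """Move `slots` unused slots from `seller` to the highest bidder."""
        free = self.space.get(seller, 0) - self.used.get(seller, 0)
        if slots <= 0 or slots > free or not bids:
            return None
        winner = max(bids, key=bids.get)
        self.space[seller] -= slots
        self.space[winner] = self.space.get(winner, 0) + slots
        self.used.setdefault(winner, 0)
        return winner, bids[winner]

    def total_space(self):
        return sum(self.space.values())

ledger = LinkLedger(budget_per_person=10)
ledger.register_person("ann")
ledger.register_person("bob")
sale = ledger.auction("ann", 4, {"acme": 5.0, "bob": 3.0})
```

Here a company ("acme") obtains space without ever being registered as a person, and `total_space()` stays at 20 before and after the sale, so the index cap holds.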

Roadmap

Near-future development goals:

Company

The company is owned by me, Tuukka Pensala. I'm also the developer. See my profile.