Omahaku

Value is subjective, and so is the value of information. Given enough diversity in people's preferences, no single objective scoring of information can reliably provide value to everyone.

Instead, subjective scoring is needed. Omahaku aims to provide the means to rate information, information sources and other raters, so that you can fully control your information intake.

Features

Rate links, topics and other users
Share your ratings publicly, with a select group, or with no one
Create topics and collect links under them for yourself and others
Present your favorite collections and links in your profile
Use the browser extension to conveniently rate the websites you visit
Find like-minded users or explore differing viewpoints
Get a trusted network of other users based on your own ratings
Get personalized reputations for links, topics and other users based on the ratings of your network
Get personalized search results based on the personalized reputations
Get automatic recommendations based on the personalized reputations
Get content automatically to your feed by following other users, topics, links and searches

Details

Our current reputation and matching algorithms use different forms of memory-based collaborative filtering (CF). Memory-based CF seems like a good choice when the user needs precise control over their recommendations. This approach comes with some challenges, though, which the sections below discuss.
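To make the term concrete, here is a minimal sketch of user-based, memory-based CF. It is not Omahaku's actual algorithm: the rating scale of -1 to 1, the cosine similarity measure, and the names `ratings`, `similarity` and `predict` are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical data: user -> {item: rating in [-1, 1]}
ratings = {
    "alice": {"a": 1.0, "b": -1.0, "c": 1.0},
    "bob":   {"a": 1.0, "b": -1.0, "d": 1.0},
    "carol": {"a": -1.0, "b": 1.0, "d": -1.0},
}

def similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings of the item.

    Returns None when nobody comparable has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        s = similarity(user, other)
        num += s * their[item]
        den += abs(s)
    return num / den if den else None

print(similarity("alice", "bob"))
print(predict("alice", "d"))
```

Because the prediction is computed directly from stored ratings rather than from a trained model, every input that influenced it can be shown to the user and adjusted by them, which is the kind of control the paragraph above refers to.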

The main design problem

Giving the user control of a recommendation system has its drawbacks. Using such a system requires more effort, and the algorithms must be simple enough to be controllable by the user. If the system is too difficult to use, it simply won't work. The UI and the algorithms are strongly coupled to each other, which is not the case with recommendation systems where the user has less control. The main design problem is creating a convenient UI for controlling a useful recommendation algorithm.

Data sparsity, gray sheep, cold start

The user doesn't get recommendations based only on the ratings of like-minded users, who may not always exist due to data sparsity. They also get them based on 1) topics they've rated, 2) information sources they've rated, and most importantly 3) other users they've rated positively. With these mechanisms, one rating can yield thousands of good recommendations even if rating data is sparse. A single positive rating of a new item is enough to make it reputable in the trusted network (given that there are no other ratings interfering). Gray sheep still get recommendations from their network, and the cold start lasts for only a few ratings.
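The effect of a single user rating can be sketched as follows. This is an illustration under assumptions, not the service's implementation: the function `recommend`, the two rating dictionaries and the plain averaging are hypothetical.

```python
def recommend(user, user_ratings, item_ratings):
    """Recommend unseen items that positively rated users have rated well.

    user_ratings: user -> {other_user: rating}   (hypothetical structure)
    item_ratings: user -> {item: rating}         (hypothetical structure)
    """
    # Assumption: any user rated above zero counts as trusted.
    trusted = {u for u, r in user_ratings.get(user, {}).items() if r > 0}
    seen = set(item_ratings.get(user, {}))

    collected = {}
    for t in trusted:
        for item, r in item_ratings.get(t, {}).items():
            if item not in seen:
                collected.setdefault(item, []).append(r)

    # Average the trusted ratings and keep only positively scored items.
    avg = {i: sum(rs) / len(rs) for i, rs in collected.items()}
    return sorted((i for i, s in avg.items() if s > 0),
                  key=lambda i: (-avg[i], i))

# A newcomer with no item ratings and one positive user rating.
user_ratings = {"newcomer": {"curator": 1.0}}
item_ratings = {"curator": {"x": 1.0, "y": 0.5, "z": -1.0}}
print(recommend("newcomer", user_ratings, item_ratings))
```

Here the newcomer has rated no items at all, yet one positive rating of `curator` makes every item that `curator` rated positively available as a recommendation.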

Shilling attacks

Filtering out manufactured recommendations is partially solved by limiting the CF neighborhood to the user's trusted network. The trusted network produces one component of the reputation score. If a herd of shilling accounts appears and starts an attack, their ratings have no effect on the network scores. Only if an authentic user takes a shilling account into their trusted network can the attack start to take hold. If such infiltration occurs and a user gets a bad recommendation, they can mark the recommender as not trustworthy, removing it from their trusted network. The cost of creating an effective shilling attack becomes high and the cost of shutting it down becomes low.
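The following sketch shows why ratings from outside the trusted network have no effect on the network score. The function `network_score`, the account names and the plain averaging are assumptions for illustration only. The real network component may weight ratings differently.

```python
def network_score(item, trusted, item_ratings):
    """Average rating of an item, counting only trusted accounts."""
    votes = [item_ratings[u][item]
             for u in trusted
             if item in item_ratings.get(u, {})]
    return sum(votes) / len(votes) if votes else None

# One authentic rating against a herd of 100 hypothetical shill accounts.
item_ratings = {"friend": {"spam-page": -1.0}}
for n in range(100):
    item_ratings[f"shill_{n}"] = {"spam-page": 1.0}

trusted = {"friend"}
before = network_score("spam-page", trusted, item_ratings)

# Infiltration: the user mistakenly trusts one shill account.
trusted.add("shill_0")
infiltrated = network_score("spam-page", trusted, item_ratings)

# Marking the recommender as not trustworthy undoes the damage.
trusted.discard("shill_0")
after = network_score("spam-page", trusted, item_ratings)

print(before, infiltrated, after)
```

The hundred shill ratings change nothing until one of the accounts is trusted, and removing that single account restores the original score.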

Scalability

Scaling the system to millions of users and items while relying purely on memory-based CF might be impossible. Memory-based CF may have to be limited and augmented with a better-scaling recommendation system (in a way that doesn't degrade recommendation quality too much). That said, if the service manages to provide very valuable information, its being more expensive to operate than e.g. traditional search engines might be justified.

Privacy

The user can choose the sharing settings for each item (its rating and categorization). Currently the service has four settings for rating visibility: private, friends, network and public.
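A visibility check over these four settings might look like the sketch below. Only the four setting names come from the description above. Their exact semantics are assumed here: "friends" is taken to be a set chosen by the owner, "network" the owner's trusted network, and anything visible to the network is assumed to be visible to friends as well. The names `Visibility` and `can_see` are hypothetical.

```python
from enum import Enum

class Visibility(Enum):
    PRIVATE = "private"
    FRIENDS = "friends"
    NETWORK = "network"
    PUBLIC = "public"

def can_see(viewer, owner, visibility, friends, network):
    """Return True if `viewer` may see a rating owned by `owner`.

    friends, network: sets of accounts belonging to the owner
    (assumed semantics, see the lead-in).
    """
    if viewer == owner:
        return True  # owners always see their own ratings
    if visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.NETWORK:
        return viewer in friends or viewer in network
    if visibility is Visibility.FRIENDS:
        return viewer in friends
    return False  # PRIVATE

friends = {"bob"}
network = {"carol"}
print(can_see("carol", "alice", Visibility.NETWORK, friends, network))
print(can_see("carol", "alice", Visibility.FRIENDS, friends, network))
```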

The internet is too large

There's too much data on the internet to be indexed in full by any search engine. Omahaku, too, must at some point limit how many items it contains. However, for the system to work, anyone should still be able to add a link they think is valuable. One solution is to give each natural person a link budget. The per-person budget multiplied by the number of people in the world would then cap how many pages Omahaku indexes. To increase one's allocation, one would buy link space from others in an auction. This way 1) any person could add an important page to the index via their personal link budget, 2) companies could buy large amounts of link space to get their catalogs etc. indexed, and 3) the index size would remain capped, solving the problem.
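The bookkeeping behind this idea can be sketched as a small ledger. It is a thought experiment in code, not an existing feature: the class `LinkLedger`, its methods and the numbers are hypothetical, and the auction itself is left out, with only its outcome (a transfer of unused link space) being modeled.

```python
class LinkLedger:
    """Tracks link budgets so that the total index size stays capped."""

    def __init__(self, people, budget_per_person, companies=()):
        # Natural persons receive a budget; companies start with none.
        self.budget = {p: budget_per_person for p in people}
        self.budget.update({c: 0 for c in companies})
        self.used = {account: 0 for account in self.budget}

    def cap(self):
        """Upper bound on indexed links: the sum of all budgets."""
        return sum(self.budget.values())

    def add_link(self, account):
        """Spend one unit of link space. Returns False if none is left."""
        if self.used[account] >= self.budget[account]:
            return False
        self.used[account] += 1
        return True

    def transfer(self, seller, buyer, amount):
        """Move unused link space from seller to buyer (auction outcome)."""
        free = self.budget[seller] - self.used[seller]
        if amount <= 0 or amount > free:
            raise ValueError("seller does not have that much free link space")
        self.budget[seller] -= amount
        self.budget[buyer] += amount

ledger = LinkLedger(["ann", "ben", "cem"], budget_per_person=10,
                    companies=["shop"])
cap_before = ledger.cap()
ledger.transfer("ann", "shop", 4)
print(cap_before, ledger.cap(), ledger.budget["shop"])
```

Transfers only move budget between accounts, so the cap stays at the per-person budget times the number of people regardless of how much link space companies buy.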

Roadmap

Near-future development goals:

Extensions for all browsers: Currently only Firefox and Chromium-based browsers are supported.
Better browser extension usability: Show the reputation of any link on the current page, and allow rating or collecting it.
More granular account rating: Make it possible to essentially say: "This account is a reliable source of information in this particular topic."
Better profiles: Make it possible to search within an account's public/visible ratings.
Better account following: Change following an account to mean that the followed account's visible ratings appear in the feed (instead of mere profile additions).
Better text search: Make the text search resistant to typos.
Better UI: Some features are still too hard to use.
...

Company

The company is owned by me, Tuukka Pensala. I'm also the developer. See my profile