All you stat lovers out there, check out our new site, The Full Wiki. It allows you to monitor global trends by seeing what's hot on Wikipedia. We've combined Wikipedia traffic data with their categories, to create 13,000 what's hot lists. Updated daily.
Some interesting examples from today's trending topics:
Our new site, The Full Wiki, has brought together Google Maps and Wikipedia. Now you can view any article on Wikipedia with all the locations it mentions plotted on a Google Map.
The map and the article are linked throughout. Click on a map marker and it jumps to that part of the article. Click on an article marker and it will show you that location on the map.
See for example our tourism fact map. If we click on Bangkok (Q), for example, we can see the relevant part of the article and learn that it's the third most visited city in the world. Or zooming into Europe and clicking on Nice, Wikipedia tells us it was one of the first and best established resorts on the French Riviera.
Everyone hates Microsoft Word's grammar checker, yet no one tries to replace it.
In the 1970s and 1980s there were several grammar checkers. Then in 1992, Microsoft added grammar checking to Word. The rest is history. Microsoft dominated the market and innovation stopped.
AbiWord, a competitor to MS Word, now uses an open source package, Link Grammar, to do grammar checking. But as I understand it, Link Grammar is a rule-based checker. In the text processing field these days though, higher quality results come from throwing a large amount of data at the problem and using statistical algorithms to mine it.
This is what Google does. Their Google Sets service extracts lists from the web and applies association rules to them, similar to Amazon's "people who bought this book also bought these books". Google has a patent on this. Google Translate and other commercial machine translators learn by comparing large volumes of human-translated texts. These are more effective than attempts to codify all the rules of a language.
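As a rough illustration of the "also bought" idea, here is a minimal Python sketch that counts how often items appear together in the same list. It is plain co-occurrence counting, a simplified relative of association rules, and not Google's or Amazon's actual method. The function names and the sample lists are made up for the example.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(lists):
    """Count how many lists each ordered pair of items shares."""
    co = Counter()
    for items in lists:
        # De-duplicate within a list so each list counts a pair once.
        for a, b in combinations(sorted(set(items)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co

def also_seen_with(item, co, top=3):
    """Return the items most often listed alongside `item`."""
    scores = Counter({b: n for (a, b), n in co.items() if a == item})
    return [name for name, _ in scores.most_common(top)]

# Hypothetical lists, as if scraped from web pages.
lists = [
    ["red", "green", "blue"],
    ["red", "blue", "yellow"],
    ["apple", "pear", "red"],
]
co = build_cooccurrence(lists)
print(also_seen_with("red", co, top=1))
```

With more data the raw counts would need normalising, otherwise very common items end up associated with everything.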
So why not use the same approach for grammar checking? There's a lot more data out there to use than back in 1992. But what data source to use? I propose: the Wikipedia revision history. There's about 3 terabytes of text publicly available. A large proportion of it is simply minor grammatical corrections. I believe this would be the largest publicly available source of grammar corrections in the world. There are something in the order of 300 million revisions in the history. And if only a third are corrections of grammatical mistakes, that's 100 million corrections to learn from.
How would it work? I'm not sure exactly but I have some ideas. Look for minor changes in between revisions. Some are even tagged as grammatical changes:
Consider a later revision to be higher quality than the one it replaces. Editors are less likely to change correct grammar into incorrect grammar.
Consider revisions that last a long time to be higher quality than ones that disappear quickly. Vandalism is generally wiped quickly.
Use a part of speech tagger to help disambiguate word usage.
Generate transformations using both the actual words and their morphology. It wouldn't take long to discover that "PLURALWORD is" is replaced with "PLURALWORD are".
Part of speech tagging and parse trees could also be used.
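The ideas above could be prototyped roughly as follows. This is only a sketch under stated assumptions, not the code from 2006: it assumes pairs of consecutive revisions are already available as plain-text strings, uses Python's standard difflib to find small word-level replacements, and substitutes a crude "ends in s" test for real morphology or part of speech tagging. The names minor_edits, generalise and learn_rules are made up for illustration.

```python
import difflib
from collections import Counter

def minor_edits(old, new, max_words=2):
    """Yield (left context, before, after) for small word-level replacements."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) <= max_words and (j2 - j1) <= max_words:
            # Keep one word of left context so "dogs is" -> "dogs are" is visible.
            ctx = a[i1 - 1] if i1 > 0 else "<s>"
            yield ctx, " ".join(a[i1:i2]), " ".join(b[j1:j2])

def generalise(word):
    # Assumption: a crude stand-in for morphology. Any longer word ending
    # in "s" is treated as a plural; a real system would use a tagger.
    return "PLURALWORD" if word.endswith("s") and len(word) > 3 else word

def learn_rules(revision_pairs):
    """Count how often each generalised replacement occurs."""
    counts = Counter()
    for old, new in revision_pairs:
        for ctx, before, after in minor_edits(old, new):
            counts[(generalise(ctx.lower()), before.lower(), after.lower())] += 1
    return counts

# Hypothetical revision pairs: (earlier text, later text).
pairs = [
    ("The dogs is barking loudly", "The dogs are barking loudly"),
    ("Many countries is members", "Many countries are members"),
    ("The cat sat on teh mat", "The cat sat on the mat"),
]
rules = learn_rules(pairs)
for rule, n in rules.most_common():
    print(n, rule)
```

On real revision history, the counts could also be weighted by how long the later revision survived, so that quickly reverted vandalism contributes little.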
I attempted this back in 2006 but I've decided it would be best as an open source project. I should dig into my code archives and see what I can find.
What this project needs:
Big time server resources, beyond what my project TheFullWiki can provide. I understand many universities have already created servers to store Wikipedia revisions and analyse them. Access to this would be very helpful.
Computational linguists
A more recent dump of Wikipedia revision history would be nice.
From the makers of NationMaster comes a new project aimed at harnessing the wealth of content of wikis.
It's called The Full Wiki. Its goal is to become a platform for an enriched user experience for wikis using open licenses.
We have seen many fantastic projects come and go under the weight of traffic spikes, large datasets and the need to stay fresh. We want to provide serious hosting resources to make these projects feasible.
If you have a wiki-oriented project you'd like us to host or help with in some other way, check out The Full Wiki.
We are honoured to be listed among Australia's best web 2.0 applications this week. Ross Dawson produced a list for the prestigious Business Review Weekly showing our most internationally successful, innovative sites.
Geoff Evason put together a list ranking according to Quantcast traffic data alone. The list seems to have disappeared though, so I've reproduced it myself:
The middle column is rank according to Feb08 Quantcast data. This puts us in 3rd and 18th places. No matter how you cut it, Rapid Intelligence is one of Australia's top independent publishers.
The Wall Street Journal has an article on Outsourcing your life about how ordinary consumers are starting to use offshore labour for personal tasks to save time. Because this service work is so much cheaper, and our lives are getting more competitive and complex, this trend is sure to grow.
1. Education: Have personal teachers give your kids quality one-on-one time. There'll be no need for the tutors to factor in travel time. And larger companies will have much more specialised staff, not only by subject but by learning style.
2. Babysitting: Ok, maybe a Russian mother on a screen won't help much with a toddler, but at least you could watch the watcher. They could be an extra set of eyes to ensure that the kids are ok and call you when there may be a problem.
3. Adult education: In the information age, we miss out primarily because we don't understand. Having a relevant person on-hand at any time to explain anything would be pretty useful.
4. Shopping: Consumer purchases are quite complex these days. What you see in your local store is a tiny proportion of what's available on the web. With technology purchases, the complexity of the decision is limited only by your capacity to understand the product.
5. Counselling: It's a big step to leave the house to seek professional help. Many people are ashamed; combine that with laziness and you have hesitation. But in their own comfortable environment, many people may seek a sympathetic or helpful ear for $2 an hour. You probably won't even need to book. You just get someone straight away when you're at your most vulnerable and open at 3am Saturday night. The fact that your counsellor is not western can be a plus. The pitch: Get timeless spiritual wisdom from a land untouched by modern life (except for offshoring and the internet of course).
6. Video editing: The amount of video and photography we generate these days has not been matched by increased patience in our friends to consume it. To retain people's attention, you'll need some human touches.
7. Writing: We all have to do it but few of us do it well, and the best of us still need a second opinion. Right now I'm thinking how nice it would be if I could paste this to an Indian on Yahoo Messenger for a thorough proofread.
8. Events: What is event planning but a series of phone calls, invitations, emails and delegation of tasks? There's a place for a meetup.com with a more human touch.
Now of course there are limits to how well a consumer can define their problem and how culturally aware the individual service-provider is, and how motivated the service-provider is at home (particularly in Indian homes, where distraction is a way of life). For now, it's mostly one-on-one but whole new industries will be forming. Each of these could be commodified into specialist industries and managed like upmarket call centres.
But what about trust? New companies will build their reputations upon it.
What about privacy? The incentive for developing countries to develop and enforce law that meshes with western legal systems will be too big to ignore. It will happen.
It just occurred to me as I was doing a vanity search that led to Google Books: wouldn't books be the ultimate authoritative source of links for a search engine?
Books sometimes have full URLs in their footnotes like the example above. They're not links you can click on, but a search engine (like Google) that indexes books can certainly read them.
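To show how little it takes to read such links, here is a minimal Python sketch that pulls URLs out of a page of scanned book text. It assumes the text has already been through OCR, and the regular expression is a simple approximation rather than a full URL grammar. The pattern, the function name and the sample footnote are all made up for the example.

```python
import re

# Assumption: a URL starts with http://, https:// or www. and runs until
# whitespace or a bracket. Good enough for a sketch, not a full URL parser.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>()\[\]]+", re.IGNORECASE)

def extract_urls(page_text):
    """Return the URLs found in a page of plain text."""
    urls = []
    for match in URL_PATTERN.findall(page_text):
        # Drop sentence punctuation that got glued to the end of the URL.
        urls.append(match.rstrip(".,;:"))
    return urls

# Hypothetical footnote text.
footnote = "12. See http://www.example.com/report.html, accessed 2006; also www.example.org."
print(extract_urls(footnote))
```

A real indexer would also have to cope with URLs that are hyphenated or wrapped across lines in print.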
There are any number of ways of spamming websites that have arbitrarily been deemed authoritative, like form spam. But nobody, to my knowledge, has thought of negotiating links on actual tree flesh in order to get into Google. It would be a very pure source of quality links.
The Australian Internet Industry Association has published a risk analysis for different kinds of entities for the federal government's new copyright legislation. Scary stuff. Here it is for small businesses.
I don't remember the last time I spent 4 hours on one site. But Pandora's got me. I'm gonna blog about it again.
I really love how Pandora is telling me why I like the songs I do. Every time I click "Yes, I like it" I'm adding to my list of songs I like and refining the criteria for future songs. With each rating, it gets more articulate about my tastes, far surpassing what I could have come up with myself. I'm learning about my own preference for modal harmonies, slow moving bass lines and highly synthetic sonorities.
Pandora still keeps most of its smarts to itself, which is understandable if they want to be your radio station. But take communication a little further and you have a fantastic way to teach people about art and culture: telling the story through your own tastes. Imagine a service that highlights elements that your favourite paintings have in common. It would expose you to new works and when your tastes develop and change, contrast what you liked before to what you like now, explaining why and how. It would congratulate you on your growing sophistication and challenge you with art it knows you'd like if you'd just give it a chance. It would also connect you with similar people going on a very similar artistic journey.
I was referred to Pandora today. It plays you a personal radio station: you specify seed songs and artists, and it plays similar ones based on hundreds of attributes discovered in the Music Genome Project.
I just set up a channel based on The Orb. It's currently playing Moog Apella by 16B, all new to me but sounds great. I sent the channel to a fellow Orb fan. He said he really liked the track playing now, but it was different.
Was thinking it would be great if you could share channels on an ongoing basis. With such a wide library of tracks available, you'd think it could resolve disputes on what music to play in office environments. One person would play the music through their speakers (others could too if they wanted the sound closer to them). But everyone could rate the tracks that were coming through and put in a set number of request tracks for others to rate. The anonymity of the rating system would make it fair too; a secret ballot.
They should have a feature "combine channels" where different users select channels they want to hear and submit them. Then the system merges those to create something new, common and live. Would be great for being in the same space as remote workers.
A nice way of sharing for now is just to get visitors to enter their favourite tracks into the same channel. I went round to this friend's place yesterday and we each took turns to name favourite songs. And yeah, a lot of the time the new songs played were to everyone's liking.
Next time I have a nice lady over, I'm going to ask her what her 3 favourite romantic songs are. I'll enter them into a new Pandora channel, then only unmute once a song other than those three comes on.
As researchers and information professionals are called upon to provide not just information but intelligence, NationMaster.com is a great resource for gaining new insights from the available information.
I look forward to seeing what the traffic looks like. I remember a few years back we got a flood of email asking quite a range of detailed questions about our sources, then checked our stats to see we'd been on the front page of conspiracy site, Rense.com.
While I'm blogging, I'd like to thank everyone involved with STIRR. Creative party games made for a good vibe in the room and it was a good focused crowd. Our glorious team, Team 3, won with our ShoeWave.com business: a peer to peer sock sharing service where you send in an odd sock with a dollar, and receive 2 matching socks back in the mail. Genius.
Also great fun was Clickaholics. It was a younger crowd this time, not coming on the tail end of a big expensive conference. Free alcohol didn't last long but who cares? People did!
Met fun, interesting and smart people, all of whom will be great to see again.
Just tried out ChaCha, a new stab at the old humans-search-for-you concept. The conversation was very slow. Each response took 1-3 minutes and overall it was about half an hour. I guess if you wanted to use them effectively you could open up 10 windows and ask 10 questions. I hope they have good protection against bots running thousands of queries at once.
Status: Looking for a guide ...
Status: Connected to guide: ErinL
ErinL: Welcome to ChaCha!
ErinL: hello!
You: Hi!
ErinL: Welcome to ChaCha! Please wait a moment while I search for your results.
You: are you there?
ErinL: yes I'm still searching. i'm just getting the news show.
You: it's an australian show
You: a comedy
You: different the us one
ErinL: I've found it. bare with me.
ErinL: this is the correct show, right?
You: Yep
You: I'll give you a clue
You: http://www.google.com/search?q=%22rob+sitch+played%22+frontline&hl=en&sourceid=gd&rls=GGLD,GGLD:2006-41,GGLD:en
ErinL: here is the wikipedia site. he played Mike Moore.
You: Thanks..Does that url come up now?
ErinL: Are these results sufficient? Is there anything else I can find for you?
You: Yep what's the most popular site published by Rapid Intelligence?
ErinL: ok bare with me as I find that answer as well.
ErinL: Is this what you are looking for?
You: ..
ErinL: hmmm?
You: is somehting supposed tocome up in the guide results now?
You: I don't see a url
ErinL: yes. you should have 3.
ErinL: www.rapint.com
You: nothing showed up..I'm using firefox
You: one more question.. Who is the opposition leader in Australia?
ErinL: oooh. I don't think it's compatible with fire fox. This is the 2nd time I've had this happen.
ErinL: if I give you the www site can you find it from there? I'm not familiar with firefox.
You: yep no prob
ErinL: www.en.wikipedia.org/wiki/Frontline_%Australian_TV_series%29
ErinL: www.rapint.com for the rapid intelligence question
ErinL: I hope this helps you. If not let me know. Is there anything else I can do?
You: Yep Who is the opposition leader in Australia?
ErinL: http://en.wikipedia.org/wiki/List_of_Australian_Opposition_Leaders
ErinL: It's Kevin Rudd.
You: Thank you
You: One more quesiton: where are you based?
[many minutes pass]
ErinL: Thanks for searching ChaCha! Have a nice day. thank you.
ErinL: Please RATE ME. Thanks for using ChaCha.
Status: Session ended.
Status: Looking for a guide ...
Status: Connected to guide: AmandaG
AmandaG: Welcome to ChaCha!
AmandaG: Chacha is based in Indiana.
You: Where are you personally right now? (which city)
[10 minutes pass. Things start getting really slow here]
You: are you there?
You: Are you checking to see if you're allowed to answer this or?
You: Hello?
You: Thanks, I'm done.
Status: Session ended.
Status: Looking for a guide ...
Status: Connected to guide: Steven C
Steven C: Welcome to ChaCha!
Steven C: hi
You: Hi Steven
Steven C: hi
You: where are you?
Steven C: in USA
You: Ok thanks
Impressive that they could actually field queries despite a recent spike in traffic. But I can't say I think it's a great business. Perhaps if you publish the chats, you've got yourself an easy way to generate content for AdSense. Perhaps only the ones that both chatters agree are worth publishing. Or it could work doing verticals. I'm sure there's room on the net for a few Indian mesothelioma experts giving you advice then sending you to affiliated sites.
Luke Metcalfe is CEO, manager, developer and founder of Rapid Intelligence, a content aggregating web publishing company based in Sydney, Australia. Rapid Intelligence does large research-oriented websites with titles including NationMaster, Factbites and Qwika.