Need a reason for semantic HTML?

Try this on for size: an automatic bullshit detector. The Nieman Journalism Lab writes:

You’re reading a wrap-up of the Sept. 22 Republican presidential debate when you land on this claim from Rep. Michele Bachmann: “President Obama has the lowest public approval ratings of any president in modern times.”

Really? You start googling for evidence. Maybe you scour the blogs or the fact-checking sites. It takes work, all that critical thinking.

That’s why Dan Schultz, a graduate student at the MIT Media Lab (and newly named Knight-Mozilla fellow for 2012), is devoting his thesis to automatic bullshit detection. Schultz is building what he calls truth goggles — not actual magical eyewear, alas, but software that flags suspicious claims in news articles and helps readers determine their truthiness. It’s possible because of a novel arrangement: Schultz struck a deal with fact-checker PolitiFact for access to its private APIs.

Yes, yes, I know truth goggles don’t use semantic HTML for their fact-finding. But it’s easy to imagine a world where an intelligent web crawler uses the semantic meaning embedded in every (well-crafted) piece of writing on the internet to find and analyze related content.

Let’s invent an example. Say you’re writing a blog post or an essay in a word processor, and you type a loaded sentence like “evolution is just a theory”. Behind the scenes, the bullshit detector you’ve installed runs a search for the keywords in that sentence. It gets lots of results, written by everyone from respected scientists to religious crackpots. It then runs a background check on the most relevant results and ranks them by trustworthiness. The algorithm could take things like citations, the health of links, popularity, and the presence of attributions and footnotes into consideration, and maybe even consult a crowd-sourced repository of facts curated by top scientists and respected writers. Finally, it uses natural language analysis to compare your statement (“evolution is just a theory”) with the fact-backed consensus (“evolution is a scientific fact”) and points out your (potential) lies.
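To make the ranking step a little more concrete, here’s a toy sketch in Python. Everything in it is hypothetical: the Result fields, the weights and the cap on citations are my own assumptions for illustration, not how any real fact-checker or search engine scores sources. It just shows how signals like citations, link health, popularity and attributions could be folded into one score and used to sort results.

```python
from dataclasses import dataclass

# Hypothetical signals for one search result. The field names and the
# weights in trust_score() are illustrative assumptions, not a real algorithm.
@dataclass
class Result:
    url: str
    citations: int          # how many other sources cite this page
    links_total: int        # outbound links on the page
    links_broken: int       # outbound links that no longer resolve
    popularity: float       # normalized 0.0-1.0, e.g. from traffic data
    has_attributions: bool  # quotes attributed, footnotes present

def trust_score(r: Result) -> float:
    """Combine the signals into a single score (higher = more trusted)."""
    # Link health: fraction of outbound links that still work.
    if r.links_total == 0:
        link_health = 1.0
    else:
        link_health = 1 - r.links_broken / r.links_total
    # Cap the citation signal so one heavily cited page can't dominate.
    citation_signal = min(r.citations, 50) / 50
    return (0.4 * citation_signal
            + 0.2 * link_health
            + 0.2 * r.popularity
            + 0.2 * (1.0 if r.has_attributions else 0.0))

def rank(results: list[Result]) -> list[Result]:
    """Sort results from most to least trustworthy."""
    return sorted(results, key=trust_score, reverse=True)

if __name__ == "__main__":
    results = [
        Result("https://example.org/crackpot", 1, 20, 12, 0.9, False),
        Result("https://example.org/journal", 40, 30, 1, 0.3, True),
    ]
    for r in rank(results):
        print(f"{trust_score(r):.2f}  {r.url}")
```

Note that in this sketch a very popular page with broken links and no attributions still loses to a less popular but well-cited one; how much weight popularity deserves is exactly the kind of judgment call a real detector would have to get right.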

The technology could work for readers too. Say you’re reading an article or opinion piece on a news site or a politician’s blog, and the detector underlines every potential lie or falsehood.

Right now, most false information on the internet (and beyond, as traditional media aren’t much better) isn’t born out of malice or ill intent, but out of sheer intellectual laziness. In the future, semantic HTML combined with Siri-like natural language processing could lead to technology that makes doing the right thing, intellectually, as easy as running a spell-check.

The technology to do this is here, albeit in a primitive form (at the moment, something like Apple’s Siri requires a massive server farm). The content is there too, but the semantic connections between articles and their sources still need a lot of marking up.

And that’s where I think web developers like myself and countless others can, and should, do our part. It could start with something as simple as using <blockquote> and <cite> correctly.
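Here’s a rough idea of what the crawler side of that could look like: a small Python sketch using the standard library’s html.parser that pulls out each blockquote’s text along with the source URL in its cite attribute, plus the titles inside <cite> elements. The sample markup follows the HTML convention of putting the source URL in the blockquote’s cite attribute, the title of the quoted work in <cite>, and the attribution outside the blockquote itself (here in a figcaption). The URL is a placeholder and QuoteExtractor is a hypothetical name; a real crawler would have to cope with far messier markup than this.

```python
from html.parser import HTMLParser

# Sample markup: the cite attribute holds the source URL (a placeholder),
# and <cite> holds the title of the quoted work, outside the blockquote.
SAMPLE = """
<figure>
  <blockquote cite="https://example.com/truth-goggles">
    <p>Schultz is building what he calls truth goggles.</p>
  </blockquote>
  <figcaption>From <cite>Nieman Journalism Lab</cite></figcaption>
</figure>
"""

class QuoteExtractor(HTMLParser):
    """Collects each blockquote's text and source URL, plus <cite> titles."""

    def __init__(self):
        super().__init__()
        self.quotes = []        # list of (source_url or None, quote_text)
        self.cites = []         # text content of each <cite> element
        self._quote_depth = 0   # nested blockquotes fold into the outermost
        self._quote_url = None
        self._quote_text = []
        self._in_cite = False
        self._cite_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self._quote_depth += 1
            if self._quote_depth == 1:
                self._quote_url = dict(attrs).get("cite")
                self._quote_text = []
        elif tag == "cite":
            self._in_cite = True
            self._cite_text = []

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._quote_depth:
            self._quote_depth -= 1
            if self._quote_depth == 0:
                # Collapse the markup's indentation and newlines.
                text = " ".join("".join(self._quote_text).split())
                self.quotes.append((self._quote_url, text))
        elif tag == "cite" and self._in_cite:
            self._in_cite = False
            self.cites.append("".join(self._cite_text).strip())

    def handle_data(self, data):
        if self._quote_depth:
            self._quote_text.append(data)
        if self._in_cite:
            self._cite_text.append(data)

if __name__ == "__main__":
    parser = QuoteExtractor()
    parser.feed(SAMPLE)
    print(parser.quotes)
    print(parser.cites)
```

A blockquote without a cite attribute still gets collected, just with None as its source, which is precisely the kind of gap that makes it harder for a machine to trace a claim back to where it came from.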

