

Internet Honesty

This entry was prompted by <a href=http://joesaward.wordpress.com/2009/...om-the-thieves>Joe Saward's blog</a>, which carried an entry on some rather blatant plagiarism involving <a href=http://formula-1.updatesport.com/new...site/view.html>Update-F1</a> and <a href=http://www.f1-daily.com/news/article...site/view.html>F1-Daily</a>. The fact that F1-Daily has a story bearing the headline:

 

F1-Daily is rogue website
Not related to F1-Daily

 

perhaps suggests which party is the guilty one in this instance. F1-Daily also went down during the typing of this entry...

 

<a href=http://en.wikipedia.org/wiki/Web_scraping>Web scraping technology</a>, which has been partially prohibited in Australia since 2003 under <a href=http://en.wikipedia.org/wiki/Spam_Act_2003>The Spam Act</a> but has ambiguous legality elsewhere, appears to be the cause. Surely such behaviour breaches copyright if nothing else, considering that graphics and verbatim news items were copied and republished with only the briefest (and most unintentionally amusing) of edits.

 

Theft is also implied in the act. Not only is that the root of the anti-web-scraping laws, but Update-F1 had paid for material from GMM, which was scraped and published without permission. Since the information was GMM's and it was licensing it out to Update-F1, F1-Daily was committing information property theft when it scraped that part of Update-F1's site...

 

...or was it? You see, GMM, for all that it purports to produce "between 10 and 20 original, highly researched and professionally compiled Formula 1 news articles for publication every day", doesn't own much content of its own. Rather, it looks through a quantity of journalistic output relating to F1, makes edits at most and then dumps it into an information stream. It doesn't apply the "two sources" rule that, for example, the BBC generally does. It's not clear how GMM acquires permission to re-publish such stories this way, but even if it did so by the expensive-but-legal method of agreeing article distribution rights, the theft would be not against GMM but against its source publications (except, of course, for the aforementioned edits). Sometimes the edits might be enough for the result to be considered distinctive content and therefore GMM's own material, but that simply raises GMM to the level of a blogger.

 

As far as I can see, the main problem with GMM isn't the sourcing methodology, though I might question its legality (depending on how GMM came by that information in the first place). It is that GMM is not entirely honest about the nature of its output (this may be an understatement). If it were honest, fewer people would purchase its output. Those who did would not only be completely aware of what they were getting and make that clear to readers, but could also better hold GMM to account. For one thing, I'd like anyone acting as a professional information filter (i.e. taking other people's money for the privilege) to have at least some basic information literacy so that they could do their job properly. Simply dumping stories onto a feed and relying on feed recipients to do the hard work of filtering is not only amateurish, but fairly simple to replicate for free with Web 2.0 technologies such as Yahoo! Pipes.

It shouldn't be complicated. Everyone knows (or should know) that the journalists on the scene are necessary to understanding what's going on in F1. Logic suggests that they are the ones most likely to know the truth (or something close to the truth, where stories are at the guesstimate stage) and therefore the most authoritative sources. Sometimes other sources can come up with creative takes on a situation that shed more light on it - but they shouldn't be taken as gospel. For that matter, stories that sound completely ridiculous generally warrant further investigation before being believed.

Different circumstances affect the story. If you're in the paddock, you will see different things from someone who is at the race but watching from the stands. In turn, someone watching from the stands will have a different perspective from someone watching at home. Indeed, the country "home" is in and (in some cases) the availability of broadband access or quality paper journalism can significantly affect what someone understands about a situation, for each country has a different combination of people analysing the typical race.

 

Furthermore, each of us has a particular talent for looking at different parts of the sport and for seeing it in different ways. When we write accordingly, our work improves and we help spread understanding and strength between one another. When we feign an expertise that belongs to another, we confuse ourselves and reduce the quality of everyone else's experience.

 

So let's acknowledge who and what we are. Let us try to fulfil the role(s) we claim to have to the best of our abilities, let others fill the roles we cannot, and act with due respect to one another for helping build the F1 community. Some of us fill several roles - in fact most of us do, once we note that reading, commenting and posting replies can also be roles. In no particular order:

 

Journalists are journalists.

Bloggers are bloggers.

Podcasters are podcasters.

Forumites are forumites.

Commenters are commenters.

Media filters are media filters.

Thieves are thieves.

 

It's when we pretend to be what we're not that the troubles begin...
