A little more about Wikipedia. - Chronicles of a Hereditary Geek

A little more about Wikipedia. [Dec. 21st, 2005|09:50 pm]
Darth Paradox
[mood |contemplative]
[music |The Daily Show]

So I've engaged in discussions with proponents and detractors of Wikipedia. I've arrived at some conclusions (by which I mean "opinions").

* The vast majority of the time, Wikipedia is a wonderful repository for an unimaginably huge quantity of human knowledge. Whatever else it does wrong, it's still better than anything that's come before - or practically anything else around today - probably by a few orders of magnitude (on whatever scale you care to measure its overall usefulness). It has several strengths - mainly, the use of collective knowledge over the singular knowledge (and commensurate opinions) of any one domain expert provides, in general, better coverage of the range of topics, better coverage of a given topic, and a dampening effect on opinions or unintentional bias, at least in topics not generally considered "controversial". Also, the fact that changes don't require approval by an editor greatly increases the volume of topics and edits that Wikipedia can handle.

* That said, there are places where it falters. There is always a small probability that the information you get will be incorrect, either because someone with insufficient expertise wrote it (or someone with sufficient expertise made a mistake), or because it's been intentionally (but not obviously) defaced. While such errors will generally be caught relatively quickly for prominent topics, for "second-tier" topics there is always the possibility of inaccuracy. This is shown by both the fake bio story (in which intentionally incorrect, and in fact rather libelous, information was left on the Wikipedia biography of a journalist for four months), and Eric Burns' Fort Kent experiment (search for "Fort Kent" - he inserted erroneous information into the entry of a somewhat small but not un-notable town in Maine, and it was not corrected after two weeks). How big of a problem this is tends to vary with the type of information you're seeking, but as it stands Wikipedia is a long, long way from being able to be considered a reliable, credible source of information in its own right. That said, I still use it for my day-to-day informational needs, but it is entirely insufficient should I ever need information for an "official" purpose. Perhaps the Wikipedia community is okay with this; I see it as a shortcoming.

* There is a problem of inclusion. Well, perhaps "problem" is too strong, but there are very definite subdomains of knowledge that people seem to think are not of broad enough importance to warrant inclusion in Wikipedia. (Yes, I'm talking about webcomics, but it applies to a lot of things - do we really need an entire page in Wikipedia for each of the nearly 400 types of Pokemon? The individual Pokemon pages themselves account for a little less than one in 2200 pages in Wikipedia, and the search results for Pokemon contain over 2100 pages - i.e. a little less than 1 in 400 pages in all of Wikipedia were deemed to have some relevance to the concept of "Pokemon". No, I'm not bitter about webcomics. Really.) And supposing we do end up with a set of subdomain-specific wikis, there's still the issue of coordinating between them so that they can represent one comprehensive body of knowledge. (Webcomics already has its subdomain wiki: Comixpedia. Incidentally, Scatterplot doesn't have an entry there, and I refuse to write it myself.) Anyway, if there are to be inclusion rules, they need to be consistently applied across all domains, and coordination with other domain-specific wikis needs to be made more effective, or else Wikipedia is going to appear biased towards focus on certain domains and ignorance of others.

Anyway. At some point, if I have the headspace (a term that I may or may not have coined, referring to capacity to think about or concentrate on something in between all the other crap that a busy person needs to deal with (maybe I did coin that use of the term, anyway - Wikipedia recognizes it only as a firearms term)), I intend to write out a model for editing processes for a collectively editable body of knowledge that I hope will address some of Wikipedia's weaknesses while preserving its strengths. I look forward to some good, quality discussion.

But now, I've already spent an hour on this, when I should have been working on Scatterplot.

From: darthparadox
2005-12-22 07:55 am (UTC)
Yep. For prominent, non-controversial topics, Wikipedia is doing pretty well indeed.

From: eard_stapa
2005-12-22 08:00 am (UTC)
I just thought it was interesting because someone had posted that in a community I'm in about 4 days ago...

Wales said the accuracy of his project varies by topic, with strong suits including pop culture and contemporary technology. That's because Wikipedia's stable of dedicated volunteers tend to have more collective expertise in such areas, he said.

The site tends to lag when it comes to topics touching on the humanities, such as the winner of the Nobel Prize for literature for a particular year, Wales said.

So basically Wikipedia users are geeks? News to me. ;) Although there *is* an Old English version of the site...

From: zephulos
2005-12-22 09:25 am (UTC)
I think I used the term "headspace", or its cousin "thinkspace" some time during middle school to explain why I had a hard time concentrating on something when people were hovering around me. It essentially refers to the no-fly zone around my head into which I felt like I was projecting my thoughts.
I like your definition better.
From: ex_miang438
2005-12-22 09:46 am (UTC)
Two things.

First, on Wikipedia and its derivatives: We've gone around on this, and given how much we do agree on I think I can finally distill my platform to one specific request. You said: "[Wikipedia] has several strengths - mainly, the use of collective knowledge over the singular knowledge (and commensurate opinions) of any one domain expert provides, in general, better coverage...[etc.]" I've seen this argument pretty much everywhere, so I'm certainly not trying to take *you* to task for it, but just once I'd like to see it defended. So please: this supposed chief strength, that collective knowledge > individual knowledge, someone show me that a) that actually happens all or almost all of the time and b) that it *works* all or almost all of the time. It's always trotted out as an axiom, an assumption, and from what I know about shared information bias, I don't buy it.

(If you don't want to do the work on that, for which I certainly couldn't blame you, feel free to link me elsewhere. The news item comparing it to Britannica doesn't count, though; for one, that was also a group of people, and for two, a reputable source being bad doesn't make a slightly worse source good.)

Second, on "headspace": I believe you're the only person I've seen to use the term like that, but it has certainly been coined with alternate usages before (my personal favorite: a Suikoden comic called "Luc in Headspace," which I would link to but the site is unexpectedly down). Generally I've heard it used by fen referring to the process of being lost in their own thoughts, only they get interrupted by characters. Or muses. Or something. I get the "lost in one's own thoughts" part, anyway.

From: mcmartin
2005-12-22 10:31 am (UTC)
A simple experiment would be:

- Finding an article or set of articles on a topic you know.
- Determining its general accuracy, following up on stuff you don't know to check its accuracy. Ideally, this would be you (as an expert) learning something new (that is both true and thus not what you would put in).
- Checking its revision history to see if these facts were written in by multiple people.

In any case, it shouldn't be an axiom. I assume the intuitive argument would go:

- Only experts make substantive changes to topics requiring expertise.
- A bunch of experts will have fewer gaps in their knowledge than just one.

with perhaps a side of:

- Shared information bias will afflict any encyclopedic effort, so bringing it up as a criticism is unfair.

Regarding headspace, I've heard it only as "getting in a character's headspace", which is a precondition to being able to reliably write that character's dialogue or determine his/her actions in a given scene.

From: partiallyclips
2005-12-22 08:02 pm (UTC)

{Long, flamey post self-deleted. One-sentence summary:}

Before you form an opinion on something, and especially before you air the opinion in public, you should directly examine the available sources of information on the subject for yourself.

From: darthparadox
2005-12-22 08:36 pm (UTC)
miang specifically asked for the collective information argument to be defended. And she was right - it had been for the most part just kind of assumed, and with her background in psychology I don't doubt her assertion that this sort of ideal information-sharing is far from the truth of the matter.

And this is LJ, not academia. Our discussions shouldn't require sources and citations, here - though ideally, they'd be provided or found when requested, as she did in that case. I doubt most of us have the time to really research the claims we're making to the satisfaction of others...
From: ex_miang438
2005-12-22 11:35 pm (UTC)
I appreciate the way you put this, and had I the time (or frankly, the continued interest), I might consider doing just that. For the few entries I consider myself expert in and have taken the time to look up thus far, I have been disappointed roughly equally between information that is missing (a proof of the Singular Value Decomposition would have been nice about four weeks ago) and information that is incorrect (see "competitiveness" under "the study of competition" on the competition page). I have not gone so far as to check the revision history to see who's responsible for each segment, but that would certainly be one way of getting at the question.

My comment really was just curiosity if anyone knew, as opposed to assumed, whether the collective knowledge principle actually worked. I agree that it shouldn't be axiomatic, though, even if I'm not wedded to the criticism enough to spend time trying to disprove the idea.

The only statement with which I might take issue is this one: "Shared information bias will afflict any encyclopedic effort, so bringing it up as a criticism is unfair." I disagree mainly *because* the shared information aspect is brought out as one of the chief advantages of a collective information source like Wikipedia over traditional expert-driven references. It would be incorrect to say Wikipedia is the only source that would suffer from biased information sampling, and I apologize if I ever implied that -- but by the same token, I believe it is a fair criticism when this very "shared" component is heralded as an asset that other encyclopedic references don't possess.

From: darthparadox
2005-12-22 08:29 pm (UTC)
My argument for collective knowledge as a strength of Wikipedia is pretty similar to McMartin's. Assume for the moment that we've got topic T, and a set of T-experts E1, E2, and so forth. Assume further (for the sake of the argument; obviously these numbers have no real-world meaning or source) that, of the information that "ought to" appear in an article on topic T, your average expert knows about 80% of that information correctly, doesn't know about 15% of it at all, and knows 5% of it erroneously - that is, the substance of an article on T by a single expert is likely to differ from the substance of the "ideal article" by 20%, and of that 20%, three quarters of it is omission and the last quarter is error.

So E1 writes the first draft of an article on topic T, and it contains, omits, or gets wrong information in the proportions indicated above. Now E2 edits the article. Assume (yes, a lot of assumptions, I know - this is why it's an intuitive argument, not an actual proof) that the distribution of E2's knowledge about T is independent from E1's - i.e. for each of the portions of E1's knowledge about T - accurate, omitted, and erroneous - the distribution of E2's actual knowledge is also 80/15/5. A large portion of the missing information is filled in, in that same proportion - of that 15%, we now have 80%*15% = 12% filled in correctly, and 5%*15% = 0.75% filled in erroneously, leaving 2.25% missing. Of the previously erroneous information, comprising 5% of the total information, 80% of it is corrected by E2, since he after all knows better. The remainder is left erroneous - either because E2 doesn't know it, or because he also knows it erroneously. So, of the 5% erroneous information before, that leaves us with 4% correct and 1% wrong.

Then, of the 80% information that used to be correct... it's naive to assume that it would all remain correct (I know a lot of this is probably naive, but still). So suppose that of the 5% of that 80% that E1 has right and E2 has wrong, E2 is convinced enough of his own correctness to edit the article in half the cases, and recognizes his error in the other half. That leaves us with 2% wrong and 78% right in that segment.

Add up the percentages, and after two expert editors, topic T is now represented with 78+4+12 = 94% correct information, 2.25% missing information, and 0.75+1+2 = 3.75% erroneous information - a substantial improvement over the previous article. Now perhaps E1 comes back and reverts some of the changes. I highly doubt E1 would remove any of the information that was formerly missing, but E1 might fix some of the errors introduced by E2 into the previously correct portion written by E1, and might revert some of the corrected portions by E2 back to the erroneous version written by E1. In the most extreme reversion case, reverting all changes to his previously existing (correct or incorrect) version, we end up with 92% correct, 5.75% incorrect, and 2.25% missing. In the most erroneous case (E1 reverts back to all of his errors but none of his correct data), we end up with 90% correct, 7.75% incorrect, and 2.25% missing - still, in my opinion, better off than E1's article alone, and the errors are likely to be reverted back by E2 or another expert at some point.
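
For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. It assumes the same made-up 80/15/5 split for every expert and the same "E2 overwrites half of what he wrongly disagrees with" rule; the function and variable names are labels invented for this illustration, and the numbers carry no more real-world weight than they did in the prose.

```python
from fractions import Fraction as F

# Assumed knowledge profile of any one expert (the 80/15/5 split above).
KNOWS, LACKS, ERRS = F(80, 100), F(15, 100), F(5, 100)
# Assumed share of "E1 right, E2 wrong" items that E2 overwrites anyway.
OVERWRITE = F(1, 2)


def after_second_editor():
    """(correct, missing, wrong) for each segment of E1's draft after E2 edits."""
    # Segment E1 omitted: E2 fills in what he knows, rightly or wrongly.
    gap = (LACKS * KNOWS, LACKS * LACKS, LACKS * ERRS)
    # Segment E1 got wrong: E2 fixes only the part he knows correctly.
    bad = (ERRS * KNOWS, F(0), ERRS * (LACKS + ERRS))
    # Segment E1 got right: E2 breaks half of the items he believes wrongly.
    broken = KNOWS * ERRS * OVERWRITE
    good = (KNOWS - broken, F(0), broken)
    return {"gap": gap, "bad": bad, "good": good}


def revert(segments, restore_good, restore_bad):
    """E1 returns and restores his own text in the chosen segments."""
    out = dict(segments)
    if restore_good:   # E1 undoes the errors E2 introduced
        out["good"] = (KNOWS, F(0), F(0))
    if restore_bad:    # E1 undoes E2's corrections
        out["bad"] = (F(0), F(0), ERRS)
    return out


def total(segments):
    """Sum the three segments into overall (correct, missing, wrong)."""
    return tuple(sum(col) for col in zip(*segments.values()))


def describe(segments):
    c, m, w = (float(x * 100) for x in total(segments))
    return f"{c:g}% correct, {m:g}% missing, {w:g}% wrong"


edited = after_second_editor()
print(describe(edited))                       # 94% correct, 2.25% missing, 3.75% wrong
print(describe(revert(edited, True, True)))   # 92% correct, 2.25% missing, 5.75% wrong
print(describe(revert(edited, False, True)))  # 90% correct, 2.25% missing, 7.75% wrong
```

Exact fractions are used so the totals always sum to exactly 100%. Changing the three constants at the top shows how sensitive the conclusion is to the assumed split.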

Now. I have nigh-completely ignored psychology and group dynamics. Your turn - where does this theory fall down in light of the fact that people don't always act rationally?
From: ex_miang438
2005-12-22 11:20 pm (UTC)
*grins* That was a neat thought experiment, and I thank you for it. My only real counter, I suppose, is that it only works when you've got a number of true experts in T, as opposed to self-described experts in T, editing the entry. Given the above-average effect, the number of people who are going to consider themselves expert enough to edit the entry is almost certainly greater than the number of people who are qualified (by your standard - 80/15/5 which I think is fine) to be editing it. This is complicated further by the fact that the true experts won't necessarily know about Wikipedia, and if they do, they probably won't have the time to edit everything (or most things, or possibly anything?) in which they are expert -- I myself am guilty of that one, and I refer to Tycho's point about not having the time to babysit the Internet. (I get few enough publications out in a timely manner as it is, and those are for people who really *need* to be reading my work!)

Oh, what I wouldn't give to have my nice summary prelims cards back. Per your request, though: Tversky and Kahneman each built fantastic careers on the ways in which rational choice is the exception rather than the rule in humans, so I won't go over all their stuff. Larson et al (1994) try to impose a probabilistic sampling model on the likelihood of any given piece of information coming to light based on how many people are involved (in this case, it would be potential editors of the entry) and how many of them know the information; the short story is that probability is very low even as the number of people increases. Postmes et al (2001) advanced on that idea with a biased sampling model, giving two reasons for privately held information being unlikely to come to light -- one is probabilistic (low probability if only a few people know the info) and the other is motivational (how hard do you want to work to change an entire body of contextualized information to include your one new little nugget and get people to accept it?) In both cases, experimentation suggested that the probability of good information coming to light was not high if the majority did not enter the discussion already knowing the information.

...not that I'd even consider *myself* expert in group dynamics, mind you. Just...clearly more expert than anyone who's bothered to create or edit the relevant Wikipedia pages. ^_-

From: darthparadox
2005-12-23 12:36 am (UTC)
Maybe you should go edit the relevant Wikipedia pages, then! :D

Yeah, yeah, I know. No time.

Anyway, I was explicitly considering the set of experts who were aware of, and willing to edit, Wikipedia. A smaller set to be sure, but the basic idea still applies.

I understand your point about non-experts who believe they are experts. My thought experiment was mainly intended to prove the basic idea of "two (expert) heads are better than one", as a description of Wikipedia's strength of collective knowledge. The flip side, of course, is the injection of non-experts into such an endeavour. In the best case this will have a negligible effect on the article, because the difference between actual knowledge and the non-expert's knowledge is likely to consist primarily of omissions, as well as superficial and obvious errors. In the worst case, of course, you get someone who's very wrong, but convinced beyond all argument that he's right about everything, continually editing the pages. That's when you start really needing to babysit the page, instead of just making the relevant edits, if you care about it being correct. This falls into the "lack of expertise" weakness of Wikipedia, and is the inverse of the "collaborative knowledge" strength.