Before making any serious attempt at the questions Who was Socrates? and, further, What is Philosophy?, I think it necessary to share a few thoughts I’ve recently had about how we share information online. I think it prudent to consider the aims of writing for online consumption, and also the advantages and disadvantages involved in doing so. Hopefully, through a careful analysis of these subjects, we may learn the best manner in which to share writing online.
history of the term ‘blog’
I must first make a short digression into my own investigations about the term ‘blog’. ‘Blog’ is a shortened form of the more descriptive term ‘Weblog’, first coined by Jorn Barger [1] (editor of the once famous weblog Robot Wisdom) [2] in order to describe the process of “logging the web”, and in the fashion of Robot Wisdom it is a useful term. Robot Wisdom had as its mission to scour the internet for interesting things and provide links to them. This is a very valuable thing to us; I am currently a very enthusiastic member of one community blog, Metafilter [3], which serves the purpose of logging the best of the internet very well.
Over time, however, it seems that we now regard many or most online collections of writing as “blogs”, or “logs of the web”, and I think that in many instances this word choice is not supported by the original definition of “weblog”. A log does not have as its goal to present coherent, trusted information; the goal of a log is simply to record without commentary. I would argue that the common usage of the word “blog” implies something different than its original usage. I could cite many blogs that actually log very little, or even nothing- and even more blogs have as their mission to simply spread commentary and opinion without regard to actually logging anything.
problems with the term blog
If you would tolerate my personal laundry in this argument, I must admit that I cannot stand the aural properties of the word ‘blog’. It sounds like something that happens in the private confines of one’s own bathroom, and should probably be kept there, but that is only my aesthetic. I would, however, propose the idea that perhaps many self-proclaimed ‘blogs’ that are online are exactly that: some personal stuff that may be best kept private- a blog where one discusses a recent break-up, a blog about how someone is currently feeling, a blog about the parties one has recently gone to, etc. While I think these subjects are interesting in themselves, and perhaps deserve the attention of more private communities, I think it would be safe to say that many people use blogs as a sort of online diary- a place to record one’s thoughts and opinions. It is rare that a blog purports as its mission to deliver factual information, and, if a blog does purport to do so, I would examine its aims and methods closely before believing as fact anything written there.
And so we have this popular understanding of the word ‘blog’, and this understanding connotes a lot of things. It connotes a sense of being a diary, or perhaps a list. It connotes an emphasis on opinion rather than fact, an emphasis that tends toward lazy speculation rather than rigorous investigation.
What is unfortunate is the fact that almost all writing online, with a few exceptions (Wikipedia among them), is regarded as a mere blog- a diary, a nice contribution to culture and so on, but not very academically important. There are, however, ‘blogs’ and online collections of writing that do strive to adhere to the strict citation standards of traditional academia, and these places on the internet serve as valuable starting points of investigation. The unfortunate detail is that these sites are dumped into the category of ‘blogs’ along with joe internet’s blog about how great his night out at VIBE was or something, and we have no terminology to, in a sense, call a spade a spade, a diamond a diamond. We have no language to differentiate between what are actually viable works of thought online and what are just passing thoughts, tweets, and diaries.
solutions for the term ‘blog’
One possibility of action is to create new language to differentiate the two, a new term for a website or blog that purports that all of its contents are verifiable, and also worthy of our attention. I would not want to take on the task of creating new language, though, for the simple reason that if there were new language to differentiate between joe internet’s blog and joe interesting’s investigations, the new terminology would rapidly be misused, and end up confusing everyone even further. It seems like we’re kind of stuck with the word ‘blog’, and perhaps I just have to get over my dislike of the aural properties of the word.
It could be that the question of finding a way to differentiate between different kinds of writing online is moot, and that the only way we may verify that a piece of writing is of actual interest is that it is edited and displayed in a peer reviewed environment. But such an argument brings up questions about accessibility to information, when we consider the prohibitive costs involved in subscribing to and interacting with such journals. For anyone not connected to an institution of higher learning, access to this level of rigor is impossible online.
Though certain attempts are being made to offer a more open model [4], the problem still remains that there is a lot of really good writing online, and none of it ever went through a peer review process, nor did the author of said writing ever intend to validate his or her ideas in a professional peer community. Maybe the author believes the work can stand on its own. The question becomes: what is an author who chooses the web as his or her medium to do, in order to guarantee to his or her dear reader that they have made all diligent efforts to verify the information they intend to share?
It is my opinion that we cannot solve the problem of the connotations that come with the word ‘blog’ at this moment, and we cannot easily rely on peer review communities to verify the quality of a piece of writing. It seems that the only recourse one has is, simply, to write and think well, and to hope that the quality and depth of a piece of writing is enough to place it in a valuable realm.
One fact, however, is that sometimes a blog wants to offer non-serious, perhaps more playful or reflective content. In these cases we see seriousness right alongside playfulness and irony, and it may be hard to distinguish between the two. For the purposes of this blog, I will file each post into one of two categories: investigation or reflection. Any writing that falls within the category of investigation will be clear, cite its sources, and be generally much more serious. Posts that are in the reflection category will have a more playful tone, and may not offer the clarity that traditional academic writing requires of us.
Having considered the effects of the term ‘blog’ on the act of writing online, I would like to now consider the problem of citation of things on the web, in terms of its nature and severity, and consider existing and possible solutions to the problem.
the problem of citation
The most severe problem confronting a writer of online content is how to cite sources when writing. Citation is an important method in presentation, both as a way of giving validity to a thought by referencing one that inspired it and, just as importantly, as a way to guide readers to where they can find more information about a subject. I would argue that citation is an indispensable part of writing, for the fact is that we are all standing on the shoulders of giants [5] whenever we have a novel idea worth sharing.
The MLA [6] has now for some time provided a standard for doing so, and that standard has worked rather well as it “has been widely adopted by schools, academic departments, and instructors for over half a century” [7]. I would argue, however, that current MLA standards fail when dealing with the web. The MLA standards do not recognize the dynamic nature of websites and their cycles of birth, change, and eventual death; and, in failing to do so, these standards present us with some troubling problems. [8]
Take as an example the latest (7th) edition of the MLA’s approach to the citation of websites. If one were to reference an article online, a good student of philosophy, or the humanities in general, would be correct in doing so like this:
The Socratic Problem. Thomas Saunders, Oct. 2009. Web. Nov. 2009. http://philosophy.modern-carpentry.com/2009/10/the-socratic-problem/.
A web citation requires (in order) a title, a publisher or sponsor, a date of publication for the article, the medium (“Web”), a date on which the citing author accessed the article, and, finally, a link to the web resource. While this form of citation approaches the certainty we require, we are left wanting in an important way. There is no guarantee that possession of the required data will provide us with the same certainty that a normal MLA citation would.
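To make the required fields concrete, here is a minimal Python sketch that assembles a citation in the order described above. The class name WebCitation, its field names, and the render method are my own illustrative assumptions, not anything defined by the MLA; the output simply mirrors the example citation given earlier, including its literal “Web.” marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebCitation:
    """Hypothetical container for the fields a web citation carries."""
    title: str
    publisher: str
    published: str  # publication date, e.g. "Oct. 2009"
    accessed: str   # date on which the citing author viewed the page
    url: str

    def render(self) -> str:
        # Order: title, publisher, publication date, medium, access date, link.
        return (f"{self.title}. {self.publisher}, {self.published}. "
                f"Web. {self.accessed}. {self.url}.")


citation = WebCitation(
    title="The Socratic Problem",
    publisher="Thomas Saunders",
    published="Oct. 2009",
    accessed="Nov. 2009",
    url="http://philosophy.modern-carpentry.com/2009/10/the-socratic-problem/",
)
print(citation.render())
```

Notice that none of these fields records what the page actually said on the day it was read; the citation points at a location, not at a fixed text.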
the web changes in a way that old models of publishing didn’t
A reference to a book is a reference to a title, date, author, and more importantly, a publisher. We can take solace in the fact that by referencing both publishers and authors we can be sure that at least one of these two parties (or a library somewhere) probably kept some copies of a certain edition of a certain book around, and, most hopefully, we’ll be able to track one of those down. A reference to a book or written article, then, carries a kind of certainty to it, in that we know we can track a citation down and confirm it in its original form. But sometimes we cannot, and this problem becomes a challenge to bibliophiles and others worldwide. If we cannot get our hands on a book discovered by citation that we need to read, we are probably trained in the humanities, or are most probably beyond the fringes of research into a topic, and possibly verging towards a personal love or obsession with it. In these cases, though, we are dealing with the failures of those who have gone before us to preserve data, not the failure of the MLA style guideline.
On the other hand, a cited web link brings with it no such certainty. The austere fact is that websites and their authors, just like publishers and authors, go out of existence. The first problem is that where the great libraries of our time probably keep copies of expired publishers and authors’ work, there are no institutions capable of preserving the web in a meaningful way. While honorable institutions such as the Internet Archive [9] have as their purpose to do this task, I am pessimistic as to their ability to do so in a thorough way.
Websites, unlike books, can change every day, or, indeed, almost every moment. A simple act of saving something online changes forever how it will be viewed by those who encounter it, and so, in less than a moment, a web author can change what is recorded of them with the ease that would never have been available to article and book writers who relied on the publishing industry to share their ideas. In a sense, a website could say any of many, many, things at one moment of time and quite another thing at another time, and we have no way to predict how these changes may occur. In short, the MLA method of website citation offers us no guarantee that what is presented to us in a link is the same as what we are confronted with when we click it.
While we do have resources such as the Internet Archive, or Google (TM) cache services, it must be understood that these resources are inadequate representations of what the internet consists of at a certain point in time, and that a simple reference to URL and time of access does not guarantee a person that they are viewing what an original author was viewing. While many sites are archived monthly or daily, there are no sites for which the Internet Archive or Google can properly record the minute by minute changes. Such a task is near impossible- to accurately and centrally record what every website looked like at a certain moment would require resources far beyond those of the Internet Archive and possibly beyond current human data storage efforts (I hope to be proved wrong on this!).
It may be said that perhaps Google will soon probably be able to actually cache and record the breadth of the ever changing web. If they were able to do this, I think people would have volumes to learn by studying how the internet changes, in minute ways, over time. In short- it would be incomprehensibly awesome. But I am still hesitant towards leaving the responsibility for such a task, that of being the 21st century library, up to a private corporation. I would hope that if something were ever done it would be a public institution, open to information seekers and free from speculation about monetary motivations in the preservation of the history of the web.
Looking at the Internet Archive a bit closer, I’ll cite a few examples of what they have recorded and how their version of a website at a point in time is not useful to us. The first case is that of an organization I’ve been involved with for a few years, Twin Cities Open Circuit [10], which began in 2007. Here are the results, which, actually, look to be a better representation than I thought the Internet Archive would have:
The largest problem starts when one looks at one of the versions. You can see two of them here:
The first edition can hardly be said to represent the original version, as it existed in February of 2008, and the July version doesn’t do much better:
If you are thinking that Twin Cities Open Circuit don’t have very good web developers on staff, you are in the wrong; the websites linked above looked, at that time, very much like the site does today. What explains the discrepancy here I am unsure of, but I would venture to guess that the folks over at the Internet Archive chose to save only semantic and textual data, in order to save space used for a small and un-notable website such as ours. I would guess they chose to leave styling and other information out.
This ends up being a pragmatic question: with only so much space and so many websites, what is really necessary to archive? I would argue that it is necessary to save everything, and to present it in the manner that the normal user experienced the site, or otherwise not save the website at all. Layout and design of a website are often just as critical to clearly presenting information as the information itself. By saving only the textual data we lose access to the textual data itself, in that we can no longer access it and experience it in the way it was intended to be experienced.
Going further, we might examine the Internet Archive’s treatment of the much more popular website Metafilter, and how this may be lacking. We need only one image to do this, in that it reveals how incomplete such an archive is, as it does not provide even a consistent, much less a moment-by-moment, edition of what Metafilter was at any point in time:
Though we can see that the archives for Metafilter are much more consistent, and almost regular, it would be hard to prove that the archives were complete. I can say with certainty that on many days which fall between the updates listed on the archive, changes occurred to Metafilter that are not reflected in this archive. In this case we are presented with gaps between updates of a site, and while this seems like something we could pass off as unimportant, I think that it cannot be passed over. It seems as if the limit of the ability of the Internet Archive to archive the content of the internet here meets up with the exponential content of a fruitful website, and this presents another problem, which I will not elaborate on.
In addition to concerns about the completeness of existing or possible web archives, we may be troubled further by the fact that malicious forces often work on the web to alter, mangle, or destroy various portions of it. I would assume that at this moment many pretty smart individuals are trying their best to earn the spammer’s dime and make some grub off of companies and individuals who are too lazy or ignorant to write secure applications or enforce standards. Because of this, there is a lot of odd and creative noise online- faked comments and odd emails add a very random depth to the internet that kind of amazes me. It adds a level of randomness to what is online that seems to infect the real things that happen online. Going further I would also assume that there are at least a few and probably many really smart individuals of a certain quality who are at this moment focused on breaking things and messing up systems, and that they are probably pretty good at it. I would hope that most of these people are doing so in a quest to understand systems and their vulnerabilities, so as to improve those systems. I would, however, guess that there are some individuals who may take their power to add, edit, or delete anything from the internet, their power to completely destroy an anonymous user’s computer, and their power to break systems as something to be exercised and improved. A power where, with a simple typing of a command and the satisfying press of the enter key, millions of computers, or, indeed, the course of history might be altered- this is not a good power to have.
In order to make penance for my previous paragraph (in which I overused some metaphors and tried to scare the ever-livin s*** out of anyone about the internet, and in general didn’t say anything I’ve carefully proved) I would like to offer to you what happens when a script kiddie [11] ups his skills and messes with Twitter and MSNBC:
Going further, however, we realize that what is most important to consider in these matters is that our technologies so quickly now become obsolete. The current state of data storage, retrieval, and citation matters little when viewed through a longer lens. In 30 years, will we be using HTML? Will it be supported on browsers of the time, will there even be browsers? How do we know that what is currently considered a valid source (an MLA HTML weblink) will, in the same way that a reference to a book survives 30 years, be rendered to the end user in the same way that we are confronted with a 30 year old book?
Stepping back from the problem of time, and Google and the Internet Archive, I must finally say that our current method of referencing websites in general is broken. An MLA version 7 styled link to a website is useless both to the person who wishes to cite the web and to someone who, perhaps 30 years later, wants to go back and investigate what was linked to. An MLA citation to the web offers no guarantee that a linked source will:
- still exist
- still exist as it was when the person who cited it saw it
- exist in a form that technology of the age can interpret
And because of these problems, we might begin to consider what we can do to provide more accurate citations of websites and content, and to, perhaps, consider this an important aim in our pursuit of clearness of source online.
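The first two failure modes in the list above can be made concrete with a small check. This is only a sketch under my own assumptions: it supposes the citing author recorded a SHA-256 fingerprint of the page bytes at the time of citation, and the function names fingerprint and check_citation are hypothetical. Page contents are passed in as bytes so that the example does not depend on any network access.

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the page bytes seen at citation time."""
    return hashlib.sha256(content).hexdigest()


def check_citation(recorded: str, current: Optional[bytes]) -> str:
    """Compare today's page bytes against the fingerprint stored with the citation."""
    if current is None:
        return "missing"    # the page no longer exists
    if fingerprint(current) != recorded:
        return "changed"    # the page exists, but not as the citer saw it
    return "unchanged"      # byte-for-byte the same as when it was cited


seen = b"<html><body>Original text</body></html>"
recorded = fingerprint(seen)
print(check_citation(recorded, seen))
print(check_citation(recorded, b"<html><body>Edited text</body></html>"))
print(check_citation(recorded, None))
```

A fingerprint like this can only tell us that something changed. It cannot give the original back, and it says nothing about the third failure mode, whether the technology of a later age can still interpret the page.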
conclusions
At the bottom of the issue is that we cannot rely upon third parties such as web-hosts of original content, the Internet Archive, or Google Cache, to record and save the web for us to responsibly cite later. We have no reason to believe that, when provided with a link, an author, and a date that we will be able to track down the actual web experience that originally inspired the citation’s author to cite it. Because of this fact we have a breakdown, or perhaps a bug, in how MLA standards work for any writing that references an online source.
It could be said that web links as sources have never been taken seriously in academic writing, and I would have to agree with that. To those who may say web links never will be taken as serious sources, I would urge you to look around the internet for a bit and see what the children are doing these days, and how. Though I mourn to say this, there will be a day, not so far off, where our best students will never step in a library and never be confronted with the challenge of tracking down an old book with all the wit and wisdom of a good detective ( though the good ones will ). At a certain point, for most people, the libraries of our current age will become relics of another age, a tourist destination or interesting architecture; for others, libraries will remain as shrines to past human efforts at knowledge that they always were. At some point, however, everyone will find the information they need with some clicks on some device, and at no point will the hand-holding of an actual book be necessary.
In short- the progress of human thought is moving online. It has broken out of libraries and academic institutions, and it is freely available for anyone who dares to know. Whether you want to know what Plato said in the early 4th century BC, or what your best friend who moved to Portland wore to her friend’s Halloween party, the information is either there, or it is sitting in a library somewhere and will soon be digitized for our use.
Because of this, it is imperative to consider how we are to cite writing that we find online, in a clear way that stands the test of time.
solutions to the problem of citation of online sources
This is a breakdown that must be fixed before we can trust content online. The tricky part involves how we can source the internet at a particular point in time. It gets trickier when we consider how to preserve a place on the internet at a particular point in time well into the future (say, 50 years or more). There are no guarantees that the technologies we currently use to display and view content on the web (browsers, HTML, JavaScript, PHP, Flash, the Unity plugin, etc.) will remain available for easy use.
What is needed is a system of reference and storage, where users can provide a link to a page they want to cite, and be provided with a URL which will archive that page as it was originally seen by the user. I have some ideas [12] as to how a system such as this, which I will for the present call the Open Citation Directory, might work, but there is much to be considered in such an endeavor.
For my current purposes, until such a system is in operation, I will begin to save the web pages I cite myself. When I reference another website, I will actually link to my copy of that site, as it was when I experienced it. I understand that this is an insufficient solution to the problem, but it is the most I can do for the moment.
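As a rough illustration of what saving my own copy might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function name save_snapshot, the file naming scheme, and the metadata fields are hypothetical, and fetching the page is left out entirely (the page bytes are simply passed in).

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(url: str, content: bytes, archive_dir: Path) -> Path:
    """Store the page bytes plus a small metadata record; return the copy's path."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()
    # Name the copy after its content hash so an edited page gets a new file.
    copy_path = archive_dir / f"{digest[:16]}.html"
    copy_path.write_bytes(content)
    metadata = {
        "source_url": url,
        "sha256": digest,
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path = archive_dir / f"{digest[:16]}.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return copy_path


with tempfile.TemporaryDirectory() as tmp:
    page = b"<html><body>Page as I saw it</body></html>"
    saved = save_snapshot("http://example.com/article", page, Path(tmp))
    print(saved.read_bytes() == page)
```

This saves only the HTML that was handed to it. Images, stylesheets, and scripts would have to be captured as well before the copy could claim to preserve the page as it was experienced, which is the same shortcoming I complained about in the Internet Archive’s copies.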
references
1: http://www.wired.com/entertainment/theweb/news/2007/12/blog_anniversary
3:
4: http://en.wikipedia.org/wiki/Open_peer_review
5: On The Shoulders of Giants: A Shandean Postscript, Free Press (1965)
6: Modern Language Association
7: http://en.wikipedia.org/wiki/The_MLA_Style_Manual
8: I would also like to note that the MLA does not publish its standards on its website- you’ll have to either buy their $32 book or actually go to a library. MLA style is, in a sense, a proprietary standard of citation in opposition to an open standard. [13]
12: My own thoughts about a possible Open Citation Directory
One Comment
On the issue of the web and citation, have a look at this discussion going on today with my friend David Eaves and the magazine, The Walrus:
http://eaves.ca/2009/12/14/the-walrus-fair-dealing-the-culture-of-journalism/
http://www.walrusmagazine.com/blogs/2009/12/14/on-fair-dealing-and-the-dark-country/
http://eaves.ca/2009/12/14/some-thoughts-on-the-walrus/