Project Gutenberg: The What, When and Why

I always liked reading Shakespeare, and I always wanted to have a copy of every one of his plays, tragedies and sonnets on my bookshelf ready for consultation, but such things always seemed unrealistic because I had neither the space for them nor the time to find them all nor the money to spend on them when I did find them.
Now I can store the complete works of William Shakespeare directly on my mobile phone, and they take up as little as 1.4 MB compressed...

Origins

Even if you never heard the word ?e-book?[1] before, you can probably guess its meaning: electronic book, or a book in digital format. What you probably don't knoe is that people started copying books into digital format nearly as soon as computers were available to the public, and maybe even before: the first e-book was created in 1971.

That year, a student at the University of Illinois named Michael Hart was given the equivalent of $100,000,000 (or $100,000, or $1,000,000 - there is no official estimation) in computer time. Basically, since he was friends with some of the operators at the Materials Research Lab, he was given an operator account on the Xerox Sigma V mainframe, which later became one of the 15 nodes that developed into the global network that eventually became the Internet. At that time, having that much computer time at your disposal was indeed a great privilege, and Hart felt that he had to use that time for something useful that could in theory generate a profit - not an easy task when you consider that only a limited amount of people in the world had access to a computer, and that those computers weren?t even connected together.

Foreseeing an era where computers where interconnected and regular people had access to them, Michael Hart thought that virtually all texts and books could be made available in digital format, for free, to anyone who wanted to read them. Certainly, such a project seemed quite unrealistic and excessively time consuming at the time; nevertheless, he decided to start copying the first book himself, the Declaration of Independence of the United States, which he was carrying in his backpack.

Project Gutenberg[2] was born with that one single text, and it has grown through the years. Today, there are more than 16,000 e-books available to download and read.

What is Project Gutenberg?

By that name, Michael Hart probably wanted to define the project?s scope and vision: an idea as revolutionary for the diffusion of literature as the invention of moveable type printing[3] in the 1450s.

The mission of the project can be summarized as follows[4]:

"To encourage the creation and distribution of eBooks."

In order to achieve this, Project Gutenberg is set up such that anyone can contribute to it, in many different ways. It is run completely by volunteers, hundreds of people around the world who share the same ideals and believe that literature should be freely available to everyone at virtually no cost.

The Internet serves this purpose magnificently: it is possible to download all of the over 16,000 free e-books from the Project Gutenberg website[5] in different formats and many different languages[6]!

However, having such a large amount of books available within a few clicks can make people forget about how time consuming the process of making one single e-book is: originally, after acquiring a paper copy of the book, Gutenberg?s volunteers had to transcribe it themselves, typing every word from the beginning to the end. Then the book had to be checked for mistakes before it was accepted into the Project.

Producing a single e-book can therefore take many people and many hours from beginning to end, and presumably this was one of the reasons why Project Gutenberg was criticized for being more of an utopian ideal than a tangible reality: every year since its creation people have doubted the project, accusing Hart of pursuing an impossible dream, and prophesying that fewer and fewer people would join the team and that there was no future for Project Gutenberg.

Oddly enough, they were all wrong: not only is the Project still active today, but the number of books released every year has grown consistently over time, from a few dozen in the early days to thousands per year now.

More and more people became involved, partially because they share the same ideals and partially because it has always been easy to get involved[7]: Project Gutenberg strives to remove all the institutional barriers which could potentially interfere with members? motivation; they try not to impose any restrictions, and they don't support perfectionism. It is believed[8] that there shouldn?t be any proper or standard way to release e-books, but instead many different ways, to appeal to many tastes: the Project doesn?t support any particular standard for releasing ebooks, although it normally takes the simplest path. Therefore, the majority of the books are available in Plain Vanilla ASCII, i.e., texts are written using only ASCII characters, and bold, italicized or underlined words are capitalized instead. While this format has the most limitations, it is also the most portable.

At this point, you might wonder why they don't just scan the original books, and make them available as image files or PDF files. While it would be much faster, it also has disadvantages, such as large file size and an inability to be displayed at particular resolutions; a scanned book probably wouldn't be readable on a PDA, mobile phone, or other equally small device.

Nonetheless, scanners do play an important part nowadays in the process of making an e-book: texts are no longer copied manually if a printed edition already exists. Instead, they are scanned with OCR[9] and then proofread twice before being accepted. The (un)official procedure recommends scanning at least one page a day, having it proofread once by someone in charge of doing so (a ?junior? proofread), and then again by a more experienced member. This has undoubtedly sped up the process.

Not All Books Are Equal (for now)

By looking at some of the titles available on Project Gutenberg, you?ll notice that most of them are classics or relatively old works: for example, you won?t find the latest Harry Potter[10] available for download.

Since all of the books at Project Gutenberg are free to download (more details of the license will be given later on), and therefore not subject to fees or copyrights, only books in the public domain[11] can generally be included in the Project.

Public domain includes all those works of art whose intellectual property cannot be legally claimed or exploited by any person, institution or legal entity, and therefore belong to all mankind. In the case of books, copyright can expire only if some particular conditions subsist:

  • The work was created and first published before January 1, 1923, or at least 95 years before January 1 of the current year, whichever is later.
  • The last surviving author died at least 70 years before January 1 of the current year.
  • Neither a perpetual copyright is granted by the Berne Convention nor has a particular government (US or EU) passed a copyright term extension.

Now we can see why there are not very many new publications available in the project, and that?s really frustrating for Michael Hart and other volunteers:

"In the USA, no copyrights will expire from now to 2019!!! It is even much worse in many other countries, where they actually removed 20 years from the public domain. Books that had been legal to publish all of a sudden were not. Friends told me that in Italy, for example, all the great Italian operas that had entered the public domain are no longer there... Same goes for the United Kingdom. Germany increased their copyright term to more than 70 years back in the 1960's. It is a domino effect. Australia is the only country I know of that has officially stated they will not extend the copyright term by 20 years to more than 70."[12]

After all these considerations, we can take a closer look at Gutenberg?s license[13] which comes in two different versions: informative and normative (?legalese?, as they call it), the latter of which is the real document. Luckily, the non-legalese version is simple and complete enough: basically PG releases books which are either in the public domain or ? if copyrighted ? the author gave express permission to re-distribute them. The difference lies in the fact that if you remove PG?s trademark and license from a book which is in the public domain, you can re-distribute it freely on your own, but if the book is copyrighted and permission to distribute was given only to PG, you?ll have to contact the author to obtain permission.

Furthermore, anybody can use the PG trademark when distributing verbatim copies of a book, with no changes (re-formatting is allowed); if you want to charge money for the copies you distribute, you have to pay royalties to PG.

Satellite Sites and Similar Projects

Michael Hart was ? and still is ? an authentic pioneer in his field: he had the idea to create the largest free library on the Internet to ?Break Down the Bars of Ignorance and Illiteracy?. A lot of people thought he wouldn?t achieve anything, but his dedication and perseverance were simply so exemplary that more and more people got involved, a few satellite sites were created and similar projects were started in all over the world sharing the same goals.

Hart is obviously aware of the fact that there are also some sites selling e-books, but he explains that neither those sites nor any other free online library should be considered a competitor to Project Gutenberg: they all contribute to the diffusion of e-books.

One of the most important satellite site of PG is ?Distributed Proofreading?[14] which is now considered the main source of PG books: every month more than 100 books are proofread by hundreds of volunteers who can register on the site for free and then get added to the project. The key concept of this parallel organization is that a single book can be proofread by more than one person at the same time, and thereby speeding up a project which would be otherwise very difficult to coordinate.

Another site which helps the main project is HWG, the HTML Writers Guild[15]. It aims to convert PG?s plain text ebooks into more feature-rich HTML documents: by using a mark-up language it is possible to add footnotes and it can be analyzed easily by automatic tools.

Although Project Gutenberg releases well-known books in many languages, a few sites officially affiliated with the project were created to focus particularly on their regional literature and works. That?s the case for both Australia[16] and Germany[17], for example; they both focus on their own national heritage. Regarding the latter, they recently claimed their own copyright for their e-books, and thus a new foundation is in the process of being created: Project Gutenberg Europe[18] which aims, among other things, to address the myriad copyright issues and laws of the EU.

Last but not least, there?s an interesting discussion[19] about similarities and differences between Project Gutenberg and Wikisource[20] a Wikipedia[21]?s sister project aiming to create a free repository of texts which are either in the public domain or licensed under the GFDL[22].

Wikisource people obviously noticed that their project was quite similar to PG, but with an important difference: their texts were formatted and freely editable by any user who was able to spot a mistake or inaccuracy; PG doesn?t offer this. In this context, Project Gutenberg was sometimes blamed for allowing inaccurate material to be included in the project: this was due to the fact that even if PG uses Distributed Proofreading website to proofread e-books, this is often not comparable to a wiki system. However, in PG's defense, wiki articles, being much more open, are subject to much more vandalism, and therefore must be more closely watched. One can imagine a high school student changing Hamlet to read "To be or not to be, who gives a crap."

However, the members of Project Gutenberg have proposed a sort of mutual cooperation between PG and wikisource: wikisource should maintain a broader scope, focusing not only on literary works but also on quotations and other kind of texts, and at the same time provide some revised edition of some book to Project Gutenberg.

The Future of Project Gutenberg

Project Gutenberg demonstrated the ability to grow considerably during its over 30-year existence. During that same time, copyright laws were extended, and some new technologies tried to intimidate the Project, which seems to remain relatively unchanged. However, last year a long-awaited DVD containing all the Project's e-books was released, showing the world that PG can keep up with the progress of technology to a certain extent.

One aspect that makes PG a successful project even today is its ability to adapt: CD-ROMs and a DVD were released, OCR was almost immediately taken into consideration, and since last year, all e-books have been released in both plain text and HTML format: there are still no fixed standards or rigid guidelines, but common sense seems to prevail over chaos, and for now, the system works.

So far, Michael Hart showed the entire world that a single person can do a lot when pursuing a noble goal. Call him an idealist, call him a dreamer, but he surely created something able to gratify and motivate him and his fellow volunteers forever:

?I can't think of anything more rewarding to do as a career than Project Gutenberg. It is something that will reach more people than any other project in all of history. It is as powerful as The Bomb, but everyone can benefit from it.?[12]

Notes & Further Readings