Ben Foster Brings Important News

Author: Jim Tinsley
Edition: 10
Language: English

by Jim Tinsley
http://ibiblio.org/gutenberg/faq/gutfaq.txt

Acknowledgements

Writing a FAQ for an organization of fanatical proofreaders has its ups and downs! I'd like to thank all those who corrected my facts and my typos, and especially the people who pointed out the lack of clarity in certain answers. The remaining errors and opacity are all mine.

Preface to the archive edition

However, while PG's production expanded geometrically, at Moore's Law rates, there were barriers to participation. Most volunteers had to find an eligible book, scan or type it, and proof the resulting text all by themselves. This was and is a fairly significant amount of work: 40 painstaking hours would be a typical commitment for one book. Beyond that, simply learning the mechanics of producing e-texts could be a serious challenge for newcomers. Nearly all internal PG communication, except for the Newsletter, was by private e-mail, and instructions had to be repeated many times to individual new volunteers, all of whom showed up with great good will, but most of whom vanished after a week or two.

Michael Hart was unstinting in his editing of incoming texts and handling questions by e-mail, but any one person has only so many hours. The Directors of Production at the time (Sue Asscher, Dianne Bean, John Bickers and David Price) served as contact points for advice and help, made enormous efforts of production themselves, and tried to share the scanned texts among new volunteers for proofing. They made a huge contribution to building community in PG.

Pietro Di Miceli set up a web site for the project in 1996, and with the popularization of the Web (as opposed to the Internet), this became a beacon for readers and new volunteers.
In 1999, I wrote, in response to an offer to volunteer:

I think I can best answer your offer, and many others like it, by giving an extended description of what actually happens in the making of PG texts, and why it's often not easy to get started.

There is no agenda, no master list of tasks ready to be given to volunteers. This is often the hardest thing to get across to new volunteers. I know I waited quite a while after volunteering for someone to give me a job to do before I realized it.

Five steps are normally performed in the publishing of

1. Someone, somewhere gets a public-domain copy of a text they
2. That volunteer confirms its PD status by sending TP&V to
3. Someone, usually the same volunteer, scans and corrects the
4. Someone, often a different volunteer, second-proofs the
5. The e-text is sent to Michael for posting.

There are three barriers which make it difficult for most people to contribute:

1. Getting a PD book.
2. People without scanners and typing skills have no way of
3. Even with a scanner, turning a book into an e-text is not

Since, generally, people who have a PD book don't just want to send it off to a stranger for scanning, the people who produce e-texts have to get over all three of these barriers. This is the bottleneck in production. It's relatively easy to get an e-text second-proofed; making it in the first place is the hardest part. You need to have a book, the means to turn it into an e-text and the time and will to do it.

After that comes second proofing. There are two problems here. One is that there may not be enough texts for all the people who want to second-proof; the other is that a lot of beginners just abandon texts given to them for second-proofing, which holds up the process and is discouraging for others. So a lot of volunteers do their own second-proofing or send their texts to established contacts with a track record of finishing the job, rather than making them available to newbies.
The Directors of Production do serve as contact points, and at any given moment may have some texts for proofing, but they can only distribute the texts that have already been made. With that explanation out of the way, I can better address your question of what you can do. Second-proofing is an easy way to start, but material isn't just waiting for you. If you want to look for some, post your offer here and wait a week or so. If no takers by then, e-mail Michael and ask if there are any texts available; he may be able to refer you to a Director of Production who has something current. You may not get an e-text immediately, but you will get one. Of course, you can also look here for offers of e-texts ready to proof. Your other option is to take on a book yourself. In your case, you already have a scanner, so you are equipped to become a producer. You need to find a PD book. Getting PD books means finding and borrowing or buying them. You can do this through used bookshops, libraries or book sites on the Internet. I mention a few net sites in the FAQ in the link below. I get all my books through them, since they make it easy for me to find the books I want. Prices range from $5 up to (in my case) about $30. The best advice I can offer here is: pick a book that you want to contribute, and a book you'll enjoy working with—you'll be living with it up close and personal for quite a while. In March and April of 1999, Pietro created the PG Volunteers' WWWBoard and Greg Newby set up the mailing list gutvol-d, and, for the first time, volunteers who hadn't been introduced to each other by Michael or the Directors could meet online and communicate directly. A few FAQs and HOWTOs were written, covering the basics, the nitty-gritty of producing books. All of this activity made it much easier for people to get involved, and the Project experienced a new influx of interested volunteers. 
Improved OCR software was also a factor at this time: in response to the commoditization of scanners, there was rapid improvement in the quality of OCR, and better OCR made for easier production of e-texts. More work was shared out in co-operative proofing experiments.

It was in this new, expansive atmosphere, with ideas flooding in from enthusiasts newly energized by the project, that Charles Franks (Charlz) came up with the idea of a web site that would serve to distribute the work of proofing a book among many volunteers. But not only did he think of the concept; he went ahead and did it!

In April 2000, Charlz first requested comments on his idea in a post on the Volunteers' WWWBoard, and by the end of September, the first e-texts were queueing up on the production line. On October 9th, Charlz wrote:

  Number of pages proofed by date:

  2nd    6
  3rd    6
  4th   20  <-- Newsletter
  5th   27
  6th   25
  7th   29
  8th   30
  9th   45!! (and the day ain't over yet)

(The "Newsletter" is a reference to the site being mentioned in the PG Newsletter on October 4th, 2000).

I began writing this FAQ in March 2002, and was essentially finished around December 2002. It sat around, with a few tweaks here and there in response to comments, until the start of September 2003.

jim
September 7th, 2003.

I have a question not answered in this FAQ. How do I ask it?

If it's about how to produce a text, the Volunteers' Board at </vol/wwwboard/> is generally the best place to ask.

If it's a question of active interest to the general body of volunteers, you can ask it on the gutvol-d mailing list. See </subs.html> for joining it.

For other questions, you should check our Contact Information page at </contactinfo.html> and e-mail the appropriate person.

Readers' FAQ

About Finding eBooks:
About Using the Web Site:
About the Files:
R.34. What types of files are there, and how do I read them?
R.35. What do the filenames of the texts mean?
R.36. What is the difference within PG between an "edition" and a "version"?
R.37.
What is the difference between an "etext" and an "eBook"?
R.38. What are the "Etext/Ebook numbers" on the texts?
R.39. What do the month and year on the text mean?

Copyright FAQ

Volunteers' FAQ

About the Basics:
About production:
About Proofing:
About Net searching:
V.62. I've found an eligible text elsewhere on the Net, but it's not
About author-submitted eBooks:
About what goes into the texts:
V.73. Why does PG format texts the way it does?
About the characters you use:
V.74. What characters can I use?
About the formatting of a text file:
V.83. How long should I make my lines of text?
About formatting poetry:
V.116. I'm producing a book of poetry. How should I format it?
About formatting plays:
V.118. How should I format Act and Scene headings?
About some typical formatting issues:
V.121. Sample 1: Typical formatting issues of a novel.
About problems with the printed books:
V.125. I found some distasteful or offensive passages in a book I'm

Word Processing FAQ

W.1. What's the difference between an editor and a word processor?
About using MS-Word:
W.7. I've edited my book in Word - how do I save it as plain text?

Scanning FAQ

S.1. What is a scanner?

HTML FAQ

H.1. Can I submit a HTML version of my text?

Programs and Programming FAQ

Formats FAQ

Volunteers' Voices - Volunteers talk about PG
Amy Zelmer

Bookmarks - web pages commonly referred to in the FAQ

In 1971, Michael Hart was given $100,000,000 worth of computer time on a mainframe of the era. Trying to figure out how to put these very expensive hours to good use, he envisaged a time when there would be millions of connected computers, and typed in the Declaration of Independence (all in upper case, since there was no lower case available!). His idea was that everybody who had access to a computer could have a copy of the text. Now, 31 years later, his copy of the Declaration of Independence (with lower-case added!) is still available to everyone on the Internet.
During the 70s, he added some more classic American texts, and through the 80s worked on the Bible and the collected works of Shakespeare. That edition of Shakespeare was never released, due to copyright law changes, but others followed.

In mid-2002, we are not only still going, we have made over 5,000 eBooks available, with a current production target of 200 more each month. We have many mirrors (copies) of our archives on all five continents.

In terms of the day-to-day production of eBooks, our volunteers run themselves. :-) They produce books, and submit them when completed. Our Production Directors help with general volunteer issues. The Posting Team check submitted texts and shepherd them onto our servers. You can find current contact information for these people on the Contact Information page at </contactinfo.html>.

As of mid-2002, there are about 100 active producers, and 200 regular, active helpers doing tasks like proofing. Something like 1500 people receive our Newsletter.

There are lots of ways to contact us, depending on what you want to talk about. The Contact Info page </contactinfo.html> on the main web site lists them.

Donate money! We're an all-volunteer project, and we don't have much to spend, so even a little goes a long way. Our Donation page </donation.html> tells you how.

Produce a text! Turn an old book into an immortal etext.

Subscribe to one of the Newsletters, weekly or monthly! The page </subs.html> gives details of how to subscribe, unsubscribe and access the archives.

No.

Any books that we legally can, and that our volunteers want to work on.

We cannot publish any texts still in copyright without permission. This generally means that our texts are taken from books published pre-1923. (It's more complicated than that, as our Copyright FAQ explains, but 1923 is a good first rule-of-thumb for the U.S.A.) So you won't find the latest bestsellers or modern computer books here.
You will find the classic books from the start of this century and previous centuries, from authors like Shakespeare, Poe, Dante, as well as well-loved favorites like the Sherlock Holmes stories by Sir Arthur Conan Doyle, the Tarzan and Mars books of Edgar Rice Burroughs, Alice's adventures in Wonderland as told by Lewis Carroll, and thousands of others.

These books are chosen by our volunteers. Simply, a volunteer decides that a certain book should be in the archives, obtains the book and does the work necessary to turn it into an e-text. If you're interested in volunteering, see the Volunteers' FAQ at [V.1] below.

We have published some music files, in MIDI and MUS formats. We have published the Human Genome. We have published pictures of the prehistoric cave paintings from the south of France. We have published some video files and some audio files, including a Janis Ian track and readings from public domain books.

Whatever languages we can! As above, this is decided by what languages our volunteers choose to work with.

G.15. Why don't you have any / many books about history, geography,

If we can legally publish a book, and it isn't in the archives, it's because no volunteer has produced it yet. At the moment, we have a predominance of English language novels because that is what most people have chosen to work on. We're always looking for new languages and topics, and always delighted to see people producing them.

If we don't have enough of the types of books you would like to see, why don't you help us out by contributing one? If the people interested in a particular area don't contribute, we'll always be short in that area.

G.16. Why don't you have any books by Stephen King, Tom Clancy,

Don't misrepresent us: we support and publish many open formats, but, yes, we do want to have a plain text version of everything possible.

We're looking at our history, and we're planning for the long term, the very long term.
Today, Plain Vanilla ASCII can be read, written, copied and printed by just about every simple text editor on every computer in the world. This has been so for over thirty years, and is likely to be so for the foreseeable future. We've seen formats and extended character sets come and go; plain text stays with us. We can still read Shakespeare's First Folios, the original Gutenberg Bible, the Domesday Book, and even the Dead Sea Scrolls and the Rosetta Stone (though we may have trouble with the language!), but we can't read many files made in various formats on computer media just 20 years ago. We're trying to build an archive that will last not only decades, but centuries. The point of putting works in the PG archive is that they are copied to many, many public sites and individual computers all over the world. No single disaster can destroy them; no single government can suppress them. Long after we're all dead and gone, when the very concept of an ISP is as quaint as gas streetlamps, when HTML reads like Middle English, those texts will still be safe, copied, and available to our descendants. The PG archive is so valuable, yet free and easily portable, that even if every current PG volunteer vanished overnight, people around the world would copy and preserve it. If the ZIP format loses popularity, and is replaced by better compression, it will be easy to convert the zip formats automatically (and we post all plain-text files in unzipped format as well). If hard drives are replaced by optical memory, it will be easy to copy the files onto that. If even ASCII is superseded by Unicode or one of its descendants, it will be possible for our grandchildren to convert it automatically (and ASCII is included in Unicode anyway). By contrast, many of us have files saved in proprietary formats from word-processors only 5 or 10 years old that are already impractical for us to read. 
Some of our files produced just a few years ago using non-ASCII character sets like Codepage 850 are already giving problems for some readers. Some eBook reader formats launched within the last few years are already obsolete. We have learned from that experience.

We also encourage other open formats based on plain text, like HTML and

Readers' FAQ

About Finding eBooks:

R.1. How can I find an eBook I'm looking for?

For PG books, the simplest way is to go to the home page at <>, type the Author or Title into the search form, press the "Search" button, and follow the choices.

As of late 2002, there is a full-text search available at <http://public.ibiblio.org/gsdl/cgi-bin/library.cgi> where you can search not only for titles and authors, but any words or phrases you want to look up. For example, entering "Ample make this bed" and running an "entire books" search for all words leads you to Poems Of Emily Dickinson, Series Two.

Yes. There are two main options:

GUTINDEX.ALL is the raw list of files posted. You will find it at:
<ftp://ibiblio.org/pub/docs/books/gutenberg/GUTINDEX.ALL>

PGWHOLE.TXT is the list of files cataloged. A Zipped version is:
<http://promo.net/gg/pgwhole.zip>

When we post a book, the posting information contains title and author, eBook number, base filename and schedule year and month. This raw information goes into GUTINDEX.ALL.

After posting, our catalogers get to work and add more information: things like full title, subtitle, author birth and death dates, Library of Congress Classification, full filenames and sizes. When a book has been cataloged, it is entered onto the website database so that you can search for it. PGWHOLE.TXT is a summary of the books in the website database.

People who want to bypass the search on the website and find books themselves will probably want to use GUTINDEX.ALL, since it doesn't wait for the cataloging.

R.3. How can I download a PG text that hasn't been cataloged yet?
In short, just browse to:

<http://www.ibiblio.org/pub/docs/books/gutenberg/>

choose the schedule year of the text (newly-posted texts will usually be in the latest year) and look down the list to find the filename you're looking for.

In general, you need to know:

a) the address of an FTP site
b) the schedule year of the text you want
c) the basename of the text you want.

The fastest and safest FTP site to use for this is ftp.ibiblio.org, which is the first of our two primary posting sites (the other being ftp.archive.org). We post to these two sites, and then other sites copy from them at intervals, so with any FTP sites other than these two, the file may not be available immediately.

You can get the schedule year and basename of the text from its line in GUTINDEX.ALL. Let's take an example. The file

  Mar 2004 The Herd Boy and His Hermit, by C. M. Yonge [#32][hrdbhxxx.xxx]5313

has been posted just a few hours ago as I write this. From the GUTINDEX entry, the schedule year is 2004, and the basename of the text is hrdbh.

We divide our texts into directories (folders) based on the schedule year, so this eBook will be in the directory for 2004, which will be named something ending in /etext04. All the directories are named etext plus the last two digits of the year. (Somebody's going to have to change that convention in about 87 years from now! :-) We currently have directories starting at 90, running through the 90s and then 00, 01, 02, 03, 04. All eBooks produced before 1991 are in the /etext90 directory, so if you're looking for

  Dec 1971 Declaration of Independence [whenxxxx.xxx] 1

you should look in /etext90.

As it happens, ibiblio supports both HTTP (web) and FTP access to the text, so we can just browse to <http://www.ibiblio.org/pub/docs/books/gutenberg/> and choose the 2004 directory from there.
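The lookup described above can be sketched in a few lines of Python. This is only an illustration, not an official PG tool: the function names are invented for the example, the line format is inferred from the two sample GUTINDEX entries quoted above rather than from any specification, and treating the run of trailing "x" characters as padding is an assumption that would misread a basename that itself ends in "x".

```python
import re

# Base address quoted in the FAQ; one directory per schedule year sits below it.
BASE_URL = "http://www.ibiblio.org/pub/docs/books/gutenberg/"

# Assumed shape of a GUTINDEX.ALL line, inferred from the two samples above:
#   <Mon> <year> <title> [optional #tag][<basename>xxx.xxx] <eBook number>
# The optional [#..] tag is skipped because the FAQ does not explain it.
ENTRY = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) (?P<year>\d{4}) (?P<title>.*?)\s*"
    r"(?:\[#\d+\])?\s*\[(?P<base>[a-z0-9]+?)x+\.xxx\]\s*(?P<number>\d+)$"
)


def parse_gutindex_line(line):
    """Split one index line into its fields, or return None if it doesn't fit."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return {
        "month": match.group("month"),
        "year": int(match.group("year")),
        "title": match.group("title"),
        "base": match.group("base"),
        "number": int(match.group("number")),
    }


def schedule_directory(year):
    """Directory name for a schedule year: 'etext' plus the last two digits.

    Everything scheduled before 1991 is kept in etext90.
    """
    if year < 1991:
        return "etext90"
    return "etext%02d" % (year % 100)


def directory_url(entry):
    """Address of the directory listing that should contain the entry's files."""
    return BASE_URL + schedule_directory(entry["year"]) + "/"


herd_boy = parse_gutindex_line(
    "Mar 2004 The Herd Boy and His Hermit, by C. M. Yonge [#32][hrdbhxxx.xxx]5313"
)
print(herd_boy["base"], directory_url(herd_boy))
```

The sketch only builds the directory address; it makes no network request, so picking the actual file out of the listing is still done by eye or by a separate download step.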
If you want to automate this, you could also use the more direct address

<ftp://www.ibiblio.org/pub/docs/books/gutenberg/etext04/>

The equivalent address for ftp.archive.org is

<ftp://ftp.archive.org/pub/etext/etext04/>

Either way, we see a long page of files, in alphabetical order. Scroll down to the "H"s and look for hrdbh. We see four files with this basename:

  hrdbh10.txt
  hrdbh10.zip
  hrdbh10h.htm
  hrdbh10h.zip

This means that both plain text and HTML formats are available, and you can choose to download them either zipped or uncompressed. For more detail about conventions for filenames, see the FAQ "What do the filenames of the texts mean?" [R.35]. The main thing you need to know is that any file beginning with hrdbh is some format or edition of this book.

Finally, all you have to do is click on the format you want to download.

R.4. You don't have the eBook I'm looking for. Can you help me find it?

Sorry, no. We can suggest (see below) some other places to look for publicly accessible books on the Net, but we can't do the search for you.

R.5. Where else can I go to get eBooks?

The On-Line Books Page <http://onlinebooks.library.upenn.edu/> and the

If you're looking for commercial books, like current textbooks or bestsellers, you're not likely to find them here, since recent books are not in the public domain. For these, you should look for commercial booksellers on the Net; any search engine will direct you to some if you enter search terms like "shop ebook".

R.6. I see some eBooks in several places on the Net. Do different people really re-create the same eBooks?

It does happen, but mostly by accident. Anyone experienced in eBook creation will first search the usual places to see whether anyone else has already transcribed the book they're interested in. If it has been transcribed, they will not duplicate the effort.
Etexts that are in the public domain very often float around the Net for years—stored in a gopher server here, posted to Usenet there, held on someone's local computer for a year or two and then reformatted as HTML and uploaded to a web site somewhere else. And this is good, because we want texts to be copied as widely as possible. Public domain eBooks are fair game for anyone to copy, correct, mark up, package and post: that's what being in the public domain means. If you find an eBook in many different places, the odds are good that it came from one original source, and was copied around. It does sometimes happen that people duplicate the transcription of books already made into text. Sometimes it's because they didn't find the version already made. Sometimes they have a different edition, and want to transcribe that. Mostly, though, we all try not to do more work than we have to. About Using the Web Site: R.7. Why couldn't I reach your site? (or: Why is your site slow?) There may also be a bottleneck somewhere else between you and the site. If at first you don't succeed, don't tell us, just try, try again. The correct address is either: http://promo.net/pg/ R.8. I get an error when I try to download a book. Usually, the easiest solution is to choose another FTP site to download your text from. Go to the Search page, choose a different FTP site, and search again for your text. Tip: You should always try to choose the FTP site closest to you. Not only are you helping to minimize Net traffic by choosing a nearby site, but your file will download faster! If all else fails, note the year and the filename of the book you want, choose an FTP site from this list and click on one of them. Then browse your way through the listings to the file you want. 
For example, if you find "Lady Susan" by Jane Austen, you will see that it was published by Gutenberg in 1997, and its filename is lsusn10.txt, so browse to one of the FTP sites, choose the directory called etext97 and click (or right-click and Save, depending on your browser) on the file lsusn10.txt.

First go to the Advanced Search page. Sometimes you may miss in searching because of alternative spellings, so try searching separately using just one word in Author or Title. Read the Search Tips.

If that fails, you can Browse through the site catalog. Let's say you're looking for "The Wandering Jew" by Eugene Sue.

Go to the PG Home page: </>

Once on this page, click on: "Browse" in "Browse by Author or Title"

You are then brought to a new page, asking you to select an "FTP site". Further details on how and why to choose an "FTP Site" are available on this page.

Select an FTP Site from the Selection List available at the bottom of the page, then click on "Select".

You get a new page. Click on "S", the initial for "Sue, Eugene".

You should now see a list of all of the Authors whose Last name starts with "S". Scroll down till you find the direct links to the Sue, Eugene works.

Click on the work you are interested in, then click on the file link found on the page you were brought to (Etext Card ID -3987-) when selecting the work, as immediately above.

On this page, above the teaser, there are two working links:

DOWNLOAD:

Click on the link of your choice in order to get the book.

If you can't find your text either way, the book has not been cataloged. The site catalog always lags behind the postings, since we need to collect extra information about the book and the author before it goes into the full catalog. If you know that the book has been posted recently, and maybe hasn't made it into the catalog yet, read the FAQ "How can I download a PG text that hasn't been cataloged yet?"

If even this doesn't help, don't despair! We don't have it, but it may be elsewhere on the Web.
Go to the major search engines and try there. You can also try looking in the Book Search section of The On-Line Books Page <http://onlinebooks.library.upenn.edu/> or the Internet Public Library <http://www.ipl.org>, and if you have no luck with that, you might be able to find it listed as being In Progress somewhere on their Books In Progress and Requested page at <http://onlinebooks.library.upenn.edu/in-progress.html>. R.10. Can I copy your website, or your website materials? No. We welcome mirrors and copies of our e-texts, in new FTP sites [R.14], but the main web site itself is copyrighted and may not be copied. R.11. Your site doesn't look right in my browser. We take a lot of trouble to ensure that our website uses only valid, standard HTML, and we're not even slightly tempted to use glitzy features that look good in one browser but don't work in another, so we can promise you that our site is not the problem. The site uses Cascading Style Sheets (CSS), a W3C standard since 1996. Some older browsers have a buggy implementation of CSS, and this can cause some things to appear off-kilter. If your browser is even older, or doesn't know about CSS at all (as in the case of Lynx, for example) it should have no problem. If you actually clicked on a button, like the Search button or the Post button on the Volunteers' Web Board page, and nothing happened, you might be behind a proxy or web filter that doesn't like you making POST requests. If you have a web filter switched on, turn it off, reload the page and try again. R.12. What does that thing about "Select FTP Site" mean? Our texts are not actually held on the website. The website just holds an index; the files themselves are held on many sites throughout the world, called FTP sites. When you have found the book you're looking for, and you make that final click to get it, you're not actually talking to our website any more—you are transferred to the FTP site you selected. 
Some FTP sites are near you; some are far away. Some may be faster than others, even if they are about the same distance; some may have temporary technical problems. You should usually select the FTP site nearest you. If you find you're having problems with that one, you can select another. R.13. What exactly is an FTP site anyway? FTP stands for File Transfer Protocol, one of the oldest and most reliable protocols of the internet. This is the method by which a file can be copied from one computer to another. An FTP site, or FTP server, is a computer that holds files that people can upload and download. In the case of PG, the Posting Team upload our texts when they're ready to two main FTP servers, <ftp://ftp.ibiblio.org> and <ftp://ftp.archive.org>, which serve as our master copies. Other FTP sites around the world automatically download the files from these master sites, so they have a full set of PG publications for you to download. Because they only check for updates and new files at intervals, some FTP sites may be a day or two behind. Some FTP sites don't have space available for everything, so they may hold only the zipped versions of the files. But most FTP sites will have the entire PG collection. These are called FTP "mirrors", since they are a copy of the original. Many FTP sites exist that offer a full PG mirror but are not on our FTP sites list. Commonly, these are in schools, where they serve the local students, but don't have enough bandwidth to offer downloads to worldwide users. R.14. Can I become an FTP mirror? Yes! We're always looking for more FTP mirrors. If you manage an FTP site with a few GB of space, please check our Contact Information page </contactinfo.html> and contact the appropriate person, who will make the arrangements for you. If space is a problem, you can consider holding only zipped copies of the texts. We can move you up or down the FTP site list as you want more or less traffic. R.15. 
Can I make a private FTP mirror for my school, library or organization? Yes. We like all FTP mirrors to be open to as many people as possible, but we know that not all schools have the resources to be a public mirror, so we welcome all mirrors. And anyway, you don't even have to ask, because we don't control what happens to our texts once we post them! R.16. When I clicked on the file I want, nothing happened. When you select a file for download, your request goes to the FTP site you selected, not to our website. If the FTP site you selected is having problems, or if there is the Net version of a traffic jam between you and it, you may have problems downloading. Select a different FTP site [R.12] and try again. R.17. How many texts are downloaded through the web site? We don't really do statistics, but in one particular month for which we did, we had a figure of about 800,000 searches completed. Since the final request for download goes to the FTP site selected and not to our website, we can't confirm that all of these were actually downloaded, but we expect that most people who have gone all the way through the search will finish the job. In another month, we had about 1,000,000 downloads of files from ftp.ibiblio.org, our main FTP site. This does not count downloads from other FTP sites, of course. Why are there more downloads than searches? Because people who are already familiar with getting PG texts can skip the website search and download straight from the FTP sites. R.18. What are the most popular books? We very rarely do statistics, but on one occasion in late 1999 when we did, we found the top author searches to be: 1 shakespeare and the top individual books searched for to the point of downloading were: 1. Lady Susan, by Jane Austen These numbers vary a lot. When a movie based on a classic is released, downloads of that eBook go through the roof! R.19. Should I download a ZIP or a TXT file? 
If you know how to unzip a file, then downloading the zip is faster. For some non-text eBooks that contain multiple files, like HTML with included images, only a zip file may be available. For some other formats, like MP3 or MPEG, there may not be a zipped version available because the native format of the file is already compressed enough that zipping it doesn't save much.
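For readers who are unsure what unzipping involves, here is a minimal sketch using Python's standard zipfile module. It is an illustration under stated assumptions rather than a PG-supplied tool: the helper names and the member name sample10.txt are made up for the example, and the archive is built in memory so that nothing has to be downloaded. It also prints the two sizes, which shows why the zipped copy of a plain text file is the quicker download.

```python
import io
import zipfile


def read_text_member(zip_bytes, member_name, encoding="ascii"):
    """Return the decoded contents of one file stored inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(member_name).decode(encoding)


def make_demo_zip(member_name, text):
    """Build a small zip in memory so the example needs no download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, text)
    return buffer.getvalue()


# Hypothetical stand-in for a downloaded eBook; real texts compress less
# dramatically than this repeated line does.
sample_text = "It was the best of lines, it was the worst of lines.\n" * 200
demo_zip = make_demo_zip("sample10.txt", sample_text)

print(len(sample_text), "bytes as plain text")
print(len(demo_zip), "bytes zipped")
print(read_text_member(demo_zip, "sample10.txt")[:26])
```

With a real download, the bytes would come from the saved .zip file (for example via open(path, "rb").read()) instead of from make_demo_zip.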