The Project Gutenberg FAQ 2002

Author: Jim Tinsley

Edition: 10

Language: English

by Jim Tinsley

    http://ibiblio.org/gutenberg/faq/gutfaq.txt
    or
    http://ibiblio.org/gutenberg/faq/gutfaq.htm

Acknowledgements

Writing a FAQ for an organization of fanatical proofreaders has its ups and downs! I'd like to thank all those who corrected my facts and my typos, and especially the people who pointed out the lack of clarity in certain answers. The remaining errors and opacity are all mine.

Preface to the archive edition

However, while PG's production expanded geometrically, at Moore's Law rates, there were barriers to participation. Most volunteers had to find an eligible book, scan or type it, and proof the resulting text all by themselves. This was and is a fairly significant amount of work: 40 painstaking hours would be a typical commitment for one book.

Beyond that, simply learning the mechanics of producing e-texts could be a serious challenge for newcomers. Nearly all internal PG communication, except for the Newsletter, was by private e-mail, and instructions had to be repeated many times to individual new volunteers, all of whom showed up with great good will, but most of whom vanished after a week or two.

Michael Hart was unstinting in his editing of incoming texts and handling questions by e-mail, but any one person has only so many hours.

The Directors of Production at the time — Sue Asscher, Dianne Bean, John Bickers and David Price — served as contact points for advice and help, made enormous efforts of production themselves, and tried to share the scanned texts among new volunteers for proofing. They made a huge contribution to building community in PG.

Pietro Di Miceli set up a web site for the project in 1996, and with the popularization of the Web (as opposed to the Internet), this became a beacon for readers and new volunteers.

In 1999, I wrote, in response to an offer to volunteer:

I think I can best answer your offer, and many others like it, by giving an extended description of what actually happens in the making of PG texts, and why it's often not easy to get started.

There is no agenda, no master list of tasks ready to be given to volunteers. This is often the hardest thing to get across to new volunteers. I know I waited quite a while after volunteering for someone to give me a job to do before I realized it.

    Five steps are normally performed in the publishing of
    an e-text.

    1. Someone, somewhere gets a public-domain copy of a text they
       want to contribute.

    2. That volunteer confirms its PD status by sending TP&V to
       Michael, and getting copyright clearance.

    3. Someone, usually the same volunteer, scans and corrects the
       text, or, if skilled in typing, types the book into an e-text.

    4. Someone, often a different volunteer, second-proofs the
       e-text, removing the smaller errors.

    5. The e-text is sent to Michael for posting.

There are three barriers which make it difficult for most people to contribute:

    1. Getting a PD book.

    2. People without scanners and typing skills have no way of
       turning a book into an e-text.

    3. Even with a scanner, turning a book into an e-text is not
       easy or quick.

Since, generally, people who have a PD book don't just want to send it off to a stranger for scanning, the people who produce e-texts have to get over all three of these barriers. This is the bottleneck in production. It's relatively easy to get an e-text second-proofed; making it in the first place is the hardest part. You need to have a book, the means to turn it into an e-text and the time and will to do it.

After that comes second proofing. There are two problems here. One is that there may not be enough texts for all the people who want to second-proof; the other is that a lot of beginners just abandon texts given to them for second-proofing, which holds up the process and is discouraging for others. So a lot of volunteers do their own second-proofing or send their texts to established contacts with a track record of finishing the job, rather than making them available to newbies. The Directors of Production do serve as contact points, and at any given moment may have some texts for proofing, but they can only distribute the texts that have already been made.

With that explanation out of the way, I can better address your question of what you can do.

Second-proofing is an easy way to start, but material isn't just waiting for you. If you want to look for some, post your offer here and wait a week or so. If no takers by then, e-mail Michael and ask if there are any texts available; he may be able to refer you to a Director of Production who has something current. You may not get an e-text immediately, but you will get one. Of course, you can also look here for offers of e-texts ready to proof.

Your other option is to take on a book yourself. In your case, you already have a scanner, so you are equipped to become a producer. You need to find a PD book.

Getting PD books means finding and borrowing or buying them. You can do this through used bookshops, libraries or book sites on the Internet. I mention a few net sites in the FAQ in the link below. I get all my books through them, since they make it easy for me to find the books I want. Prices range from $5 up to (in my case) about $30.

The best advice I can offer here is: pick a book that you want to contribute, and a book you'll enjoy working with—you'll be living with it up close and personal for quite a while.

In March and April of 1999, Pietro created the PG Volunteers' WWWBoard and Greg Newby set up the mailing list gutvol-d, and, for the first time, volunteers who hadn't been introduced to each other by Michael or the Directors could meet online and communicate directly. A few FAQs and HOWTOs were written, covering the basics, the nitty-gritty of producing books. All of this activity made it much easier for people to get involved, and the Project experienced a new influx of interested volunteers. Improved OCR software was also a factor at this time: in response to the commoditization of scanners, there was rapid improvement in the quality of OCR, and better OCR made for easier production of e-texts. More work was shared out in co-operative proofing experiments.

It was in this new, expansive atmosphere, with ideas flooding in from enthusiasts newly energized by the project, that Charles Franks (Charlz) came up with the idea of a web site that would serve to distribute the work of proofing a book among many volunteers. But not only did he think of the concept; he went ahead and did it!

In April 2000, Charlz first requested comments on his idea in a post on the Volunteers' WWWBoard, and by the end of September, the first e-texts were queueing up on the production line.

On October 9th, Charlz wrote:

Number of pages proofed by date:

    2nd   6
    3rd   6
    4th  20  <-- Newsletter
    5th  27
    6th  25
    7th  29
    8th  30
    9th  45!! (and the day ain't over yet)

(The "Newsletter" is a reference to the site being mentioned in the PG Newsletter on October 4th, 2000).

I began writing this FAQ in March 2002, and was essentially finished around December 2002. It sat around, with a few tweaks here and there in response to comments, until the start of September 2003.

jim September 7th, 2003.

I have a question not answered in this FAQ. How do I ask it?

If it's about how to produce a text, the Volunteers' Board at </vol/wwwboard/> is generally the best place to ask.

If it's a question of active interest to the general body of volunteers, you can ask it on the gutvol-d mailing list. See </subs.html> for joining it.

For other questions, you should check our Contact Information page at </contactinfo.html> and e-mail the appropriate person.

Readers' FAQ

About Finding eBooks:

About Using the Web Site:

About the Files:

R.34. What types of files are there, and how do I read them?
R.35. What do the filenames of the texts mean?
R.36. What is the difference within PG between an "edition" and a
       "version"?
R.37. What is the difference between an "etext" and an "eBook"?
R.38. What are the "Etext/Ebook numbers" on the texts?
R.39. What do the month and year on the text mean?

Copyright FAQ

Volunteers' FAQ

About the Basics:

About production:

About Proofing:

About Net searching:

V.62. I've found an eligible text elsewhere on the Net, but it's not
       in the PG archives. Can I just submit it to PG?
V.63. I've found an eligible text elsewhere on the Net, but it's not
       in the PG archives. Why should I submit it to PG?
V.64. I have already scanned or typed a book; it's on my web site.
       How can I get it included in the Gutenberg archives?
V.65. I have already scanned or typed a book; it's on my web site.
       The world can already access it. Why should I add it to the
       Gutenberg archives?
V.66. I have already scanned or typed a book, but it's not in plain text
       format. Can I submit it to PG?

About author-submitted eBooks:

About what goes into the texts:

V.73. Why does PG format texts the way it does?

About the characters you use:

V.74. What characters can I use?
V.75. What is ASCII?
V.76. So what is ISO-8859? What is Codepage 437? What is Codepage 1252?
       What is MacRoman?
V.77. What is Unicode?
V.78. What is Big-5?
V.79. What are "8-bit" and "7-bit" texts?
V.80. I have an English text with some quotations from a language that
       needs accents—what should I do about the accents?
V.81. I have some Greek quotations in my book. How can I handle them?
V.82. I want to produce a book in a language like Spanish or French
       with accented characters. What should I do?

About the formatting of a text file:

V.83. How long should I make my lines of text?
V.84. Why should I break lines at all? Why not make the text as one
       line per paragraph, and let the reader wrap it?
V.85. Why use a CR/LF at end of line?
V.86. One space or two at the end of a sentence?
V.87. How do I indicate paragraphs?
V.88. Should I indent the start of every paragraph?
V.89. Are there any places where I should indent text?
V.90. Can I use tabs (the TAB key) to indent?
V.91. How should I treat dashes (hyphens) between words?
V.92. How should I treat dashes replacing letters?
V.93. What about hyphens at end of line?
V.94. What should I do with italics?
V.95. Yes, but I have a long passage of my book in italics! I can't
       really CAPITALIZE or otherwise /mark/ all that text, can I?
V.96. Should I capitalize the first word in each chapter?
V.97. What is a Transcriber's Note? When should I add one?
V.98. Should I keep page numbers in the e-text?
V.99. In the exceptional cases where I keep page numbers, how should
       I format them?
V.100. Should I keep Tables of Contents?
V.101. Should I keep Indexes and Glossaries?
V.102. How do I handle a break from one scene to another, where the
       book uses blank lines, or a row of asterisks?
V.103. How should I treat footnotes?
V.104. My book leaves a space before punctuation like semicolons,
       question marks, exclamation marks and quotes. Should I do
       the same?
V.105. My book leaves a space in the middle of contracted words like
       "do n't", "we 'll" and "he 's". Should I do the same?
V.106. How should I handle tables?
V.107. How should I format letters or journal entries?
V.108. What can I do with the British pound sign?
V.109. What can I do with the degree symbol?
V.110. How should I handle . . . ellipses?
V.111. How should I handle chapter and section headings?
V.112. My book has advertisements at the end. Should I keep them?
V.113. Can I keep Lists of Illustrations, even when producing a
       plain text file?
V.114. Can I include the captions of Illustrations, even when producing
       a plain text file?
V.115. Can I include images with my text file?

About formatting poetry:

V.116. I'm producing a book of poetry. How should I format it?
V.117. I'm producing a novel with some short quotations from poems.

About formatting plays:

V.118. How should I format Act and Scene headings?
V.119. How should I format stage directions?
V.120. How should I format blank verse?

About some typical formatting issues:

V.121. Sample 1: Typical formatting issues of a novel.
V.122. Sample 2: Typical formatting issues of non-fiction
V.123. Sample 3: Typical formatting issues of poetry
V.124. Sample 4: Typical formatting issues of plays

About problems with the printed books:

V.125. I found some distasteful or offensive passages in a book I'm
       producing. Should I omit them?
V.126. Some paragraphs in my book, where a character is speaking,
       have quotes at the start, but not at the end. Should I close
       those quotes?
V.127. The spelling in my book is British English (colour, centre).
       Should I change these to American spellings?
V.128. I'm nearly sure that some words in my printed book are typos.
       Should I change them?
V.129. Having investigated what looks like a typo, I find it isn't.
       Do I need to do anything?
V.130. Aarrgh! Some pages are missing! Do I have to abandon the book?
V.131. Some words are spelled inconsistently in my book (e.g. sometimes
       "surprise", sometimes "surprize"). Should I make them consistent?

Word Processing FAQ

W.1. What's the difference between an editor and a word processor?
W.2. Should I use an editor or a word processor?
W.3. Which editor or word processor should I use?
W.4. How can I make my word processor easier to work with for plain text?
W.5. What is the difference between proportional and non-proportional
       fonts?
W.6. I can't get words in a table or poem to line up under each other.

About using MS-Word:

W.7. I've edited my book in Word - how do I save it as plain text?
W.8. Quotes look wrong when I save a Word document as plain text.
W.9. Dashes look wrong when I save a Word document as plain text.
W.10. I saved my Word document as HTML, but the HTML looks terrible.

Scanning FAQ

S.1. What is a scanner?
S.2. What types of scanners are there?
S.3. Which scanner should I get?
S.4. What is ADF?
S.5. Should I get ADF?
S.6. What's a "TWAIN driver" and why do I need one?
S.7. How do I scan a book?
S.8. My book won't open flat enough for a good scan, and I don't
       want to cut the pages.
S.9. How long does it take to scan a book?
S.10. What scanner settings are best?
S.11. Can I use a digital camera in place of a scanner?
S.12. What is OCR?
S.13. What differences are there between OCR packages?
S.14. How accurate should OCR be?
S.15. Which OCR package should I get?
S.16. What types of mistakes do OCR packages typically make?
S.17. Why am I getting a lot of mistakes in my OCRed text?
S.18. I got an OCR package bundled with my scanner. Is it good enough
       to use?
S.19. I want to include some images with a HTML version. How should I
       scan them?
S.20. I want to include some images with a HTML version. What type of
       image should I use?
S.21. Will PG store scanned page images of my book?

HTML FAQ

H.1. Can I submit a HTML version of my text?
H.2. Why should I make a HTML version?
H.3. Can I submit a HTML version without a plain ASCII version?
H.4. What are the PG rules for HTML texts?
H.5. Can I use Javascript or other scripting languages in my HTML?
H.6. Should I make my HTML edition all on one page, or split it into
       multiple linked pages?
H.7. How can I check that I haven't made mistakes in coding my HTML?
H.8. Can I submit a HTML or other format of somebody else's text?
H.9. How big can the images be in a HTML file?
H.10. The images I've scanned are too big for inclusion in HTML.
       What can I do about it?
H.11. Can I include decorative images I've made or found?
H.12. How can I make a plain text version from a HTML file?
H.13. How can I make a HTML version from my plain text file?

Programs and Programming FAQ

Formats FAQ

Volunteers' Voices - Volunteers talk about PG

       Amy Zelmer
       Ben Crowder
       Col Choat
       Dagny
       Gardner Buchanan
       Jim Tinsley
       John Mamoun
       Ken Reeder
       Lynn Hill
       Sandra Laythorpe
       Tony Adam
       Tonya Allen
       Walter Debeuf

Bookmarks - web pages commonly referred to in the FAQ

In 1971, Michael Hart was given $100,000,000 worth of computer time on a mainframe of the era. Trying to figure out how to put these very expensive hours to good use, he envisaged a time when there would be millions of connected computers, and typed in the Declaration of Independence (all in upper case—there was no lower case available!). His idea was that everybody who had access to a computer could have a copy of the text. Now, 31 years later, his copy of the Declaration of Independence (with lower-case added!) is still available to everyone on the Internet.

During the 70s, he added some more classic American texts, and through the 80s worked on the Bible and the collected works of Shakespeare. That edition of Shakespeare was never released, due to copyright law changes, but others followed.

Today, we have a target of 200 books a month.

In mid-2002, we are not only still going, we have made over 5,000 eBooks available, with a current production target of 200 more each month.

We have many mirrors (copies) of our archives on all five continents.

In terms of the day-to-day production of eBooks, our volunteers run themselves. :-) They produce books, and submit them when completed. Our Production Directors help with general volunteer issues. The Posting Team check submitted texts and shepherd them onto our servers. You can find current contact information for these people on the Contact Information page at </contactinfo.html>.

As of mid-2002, there are about 100 active producers, and 200 regular, active helpers doing tasks like proofing. Something like 1500 people receive our Newsletter.

There are lots of ways to contact us, depending on what you want to talk about. The Contact Info page </contactinfo.html> on the main web site lists them.

Donate money! We're an all-volunteer project, and we don't have much to spend, so even a little goes a long way. Our Donation page </donation.html> tells you how.

Produce a text! Turn an old book into an immortal etext.
The Volunteers' FAQ [V.1] tells you how.

Subscribe to one of the Newsletters—weekly or monthly!

The page </subs.html> gives details of how to subscribe, unsubscribe and access the archives.

No.

Any books that we legally can, and that our volunteers want to work on.

We cannot publish any texts still in copyright without permission. This generally means that our texts are taken from books published pre-1923. (It's more complicated than that, as our Copyright FAQ explains, but 1923 is a good first rule-of-thumb for the U.S.A.)

So you won't find the latest bestsellers or modern computer books here. You will find the classic books from the start of the 20th century and previous centuries, from authors like Shakespeare, Poe, Dante, as well as well-loved favorites like the Sherlock Holmes stories by Sir Arthur Conan Doyle, the Tarzan and Mars books of Edgar Rice Burroughs, Alice's Adventures in Wonderland as told by Lewis Carroll, and thousands of others.

These books are chosen by our volunteers. Simply, a volunteer decides that a certain book should be in the archives, obtains the book and does the work necessary to turn it into an e-text. If you're interested in volunteering, see the Volunteers' FAQ at [V.1] below.

We have published some music files, in MIDI and MUS formats. We have published the Human Genome. We have published pictures of the prehistoric cave paintings from the south of France. We have published some video files and some audio files, including a Janis Ian track and readings from public domain books.

Whatever languages we can! As above, this is decided by what languages our volunteers choose to work with.

G.15. Why don't you have any / many books about history, geography,
      science, biography, etc.?
      Why aren't there any / more PG books available in French, Spanish,
      German, etc.?

If we can legally publish a book, and it isn't in the archives, it's because no volunteer has produced it yet. At the moment, we have a predominance of English language novels because that is what most people have chosen to work on.

We're always looking for new languages and topics, and always delighted to see people producing them. If we don't have enough of the types of books you would like to see, why don't you help us out by contributing one? If the people interested in a particular area don't contribute, we'll always be short in that area.

G.16. Why don't you have any books by Stephen King, Tom Clancy,
      Tolkien, etc.?

Don't misrepresent us—we support and publish many open formats, but, yes, we do want to have a plain text version of everything possible.

We're looking at our history, and we're planning for the long term—the very long term.

Today, Plain Vanilla ASCII can be read, written, copied and printed by just about every simple text editor on every computer in the world. This has been so for over thirty years, and is likely to be so for the foreseeable future. We've seen formats and extended character sets come and go; plain text stays with us. We can still read Shakespeare's First Folios, the original Gutenberg Bible, the Domesday Book, and even the Dead Sea Scrolls and the Rosetta Stone (though we may have trouble with the language!), but we can't read many files made in various formats on computer media just 20 years ago.

We're trying to build an archive that will last not only decades, but centuries.

The point of putting works in the PG archive is that they are copied to many, many public sites and individual computers all over the world. No single disaster can destroy them; no single government can suppress them. Long after we're all dead and gone, when the very concept of an ISP is as quaint as gas streetlamps, when HTML reads like Middle English, those texts will still be safe, copied, and available to our descendants.

The PG archive is so valuable, yet free and easily portable, that even if every current PG volunteer vanished overnight, people around the world would copy and preserve it.

If the ZIP format loses popularity, and is replaced by better compression, it will be easy to convert the zip formats automatically (and we post all plain-text files in unzipped format as well). If hard drives are replaced by optical memory, it will be easy to copy the files onto that. If even ASCII is superseded by Unicode or one of its descendants, it will be possible for our grandchildren to convert it automatically (and ASCII is included in Unicode anyway).
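The remark that ASCII is included in Unicode can be checked directly. What follows is a minimal sketch, assuming an arbitrary sample string that is not taken from any PG file: a pure-ASCII text produces the same bytes whether it is encoded as ASCII or as UTF-8 (one common Unicode encoding), so a plain ASCII file can already be read as UTF-8 without conversion.

```python
# The sample string is an arbitrary illustration, not a PG file.
sample = "When in the Course of human events"

# encode("ascii") raises UnicodeEncodeError if any non-ASCII character is present.
ascii_bytes = sample.encode("ascii")
utf8_bytes = sample.encode("utf-8")

# For 7-bit ASCII text the two encodings give identical bytes.
same_bytes = ascii_bytes == utf8_bytes

# Reading the ASCII bytes back as UTF-8 recovers the original text.
round_trip = ascii_bytes.decode("utf-8") == sample

print(same_bytes, round_trip)
```

This holds only for text that really is 7-bit ASCII; files in 8-bit character sets such as the codepages mentioned below would need a real conversion step.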

By contrast, many of us have files saved in proprietary formats from word-processors only 5 or 10 years old that are already impractical for us to read. Some of our files produced just a few years ago using non-ASCII character sets like Codepage 850 are already giving problems for some readers. Some eBook reader formats launched within the last few years are already obsolete. We have learned from that experience.

We also encourage other open formats based on plain text, like HTML and

Readers' FAQ

About Finding eBooks:

R.1. How can I find an eBook I'm looking for?

For PG books, the simplest way is to go to the home page at <>, type the Author or Title into the search form, press the "Search" button, and follow the choices.

As of late 2002, there is a full-text search available at <http://public.ibiblio.org/gsdl/cgi-bin/library.cgi> where you can search not only for titles and authors, but any words or phrases you want to look up. For example, entering "Ample make this bed" and running an "entire books" search for all words leads you to Poems Of Emily Dickinson, Series Two.

Yes. There are two main options:

GUTINDEX.ALL is the raw list of files posted. You will find it at: <ftp://ibiblio.org/pub/docs/books/gutenberg/GUTINDEX.ALL>

PGWHOLE.TXT is the list of files cataloged. A Zipped version is: <http://promo.net/gg/pgwhole.zip>

When we post a book, the posting information contains title and author, eBook number, base filename and schedule year and month. This raw information goes into GUTINDEX.ALL.

After posting, our catalogers get to work and add more information —things like full title, subtitle, author birth and death dates, Library of Congress Classification, full filenames and sizes. When a book has been cataloged, it is entered onto the website database so that you can search for it. PGWHOLE.TXT is a summary of the books in the website database.

People who want to bypass the search on the website and find books themselves will probably want to use GUTINDEX.ALL, since it doesn't wait for the cataloging.

R.3. How can I download a PG text that hasn't been cataloged yet?

In short, just browse to:

<http://www.ibiblio.org/pub/docs/books/gutenberg/>

choose the schedule year of the text (newly-posted texts will usually be in the latest year) and look down the list to find the filename you're looking for.

In general, you need to know:

    a) the address of an FTP site
    b) the schedule year of the text you want
    c) the basename of the text you want.

The fastest and safest FTP site to use for this is ftp.ibiblio.org, which is the first of our two primary posting sites (the other being ftp.archive.org). We post to these two sites, and then other sites copy from them at intervals, so with any FTP sites other than these two, the file may not be available immediately.

You can get the schedule year and basename of the text from its line in GUTINDEX.ALL. Let's take an example. The file

Mar 2004 The Herd Boy and His Hermit, by C. M. Yonge [#32][hrdbhxxx.xxx]5313

was posted just a few hours ago as I write this. From the GUTINDEX entry, the schedule year is 2004, and the basename of the text is hrdbh.
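If you want to pull the schedule year and basename out of a GUTINDEX.ALL line automatically, something like the following Python sketch may help. It is only a sketch: the pattern is inferred from the sample lines quoted in this FAQ, the function name parse_gutindex_line is made up for illustration, and it assumes that the trailing x's in the bracketed filename are padding rather than part of the basename. Real lines in other layouts are simply rejected.

```python
import re

# Pattern inferred from the sample GUTINDEX.ALL lines quoted in this FAQ.
# This is an assumption; the real file may contain other layouts.
GUTINDEX_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) (?P<year>\d{4}) "  # schedule month and year
    r"(?P<title>.*?)\s*"                            # title (and author, if any)
    r"(?:\[#\d+\])?"                                # optional tag such as [#32]
    r"\[(?P<filename>[^\]]+)\]\s*"                  # e.g. [hrdbhxxx.xxx]
    r"(?P<number>\d+)\s*$"                          # eBook number
)


def parse_gutindex_line(line):
    """Return (schedule_year, basename, ebook_number), or None if no match."""
    match = GUTINDEX_LINE.match(line)
    if match is None:
        return None
    stem = match.group("filename").split(".")[0]    # "hrdbhxxx"
    basename = stem.rstrip("x")                     # assumption: x's are padding
    return int(match.group("year")), basename, int(match.group("number"))


entry = "Mar 2004 The Herd Boy and His Hermit, by C. M. Yonge [#32][hrdbhxxx.xxx]5313"
print(parse_gutindex_line(entry))
```

A basename that genuinely ended in the letter x would be cut short by this approach, so treat the result as a starting point for browsing rather than a guaranteed filename.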

We divide our texts into directories (folders) based on the schedule year, so this eBook will be in the directory for 2004, which will be named something ending in /etext04. All the directories are named etext plus the last two digits of the year. (Somebody's going to have to change that convention in about 87 years from now! :-) We currently have directories starting at 90, running through the 90s and then 00, 01, 02, 03, 04. All eBooks produced before 1991 are in the /etext90 directory, so if you're looking for

Dec 1971 Declaration of Independence [whenxxxx.xxx] 1
or
Aug 1989 The Bible, Both Testaments, King James Version [kjv10xxx.xxx] 10

you should look in /etext90.
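The directory rule just described is simple enough to write down as a short Python sketch. The function name etext_directory is hypothetical, and the rule is taken only from the description above: "etext" plus the last two digits of the schedule year, with everything before 1991 kept in etext90.

```python
def etext_directory(schedule_year):
    """Return the directory name for a schedule year, e.g. 2004 -> "etext04"."""
    # All eBooks produced before 1991 are kept in etext90.
    if schedule_year < 1991:
        return "etext90"
    # Otherwise the name is "etext" plus the last two digits of the year.
    return "etext%02d" % (schedule_year % 100)


for year in (1971, 1989, 1997, 2004):
    print(year, etext_directory(year))
```

So the 1971 Declaration of Independence and the 1989 Bible both map to etext90, while a text scheduled for 2004 maps to etext04.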

As it happens, ibiblio supports both HTTP (web) and FTP access to the text, so we can just browse to

<http://www.ibiblio.org/pub/docs/books/gutenberg/>

and choose the 2004 directory from there.

If you want to automate this, you could also use the more direct address

<ftp://www.ibiblio.org/pub/docs/books/gutenberg/etext04/>

The equivalent address for ftp.archive.org is

<ftp://ftp.archive.org/pub/etext/etext04/>

Either way, we see a long page of files, in alphabetical order. Scroll down to the "H"s and look for hrdbh. We see four files with this basename:

    hrdbh10.txt
    hrdbh10.zip
    hrdbh10h.htm
    hrdbh10h.zip

This means that both plain text and HTML formats are available, and you can choose to download them either zipped or uncompressed. For more detail about conventions for filenames, see the FAQ "What do the filenames of the texts mean?" [R.35]. The main thing you need to know is that any file beginning with hrdbh is some format or edition of this book.
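If you have saved a directory listing and want to pick out every file for one book, the "begins with the basename" rule can be sketched in Python as below. The function name files_for_basename is made up for illustration, and the two extra filenames in the sample listing are invented neighbours, included only to show that they are filtered out.

```python
def files_for_basename(listing, basename):
    """Return the filenames in a directory listing that start with basename."""
    return sorted(name for name in listing if name.startswith(basename))


# The four hrdbh names are the ones listed above; the last two entries are
# hypothetical neighbours, not real PG files.
listing = [
    "hrdbh10.txt", "hrdbh10.zip", "hrdbh10h.htm", "hrdbh10h.zip",
    "hypothetical10.txt", "another10.zip",
]
matches = files_for_basename(listing, "hrdbh")
print(matches)
```

A plain prefix match will also pick up any other book whose basename happens to start with the same letters, so it is worth glancing at the results before downloading.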

Finally, all you have to do is click on the format you want to download.

R.4. You don't have the eBook I'm looking for. Can you help me find it?

Sorry, no. We can suggest (see below) some other places to look for publicly accessible books on the Net, but we can't do the search for you.

R.5. Where else can I go to get eBooks?

The On-Line Books Page <http://onlinebooks.library.upenn.edu/> and the
Internet Public Library at <http://www.ipl.org/> are two sites that
specialize in creating a list of all books on-line from any source.
Searching them is a good place to start.

If you're looking for commercial books, like current textbooks or bestsellers, you're not likely to find them here, since recent books are not in the public domain. For these, you should look for commercial booksellers on the Net—any search engine will direct you to some if you enter search terms like "shop ebook".

R.6. I see some eBooks in several places on the Net. Do different people really re-create the same eBooks?

It does happen, but mostly by accident. Anyone experienced in eBook creation will first search the usual places to see whether anyone else has already transcribed the book they're interested in. If it has been transcribed, they will usually not duplicate the effort.

Etexts that are in the public domain very often float around the Net for years—stored in a gopher server here, posted to Usenet there, held on someone's local computer for a year or two and then reformatted as HTML and uploaded to a web site somewhere else. And this is good, because we want texts to be copied as widely as possible.

Public domain eBooks are fair game for anyone to copy, correct, mark up, package and post: that's what being in the public domain means.

If you find an eBook in many different places, the odds are good that it came from one original source, and was copied around.

It does sometimes happen that people duplicate the transcription of books already made into text. Sometimes it's because they didn't find the version already made. Sometimes they have a different edition, and want to transcribe that. Mostly, though, we all try not to do more work than we have to.

About Using the Web Site:

R.7. Why couldn't I reach your site? (or: Why is your site slow?)

There may also be a bottleneck somewhere else between you and the site. If at first you don't succeed, don't tell us, just try, try again. The correct address is either:

http://promo.net/pg/
              or
/

R.8. I get an error when I try to download a book.

Usually, the easiest solution is to choose another FTP site to download your text from. Go to the Search page, choose a different FTP site, and search again for your text.

Tip: You should always try to choose the FTP site closest to you. Not only are you helping to minimize Net traffic by choosing a nearby site, but your file will download faster!

If all else fails, note the year and the filename of the book you want, choose an FTP site from this list and click on one of them. Then browse your way through the listings to the file you want.

For example, if you find "Lady Susan" by Jane Austen, you will see that it was published by Gutenberg in 1997, and its filename is lsusn10.txt, so browse to one of the FTP sites, choose the directory called etext97 and click (or right-click and Save, depending on your browser) on the file lsusn10.txt.

First go to the Advanced Search page. Sometimes you may miss in searching because of alternative spellings, so try searching separately using just one word in Author or Title. Read the Search Tips.

If that fails, you can Browse through the site catalog. Let's say you're looking for "The Wandering Jew" by Eugene Sue.

Go to the PG Home page: </>

Once on this page, click on: "Browse" in "Browse by Author or Title"

You are then brought to a new page, asking you to select an "FTP site". Further details on how and why to choose an "FTP Site" are available on this page.

Select an FTP Site from the Selection List available at the bottom of the page, then click on "Select".

You get a new page. Click on "S", the initial for "Sue, Eugene".

You should now see a list of all of the Authors whose Last name starts with "S". Scroll down till you find the direct links to the Sue, Eugene works.

Click on the work you are interested in, then click on the file link on the page you are brought to (Etext Card ID 3987 for the work selected above).

On this page, above the teaser, there are two working links:

DOWNLOAD:
                  · es12v10.txt - 2.95 MB
                  · es12v10.zip - 1.10 MB

Click on the link of your choice in order to get the book.

If you can't find your text either way, the book has not been cataloged. The site catalog always lags behind the postings, since we need to collect extra information about the book and the author before it goes into the full catalog. If you know that the book has been posted recently, and maybe hasn't made it into the catalog yet, read the FAQ "How can I download a PG text that hasn't been cataloged yet?"

If even this doesn't help, don't despair! We don't have it, but it may be elsewhere on the Web. Go to the major search engines and try there. You can also try looking in the Book Search section of The On-Line Books Page <http://onlinebooks.library.upenn.edu/> or the Internet Public Library <http://www.ipl.org>, and if you have no luck with that, you might be able to find it listed as being In Progress somewhere on their Books In Progress and Requested page at <http://onlinebooks.library.upenn.edu/in-progress.html>.

R.10. Can I copy your website, or your website materials?

No.

We welcome mirrors and copies of our e-texts, in new FTP sites [R.14], but the main web site itself is copyrighted and may not be copied.

R.11. Your site doesn't look right in my browser.
      I clicked on a button, and nothing happened.

We take a lot of trouble to ensure that our website uses only valid, standard HTML, and we're not even slightly tempted to use glitzy features that look good in one browser but don't work in another, so we can promise you that our site is not the problem.

The site uses Cascading Style Sheets (CSS), a W3C standard since 1996. Some older browsers have a buggy implementation of CSS, and this can cause some things to appear off-kilter. If your browser is even older, or doesn't know about CSS at all (as in the case of Lynx, for example), it should have no problem.

If you actually clicked on a button, like the Search button or the Post button on the Volunteers' Web Board page, and nothing happened, you might be behind a proxy or web filter that doesn't like you making POST requests. If you have a web filter switched on, turn it off, reload the page and try again.

R.12. What does that thing about "Select FTP Site" mean?

Our texts are not actually held on the website. The website just holds an index; the files themselves are held on many sites throughout the world, called FTP sites. When you have found the book you're looking for, and you make that final click to get it, you're not actually talking to our website any more—you are transferred to the FTP site you selected. Some FTP sites are near you; some are far away. Some may be faster than others, even if they are about the same distance; some may have temporary technical problems.

You should usually select the FTP site nearest you. If you find you're having problems with that one, you can select another.

R.13. What exactly is an FTP site anyway?

FTP stands for File Transfer Protocol, one of the oldest and most reliable protocols of the internet. This is the method by which a file can be copied from one computer to another.

An FTP site, or FTP server, is a computer that holds files that people can upload and download. In the case of PG, when our texts are ready, the Posting Team upload them to two main FTP servers, <ftp://ftp.ibiblio.org> and <ftp://ftp.archive.org>, which serve as our master copies.

Other FTP sites around the world automatically download the files from these master sites, so they have a full set of PG publications for you to download. Because they only check for updates and new files at intervals, some FTP sites may be a day or two behind. Some FTP sites don't have space available for everything, so they may hold only the zipped versions of the files. But most FTP sites will have the entire PG collection. These are called FTP "mirrors", since they are a copy of the original.

Many FTP sites exist that offer a full PG mirror but are not on our FTP sites list. Commonly, these are in schools, where they serve the local students, but don't have enough bandwidth to offer downloads to worldwide users.

R.14. Can I become an FTP mirror?

Yes! We're always looking for more FTP mirrors.

If you manage an FTP site with a few GB of space, please check our Contact Information page </contactinfo.html> and contact the appropriate person, who will make the arrangements for you. If space is a problem, you can consider holding only zipped copies of the texts. We can move you up or down the FTP site list as you want more or less traffic.

R.15. Can I make a private FTP mirror for my school, library or organization?

Yes.

We like all FTP mirrors to be open to as many people as possible, but we know that not all schools have the resources to be a public mirror, so we welcome all mirrors.

And anyway, you don't even have to ask, because we don't control what happens to our texts once we post them!

R.16. When I clicked on the file I want, nothing happened.

When you select a file for download, your request goes to the FTP site you selected, not to our website. If the FTP site you selected is having problems, or if there is the Net version of a traffic jam between you and it, you may have problems downloading.

Select a different FTP site [R.12] and try again.

R.17. How many texts are downloaded through the web site?

We don't really do statistics, but in one particular month for which we did, we had a figure of about 800,000 searches completed. Since the final request for download goes to the FTP site selected and not to our website, we can't confirm that all of these were actually downloaded, but we expect that most people who have gone all the way through the search will finish the job.

In another month, we had about 1,000,000 downloads of files from ftp.ibiblio.org, our main FTP site. This does not count downloads from other FTP sites, of course. Why are there more downloads than searches? Because people who are already familiar with getting PG texts can skip the website search and download straight from the FTP sites.

R.18. What are the most popular books?

We very rarely do statistics, but on one occasion in late 1999 when we did, we found the top author searches to be:

  1 shakespeare
  2 poe
  3 doyle
  4 melville
  5 dante
  6 joyce
  7 shaw
  8 christie
  9 conrad
 10 porter
 11 verne
 12 hemingway
 13 darwin
 14 miller
 15 woolf
 16 zola
 17 king
 18 eliot
 19 churchill
 20 smith
 21 twain

and the top individual books searched for to the point of downloading were:

  1. Lady Susan, by Jane Austen
  2. 1st PG Collection of Edgar Allan Poe
  3. The Adventures of Sherlock Holmes, by Arthur Conan Doyle
  4. Moby Dick, by Herman Melville
  5. A Christmas Carol, by Dickens
  6. The King James Bible
  7. Twelve Stories and a Dream, by H.G. Wells
  8. Stories by Modern American Authors
  9. Lock and Key Library, Magic & Real Detectives
 10. [Hans Christian] Andersen's Fairy Tales
 11. The Legend of Sleepy Hollow, Washington Irving

These numbers vary a lot. When a movie based on a classic is released, downloads of that eBook go through the roof!

R.19. Should I download a ZIP or a TXT file?

If you know how to unzip a file, then downloading the zip is faster. For some non-text eBooks that contain multiple files, like HTML with included images, only a zip file may be available. For some other formats, like MP3 or MPEG, there may not be a zipped version available because the native format of the file is already compressed enough that zipping it doesn't save much.

R.20. I've got a ZIP file. What do I do with it?

Unzip it.

If you want a free program, you could try the open source Info-Zip
software available at
<http://www.ctan.org/tex-archive/tools/zip/info-zip/> for Mac, MS-DOS,
Unix, Windows and just about everything else you might have.

If you want a commercial program, PKZIP from <http://www.pkware.com> and WinZip from <http://www.winzip.com> are among many popular shareware utilities that allow you to unzip files.

Mac users using Stuffit Expander may like to set a preference (File / Preferences / Cross Platform) to "Convert text files to Macintosh format . . . When a file is known to contain text". This gets rid of strange characters (linefeeds), which are not wanted on a Mac, at the beginnings of lines. MacZip is another free program for Macs. Mac users can also try ZipIt or other shareware programs available from the Info-Mac archives, e.g. from <ftp://mirrors.aol.com/pub/info-mac/Compress&_Translate/>.

R.21. I tried to unzip my file, but it said the file was corrupt, or damaged.

The chances are that it didn't download correctly. Try downloading it again. If you don't succeed the second time, try downloading the unzipped version.
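
If you have Python available, you can check a downloaded zip before deciding to fetch it again. This is only a sketch using the standard zipfile module; the function name check_zip is made up for illustration, and some kinds of damage may raise an error instead of returning a verdict.

```python
import zipfile


def check_zip(path: str) -> str:
    """Return a short verdict on whether a downloaded zip looks intact."""
    if not zipfile.is_zipfile(path):
        return "not a zip file (or badly truncated)"
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the name of the
        # first one that fails its CRC check, or None if all pass.
        bad_member = archive.testzip()
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return "ok"
```

Calling check_zip on the file you downloaded returns "ok" when every member passes its check; any other answer suggests downloading again or taking the .txt version.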

R.22. I see gibberish onscreen when I click on a book.

To save download time, our etexts are stored in zipped form as well as text form. Zipped files are smaller, and take less time to transfer to your computer, but you need a program to unzip them. If you try to view a zipped file directly, it looks like gibberish.

You can recognize zipped files easily because their filenames end in .zip.

If this happens, either make sure you're asking your browser to Save the file rather than display it (often, you right-click the file and choose Save) or else click on the version of the file that ends in .txt instead of .zip. You don't need a zip program to view .txt files.

Looking at a zip rather than a text file is by far the most common reason for this problem, but there are some others. If you're quite sure that you're not looking at a zip file, then it could be that the file you downloaded is in a character set that your viewer doesn't recognize, like Big-5 [V.78] for Chinese texts, or Unicode [V.77]. If this is the case, you will have to find a viewer that works on your computer for the specified character set. We may also have an ASCII version of the same text available for you—we do try to have ASCII versions for everything [G.17], but some languages, like Chinese, just cannot be sensibly expressed in ASCII.

If you can see most of the characters, enough to be able to make out the text, but there are regular gibberish characters, black squares, empty boxes or obviously missing characters scattered about through words, then you are probably looking at an "8-bit" text [V.79], with accented characters, and your viewer doesn't handle the character set. See the FAQ "I can read the text file, but a few characters appear as black squares, or gibberish" [R.31].

If there are a very few gibberish characters, black squares or obviously missing characters in the text, then it's likely that this was intended to be a 7-bit text, but a few 8-bit characters like the British pound symbol or accented letters slipped through.

R.23. Can I download and read your books?

R.24. What am I allowed to do with the books I download?

No, and we don't want to!

Like any Internet transfer, our sites have to know the IP addresses that contact them; without that, no communication is possible. But we do not trace, hold or examine them beyond what is necessary to deal with any problems or maintain logs or statistics. We never identify IP addresses with people.

Further, we encourage people, sites, schools around the world to mirror, or copy, our texts to their sites. Once that happens, we have no control over them, and we never have any idea who or even how many people access them after that.

Even further, we encourage people to distribute the texts on disks, CDs, paper, and any other storage format they can find. We encourage them to convert the texts to other formats, and share them.

For most people reading this, anonymity is probably not an issue, but you may live in a place or time where reading Paine, or Voltaire, or the Bible, or the Koran, is considered suspicious or even subversive. We don't know who you are, and what we don't know, we can't tell.

Currently (mid-2002), by means of DRM (Digital Rights/Restrictions Management) many commercial publishers can make a list of exactly who is reading which of their eBooks. We don't know, and we don't want to know.

The first thing to remember is that the people who actually make the corrections you suggest are very experienced, and are used to seeing lots of different types of errata reports. So the exact format of your report isn't really very important—just get the report to us in any clear form that we can understand.

Beyond that, here are some tips to avoid misunderstandings.

It's always helpful if you report the full title, etext number, year and filename of the text you are correcting. We have multiple editions and versions of some texts, like Homer's "Odyssey", and unless you tell us exactly what text you mean, we may have to spend some time searching and guessing.

Especially, please check and report the exact filename of the text. It is amazingly common for people to report problems with abcde10.txt, when abcde11.txt is already posted, and has these and other errors already fixed.

When there are only a few errors, it's usually easiest to cut and paste the line or lines where the error is into your e-mail, with your comment.

It can also be useful to give the line number of the place where the error is, and some people who check texts regularly do this. If this seems natural to you, do it; if it doesn't, don't.
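
If you do want line numbers and your editor doesn't show them, a few lines of Python can find them. This is a minimal sketch; find_lines is a hypothetical helper, and it counts lines from 1 in the file exactly as downloaded.

```python
def find_lines(text: str, phrase: str) -> list[int]:
    """Return the 1-based numbers of every line that contains phrase."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if phrase in line
    ]


sample = "Ithaca yet stands.\nBut I wouldask thee, friend,\nconcerning thy father.\n"
print(find_lines(sample, "wouldask"))  # [2]
```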

An ideal report for a typical errata list might look like:

    Title: The Odyssey, by Homer
           Translated by Butcher & Lang
           April, 1999 [Etext #1728]
    File: dyssy08.txt

 Line 884:
   back Telemachus, who bas now resided there for a month.
     "bas" should be "has"

 Line 1491:
   Ithaca yet stands. But I wouldask thee, friend, concerning
      "would" and "ask" are run together here

 Line 1563:
   in his father's seat and the elders gave place to him
      This is the end of a paragraph, and needs a period at end.

 Line 15346-7:
    'Hearken to me now, ye men of Ithaca, to the
    will say. Through your own cowardice, my friends, have
       I think there is something missing between "the" and "will"

But the following would get the job done as well:

    In Homer's Odyssey, translated by Butcher and Lang, from /etext99,
    file dyssy08.txt, I found the following errors:

    Telemachus, who bas now resided
    change "bas" to "has"

    But I wouldask thee,
    "would ask" run together

    and the elders gave place to him
    needs period

    ye men of Ithaca, to the
    will say.
    line missing between "the" and "will"?

Where there are more than a few changes, it may be easiest all round just to submit a corrected version of the file. However, if you do this, please do not re-wrap the paragraphs unless it is really necessary; we need to check your suggestions before reposting, and if the file is very different, it is difficult and time-consuming for us to find your real changes among all of the changes in the lines.

The Posting Team, who post the books, also make the corrections, and ultimately, the corrections need to go to them.

Many producers put their e-mail addresses in their texts, specifically so that readers can contact them when errors are found. If you see that in your text, you should try to contact the producer first. This is especially true if the corrections aren't obvious, as in the case of missing words. The producer is likely to have the original book, and will probably be able to confirm your corrections without visiting a library. If the book needs the corrections, the producer can then notify the Posting Team.

If you get no response from the producer, or if there is no e-mail address listed, or if the corrections are small and obvious, you can send them to any or all of the Posting Team directly.

R.28. I've reported some typos. What will happen next?

This varies wildly. Sometimes, you may just get a response e-mail in a day or three saying thanks, and that we've fixed the typo. This is normal when you've just reported one or a few obvious typos.

Where there is some text missing, or the changes you suggest are otherwise not obvious, we may have to find someone with an eligible copy of the book to confirm the changes, and that might take time. Normally, you will get an e-mail explaining that within a week.

Sometimes, even though you've noticed only one or two small typos, one of the Posting Team who was looking at it may find many more, and decide that the whole text needs to be re-proofed. This may also take time.

If the text needs a lot of changes, we may post a new EDITION [R.35] of it, with a new filename: e.g. abcde10.txt may become abcde11.txt. In this case, you will receive a copy of the e-mail sent to the posted list announcing the new file. Our current rule of thumb is that we create a new edition when we make twelve significant changes, but we judge each on a case-by-case basis, and especially will usually not make a new edition if the original was posted recently.

R.29. I've got the text file, and I can read it, but it seems to be double-spaced or it has control characters like ^J or ^M at the end of every line.

This is most often seen on Mac or Linux. If you want to dig into why this effect happens, see the FAQ "Why use a CR/LF at end of line?" [V.85].

Perhaps viewing it in a different editor or viewer will help, but it's usually easiest just to globally replace all of the control characters (if you see them) with nothing, or to replace all double line-ends with single line-ends.
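
For those comfortable with a little scripting, the global replace can be sketched in Python as below. The function name is hypothetical, and the sketch assumes you want Unix-style line ends (a single LF); it works on raw bytes so that it does not depend on the character set of the text.

```python
def to_unix_line_ends(data: bytes) -> bytes:
    """Turn CR/LF (and any stray lone CR) line ends into plain LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(to_unix_line_ends(b"Call me Ishmael.\r\nSome years ago\r\n"))
# b'Call me Ishmael.\nSome years ago\n'
```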

R.30. When I print out the text file, each line runs over the edge
      of the page and looks bad.

This is the most widely accepted format for text files, but it's not ideal on all computers and all programs. 70 characters per line means that if you are using an unusually large or small font to print it, lines may wrap around or not reach across the page. The hard return means that on some systems, the lines may appear double-spaced.

Unfortunately, we can't advise you how best to format texts on all systems, mostly because we don't know every system! Here are a couple of tips you might try:

If your font is too big or too small, try setting the font to Courier size 10 or Times size 12. It may not be ideal, but it mostly works.

In a word processor, you may be able to remove the Hard Returns, but
beware! if you remove too many, the whole text will become one
paragraph. One common formula for removing the HRs goes like this:
 1. First, all paragraphs and separate lines should be separated
    by two HRs, so that you can see one blank line between them.
    Where they aren't, as in the case of a table of contents or
    lines of verse, add the extra HRs to make them so.
 2. Replace All occurrences of two HRs with some nonsense character
    or string that doesn't exist in the text, like ~$~.
 3. Replace All remaining HRs with a space.
 4. Replace your inserted string ~$~ with one HR.
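
The same formula can be sketched in Python for anyone who prefers a script to a word processor. The function name and the default marker are just illustrations, and step 1 (making sure separate lines are divided by a blank line) still has to be done by hand first.

```python
def remove_hard_returns(text: str, marker: str = "~$~") -> str:
    """Unwrap paragraphs using the marker trick described above."""
    if marker in text:
        raise ValueError("marker already occurs in the text; pick another")
    text = text.replace("\r\n", "\n")     # treat CR/LF as one hard return
    text = text.replace("\n\n", marker)   # step 2: protect paragraph breaks
    text = text.replace("\n", " ")        # step 3: remaining HRs become spaces
    return text.replace(marker, "\n")     # step 4: restore one HR per break


wrapped = "The quick\nbrown fox.\n\nA second\nparagraph."
print(remove_hard_returns(wrapped))
# The quick brown fox.
# A second paragraph.
```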

R.31. I can read the text file, but a few characters appear as black
      squares, or gibberish.

The text is using some character set that your editor or viewer isn't.
For example, the text is using ISO-8859-1, and your viewer is using
Codepage 850—or vice versa. You can see the plain ASCII characters,
but non-ASCII characters like accented letters display as nonsense.

Look at the top of the file for a clue to the character set encoding: if it's there, it may help you to find which editor, or font, or viewer you should be using.
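
To see why the same file can look right in one viewer and wrong in another, here is a small Python sketch. The sample bytes and the helper name recode are made up for illustration; "iso-8859-1" and "cp850" are the codec names Python uses for the two character sets mentioned above.

```python
def recode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from one character set to another."""
    return data.decode(source).encode(target)


raw = b"caf\xe9"                      # "cafe" with an accented e, stored in ISO-8859-1
right = raw.decode("iso-8859-1")      # decoded with the matching character set
wrong = raw.decode("cp850")           # same byte, mismatched character set
as_cp850 = recode(raw, "iso-8859-1", "cp850")  # bytes a Codepage 850 viewer expects
```

Here right holds the word as intended, wrong holds the same letters with a different character in place of the accented e, and as_cp850 is the converted copy that a Codepage 850 viewer would show correctly.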

R.32. Can I get a handheld device for reading PG texts? Which device should I get?

To read eBooks on a handheld, you need three things: the eBook content itself (which you can get from PG and other sites), a device (which I will sometimes call a PDA, even though technically, the RocketBook isn't a PDA) and the reader software that runs on the PDA.

In mid-2002, there are three main families of handheld devices people use for reading eBooks: Palms, Pocket PCs and RocketBooks (or their successor, REB1100s). In general, it is possible to use any of these in combination with any common type of personal computer.

Palms are very common, especially when you count not just the Palm <http://www.palm.com> itself, but PalmOS-based devices from other manufacturers, like:

the Franklin eBookman <http://www.franklin.com/ebookman/>, the Handspring Visor <http://www.handspring.com>, the Sony Clie <http://www.sony.com> and

The RocketBook, and its successor the Gemstar REB1100, <http://www.gemstartvguide.com> are quite different from the others. These were built specifically for reading eBooks, and do not have additional functions. They are not, technically, PDAs. Their screens are bigger, and excellent for reading, but do not offer color. They also don't offer a choice of readers—the dedicated reader is built-in to the device. Both of them require the eBooks you load to be formatted for their reader, and files made for them usually have the extension .rb for RocketBook. The REB1100 does not come with the RocketLibrarian, which is the program you run on your PC to turn an etext into a RocketBook file, but people are still making .rb files, and the RocketLibrarian is still available and popular among an enthusiastic group of Rocket users. (The REB1200 is entirely different from the REB1100, and, as far as we know, PG etexts cannot easily be transferred to it.)

In summary, the Rocket/REB1100 is a dedicated reader, with a good screen, but limited in what it does.

Palms are relatively cheap and common, with a wide range of options, and the capacity to function as PDAs as well. They can run all common readers except the Microsoft one.

The iPaq <http://www.compaq.com> has a good color screen, but is
bulkier than a Palm, and can run lots of readers, including the
Microsoft one, but not all Palm readers are available for Pocket PC.
Like Palms, the iPaq can do other jobs besides displaying eBooks.

Different people make different choices among these for reading their eBooks, and they all work well; it's a matter of personal taste.

R.33. How can I read a PG eBook on my PDA (Palm, iPaq, Rocket . . .)?

To read a book on your PDA, you need to get the file into a format that your reader software understands. Each PDA reader program will work only with a specific format of file. Some will read several formats, but, in general, it's a jungle of competing options.

Unless you use a Rocket or REB1100, you will need to install at least one reader program, and many veteran readers install two or three to deal with different formats. There are many of them available. In a recent internal poll of Gutenberg volunteers who use PDAs,

 C Spot Run <http://www.32768.com/bill/palmos/cspotrun/index.html>,
 Mobipocket <http://www.mobipocket.com>,
 PalmReader <http://www.peanutpress.com/>
 Plucker <http://www.plkr.org>

were our favored choices for reader programs.

Further, the process may be different depending on which reader software you're using. Each format that a reader understands has one or more converter programs that run on your PC, and turn the plain text file into that format. So in general, you have to:

 1. Download the PG text
 2. Edit the text for the layout the converter wants (often HTML).
 3. Use the converter to create a file of the format the reader wants.
 4. Transfer the converted file to your PDA.

If all this sounds too complicated, remember that many people take and convert PG texts into many formats, and offer them for download from their sites. Of course, there is no guarantee that someone will have converted the particular eBook you want, but there are lots of options. Try Blackmask <http://www.blackmask.com>, which lists thousands of texts already converted for Mobipocket, iSilo, RocketBook and the Microsoft Reader.

There are many other sites that serve pre-converted PG texts.

MemoWare <http://www.memoware.com> is also a useful resource for converted eBooks, and has lots of information, including an excellent map of the readers and formats jungle at <http://www.memoware.com/mw.cgi/?screen=help_format>

Tecriture <http://www.tecriture.net> hosts a service that downloads and converts PG texts on the fly, and delivers them straight to you.

If you're "rolling your own", you'll probably need to convert our plain texts to HTML at some point, because a lot of converters require HTML as input, and this is a common theme in readers' explanations of how they get texts onto their PDAs. Don't panic! You don't have to be a HTML wizard to do this—in fact, you don't need to know anything about HTML at all! Usually, it's just a matter of removing some line ends and Saving As HTML. You won't get a lot of fancy markup, or images out of thin air, but you will get the book.

One of the main things you usually have to do in making HTML is unwrap the lines. If you're making your HTML manually, this is usually done by replacing two paragraph marks with some nonsense marker like @@Z@@, replacing all single paragraph marks with a space, and replacing the nonsense marker with a paragraph mark. After unwrapping, the text can just be Saved As HTML.
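
If you would rather script this than do it in a word processor, a bare-bones version might look like the Python sketch below. It is not how GutenMark or any other converter works; text_to_html is a hypothetical name, and instead of the nonsense-marker trick it splits the text on blank lines, which gives the same unwrapped paragraphs.

```python
import html
import re


def text_to_html(text: str, title: str = "Untitled") -> str:
    """Unwrap plain-text paragraphs and wrap each one in <p> tags."""
    text = text.replace("\r\n", "\n")
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n".join(
        # split() then join() collapses line ends and repeated spaces
        # into single spaces; escape() protects &, < and > in the text.
        "<p>" + html.escape(" ".join(p.split())) + "</p>"
        for p in paragraphs
    )
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n"
        "<body>\n" + body + "\n</body></html>\n"
    )
```

As with the manual method, verse and tables of contents will be run together unless their lines are separated by blank lines first.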

There are some applications that specifically assist with auto-converting text into HTML:

GutenMark <http://www.sandroid.org/GutenMark> was specifically written for the purpose, and knows enough about PG conventions to do a very good job.

InterParse <http://www.interparse.com> is a Windows-based generic text parser that is very easy and intuitive to use.

The World Wide Web Consortium lists some other options at <http://www.w3.org/Tools/Misc_filters.html>

If you're using a RocketBook or REB1100, you don't have either the choices or the confusion to deal with. One of our volunteers who uses a RocketBook offered this recipe for getting a PG text onto a RocketBook:

On converting to Rocket:

 1. Download text file.
 2. Using your utility for showing formatting, enter your word
       processing program's edit mode.
 3. Replace all double paragraph marks with some nonsense sequence
       that can't possibly actually be there, such as @@Z@@.
 4. Replace all single paragraph marks with one single space
       (enter).
 5. Replace your nonsense sequence with one paragraph mark.
 6. Convert all your double spaces to single spaces. Repeat this
       until you get "0" for how many replacements were made.
 7. Save in HTML.
 8. Go into your Rocket Librarian. Use "import file using Rocket
       Librarian." Go and pick up the file, which will be automatically
       converted to .rb in this process.

This sounds long, but it usually takes me under three minutes except for a very long text. I've never taken longer than five minutes. You can just go in and pick up the text file with Rocket Librarian, but what you get onscreen doing this looks very odd. Steps 2-7 are not essential, and if I'm in a hurry to read something once I might skip them, but if it's something I know I want to keep I use them.

This formula is not ideal for poetry or blank verse—if you want to keep the lines unwrapped, you should avoid removing the paragraph marks.

Another volunteer, who reads on Mobipocket <http://www.mobipocket.com> offered this suggestion:

I use the MobiPocket Publisher, available free from www.mobipocket.com. It wants to take an HTML file as input, so the first thing I have to do is convert my PG text to HTML.

I usually do this by running GutenMark, available at <http://www.sandroid.org/GutenMark>. I can also do it in Microsoft Word using the following sequence:

Edit / Replace / Special and choose Paragraph Mark twice (or, from replace, you can type in ^p^p to get two Paragraph Marks) and replace with @@@@. Replace All. This saves off real paragraph ends by marking them with a nonsense sequence.

Now Replace one Paragraph Mark (^p) with a space. Replace All. This removes the line-ends.

Finally, replace @@@@ with one Paragraph Mark. Replace All. This brings back the Paragraph Ends.

Now I can Save As HTML.

GutenMark does a better job of converting to HTML than my simple Word formula, since it recognizes standard PG features, and sometimes Mobipocket doesn't like the HTML produced from Word—it complains of a missing file, or doesn't recognize quotation marks.

I recently came across InterParse 4 at <http://www.interparse.com>. It doesn't have the built-in knowledge of GutenMark, so the results aren't as good, but it's really easy to use, and you can see the effect of your changes onscreen as you do it. For most PG books, all you have to do is just Open the text file and choose Options / Remove all CRLFs (Except at Paragraph End), then Convert / Text to HTML and Save As the HTML filename you want. Quick and painless.

About the Files:

R.34. What types of files are there, and how do I read them?

The vast majority of our files are plain text. You can read these with any editor or text viewer or browser. Some are HTML. You can read these with any browser.

For a full listing of other file types as of mid-2002, and how to read them, please see the Formats FAQ [F.2].

R.35. What do the filenames of the texts mean?

PG files are named for the text, the edition, and the format type.

As of February, 2002, all PG files are named in "8.3" format—that is, up to eight characters, a dot, and three more characters.

The first five characters in the filename are simply a unique name for that text, for example, "Ulysses" by Joyce begins with "ulyss".

If the text has been posted as both a 7-bit and 8-bit text, then the
first character of the filename will be a 7 or an 8, to indicate that.
For example, we have both 7crmp10 and 8crmp10 for Dostoevsky's
Crime and Punishment.

The 6th and 7th characters of the name are the edition number—01 through 99. We normally start at edition 10 (1.0); numbers lower than that indicate that we think the text needs some more work; numbers higher than that mean that someone has corrected the original edition 10.

The 8th character of the filename, if it exists, indicates either the version or the format of the file. When we get a different version of the text based on a different source, we give it an a, b, c, as for example if the text is from a different translation. Where we have posted a text in a different format, we also add an eighth character—"h" for HTML, "x" for

So, for example:

  7crmp10 is our first edition of Crime and Punishment in plain ASCII
  8sidd10 is our first edition of Siddhartha, as an 8-bit text
  dyssy10b is our first edition of our third translation of Homer's
           Odyssey, in plain ASCII
  jsbys11 is our second edition of Jo's Boys, in plain ASCII
  vbgle10h is our HTML format of our first edition of Darwin's
           Voyage of the Beagle
  7ldv110 is our 7-bit ASCII version of the first volume of the
           Notebooks of Leonardo da Vinci

To make it worse, we don't always stick to these rules, for example:

  1ddc810 is our first edition of the first book of Dante's
          Divina Commedia in Italian, as an 8-bit text
  80day10 is our first edition of Verne's Around the World in 80 days,
          in plain 7-bit ASCII in English.
  emma10 is our first edition of Jane Austen's "Emma"—with a
          4-character basename instead of 5.

Some series have special, non-standard names. Shakespeare is named with a digit representing the overall source (First Folio, etc), then "ws", then a series number, so for example 0ws2610, 1ws2610 and 2ws2610 are all versions of "Hamlet". The Tom Swift series is named with a two-digit prefix denoting the series number, then "tom", so for example 01tom10 is "Tom Swift and his Motor-Cycle".

And what should we do with a text from a different source that is formatted as HTML? For example, if dyssy10b is the name of the third translation, what should the HTML version be named? dyssy10bh is obvious, but it uses 9 characters.

The problem, of course, is that we are trying to fit a lot of information into an 8-character filename. As the collection grows, and the number of formats and versions increases, the pressure on filenames increases too. So while the filename is a good guide to the contents, it's not definitive.
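For readers who like to see rules as code, here is a minimal sketch in Python of how the usual naming pattern could be decoded. It is an illustration only, not a tool that PG uses: the function name parse_pg_name and the ParsedName fields are invented for this example, the leading 7 or 8 is treated as a guess, and only the "h" suffix and the a, b, c version letters are decoded.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Basename, then a two-digit edition, then an optional trailing letter.
# Using fullmatch forces the edition to be the last two digits in the name.
_PATTERN = re.compile(r"(?P<base>[a-z0-9]+?)(?P<edition>[0-9]{2})(?P<suffix>[a-z]?)")


@dataclass
class ParsedName:
    basename: str                  # e.g. "7crmp" or "dyssy"
    edition: str                   # "10" is shown as "1.0", "11" as "1.1"
    encoding_hint: Optional[str]   # guessed from a leading 7 or 8; can be wrong
    suffix: str                    # the optional 8th character, or ""
    suffix_meaning: str


def parse_pg_name(name: str) -> ParsedName:
    """Decode a PG-style filename such as 'dyssy10b' or '7crmp10.txt'."""
    stem = name.lower().split(".")[0]  # drop any extension
    match = _PATTERN.fullmatch(stem)
    if match is None:
        raise ValueError(f"{name!r} does not fit the usual pattern")
    base, edition, suffix = match.group("base", "edition", "suffix")

    # A leading 7 or 8 usually marks 7-bit or 8-bit text, but not always.
    encoding_hint = {"7": "7-bit ASCII", "8": "8-bit"}.get(base[0])

    if suffix == "":
        meaning = "no suffix"
    elif suffix == "h":
        meaning = "HTML format"
    elif suffix in "abc":
        meaning = "version from a different source"
    else:
        meaning = "other format or version letter (not decoded here)"

    return ParsedName(base, f"{edition[0]}.{edition[1]}", encoding_hint, suffix, meaning)
```

With this sketch, parse_pg_name("dyssy10b") reports the basename dyssy, edition 1.0 and a version suffix b. The exceptions show its limits: for 80day10 it would guess "8-bit" from the leading 8, although that file is 7-bit ASCII, which is exactly why the filename is a guide and not a guarantee.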

R.36. What is the difference within PG between an "edition" and a "version"?

We give the name "edition" to a corrected file made from an existing PG text. For example, if someone points out some typos in our file of "War and Peace", we will fix them, and, if enough are found to warrant a "new edition", then instead of just replacing the file wrnpc10.txt, we may make a new file wrnpc11.txt, and leave the original alone. A new edition is always filed under the same year and etext number as the original—it's just an update.

We give the name "version" to a completely independent e-text made from the same original book, but a different source. For example, Homer's Odyssey was translated by many different people, but they all worked from the same book. The translations by Lang, Butler, Pope and Chapman are very different, but they all come from the same root.

Thus, these are all "versions" of Homer's Odyssey. We give them all the same basename, dyssy. Each gets a new etext number, and we add a letter to the filename to indicate that they are "versions" of the same original book:

  dyssy10.txt   Butler's Translation
  dyssy10a.txt  Butcher & Lang's Translation
  dyssy10b.txt  Pope's Translation

The differences don't have to be as extreme as this for us to create a new version. "Clotelle"/"Clotel", for example, was a book published multiple times in English by William Wells Brown, and each time, he changed the text. We preserve three different texts of the same book as different versions: clotl10, clotl10a and clotl10b.
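The edition/version distinction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PG's own tooling: the function name latest_editions is invented, the filenames are assumed to be plain-text files that follow the usual pattern, and any trailing letter is assumed to be a version letter and not a format letter such as "h".

```python
import re
from collections import defaultdict

# basename, two-digit edition, optional version letter, then ".txt"
NAME = re.compile(r"([a-z0-9]+?)([0-9]{2})([a-z]?)\.txt")


def latest_editions(filenames):
    """Map (basename, version letter) -> highest edition number seen."""
    latest = defaultdict(int)
    for filename in filenames:
        match = NAME.fullmatch(filename.lower())
        if match is None:
            continue  # not in the usual pattern; skip it and do not guess
        base, edition, letter = match.groups()
        key = (base, letter)
        latest[key] = max(latest[key], int(edition))
    return dict(latest)


files = ["wrnpc10.txt", "wrnpc11.txt", "dyssy10.txt", "dyssy10a.txt", "dyssy10b.txt"]
print(latest_editions(files))
```

Here wrnpc10 and wrnpc11 collapse into one entry, because they are two editions of the same text, while the three dyssy files stay separate, because they are three versions.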

R.37. What is the difference between an "etext" and an "eBook"?

If there is any, it seems to be in the eye of the Marketing Department! Michael Hart started the whole thing, and coined the word "Etext". The term "eBook" is gaining in popularity, even for texts that are not full books, so we've started using that more now.

R.38. What are the "Etext/Ebook numbers" on the texts?

These are simply a series of numbers. We give one to each etext as it is posted, so the earliest etexts have low numbers and later etexts have higher numbers. Etext number 1 is the Declaration of Independence, the first text that Michael Hart typed in to the mainframe that he was using in 1971.

A few numbers are reserved for books that we hope to have in the PG archive someday; for example, 1984 is reserved for Orwell's classic.

When we improve a text by making some corrections, we call it a new EDITION, and it keeps the same etext number, but when we post a different VERSION of the same text, from a different paper book (like different translations of Homer's Odyssey), each new version gets a new etext number.

R.39. What do the month and year on the text mean?

The fact that we're so far ahead of schedule makes this quite confusing for newcomers. If it bothers you, just don't think about it! But at least it's better than being behind schedule. We didn't always produce so many books. In the September 1994 newsletter, Michael Hart wrote:

As always, I am terrified of the prospect of doubling our output to 16 Etexts per month for next year, we really need your help!!!

That was when the Project's target was 8 Etexts per month. Today, our target is heading towards 8 eBooks per day!

Copyright FAQ

C.1. What is copyright?

Copyright is a limited monopoly granted to the author of a work. It gives the author the exclusive right, among other things, to make copies of the work, hence the name.

C.2. Does copyright differ from country to country? From state to state?

Copyright laws are constantly changing all over the world. Each country has its own copyright laws, some within the framework of international treaties, some not. Within the U.S., copyright laws are federal, and do not vary from state to state.

C.3. What are the copyright laws outside the U.S.?

Sorry, we can't advise on copyright law outside the U.S. We can point you to resources like <http://onlinebooks.library.upenn.edu/okbooks.html> which tries to summarize the various copyright regimes, but we can't guarantee that these are accurate. Even when they are accurate, it is very hard to express some of the subtleties of copyright law in a summary—for example, the question of what constitutes "publication" for copyright purposes is sometimes unclear.

C.5. I don't live in the U.S. Do these rules apply to me?

Your country's copyright laws are different from those in the U.S., and understanding and dealing with them is up to you. If you have a book that is in the public domain in your country, but not in the U.S., it is perfectly legal for you to publish it personally there, but we can't.

Similarly, it may be legal for us to publish it here, but not for you to publish it, or perhaps even copy it, where you are.

C.6. What is the public domain?

The public domain is the set of cultural works that are free of copyright, and belong to everyone equally.

C.7. What can I do with a text that is in the public domain?

Anything you want! You can copy it, publish it, change its format, distribute it for free or for money. You can translate it to other languages (and claim a copyright on your translation), write a play based on it (if it's a novel), or a novelization (if it's a play). You can take one of the characters from the novel and write a comic strip about him or her, or write a screenplay and sell that to make a movie.

You don't need to ask permission from anyone to do any of this. When a text is in the public domain, it belongs as much to you as to anyone.

(However, when some character or part of the work is also trademarked, as in the case of Tarzan, it may not be possible to release new works with that trademark, since trademark does not expire in the same way as copyright. If you propose to base new works on public domain material, you should investigate possible trademark issues first.)

C.8. How does a book enter the public domain?

A book, or other copyrightable work, enters the public domain when its copyright lapses or when the copyright owner releases it to the public domain.

U.S. Government documents can never be copyrighted in the first place; they are "born" into the public domain.

There are certain other exceptional cases: for example, if a substantial number of copies were printed and distributed in the U.S. before March, 1989 without a copyright notice, and the work is of entirely American authorship, or was first published in the United States, the work is in the public domain in the U.S.

C.9. How does a copyright lapse?

Copyrights are issued for limited periods. When that period is up, the book enters the public domain.

Copyrights can lapse in other ways. Some books published without a copyright notice, for example, have fallen into the public domain.

C.10. What books are in the public domain?

Any book published anywhere before 1923 is in the public domain in the U.S. This is the rule we use most.

U.S. Government publications are in the public domain. This is the rule under which we have published, for example, presidential inauguration speeches.

Books can be released into the public domain by the owners of their copyrights.

Some books published without a copyright notice in the U.S. prior to
March 1st, 1989 are in the public domain.

Some books published before 1964, and whose copyright was not renewed, are in the public domain.

If you want to rely on anything except the 1923 rule, things can get complicated, and the rules do change with time. Please refer to our Public Domain and Copyright How-To at </vol/pd.html> for more detailed information.
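The rules listed above can be lined up as a rough checklist. The Python sketch below only restates those rules as they stood in 2002; it is not how PG clears books, and a real clearance still needs the evidence described elsewhere in this FAQ. The function name and its parameters are invented for this example, and the March 1st, 1989 cutoff is simplified to whole years.

```python
def us_public_domain_hint(year_published, us_government=False,
                          released_by_owner=False,
                          us_published_without_notice=False,
                          renewed=None):
    """Return a rough 'yes' / 'maybe' / 'unknown' hint from the rules listed above.

    renewed: True, False, or None when the renewal status is not known.
    """
    if year_published < 1923:
        return "yes: published before 1923"
    if us_government:
        return "yes: U.S. Government publication"
    if released_by_owner:
        return "yes: released by the copyright owner"
    # The remaining rules only cover "some books", so they never say more than "maybe".
    # The real cutoff is March 1st, 1989; working in whole years ignores early 1989.
    if us_published_without_notice and year_published < 1989:
        return "maybe: published in the U.S. without a copyright notice"
    if year_published < 1964 and renewed is False:
        return "maybe: published before 1964 and not renewed"
    return "unknown: none of the listed rules clearly applies"


print(us_public_domain_hint(1894))
print(us_public_domain_hint(1950, renewed=False))
```

The order of the checks mirrors the advice in the text: the 1923 rule comes first because it is the one we rely on most, and everything after it is where things can get complicated.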

C.11. My book says that it's "Copyright 1894". Is it in the public domain?

Yes.

Its copyright date is 1894, which is before 1923, so its copyright has lapsed.

C.12. How can a copyright owner release a work into the public domain?

A simple written statement, which may be placed into the work as released, is sufficient. When a copyright holder places a book into the public domain and wants PG to publish it, all we need is a letter [V.70] saying that they are or were the holder of the copyright, and that they have released it into the public domain.

C.13. When is an author not the owner of a copyright on his or her works?

An author may sell, assign, license, bequeath or otherwise transfer his or her copyright to another party, such as a publisher or heir.

A book is eligible for inclusion in the archives if we can legally publish it.

We can legally publish any material that is in the public domain in the U.S. [C.10], or for which we have the permission of the copyright holder.

C.15. I have a manuscript from 1900. Is it eligible?

Maybe not.

Works that were created but not "published" before 1978 will not enter the public domain before the end of 2002. This gets complicated, and it's not too common. If you have such a case, ask about it.

A borderline example is the classic "Seven Pillars of Wisdom" by T. E. Lawrence, which was actually printed and privately distributed, but not "published", in 1922. We haven't been able to confirm any pre-1923 "publication" for this.

C.16. How come my paper book of Shakespeare says it's "Copyright 1988"?

Shakespeare was published long enough ago to be indisputably in the public domain everywhere, so how can a Shakespeare text be copyrighted?

There are two possibilities:

1. The author or publisher has changed or edited the text enough to qualify as a "new edition", which gets a "new copyright".

2. The publisher has added extra material, such as an introduction, critical essays, footnotes, or an index. This extra material is new, and the publisher owns the copyright on it.

The problem with these practices is that a publisher, having added this copyrighted material, or edited the text even in a minor way, may simply put a copyright notice on the whole book, even though the main part of it—the text itself—is in the public domain! And as time goes on, the number of original surviving books that can be proved to be in the public domain grows smaller and smaller; and meanwhile publishers are cranking out more and more editions that have copyright notices. Eventually it becomes harder and harder to prove that a particular book is in the public domain, since there are few pre-1923 copies available as evidence.

Among the most important things PG does is preventing this creeping perpetuation of copyright by proving, once and for all, that a particular edition of a particular book is in the public domain, so that it can never be locked up again as the private property of some publisher. We do this by filing a copy of the TP&V, the title page where the copyright notice must be placed, so that if anyone ever challenges the work's public domain status, we can point to a proven public domain copy.

C.17. What makes a "new copyright"?

1. New edition

When a text is in the public domain, anyone—from you to the world's biggest publisher—can edit it and republish the edited version. When the edits are substantial enough, the edited work is deemed a "new edition", and gets a new copyright, dating from the time the new edition was created.

How substantial must the edits be to qualify as a "new edition"? That is for a court to decide in any particular case. Changing some punctuation or Americanizing British spelling would not qualify a work for a new edition. Theorizing something about Shakespeare and rewriting lots of lines in "Hamlet" to emphasize your point would make a new edition. In between those extremes is a grey area, where each new edition would have to be considered on a case-by-case basis.

A special case, that isn't quite a new edition, is when someone "marks up" a public domain text in, for example, HTML. Where this happens, the text is in the public domain, but the markup is copyrighted. We've already seen that when an editor adds footnotes to a public domain text, he owns copyright on the footnotes but not on the text: similarly, when he adds markup to the text, he owns copyright on the markup.

2. Translation

Translation is a common and justified special case of a new edition. When someone translates a public domain work from one language to another, they get a new copyright on the translation (but not on the original, of course, which stays in the public domain so that lots more people can use it.)

C.18. I have a 1990 book that I know was originally written in 1840, but the publisher is claiming a new copyright. What should I do?

From a practical point of view, there's not much you can do about it. It's a Catch-22 situation: in order to prove that the new printing should be in the public domain, you need a provably public domain copy to compare against the allegedly copyrighted edition, and if you have that, you don't need the modern edition anyway.

C.19. I have a 1990 reprint of an 1831 original. Is it eligible?

Yes, as long as we can show that it is a reprint, which usually means that it has to say that it's a reprint somewhere on the TP&V.

However, we need to be very careful in a case like this. Commonly, the book itself is eligible, but introductions, indexes, footnotes, glossaries, commentaries and other such extras may have been added by the modern publisher, so you should not include them except where you can prove that they are part of the reprinted material.

C.20. I have a text that I know was based on a pre-1923 book, but I don't have the title page. Can I submit it to PG?

Unfortunately, no.

What you "know" isn't proof that we could take into court if we were challenged about it in 20 years, and the whole problem of "new copyright" [C.17] makes it effectively impossible to tell for sure what is and isn't copyrighted anyway, without reliable evidence like the title page.

You need to find a matching paper edition for proof. See the FAQ "I've found an eligible text elsewhere on the Net, but it's not in the PG archives. Can I just submit it to PG?" [V.62]

Usually, we just look at the TP&V. If it was published before 1923, or says it is a reprint of a pre-1923 edition, that's all we have to do.

In other cases, we may look up library publication data to prove, say, that a book published in the U.S. without a copyright notice was indeed published in the years when a copyright notice was required. Or we may simply see that a particular text was published by the U.S. Government.

The bottom line is the question: if someone comes to us claiming to hold the copyright on a text, do we have proof to show that they're wrong?

Whatever proof or search we have to do, we then file it, either on paper or electronically, so that the proof will be available in 20 or 50 years' time, or whenever the challenge is made.

C.22. I want to produce a particular book. Will it be copyright cleared?

If it was published before 1923, you will have no problem with its clearance. If you're relying on one of the other rules, it may just be too much work to try and prove its public domain status.

C.23. I have some extra material (images, introduction, preface, missing
      chapter) that should go into an existing PG text. Do I have to
      copyright-clear my edition before submitting it?

Yes.

Otherwise we would have no proof that the extra material you're adding isn't copyrighted by someone. It's quite common for modern publishers to add introductions or illustrations to a public-domain novel, and we need the same standard of proof for these additions that we do for the main text.

This doesn't apply to an occasional word or two that was omitted by mistake when the text was first typed. For example, you don't need to clear another edition just to restore the words "thus perfected the" and "eliminating all" to the sentence:

And while we Country, we were also sorts of tediums, disputable possibilities, and deadlocks from the game.

while fixing typos.

These copyrighted PG publications can still be copied, but the permissions granted are spelled out in their headers, and usually forbid anyone to republish them commercially.

C.25. What are "non-renewed" books?

Works published before 1964 needed to have their copyrights renewed in their 28th year, or they'd enter into the public domain. Some books originally published outside of the US by non-Americans are exempt from this requirement, under GATT. Some works from before 1964 were automatically renewed.

As of mid-2002, you probably can't. Because of all of the checks we need to do to ensure that the book wasn't renewed, or wasn't one of the exceptions that was automatically renewed, we just don't have the time to do it. But we're working on it. Right now, we're processing copyright renewal records with the aim of making them searchable.

Volunteers' FAQ

About the Basics:

What you actually need to do to produce a PG text can be stated very simply:

1. Borrow or buy an eligible book.
2. Send us a copy of the front and back of the title page.
3. Turn the book into electronic text.
4. Send it to us.

That's it! All the rest of the producing parts of the FAQ are about the details of how different people approach these steps.

Different people find their own ways into PG work, and once in, find their own niches. If you have your own ideas, don't let anything here stop you from pursuing them.

Some people just read the FAQs, go up to their attic, pull an eligible book off the shelf, send TP&V [V.25] in, and start typing or scanning. Next time we hear from them is when they send in [V.46] the completed eBook for posting. It can be as simple as that.

Some people just download existing PG texts, re-proof them very carefully and send in corrections.

Some people find regular collaborators through gutvol-d or the Volunteers' Board or the distributed proofing sites, earn a reputation as reliable proofers, and continue working as proofers.

Most people start small, and after a little experience of distributed proofreading or other proofing, begin their PG career as producers.

If you're a typist, cheer now, because you can ignore all the complicated paraphernalia of computer interfaces, and scanners, and the quality of OCR software and the mistakes it makes. You can just sit down at the keyboard with your eligible [V.18] book.

If you're not a typist, start thinking about scanners. It may be a while before you're ready to start scanning for yourself, but it's never too early to find out about them.

Orientation.

Absolutely everyone—scanners, typists, proofers—should first spend some time working on a distributed or co-operative proofing project. This will allow you to get a feel for what happens in making an etext from paper pages without committing you to more than a few hours' work.

This is not in any way an institutional requirement, since we don't have any institutional requirements, but it is very good advice. Many volunteers start eagerly, wanting to do lots of PG work, and then drop out because they took on too much, too fast, without understanding the nature of the work. Don't let that happen to you. Take it in small chunks.

Check out these distributed proofing sites:

Charles Franks: <https://www.pgdp.net/>
JC Byers: <http://www.wollamshram.ca/1001/index.htm>
Dewayne Cushman: <http://www.metalbox.net/dcushman/pgroot.htm>

and spend a few hours over a couple of weeks just processing some pages for real.

While you're doing that, you should also join a couple of PG mailing lists [V.12]—gutvol-d and either the weekly or monthly Newsletter list. Reading these will start to get you connected to what's going on. Browse the Volunteers' Board—there may be some offers going, and there's a lot of experience captured in some of those "back-issues", so don't confine yourself to the front page.

Have a look at our In-Progress List and some lists of suggestions from others [B.4].

Look at sites like Blackmask <http://www.blackmask.com> and Pluckerbooks <http://www.pluckerbooks.com/> and Memoware <http://www.memoware.com> and Bookshare <http://www.bookshare.org> to learn how our work is being used as a basis and copied and converted and amplified in many other projects.

 The Gift of the Magi, by O. Henry.
 The Lady, or the Tiger?, by Frank R. Stockton
 A Christmas Carol, by Charles Dickens
 Alice in Wonderland, Lewis Carroll
 Anne of Green Gables, by Lucy Maud Montgomery
 The Marvelous Land of Oz, by L. Frank Baum
 A Princess of Mars, by Edgar Rice Burroughs
 Heidi, by Johanna Spyri
 A Connecticut Yankee in King Arthur's Court, by Mark Twain
 Black Beauty, by Anna Sewell
 Tarzan of the Apes, by Edgar Rice Burroughs
 Tom Swift and his Motor-Cycle, by Victor Appleton
 Rebecca Of Sunnybrook Farm, by Kate Douglas Wiggin
 Little Lord Fauntleroy, by Frances Hodgson Burnett
 Aesop's Fables
 Grimms' Fairy Tales
 The Art of War, by Sun Tzu
 Dracula, by Bram Stoker
 Swiss Family Robinson, by Johann David Wyss
 The War of the Worlds, by H.G. Wells

If you have a taste for detectives and mysteries, there's

 The Adventures of Sherlock Holmes, by Arthur Conan Doyle
 Monsieur Lecoq, by Emile Gaboriau
 The Mysterious Affair at Styles, by Agatha Christie
 Arsene Lupin, by Edgar Jepson & Maurice Leblanc
 Edgar Allan Poe's "The Gold-Bug" and
 "The Murders in the Rue Morgue" in The Works of Edgar Allan Poe V. 1

For the excessive buckling of various swashes, see:

 The Prisoner of Zenda, by Anthony Hope
 The Man in the Iron Mask, by Dumas, Pere
 The Three Musketeers, by Alexandre Dumas
 Treasure Island, by Robert Louis Stevenson
 The Scarlet Pimpernel, by Baroness Orczy

Effen youse got a hankerin' for a Western, there's:

 Riders of the Purple Sage, by Zane Grey
 The Virginian, Horseman Of The Plains, by Owen Wister
 Back to God's Country, By James Oliver Curwood
 Selected Stories by Bret Harte
 Jean of the Lazy A, by B. M. Bower

Or if you prefer your fiction more domesticated, there's:

 Little Women, by Louisa May Alcott
 Pride and Prejudice, by Jane Austen
 The Warden, by Anthony Trollope
 The Heir of Redclyffe, by Charlotte M Yonge
 Mother, by Kathleen Norris

For something to raise a smile, you can rely on:

 The Devil's Dictionary, by Ambrose Bierce
 The Wallet of Kai Lung, by Ernest Bramah
 The Importance of Being Earnest, by Oscar Wilde
 Three Men in a Boat, by Jerome K. Jerome
 Piccadilly Jim, by P. G. Wodehouse

If poetry is your thing, you have lots to choose from:

Now, that's just a handful from our over 5,000 eBooks, so don't tell me you can't find anything to read! If you do have ideas of your own, download GUTINDEX.ALL or PGWHOLE.TXT and browse through the whole list, or Browse by Author on the website at <http://promo.net/cgi-promo/pg/cat.cgi>.

Download a few. Read them on your PC, or reformat them and print them out, or convert them for your PDA. Get used to working with and formatting text. Look at the formatting decisions that earlier volunteers have made—they're not entirely consistent; different people make different choices, different books require different methods, and PG conventions have shifted slightly over the last 10 years—but they're all perfectly readable and convertible today.

If you find typos [R.26] in any of them, tell us! That's also a part of being a Gutenberg volunteer. Our eBooks improve with time!

If you're thinking of making the best use of your time looking for errors in posted texts, a good start would be to download 40 or 50 texts, and run a spelling checker and gutcheck [P.1] on them all, spending only 5 or 10 minutes on each. Having had a quick look at all of them, concentrate on the ones that seem to have most problems—where automated checkers see 10 problems, a careful human will usually be able to pick up 20.
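The triage idea above, a quick pass over many texts followed by careful work on the worst ones, might be sketched like this in Python. It is a stand-in only: it does not call a real spelling checker or gutcheck, the function name rank_by_suspect_words is invented, and known_words is assumed to be a word list that you supply yourself.

```python
import re


def rank_by_suspect_words(texts, known_words):
    """texts maps a name to its text; returns (name, count) pairs, worst first."""
    scores = {}
    for name, text in texts.items():
        count = 0
        for word in re.findall(r"[a-z']+", text.lower()):
            word = word.strip("'")  # drop quote marks around a word
            if word and word not in known_words:
                count += 1
        scores[name] = count
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


sample = {"text_a": "the cat sat", "text_b": "teh cat szt"}
print(rank_by_suspect_words(sample, {"the", "cat", "sat"}))
```

The count is only a pointer to where a careful human should look first; old books are full of legitimate words that no modern word list knows.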

Getting Productive

OK, so you've seen what etexts should look like, you know what we do, and proofing hasn't scared you off. It's time to step up and become a producer. If you're not a typist and you don't have a scanner, take a detour down to the Scanning FAQ [S.1] now, and come back when your scanner is set up. If you're a typist or you've already got a scanner, read on . . .

Get a book. Just do it, OK?

Ya gotta start somewhere, right? And finding an eligible book is definitely somewhere.

Finding an eligible book is a threshold for many beginning volunteers—it's the first major step on the way to producing. For a lot of people, it's also the toughest barrier they have to cross. Fortunately, the barrier is only psychological, and can be crossed in a few minutes.

It's an unfamiliar process, and one that a lot of beginners feel some anxiety about. Don't. It's quite straightforward: it's just buying a book—you've done that, haven't you? Don't over-think it, don't worry about whether you're making the "right" choice, don't spend months comparing lists and choosing. Just do it. Once you've got your first, you'll wonder what all the fuss was about. Thanks to the wonders of the internet, your book can be on its way to you in an hour if you have $20 to spend.

Typists blessed with a good local library don't even have to buy their books—they can just borrow one and type it up! (You may be able to scan a library book, but get some experience with scanning first, and avoid damage!)

Let's deal with the decisions and other issues of picking one.

Copyright

For your first book, don't try getting fancy with copyright issues. Choose one that was published before 1923, and you're in the clear for U.S. and PG copyright purposes. You can read the dates just as well as we can—with books printed before 1923, there are no hidden catches: "Pre-'23 is free". Just read the TP&V [V.25] of the book, and see that it was printed before 1923, and you have no problems. Of course, reprints [V.19] of books copyrighted pre-1923 (and various other cases) are also clear, but if you have any concerns, just stick to pre-'23 editions.

Which book?

The answer to this question is different for everyone, but see how much you agree with the following statements:

"I have a favorite book, and I'd really like to produce that."

Well, hey, this is no problem! You already know what you want.
Go check out whether the book is already on-line [V.29].

"I'd like to work on an important book, but I don't know which."

Well, everybody's definition of "important" is different, but some people have put their various ideas forward already; you can see whether you agree with them! The InProg List contains some, with the notation "Suggested book to transcribe" beside them. Steve Harris keeps a list of unproduced possibles at Steveharris.net. John Mark Ockerbloom's "Books Requested" page lists titles that people have asked for. [B.4] Your problem if you fall into this category is that other people probably wanted to produce "important" books too, and lots are already done.

"I just want an easy, trouble-free book to start with."

Your first book doesn't have to be War and Peace (we've already got that anyway!). Here's a tip: try looking for children's or what we would nowadays call "Young Adult" books. These are typically short, and may have large print, which makes life much easier if you're scanning. They age well: children's stories from a century or more ago are still readable and interesting to children today. We have many children's and YA eBooks: not just the classics like Grimm and Andersen and Heidi and Oz and Peter Pan and William Tell, but lesser-known but still enchanting stories like The Counterpane Fairy, or Lang's Fairy books. There are series, like the Motor Girls, or the (Country) Twins series, or the Bobbsey Twins. There is lots and lots of material here for you to start with, and these books are relatively plentiful, since they were made to take the kind of treatment children dish out, and many of them have been in school libraries or attics for years.

Whatever your choice, pick a book that you'll like; you'll be living with it up close and personal for a while. Light reading, adventure fiction, and books aimed at younger readers are safe first choices for most people. If you admire 19th Century scientists or scholars, and want to immortalize their work, great! But don't feel that you have to dive in at the deep end just because someone else wants you to.

Getting your book: a practical exercise

The Search

At this point, you've got a list of books—maybe just one, maybe several by an author or two, maybe just a genre like "Children's Books" with some specific ideas. Maybe your mind is still wide-open.

Before used booksellers had the Net, finding a particular old book was a daunting job. Booksellers had informal networks among themselves and exchanged catalogs so that each would know something about what was available elsewhere, but, for a buyer, finding a particular book was still hit-and-miss. Now, however, a number of large sites provide a service to booksellers, where they can list their inventories for people to search from anywhere.

So now we go hunt for them on the Net. No, you don't have to buy them on the Net—you can rummage in booksales and garage sales and used bookstores, and that's its own kind of fun, though on a physical hunt, what you need is to bring a long list of "already done" books with you. But even if you never buy over the Net, it's a vast source of information about what books are available, which are plentiful, and which are cheap. It gives you some experience of what to expect when you do your in-person browsing.

Here's a story of a typical Net-hunt. And you can follow along with it at home. :-) Your results, and the sites you end up at, will be different from mine, but even if you don't end up buying a book on this hunt, you'll get some experience of what's involved. C'mon, do it with me—see if you can find a better bargain!

I'm starting with two lists, and I'll follow up whatever seems promising. I'd like to spend about $20—might go to $30. Definitely not interested in $50 and up. I'm keeping in mind that I'll have to add a bit for delivery—usually up to $10 within the U.S., but can get expensive if you're in Perth, and ordering from a bookstore in Munich.

I'm also avoiding anything that might be tricky to clear on this search, and confining myself to books printed before 1923.

Of course, by the time you read this, some of these books may already have been produced, so if you're actually thinking of buying any, check carefully first!

My first shortlist consists of books that caught my eye from David
Price's In-Progress List, Steve Harris's site, and The On-Line Books
Requested page [B.4], and it reads:

    Louisa May Alcott: The Inheritance
    E. W. Hornung: Irralie's Bushranger
    E. W. Hornung: Stingaree
    A. A. Milne: The Dover Road
    A. A. Milne: Once on a Time
    Samuel Richardson: Pamela
    Oscar Wilde: The Critic as Artist

As well as following along with my list, you should try finding two or three books of your own, from those sites or from your own preferences, and search for them in the same ways that I do.

Everyone has their own searching technique and their own favorite sites to search. For this session, I'm opening up three copies of my browser—one for Alibris <http://www.alibris.com>, one for Abebooks <http://www.abebooks.com>, and one for the Catalog of the Library of Congress <http://catalog.loc.gov>. I'll do my initial searches on Alibris and Abebooks, and keep the LoC site handy for reference.

In Alibris, I head straight for the Advanced Search page, since they allow searching by date, and I immediately put "before 1923" into every search, which avoids having to scan through modern reprints. In Abebooks, I choose "Hardcover" in their advanced search, which is not quite as good a filter, but does at least screen out recent paperback editions.

In each of the sites, I just enter the author's surname and one word from the title of each book, and look at the search results.

Louisa May Alcott's "Inheritance" looks like it's going to be tough. I don't find it in either of my two bookstores. On doing a little checking with modern bookstores, I find it was her first novel, written when she was 17, and as far as I can see, not published during her life: apparently only recently published—the LoC site has nothing prior to 1997. A disappointing start to my search. I understand why it's very desirable to get it online, but this one's going to be very tough to clear, and I'm staying away from it.

E. W. Hornung's "Irralie's Bushranger" is also elusive: it doesn't show up at either of my sites, so I check out the LoC to confirm I have the title right, and yes, there it is: "Irralie's Bushranger, a story of Australian adventure, 1896." So I widen my search by visiting <http://www.trussel.com/f_books.htm> and searching many of the sites there. Still no luck. If I were particularly eager to get this book, there are several things I might do at this point: I might register a "want" with one of the sites, asking to be notified when a copy is listed; I might use the OCLC WorldCat search (which Abebooks calls "Find it at a local library"), where I can locate libraries that have copies; or I might even contact some individual booksellers and ask them to look for it. Some booksellers actually specialize in looking for hard-to-find books, but of course I expect I'd have to pay a bit more for it when they do find it, and given my success with the rest of my list, and my price bracket, there seems no need to go that far today.

Hornung's "Stingaree", by contrast, seems to be everywhere, in several editions, and cheap. It must have been a bestseller in its day (not surprising, from the author of "Raffles"). 1902, 1905, 1909 editions abound. The cheapest are 1910 and 1907 editions for $4.95 and $5.00 from booksellers listed at Abebooks.

Milne's "Dover Road" is available from both sites. There seems to have been a Putnam's printing in 1922 of "Three Plays: The Dover Road. The Truth About Blayds. The Great Broxopp." of which lots of copies survive. There also seem to be later printings which would qualify as reprints if I were desperate, but the 1922 edition is priced from $12.00 to $50.00, so I'll take the 1922 $12.00 copy from Abebooks. As a bonus, I don't see the other two plays listed as being online anywhere, so I'll get three texts (and short ones, too!—279 pages for all three) for the price and effort of one.

Milne's "Once on a Time" is a bit less common, but once again a Putnam's printing of 1922 keeps it in the race. There are a couple of booksellers in England selling for 15 pounds (which just about makes my $20 threshold) and 20 pounds, and an ex-library copy going for $25.

There are lots of eligible copies of "Pamela" available, ranging from a fourth edition at a mere $4,999 (no, thanks!) to a 1921 printing at $6.60 at Alibris. I'll take that one, please.

Wilde's "Critic as Artist" is fairly widely available. A 1905 edition of "Intentions: the Decay of Lying; Pen Pencil and Poison; the Critic as Artist; the Truth of Masks" is available at Alibris for $8.80, (and other copies of the same edition there and on Abebooks in the $20-$30 range) and Abebooks lists a London 1919 edition at $12.50. There are several copies listed in both places as "undated" and "reprints"—I'm avoiding these, since while it's quite likely that they might be clearable, I'm not taking risks on this search.

My second list isn't a list—just a vague category: children's books that are easy to do.

I go to Alibris' Advanced Search, and enter "Child's" in the title, and pre-1923 in the date, and, excluding titles already on-line, immediately get:

 A Child's History of France $13.20
 A Child's Story of the Bible $5.50
 First Lessons in Botany or The Child's Book of Flowers $13.20
 The Child's Book of American Biography $11.00
 The Child's First Bible $8.80
 The Child's Music World $8.80

and so on through quite a list.

OK. That's a good start. But my choice so far is unimaginative. I need better search terms. So I go to main search engines with the terms "children's antiquarian books" and find a half-dozen or so sites that specialize in them. I can browse around there, though it's slower going without searches to focus my results. I find <http://www.bookrescue.com>, specializing in children's books. Wading through the miles and miles of Alcotts and Barries and Burnetts, which are mostly already online, I think, I find a couple of authors from them who must have been popular, because they seem to have published lots of books before 1923: Angela Brazil and Dorothy Canfield. (I only got as far as the "C"s!)

I could of course stop here and buy some, but today I want to see what else is out there.

Back at Alibris and Abebooks, armed with my authors to search by, I turn up 4 pre-1923 books under $20 for Angela Brazil:

 A Terrible Tomboy
 The Youngest Girl in the Fifth
 A Fourth Form Friendship
 A Pair of Schoolgirls

and several between $20 and $30.

Dorothy Canfield immediately yields multiple copies of:

 The Brimming Cup
 Home Fires in France
 Hillsboro People
 Understood Betsy
 Rough Hewn
 The Real Motive

and others, and I haven't even got to $20 yet, nor to the letter "D".

A browse through the Ebay Collectible and Antiquarian Books section also throws up a respectable list of eligibles. I won't even bother counting that.

In 20 minutes, I have found five of the seven on my search list. In less than an hour after that, I found over 16 eligible children's books, all under or around $20 and all available online.

Before committing to one, though, I would double-check that the book hasn't been transcribed online, and isn't In Progress.

Double-checking your selection

If you're concerned that the book you have chosen duplicates another that might be in progress, and want to double-check, you can e-mail the Posting Team asking them to check whether any recent clearances have come in for that title.

Duplications do happen—there's no way of avoiding them when different people are making independent decisions—but they are rare.

Dealing with used booksellers

As a class, used booksellers are very pleasant people—remarkably friendly, knowledgeable and helpful, even to people buying on a typical Gutenberger's budget.

Some of them are not, however, models of ideal data organization when it comes to Internet listings. There are lots of one- or two-person operations dealing with an inventory of many thousands of books, and having located your book online, you should check that it's still available.

You can place an order through the site and wait for the confirmation, or you can simply call the bookseller. Not all booksellers' contact details are listed, so it's not always an option, but when you do phone you're likely to be speaking immediately to someone who can tell you for sure whether the book is still there, can pull the book off the shelf and answer questions about it, and can take your credit card details on the spot and dispatch the book immediately.

Copyright Clearance

As soon as your book arrives, send us the information needed for Copyright Clearance first. Even if your book is a true-blue, no-questions-asked pre-1923 edition, we should know about it as soon as possible so that it can go onto the In-Progress list for others to see that someone has started on it.

Wait for the confirmation e-mail before starting any serious work. Some people have thought that "Copyright 1923" plus some wishful thinking would be good enough, and, unfortunately, it isn't. Some people have gone ahead and produced the whole book before sending in the clearance, only to be disappointed, all their work wasted.

Books published in 1922 or earlier are clearable, but some people, ever optimists, overlook that little "1927" in small print on the verso. Sometimes there is no copyright date on the front, and other optimists assume that these books are OK. They may be; they may not be. Don't get caught in the copyright trap.

As soon as you have what you think might be an eligible book, do not start on it. Do not ask another volunteer's opinion. Just send in the TP&V and wait for the confirmation e-mail to find out for sure.

Even when your TP&V clearly says "Copyright 1901", send it in. We need to get it into the clearance files so that we can register it as being In-Progress.

Producing

If you're a typist, there's not much more you need to know from this point: you can just get on with the job, with maybe a few tips from the FAQ. In fact, if you're a typist, you might wonder why the rest of us make such a fuss about scanners, and settings, and OCR. Take pity on us! we just can't produce the way you can. Smile indulgently, ignore all the scanner jargon, and submit your completed text while we're still saying bad words about the guttering on a greyscale image of page 372. :-)

If you are using a scanner to copy a book for the first time, be patient with yourself. Some people start off with too high expectations of what they can achieve. Believe it or not, scanning does work effectively; it just doesn't work perfectly. And often, you need a little practice before your scans work right with your OCR. The Scanning FAQ [S.1] has lots of specific tips you can try. Start by scanning a double-page about a third of the way through the book. Scan in Black and White and in Greyscale, at 300 dpi and 400 dpi. Try 600 dpi if it seems like a good idea. Put it through your OCR and see what comes out. Move your scanner so that you can be comfortable while placing the book and turning pages. Allow yourself an hour to experiment with different settings, and different pages. Put the sample images included with the Scanning FAQ through your OCR and see how the output compares to the text produced by other packages. That first hour finding out about how your setup works will be the most valuable hour of scanning you will ever do.

Having figured out what settings you want to use for this book, make sure you implement the best speed you can. Usually this means telling the scanner to scan only as much area as the book covers. This is quite important, since the scanner will by default scan its whole area, and you don't need all that; it just wastes time and makes your images bigger.

You may also be able to set your OCR or scanner software to auto-scan pages with some preset delay, like 5 seconds. This also speeds things up, because the scanner isn't waiting for you to hit the keyboard, and you have both hands free at all times to turn the page and replace the book. It takes a few pages to get into the rhythm; if you miss a page-turn, don't worry—you can get it on the next scan.

Using a reasonably modern but quite ordinary home/office type flatbed scanner, you should be able to scan 200 pages an hour [S.9] of a typical book, at good quality. 400 pages an hour is not unheard-of. Now, it may fairly be said that scanning offers all the fun of ironing, without the sense of adventure :-), but if you have got your settings right, you will probably be able to do the whole job in less than two hours. And now you're really on the road!

V.2. What experience do I need to produce or proof a text?

None.

For producing, you will have to be able to type pretty well, or have a scanner.

For proofing someone else's text, when you don't have a copy of the book in front of you, you should be reasonably familiar with the language used in the book, and the styles of the time—Chaucer's English was quite different from ours, and even 19th Century novelists write some phrases unfamiliar to us today.

That's it. You don't need experience in publishing, editing, or computers.

V.3. How do I produce a text?

There are acres of words in this FAQ about that, but it all boils down to 4 simple steps:

 1. Get an eligible book—pre-1923, or one of the exceptions. Pull
    it from your attic, borrow it from a library or a friend, buy it
    in your local bookstore, in a flea-market or on-line. We don't
    care which.
 2. Send us a copy of the front and back of the title page so we
    can file proof of copyright clearance.
 3. Copy the text from the book into a computer text file. We don't
    care whether you type it, scan it, voice-dictate it, or think of
    some totally new way to do it. Just get it into a file.
 4. Send us the computer text file.

That's all there is to it!

V.4. Do I need any special equipment?

You need the use of a computer of some kind, and Internet access is usual, though we have had some volunteers contribute texts on floppy disks.

If you intend to scan books, you will need a scanner, but if you're just typing or proofing you won't.

V.5. Do I need to be able to program?

V.6. I am a programmer, and I would like to help by programming.
     What can I do?

A lot of programmers work on PG books, and anything easy has probably already been done. The challenge for programmers who want to write something that will help to produce etexts is not in writing the code; it's in identifying ways that programs can help.

Please see the FAQ "What programs could I write to help with PG work?" [P.2] for some ideas in this direction. Whatever you do, don't just hang around waiting for someone to ask you to write something, because that's not going to happen. Think up a project, ask volunteers if they would use it, and dig in! Better still, produce a few etexts yourself, using the existing tools, and get a feel for the kinds of problems that new software could help with.

Apart from text production, we do develop some programs to help with posting work, but as of mid-2002, we have nothing like an ongoing programming project which people can join.

V.7. What does a Gutenberg volunteer actually do?

We buy or borrow eligible books, scan, type, and proofread. There are a few other activities, but they consume only a very small fraction of volunteer time.

V.8. Can I produce a book in my own language?

Yes! We want to encourage people to produce books in all languages, and we cheer when we can add a new language to the list.

V.9. Does it have to be a book? Can I produce pieces from a magazine
     or other periodical?

Magazines, newspapers, and other publications are just fine. For copyright clearance, they work just the same way as a book.

You do need to check the length of your piece [V.17]; we don't want a zillion separate one- or two-page files. If the piece you have in mind isn't long enough, you can add other pieces to it, or even most or all of the magazine. If the work was serialized over multiple issues, you can join them together for your PG text, but you do have to copyright clear every issue of the magazine from which you copy material.

If you have lots of old periodicals, you could even take one piece from several, and make a new text which is a "theme" anthology of those pieces. You can give it an appropriate title: "Civil War Commentaries from X magazine 1892-1898."

V.10. Do I have to produce in plain ASCII text?

Certainly not if it doesn't make sense. To take an extreme example, if you're working in Japanese or Arabic, or creating audio files, there is no point in trying to reproduce that in ASCII!

Where the text can largely be expressed in ASCII, we do want to post an ASCII version, even if it is somewhat degraded compared to the original. However, we will post your file in as many open formats as you want to create, so that your original work is available for those who have the software to read it.

V.11. Where do I sign up as a volunteer?

You don't. We have no formal sign-up process, no list of volunteers, no roll-call. If you produce a PG eBook, or help to produce one, you are a volunteer.

V.12. How do PG volunteers communicate, keep in touch, or co-ordinate work?

We are very scattered geographically: U.S., Australia, Brazil, Taiwan, Germany, South Africa, Italy, India, England, and all over the world, so we can't really meet for coffee on Thursdays. :-)

Most co-operation and co-ordination goes on by private e-mail. This is efficient for volunteers who have worked with each other before, since they know each other's interests and skills, but not so easy for beginners to break in on, since they don't.

The Volunteers' Web Board at <http://promo.net/pg/vol/wwwboard/> is a publicly accessible forum for volunteers or potential volunteers to post any question or information about how to create a PG eBook.

The Volunteers' Discussion Mailing list, gutvol-d, is an e-mail discussion forum for subscribers about any Gutenberg topic.

The Volunteers' List, gutvol-l, is for private announcements for active volunteers.

The Programmers' List, gutvol-p, is for discussion of programming topics.

There are some other, specialized, closed lists for people who do specific work within PG:

The "Posted" List, posted, is for people who perform indexing on our texts. An e-mail is sent to this list every time we post a text (see the FAQ "How does a text get produced?" [V.16] section 5: Notification) and the members of the list use it to update their catalogs.

The Whitewashers' List, pgww, is for Posting Team internal messages.

The Heroic Helpers List, hhelpers, is for people who can devote some fairly regular time to doing odd jobs.

V.13. Where can I find a list of books that need proofing?

There is no central list of this kind. There are distributed proofing projects, currently at

Charles Franks: <https://www.pgdp.net/>
JC Byers: <http://www.wollamshram.ca/1001/index.htm>
Dewayne Cushman: <http://www.metalbox.net/dcushman/pgroot.htm>

where you can proof parts of a book. This is advisable when you're just starting out because it gives you some feel for what the work is like.

You can also look up existing, posted texts from the archives and proof them. Just as there always seems to be one more bug in any given program, there always seems to be one more typo in any given text! Download a few, and scan quickly for problems by doing a spellcheck or other automated check; if you can find any problems quickly, then there are likely others to be discovered by a careful proofing.

Of course, individual volunteers and non-volunteers have their preferences, and may suggest books to transcribe, and such suggested lists pop up every so often, and are often useful to people looking for ideas.

There are usually some suggestions in David Price's InProgress list. The On-Line Books Page has a section where people can list requests, and Steve Harris has a site devoted to lists of books not yet in Gutenberg or elsewhere. Treat all of these lists with some caution, since someone may have started or even finished one of their suggestions since they were last updated.

PG Books In Progress <http://www.dprice48.freeserve.co.uk/GutIP.html>
On-Line Requested List <http://onlinebooks.library.upenn.edu/in-progress.html#requests>
Steve Harris' "To-do"s <http://www.steveharris.net/PGList.htm>

V.15. I have one book I'd like to contribute. Can I do just that without
      signing up?

About production:

V.16. How does a text get produced?

As stated back in the Basics section, all you need to do is:

 Borrow or buy an eligible book.
 Send us a copy of the front and back of the title page.
 Turn the book into electronic text.
 Send it to us.

That's all you actually need to know in order to be a producer. But if you're interested in the details of how other people actually do this, and want to know what else happens behind the scenes, here's a full, blow-by-blow account.

1. Finding an eligible book

Volunteers find eligible books [V.18] in all sorts of ways. Some lucky people have them in their bookshelves, or their attic. A lot of people have a good library nearby, where they can find books, or request them on interlibrary loan. Some people are big eBay fans; others like to hunt for bargains on specialist booksites. And of course lots of volunteers enjoy rummaging through actual used bookstores, or local markets, or yard sales.

Even if you're not going to take on a book yourself right now, search for some on the Net and find out about how to get a copy. Next time you pass an antiquarian bookstore, or a book market, drop in and browse. Ask your local library about interlibrary loans. Eligible books aren't hard to find once you know where to look.

2. Copyright Clearance

The copyright laws can be difficult to understand, and sometimes it may take serious research to prove that a particular edition is actually in the public domain. If you're not legally-inclined, just keep repeating "Pre-'23 is free" if you're in the U.S.A. and stick to books published before 1923. If you do want to delve deeper, read our Copyright Rules page at </vol/pd.html> and then go on to reading the Library of Congress Copyright Office official papers at <http://www.copyright.gov/>. If you're in another country, find out about your own copyright laws.

Volunteers send in the TP&V from the book for us to inspect. This not only gives us the proof to file, it also lets us know that someone is really working on the text so that we can list it as being In Progress for the information of others who might be interested.

3. Scanning, typing, proofing and editing

This makes up the bulk of PG's effort, and is discussed at great length elsewhere in this FAQ. There are many, many ways to create an etext from a paper book, and different people use different methods, but it all boils down to making a text file. For a typical book, it will probably take 40 hours of a volunteer's time. All that happens here is that somebody makes the effort to transcribe one paper book into a file that can be shared around the world and for all time.

4. Posting

[Note: this information is quite specific to the process we go through now. It is quite likely to change as we improve the automation of the tasks.]

In a simple case, where everything goes right, this can take as little as fifteen minutes. In a complicated case, where we have to convert formats, or there are a lot of errors in the text, or there are problems with the copyright clearance, it can take hours or even days while we wait for responses, or do a lot of editing, or find conversion tools.

Michael Hart used to do this work entirely alone, but in September 2001, he created the Posting Team to handle the load. (The Posting Team are nicknamed the "Whitewashers" in honor of Tom Sawyer's victims. :-)

Transferring the file

You send the text to us [V.46] either by Web, by FTP (with a username and password that any of the Posting Team can give you privately), or by e-mail.

If you're FTPing, you should e-mail one or more of us as well, to let us know what you've uploaded.

One problem is files that don't transfer correctly. Especially by e-mail, some files get damaged on the way. It's better to ZIP the file before sending, if possible, to prevent some common problems with text files. The use of compression formats other than Zip can also create problems. Members of the Posting Team work on multiple platforms—DOS, Windows, Linux, Solaris—and zipping and unzipping programs are commonly available for all of these. Other compression methods, like Stuffit or bzip2, are not so readily available, and may give us trouble.
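For anyone who prefers to script the zipping step, here is a minimal sketch using Python's standard zipfile module. The function names and the sample filename are just illustrations, not anything PG requires; any ordinary Zip program does the same job.

```python
import zipfile
from pathlib import Path


def zip_text_file(text_path: Path) -> Path:
    """Write text_path into a .zip archive beside it and return the archive path."""
    zip_path = text_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname stores only the bare filename, not the local directory path.
        archive.write(text_path, arcname=text_path.name)
    return zip_path


def zip_is_intact(zip_path: Path) -> bool:
    """Return True if every member of the archive passes its CRC check."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first bad member, or None if all are fine.
        return archive.testzip() is None
```

Calling zip_text_file(Path("mybook10.txt")) would leave mybook10.zip next to the text file, and zip_is_intact() gives a quick sanity check before you send it.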

We log in via ssh to beryl, the Unix system on which we work when posting (the same one that you FTPed the file to), unzip the file, and glance at the top of it.

Checking Clearance.

We then check it for copyright clearance. The one and only absolute rule that we NEVER bend, no matter what, is that we WILL NOT post a file that doesn't have a clearance. If it ain't in the clearance files, it don't get posted.

Most regulars know that they should include their clearance line in the e-mail submitting the text, but not everybody does, and not everybody remembers every time. This can be frustrating, when clearance is not included and not obvious.

When Michael gives you your clearance on a book, he sends you back an e-mail that has just one line, something like this:

The Works Of Homer [Iliad/Odyssey] Tr. George Chapman Jim Tinsley 06/14/01 ok

He saves these lines in files that we posters can access. We regard this information as private, so we don't publish the details of who has cleared what.

When we get the text, we check whether the submitter has cleared it. If there is a clearance line in the e-mail notifying us about the text, there's no problem. If we can find the title of the text under the submitter's name in the clearance files, there's no problem. Unfortunately, sometimes we can't find it. There are two usual reasons: either the text submitted is part of the work cleared (for example, submitting one play from a collection), or the text hasn't been cleared yet. If the clearance isn't straightforward, we can go back and forth and round and round in e-mails for a while.

This is why it's a good idea to paste the clearance line into your e-mail.

If the title of the text you're sending isn't the same as the title of the text cleared, BE SURE to paste in the clearance line AND explain that the text you're sending is PART of the cleared book. Please also list the titles of the other parts; it really does cause confusion and delay when this is not clear.
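If you want to check that your submission e-mail really does contain the clearance line before you hit Send, something like the following sketch would do. It assumes, purely from the one sample line shown above, that a clearance line ends with a date in MM/DD/YY form followed by "ok"; real clearance lines may vary, and the function name is made up for this example.

```python
import re

# Assumed shape, inferred from the single sample above: "... 06/14/01 ok"
CLEARANCE_TAIL = re.compile(r"\s\d{2}/\d{2}/\d{2}\s+ok\s*$", re.IGNORECASE)


def find_clearance_lines(email_body: str) -> list[str]:
    """Return the lines of email_body that look like pasted clearance lines."""
    return [
        line.strip()
        for line in email_body.splitlines()
        if CLEARANCE_TAIL.search(line)
    ]
```

An empty result is a hint that the clearance line was forgotten; it is not proof either way, since the pattern is only a guess at the format.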

Checking and Editing

Sometimes, people send in a book in a non-text format like Word Perfect or Microsoft Word, or send a text with unwrapped lines. In that case, we try to get the submitter to fix them, but if they can't, we have to convert the file to straight text before starting.

Some producers, particularly inexperienced ones, want to add non-standard annotations and mark-up and symbols to the text. This can get ticklish; we don't want to discourage them, but we need to keep texts reasonably standard. Usually, we can work something out. Maybe the book should be added in both text and HTML, for example.

Assuming that it's a plain text file, we next run gutcheck and a quick spellcheck on the file. This will tell immediately if it adheres to PG standards and if there is any serious problem with it.

If the file looks clean, we may skim it, looking for potential problems or formatting issues. For clean texts, the only things we usually need to change are unindented quotations or inconsistent chapter headings (a lot of people seem to mix "CHAPTER III" with "Chapter 14" and have irregular numbers of blank lines) or spacing and a few 8-bit characters. Occasionally, we have to rewrap a text. We also look out for included publishers' trademarks, which we normally prefer to remove (trademarks are NOT subject to copyright expiration: Macmillan(TM), the publishing house, is still around and trading), unnecessary or downright odd indentation or centering, stray page numbers, and prefaces or introductions or appendices that may not be in the public domain. If the file has lots of 8-bit characters, we probably need to make a separate 7-bit version, and post both.
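Two of the checks just mentioned, 8-bit characters and mixed chapter-heading styles, are easy to try on your own text before submitting. The sketch below is not gutcheck and makes no claim to match what it does; the function names and the heading pattern are assumptions made for illustration.

```python
import re

# A heading alone on its line, such as "CHAPTER III" or "Chapter 14".
HEADING = re.compile(r"^\s*(chapter)\s+([ivxlcdm]+|\d+)\.?\s*$", re.IGNORECASE)


def find_8bit_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines containing any character above 7-bit ASCII."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(ord(ch) > 127 for ch in line)
    ]


def heading_styles(text: str) -> set[str]:
    """Return the set of chapter-heading styles found, e.g. {"UPPER roman"}."""
    styles = set()
    for line in text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        word, number = match.groups()
        case = "UPPER" if word.isupper() else "Mixed"
        kind = "arabic" if number.isdigit() else "roman"
        styles.add(f"{case} {kind}")
    return styles
```

If heading_styles() returns more than one style, the headings are probably inconsistent and worth a look; if find_8bit_lines() returns anything, a separate 7-bit version may be needed.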

If the gutcheck and spellcheck don't look clean, or if conversion is required, we may spend a lot more than 15 minutes on it. In a bad case, we may have to get the file re-proofed.

If you are conscious that you're doing something non-standard, and really mean it to stay, say so in your e-mail. (For example, I recently posted a text containing a family-tree representation that had lines over 80 characters. Now, I would have left that one alone anyway, but it helped that the submitter drew my attention to it in the e-mail.) If it's too non-standard, the poster may not allow it to stay, but at least you can discuss it. When a text needs a lot of non-standard formatting or markup, you really need to ask yourself whether you shouldn't be submitting it in HTML, with all the bells and whistles, and settle for something more normal in the text variant.

Mostly, errors are obvious, and there are at least some obvious errors in most texts. When errors are completely obvious, we just fix them without feedback to the producer unless you have specifically asked for feedback in your e-mail.

We're getting more HTML formats now, which is great, but incoming HTML often needs a lot of work, because people who are not experienced with HTML often make mistakes. The W3C validator at <http://validator.w3.org> is the official check for valid HTML, but, for the average volunteer, it's awkward to use. However, if you're submitting an HTML format, please use Tidy, which you can get from <http://tidy.sourceforge.net>, to check your text before sending it.

Header and Footer

We add the PG header and footer. If there is a header and footer already there, we strip them off first, since recent changes in the header mean that a lot of people send files with headers that are out of date. We have written programs to help with this.

We get the number for the text from a program on beryl called "ticket" that Brett Fishburne wrote, that dispenses the next number. That way, if two or three of us are posting at the same time, we won't all grab the same number. We create a 5-letter base filename, checking that it hasn't been used before, and finally zip up the file.
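For the curious, the idea of a number dispenser that is safe when several people use it at once can be sketched like this. To be clear, this is not the "ticket" program itself, whose workings aren't described here; the lock-file approach, the function name, and the counter file are all assumptions for the sake of the example.

```python
import os
import time
from pathlib import Path


def next_ticket(counter_file: Path, retries: int = 50, delay: float = 0.1) -> int:
    """Hand out the next number, using a lock file so two callers never get the same one."""
    lock_path = counter_file.with_suffix(".lock")
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if someone else already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(delay)
            continue
        try:
            last = int(counter_file.read_text()) if counter_file.exists() else 0
            counter_file.write_text(str(last + 1))
            return last + 1
        finally:
            # Always release the lock, even if reading or writing failed.
            os.close(fd)
            os.remove(lock_path)
    raise TimeoutError("could not obtain the ticket lock")
```

The point of the design is simply that reading the last number and writing the next one happen while nobody else can do the same.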

Posting

We now transfer the .ZIP and .TXT files to two servers: ftp.ibiblio.org and ftp.archive.org. (This is usually the point at which we realize that we forgot to make a change we noticed while checking. Aaaargh!)

5. Notification

At this point, the book is posted, but nobody knows about it! We need to do something about that. . . .

We compose an e-mail to the "posted" e-mail list, cc: the producer, with the line that is to go into GUTINDEX.ALL, the master list of PG files.

The "posted" list has only a few subscribers. These are the people who index and create links to PG texts, and include both PG volunteers and the maintainers of other sites that link to PG texts.

They also commonly download the texts to get more information for their indexes, and tell us if there is anything wrong with the files.

This e-mail is simply the official notification to all these people and the producer that the file has been posted. Here's a sample of such an e-mail:

Mar 2004 The Imperialist, by Sara Jeannette Duncan [SJD#4][mprlsxxx.xxx]5301

There may also be some remarks, if the text is in any way non-standard, or if files other than plain text were posted with it.
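If you keep your own index and want to pull the pieces out of a posting line, here is a rough sketch. The layout it expects (month, year, title and author, one or more bracketed tags with the filename last, then the number) is inferred only from the single sample above, so treat the pattern and all the names in it as assumptions.

```python
import re
from typing import NamedTuple, Optional


class PostingLine(NamedTuple):
    month: str
    year: int
    title_and_author: str
    filename_mask: str
    number: int


# Layout assumed from one sample line; real lines may differ.
_PATTERN = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) (?P<year>\d{4}) (?P<title>.+?)\s*"
    r"(?P<tags>(?:\[[^\]]*\])+)\s*(?P<number>\d+)$"
)


def parse_posting_line(line: str) -> Optional[PostingLine]:
    """Split a posting line into its parts, or return None if it doesn't fit."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    # The last bracketed tag is taken to be the filename mask.
    tags = re.findall(r"\[([^\]]*)\]", match["tags"])
    return PostingLine(
        month=match["month"],
        year=int(match["year"]),
        title_and_author=match["title"],
        filename_mask=tags[-1],
        number=int(match["number"]),
    )
```

Lines that don't fit the assumed layout come back as None rather than raising an error, so you can skip them and look at them by hand.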

From this e-mail, you can, if you want to see any corrections made, immediately download the posted file and compare it to your version. Since the notification is made after the file has been copied to the servers, it should be there waiting for you.

To find out how to download a book that has just been posted, see the
FAQ "How can I download a PG text that hasn't been cataloged yet?" [R.3]

6. Indexing

From the "posted" list, the posting line is added to GUTINDEX.ALL and our indexers begin the cataloging process, which is much more thorough, for the website. This includes work like finding authors' dates of birth and death, getting the Library of Congress classification, and the other information that makes up the website's searchable index. That process takes extra time, which is why the website's searchable catalog must always lag behind the actual titles posted.

7. Corrections

It's remarkable how many people who went over and over the text to the point of hating it suddenly see problems with it when they download it a couple of days after it's posted! Something psychological there, I expect. Anyhow, if you do download your text and see problems with it, don't worry, just e-mail whoever posted it, or any other member of the Posting Team. No, you're not stupid, or if you are, you're in good company, because we've all done it! There's no big deal about replacing the posted file with a corrected copy immediately.

When the corrections are small, as most are, we will just make the change to the existing text. If there are a lot of changes, we may post a new edition [R.35] with a new edition number; e.g. if the file abcde10 was corrected, we may post abcde11. We never make a new edition when we get corrections immediately after posting.
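The abcde10 to abcde11 naming can be written out as a tiny function. This is only a sketch of the convention as described above, assuming a five-character base followed by a two-digit edition number; the function name is invented for the example.

```python
import re


def next_edition(basename: str) -> str:
    """Given a name like "abcde10", return the next edition's name, "abcde11"."""
    match = re.fullmatch(r"([A-Za-z0-9]{5})(\d{2})", basename)
    if match is None:
        raise ValueError(f"not a five-character base plus two-digit edition: {basename!r}")
    base, edition = match.group(1), int(match.group(2))
    if edition >= 99:
        raise ValueError("no two-digit edition number left")
    return f"{base}{edition + 1:02d}"
```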

V.17. How long must a text be to qualify for PG?

The rule of thumb is that we try not to post texts shorter than 25K, or about 350 lines of 70 characters. This rules out, for example, a lot of individual short poems. If you are interested in contributing this type of material, consider making a collection of similar texts—poems by the same author, or magazine articles on the same subject. We have made a few exceptions, like Martin Luther King's "I have a dream" speech, but very few.
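
You can check the rule of thumb before you submit. This is a minimal Python sketch, assuming "25K" means 25 x 1024 bytes (the FAQ doesn't say whether K is 1000 or 1024); the name `is_long_enough` is made up for the example.

```python
# Assumption: "25K" is read as 25 * 1024 bytes. Adjust if you read it as 25,000.
MIN_BYTES = 25 * 1024


def is_long_enough(text, min_bytes=MIN_BYTES):
    """Return True if the text, encoded as UTF-8, meets the size rule of thumb."""
    return len(text.encode("utf-8")) >= min_bytes


# 350 lines of 70 characters is 24,500 characters, i.e. "about" 25K.
print(350 * 70)
```

Under this strict reading, a text of exactly 350 lines of 70 characters falls slightly short, which is a reminder that this is a rule of thumb and not a hard limit.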

V.18. What books are eligible?

A book is "eligible" for posting if we can legally publish it. This is the case if:

    1. it is in the public domain in the U.S.A.,
                    OR,
    2. the copyright holder has granted unlimited
        non-exclusive distribution rights to PG.

V.19. Are reprints or facsimiles eligible?

A reprint or facsimile of a book that would be eligible is itself eligible.

For example, if a book published in 1995 is a reprint of a book published in 1900, then it is eligible. However, the onus is on us to prove that it is a reprint, and if it doesn't say on the TP&V that it is a reprint, confirming its eligibility may be impractical.

V.20. What is the difference between a reprint and a facsimile?

A facsimile retains the page layout and formatting of the original. A reprint keeps the same words, but may lay the pages out differently. For our copyright purposes, there is no difference—we can use either.

V.21. What is the difference between a reprint and a "new edition"?

A reprint contains only the words and pictures that were printed in the original. A new edition is in some way changed; it has different text, or pictures. It may be abridged, or expanded. It may have material added or changed, using other versions of the book.

A new edition gets a new copyright, and has to be cleared based on its own copyright date and status, not the date of the original printing of the title. See also the FAQ "How come my paper book of Shakespeare says it's 'Copyright 1988'?" [C.16] for an example.

V.22. What book should I work on?

Nobody in Gutenberg is going to set assignments for you. You decide what book to process. Just pick one that no-one else has already done, or is working on. It's also sensible to pick one that you'll like—you'll be living with it for a while. On a practical note, it's probably better to start with a short book or even a short story, since a long book can take quite a while to produce.

Start by thinking of books written before 1923. Pick a book you like, and check it out. If it's already done or still in copyright, try other books by the same author.

Check out your old books. Maybe you have an eligible edition that would be of great help to the project.

Try your library. They may have some eligible editions—books we can prove to be in the public domain—and you will certainly come away with ideas. Ask your librarian. Librarians are keen to help on projects like this.

Browse second-hand bookshops in your area. There are lots of treasures to be picked up very cheaply.

Search for literature pages and bookshops on the Internet.

If all else fails, you can always ask on the Volunteers' Board or try the gutvol-d mailing list [V.12] for ideas. Others may know of books that people are especially looking for, or projects already started where you could help out.

V.23. I have a book in mind, but I don't have an eligible copy.

First, determine whether there are any eligible copies of the book, by finding out the date it was published, possibly from the Catalog of the Library of Congress [B.5] and checking the Public Domain and Copyright Rules [B.1]. If there is a public domain edition, the next problem is to find one to work with.

V.24. Where can I find an eligible book?

The most commonly used outlets are used bookstores, garage sales, library sales, charity shops and any other place that sells old books.

The Internet is a wonderful medium for finding used and antiquarian books—used bookstores all over the world have found ways of co-operating and listing their inventories on the Net, so that whether you live in Los Angeles, Moscow or Perth, you can still find that book you're looking for in a shop in a laneway of Amsterdam. Most on-line listings will quote the publication year of the book, so you can check that it's pre-1923.

Two such sites that allow second-hand booksellers to list their inventory are:

Advanced Book Exchange <http://www.abebooks.com>

Alibris <http://www.alibris.com>

The book search page at trussel.com [B.5] has a list of many such Net bookshops, or you can simply visit any search engine and search for Used or Antiquarian Bookshops. You can often buy eligible books through these sites very cheaply.

If you still can't find the book you need, post a message on the Volunteers' Board or to the gutvol-d mailing list; maybe someone else can find it for you.

Sometimes, it may be possible for you to work from a later edition, so long as somebody who has an eligible edition can check it to make sure that no changes have been made. Sometimes, you may be able to find a modern reprint; reprints may be eligible, as long as they say they are reprints of an edition that would be eligible.

If you can type, or can scan without damaging the book, you can borrow books long enough to produce them. Even if your local library doesn't have the books you want, they may well be able to get them for you on inter-library loan. Ask your librarian about it.

V.25. What is "TP&V"?

This is an abbreviation for "Title Page and Verso", and means a paper or image copy of the front and back of the title page.

Even if the back is blank, we need to have an image of it for the files, to show that it is blank, so that if, in ten years' time, somebody queries our right to publish, we can show that we haven't just lost it.

Publishers print copyright information, like title, author, copyright year and owner, and whether the book was a reprint, on the TP&V, and by filing this, we can prove that the book we produced was in the public domain.

Sending us the TP&V is the One True Way to get PG copyright clearance [V.37].

V.26. What is "Posting"?

Posting is the final stage in the production process, where the file is given a number and official PG header, and copied onto our FTP servers for distribution. See section 4 of the FAQ "How does a text get produced?" [V.16] for a blow-by-blow account.

V.27. I think I've found an eligible book that I'd like to work on.
      What do I do next?

Make sure nobody else is working on it, and that it's not already online somewhere.

V.28. What books are currently being worked on?

Check out David Price's In Progress List (a.k.a. "the InProg List") online at <http://www.dprice48.freeserve.co.uk/GutIP.html>. David gets the information from Copyright Clearances that have been done, and organizes it into a list. It can never be 100% up to date, since clearances come in all the time, but it's the best online facility we have, and it's much more clearly presented than the original clearance files.

V.29. How do I find out if my book is already on-line somewhere?

There's no foolproof method; some student somewhere could have scanned it and put it on her college web page without announcing it anywhere. However, there are some regular places to check.

It may sound obvious, but you should always look in the PG archives first. Download GUTINDEX.ALL and keep it handy. Search the InProg List [B.1].
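
Once you have GUTINDEX.ALL on disk, any text search will do. As one possibility, here is a small Python sketch that lists every index line containing all the words you give it, ignoring case. It assumes the index is a plain text file that can be read as Latin-1 and that each title fits on one line; neither is guaranteed, and `find_in_index` is just an illustrative name.

```python
def find_in_index(index_path, *words):
    """Return the index lines that contain every given word, case-insensitively."""
    wanted = [word.lower() for word in words]
    hits = []
    # Assumption: Latin-1 decodes any byte, so odd characters won't stop the search.
    with open(index_path, encoding="latin-1") as index_file:
        for line in index_file:
            lowered = line.lower()
            if all(word in lowered for word in wanted):
                hits.append(line.rstrip("\n"))
    return hits


# Example call, with a local copy of the index in the current directory:
# for hit in find_in_index("GUTINDEX.ALL", "imperialist", "duncan"):
#     print(hit)
```

Searching on the author's last name plus one distinctive word from the title tends to be more forgiving than searching on the full title.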

The two other main places to search for your book are the Internet Public Library <http://www.ipl.org> and the On-Line Books Page <http://onlinebooks.library.upenn.edu/>. These projects specialize in indexing books that people make available on-line.

If you still don't see your book on-line anywhere, hit your favorite search engine, and give it the title, author's last name, and preferably a few uncommon words from the first page of the book. Sometimes one of those solo efforts shows up in a general search.

V.30. My book is not on the In-Progress list, and I can't find it on-line.
      Is it safe to go ahead and buy it?

Probably. It could have been cleared, but not included in the InProg list yet. If the amount of money to buy it is a consideration, you can e-mail any of the members of the Posting Team, and ask them to check the latest clearances for you. Even this isn't foolproof; another volunteer could be placing their order at the same time you're placing yours. Such duplications do happen, but they are very rare.

If the on-line file is from the same edition as the one you have (e.g. not a different translation) then you may be able to submit that file, perhaps slightly edited, to Gutenberg using the clearance from your paper copy. See "I've found an eligible text elsewhere on the Net, but it's not in the PG archives. Can I just submit it to PG?" [V.62] for how to do that.

And of course, you can always still make your own version for PG. It's surprising how often even very similar paper editions have small differences that can be interesting or significant.

Yes! In fact, assuming that the version already there is in the public domain, you can piggyback on the work already done by what is called "comparative retyping". For example, let's say that you have a later edition than the existing file; you can just take the existing file, edit it to match your paper version, and submit it as a new file. Of course, you must have Copyright Cleared [V.37] your paper version as well.

V.33. I see a book that was being worked on three years ago. Is anyone still working on it?

Maybe, maybe not. Some people abandon books, some people who are regular producers clear them and put them at the bottom of the pile, perhaps for years (though they will get round to them sometime), and some people just simply take two or three years to produce a book.

Once, we put names and contact details on the public InProg list, but for privacy and spam-prevention reasons, we've taken them off. However, the Posting Team have access to the master list of cleared files, and will send a message on your behalf to the person who originally cleared the book, asking if the project is still active, or if the producer wants help.

So if you really want to check this situation out, e-mail one of the
Posting Team.

V.34. I've decided which book to produce. How do I tell PG
      I'm working on it?

As soon as you get Copyright Clearance [V.37], your book is entered in the "cleared" files. David Price will take these, and add your entry in his next release of the In Progress List.

V.35. I have a two- or three-volume set. Should I submit them as one text, or one text for each volume?

Both.

Quite a lot of 18th and 19th Century books, even straightforward novels, were published as multipart sets. When you have such a set, you should usually submit one text for each volume, and a "complete" text with the contents of all volumes together.

People who do this often complete and submit one volume at a time, until they've finished, and then contribute the "complete" file.

V.36. I have one physical book, with multiple works in it (like a
      collection of plays). Should I submit each text separately?

If the works are clearly separate, stand-alone texts, and are long enough [V.17] to warrant inclusion on their own in the archives, then yes, you should, and you may also submit a "complete" version as well, if it seems appropriate. This most commonly happens in a collection of plays, though essays and other works may also fit the criteria. Collections of poetry rarely do, since most poems are too short to submit as stand-alone texts.

Sometimes the book includes a preface or introduction or glossary covering all the works in it. In this case, you can decide whether to include these with each of the parts, or save them for the "complete" version.

V.37. How do I get copyright clearance?

Basically we need to see images of the front and back of the title page of the book, which is where copyright information is usually shown. This is called "TP&V", for "Title Page and Verso" [V.25].

To Submit Online:

As of late 2002, we have a new automated upload procedure using a web page. This is by far the fastest and easiest way to get clearance. You need scanned images (PNG, JPEG, TIFF or GIF) of the two pages, at a resolution good enough that the text can be read clearly, though the files don't need to be huge.

Just go to <http://beryl.ils.unc.edu/copy.html> and follow the instructions.

There are two other, older ways to submit a text for clearance.

To submit by paper mail, photocopy the front and back of the title page, even if the back is blank, write your e-mail address on it, and send the photocopies to:

    MICHAEL STERN HART
    405 WEST ELM STREET
    URBANA, IL 61801-3231
    USA

This is called Title Page & Verso, or TP&V for short, and is needed for copyright research. A colored envelope is best, to make sure your letter is easily recognized as TP&V.

E-mail Michael <hart@pobox.com> when you send them, so he knows they're on the way. It's a good idea to check back with him by e-mail after a week or so if you haven't heard from him.

About this, Michael says: "Please include always your e-mail name and address, and mark the envelope with some distinctive mark and or color. Colored envelopes fine. Just something so I can find it easily, the mail here is slow and deep, like snow. Please send a note to: <hart@pobox.com> for more info."

To submit by e-mail, scan the front and back of the title page, even if
the back is blank, and e-mail the images to Greg Newby
<gbnewby@ils.unc.edu> as TIFF, JPEG or GIF in medium resolution. Make
sure that the print is legible before you send.

Whichever method you use, you should expect to get an e-mail back after about a week, with one line containing the Author, Title, your name and date with the word "OK" at the end. This means that your text has been cleared.

A Clearance Line looks something like:

The Works Of Homer [Iliad/Odyssey] Tr. George Chapman Jim Tinsley 06/14/01 ok

If you don't get any response, e-mail to check that your TP&V was received OK. If the word at the end of the line is not "OK", then your text is not eligible, and a comment will probably be appended explaining why it is not eligible.
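
If you keep your clearance lines in a file, a quick check like the following Python sketch can tell the cleared ones from the rest. It assumes only what the sample shows: a date in NN/NN/NN form followed by the verdict, with "ok" (in any case) meaning cleared. Everything before the date is treated as free text, and `is_cleared` is a hypothetical helper, not a PG tool.

```python
import re

# Assumption: the verdict is whatever follows the NN/NN/NN date at the end.
CLEARANCE_LINE = re.compile(
    r"^(?P<details>.+?)\s+(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<verdict>.+)$"
)


def is_cleared(line):
    """Return True only if the line ends with a date followed by just 'ok'."""
    match = CLEARANCE_LINE.match(line.strip())
    if match is None:
        return False
    return match.group("verdict").strip().lower() == "ok"


print(is_cleared(
    "The Works Of Homer [Iliad/Odyssey] Tr. George Chapman Jim Tinsley 06/14/01 ok"
))
```

Anything other than a bare "ok" after the date, such as a comment explaining a refusal, is treated as not cleared.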

Don't start work on your book until you get that OK! It's very sickening to do all that work, and then find out that your text can't legally be put on-line!

V.38. I have a two- or three-volume set. Do I have to get a separate clearance on each physical book?

Yes.

Some multi-volume works, notably reference books and translations, were published in a series, and it may be that the first volume is 1922, but the others are 1923 or later, so we have to clear each individually.

V.39. I have one physical book, with multiple works in it (like a collection of plays). Do I have to get a separate clearance for each work?

No. Since they were all printed together, one TP&V will suffice for all, but . . .

You should list each separate title included, if you intend to submit each title separately (see the FAQ "I have one physical book, with multiple works in it like a collection of plays. Should I submit each work separately?" [V.36]). If, say, you clear a "Collected Plays of Sheridan", and later submit an eBook of "The School for Scandal", we will have trouble finding your clearance unless we have made a note that "School for Scandal" is part of the contents of "Collected Plays".

In a case like this, you should include, on your paper or e-mail, something like:

George Bernard Shaw. Plays Unpleasant. 1905.
Contents:
    Preface to Unpleasant Plays
    Widowers' Houses
    The Philanderer
    Mrs. Warren's Profession

You only need to do this when you are going to submit each part separately, which is commonly the case with plays, and sometimes essays, stories and novellas. Taking a different example, the "Collected Poems of Emily Dickinson", we would not need to list the contents, since we wouldn't publish each poem separately.

There is one exceptional case: if your book was printed after 1923, but contains stories or plays some of which are stated to be reprints of pre-1923 editions, you should give as much detail as possible about what you intend to submit.

V.40. Who will check up on my progress? When?

Nobody. There are no schedules or timetables. You're welcome to contact other volunteers [V.12] with comments or questions, though.

V.41. How long should it take me to complete a book?

Most books get done in between one and three months, but this varies wildly. It depends on the amount of time you can afford to give it, the length of the book and, if you're not typing, the quality of the scan—if the book scans badly, you need to put more time into proofing.

Some very productive volunteers manage to turn out an e-text a week; some books can take a year or more.

Scanning itself doesn't take too long. Even if it takes you as much as two minutes per page to scan, you will still complete a 300 page book in 10 hours, and you will probably be scanning much faster than that [S.9]. The problem is that the text generated by the scanner and your OCR package is usually faulty. There are many cute scanner errors, mistaking b for h, or e for c, so that "heard" is scanned as "beard" or "ear" as "car". Makes the story more interesting sometimes!

So now you need to do a first proof of the e-text. Read it carefully, correct scanning mistakes, and make sure that you haven't left out pages or got them in the wrong order. Unless your scan was exceptionally good, this is the time-burner in the process.

When you've done the first proof, you can either do a second proof yourself, or send it to another volunteer for second proofing.

If you're a typist, of course, you can skip right over the messy scanning and scan-correction process. Yay typists!!

V.42. I want/don't want my name published on my e-text

No problem. When you send the e-text for posting, mention exactly what, if anything, you want the Credits Line [V.47] to say.

V.43. I'd like to put a copy of my finished e-text, or another
      Gutenberg text, on my own web page.

Great! PG encourages the widest possible distribution of e-texts. We like to publish everything in plain text, which is the most accessible format, since everybody can read plain text. But once it's available in plain text, it's open to you or anyone else to convert it to other formats like HTML for further distribution.

If you are reposting a text, though, please be careful to check that your posting complies with the conditions spelled out in the header, especially for copyrighted works.

V.44. I've scanned, edited and proofed my text. How do I find someone
      to second-proof it?

You can post a request on the Volunteers' Board, or on the gutvol-d
Mailing List. You will probably get some offers there. In a difficult
case, you might ask Michael Hart to add it to the "Requests for
Assistance" section of the next Newsletter.

In general, the best way to handle it is to make a co-operative proofing project out of it. This is like a miniature version of the distributed proofreading sites, without the page images.

There are always people looking for proofing work, but many beginners take on more than they can handle, and don't finish the job, and this can be very disappointing if you give the whole thing to one volunteer who then vanishes without trace. You can minimize the risk of this by splitting the book into chunks of about 20-30 pages, or one chapter if that's around the right size, each. Write explicit instructions about what you want them to do when they spot a suspected error, like fix it or mark it with an asterisk. (Marking is probably safer with beginners who don't have the book or an image of the page to refer to.) Give the first chapter to the first person who responds, the second to the second, and so on. As you hand out the chapters, let the proofers know that if they're not returned within three or five days, you'll assume they've quit. Three days is more than plenty of time for 20 pages. If someone returns a chapter, you can give them another. If someone doesn't get back to you within the time set, assume they're not going to, and recycle that chapter to someone else. No hard feelings, no problem. This process of "co-operative proofing" ensures that beginning proofers don't choke on the work, and that one vanishing volunteer doesn't hold up the whole project.
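
If you like to keep track of this with a script instead of a notebook, the bookkeeping is simple. The Python sketch below splits a book into page ranges and reports which chunks have been out longer than the deadline you set. The chunk size of 25 pages and the three-day limit are just the figures suggested above, and the function names and the shape of the `assignments` dictionary are invented for the example.

```python
from datetime import date, timedelta


def split_into_chunks(total_pages, chunk_size=25):
    """Return (first_page, last_page) ranges covering pages 1..total_pages."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def overdue_chunks(assignments, today, days_allowed=3):
    """Return chunks handed out more than days_allowed days before today.

    assignments maps a chunk to a (proofer, date_handed_out) pair.
    """
    limit = timedelta(days=days_allowed)
    return [
        chunk
        for chunk, (_proofer, handed_out) in assignments.items()
        if today - handed_out > limit
    ]


print(split_into_chunks(60))
```

Chunks that come back are removed from `assignments`; chunks that show up as overdue are the ones to recycle to the next volunteer.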

V.45. I've gone over and over my text. I can't find any more errors, and I'm sick of looking at it. What should I do now?

We all know that feeling! Particularly with your first book, you've probably gone through a patch when you thought you'd never finish—and when you do, you can't stand the idea of looking at it again. Heh. Cheer up—the first twenty texts are the worst! :-) And you'll feel a lot better when you see your text available for everyone to read.

You have three choices:

You can send it for posting as it is. [V.46]

You can put it aside for a week or so, and come back to it with fresh eyes.

You can ask in any of the standard ways [V.12] for someone else to second-proof it for you. This has a lot to recommend it; it gets other sets of eyes looking at the text, it relieves the pressure that you may feel, it may rekindle your enthusiasm for the text, it allows you to "meet" other volunteers, and possibly form partnerships for future PG collaboration. Above all, it gives new proofers a chance to get their feet wet, and this is good for them, and good for PG. You are not only contributing a text, you're helping to train and encourage the next generation of producers.

V.46. Where and how can I send my text for posting?

As of late 2002, we have a new automated upload procedure using a web page. This has a lot of good things going for it, because we keep a record of what's uploaded, you get an e-mailed copy of the notification, you don't have to fiddle with FTP, and we can make up the header automatically from the information you enter, which saves time and prevents keying errors.

As always, it's better to ZIP your file first, because it'll take less time to transfer.

Just go to <http://beryl.ils.unc.edu/cgi-bin/upload>, fill in the form, specify the file to upload, and hit "Send" at the bottom.

And you're done!

If, for some reason, you can't use this page, there are two backup options: you can e-mail it, or you can upload it by FTP. Whichever you use, it is always best to ZIP the file first if you can.
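
Any ZIP program will do for this step. If you happen to have Python installed, the standard `zipfile` module can do it too; the sketch below is one way, with `zip_for_upload` being an illustrative name and the choice to name the archive after the text file being an assumption, not a PG requirement.

```python
import os
import zipfile


def zip_for_upload(text_path, zip_path=None):
    """Compress one file into a ZIP archive and return the archive's path."""
    if zip_path is None:
        # Assumption: hamlet.txt becomes hamlet.zip in the same directory.
        zip_path = os.path.splitext(text_path)[0] + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its bare name, without any directory prefix.
        archive.write(text_path, arcname=os.path.basename(text_path))
    return zip_path


# Example call:
# print(zip_for_upload("hamlet.txt"))
```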

If you are comfortable with sending files by FTP, this is better than e-mail. First, you will need a username and password, which you can get by e-mailing any of the Posting Team.

If you already know how to use command-line FTP, here's how to do it:

Log in to beryl.ils.unc.edu using the username and password supplied and change to the work directory by typing "cd work". Change to binary mode with the "bin" command and "put" your file.

 Summary instructions:
 ftp beryl.ils.unc.edu
 login: yourlogin
 password: yourpassword
 cd work
 bin
 put yourfile.ext
 quit

Here is a sample session:

 >ftp beryl.ils.unc.edu
 Connected to beryl.ils.unc.edu.
 220-Access from unknown@127.0.0.1 logged.
 220 FTP Server
 User (beryl.ils.unc.edu:(none)): xxxxxxxx
 331 Password required for xxxxxxxx.
 Password: xxxxxxxx
 230 User xxxxxxxx logged in.
 ftp> cd work
 250 CWD command successful.
 ftp> bin
 200 Type set to I.
 ftp> put MYFILE.ZIP
 200 PORT command successful.
 150 Opening BINARY mode data connection for MYFILE.ZIP.
 226 Transfer complete.
 ftp: 172313 bytes sent in 17.34Seconds 9.94Kbytes/sec.
 ftp> quit

When you are in the work directory, you will not be able to list files, but they do exist and they are there.
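
The same session can be scripted. Here is a rough equivalent using Python's standard `ftplib`, offered as a sketch only: the host name is the one given above, the login details are placeholders you must replace with what the Posting Team sends you, and the function names are made up. `storbinary` transfers in binary mode, so there is no separate "bin" step.

```python
from ftplib import FTP


def upload_to_work(ftp, local_path, remote_name):
    """On an already logged-in connection: cd work, then put the file in binary mode."""
    ftp.cwd("work")
    with open(local_path, "rb") as local_file:
        ftp.storbinary("STOR " + remote_name, local_file)


def upload(host, user, password, local_path, remote_name):
    """Connect, log in, upload one file to the work directory, and quit."""
    with FTP(host) as ftp:  # leaving the block sends QUIT
        ftp.login(user, password)
        upload_to_work(ftp, local_path, remote_name)


# Example call, with placeholder credentials:
# upload("beryl.ils.unc.edu", "yourlogin", "yourpassword", "MYFILE.ZIP", "MYFILE.ZIP")
```

As with the manual session, you won't be able to list the work directory afterwards, so the absence of an error is your only confirmation that the file arrived.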

When you have uploaded your file, e-mail a note to any or all of the
Posting Team, including your
 1. filename
 2. credits line as you want it on your text
 3. clearance line you received [V.37]

An ideal note might be:

Subject: Beryl upload for posting: Hamlet

    I have uploaded to beryl:
        Hamlet, by William Shakespeare

    File is: hamlet.zip

    Credits line is: Produced by John Doe <jdoe@example.com>

    Clearance was given as:
        Hamlet William Shakespeare John Doe 05/03/02 ok

If you'd rather send it by e-mail, send the e-mail, including the Credits Line and Clearance Line as in the sample above, to any or all of the Posting Team, with your text as an attachment. Again, ZIPped is better, since it avoids certain damage that can happen to a plain text e-mail along the way.

Please read section "4: Posting" of the FAQ "How does a text get produced?" [V.16] for more detail about what happens in posting. Especially, if you want to draw some peculiarities of this text to the Posting Team's attention, or want feedback on any minor edits done during posting, you should say so in the e-mail you send.

Don't assume that we know anything when you send the e-mail. We don't know what you want us to put on the Credits Line. We don't know that this is an unusual text, and needs some kind of special reformatting. We don't know that the text should be split into two volumes before posting. We don't know that you would really like us to check it closely before posting. You have to tell us, exactly and precisely, what you want on the Credits Line. If the text needs some specific work, you have to tell us exactly what that is. And please do that in your e-mail, not in the text itself. Remember that we could be dealing with five or ten other texts at the same time, and even if the poster you discussed it with two weeks ago is the same one who posts the book, he may not remember.

V.47. What is the "Credits Line"?

The Credits line is a line that the Posting Team can insert into each PG text naming the producer or producers of a particular text.

You should decide what you want on the credits line of your text; it's really not up to us.

Most credits lines are something like:

Produced by John Doe <jdoe@example.com>.

If you don't want to be mentioned by name at all, just say, in your e-mail:

  Please omit the Credits Line for this text. I want to contribute
  it anonymously.

If you do want to be mentioned, please give the exact wording you want us to use. Some people want their name only; they don't want us to include their e-mail addresses. Others want to make their e-mail addresses public so that readers can contact them with comments. That is entirely up to you, but you do need to tell us. If you do want to include your e-mail, remember that having it permanently on the net is a spam-magnet, and we can't effectively remove or change it later.

Occasionally, a Credits Line can spill onto more than one line, for example:

  This text was converted to HTML by Jane Roe <jroe@example.com>
  from an original ASCII text scanned by Jack Went
                          and proofed by Jill Hill

V.48. How soon after I send it will my text be posted?

First read the "Posting" section of the FAQ "How does a text get produced?" [V.16] to understand the process.

You should expect some response within three or four days. We try to get to all submissions within that time. In most cases, that response will be simply the official notification that it has been posted. If there is a query on your text, for example if we can't find the copyright clearance or if we have trouble converting or correcting your text, we will probably e-mail you back directly with questions.

If you don't hear from us within four days, send a follow-up e-mail; it could be that your original note never got to us, or just fell through the cracks.

If your file happens to arrive while one of us is logged in and working, it could get posted within the hour. Some frequent contributors who know our habits know just how to time their uploads!

V.49. I found a problem with my posted text. What do I do?

Most postings go smoothly, but problems can happen. Sometimes, one of the servers is down. Sometimes a file gets corrupted for some unknown reason. Sometimes, let's face it, we screw up.

Usually, one of the indexers will tell us about it, but if you catch it first, e-mail whoever sent out your notification e-mail and explain the problem. Don't worry; your original file will be quite safe, since we keep these long after posting them.

V.50. Someone has e-mailed me about my posted text, pointing out errors.

Great!

Since you're the original producer, you're in the best position to decide whether these are real errors. If they're right about it, tell the Posting Team and we'll correct the text.

V.51. Someone has e-mailed me about my posted text, thanking me.

Nice feeling, isn't it? :-)

About Proofing

A very big one!

Typists' work doesn't usually need many corrections, but unfortunately, scanners and OCR packages are far from perfect, and scanned text varies from "almost-right" down to "maybe I should consider typing instead of scanning". Proofing is the process that turns a scan into a readable e-text.

Proofing a typist's work is straightforward; you just read it, and keep an eye out for mistakes. Typists typically have few mistakes in their texts, but the errors that they do make tend to be hard to spot. Proofing OCRed text has its quirks, and you can expect many, many errors to correct.

The only thing that all proofers agree on is to differ in their methods. Some people scan and almost complete the proofing process within their OCR package, others do no editing at all within their OCR. Some spell-check first, others spell-check last. Some work through in one pass, doggedly line by line, others make several light passes. Some start at the end and work backwards! Some proofers mark all queries with special characters like asterisks (*) in the text, most just make all the obvious changes and mark only the dubious ones. Some people always send their texts out for proofing, others prefer to do it all themselves.

So this guide is not prescriptive; this is not the "only way" to do it. The only rule is that, at the end of the process, your e-text should be as error-free as you can make it, and should conform to Gutenberg's editing standards, which are mostly just common sense guidelines to make readable text.

The aim of this FAQ is to give you an understanding of what text looks like when it comes fresh off the scanner, and an overview of the whole process by which it becomes a publishable e-text.

V.53. What is Distributed Proofing?

It has always been common for volunteers to share proofing work among themselves—you take the first five chapters, I'll take the next, and so on.

When you're just starting as a PG volunteer, you should go to one of the Distributed Proofing sites [B.4] and do some work there to get a grounding in the basics and a feel for whether you would like to continue working in PG. In distributed proofing, you get a very short section, as little as a page of text at a time, and usually an image file of the page as it was scanned. You then make the text match the image. This is a great start, since all you have to do is read, compare and correct. However, other work also needs to be done, and will normally be done by the project managers of these sites. The samples below give you an idea of the whole process, and also some ideas of what proofing a whole book from start to finish is like.

V.54. What do I need to proof an e-text?

You actually need only two things: the e-text itself and a text editor or word-processor that can handle book-sized files and save them as text.

Nearly all word processors and text editors in current use will work. Volunteers use many common programs, including WordPerfect, Microsoft Word, WordPad, DOS EDIT, vi, Brief, Crisp, EditPad, MetaPad, emacs, AbiWord, and the word processors from Open Office and AppleWorks. Since all of them contain the necessary basic functions, the best program is the one you're most comfortable with.

Be cautious with recent, powerful word-processors that "auto-correct" text, or use "smart quotes" or any other such automatic retyping or formatting feature, since they can Do Bad Things to your e-text without your consent! When using any such package, it is best to switch off any feature that makes changes without asking you.

Two utilities which may come in useful are a spell-checker and a version difference checker. These may be built into your word processor, or you may have them as separate packages.

A spell-checker is like a chain-saw: a powerful tool, but one to be used very carefully. It is very easy to say "Yes" to the wrong change, and make a really bad mess of the text. Spell-checkers have problems with proper names, foreign words, archaic usages, and dialects. Incautious use can leave you with a text such as that immortalized in the

Owed two a Spell in Chequer.

    Eye half a spell in chequer,
    It cane with my Pea Sea.
    It plane lee marques four my revue
    Miss steaks eye can knot sea.

Every e-text should pass through a spell-checker at some point, but the human half of the partnership needs a very light hand on the confirmations of change!

A difference checker, such as FC or COMP for MS-DOS, diff for Unix or ExamDiff <http://www.prestosoft.com/examdiff/examdiff.htm> for Windows, may also come in handy. A difference checker compares two versions of the text, and points out the changes. This is important when you've sent a text out for proofing, and you get it back with changes. Rather than re-reading the whole text, you can use a difference checker to highlight the changes so that you can verify them against the printed text. As a proofer, you can use it to compare the original text with what you're sending back to ensure that you've only changed what you meant to change.
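As a rough illustration of what a difference checker reports, here is a minimal sketch using Python's standard difflib module. The function name show_changes and the file labels are invented for this example; it is not a tool that PG supplies, and any of the programs named above will do the same job.

```python
import difflib

def show_changes(original, proofed):
    """Return unified-diff lines describing what changed between two versions."""
    return list(difflib.unified_diff(
        original.splitlines(),
        proofed.splitlines(),
        fromfile="original.txt",  # hypothetical labels, used only in the diff header
        tofile="proofed.txt",
        lineterm="",
        n=0,                      # no context lines: report only the changed lines
    ))

original = "There seemed to he no-one about.\nOnly tbe cat heard him.\n"
proofed = "There seemed to be no-one about.\nOnly the cat heard him.\n"

# Lines starting with "-" come from the original, "+" from the proofed copy.
for line in show_changes(original, proofed):
    print(line)
```

Lines that are identical in both versions are not listed, so only the proofer's changes need to be checked against the printed text.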

V.55. Do I need to have a paper copy of the book I'm proofing?

No.

Your job as proofer is to ensure that the e-text you're working on is readable in itself, and contains no obvious errors. Where you think there might be an error, but you're not sure, you mark the spot in the e-text, and let the volunteer who has the paper book look it up.

V.56. What's the difference between "first proof" and "second proof"?

These are fuzzy terms used to indicate how accurate the e-text is, and what type of work is needed to improve it. Quite commonly, the same volunteer who scans the book proofs the whole thing in one or two passes. Sometimes, given a good scan, the text can be sent out for "first proof" with little or no preparatory fixing-up. Often, the scanner makes quite a lot of corrections, then sends the text out for "second proof".

A text is ready for first proofing when it's obvious that there are plenty of errors, but it's possible to figure out, in almost every case, what the correct text should be without needing to refer to the book.

The objective of first proofing is to eliminate all the obvious errors, so that if you speed-read through the text, you probably won't notice any.

Second proofing involves taking a text that has been first-proofed and correcting all the remaining, more subtle errors. Often, some simple errors such as incorrect spacing and quotes may be left for second proofing. Texts that have been typed instead of scanned will always be of at least second-proof quality.

V.57. What do I do with an e-text sent to me for proofing?

First, establish reasonable expectations. A typical book takes 10-15 hours of concentrated effort, and when you first start, you're climbing a learning curve. For your first session, decide to mark out a chapter or two—something like 500 to 1,000 lines—and work only on that. If you get through 1,000 lines in your first sitting, you have done extremely well! It's a good idea to send this first 1,000 lines or so back immediately. The volunteer who sent you the e-text will comment on it, and let you know about any style guidelines you may have breached or common errors you may have missed. Most beginning proofers do make mistakes, so don't worry about it—it's easier to correct these in 1,000 lines than to go back over them in 15,000 lines!

You will usually receive the e-text as an attachment to your e-mail. Attachments are preferable to pasting the text into the body of the e-mail, because different e-mail clients can alter text in the message body. It's better still to send attachments as ZIP files [R.20], since e-mails sent as text can be damaged along the way. But whether you receive a TXT file or a ZIP file that you have to open, you should save the .TXT file to your hard disk and open it with your editor.

It may be that the text you see appears double-spaced—every second line is blank—or that all the text is on one incredibly long line. This is a familiar effect when moving between a DOS/Windows computer and a Mac or Unix system, but it can happen between any two editors. It is caused by the use of different characters to mark the end of a line. If you have this problem, ask whoever sent you the text to re-send it, telling them what kind of computer and editor you have.
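To make the cause concrete, here is a small sketch in Python that counts the end-of-line markers in a file's raw bytes and, if wanted, rewrites them all as CR/LF. The function names are invented for this example, and this is only one possible approach; asking the sender to re-send the file remains the simplest fix.

```python
def describe_line_endings(data):
    """Count each kind of end-of-line marker in the raw bytes of a file."""
    crlf = data.count(b"\r\n")          # DOS/Windows style
    cr = data.count(b"\r") - crlf       # bare carriage returns (older Mac style)
    lf = data.count(b"\n") - crlf       # bare line feeds (Unix style)
    return {"CRLF": crlf, "CR": cr, "LF": lf}

def to_crlf(data):
    """Rewrite every end-of-line marker as CR/LF."""
    # Reduce everything to LF first so that no marker is converted twice.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")

sample = b"line one\r\nline two\nline three\rline four"
print(describe_line_endings(sample))
print(to_crlf(sample))
```

A file that reports a mixture of kinds, or only a kind your editor doesn't expect, is the likely source of the double spacing or the single long line.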

Now you make any changes that obviously need to be made, and mark any places where the text looks wrong, but you're not sure what the right text should be. You can usually use asterisks (*) to mark these dubious spots, but you might use other characters if the text already contains asterisks. When in doubt, mark them all, and let the volunteer with the text sort them out!

It is usually best not to make global changes to line lengths by reformatting lots of paragraphs, since the person who sent you the e-text may want to use a difference checker when you return it, and changed line-lengths throughout mean that every line will be different.

When working on a long text, or when making a lot of changes, it may be wise to save several versions of the text with different filenames at different stages so that if something goes badly wrong, you can revert to the last good version. This applies especially to saving the text just before performing a spell-check.

When you're finished with the e-text, make sure you save it as a plain text file (.TXT) and send it back by zipping it if you can, and attaching it to an e-mail.

V.58. What kinds of errors will I have to correct?

Each text has its own peculiarities, but there are a number of well-known scanning errors you will be dealing with all the time.

Punctuation is always a problem. Periods, commas and semi-colons are often confused, as are colons and semi-colons. There are also usually a number of extra or missing spaces in the e-text.

The problem of quotes can assume nightmarish proportions in a text which contains a lot of dialog, particularly when single and double quotes are nested.

The numeral 1, the lower-case letter l, the exclamation mark ! and the capital I are routinely confused, and often, single or double quotes may be mistaken for one of these.

Lower-case m is often mistaken for rn or ni.

The letters h and b, and e and c, are commonly misread for each other, and these are probably the hardest of all to catch, since ear/car, eat/cat, he/be, hear/bear, heard/beard are all common words which no spell-checker will flag as problems.

For example:

" Hello1' caIled jirnmy breczily. 11Anyone home ? "

There seemed to he no-oneabout. Only tbe eat beard him."

should read:

"Hello!" called Jimmy breezily, "Anyone home?"

There seemed to be no-one about. Only the cat heard him.

As well as scanner errors, which affect one letter at a time, you have to keep an eye out for editing mistakes by the volunteer who scanned the text or by previous proofers. These are typically cases where a whole line, paragraph or page has been omitted or misplaced. They show up as sentences that don't make sense, or paragraphs that don't follow from the previous one.

This means that you have to keep reading the flow of the text, so that you can spot context errors as well as typos.
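Because a spell-checker passes words like "he" and "beard", one possible aid is a script that lists every occurrence of the easily confused words so that a human can check each one in context. The sketch below is only an illustration: the word list is taken from the pairs mentioned above, the function name is invented, and on a real book it will flag many correct words.

```python
import re

# Words from the h/b and e/c confusions described above. A real list would be
# longer and would depend on the book being proofed.
SUSPECTS = {"he", "be", "ear", "car", "eat", "cat",
            "hear", "bear", "heard", "beard"}

def flag_suspects(text):
    """Return (line_number, word) pairs for words that deserve a second look."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            if word.lower() in SUSPECTS:
                hits.append((number, word))
    return hits

scanned = "There seemed to he no-one about.\nOnly tbe eat beard him."
for number, word in flag_suspects(scanned):
    print(f"line {number}: check '{word}'")
```

This does not decide whether a word is wrong, and it won't notice a misplaced or missing line; it only points the reader at places worth re-reading.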

V.59. How long does it take to proof an e-text?

This depends on how long the e-text is, how clean the text is when you start, and how thorough you're being, as well as how much time per day you can give it and how fast you can proof.

On a first proof, it can take a very long time to get the e-text to a readable condition if it scanned badly. As a beginner, you would be unlikely to be given such a difficult text to work with. First proofs are usually done by the same person who did the scanning, and are only given out in the context of established scanning/proofing teams.

You might expect to proof anywhere between 500 and 2,000 lines per hour during a second proof. A short novel or novella might have as few as 6,000 or 7,000 lines; War and Peace weighs in at about 54,000 lines. Most novels run to 10,000 to 15,000 lines. So you might spend anything between 5 and 30 hours second-proofing a standard book, with 10 to 15 hours being typical.
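The arithmetic behind those figures is just line count divided by proofing speed. The short Python sketch below reproduces it using the numbers quoted above; the function name is invented for the example.

```python
def proofing_hours(total_lines, lines_per_hour):
    """Estimate hours of second proofing for a text of the given length."""
    return total_lines / lines_per_hour

# A 10,000-line novel at the fastest quoted rate, and a 15,000-line novel at
# the slowest, give the two ends of the range for a standard book.
quickest = proofing_hours(10_000, 2_000)
slowest = proofing_hours(15_000, 500)
print(f"Standard novel: {quickest:.0f} to {slowest:.0f} hours")

# War and Peace, at about 54,000 lines, over the same range of speeds.
print(f"War and Peace: {proofing_hours(54_000, 2_000):.0f} to "
      f"{proofing_hours(54_000, 500):.0f} hours")
```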

For an average novel, a week or two for second proofing is good going. A month is reasonable.

Proofing an e-text is a significant amount of work, and you may find it psychologically more comfortable to take on a chunk at a time—say 1,000 lines per session—and send that proofed section back, rather than wait until the whole job is done before sending anything back. This helps to avoid the fairly common case where you keep falling behind where you expect to be until you dread the thought of getting back to the text, and finally just abandon it.

If you find after a while that you just don't want to continue, please tell the person who sent you the text that you're not going ahead with it. It's very frustrating for the volunteer who scanned the book, and who wants to get it posted, to wait for two or three months, only to have to start all over again with another proofer.

V.60. Are there any special techniques for proofing?

The classic way to proof is to open the text in your editor or word processor, and just start reading carefully.

This method has received a major boost since editors and word processors have added a feature of showing squiggly red underlines under words not in their dictionary. While this is very useful, you still need to read carefully, since not all errors produce misspelled words. The classic, and very common, example of this is scanning "he" for "be". These visual spellchecks also commonly do not check words beginning with capitals. Capitalized words are commonly names not in the dictionary, and when checking of capitalized words is switched off, they will not query "Tbe". Other errors that a spellchecker doesn't look for include missing spaces, mismatched quotes and misplaced punctuation. For these, you can try gutcheck [P.1]. And of course, no automatic check will find omitted lines or words. Worse, spellcheckers will query words not in their dictionary that might be quite correct, and this can be quite troublesome when dealing with older texts or dialect.

Still, if your concentration is up to the job, scrolling through a text with non-dictionary words underlined in red is a fast and effective way of giving a text the final once-over.

Volunteers have also used other techniques for proofing. Some people can't sit at their screen and read for hours; many people don't want to.

Some people just use the good old-fashioned method of printing out the text to be proofed, and blue-pencilling the mistakes.

It is becoming fairly common now for people to load the text onto their PDA, and read it from that. Mistakes found can be bookmarked or jotted down and fixed when they go back to their PC.

Getting your computer to read the text aloud is a very effective way of achieving high accuracy. Modern PCs have audio capabilities built in, and it is possible to find free or cheap shareware "read-aloud" text-to-speech packages for just about everything. Some PDAs are also capable of doing text-to-speech.

The first time you try text-to-speech, it will probably sound and feel a little strange, but you will quickly learn to hear errors in words. This can be very effective, but you should have given the text at least a light proofing before you begin; it is hard to deal with a high number of errors using a text-to-speech method.

When proofing by a speech program, you either set your text-to-speech program to pronounce all punctuation, or, if that is not possible, you make a special version of your text to feed it, first doing a global replace of "," with " comma ", ";" with " semi-colon ", and so on. Mark a block of 500 to 1,000 lines for reading aloud, and set the reading speed to whatever is comfortable for you. Then you sit down with the original book in front of you, and listen. When you hear an error, mark the place in the text with a light pencil. Stopping the reading at every error, editing the text and restarting is possible, but it breaks the flow, and ends up taking longer. When the reading is done, go to your keyboard and correct the errors found.
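The global replace described above can be done in any editor, but as an illustration here is a minimal Python sketch that makes the "special version" in one pass. Only the comma and semi-colon replacements come from the text above; the other entries in the table, and the function name, are assumptions added for the example.

```python
# Spoken forms for punctuation. The first two are the ones given above;
# the rest are assumed and can be changed to whatever is easiest to hear.
SPOKEN = {
    ",": " comma ",
    ";": " semi-colon ",
    ":": " colon ",
    "!": " exclamation mark ",
    "?": " question mark ",
    '"': " quote ",
}

def speak_punctuation(text):
    """Return a copy of the text with punctuation marks spelled out as words."""
    # translate() replaces each character in a single pass, so the inserted
    # words are never themselves re-scanned for punctuation.
    return text.translate(str.maketrans(SPOKEN))

print(speak_punctuation('"Hello!" called Jimmy breezily, "Anyone home?"'))
```

Save the result under a different filename, so that the version fed to the speech program never gets confused with the e-text itself.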

V.61. What actually happens during a proof?

Stage One—The original Scan

We start with a scanned e-text, in this case a paragraph from The Odyssey. The paragraph used as an example here has been "enhanced" with more errors than in the real scanned text, so that you can see samples of many problems all in one place.

We begin by looking at the original OCRed text, of which our sample section reads:

 1There Periniedes and Eurylochus held the victims, but l
     drew my sharp sword from my thigh, and dug a pit, as it were
     a cubit in length and breadth, and about it poured a drink-
     offering to all the dead, first with mead and thereafter with
     sweet wine, and for the third time with water, And 1 sprink-
  BOOK XL
 ODYSSEY X, 24-56.
          173

                    ODYSS.EY XI, %4-56. 173
  lef white incal thereon, and entreated with many prayers
   strengthless beads of the dead, and prornised that on my
  return to Ithaea 1 would offer in my halls a barren heifer,
  the best 1 had, and fil the pyre with treasure, and apart unto
  Teiresias alone sacrifice a black rarn without spot, the fairest
  of my flock. But when 1 bad hesought the tribes of the
     d with vows and prayers, 1 took the sheep and cut their
        s over the trench. and the dark blood flowed forth,
           he spirits of the dead that he departed gathered
       from out of Erebus.

It's clear that we should tidy up the page headings and numbers that have been scanned in with the main text, and that we should separate the paragraphs and remove the spaces inserted by the scan at the start of some lines. We also need to restore some of the text that got lost in the scan. Since there isn't much of it, we just type it in. Having done this, we get to . . .

Stage Two—First pass through the scanned text

At this point, we have a complete text. All of the words are actually there, and we have eliminated page breaks and other extraneous artifacts of scanning. Again, mileage varies: some people like to preserve page breaks and numbering until much later, to make it easy to refer back from the e-text to the book.

Our job in this phase is to fix all of the obvious scanning errors and double-check that we really do have all the text. Our aim here is to create an e-text that is ready for First Proof. In fact, since it's fairly clear what all the words are, this text could be considered ready for first proof.

1There Periniedes and Eurylochus held the victims, but l drew my sharp sword from my thigh, and dug a pit, as it were a cubit in length and breadth, and about it poured a drink- offering to all the dead, first with mead and there after with sweet wine, and for the third time with water. And 1 sprink- led white incal thereon, and entreated with many prayers the strengthless beads of the dead, and prornised that on my return to Ithaea 1 would offer in my halls a barren heifer, the best 1 had, and fill the pyre with treasure, and apart unto Teiresias alone sacrifice a black rarn without spot, the fairest of my flock. But when 1 bad besought the tribes of the dead with vows and prayers, 1 took the sheep and cut their throats over the trench. and the dark blood flowed forth, and lo, the spirits of the dead that he departed gathered them from out of Erebus.

Now we convert those numeral 1s to capital Is or to quotes, as appropriate, straighten up the quotes, and deal with other obvious scanning errors, which brings us to . . .

Stage Three—The First Proof

At this point, we could hand over the text to an experienced proofer who doesn't have a copy of the book. This would be called a "first proof". An e-text is at first proof stage when there are still plenty of errors, but in each case it's pretty obvious what the correct word is. The excerpt now looks like normal text.

Unfortunately, in stage two above, we accidentally deleted a line.

'There Periniedes and Eurylochus held the victims, but l drew my sharp sword from my thigh, and dug a pit, as it were a cubit in length and breadth, and about it poured a drink- offering to all the dead, first with mead and there after with sweet wine, and for the third time with water. And I sprink- led white incal thereon, and entreated with many prayers the strengthless beads of the dead, and prornised that on my return to Ithaea I would offer in my halls a barren heifer, Teiresias alone sacrifice a black rarn without spot, the fairest of my flock. But when I bad besought the tribes of the dead with vows and prayers, I took the sheep and cut their throats over the trench, and the dark blood flowed forth, and lo, the spirits of the dead that he departed gathered them from out of Erebus.

Stage Four—Corrections from First Proof

We receive the first proof back from the proofer, and find that it has been mostly corrected.

The corrections made were "l/I", "there after/thereafter", "prornised/promised", "bad/had", and "rarn/ram".

We have also wrapped the lines—at 60 characters in this case, but it is commonly as much as 70 characters per line. Sentences which look wrong, but where it isn't clear what the right text should be, have been marked with asterisks (*).

'There Periniedes and Eurylochus held the victims, but I drew my sharp sword from my thigh, and dug a pit, as it were a cubit in length and breadth, and about it poured a drink-offering to all the dead, first with mead and thereafter with sweet wine, and for the third time with water. And I sprinkled white incal * thereon, and entreated with many prayers the strengthless beads of the dead, and promised that on my return to Ithaea I would offer in my halls a barren heifer, * Teiresias alone sacrifice a black ram without spot, the fairest of my flock. But when I had besought the tribes of the dead with vows and prayers, I took the sheep and cut their throats over the trench, and the dark blood flowed forth, and lo, the spirits of the dead that he departed gathered them from out of Erebus.

We look up the text where the first proofer has asterisked it, and make the corrections.

The text is now ready for second proofing. An e-text is ready for second proofing when you can skim through the text without noticing that there are errors.

We can either do a second proof ourselves, or send it out for second proofing.

Second proofing involves a very careful reading of the text, looking for small errors. In some ways, it's much harder than first proofing, since it's very easy to let your eyes run on auto-pilot and in doing so, miss subtle errors.

Having performed the second proof, which caught errors like "beads/heads", "Ithaea/Ithaca", "Periniedes/Perimedes" and "he/be", we now have our final e-text.

'There Perimedes and Eurylochus held the victims, but I drew my sharp sword from my thigh, and dug a pit, as it were a cubit in length and breadth, and about it poured a drink-offering to all the dead, first with mead and thereafter with sweet wine, and for the third time with water. And I sprinkled white meal thereon, and entreated with many prayers the strengthless heads of the dead, and promised that on my return to Ithaca I would offer in my halls a barren heifer, the best I had, and fill the pyre with treasure, and apart unto Teiresias alone sacrifice a black ram without spot, the fairest of my flock. But when I had besought the tribes of the dead with vows and prayers, I took the sheep and cut their throats over the trench, and the dark blood flowed forth, and lo, the spirits of the dead that be departed gathered them from out of Erebus.

Hooray! At long last we have an e-text to post, which can be downloaded, read and enjoyed by anyone in the world from now on.

About Net searching:

V.62. I've found an eligible text elsewhere on the Net, but it's not in the PG archives. Can I just submit it to PG?

You can submit it, but you can't "just" submit it.

We wish we could give a permanent home to all the etexts that people have produced and placed on the Net, but without proof of their public domain [C.10] status, we can't.

We need to be able to prove that the eBooks we publish are in the public domain, so, in order to use one of the many texts that are just floating around the Net, you need to find a matching paper edition that we can prove is eligible [V.18].

(By the way, please be sure that it isn't already in the PG archive. A lot of texts circulating on the Net originated at PG, and people quite often submit them back to us.)

Before you get into this, you should check whether the text you have found is likely to be in the public domain in the U.S. A quick way to verify this is to hit the Library of Congress Catalog site at <http://catalog.loc.gov> and search for the title or author. If you find no publications before 1923, then you should probably move on; the Library of Congress doesn't list every book, and in particular doesn't list all books published outside the U.S., but, if there isn't a pre-1923 copy there, it may be difficult to follow up on. If you're not dissuaded, do a search on the Net for used book shops that might have pre-1923 copies.

Sometimes, with a text on the Net, you know who typed it; it's on someone's website, or the transcriber is named in the text. Sometimes, the text has just been floating around Usenet or old gopher sites for years, with no attribution.

The first thing to remember is that we would like to give credit to the original transcriber if they want it, and if we can identify them.

The next thing to consider is that the original transcriber may well have an eligible copy of the book, and may be able to provide TP&V [V.25] for it.

So, if you can locate the original transcriber, it makes sense to e-mail them, explain what you propose to do, and ask them whether they can help with copyright clearance and whether they would like to be credited in the PG edition. Often, you will get no response, or a response but no prospect of material that will help with clearance, but sometimes you will get lucky.

If the transcriber can't help with TP&V, it's up to you to find a matching paper edition of the same book. This may not be as hard as it sounds. Libraries can help, and may get editions for you on interlibrary loan.

This is an ideal way for students, academics and librarians to contribute texts to PG, since you probably have access to a good library with stocks of old books to find matching paper editions.

If you find a matching paper edition, you then need to compare the etext you found with the book. Legally, what we're trying to prove here is that we have done "due diligence"—that we have done our best to prove that the etext is indeed a copy of a public domain work.

The minimum "due diligence" we can perform is to compare the first and last pages of each chapter (or every 20 pages where the book is not neatly divided into chapters of about that size). You should list all of the differences between the book and the etext that you find on those pages. It is to be expected that there will be some minor differences of punctuation, spacing and spelling, and even perhaps of wording. Minor differences are OK, but we do need to list them, to prove that we did the comparison. When you have your lists, you can send in the TP&V as normal, accompanied by your lists, for clearance.

Many texts floating round without attribution, and indeed many with attribution, could do with a thorough checking, and another option you have is "comparative retyping", where you go through the whole etext, proofing it carefully against the cleared paper book, and changing everything that is different in the etext to match the paper edition. If you do this, you don't need to produce a list of differences, since there won't be any by the time you've finished; you can just submit it as a normal text—and it may well be a lot cleaner! However, if you do take this path, please do a very thorough job on the proofing and comparison.

If the etext you find has been marked up, in HTML for example, you should remove all HTML for the PG edition, because, even though the text itself has been proved to be in the public domain, the original transcribers may hold copyright on the HTML markup, even if you can't find them. If you do want to make an HTML edition of it for PG, strip out all of the original markup and then re-add your own markup.
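As one possible way of stripping the original markup, the sketch below uses the HTMLParser class from Python's standard library to keep only the text between the tags. The class and function names are invented for this example, and it is deliberately simple: it makes no attempt to restore paragraph breaks, and it would also keep the contents of any script or style sections, so the result still needs to be proofed by eye.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect the text content of a page and discard every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_markup(html):
    """Return the text of an HTML document with all of its markup removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still held in the parser's buffer
    return "".join(parser.parts)

print(strip_markup("<p>Only the <i>cat</i> heard him.</p>"))
```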

If you do find the producer and he or she wants to be identified, you may submit a double credits line like:

Transcribed by Sally Wright <theoriginaltranscriber@example.com>
Produced for PG by You <you@example.com>

V.63. I've found an eligible text elsewhere on the Net, but it's not
      in the PG archives. Why should I submit it to PG?

The first reason is file safety.

Yes, we accept that the file is already available to everyone today, but it may not be safe in the long term. We've seen college students who put books on their personal site, and then lose that site when they graduate. We've seen individuals who transcribe several books, and later lose interest, or move, or die, and the work they've done is lost. We've seen small projects with a few volunteers who produce and post books for a few years, but then break up or run out of funds to maintain their site. We've seen large institutions drop their collections as part of a cost-cutting exercise. We've even seen organizations lock public domain works up behind licenses, requiring users to commit to registration and a "no copying" agreement before downloading them.

Whenever a set of etexts is published and distributed by only one person or organization, there is a danger that their etexts will disappear from the Net sometime. We want all etexts to be spread as widely as possible, copied as much as possible, so that no one event or loss, or whim of a sponsor, can obliterate them.

We think that the PG collection is, for that reason, the safest place to put a text for its long-term survival. There are copies of the PG archives all over the world, on public servers and private CDs. PG publications are widely converted, collected and read on PDAs. Other text projects copy works from PG.

The PG archive is so valuable, yet free and easily portable, that even if every current PG volunteer vanished overnight, people around the world would copy and preserve it. Even if PG itself decided to withdraw all our texts, we couldn't do it, because so many people have made copies.

The second reason is legal safety.

Unlike some other projects and individual efforts, PG retains documentary proof of the public domain status of its texts. This is more valuable than it might appear at first glance.

Publishers often claim a new copyright [C.17] on works that they republish, and as time goes on, it becomes harder and harder to prove that a particular book is in the public domain. Walk into your local bookstore and check out how many works by Shakespeare, Poe, Dickens, and Twain have copyright notices on them! People who want to translate these, or create derivative works like screenplays or lyrics or films, must first prove that they are basing their work on a public domain edition, but the creeping copyright practices of commercial publishers make that difficult.

Here's a practical example: we were approached by a film student who wanted to make a short piece based on characters from James Joyce's "Ulysses". But before he could do that, he needed to confirm that the material on which he was basing his movie was in the public domain, and all the editions he could find were copyrighted. However, because PG had already established the public domain status of Ulysses, we could point him to our established PD version, and even tell him where to find a paper copy published in 1922. Without that evidence, he could not have made his project.

V.64. I have already scanned or typed a book; it's on my web site.
      How can I get it included in the Gutenberg archives?

Great! We get these a lot, but it's always nice to see another!

You need to send us the TP&V [V.25] so that we can prove that your edition is in the public domain. If you don't have the TP&V, you will need to find a matching paper book with eligible TP&V for us to be able to use it.

V.65. I have already scanned or typed a book; it's on my web site.
      The world can already access it. Why should I add it to the
      Gutenberg archives?

If you want to let readers know that your site has other related
material, you can put that information in the Credits Line [V.47].
Taking a real-world example, you could ask us to add this to the
Credits line for a C. M. Yonge text:

A web page for Charlotte M. Yonge will be found at www.menorot.com/cmyonge.htm

V.66. I have already scanned or typed a book, but it's not in plain text format. Can I submit it to PG?

Yes, of course. We'll be happy to discuss format options with you, and we're quite experienced in converting between multiple formats and deciding which formats work best and will have the longest life. All you need is to get us a copy of your TP&V [V.25].

About author-submitted eBooks:

V.67. I've written a book. Will PG publish it?

Maybe.

PG gets submissions from young people, for example, who just want to get a story they wrote published in PG. We wish them well with their writing, but that's not really why we're here.

If you are a published author, or perhaps an academic who wants to put a textbook into the archives, it's quite likely that we will publish it.

V.68. I have translated a classic book from one language to another.
      Will PG publish my translation?

Yes, if we can.

The book that you translated needs to be in the public domain, and we will need the same proof of eligibility that we would use if you were contributing the book in its original language.

For example, if you were translating Hesse's Siddhartha (published pre-1923 in German, but no pre-1923 English translation available), we would need to copyright clear [V.25] the original German edition from which you worked—it needs to be a pre-1923 or otherwise public domain edition. (We actually did this one, thanks to the hard work and scholarship of some volunteers.)

V.69. OK, this is one of the cases where PG will publish it.
      What do I do next?

You need to decide about copyright issues. Do you want to release your work to the public domain, or do you want to retain copyright? If you want to retain copyright, what terms do you want to release it under? The next few questions deal with those issues.

Having decided that you want PG to publish it, and decided what restrictions (if any) you want to place on further distribution, you just need to write the appropriate letter and send the text to us. [V.46]

V.70. I hold the copyright on a book. Can I release it to the public domain?

You can. All you need to do is put a statement into the released version of the text saying that you have.

Sincerely,

Gregory B. Newby

Once you have released it into the public domain, neither we nor anyone else needs your permission to publish it, but for us to be sure that it is a public domain version, we do need a signed letter.

Absolutely not! For example, many contributors of copyrighted material want to share it with the world, but do not want it commercially republished by other companies.

If you want some related information, like a link to your website, included in the text, we will be happy to oblige.

Once we have posted a text, many people will copy it. We have no effective mechanism for "recalling" texts that we have posted, so please be sure, before you commit to this, that you intend to follow through with it, because there is no way to change your mind later.

Here is a sample letter, including the address to send it to:

Sincerely,

Gregory B. Newby

For PG to be in a position to copy it, we do need perpetual, worldwide, non-exclusive, royalty-free rights to distribute the book in electronic form. What rights you choose to assign to readers after that is a decision for you to make.

The Creative Commons site <http://www.creativecommons.org> may give you some ideas of what practical use you can make of your copyright to see that the work is used in the ways you intended.

About what goes into the texts:

V.73. Why does PG format texts the way it does?

PG texts are formatted as plain ASCII, with 60-70 characters per line, with a hard return [CR/LF] at end of line, and some people ask "Why do it this way? You could omit the hard returns and let the reader's word processor or Reader software wrap the lines. You could use '8-bit' accented characters for non-English characters. You could use ' - ' instead of '--' for an em-dash." And so on, with a different choice we could make for every formatting feature. And the answer, of course, is that we could do it differently, and sometimes we do, but mostly we keep to one consistent style.

We'll be discussing each of the formatting decisions below, not only giving the summary PG answer, but also discussing the plusses and minuses of each, and the possible options.

Like any question beginning "Why does/doesn't PG . . . ?", the answer is "Because that's what the volunteers and readers want!". These conventions have been worked out over the years, largely by Michael Hart, our founder and chief volunteer, in conjunction with all of us volunteers, as the result of feedback from readers.

We are guided throughout by the principle that we want to produce texts in the simplest format that will adequately express the content. Quoting Michael Hart (1994):

  1. to encourage the creation and distribution of electronic texts for
     the general audience.

  2. to provide these Etexts in a manner available to everyone in terms
     of price and accessibility [i.e. no special hardware or software],
     and no price tag attached to the Etexts themselves.

  3. to make the Etexts as readily usable as possible, with no forms or
     other paperwork required, and as easily readable to the human eyes
     as to computer programs, and in fact, more readable than paper.

There is sometimes a conflict between "simplest format" and "adequately express the content"; further, different people have different views on what is "simple" or "adequate". You, the producer of the text, have spent the time and effort to make the eBook available to the world, you have thought more about it than anyone else, and we respect your informed judgment. However, please make sure that your judgment has been informed, by studying the precedents and reasons behind our guidelines.

Where a simple, standard PG-ASCII layout does not, in your view, "adequately express the content", you should think of making your text in another open format, perhaps HTML or

Just ten years ago, presentation as plain ASCII was not only a universal standard, it was effectively the only way that most people could view the books. The first version of the HTML specification had been drafted, but was unknown among the general public.

In 2002, plain vanilla ASCII is still readable everywhere, but people also want to convert our texts into other formats for more convenient loading on readers and web sites. We therefore have to keep in mind that our works will be processed by automatic conversion programs, none of which is perfect, and we have evolved some "defensive formatting" practices, which, while retaining the universality of plain text, also supply clues to automatic converters about how they should treat the layout. These do help to keep converters from making at least the worst mistakes. The most significant "defensive formatting" practices are indenting unwrappable text like quotations, and using underscores rather than CAPITALS for italics.

Different volunteers have different priorities: at one extreme, some people want to make the best plain text they can, giving no weight to conversion issues; at the other, some people emphasize the cues that will allow automatic reformatters to convert the texts well, even if that causes some ugliness in the plain text. Most of us operate somewhere between, making the choices we feel are best depending on the context. Getting a text on-line is the important thing; which choices you make in doing so is a matter of detail.

About the characters you use:

V.74. What characters can I use?

a) You should use plain ASCII for straight English texts.

b) When producing a text partly or completely in a language that requires accents, you should use the appropriate ISO-8859 character set for the language, and specify which you are using, and also provide a 7-bit plain ASCII version with the accents stripped.

c) When producing a text in a language that doesn't use one of the ISO-8859 character sets, you should use the encoding most commonly used for that language. [e.g. Chinese—Big 5]

d) When producing a text containing more characters than can be found in any one of the ISO-8859 character sets, you should use Unicode.

You should use plain ASCII wherever possible—that is, the letters and numbers and punctuation available on a standard U.S. keyboard, without accented letters. The immediate and major exception to this is when you are typing a text written in a language like French or German that requires accents.

There is a problem with using non-ASCII characters. They do not display consistently on all computers; in fact, they do not even display consistently on the same computer! On my computer, for example, what looks like an e-acute in this editor just shows as a black box in another editor, or even using a different font in the same editor. And this is by no means confined to some theoretical minority; we have to deal with it all the time when posting texts.

Further, standards are changing: ten years ago, the character set Codepage 850 [MS-DOS] was very common; now it's rare except in some texts that have survived those ten years.

We want to preserve these texts over centuries, not just decades, and at the moment there is no single clear standard that we can use across all texts. Unicode may perhaps be a future standard, but, right now, it's not something that people use every day, and it's not supported by a lot of common software.

ASCII, while limited, is supported by almost all computers everywhere, so we make a point of always supplying an ASCII version where possible, even if the ASCII version is degraded when compared to the 8-bit original. When we get a text in, say, German, we post two versions of it—one with accents and one without.
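As a rough illustration of rules a) through d) above, here is a minimal Python sketch that picks the simplest encoding able to hold a given text. The function name and the candidate list are assumptions made up for this example, not a PG tool; a real choice also depends on the language of the text, as rules b) and c) say.

```python
# Hypothetical helper, not a PG tool: pick the simplest candidate
# encoding that can represent every character of a text.
CANDIDATES = ["ascii", "iso8859_1", "utf_8"]  # simplest first

def simplest_encoding(text, candidates=CANDIDATES):
    """Return the first candidate encoding that can encode the whole text."""
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this encoding lacks some character; try the next
        return name
    raise ValueError("no candidate encoding can represent this text")

print(simplest_encoding("plain English"))        # ascii
print(simplest_encoding("d\u00e9bris"))          # iso8859_1 (e-acute)
print(simplest_encoding("\u03b4 and \u00e9"))    # utf_8 (Greek delta plus e-acute)
```

For a language with its own ISO-8859 set you would put that set in the candidate list in place of iso8859_1.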

V.75. What is ASCII?

Don't get scared by the computer jargon; ASCII (pronounced ASS-key) is just a name for the set of unaccented letters, numbers and other symbols on a standard U.S. keyboard.

ASCII (American Standard Code for Information Interchange) is a set of common characters, including just about everything that you can type in on an English-language keyboard. It includes the letters A-Z, a-z, space, numbers, punctuation and some basic symbols. Every character in this document is an ASCII character, and each character is identified with a number from 0 through 127 internally in the computer.

Just about every computer in the world can show ASCII characters correctly, which makes it ideal for PG's purpose of providing texts that can be read by anyone, anywhere, but ASCII does not include accented characters, Greek letters, Arabic script and other non-English characters, which causes some problems when we produce texts that need non-ASCII characters.

V.76. So what is ISO-8859? What is Codepage 437? What is Codepage 1252?
      What is MacRoman?

Today's computers mostly work on the basis of dealing with one "byte" at a time. A byte is a unit of storage that can contain any number from 0 through 255--256 values in all. It's very convenient for computers to associate one character with each of these numbers, so that we can have up to 256 "letters" viewable from the values stored in one byte. The first 128 values, zero through 127, are defined by ASCII--so, for example, in ASCII, the number 65 represents a capital "A", 97 represents a lowercase "a", 49 stands for the digit "1", 45 for the hyphen "-", and so on.
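If you would like to check those numbers yourself, the short Python sketch below does so with the built-in ord() and chr() functions. The helper name ascii_code is invented for this example.

```python
# The ASCII examples from the text, checked with Python's built-ins.
EXAMPLES = {"A": 65, "a": 97, "1": 49, "-": 45}

def ascii_code(character):
    """Return the ASCII number of one character, or fail if it is not ASCII."""
    code = ord(character)
    if code > 127:
        raise ValueError(f"{character!r} is not an ASCII character")
    return code

for character, expected in EXAMPLES.items():
    # chr() goes the other way, from number back to character.
    print(repr(character), ascii_code(character), chr(expected) == character)
```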

ASCII doesn't define characters for the values 128 through 255, and in early days computer manufacturers used these values to hold non-ASCII characters like accented letters and box-drawing lines. Of course, 128 wasn't nearly enough values to hold all of the characters that people needed to use for different languages, so they made the character sets switchable, so that a PC in France could use a different set of accented letters from a PC in Poland. Microsoft's version of this was called Codepages. Each Codepage held a different set of non-ASCII characters. Codepage 437, and later Codepage 850, were commonly used for English and some major Western European languages on MS-DOS.

MacRoman was Apple's first codepage, containing most of the accented letters in Latin-derived languages, and MacRoman is still in common use on Apple Macs today.

Later, the International Organization for Standardization (ISO) got around to looking at the problem, and defined ISO-8859-1, ISO-8859-2 and so on, as the standards for different language groups. These sets all define the characters 160 through 255 as accented letters and other symbols, and define the 32 characters from 128 through 159 as control characters.

Since Microsoft Windows has no use for the control characters 128 through 159, Windows fonts commonly use Codepage 1252, which has ASCII in the first 128 characters, ISO-8859-1 in characters 160 through 255, and other symbols in the characters 128 through 159. Just to make an already chaotic system worse, all characters can be defined differently in different fonts!

Of course, most of these codepages are incompatible with each other. For example, the byte value 232 shows as a lower-case "e" with a grave accent in ISO-8859-1 and CP1252, a capital letter "E" with diaeresis in MacRoman, a Latin capital letter "Thorn" in CP850, a Cyrillic lower-case "Sha" in ISO-8859-5, a Greek capital letter "Phi" in CP437, and so on. So if you view a text intended for one of these character sets with a program that assumes a different character set, you see gibberish.
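The byte 232 example can be reproduced with a few lines of Python, which ships with tables for all of these character sets. This is only a sketch: the codec names are Python's own spellings, and the function name is made up for the example. It prints the official Unicode name of each character, so the output itself stays plain ASCII.

```python
import unicodedata

# Python's names for the character sets mentioned in the text.
CODECS = ["iso8859_1", "cp1252", "mac_roman", "cp850", "iso8859_5", "cp437"]

def names_for_byte(value, codecs=CODECS):
    """Map each codec to the Unicode name of the character it gives one byte."""
    raw = bytes([value])
    return {codec: unicodedata.name(raw.decode(codec)) for codec in codecs}

for codec, name in names_for_byte(232).items():
    print(f"{codec:10} {name}")
```

The same single byte comes out as five different characters across the six sets, which is exactly the gibberish problem described above.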

The good news, for mostly-English texts at least, is that ISO-8859-1, Codepage 1252 and Unicode agree on the numerical values of the accented characters and symbols to be represented by the values 160 through 255. And everybody accepts ASCII—a pure ASCII file is valid ISO-8859-anything, valid Codepage-anything, and valid Unicode UTF-8.
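Both halves of that good news can be checked mechanically. The sketch below, with function names invented for the example, confirms that the values 160 through 255 mean the same thing in ISO-8859-1, Codepage 1252 and Unicode, and that a pure ASCII file reads identically under several of the character sets discussed here.

```python
def upper_range_agrees():
    """True if ISO-8859-1, CP1252 and Unicode agree on values 160-255."""
    for value in range(160, 256):
        raw = bytes([value])
        if not (raw.decode("iso8859_1") == raw.decode("cp1252") == chr(value)):
            return False
    return True

def reads_the_same_everywhere(data):
    """True if the bytes decode to the same text under every listed codec."""
    codecs = ("ascii", "iso8859_1", "cp1252", "cp850", "mac_roman", "utf_8")
    return len({data.decode(codec) for codec in codecs}) == 1

print(upper_range_agrees())
print(reads_the_same_everywhere(b"Plain vanilla ASCII"))
```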

For more detail about the mappings between Unicode and other formats, you can view:

  Unicode<-->ISO-8859 mappings at
  ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/

  Unicode<-->Windows mappings at
  ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/

  Unicode<-->Apple mappings at
  ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/

If you're not confused enough by now, please read the excellent guide to the whole "alphabet soup" problem at <http://czyborra.com>.

V.77. What is Unicode?

Because no single set of 256 characters can hold all of the symbols necessary for true multi-lingual texts, ISO 10646 was created. This defined the Universal Character Set (UCS) using 31 bits, which has the potential for a staggering 2 billion characters.

The Unicode Consortium is a group of computer industry companies who agree the Unicode standard. Unicode accepts the ISO 10646 standards, and adds some restrictions and implementation processes. It plans for a modest million or so characters; however, this is enough for all living and extinct languages, and imaginable future ones too.

Using 4 bytes for each character is wasteful, though, when most characters need only one or two, and there are programming problems with implementing 4-byte characters, so Unicode provides Transformation Formats (UTF) which allow the characters to be encoded using fewer bytes where possible. UTF-8 and UTF-16 are common.

UTF-8, which is the most practical of these from the PG point of view, allows ASCII to be encoded normally, and usually uses two or three bytes for other non-ASCII characters.
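To see this in practice, the following Python sketch counts how many bytes UTF-8 uses for a few sample characters. The sample set is just an illustration chosen for this example; the characters are written as escapes so that the code itself is plain ASCII.

```python
# Sample characters, written as escapes so this code stays plain ASCII.
SAMPLES = {
    "capital A": "\u0041",
    "e-acute": "\u00e9",
    "Greek delta": "\u03b4",
    "Ogham beith": "\u1681",
    "Devanagari ka": "\u0915",
}

def utf8_length(character):
    """Number of bytes UTF-8 needs for one character."""
    return len(character.encode("utf-8"))

for label, character in SAMPLES.items():
    print(f"{label:14} {utf8_length(character)} byte(s)")
```

The ASCII letter takes one byte, just as it would in a plain ASCII file; the accented and Greek letters take two, and the Ogham and Devanagari letters take three.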

Because of the extra work needed to support this extra space, and the fact that most people work mostly in one or maybe two languages, Unicode is being adopted only slowly, and most computer programs in 2002 do not fully support it. But when you need to mix Arabic, Greek, Ogham and Sanskrit in one text, it's the only possible answer!

For more about this, go straight to the source at <http://www.unicode.org>.

V.78. What is Big-5?

Big 5 is an encoding of a set of 13,000+ traditional Chinese characters.

V.79. What are "8-bit" and "7-bit" texts?

For practical purposes, 7-bit texts are plain ASCII; 8-bit texts have accented letters.

This comes from computer jargon. You can represent the 128 characters of ASCII using 7 bits—binary digits—but to represent the 256 characters needed for the various codepages and ISO-8859 standards, like accented letters, you need 8 bits. Hence, we call a text that uses non-ASCII characters in a character set like Codepage 850 or ISO-8859-1 an "8-bit" text.
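The arithmetic is simply that 7 bits give 2 to the power 7, or 128, values and 8 bits give 256. A minimal sketch of a test for whether a file is "7-bit", with a function name invented for the example, might look like this:

```python
def is_seven_bit(data):
    """True if every byte fits in 7 bits, that is, the data is plain ASCII."""
    return all(byte < 128 for byte in data)

print(2 ** 7, 2 ** 8)                                     # 128 256
print(is_seven_bit(b"cafe"))                              # True
print(is_seven_bit("caf\u00e9".encode("iso8859_1")))      # False
```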

When we post a text as both 8-bit and 7-bit, as we do when ASCII is not enough to render the text acceptably, we name the file with an "8" or a "7" at the start. So, for example, Crime and Punishment by Dostoevsky is named 8crmp10 for the 8-bit version with accents, and 7crmp10 for the 7-bit version without accents.

See also FAQ [R.35]: "What do the filenames of the texts mean?"

V.80. I have an English text with some quotations from a language that needs accents—what should I do about the accents?

If stripping the accents would unacceptably degrade the book, then submit two versions, one "8-bit" with the accents included and one "7-bit" plain ASCII, and we will post both.

This is a hard choice. What constitutes "unacceptable degradation"?

Clearly this is a decision that all of us in PG have to make. It's a very common problem, and different people have different views. For that matter, different print publishers have different views; you will see the words "debris", "facade" and "cafe" printed with and without accents in different books, and even in different editions of the same book.

We don't want to post two versions when we don't have to. It doubles the posting work, doubles the disk space needed, potentially confuses downloaders, and doubles the maintenance when we need to correct the text. On the other hand, we don't want to degrade the text.

There is no clear line, no definitive answer to what level of degradation is acceptable. Most producers feel that there is no point in making a separate version when dealing only with a few foreign words thrown in among the English, but when, for example, some significant dialog between the characters is in French or Spanish, it's harder to say that stripping the accents is acceptable. You, the producer, need to decide this on a case-by-case basis. If you're not sure, discuss it with one of the Directors of Production or one of the Posting Team.

If you have made the text with accents, you can choose to make your own 7-bit version and send it to us, or just send the 8-bit version and we'll make the 7-bit version from it. Some people prefer to make their own 7-bit editions; some don't. Whether you use a Microsoft Codepage, one of the ISO standards or MacRoman doesn't matter—we can convert any of them for you.
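If you want to try making your own 7-bit edition, one possible approach is sketched below in Python. This is an assumption about how it could be done, not a description of the tools the Posting Team actually uses: it splits each accented letter into a base letter plus a separate accent mark, then drops the marks.

```python
import unicodedata

def strip_accents(text):
    """Drop accent marks, keeping the base letters (a sketch, not a PG tool)."""
    # NFKD splits "e-acute" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

eight_bit = "d\u00e9bris, fa\u00e7ade, caf\u00e9"
seven_bit = strip_accents(eight_bit)
print(seven_bit)                   # debris, facade, cafe
print(seven_bit.encode("ascii"))   # succeeds: the result is plain ASCII
```

Letters that have no accent to remove, such as the German sharp s or the ae ligature, pass through this sketch unchanged, so they would still need a replacement chosen by hand.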

V.81. I have some Greek quotations in my book. How can I handle them?

There is no way to show Greek letters in ASCII. You have three options:

You can just replace the Greek words with [Greek] to indicate to the reader that you have omitted it.

You can "transliterate" the Greek to ASCII. Greek letters do have a correspondence to plain "Latin" letters—for example, the Greek letter "delta" can be represented by the letter "d". There is a simple PG guide to transliteration at <http://www.promo.net/pg/vol/greek.html>. This practice has had a long and honorable history: words like "amphora" and "hubris", for example, are straight transliteration from the Greek. This is usually the best option.

If there is enough Greek to warrant it, and no other accented characters, you may be able to use the ISO-8859-7 character set, and submit both 7-bit and 8-bit versions [V.79]. ISO-8859-7 is for modern rather than classical Greek, but, if necessary, you will surely be able to express the Greek fully in Unicode. However accurate your Greek, that still leaves the issue of what to do with the 7-bit ASCII version, where transliteration is probably still your best bet.

V.82. I want to produce a book in a language like Spanish or French
      with accented characters. What should I do?

Use the appropriate ISO-8859 character set [V.76] for your 8-bit version.

About the formatting of a text file:

This section of the FAQ goes into great detail about all kinds of formatting questions. However, looked at from a higher level, the only real issue is that we want to render texts clearly, with formatting that reflects the original, so that readers of the plain text format can read them easily, and people converting them to other formats can do so reliably. When you come across a case that is not covered by the detailed guidelines below, keep this ultimate aim in mind, and make the best decision you can. Don't get hung up for hours or days over a question of formatting—if you want advice, look at how other people have handled the same situation in previous texts, or ask other volunteers for their ideas.

V.83. How long should I make my lines of text?

For normal prose, such as you find in a novel, your lines should mostly be 60 to 70 characters long, not shorter than 55, not longer than 75 except where it can't be helped. Never, ever longer than 80, except where you're trying to render a non-text structure, like a family tree.

For poetry, make the text look as much like the book as possible. This also applies to some plays where the lines are clearly intended to be broken at specific points, whether blank verse or not.
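For prose, the numbers above are easy to check by program. The Python sketch below is one way to do it; the function name and the wording of the verdicts are invented for this example. It ignores the last line of each paragraph, which is naturally short, and it knows nothing about poetry, so treat its report as a list of lines to look at rather than a list of errors.

```python
def check_line_lengths(text, low=55, high=75, hard_limit=80):
    """Return (line_number, length, verdict) for lines outside the usual range."""
    lines = text.splitlines()
    report = []
    for index, line in enumerate(lines):
        length = len(line)
        following = lines[index + 1] if index + 1 < len(lines) else ""
        if length > hard_limit:
            report.append((index + 1, length, "over hard limit"))
        elif length > high:
            report.append((index + 1, length, "long"))
        elif 0 < length < low and following.strip():
            # Short line in mid-paragraph; a paragraph's last line is exempt.
            report.append((index + 1, length, "short"))
    return report

sample = "\n".join(["x" * 81, "y" * 76, "z" * 10, "w" * 60, "", "short end"])
for entry in check_line_lengths(sample):
    print(entry)
```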

V.84. Why should I break lines at all? Why not make the text as one line per paragraph, and let the reader wrap it?

We could either use 70-character lines and let readers unwrap them if they want to, or use infinite-length lines and let readers wrap them if they want to. We choose to wrap the lines so that they are readable on even the simplest of text editors and viewers.
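Either direction is a few lines of work for a program. Here is a minimal Python sketch using the standard textwrap module; the two function names are made up for the example, and it assumes straight prose with a blank line between paragraphs (it would happily mangle poetry or indented quotations).

```python
import textwrap

def wrap_paragraphs(text, width=70):
    """Wrap each blank-line-separated paragraph to at most `width` characters."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )

def unwrap_paragraphs(text):
    """Join each paragraph back into one long line."""
    return "\n\n".join(" ".join(p.split()) for p in text.split("\n\n"))

one_line = " ".join(["word"] * 40)
wrapped = wrap_paragraphs(one_line)
print(wrapped)
print(unwrap_paragraphs(wrapped) == one_line)
```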

V.85. Why use a CR/LF at end of line?

CR/LF can lead to double-spacing, notably on Mac and Unix, but at least there is a CR in there for Mac users, and there is an LF for *nix users.

If you don't know or care what this is about, please skip blithely on.

There are three differing standards for how to represent the end of a line of text. In brief, Apple Macs use the CR character. Unix and its variants use the LF character. Microsoft systems, from MS-DOS through Windows, use both together.

If you want the history behind these:

CR stands for Carriage Return, and comes from the old typewriter / teletype idea of a command to move the print head from the right of the page back to the left when it reaches the end;

LF stands for Line Feed, and comes from the old typewriter / teletype idea of a command to move the print head down a line;

CR/LF together indicate moving down a line and back to the left of the page.

The history is not relevant to today's computers in principle, but in practice they all use one of these legacy conventions, and there's nothing we can do about it but pick one.
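If you are curious what converting between the three conventions involves, the Python sketch below turns any mixture of them into CR/LF. The function name is invented for the example. It works on raw bytes, and it first reduces everything to a single LF so that lines already ending in CR/LF are not doubled.

```python
def to_crlf(data):
    """Convert CR, LF or CR/LF line ends in a byte string to CR/LF."""
    # Normalize to LF first so existing CR/LF pairs are not doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")

mixed = b"Mac line\rUnix line\nDOS line\r\nlast line"
print(to_crlf(mixed))
```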

V.86. One space or two at the end of a sentence?

Whichever you prefer, but if using two spaces, please use them only at the end of a sentence, not after abbreviations like "Dr." and "per cent.", and not after non-sentence-ending punctuation like the question-mark in the sentence: "Must you go? when the night is yet so black!"

Many people have strong views on either side of the "one space or two?" question, and we're not about to try and argue with them. Use whichever is most natural for you.

However, if using two, you take responsibility for deciding where the sentence ends. You can't just place two spaces after every period, question-mark and exclamation mark, since periods are also used for abbreviations and ellipses, and question-marks and exclamation-marks don't always end sentences.

V.87. How do I indicate paragraphs?

Just leave a blank line before each paragraph.

V.88. Should I indent the start of every paragraph?

No.

Printers do this when publishing paper books because they do not leave blank lines in the text, but there is no need for indenting in our eBooks.

V.89. Are there any places where I should indent text?

Yes. You should always make poetry look like the original, and that may mean indenting some lines, for example:

  I was a child and she was a child,
      In a kingdom by the sea;
  But we loved with a love that was more than love--
      I and my Annabel Lee;

Even when poetry doesn't have indented lines, it is a good idea to indent quotations embedded in prose. Remember, others will be converting your text later—to HTML, to PDA reader formats, to formats that don't even exist yet—and much of this conversion will be done automatically, by computer programs. It is very hard for a program to know when it can and can't re-wrap lines to fit a screen size unless it has a clear signal that this line should not be wrapped. This is one of the biggest problems with auto-converting PG texts.

Just about all formatting programs "know" that lines that are indented shouldn't be wrapped, so by indenting lines just a space or two, you can prevent

  I think that I shall never see
  A poem lovely as a tree.

from turning into

I think that I shall never see A poem lovely as a tree.

in some future reader's eBook.

You don't really need to do this in texts where the whole book is poetry or blank verse, since these will probably be recognized as whole books that shouldn't be rewrapped, but when there are a few lines of quotation amid an acre of straight prose, a few spaces will be a life-saver. Even in the original plain text version, the extra spaces serve to set the quotation off from the main text.

You shouldn't get carried away and indent things 20 spaces for this reason, though. Anything up to four spaces is reasonable; more is excessive. If you're indenting many short verses in this way, keep your number of spaces for indentation consistent throughout the book.
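To show why a space or two makes such a difference, here is a toy rewrapper in Python. It is only a sketch of the kind of rule a converter might apply, and the function name is invented; real conversion programs use their own, more elaborate heuristics. It rejoins and rewraps ordinary paragraphs, but leaves any block containing an indented line exactly as it found it.

```python
import textwrap

def rewrap(text, width=70):
    """Rewrap plain paragraphs; leave blocks with indented lines untouched."""
    result = []
    for block in text.split("\n\n"):
        lines = block.split("\n")
        if any(line.startswith(" ") for line in lines):
            result.append(block)  # indented: treat as unwrappable
        else:
            result.append(textwrap.fill(" ".join(lines), width=width))
    return "\n\n".join(result)

indented = "  I think that I shall never see\n  A poem lovely as a tree."
flush = "I think that I shall never see\nA poem lovely as a tree."
print(rewrap(indented))
print(rewrap(flush))
```

The indented couplet comes through intact, while the unindented one is run together into a single line.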

There are some other times when you may judge it best to indent, where text is indented in the paper book, like newspaper headlines or pictures of handwritten notes.

V.90. Can I use tabs (the TAB key) to indent?

No.

The problem with tab characters is that they act differently in different applications. Typically a tab will move the text to the next tab stop, which might be four spaces on your PC, but 20, or none, on someone else's. The effects are unpredictable.
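You can see the effect for yourself with Python's built-in expandtabs(), which renders a tab the way a viewer with a given tab stop would. The helper name below is made up for this small sketch.

```python
def show_with_tab_stop(line, tab_stop):
    """Render a line as a viewer with the given tab stop would show it."""
    return line.expandtabs(tab_stop)

line = "\tIn a kingdom by the sea;"
for tab_stop in (4, 8, 20):
    print(f"{tab_stop:2}: {show_with_tab_stop(line, tab_stop)!r}")
```

The same line is indented by 4, 8 or 20 spaces depending only on the reader's settings, which is why typed spaces are the safe choice.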

V.91. How should I treat dashes (hyphens) between words?

In typography, there are four standard types of dashes: the hyphen, the en-dash, the em-dash, and the three-em-dash.

Originally, printers called these the "em-dash" because it was the same width as the capital letter M in whichever font they were using, the "en-dash" because it was the same width as the capital letter N, and the "three-em-dash" because it was as long as three capital Ms.

The hyphen is used for hyphenated words, like "en-dash" itself, or "to-day" or "drawing-room". For this, you just press the single dash or hyphen key on your keyboard.

In typography, the en-dash is a little longer than the hyphen, and is typically used for duration, where you could substitute the word "to". For example, if you were printing "1830-1874", or "9:00-5:30", you would use an en-dash instead of a hyphen. The en-dash is also sometimes used as hyphenation between words that are already hyphenated, for example, "bed-room-sitting-room" might use an en-dash as its central dash to emphasize that it is a different type of separator from the plain hyphens before "room". However, there is no ASCII character for an en-dash, and we use the hyphen in these cases. (HTML and some character sets do provide separate entities for en-dash and em-dash.)

The em-dash is shown in print as a longer dash, and for PG purposes, you should render it as two hyphens with no spaces around them.

You use the em-dash as a kind of parenthesis--as I am doing here--or to indicate a break in thought or subject within a sentence. There is no ASCII equivalent of the em-dash; there is no key on your keyboard that you can press to get one. For PG texts, we represent the em-dash as two dashes with no space between or around them--like this.

The em-dash can also be used at the end of a sentence or speech to indicate that the speaker stopped or trailed off. For example:

"When I saw you with Emily, I thought you were-- I thought she was--"

In a case like this, there may be a space following the em-dash; the context may even demand one, not because of the em-dash as such, but to make the break between the statements or sentences clear.

These two hyphens represent one character, so you should never break them at line end, with one hyphen at the end of the first line and the other at the start of the second. If you have an em-dash near line end, you can break the line either before or after the em-dash, but never in the middle.

The fourth type of dash, the three-em-dash, is used to represent a missing word, or an undetermined number of missing letters. You will often see it in a sentence like:

Dr. P------ was known for his honesty.

or

Dr. ------ was known for his honesty.

where there is a convention that the character's name has been redacted. Logically, we should represent the three-em-dash as six dashes, but you may reduce that to four. Whichever you choose, do use it consistently in the text you're producing.

Unlike the em-dash, you should leave a space in such cases wherever a space would have been before the letters were replaced by dashes.

Here's a summary table of the dashes:

 Name            ASCII    Used for

 Hyphen          -        Hyphenated words
 En-dash         -        Durations, like "3:00-5:30"
 Em-dash         --       Break in sentence or parenthetical comment
 Three-em-dash   ------   Indicating a word that was edited out
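If your scanning or word-processing software has given you "real" en-dash and em-dash characters, a small replacement table brings them back to the ASCII forms in the summary. The Python sketch below assumes the input uses the Unicode en-dash (U+2013) and em-dash (U+2014); the names in it are invented for the example. It leaves spacing alone, since whether a space belongs after an em-dash depends on context.

```python
# Unicode dash characters and their ASCII renderings from the summary table.
PG_DASHES = {
    "\u2013": "-",    # en-dash becomes a single hyphen
    "\u2014": "--",   # em-dash becomes two hyphens
}

def to_pg_dashes(text):
    """Replace Unicode en- and em-dashes with their plain ASCII forms."""
    for dash, ascii_form in PG_DASHES.items():
        text = text.replace(dash, ascii_form)
    return text

print(to_pg_dashes("1830\u20131874"))
print(to_pg_dashes("a kind of parenthesis\u2014as here\u2014or a break"))
```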

V.92. How should I treat dashes replacing letters?

If the dashes obviously represent individual letters, use the same number of hyphens. Otherwise, you can use a three-em-dash (see above: 6 or 4 hyphens) in such places.

A common convention when a character in a novel is using bad language, or when reference is given to a character whose full name is not being used, is to replace the letters with dashes. For example,

"That D---l, Mr. C------s will regret his hasty actions!"

In this case, it is clear that "D---l" is meant to represent "Devil" and that there is a character whose name begins with "C" and ends in "s" whose name is not spelled out in full. Where the book makes it clear how many letters are represented by hyphens, just use that number of hyphens.

Where the number of letters omitted is not clear, you can decide how long you want to make your extended dash. Typographers often use the "three-em-dash" for this, so called because it is as wide as three capital Ms. Logically, since we represent an em-dash by two hyphens, we might represent a three-em-dash as six. If you feel that six hyphens is too long, you can choose a shorter length, like four, but whichever you choose, keep it consistent within your text:

  It was in the town of S----, walking on M---- Street, that
  Sowerby came upon Dr. T---- taking the morning air.

V.93. What about hyphens at end of line?

Remove the hyphens from single words that were wrapped by the printer at line-end on the paper copy. Where two words are joined with a hyphen, you can leave the hyphen at end of the text line.

Books are usually printed with words broken at end of line to make the right side of the text perfectly even. You should remove all such hyphens. For example, in the sentence:

  Mary's mouth tightened as she saw the marks on the car-
  pet, and her hands balled into fists.

you should remove the hyphen from "carpet".

Words which are strung together and hyphenated by the author pose a different question. It is perfectly OK from the point of view of a reader of the plain text version for such a hyphen to occur at end of line, for example:

  Now that the guns were silent, convoys brought badly-
  needed medical supplies and food.

However, be aware that if somebody later rewraps the text for use in a different format like HTML, it is possible that they will introduce a space where it should not be:

Now that the guns were silent, convoys brought badly- needed medical supplies and food.

so there is still a small disadvantage to having a hyphen at line-end.

Sometimes it's not entirely clear whether the hyphen is there because it has to be, or just because it happens to fall at the end of the line:

  Daisy rushed to the door, but there were no letters for her to-
  day, and she retreated sadly.

Sometimes "today" is written as "to-day", especially in older works. So which is this? Should we remove the hyphen or not? In this case, the best thing to do is search the rest of the text for the same word, and see whether it is consistently hyphenated or not in other places.
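That search is easy to automate. The Python sketch below, whose function name is invented for the example, lists every word broken by a hyphen at line end and reports whether the same hyphenated form also appears in the middle of a line somewhere else in the text. It only gathers evidence; the decision is still yours.

```python
import re

def line_end_hyphens(text):
    """List (first_half, second_half, seen_hyphenated_elsewhere) for each
    word that is split by a single hyphen at the end of a line."""
    results = []
    for match in re.finditer(r"(\w+)-\n[ \t]*(\w+)", text):
        first, second = match.group(1), match.group(2)
        joined = re.escape(first) + "-" + re.escape(second)
        # A mid-line occurrence suggests that the hyphen belongs to the word.
        seen = re.search(r"\b" + joined + r"\b", text, re.IGNORECASE) is not None
        results.append((first, second, seen))
    return results

sample = (
    "there were no letters for her to-\n"
    "day, and she went out to-day again. The car-\n"
    "pet was red."
)
for entry in line_end_hyphens(sample):
    print(entry)
```

In the sample, "to-day" is found hyphenated elsewhere, so its hyphen should probably stay, while "car-pet" is not, so that hyphen should be removed. Two hyphens at line end, our em-dash, are not reported.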

V.94. What should I do with italics?

There are three different ways volunteers currently render italics: like THIS, like _this_ and like /this/. Pick one, and use it consistently in your text.

There are really two questions here: "How should I render italics?" and "When should I render italics?"

The original PG standard for italics was to render emphasis italics as CAPITALS, using underscores for an italicized I, and do nothing for non-emphasis italics like foreign words and names of ships, and this is still the most common usage. For reading a plain-text file in a plain text editor, it is still arguably the most reader-friendly usage as well.

It has two drawbacks:

1. if you do want to preserve italics for non-emphasis words, you may end up with a very ugly text where there are too many capitals.

2. it is impossible to convert CAPITALS reliably back into italics, since the original text might have had a capital letter, or even been all capitals in the first place. This is especially true of automatic conversion for people who want to read PG texts on eBook readers.

To overcome these problems, many volunteers now use _underscores_ or /slants/ to render italics. These allow you to preserve all italics without creating an ugly plain-text, and to remove the ambiguity of CAPITALS. Underscores are more popular than slants, but some people feel that underscores should properly be reserved for underlined text. Since printers tend to avoid underlines, however, there aren't many books where this causes a real conflict.
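The advantage for automatic conversion is easy to demonstrate. In the Python sketch below, a single pattern turns underscore-marked italics into HTML italics; no comparable rule exists for CAPITALS, because a program cannot tell an emphasized word from one that was capitalized in the original. The function name is made up for this example, and it assumes underscores are used only in matched pairs within a line.

```python
import re

def italics_to_html(text):
    """Convert _marked_ italics to HTML <i> tags (a sketch, not a PG tool)."""
    return re.sub(r"_([^_\n]+)_", r"<i>\1</i>", text)

print(italics_to_html("She sailed on the _Hesperus_ and _I_ stayed."))
```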

V.95. Yes, but I have a long passage of my book in italics! I can't really CAPITALIZE or otherwise /mark/ all that text, can I?

No, you really can't. On the other hand, if the author intended that section to stand out, you don't want to ignore that information and withhold it from future readers.

What you can do is format it differently from the rest of the text. For example, if you're averaging a 68-character line throughout normal paragraphs, you could reasonably use shorter lines, like 58 characters, for the italicized section. Going a step further, you could shorten the lines and indent them a space or two as well. This will give a clear signal to future readers and converters that this section is to be treated specially.

V.96. Should I capitalize the first word in each chapter?

No.

Capitalization of the first word is often used in printed material to emphasize the break at the start of a section or chapter on the paper, but it is not necessary in an eBook, and leads to the same kind of ambiguity as does the capitalization of italics, and for far less reason.

If you feel you really must capitalize the first word, we probably won't stop you, but if so, please do it consistently throughout the book, not just in one or two places, so that a future reader can be certain that these capitalized words were a chapter-head convention, and not otherwise intended for emphasis.

V.97. What is a Transcriber's Note? When should I add one?

A Transcriber's Note is a small section you can add to a text you produce to give the reader some information about changes you made to the book when rendering it into text.

A Transcriber's Note is not the same as a footnote—a footnote is part of the text you have transcribed; a Transcriber's Note is a note that you add to the text, explaining something you have done or omitted. If there is a Transcriber's Note, it may be at the top or the end of the text, and it should be clearly marked so that a reader cannot confuse it with the main text or an introduction.

The main thing is to ensure that a reader cannot confuse text that you have added with text that was in the original book.

Transcriber's Notes are rarely needed, but if, for example, you found misprints in the text, or things that might look like misprints even though they're not, you may note them here, if it seems relevant. If there is an image in the book that is important to the content, you may describe it in a note. If there was unusual typography that you had to represent in some uncommon way, you might well explain that here.

You don't need to add a Transcriber's Note just for common conversions like italics, and you should not use such a note to add your own comments or views about the text or the author. It's just there to let the reader know what decision you have made about rendering the text.

Here are some examples of Transcriber's Notes:

Transcriber's Note:

The irregular inclusion or omission of commas between repeated words ("well, well"; "there there", etc.) in this etext is reproduced faithfully from the 1914 edition . . .

Transcriber's Note:

Inserted music notation is represented like [MUSIC—2 bars, melody] or
[MUSIC—4-part, 8 bars]

[Transcriber's Note: This letter was handwritten in the original.]

Transcriber's Note:

The spelling "Freindship" is thus in the original book.

Transcriber's Note: Some words which appear to be typos are printed thus in the original book. A list of these possible misprints follows:

If there is an image that is important to the content you may describe it at the point in the text where it appears, for example:

[Transcriber's Note: Here there is a map of three islands just West of and parallel to a coastline running SW to NE, with a big X marked on the North of the middle island. A spur of land extends from the mainland, sheltering the islands from the north-east.]

Transcriber's Notes that apply to the whole text should be placed at the start or end of the text, whichever you prefer. Notes that pertain to a specific point in the text, like the map example above, should be placed at the point in the text where they are relevant, but should not interrupt a paragraph except where that cannot be avoided.

V.98. Should I keep page numbers in the e-text?

No. But there are exceptional cases . . .

In general, the page numbers of the original book are irrelevant when making a reader's edition for PG; they are annoying and intrusive for anyone trying to read it, and if you did keep them, they would probably be removed by anyone converting it. Get rid of them!

But there are a few books where page numbers are appropriate. Non-fiction books that use page numbers as internal cross-references are the prime example; if, on page 204, the text reads

"Our studies of plants (see pp. 141-145) show that this is true."

and this kind of cross-reference is frequent throughout the text, then it is probably best to keep the page numbers, since it is otherwise very difficult to honor the author's intent.

In the more common case where cross-references exist, but are not frequent, and not essential to the text, you have several choices: leave the cross-references in, meaningless though the page numbers are, remove the cross-references, change the cross-references to something relevant (like "Start of Chapter 12" instead of "pages 141-145"), or, if you can make it work in context, insert references in the text for the cross-references to point to, like [Reference: Plants] and then reformat the cross-reference like "Our studies of plants (see [Reference: Plants]) show that this is true."

There are a few other cases, where the text you create is likely to be the subject of study or reference, in which it may also be desirable to retain page numbering.

When there are pages at the end of the book with notes referring to page numbers, the simplest answer is to change the page number references to chapter numbers, and add a quote from the page referred to if it's not already in the book's end-notes. That way, a reader can search for the phrase.

V.99. In the exceptional cases where I keep page numbers, how should
      I format them?

Within brackets of your choice, with one space either side, simply added to the text at the exact point of the page break. Unless there is some [142] special reason, you shouldn't insert a line break or new paragraph when indicating a page number; just insert it in the text, as I did with "142" above.

You should use whichever of round brackets, (143) square brackets, [144] or curly brackets {145} is not used (or least used) within the main text itself, and then use it consistently. Try to make sure that your page numbers cannot be confused with anything else.

Don't run your[146]page[147]numbers right up against words with spaces omitted; this just makes the text hard to read. Use spaces before and after.

Where the page break is at the start of a chapter or headed section, you can put it on a line of its own, for example:

[148]

CHAPTER XI. PLANTS

Where a paragraph begins on a new page, you should put the page number at the start of the paragraph, as:

[149] With the extinction of the dinosaurs . . .
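If you are checking a long text for page numbers that have been run up against words, a small script can help. The following Python sketch assumes square brackets are used only for page numbers in your text (if footnote markers use them too, it will touch those as well); the function name is hypothetical.

```python
import re

def space_page_markers(text):
    """Put one space either side of [123]-style markers inside a line."""
    # Only markers with non-space text on both sides are changed, so a
    # marker on its own line or at the start of a paragraph is left alone.
    return re.sub(r"(?<=\S)[ \t]*\[(\d+)\][ \t]*(?=\S)", r" [\1] ", text)

print(space_page_markers("Don't run your[146]page[147]numbers"))
```

Treat the result as a starting point and check it by eye, since a script cannot know whether a bracketed number really is a page number.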

V.100. Should I keep Tables of Contents?

Yes, but just keep the contents themselves, and not the page numbers for each chapter or section, except where you have kept the page numbers in the whole text. When you have removed the page numbers from the book, it doesn't make much sense to leave them in the TOC.

Here, for example, is a typical TOC. In the original text, each chapter had a page number beside it:

THE DUKE'S CHILDREN

CONTENTS

   1 When the Duchess was Dead
   2 Lady Mary Palliser
   3 Francis Oliphant Tregear
   4 It is Impossible
   5 Major Tifto
   6 Conservative Convictions
   8 He is a Gentleman
   9 'In Media Res'
  10 Why not like Romeo if I Feel like Romeo?
  11 Cruel
  12 At Richmond

Note that I have indented the lines here, to give a sign to automatic converters that these lines should not be wrapped into one paragraph.

V.101. Should I keep Indexes and Glossaries?

If you are working from a pre-1923 publication, then yes.

If you are working from a modern reprint, you must be careful not to take any of the text that might have been added by the modern publisher. If you have any doubt about whether the index or glossary was part of the original printing, you should leave it out. Often with reprints, under your Clearance Line [V.37], you may see an instruction not to use indexes. In such cases, or if there is any doubt at all, don't.

V.102. How do I handle a break from one scene to another, where the
       book uses blank lines, or a row of asterisks?

Use a blank line, followed by a line of 3 or 5 spaced asterisks or dashes, followed by another blank line.

In a printed book, where the point of view switches from one character to another, or some other break in the narrative is made without a new chapter or headed section, the publisher will often denote the break just by a couple of blank lines. This gives the reader a cue to notice that the point of view has switched, and avoids confusion.

However, a printed book cannot be edited or changed, while an eBook will be edited and converted over its lifetime, and it is likely that if you denote this break just by a couple of blank lines, as in the book, your break may be lost. For example, in automated conversion to a PDA reader format, it is common to merge multiple blank lines into one.

In making a PG e-text, you may indicate this break by a couple of additional blank lines, but, if your text is later converted into another format such as HTML, the extra blank lines may get lost in the editing or rendering. Or the person doing the conversion may simply think that the extra blank line was a mistake, and remove it. To guard against this, you should add an unambiguous visual break such as a line of spaced asterisks:

* * * * *

The exact layout of your break is not really important, and you can use whatever format you prefer: a blank line followed by five spaced asterisks followed by another blank line, for example, or two blank lines with dashes instead of asterisks. Just make sure that future readers can be in no doubt that you intended to indicate a break that was really in the original printed text.
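If you typed or scanned the text with breaks shown only by extra blank lines, the change can be made mechanically. This is a rough Python sketch with a made-up function name. It assumes you run it on the body of one chapter at a time, because chapter headings also use runs of blank lines for spacing, and it assumes blank lines contain no stray spaces.

```python
import re

BREAK = "* * * * *"

def mark_scene_breaks(chapter_text):
    """Replace two or more blank lines with an explicit break line."""
    # Three or more newlines in a row means at least two blank lines.
    return re.sub(r"\n{3,}", "\n\n" + BREAK + "\n\n", chapter_text)

print(mark_scene_breaks("He left.\n\n\nNext morning, she woke early."))
```

A single blank line between ordinary paragraphs is left untouched.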

V.103. How should I treat footnotes?

In a printed text, the most common treatment for footnotes is to put them at the end of the page to which they refer. Sometimes, editors gather them all at the end of the book. Footnotes are a real formatting problem for an eBook without defined physical pages; there is no agreement between readers about which is the best way to render them.

There are three basic ways of rendering footnotes in an e-text:

You can insert them right into the text, in brackets, at the point in the paragraph where they occur, with or without an indication that they were originally footnotes. This is only reasonable in a text with very short footnotes.

You can insert them after the paragraph to which they refer, either contiguous with the paragraph or as a new "paragraph" of their own, as I am doing with this one. If the text contains any footnotes longer than a line, [1] you should not try to just append them to the paragraph; you should make a new "paragraph" of them, with a blank line before and after.

[1] Some footnotes can go on not only for several lines, but for several pages!

You can gather all footnotes at the end of the e-text, or to the end of the chapter to which they refer.

Of these three, gathering all footnotes to the end of the chapter or the end of the whole text is probably the friendliest option, since it preserves the original intention of allowing the reader to continue reading the main text without interruption. However, it may involve some renumbering and general note-keeping on your part, and may not be needed where there are only a few short footnotes. You can see an ideal example of this kind of footnote marking in our edition of Darwin's "The Voyage of the Beagle", file vbgle10.txt from 1997, Etext number 944, which you can get from: <ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/etext97/vbgle10.txt>
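For the third option, moving the notes can be partly automated. The Python sketch below is an assumption-laden illustration, not a PG tool: it treats any paragraph that begins with a marker like "[1] " as a footnote, works on one chapter at a time, and does no renumbering. If your text also has paragraphs that begin with bracketed page numbers, it would wrongly move those too.

```python
import re

def gather_footnotes(chapter_text):
    """Move paragraphs that start with "[n] " to the end of the chapter."""
    body, notes = [], []
    for para in chapter_text.split("\n\n"):
        if re.match(r"\[\d+\] ", para):
            notes.append(para)   # footnote paragraph
        else:
            body.append(para)    # ordinary text, marker stays in place
    return "\n\n".join(body + notes)

sample = (
    "Some notes run long. [1] They can distract.\n\n"
    "[1] Some go on for pages!\n\n"
    "The story continues."
)
print(gather_footnotes(sample))
```

The marker inside the paragraph is left where it was, so the reader can still search for it.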

V.104. My book leaves a space before punctuation like semicolons, question marks, exclamation marks and quotes. Should I do the same?

No.

If you look closely at these "spaces", you will see that they are not as wide as a normal space—they tend to be half to three-quarters as wide. These don't actually represent spaces as such; they were just a convention used by typesetters to make the text feel less cramped, and they did not express any specific intent on the part of the author.

OCR software tends to see them as full spaces, and one of the jobs you typically have to do when editing a text that has been OCRed is to remove them.
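A first pass at this cleanup can be scripted. The Python sketch below is only an illustration with a hypothetical function name. It removes spaces before semicolons, question marks and exclamation marks, and it also trims the space just inside double quotes, which some books show as well. For the quotes it assumes that, within one paragraph, double quotes simply alternate between opening and closing, so run it one paragraph at a time and check the result.

```python
import re

def tidy_ocr_spacing(paragraph):
    """Remove typesetters' pseudo-spaces that OCR read as full spaces."""
    # Drop spaces before ; ? and !
    paragraph = re.sub(r"[ \t]+([;?!])", r"\1", paragraph)
    # Text between the 1st and 2nd quote, 3rd and 4th, and so on, is
    # quoted speech: trim the spaces just inside the quote marks.
    parts = paragraph.split('"')
    parts = [p.strip() if i % 2 == 1 else p for i, p in enumerate(parts)]
    return '"'.join(parts)

print(tidy_ocr_spacing('" Hello ! How are you to-day ? "'))
```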

In some texts, this also happens following an opening quote, so your
OCR might read a sentence as:

" Hello ! How are you to-day ? "

which you should correct to:

"Hello! How are you to-day?"

Samples of this can be seen in the images used for the FAQ
"Why am I getting a lot of mistakes in my OCRed text?" [S.17]

V.105. My book leaves a space in the middle of contracted words like
       "do n't", "we 'll" and "he 's". Should I do the same?

Unlike the pseudo-spaces before punctuation, these really were intended as spaces indicating the break between words—that is, where we would nowadays contract two words into one, the author or editor has made the contraction, but left them as two separate words.

Since this effect was intended, it is usual to leave the spaces in. Some people who really do n't like this style of spelling do remove them, but generally volunteers want to preserve the text as printed.

V.106. How should I handle tables?

Just line up the information neatly in columns. If you use a non-proportional font [W.5] you will be able to do this reliably. You can also use the dash character "-", the underscore "_" and the pipe character "|" to make borders if you really need to, but it's usually better to omit them. It is, though, often good to indent your table a little, to set it off from the main text, and to avoid the danger of having it automatically wrapped by some converter later. For example, from "The Albert N'Yanza, Great Basin of the Nile" by Sir Samuel White Baker:

TABLE No. 1.

Table for Increased Reading of Thermometer, using 0 degrees 80 as the
Result of Observations for its Error.

    Month.         1861.   1862.   1863.   1864.   1865.
    January. . .      --   0'143   0'314   0'487   0'659
    February . .      --    '157    '328    '501    '673
    March . . .    0'000    '172    '344    '516    '688
    April . . .     '014    '186    '358    '530    '702
    May . . . .     '028    '200    '372    '544    '716
    June . . . .    '043    '214    '387    '559    '730
    July . . . .    '057    '228    '401    '573    '744
    August . . .    '071    '243    '415    '587    '758
    September. .    '086    '257    '430    '602    '772
    October . .     '100    '271    '444    '616    '786
    November . .    '114    '285    '458    '630   0'800
    December . .   0'129   0'300   0'473   0'645      --
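If you have the table's cells in a list, lining them up by hand is not necessary. Here is a small Python sketch, with a made-up function name, that pads each column to the width of its longest cell and indents every line. It assumes every row has the same number of cells, that the first column holds labels and the rest hold figures.

```python
def format_table(rows, indent=4, gap=2):
    """Lay out rows of strings in aligned, indented plain-text columns."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = []
    for row in rows:
        first = row[0].ljust(widths[0])            # labels: left-aligned
        rest = [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        line = " " * indent + (" " * gap).join([first] + rest)
        lines.append(line.rstrip())
    return "\n".join(lines)

print(format_table([["Month.", "1861.", "1862."],
                    ["March", "0'000", "'172"]]))
```

The result only looks aligned in a non-proportional font, which is one more reason to use one while editing.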

V.107. How should I format letters or journal entries?

Make them look like they are in the printed book. If the signature is indented in the book, indent it in the letter. For example:

 "Sir,
     No consideration would induce me to
 change my resolve in this matter, but I am
 willing to engage your services as my agent
 for a fee of 100 pounds.
                  "H. Middleton"

When a letter appears in the middle of lots of prose, using shorter lines for the letter is an effective way of making the letter stand out, without resorting to indenting the whole thing.

When the book is largely composed of letters or entries, as happens in an epistolary novel or the publication of somebody's letters or journal, you might reasonably leave two or three blank lines between entries (whichever you choose, keep it consistent throughout the book!) to give the reader a visual cue that the next is not just a new paragraph, but a new entry, for example:

10 pm.--I have visited him again and found him sitting in a corner brooding. When I came in he threw himself on his knees before me and implored me to let him have a cat, that his salvation depended upon it.

I was firm, however, and told him that he could not have it, whereupon he went without a word, and sat down, gnawing his fingers, in the corner where I had found him. I shall see him in the morning early.


20 July.--Visited Renfield very early, before attendant went his rounds. Found him up and humming a tune. He was spreading out his sugar, which he had saved, in the window, and was manifestly beginning his fly catching again, and beginning it cheerfully and with a good grace.

I looked around for his birds, and not seeing them, asked him where they were. He replied, without turning round, that they had all flown away. There were a few feathers about the room and on his pillow a drop of blood. I said nothing, but went and told the keeper to report to me if there were anything odd about him during the day.


11 am.--The attendant has just been to see me to say that Renfield has been very sick and has disgorged a whole lot of feathers. "My belief is, doctor," he said, "that he has eaten his birds, and that he just took and ate them raw!"


11 pm.--I gave Renfield a strong opiate tonight, enough to make even him sleep, and took away his pocketbook to look at it. The thought that has been buzzing about my brain lately is complete, and the theory proved.

This is different from the case mentioned in the FAQ [V.102] "How do I handle a break from one scene to another, where the book uses blank lines, or a row of asterisks?". In that case, we added a row of asterisks because future reformatting or conversion could cause confusion about the scene break that was explicitly signalled by the blank lines on paper. In this case, each new letter or journal entry cannot be mistaken by a careful reader, so we don't need asterisks or dashes to signal that; we're just adding a bit of extra space to make it more readable.

V.108. What can I do with the British pound sign?

The British pound sign cannot be expressed in ASCII, but is very common in the works of English novelists. It evolved as a stylized version of the letter L (from the Latin "libra"), and it's entirely appropriate to represent it as such, either like:

The horse cost L8 12s. 6d.

or

The horse cost 8l. 12s. 6d.

This works particularly well where an amount is expressed in pounds, shillings and pence (librae, solidi, denarii).

Where there is a simple number of pounds, you may prefer just to use the word:

She was a handsome widow with 500 pounds a year.

V.109. What can I do with the degree symbol?

Just type out the word "degrees" or the abbreviation "deg."—for example:

By the time we reached Cairo it was 115 degrees in the shade.

Geographical degrees are more awkward, but should be handled the same way:

It was at 30 deg. 15' E, 14 deg. 45' N.

In general, any symbol can be represented in words.
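If your scanned or typed text already contains these two symbols, the replacements described above are easy to script. The Python sketch below uses a hypothetical function name and makes two choices of its own: it always writes the pound sign as a leading "L", and it always writes the degree sign as "deg." after a number. You may prefer the other forms in your own text.

```python
import re

POUND = "\u00a3"   # the pound sign
DEGREE = "\u00b0"  # the degree sign

def to_ascii_symbols(text):
    """Replace pound and degree signs with 7-bit equivalents."""
    text = text.replace(POUND, "L")
    # A degree sign after a digit becomes " deg."
    text = re.sub(r"(\d)[ \t]*" + DEGREE, r"\1 deg.", text)
    return text

print(to_ascii_symbols("The horse cost \u00a38 12s. 6d."))
print(to_ascii_symbols("It was at 30\u00b0 15' E, 14\u00b0 45' N."))
```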

V.110. How should I handle . . . ellipses?

Just as I did above . . . and here! Leave one space before and after each dot. Do not break an ellipsis over the end of a line. In principle, an ellipsis is one symbol, like an em-dash, and should not be broken at line end.

A special case arises when an ellipsis follows a sentence instead of being in the middle. . . . In this case, put the period after the last letter of the sentence, as you normally would, then follow the usual format for ellipses. You end up with four dots, with spaces everywhere except before the first.
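OCR and typists often produce unspaced dots instead. As a hedged illustration, the Python sketch below (function name invented for this example) converts runs of exactly three or four unspaced dots into the spaced forms described above. It does not deal with ellipses that fall at the end of a line, so those still need to be checked by hand.

```python
import re

def space_ellipses(text):
    """Convert "..." and "...." into the spaced forms used in PG texts."""
    # Four dots: a sentence-ending period followed by an ellipsis.
    text = re.sub(r"[ \t]*\.{4}[ \t]*", ". . . . ", text)
    # Exactly three dots: an ellipsis in mid-sentence.
    text = re.sub(r"[ \t]*(?<!\.)\.{3}(?!\.)[ \t]*", " . . . ", text)
    return text

print(space_ellipses("Just as I did above...and here!"))
print(space_ellipses("instead of being in the middle.... In this case"))
```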

V.111. How should I handle chapter and section headings?

For a standard novel, you can choose either four blank lines before the chapter heading and two lines after, or three lines before and one line after, but whichever you use, do try to keep it consistent throughout.

Normally, you should move chapter headings to the left rather than try to imitate the centering that is used in some books.

V.112. My book has advertisements at the end. Should I keep them?

Most people seem to think "no", and "no" is the safe choice, but opinions vary.

The typical arguments are: "The ads are not part of the author's intent, so you should remove them." vs. "They give a flavor of the original book, so you should keep them". This latter is particularly cogent when the ads are for other books by the same author.

Decide which of these statements best fits your own views in the case you're looking at; after that, it's up to you!

V.113. Can I keep Lists of Illustrations, even when producing a
       plain text file?

Yes. As in the case of the Table of Contents, there is no point in including page numbers when your text doesn't have them, but the list of illustrations itself may go in.

V.114. Can I include the captions of Illustrations, even when producing a plain text file?

Yes.

You can format them as short paragraphs of their own, in brackets, with the word Illustration: followed by the caption, something like:

[Frontispiece: A Flash of Light]

or

[Illustration: Goldsmith at Trinity College]

Don't interrupt a paragraph to insert one, unless the reader really needs to know that the original illustration was in the middle of the paragraph; place the note between paragraphs instead.

V.115. Can I include images with my text file?

Yes, as I have done with the zipped version of the plain-text format of this FAQ. In general, though, if you want to include images it makes much more sense to make an HTML version of the book and include them there, where they are anchored into the text in a predictable way, and to leave them out of the text version. This FAQ is an exceptional case: I included images with the plain-text version because I wanted you to be able to experiment with them using your own OCR package.

If you do include images with plain text, they will be included in the ZIP file, but will not be downloadable separately alongside the plain text file. For example, if your file gets named abcde10.txt, and you include images pic1.gif, pic2.gif and pic3.gif, then abcde10.zip will include all four files, but only abcde10.zip and abcde10.txt will be posted, so the images will be available only within the zip file. So, even if you are including images, don't assume that the reader will be able to see them.

If you do include images with plain text, be sure to mention them by filename in a note at the appropriate places in the text file; otherwise readers may not even realize they're there. For example:

[Illustration: Goldsmith at Trinity College—see goldtrin.gif]

If you do include images with a text file, don't make them too big. Readers downloading zip files of plain text expect them to be relatively small; don't burden them with huge downloads they don't want. Use the same kind of rules and processing that you would for an HTML file, or better still, include the images only with the HTML version.

About formatting poetry:

V.116. I'm producing a book of poetry. How should I format it?

Make it look like the original.

The only formatting change that you might consider is to limit the amount of centering. Often, in a poetry book, the title of a poem may be centered, when the body of the verse isn't. This can work on paper, particularly when the page is narrow, but "centering" the title on a 70-column line can mean that the title ends up far to the right of the body of the poem, which looks untidy. And even if you center the title correctly over the body of this poem, the next poem may have longer lines, and so its title may not have the same center as the first poem, and the title of one will be off-center with the title of the next!

If you have this kind of formatting in your book, you should consider moving all of the poem titles to the left margin rather than try to keep compensating for different line centers. It's more consistent, and easier to read, if you just left-align all titles. To see a not-quite-successful attempt at centering the titles over the poems, take a look at the Poems of Emily Dickinson, available from <ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/etext00/1mlyd10a.txt>

In that case, it would have been better to left-align the numbers and titles. Centering isn't really an effective formatting choice in etexts.

V.117. I'm producing a novel with some short quotations from poems.
       How should I format them?

As nearly as possible like they look in the book, with the exception that you should indent the whole verse anywhere between 1 and 4 spaces from the left. This is to give a signal to automatic conversion programs that these lines should not be wrapped.
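Why does a leading space help? A converter has to guess which lines it may safely rewrap. The Python sketch below shows one plausible rule: rewrap a paragraph only if none of its lines starts with a space. This is an assumption about how a converter might behave, written for illustration, and not a description of any particular program.

```python
import textwrap

def rewrap(text, width=68):
    """Rewrap plain paragraphs; leave indented blocks exactly as typed."""
    out = []
    for block in text.split("\n\n"):
        lines = block.split("\n")
        if any(line.startswith(" ") for line in lines):
            out.append(block)   # indented: verse, table or similar
        else:
            out.append(textwrap.fill(" ".join(lines), width=width))
    return "\n\n".join(out)

sample = (
    "This line is prose\nand may be rewrapped.\n\n"
    "  Now hatred is by far the longest pleasure;\n"
    "    Men love in haste, but they detest at leisure."
)
print(rewrap(sample))
```

A verse whose lines all touch the left margin would be joined into one paragraph by a rule like this, which is exactly the accident the indent is meant to prevent.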

For an example of a novel with many differently formatted quotations embedded, see the "a" version of Clotel, file clotl10a.txt, Etext number 2046, from the year 2000, which you can find at <ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/etext00/clotl10a.txt>

Some of these quotations touch the left-hand column; today, we would think it better to insert at least one space before every line.

About formatting plays:

V.118. How should I format Act and Scene headings?

Pretty much like chapter headings. You can use 4 blank lines between acts, and 3 blank lines between scenes, or 3 between acts and 2 between scenes. If your book has "END OF ACT/SCENE" footers, leave them in the etext.

You may center act/scene headers and footers if they are centered in the book, but it's usually best to left-align them, for the same reasons it's usually best to left-align poem titles in poetry.

V.119. How should I format stage directions?

Generally, in brackets.

In printed texts, it is common to show stage directions as italics inside brackets. You don't have the option of italics in plain text, and you shouldn't need to use underscores or /slants/, and certainly not CAPITALS, to indicate italics for stage directions. Normal text within the brackets is all you need. It will be immediately clear to a reader that bracketed text consists of stage directions.

[Square brackets] are most common for stage directions, but (round) or {curly} brackets will work too, if there's a reason why they are preferable in the case of your text. Just make sure that you use the same kind of brackets consistently and only for stage directions—don't use round brackets for stage directions if characters' speeches also contain text in round brackets.

Some printed plays follow the convention of not closing brackets when the direction is at the end of a speech or scene. For example: [Exeunt.

Where the book doesn't close the bracket in a case like this, you shouldn't either.

V.120. How should I format blank verse?

Just like normal verse in poetry. Make it look like the printed book. Left-align it, and make one line of etext the same length as one line of print.

Sometimes in blank verse, a speech may start mid-line, and the print reflects that by leaving a space on the left, and starting mid-way. In a case like that, do the same in the etext.

About some typical formatting issues:

V.121. Sample 1: Typical formatting issues of a novel.

Look at the image novel.tif. It shows a page of a novel, with several typical formatting decisions to be made.

We note that there is no end-quote on the first paragraph, but that's OK, since the second paragraph is a continuation by the same speaker, so the first paragraph doesn't need a closequote. There is also an italicized "I", which will end up with underscores, but there is nothing else to give us any difficulty.

In the second paragraph, we have an ellipsis, an italicized French word with an accented letter, the British pound symbol, and an italicized "Here".

The ellipsis is simple.

Let's assume we're making this into a 7-bit text, so we're going to convert the non-ASCII character a-circumflex and the pound sign. The a-circumflex just goes to an "a", but we have several choices we can make about the pound sign.

The italicized "Here" is clearly for emphasis, so we will mark that up. The word "flaneur" is italicized because it is not English, but possibly also for emphasis . . . if the sentence had read "The Major is a fool", with the word "fool" italicized, it would clearly be emphasis. As it stands, we don't know whether emphasis is intended. This doesn't matter if we are just using underscores or /slants/ to render italics, but if we use CAPITALS, we're going to have to impose our best guess on one side or the other.

The third paragraph shows some vaguely familiar squiggles—Greek letters! We hit the PG transliteration guide at </vol/greek.html> and spell it out . . . rough-breathing upsilon = hu; beta = b; rho = r; iota = i; final sigma = s. So the Greek word transliterates as "hubris". Since hubris is a familiar word, we don't need to make a fuss about it, though we may italicize it.
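The letter-by-letter lookup just described can be written as a tiny table. The Python sketch below includes only the five letters from this example, using their Unicode code points; it is not the full PG transliteration guide, and the names are invented for illustration. Any character not in the table is passed through unchanged.

```python
# Only the letters used in this example; a real table would be much longer.
GREEK_TO_LATIN = {
    "\u1f51": "hu",  # upsilon with rough breathing
    "\u03b2": "b",   # beta
    "\u03c1": "r",   # rho
    "\u03b9": "i",   # iota
    "\u03c2": "s",   # final sigma
}

def transliterate(word):
    """Spell out a Greek word letter by letter in the Latin alphabet."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in word)

print(transliterate("\u1f51\u03b2\u03c1\u03b9\u03c2"))
```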

We then have a note, which we will format a little differently from the main text to help it stand out, and a new chapter heading.

We should certainly indent the second line of the Byron quotation to preserve its original form, but we have the option whether or not to indent the first line a little to signal to any future automatic converter that this is not to be rewrapped.

In the first paragraph of the new chapter, we need to get rid of the hyphenation of "Wentworth" at line-end and fix the two em-dashes.

In the second paragraph of the new chapter, we have a long dash between "d" and "l", clearly meant to denote "devil", so we will fill it in with three dashes, and we see a three-em-dash after "Lord H", so we can use six, or possibly four, dashes for that.

Finally, we have a table, a list of money values against names.

Depending on the standards we've chosen to use throughout the book, we could render these details in a variety of ways. For illustration, here are two acceptable possibilities:

"I shall go down to Wokingham", said Middleton, "a few days before the election, and the Major will stay here. I understand that there will be no other candidate, and I shall take the seat.

"The Major is a . . . _flaneur_. He has no interest beyond his own advancement. I can buy him for a hundred pounds. _Here_ is his answer."

Wallace wondered at the hubris of his friend, and examined the note Middleton thrust upon him.

 "Sir,
     No consideration would induce me to
 change my resolve in this matter, but I am
 willing to engage your services as my agent
 for a fee of 100 pounds.
                  H. Middleton"

CHAPTER XV

THE ELECTION

  Now hatred is by far the longest pleasure;
      Men love in haste, but they detest at leisure.
                                    ---- BYRON

On hearing of Middleton's visit, Mr. Wentworth began his preparations. Meeting with Thomas Lake and Riley at the back of the tap-room of The Bull--where the landlord saw to it that they remained undisturbed--he laid out their plan of campaign.

"That d---l Middleton shall not have the seat," he raved, "not for Lord H------; no, nor for a hundred Lords! We shall see to it that every man's hand is turned against him when he arrives."

Lake unfolded a paper from his vest-pocket and smoothed it
on the table. "Here are the expenses we should undertake."
       Doran          L13 10s.
       Titwell        L 8  7s. 6d.
       St. Charles    L25

* * * * *

"I shall go down to Wokingham", said Middleton, "a few days before the election, and the Major will stay here. I understand that there will be no other candidate, and I shall take the seat.

"The Major is a . . . flaneur. He has no interest beyond his own advancement. I can buy him for L100. HERE is his answer."

Wallace wondered at the hubris of his friend, and examined the note Middleton thrust upon him.

 "Sir,
     No consideration would induce me to
 change my resolve in this matter, but I am
 willing to engage your services as my agent
 for a fee of L100.
                  H. Middleton"

                                                                                                                                                                                                                                                                                                           
