Re: Diacritics and special characters


Hello Bob and others,

Here at The American University in Cairo, we asked for the UTF-8 diac
table to be installed shortly after the last IUG Conference. At the
conference Doug Randall had explained that this would allow all of the
Arabic diacritics to be displayed on the Web OPAC. I will check on the
delay of this implementation next week. 
We have been anxiously awaiting this ability since we started inputting
Arabic script along with the diacritics that are used with the
transliterated Arabic almost four years ago.
Of course the 2 curly brackets and 3 numbers for each diacritic mark
will still show in Guicat--but I am hoping that this will be history
when we move to Millennium Cataloging.  
Martha Plettner
The American University in Cairo
Cairo, Egypt


Bob Rasmussen wrote:
> 
> In response to questions, here's some background on the subject.
> 
> Innopac stores data in MARC format, which can include a wide variety of
> "scripts". The curly-bracket display is used for cataloging, in order to
> ensure that the exact desired character is in the record.
> 
> When Innopac goes to display (or print) the data, via browser, Java, or
> telnet, it makes use of a "diac" file, which tells the server what diacritics
> (and other scripts) the client can display. This technique originated with
> dumb terminals. A VT220, for instance, can display the characters in the
> Latin-1 set, suitable for Western Europe. III also produced a T160E terminal,
> which could display about 400 character/diacritic combinations. For the Far
> East, they support single-country solutions, such as Chinese Big-5, or CCCII
> for combined Chinese, Japanese, and Korean (CJK). They've done some work with
> Thai, Vietnamese, and Indic scripts, I think, but I don't know the details.
> 
> In a character-based environment, such as dumb terminals or telnet, the choice
> of diac table is associated with the type of terminal. Users of Anzio (our
> telnet client) who want diacritics will typically choose T160E emulation,
> and those who want Far East will choose CCCII. It really works.
> 
> For Java or web-based servers (I don't know all the product names), there was
> ONE diac file associated with the server.
> 
> There were two shortcomings with this approach, which for me at least came
> to the surface at IUG 15 months ago: 1) There was no way to handle multiple
> language sets, such as Japanese and French, let alone Arabic and anything
> else; and 2) users of web browsers accessing CCCII data needed an add-on
> product, such as UnionWay or WinMass, to translate CCCII to and from
> characters that the PC could handle.
> 
> What was clearly needed was an all-encompassing "diac" file that would
> translate ALL characters and diacritic combos to a lingua franca. And the
> lingua franca of the web and Java is Unicode, coded as UTF-8. Also, Anzio can
> be configured to process UTF-8 data coming from and going to the host.
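
To make the "one table, one lingua franca" idea concrete, here is a rough sketch in Python. It is only an illustration: the three-digit codes in the table are made up, and the assumption that the curly-bracket code sits in front of its base letter is mine; neither is taken from III's actual diac file. The helper name `to_utf8` is likewise hypothetical.

```python
import re
import unicodedata

# Hypothetical table: three-digit cataloging code -> Unicode combining mark.
# The code numbers are placeholders, NOT real Innopac diac values.
DIAC_TO_UNICODE = {
    "101": "\u0304",  # combining macron
    "102": "\u0323",  # combining dot below
}

# One or more {nnn} codes followed by the base letter they modify
# (assumed order: codes first, base letter second).
PAIR = re.compile(r"((?:\{\d{3}\})+)([^{}])")


def to_utf8(text: str) -> bytes:
    """Turn {nnn}+base sequences into base+combining marks, then encode as UTF-8."""

    def swap(match: re.Match) -> str:
        codes = re.findall(r"\{(\d{3})\}", match.group(1))
        if any(code not in DIAC_TO_UNICODE for code in codes):
            # Unknown code: leave the curly-bracket form visible rather than guess.
            return match.group(0)
        # Unicode places combining marks after the base letter.
        return match.group(2) + "".join(DIAC_TO_UNICODE[code] for code in codes)

    converted = PAIR.sub(swap, text)
    # NFC prefers a precomposed character where Unicode defines one.
    return unicodedata.normalize("NFC", converted).encode("utf-8")
```

With the placeholder table above, `to_utf8("al-Q{101}ahirah")` yields the UTF-8 bytes for "al-Qāhirah", and a string containing a code that is not in the table passes through unchanged.
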
> 
> At IUG in Philadelphia recently, III announced that a UTF-8 diac file was
> being released, initially for use with the Java and web products, and
> "later" for the telnet interface. This seems, at least theoretically,
> to be precisely the right solution.
> 
> So now the question: has anyone installed this UTF-8 diac support for web or
> Java? Can you provide your URL and a few sample books to look up?
> 
> And question 2 (of much more interest to me): have they released the UTF-8
> support for the telnet product? If so, has anyone installed that? Again, can
> you provide URL and samples?
> 
> Note that you may still have issues on the client side with:
> 
> 1. What characters are covered by fonts that are installed?
> 
> 2. How does the client handle font switching, if necessary, between languages?
> 
> 3. How well does the client handle combining (non-spacing) diacritics? Do they
> really combine?
> 
> 4. What methods are available for input of non-Roman characters?
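
On question 3, one rough way to see how much real combining a client is being asked to do is to count the combining marks in a string and then count how many are still there after Unicode NFC normalization. Marks that survive NFC have no precomposed character, so the client and font have to position them. This is a sketch using Python's standard `unicodedata` module; the function name `combining_report` is my own.

```python
import unicodedata


def combining_report(text: str) -> tuple[int, int]:
    """Return (marks in decomposed form, marks left after NFC composition)."""
    decomposed = unicodedata.normalize("NFD", text)
    composed = unicodedata.normalize("NFC", text)
    # unicodedata.combining() returns a nonzero class for combining marks.
    total_marks = sum(1 for ch in decomposed if unicodedata.combining(ch))
    leftover_marks = sum(1 for ch in composed if unicodedata.combining(ch))
    return total_marks, leftover_marks
```

For example, "a" followed by a combining macron (U+0304) has a precomposed form, so nothing is left over, while the ligature tie (U+0361) used in some romanizations has none and must be combined by the client.
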
> 
> Now, I hope you don't mind, I'll summarize Anzio's handling of these things.
> Anzio can process data to/from the host in UTF-8, T160E, CCCII, USMARC, and
> various ISO sets and Windows codepages. It goes through some rather elaborate
> logic to ensure that combining diacritics are handled well, even in cases
> where the font does not contain them (we recommend using Courier New, with all
> the extensions downloadable from Microsoft).
> 
> Anzio does not currently do font-switching. Some users have set up macro keys
> to allow the user to switch fonts as needed. Also, we are testing a font from
> Monotype that has all characters defined in Unicode 3.
> 
> Combining diacritics can be entered several ways. Far East characters can be
> entered with an add-on (such as WinMass), with the Input Method Editor of the
> Windows setup (such as Japanese, on a Japanese Windows installation), or on
> Windows 2000 with all of the IMEs available.
> 
> Printing of all these characters is available in AnzioWin but not Anzio Lite.
> 
> I welcome any corrections, comments, etc. Because I know there are lurkers
> interested, please respond on-list if at all appropriate.
> --
> Regards,
> ....Bob Rasmussen,   President,   Rasmussen Software, Inc.
> 
> personal e-mail: ras@xxxxxxxxxx
>  company e-mail: rsi@xxxxxxxxxx
>           voice: (US) 503-624-0360 (9:00-6:00 Pacific Time)
>             fax: (US) 503-624-0760
>             web: http://www.anzio.com