XML and Web Services In The News - 26 October 2006
Provided by OASIS
Edited by Robin Cover
This issue of XML Daily Newslink is sponsored by IBM Corporation
HEADLINES:
A Meaningful Web for Humans and Machines, Part 1
Lee Feigenbaum and Elias Torres, IBM developerWorks
The World Wide Web empowers human beings like never before. The sheer
amount and diversity of information you encounter on the Web is
staggering. You can find recipes and sports scores; you share calendars
and contact information; you read news stories and restaurant reviews.
You can constantly consume data on the Web that's presented in a variety
of appealing ways: charts and tables, diagrams and figures, paragraphs
and pictures. The Semantic Web is a mesh of information linked up in
such a way as to be easily processed by machines, on a global scale.
The Semantic Web extends the Web by using standards, markup languages,
and related processing tools. Yet this content-rich, human-friendly
world has a shadowy underworld. It's a world in which machines attempt
to benefit from this wealth of data that's so easily accessible to
humans. It's the world of aggregators and agents, reasoners and
visualizations, all striving to improve the productivity of their
human masters. But the machines often struggle to interpret the mounds
of information intended for human consumption. In this series of
articles we'll examine the existing and emerging technologies that
enable machines and humans to easily access the wealth of Web-published
data. We'll discuss the need for techniques that derive both human- and
machine-friendly data from a single Web page. Using examples, we will
explore the relationships between the different techniques and will
evaluate the benefits and drawbacks of each approach. The series
will examine, in detail: a parallel Web of data representations,
algorithmic approaches to generating machine-readable data,
microformats, GRDDL, embedded RDF, and RDFa. In this first article,
you meet the human-computer conflict, learn the criteria used to
evaluate different technologies, and find a brief description of
the major techniques used today to enable machine-human coexistence
on the Web.
Last Call for Pronunciation Lexicon Specification (PLS) Version 1.0
Paolo Baggia (ed), W3C Technical Report
W3C's Voice Browser Working Group has released the second Last Call
Working Draft for "Pronunciation Lexicon Specification (PLS) Version
1.0." The specification defines the syntax for specifying pronunciation
lexicons to be used by Automatic Speech Recognition and Speech
Synthesis engines in voice browser applications. The accurate
specification of pronunciation is critical to the success of speech
applications. Most Automatic Speech Recognition (ASR) and Text-To-Speech
(TTS) engines internally provide extensive high-quality lexicons with
pronunciation information for many words or phrases. To ensure a
maximum coverage of the words or phrases used by an application,
application-specific pronunciations may be required. For example,
these may be needed for proper nouns such as surnames or business
names. The Pronunciation Lexicon Specification (PLS) is designed to
enable interoperable specification of pronunciation information for
both ASR and TTS engines within voice browsing applications. The
language is intended to be easy to use by developers while supporting
the accurate specification of pronunciation information for
international use. The language allows one or more pronunciations for
a word or phrase to be specified using a standard pronunciation
alphabet or, if necessary, using vendor-specific alphabets. Pronunciations
are grouped together into a PLS document which may be referenced from
other markup languages, such as the Speech Recognition Grammar
Specification (SRGS) and the Speech Synthesis Markup Language (SSML).
In its most general sense, a lexicon is merely a list of words or
phrases, possibly containing information associated with and related
to the items in the list. Pronunciation lexicons are not limited to
voice browsers, because they have proven effective mechanisms to
support accessibility for persons with disabilities as well as greater
usability for all users (for instance in screen readers and other user
agents, such as multimodal interfaces).
See also: the W3C Voice Browser Activity
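As a rough illustration of the lexicon structure described above, the
following Python sketch parses a small PLS-style document and maps each
word to its list of pronunciations. The element names, the namespace URI,
and the sample entry are assumptions based on the PLS 1.0 vocabulary as we
understand it, not text taken from the Last Call draft; the function name
load_lexicon is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed PLS 1.0 namespace; check the specification before relying on it.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative sample only: one word with two IPA pronunciations.
SAMPLE_PLS = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmeɪtoʊ</phoneme>
    <phoneme>təˈmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>
"""


def load_lexicon(xml_text):
    """Return a dict mapping each grapheme to its list of phoneme strings."""
    root = ET.fromstring(xml_text.strip())
    ns = {"pls": PLS_NS}
    lexicon = {}
    for lexeme in root.findall("pls:lexeme", ns):
        # A lexeme may carry several pronunciations for the same word.
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", ns)]
        for grapheme in lexeme.findall("pls:grapheme", ns):
            lexicon.setdefault(grapheme.text, []).extend(phonemes)
    return lexicon


if __name__ == "__main__":
    for word, pronunciations in load_lexicon(SAMPLE_PLS).items():
        print(word, pronunciations)
```

An ASR or TTS engine would consult such a mapping before falling back to
its built-in lexicon; that lookup order is likewise an assumption here.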
CellML Media Type Published as IETF Informational RFC
Andrew Miller (ed), IETF Approved RFC
The IETF RFC Editor announced that a new "Request for Comments"
document is now available in online RFC libraries. "CellML Media Type",
Request for Comments 4708, defines a method for exchanging mathematical
models represented in a CellML Umbrella 1.0 compliant markup language.
The CellML Umbrella format is a standardised markup meta-language for
the interchange of mathematical models. The CellML Umbrella format
provides a common base that is supported by a number of specific
formats used in the interchange of mathematical models. The CellML
Umbrella format provides enough information to determine which specific
language is used to express the model. The syntax and semantics of the
CellML Umbrella format are defined by "CellML Umbrella Specification
1.0". The CellML Umbrella format is an actual media format. Although
CellML Umbrella documents contain elements in namespaces defined by
other specifications such as RDF and MathML, the elements in these
namespaces do not contain sufficient information to define a
mathematical model, and so CellML provides the information required
to interconnect the different CellML components, as well as the
information required to link CellML components to their metadata. As
such, CellML Umbrella documents are more than just a collection of
entities defined elsewhere, and so a new media type is required to
identify CellML. As all well-formed CellML Umbrella documents are also
well-formed XML documents, the convention described in Section 7 of
RFC 3023 has been observed by use of the '+xml' suffix. The
information in CellML Umbrella documents cannot be interpreted
without understanding the semantics of the XML elements used to mark
up the model structure. Therefore, the application top-level type is
used instead of the text top-level type.
See also: the IETF RFC Editor web site
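The '+xml' suffix convention mentioned above lets generic software
recognize an XML-based format without knowing the specific media type. A
minimal Python sketch of that check follows, assuming the registered type
is "application/cellml+xml"; the helper names split_media_type and
is_xml_based are hypothetical.

```python
def split_media_type(media_type):
    """Split 'type/subtype+suffix; params' into (top_level, subtype, suffix).

    The suffix is None when the subtype carries no '+suffix' part.
    """
    # Drop any parameters such as '; charset=utf-8' and normalize case.
    essence = media_type.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        return top_level, subtype, None
    return top_level, base, suffix


def is_xml_based(media_type):
    """True for text/xml, application/xml, and any type with a '+xml' suffix."""
    top_level, subtype, suffix = split_media_type(media_type)
    if suffix == "xml":
        return True
    return top_level in ("text", "application") and subtype == "xml"


if __name__ == "__main__":
    for candidate in ("application/cellml+xml", "text/xml", "text/plain"):
        print(candidate, is_xml_based(candidate))
```

A generic XML tool could use such a test to decide whether to hand a
document to an XML parser even when it has no CellML-specific support.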
Sun CEO Sets Open Source Java Time Frame
Paul Krill, InfoWorld
Demonstrating a perhaps more aggressive path than anticipated, Sun
Microsystems is set to announce the open-sourcing of the core Java
platform within 30 to 60 days, Sun President and CEO Jonathan Schwartz
said at the Oracle OpenWorld conference on Wednesday morning
[2006-10-25]. The core platform encompasses the Standard Edition of
Java, and it will be offered via an open source format under an OSI
(Open Source Initiative)-approved license, likely the same one used
for Sun's open source Solaris OS. Sun officials, including Rich Green,
Sun executive vice president for software, have talked about Java
being offered via open source in stages later this year and into 2007.
Parts of it, such as the Java Enterprise Edition, already are available
via open source, with the GlassFish application server constituting
the open source enterprise variant. Schwartz also offered perspectives
on a variety of technology trends. He noted that every day, new devices
besides PCs and servers are being networked, and he cited an amusement
park's usage of RFID tags in dolls given to children. The dolls are
then used to track children and the formation of lines at the park.
Oil rigs also are being outfitted, he said. Sun focuses on customers
who see technology as offering a competitive advantage rather than just
as a cost center. The company through its Sun Fire servers has become
the industry's fastest-growing provider of x64 servers, he said:
"Special-purpose systems are dying out; at this point, you can take
general-purpose infrastructure and replace almost all custom
infrastructure."
Ellison Says Oracle Knows What's Best for Linux
Renee Boucher Ferguson, eWEEK
Asserting that to win acceptance in big companies Linux requires
enterprise-grade support, Oracle CEO Larry Ellison said his company
would provide full support for Red Hat Linux. While Oracle will
remove the Red Hat trademarks from the Linux it distributes, Ellison
denied that this would in any way "fragment" the Linux market. Oracle
needs to provide enhanced support for Linux, he contends, because
enterprise customers are holding back on implementing Linux with
Oracle's Grid computing system because of serious support issues.
"The most serious issue: true enterprise support," said Ellison to
a packed-to-the-rafters audience. "If a customer has an issue with
the Linux kernel and a vendor fixes the bug, quite often it's not
fixed in the version the customer is running. It's fixed in the
future version that's about to come out. You have to upgrade to get
the fix. That really is not acceptable to our large customers."
Oracle's support for Red Hat, now under the aegis of Oracle's
Unbreakable Linux program, is not supposed to be a death knell for
Red Hat, according to Ellison. For premier support (a level of
service that Red Hat doesn't even offer, again according to Ellison),
Oracle is charging $1,200 per system per year for two processors,
and $2,000 for larger systems. For that package users get two key
features: back-porting and indemnification. Oracle's offer to back-
port bug fixes means it will fix bugs in the version users are on,
regardless of whether it's the latest version. The indemnification
clause means Oracle takes on any legal claims users may be subject
to from companies like the SCO Group, in whose wake indemnification
seems critical.
RELAX NG, the XML Schema Alternative
Ed Tittel, SearchWebServices.com
Those who've been knocking around the XML or Web development communities
for any length of time have come across the work of James Clark, if not
evidence of the man himself. His is a pretty fascinating story, which
you can read more about on his bio page. For the purposes of our
discussion, let's just say he's been around the SGML and XML
communities since the early 90s and has contributed a substantial and
extremely useful body of work. His highlight reel includes an open
source SGML parser he wrote in C, acting as technical lead during the
development of the XML 1.0 Recommendation, enhancing SGML to make XML
a formal subset of SGML, development of expat, "the world's fastest XML
parser," co-authoring the XSL submission, editing the XSLT and XPath
Recommendations, and last and most relevant, developing TREX, a schema
language for XML that pre-dated (and many believe outclasses) XML Schema.
In fact, TREX, plus another alternative XML schema language named RELAX,
gave rise to RELAX NG (where NG stands for Next Generation), which is
an OASIS development project and is now also enshrined as ISO/IEC
standard 19757-2. Why bother with RELAX NG when there's XML Schema, a
W3C recommendation also available? Three short answers explain why this
markup language is worth digging into: (1) The language is designed to
be simple and easy to learn, which many would observe is not the case
for XML Schema. (2) The language includes both an XML syntax and a
compact non-XML syntax. It also supports XML namespaces and does not
change the information set of any XML document it touches. (3) It works
with XML Schema Datatypes (just as does XML Schema itself) and can draw
on the expressive power of that markup language.
See also: RELAX NG as DSDL part 2
On Web Standards, Libertarian Candidates Win
Declan McCullagh and Anne Broache, CNET News.com
The Libertarian Party hasn't had much success in [US] national elections:
It garnered just 353,265 votes in the 2004 presidential race and boasts
precisely zero elected representatives in the U.S. Congress. But a
survey of political sites by CNET News.com shows that Libertarian
candidates are ahead in the race to ensure their pages comply with a
widely accepted litmus test for good Web design, which can aid mobile
device users and people with visual disabilities. Of approximately
1,000 campaign Web sites surveyed two weeks before the November 7, 2006
election, only 35 passed the validation tests created by the World Wide
Web Consortium, or W3C. Seven of those were created by Libertarian
candidates, some of whom have degrees in computer or electrical
engineering or count themselves as free-software aficionados.
Republicans came in a close second. Call the Libertarians the political
party of geeks, for geeks. "I'll be the first to admit that we do have
a lot of geeks in the party, and I'm one of them," Shane Cory, executive
director of the national Libertarian Party, said Wednesday. To compile
a list of campaign Web sites to review, News.com used a database of U.S.
House of Representatives and U.S. Senate candidates created by Voter
Information Services, a nonprofit and nonpartisan group. Then we wrote
a computer program to test each campaign Web site against a "validator"
maintained by the World Wide Web Consortium, or W3C, and record and
then sort the results...
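The survey method described above (check each site, record the result,
sort by party) could be sketched in Python roughly as follows. This is
not News.com's program: the function name, the sample sites, and the stub
checker are all hypothetical, and the real validator call is left as a
pluggable function so the sketch runs without network access.

```python
from collections import Counter


def survey(sites, is_valid):
    """Count passing sites per party, sorted from most to fewest passes.

    sites:    iterable of (party, url) pairs
    is_valid: callable taking a url and returning True if the page validates
    """
    passed = Counter()
    for party, url in sites:
        if is_valid(url):
            passed[party] += 1
    # most_common() returns (party, count) pairs, highest count first.
    return passed.most_common()


if __name__ == "__main__":
    # Hypothetical sample data, not the actual survey results.
    sample_sites = [
        ("Party A", "http://a1.example.org/"),
        ("Party B", "http://b1.example.org/"),
        ("Party A", "http://a2.example.org/"),
        ("Party C", "http://c1.example.org/"),
    ]
    # Stub checker standing in for a real call to a markup validator.
    known_good = {"http://a1.example.org/", "http://a2.example.org/",
                  "http://b1.example.org/"}
    print(survey(sample_sites, lambda url: url in known_good))
```

Keeping the checker as a parameter means the tallying logic can be tested
separately from whatever validation service is actually used.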
XML.org is an OASIS Information Channel
sponsored by BEA Systems, Inc., IBM Corporation, Innodata Isogen, SAP AG and Sun
Microsystems, Inc.
Use http://www.oasis-open.org/mlmanage
to unsubscribe or change an email address. See http://xml.org/xml/news_market.shtml
for the list archives.