XML and Web Services In The News - 16 November 2006
Provided by OASIS
Edited by Robin Cover
This issue of XML Daily Newslink is sponsored by Sun Microsystems, Inc.
HEADLINES:
OASIS Forms Technical Committee to Standardize Content Analytics
Staff, OASIS Announcement
OASIS has announced the formation of a new effort aimed at
standardizing semantic search and content analytics. The work of the
OASIS Unstructured Information Management Architecture (UIMA) Technical
Committee will advance a common method for meaningfully accessing data
contained in text such as e-mails, blog entries, news feeds, and notes,
as well as in audio recordings, images, and video. The OASIS work will
be complemented by an Apache Software Foundation incubator project for
developing UIMA-based open source software. "Unstructured information,"
according to the TC's charter, is "all the information that has not
been carefully encoded in enterprise databases but rather exists as
natural language text, speech or video. These applications rely on
the rapid assignment of semantics to huge volumes of unstructured
content exactly so that this content may be structured and exploited
by traditional application infrastructure (e.g., database management
systems, knowledgebase systems, information retrieval systems, etc.).
Examples include natural language documents, email, speech, images and
video. It is information that was not specifically encoded for machines
to process but rather authored by humans for humans to understand. We
say it is 'unstructured' because it lacks explicit semantics
('structure') required for applications to interpret the information
as intended by the human author or required by the end-user application."
David A. Ferrucci of IBM, convener of the OASIS UIMA Technical
Committee, said: "UIMA will enable the productive use of content that exists
as natural language text, speech, and video. By assigning semantics
to this content, UIMA will allow information to be exploited by database
management systems, information retrieval systems, and other traditional
application infrastructure." OASIS will refine and finalize a set of
UIMA specifications based on an initial contribution from IBM with
input from DARPA, Carnegie Mellon University, Columbia University,
Stanford University, University of Massachusetts-Amherst, MITRE
Corporation, and Science Applications International Corporation (SAIC).
See also: the IBM announcement
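The charter's idea of "assigning semantics" to unstructured content can
be illustrated with a toy example. The Python sketch below is not the
UIMA API; it only assumes the general pattern of analysis components
("annotators") that leave the original text untouched and record typed
spans, by character offset, that structured systems can later consume.
The class and function names and the two regular-expression annotators
are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a typed span over the original text."""
    type_name: str
    begin: int  # offset of the first character covered
    end: int    # offset just past the last character covered


@dataclass
class AnalyzedDocument:
    """Hypothetical container: unmodified text plus annotations added by analysis steps."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def email_annotator(doc):
    """Marks e-mail addresses; the text itself is never modified."""
    for m in EMAIL_RE.finditer(doc.text):
        doc.annotations.append(Annotation("EmailAddress", m.start(), m.end()))


def date_annotator(doc):
    """Marks dates written like '16 November 2006'."""
    for m in DATE_RE.finditer(doc.text):
        doc.annotations.append(Annotation("Date", m.start(), m.end()))


def run_pipeline(doc, annotators):
    """Applies each annotator in turn, then orders annotations by position."""
    for annotate in annotators:
        annotate(doc)
    doc.annotations.sort(key=lambda a: (a.begin, a.end))
    return doc


doc = run_pipeline(
    AnalyzedDocument("Mail jane@example.org before 16 February 2007."),
    [email_annotator, date_annotator],
)
for ann in doc.annotations:
    print(ann.type_name, doc.covered_text(ann))
# prints:
# EmailAddress jane@example.org
# Date 16 February 2007
```

Keeping annotations separate from the text means several independent
analysis components can add to the same document without interfering
with one another.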
Google, Yahoo, Microsoft Partner on Open Source Search Protocol
Juan Carlos Perez, InfoWorld
Strange bedfellows Google, Microsoft, and Yahoo have partnered to
simplify how webmasters and online publishers submit their sites'
content for indexing in the companies' search engines. In a rare
collaborative effort, Google, Microsoft and Yahoo, which compete
directly in Internet search and other online services, plan to
announce on Thursday their support for the open source Sitemap
Protocol, which is based on XML (Extensible Markup Language). This protocol,
which Google created and has been using for about 18 months, will
be adopted by Yahoo effective Thursday, and the three companies
will collaborate to extend and enhance it. Yahoo has been using
another protocol, which it will continue to support. Microsoft will
stop using its current protocol after it implements Sitemap Protocol
in its search engine in early 2007. A site map is a file that
webmasters and publishers put on their sites to guide the search
engines' automated Web crawlers in properly indexing their Web pages.
Site maps are particularly useful for pointing crawlers to dynamic
Web content that is served up on the fly. Crawlers generally index
content contained in static Web pages without problems, but they
often have difficulty with dynamic content, such as pages generated
in response to a search query. A site map can be formatted using
various protocols, but supporting several formats means more work for
webmasters and publishers, which is why Google, Microsoft and Yahoo
are throwing their weight behind the Sitemap Protocol to promote
it as a standard.
See also: XML Sitemap Format
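A minimal sketch of generating such a file with Python's standard
library follows. It assumes the sitemaps.org 0.9 layout (a `urlset`
root holding `url` entries, each with a required `loc` and optional
`lastmod`, `changefreq` and `priority`) and the namespace URI shown in
the code; check both against the published XML Sitemap Format before
relying on them. The example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 schema (assumption: verify against the published format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return a sitemap document as a string.

    Each entry is a dict with a required "loc" (absolute URL) and optional
    "lastmod", "changefreq" and "priority" keys.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        if "loc" not in entry:
            raise ValueError("every sitemap entry needs a 'loc' URL")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    # ElementTree escapes characters such as "&" in URLs automatically.
    body = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2006-11-16",
     "changefreq": "daily", "priority": 0.8},
    {"loc": "http://www.example.com/search?q=xml&page=2", "changefreq": "weekly"},
])
print(sitemap_xml)
```

Building the document with ElementTree rather than by string
concatenation means query-string URLs containing `&` come out
entity-escaped. The `default_namespace` argument to `ET.tostring`
requires Python 3.8 or later.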
Grid Services Via Open Grid Services Architecture (OGSA)
Daniel Rubio, searchWebServices.com
With XML serving as a catalyst, Web services have allowed many
organizations to expose as well as access many resources that in
years past might have been considered difficult to share. Grids
initially emerged in research and academic projects to confront the
limitations of standalone computing power, allowing applications to
tap processing cycles on machines located across a network. OGSA is
the creation of the Globus Alliance, a community of organizations
and individuals dedicated to advancing grid technologies. Since its
inception more than a decade ago, the Globus Alliance has been a
pioneer in enabling grid applications through its flagship toolkit
Globus, an open-source project that is behind many successful grid
projects, such as Open Science Grid, Earth System Grid, and
TeraGrid. In its current state, OGSA's building blocks are based on the
Web Services Resource Framework (WSRF) along with other WS-*
specifications like WS-Notification, WS-Security and WS-Addressing.
Version 3 of the Globus Toolkit is the earlier implementation based on
the Open Grid Services Infrastructure (OGSI), while Version 4 is based
on the more recent WS-* standards; both provide support for Java-,
C- and Python-based grid Web services. Like any other software
application, grids come with their own share of unique requirements
that have to be dealt with from the outset. Prior to the emergence
of OGSA, meeting those requirements meant committing to a particular
software suite, and vendor or platform lock-in was the norm in grid
development. WS-* standards, by contrast, have achieved true consensus
among Web services vendors: from ESB producers to platform initiatives
such as Project Tango for Java and WCF for .NET, many organizations
have rallied around these standards.
Managing SOA Semantics Using Ontologies and Supporting W3C Standards
Dave Linthicum, InfoWorld
[Part 2 on RDF and OWL] Resource Description Framework (RDF), a part
of the XML story, provides interoperability between applications that
exchange information. RDF is another Web standard that's finding use
everywhere, including SOA. RDF was developed by the W3C to provide a
foundation of metadata interoperability across different resource
description communities and is the basis for the W3C movement to
ontologies such as the use of Web Ontology Language (OWL). RDF uses
XML to define a foundation for processing metadata and to provide a
standard metadata infrastructure for both the Web and the enterprise.
The difference between XML and RDF is that XML is used to transport
data using a common format, while RDF is layered on top of XML, defining
a broad category of data. When the XML data is declared to be of the
RDF format, applications are then able to understand the data without
understanding who sent it. RDF benefits SOA in that it supports the
concept of a common metadata layer that is sharable throughout an
enterprise or between enterprises. Thus, RDF can be used as a common
mechanism for describing data within the SOA problem domain. Using
these Web-based standards as the jumping-off point for ontology and
SOA, it's possible to define and automate the use of ontologies in
both intra- and intercompany SOA domains: domains made up of
thousands of systems, each with its own semantic meanings, bound
together in a common ontology that makes short work of SOA and
defines a common semantic meaning of data. Beyond the languages
themselves, we have several ontology libraries available for a variety of
vertical domains, including financial services and e-Business. We
also have many knowledge editors that support the creation
of ontologies, as well as the use of natural-language processing
methodologies. In other words, we have a standard set of tools to
define, manage, and share application semantics from domain to domain,
including from the enterprise to the Internet, and back. It's time
we started to use them.
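To make the point about RDF layering on XML concrete, the following
Python sketch serializes a few subject-predicate-object statements about
a hypothetical service as RDF/XML, then reads them back using nothing
but RDF conventions. The RDF and Dublin Core namespace URIs are the
standard ones; the example.org service and the `ex:` vocabulary are
invented for illustration, and the reader handles only the flat
`rdf:Description` subset the writer emits, not full RDF/XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
EX_NS = "http://example.org/soa#"           # hypothetical vocabulary for this sketch

for prefix, uri in (("rdf", RDF_NS), ("dc", DC_NS), ("ex", EX_NS)):
    ET.register_namespace(prefix, uri)


@dataclass(frozen=True)
class URIRef:
    """An object that names another resource rather than holding a literal value."""
    uri: str


def to_rdfxml(triples):
    """Serialize (subject, (namespace, local_name), object) triples as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}
    for subject, (ns, local), obj in triples:
        if subject not in descriptions:
            descriptions[subject] = ET.SubElement(
                root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject})
        prop = ET.SubElement(descriptions[subject], f"{{{ns}}}{local}")
        if isinstance(obj, URIRef):
            prop.set(f"{{{RDF_NS}}}resource", obj.uri)
        else:
            prop.text = obj
    return ET.tostring(root, encoding="unicode")


def from_rdfxml(text):
    """Recover triples from the restricted RDF/XML subset produced by to_rdfxml."""
    triples = []
    for desc in ET.fromstring(text).findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            ns, local = prop.tag[1:].split("}", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = URIRef(resource) if resource is not None else (prop.text or "")
            triples.append((subject, (ns, local), obj))
    return triples


SERVICE = "http://example.org/services/customer-lookup"
statements = [
    (SERVICE, (DC_NS, "title"), "Customer lookup service"),
    (SERVICE, (DC_NS, "creator"), "Billing department"),
    (SERVICE, (EX_NS, "returns"), URIRef("http://example.org/soa#CustomerRecord")),
]
document = to_rdfxml(statements)
print(document)
```

A production system would more likely use an RDF toolkit; the point is
only that a consumer needs to know the RDF vocabulary, not the sender,
to recover the statements.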
Timed Text Authoring Format: Distribution Format Exchange Profile (DFXP)
Mike Dolan, Geoff Freed, Sean Hayes (et al., eds), W3C Technical Report
W3C has announced the advancement of the "Timed Text (TT) Authoring
Format 1.0 - Distribution Format Exchange Profile (DFXP)" specification
to the level of Candidate Recommendation. Members of the Timed Text
(TT) Working Group expect to request that the Director advance this
document to Proposed Recommendation once the Working Group has, for
each test in the DFXP 1.0 Test Suite, demonstrated support by two
interoperable implementations. The Distribution Format Exchange Profile
is intended to be used for the purpose of transcoding or exchanging
timed text information among legacy distribution content formats
presently in use for subtitling and captioning functions. "Timed Text"
is textual information that is intrinsically or extrinsically associated
with timing information. Typical applications of timed text are the
real-time subtitling of foreign-language movies on the Web, captioning for
people lacking audio devices or having hearing impairments, karaoke,
scrolling news items or teleprompter applications. The Timed Text
Authoring Format (TT AF) Distribution Format Exchange Profile (DFXP)
provides a standardized representation of a particular subset of textual
information with which stylistic, layout, and timing semantics are
associated by an author or an authoring system for the purpose of
interchange and potential presentation. DFXP is expressly designed to
meet only a limited set of requirements established by the "Timed Text
(TT) Authoring Format 1.0 Use Cases and Requirements" document. In
particular, only those requirements that serve the need to perform
interchange with existing legacy distribution systems are satisfied.
In addition to being used for interchange among legacy distribution
content formats, DFXP content may be used directly as a distribution
format, providing, for example, a standard content format to reference
from a 'text' or 'textstream' media object element in a SMIL 2.1 document.
Comments are welcome through 16 February 2007. W3C encourages developers
to implement the specification and share their experience with the
Synchronized Multimedia Working Group.
See also: the Interop report
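As a rough illustration of transcoding timed text into a legacy
distribution format, the Python sketch below pulls `p` elements that
carry `begin` and `end` attributes out of a small DFXP-style document
and writes them as SubRip (.srt) subtitles. SubRip is simply one common
legacy format picked for the example, not one named in the announcement.
The sketch assumes clock-time values (HH:MM:SS with an optional
fraction), treats each paragraph's times as absolute rather than
resolving timing inherited from enclosing elements, and ignores styling
and layout; the sample document and its namespace are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: the one used by the later TTML 1.0 Recommendation, not
# necessarily the DFXP Candidate Recommendation's. The parser ignores namespaces.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Hello, world.</p>
      <p begin="00:00:04" end="00:00:06">Second caption.</p>
    </div>
  </body>
</tt>"""


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag."""
    return tag.rsplit("}", 1)[-1]


def parse_clock(value):
    """Convert 'HH:MM:SS' or 'HH:MM:SS.fff' to seconds (other time expressions unsupported)."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def extract_cues(dfxp_text):
    """Return (begin, end, text) for each <p> that carries its own begin and end."""
    cues = []
    for elem in ET.fromstring(dfxp_text).iter():
        if local_name(elem.tag) == "p" and "begin" in elem.attrib and "end" in elem.attrib:
            text = "".join(elem.itertext()).strip()
            cues.append((parse_clock(elem.get("begin")), parse_clock(elem.get("end")), text))
    return cues


def srt_time(seconds):
    """Format seconds as SubRip's HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Transcode cues to SubRip, one numbered block per cue."""
    return "\n".join(
        f"{i}\n{srt_time(begin)} --> {srt_time(end)}\n{text}\n"
        for i, (begin, end, text) in enumerate(cues, start=1)
    )


print(to_srt(extract_cues(SAMPLE)))
```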
XML.org is an OASIS Information Channel
sponsored by BEA Systems, Inc., IBM Corporation, Innodata Isogen, SAP AG and Sun
Microsystems, Inc.
Use http://www.oasis-open.org/mlmanage
to unsubscribe or change an email address. See http://xml.org/xml/news_market.shtml
for the list archives.