Content publishing and republishing have become
daily business on the Internet. Zillions of information sources
are pushed to the Web by thousands of service providers using
hundreds of different publishing and republishing systems. Despite
the increasing need, high quality tools are rare and content management
severely lacks support by sophisticated approaches. To introduce
structure and manageability into publishing and republishing of
information on the Web we created JML, the Jessica Markup Language.
JML is a textual language for specifying and implementing complex
Web applications. Its language features support the separation
of content, structure and layout by defining documents as objects,
layout as classes and complete Web sites as object collections.
Since JML is declarative by nature, it is not only usable for
the generation of new documents from more or less structured content,
but also for the analysis of such documents. This makes it possible to republish
specific content for other channels, such as handheld
computers or GSM devices, via XML-based repositories in a syndication
effort.
Keywords: multi-target publishing, triggered republishing,
reactive databases, syndication, XML
The Internet and the World Wide Web[1]
have currently become the largest information resources for the
online community. Myriads of data are kept all over the globe,
and both experts and mere users keep pushing all kinds of information
onto the Web. Consequently, not even the best indexing machines
can keep track of the enormous distributed knowledge, and individual
users often get lost on a single Web site. Is there a way
for content providers to shed light on this information overload?
Managing and maintaining a Web service becomes non-trivial
when the size of the service exceeds a certain limit. Keeping
the data organization, the information mapping to WWW pages, and
the navigation system manageable while providing a consistent
interface in terms of layout and usability are basic requirements
for Web services. Publishing new and republishing portions of the
enormous contents existing on the Internet has become a daily
routine for Web site owners. Although the number and complexity
of Web presences are increasing, tools for structured
publishing are hardly used, even for multimodal sites dealing
with several languages or several output channels, or for complete
Web applications.
Structured publishing separates content from layout
and allows the structure of information to be defined independently of
content, reducing maintenance costs (daily updates and site
restructuring). Commercial publishing tools are positioned either at the
high end[17], target specific markets
(e.g. newspapers)[4], or address the low-end consumer
market[14,16].
The authors have begun the development of JML[5],
an advanced publishing language technologically based on XML[6]
that addresses typical problems
of multichannel publishing for Web sites. Following a recent trend in
Web engineering research[2,3],
JML employs the strength of the object-oriented paradigm to increase
the flexibility and manageability of complex Web-based services.
The proposed infrastructure concentrates on the reuse of information
already published on the Web and abstractly describes the republishing
of contents on various target platforms such as the Web, XML,
WAP[7], WML[8], SMS[12]
and others. An XML-based repository stores information gathered
from relevant resources and provides access for publishing/republishing
services targeted to user needs. Such syndication infrastructures
offer means to analyze weakly structured information on the Internet
and ways to address this structured information afterwards for
a reuse on other Web-sites or alternative distribution channels,
e.g. for handheld or GSM devices.
This article is structured as follows: Section 2
concentrates on the JML language concepts to provide the publishing
and republishing of information. Section 3 describes the syndication
infrastructure and its main components, the repository and the
republishing approaches. An example service illustrates the process
of importing and republishing information using the JML language.
Section 4 gives a short summary and concludes the article with
a glance at future developments in the republishing area.
The Jessica Markup Language (JML) was designed to help developers and administrators manage information and complete applications on Web sites over their entire life cycle. JML helps to separate content from layout and presents a uniform approach to managing static documents, such as HTML, as well as dynamically generated objects.
In the following, an introduction to the most basic JML language concepts is presented. JML is entirely defined in XML. The description of components covers documents, layouts and collections. Most of the concepts of JML can be reduced to a single generalized object concept. For better understanding, the components are given descriptors according to their particular use. The mapping to JML objects is done transparently by the JML compiler or an optional pre-compilation process.
JML provides object-oriented support to abstractly describe information for the Web, including typical OO-benefits like encapsulation, reusability, and inheritance. The most basic components of JML are pages and layouts.
A simple HTML document may be written in JML as
<jml:PAGE NAME="HelloWorld">
<HEAD> <TITLE>Example I</TITLE> </HEAD>
<BODY>An HelloWorld example. </BODY>
</jml:PAGE>
When this definition is stored in the file hello.jml, a compiler run will produce exactly the HTML code defined above and write it to a file named HelloWorld.html. The file name is derived from the component's NAME attribute, or it can be set explicitly by a destination (DST) attribute in the JML document.
The language supports multiple destinations within one DST attribute for a single component and even allows differing MIME types to be specified, as in DST="text/html -> file:hellow.htm ; text/plain -> file:hellow.txt".
The first destination (before the semicolon) is a verbose way to have a hellow.htm file written. The second part will force the compiler to convert the HTML page into some ASCII equivalent before writing to hellow.txt. Which converter to take is configured in the local installation.
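To make the structure of such a multi-destination DST value concrete, the following Python sketch splits it into (MIME type, target) pairs. It only illustrates the syntax shown above and is not part of the JML compiler; the function name parse_dst and the handling of entries without an explicit MIME type are assumptions.

```python
def parse_dst(dst):
    """Split a DST attribute value into (mime_type, target) pairs.

    Entries are separated by ';'. Each entry is either 'mime -> target'
    or a bare 'target'; in the latter case mime_type is None, which is
    an assumption of this sketch.
    """
    pairs = []
    for entry in dst.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        if "->" in entry:
            mime, target = (part.strip() for part in entry.split("->", 1))
        else:
            mime, target = None, entry
        pairs.append((mime, target))
    return pairs


destinations = parse_dst(
    "text/html -> file:hellow.htm ; text/plain -> file:hellow.txt")
for mime, target in destinations:
    print(mime, "=>", target)
```

Each pair would then be handed to whatever converter the local installation configures for the given MIME type.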
One JML file can carry several document definitions. This eases working on Web applications; the manageability of a few JML files is higher than coping with hundreds of Web pages (or a couple of CGI scripts).
Typical Web sites consist of HTML files, some static,
i.e. already generated, and some that have to be generated dynamically
on user request, such as the results of database queries. Regardless of when
a Web page is created, all pages should adhere to a common
layout. Bigger Web services, e.g. portal sites, consist of several
services, each having its own layout that more or less follows
the master layout of the Web site. JML supports the definition of layouts
from which specific pages or other layouts can be derived.
<jml:LAYOUT NAME="HelloPretty">
<HEAD><TITLE><jml:COMPONENT NAME="title"/></TITLE></HEAD>
<BODY>
<jml:COMPONENT NAME="what"/>
</BODY>
</jml:LAYOUT>
Two components, title
and what, are
defined above. They act as placeholders for particular content.
Whenever a document is derived from this layout, values for title
and what have
to be provided. The following example shows a
document derived from HelloPretty
by naming the generic layout component in the
SRC attribute.
<jml:PAGE NAME="HelloEurope" SRC="jml:HelloPretty">
<jml:PACKAGE>
<jml:SNIPPET DST="jml:this.title">Another Example</jml:SNIPPET>
<jml:SNIPPET DST="jml:this.what">Hello Europe</jml:SNIPPET>
</jml:PACKAGE>
</jml:PAGE>
HelloEurope now has the same structure as HelloPretty. To provide values for the components HelloEurope.title and HelloEurope.what, anonymous text fragments (snippets) were defined. As these snippets are only relevant to HelloEurope, they are encapsulated locally in an anonymous package. In the above example, the page HelloEurope has no body of its own but inherits it from HelloPretty. The keyword this is used as a reference to the current document. Components can be provided with default values. This means that a snippet is not required for every component.
When other layouts are derived from layouts, some
components can be assigned values, some may be left untouched,
and further components may be introduced to act as additional
placeholders. We can add any number of new components during the
derivation process. In any case, a derived layout will have a more
specific structure than the layout from which it was derived.
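The derivation rules above can be illustrated with a small Python sketch. It models a layout as a string with named placeholders and is not the JML compiler's actual mechanism; the {name} placeholder syntax and the helper names derive and components are assumptions made for this example.

```python
import re

# Hypothetical placeholder syntax standing in for <jml:COMPONENT NAME="..."/>.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def derive(layout, values):
    """Derive a more specific layout by assigning values to components.

    Components without a value stay open. A value may itself contain
    placeholders, which introduces additional components.
    """
    return PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), layout)


def components(layout):
    """Return the names of the components still open in a layout."""
    return sorted(set(PLACEHOLDER.findall(layout)))


hello_pretty = "<HEAD><TITLE>{title}</TITLE></HEAD><BODY>{what}</BODY>"

# Derived layout: fix the title, keep 'what' open, add a new 'footer'.
hello_footer = derive(hello_pretty, {"title": "Another Example",
                                     "what": "{what}<HR>{footer}"})

# Derived page: all remaining components receive values.
hello_europe = derive(hello_footer, {"what": "Hello Europe",
                                     "footer": "bye"})

print(components(hello_pretty))
print(components(hello_footer))
print(hello_europe)
```

Each derivation step narrows the set of documents the layout can still produce, which is the sense in which a derived layout is more specific.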
A very useful way of supporting the reuse of commonalities in JML is the concept of macros. A specific piece of information may have to appear in several different places on a Web site. Instead of copying it directly for multiple instances, a macro is defined once with this information and references to it are used at the different places.
A macro is defined as
<jml:MACRO NAME="webmaster">webmaster@example.org</jml:MACRO>
and is referenced by an anonymous snippet somewhere
else, e.g.
... and in the case of major earthquakes
please contact <jml:SNIPPET SRC="jml:webmaster"/>
...
During compilation all references to macros will be expanded by their contents. For convenience, macros may contain other macros, but cyclic references are not allowed.
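A minimal Python sketch of macro expansion with cycle detection follows. It is an illustration under stated assumptions, not the JML compiler's implementation: macros are held in a plain dictionary, references are recognized with a regular expression, and the macro named contact is invented for the example.

```python
import re

# References are written as in the text: <jml:SNIPPET SRC="jml:name"/>.
REF = re.compile(r'<jml:SNIPPET SRC="jml:(\w+)"/>')


def expand(text, macros, active=()):
    """Replace every macro reference in text by the macro's expanded body.

    'active' holds the macros currently being expanded; meeting one of
    them again means the definitions are cyclic.
    """
    def replace(match):
        name = match.group(1)
        if name in active:
            chain = " -> ".join(active + (name,))
            raise ValueError("cyclic macro reference: " + chain)
        if name not in macros:
            return match.group(0)  # not a macro, leave the reference alone
        return expand(macros[name], macros, active + (name,))

    return REF.sub(replace, text)


macros = {
    "webmaster": "webmaster@example.org",
    # Hypothetical macro that contains another macro.
    "contact": 'please contact <jml:SNIPPET SRC="jml:webmaster"/>',
}
page = ('... and in the case of major earthquakes '
        '<jml:SNIPPET SRC="jml:contact"/> ...')
print(expand(page, macros))
```

Tracking the chain of macros under expansion is enough to reject cyclic definitions while still allowing the same macro to be used in several places.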
Encapsulation and collections of components are handled in packages. In general, packages can be used to segment a bigger application into smaller pieces to make it easier to manage them independently, maybe by different service administrators. In a typical approach, one package might carry all database functionality of a search engine, while another package only deals with layout and is under the control of a graphic designer.
Packages can contain any JML definition, including other packages. They are used to structure the name space within a JML environment. Documents outside a package do not directly see documents inside. To address those, the package must be named and this package id must be prepended to the document name, like LayoutPackage.MasterLayout. If a package is not named (anonymous) none of its content can be referenced from the outside.
JML is not restricted to governing HTML documents.
Consider a pure ASCII text file like the following:
<jml:PAGE NAME="people" SRC="jml:this'body(text/plain)"
DST="file:peoples.lst">
pg Peel Gehts
mu Murgh Undriesn
dpl Dim Purnas Li
</jml:PAGE>
Here, the SRC attribute explicitly states that the body of the page is to be interpreted as text/plain instead of text/html, which is the default for JML pages. The content of the body consists of TAB-separated entries, organized in lines. Strong structuring of information provides high manageability for republishing, as discussed in Section 3. A compiler will write the list stated above into peoples.lst.
Although most text editors provide support
for plain text documents, binary data requires some extra treatment.
To enforce a textual representation of other MIME types, like
image/gif, the content may be uuencoded within a JML
body:
<jml:PAGE NAME="WorldImage"
SRC="image/gif <- jml:this'body(text/x-uuencoded)"
DST="file:world.jpg(image/jpeg)">
begin 644 world.gif
M]P(<@Y+`'#L``````^@;(%1E6"<O=71P=70@,3DY.2XP,BXP-CHQ-#`WBP``
M9G1C;W<Y+4-U<G)E;G1086=E(&1R869T8V]P>2U#=7)R96YT4&%G92`Q(&%`
end
</jml:PAGE>
Guided by the SRC and DST attributes, the compiler automatically
transfers the textual information into a binary representation
of the image. The converters required in the above example (from
GIF source to JPG destination) are orthogonal to the JML language
concept; particular instances need to be plugged into the compiler.
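The first step of such a conversion, turning a uuencoded body back into bytes, can be sketched in Python with the standard binascii module. This is only an illustration: the function name uudecode_body is an assumption, the sample payload is made up (it is not the image data shown above), and the GIF-to-JPEG step is left to a separate converter.

```python
import binascii


def uudecode_body(body):
    """Decode the data lines between 'begin' and 'end' of a uuencoded body."""
    data = bytearray()
    inside = False
    for line in body.splitlines():
        if line.startswith("begin "):
            inside = True            # header line: 'begin <mode> <name>'
        elif line.strip() == "end":
            break                    # trailer line closes the data section
        elif inside and line.strip():
            data += binascii.a2b_uu(line)
    return bytes(data)


# Round trip with made-up sample bytes.
payload = b"GIF89a sample bytes, not a real image"
body = ("begin 644 world.gif\n"
        + binascii.b2a_uu(payload).decode("ascii")
        + "end\n")
decoded = uudecode_body(body)
print(decoded == payload)
```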
Nested documents are not a common notion in HTML, though they are implicit when using inline images or frames. JML allows documents to be nested explicitly within each other. Their interpretation in target languages such as HTML is delegated to specific compiler parts (embedders), which are activated when nested objects have to be mapped into a target language.
The following example shows an inline image added
to the HelloWorld
example from section 2.1.1:
<jml:PAGE NAME="WorldIllustrated">
...<BODY> Hello <jml:PAGE NAME="WorldImage"
SRC="image/gif <- jml:this'body(text/x-uuencoded)">
begin 644 world.gif
M]P(<@Y+`'#L``````^@;(%1E6"<O=71P=70@,3DY.2XP,BXP-CHQ-#`WBP``
M``$```````````````````````````````````````````````#.....H`)Y
M``"-H/VC``"@`C"``(V1/@``H/W:``#R```<'"`A('5S97)D:6-T(&)E9VEN
end
</jml:PAGE>
World </BODY>
</jml:PAGE>
WorldImage is now nested inside WorldIllustrated. A typical installation might generate a WorldIllustrated.html and the default embedder will generate a file WorldIllustrated.WorldImage.gif to make an <IMG> reference to. Some embedders could produce a floating frame with the picture inside, others could convert the image to an ASCII equivalent embedding it inside a <PRE> tag.
Documents to be embedded can also be imported from a remote URL. Depending on the embedder used, the result might be a floating-frames solution or a simple HTML page with hyperlinks to newly created copies of the remote documents. Other embedders might strip off <HTML>, <HEAD> and <BODY> tags from the imported documents and place the remaining code directly into the embedding parent document.
Every installation provides a set of embedders. The appropriate embedder is selected depending on the MIME types of the embedded and the enclosing object.
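Selecting an embedder by the two MIME types can be pictured as a table lookup. The Python sketch below is hypothetical throughout: the registry, the wildcard rule for subtypes and the two embedder functions are assumptions for illustration, not the interface of an actual JML installation.

```python
def embed_as_img(name, parent):
    """Write the object to its own file and refer to it with <IMG>."""
    return '<IMG SRC="%s.%s.gif">' % (parent, name)


def embed_inline(name, parent):
    """Place the nested document's body directly into the parent."""
    return "<!-- inlined body of %s -->" % name


# Registry keyed by (MIME type of the embedded object,
#                    MIME type of the enclosing object).
# A '*' subtype acts as a wildcard (an assumption of this sketch).
EMBEDDERS = {
    ("image/*", "text/html"): embed_as_img,
    ("text/html", "text/html"): embed_inline,
}


def select_embedder(embedded_type, enclosing_type):
    """Prefer an exact match, then fall back to the major-type wildcard."""
    major = embedded_type.split("/", 1)[0] + "/*"
    for key in ((embedded_type, enclosing_type), (major, enclosing_type)):
        if key in EMBEDDERS:
            return EMBEDDERS[key]
    raise LookupError(
        "no embedder for %s inside %s" % (embedded_type, enclosing_type))


embedder = select_embedder("image/gif", "text/html")
print(embedder("WorldImage", "WorldIllustrated"))
```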
JML provides integrated support for hyperlinks between
Web pages. Within a JML object, logical names of documents can
be used instead of the physical names of the files in which the documents will
reside later:
<jml:PAGE NAME="HelloVienna">
... And Vienna is part of the <jml:REF DST="jml:HelloWorld'URL">world</jml:REF>
... </jml:PAGE>
The compiler will transform the <jml:REF/>
into an HTML anchor, like <A HREF="hellow.htm">world</A>.
Using logical object names instead of physical file names in references enables the compiler to check link consistency very efficiently, thus guaranteeing referential integrity within a JML-governed Web service. This includes the detection of orphan documents within a Web service that are not referenced at all.
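The following Python sketch shows how link consistency and orphan detection could be checked once all references use logical names. It is a simplified model, not the JML compiler's algorithm: the site is reduced to a dictionary of page names and their references, and the notion of root pages (entry points that need no incoming link) as well as the pages HelloMars and Forgotten are assumptions of the example.

```python
def check_links(pages, roots):
    """Check a site given as {page name: set of referenced page names}.

    Returns (dangling, orphans): references to undefined pages, and
    pages that nothing refers to and that are not declared entry points.
    """
    dangling = sorted(
        (source, target)
        for source, targets in pages.items()
        for target in targets
        if target not in pages)
    referenced = {t for targets in pages.values() for t in targets}
    orphans = sorted(
        name for name in pages
        if name not in referenced and name not in roots)
    return dangling, orphans


site = {
    "HelloWorld": set(),
    "HelloVienna": {"HelloWorld"},
    "HelloEurope": {"HelloVienna", "HelloMars"},  # HelloMars is undefined
    "Forgotten": set(),                           # nobody links here
}
dangling, orphans = check_links(site, roots={"HelloEurope"})
print("dangling:", dangling)
print("orphans:", orphans)
```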
JML provides additional attributes in references
that address features that go beyond the capabilities of unidirectional
links in HTML. The use of the SRC attribute in
<jml:REF SRC="jml:HelloUniverse'URL"
NAME="universe">the universe</jml:REF>
defines an incoming reference, which cannot be expressed
in HTML, although other technologies such as HyperWave[4] support such links.
A typical JML compiler will at most be able to add a label in
the HTML of the target document, without modifying HelloUniverse,
from which the link originates.
The abstract object-oriented description of an entire Web service improves the handling and management of both the layout and the content. JML directives define how the imported data is mapped into the target language, e.g. HTML.
All data imported into the JML environment is static from the JML point of view. Differences in handling arise from the structure of the data the importing system can expect.
Often, information to be published on the Web is
already stored in another, external resource (file, database,
...). External content can be easily incorporated within a document
using the SRC attribute of a snippet:
<jml:PAGE NAME="...">
... <PRE> <jml:SNIPPET SRC="file:peoples.lst"/> </PRE> ...
</jml:PAGE>
The compiler will access the resource, in this case
a local file named peoples.lst,
and will place its contents inside the <PRE> element.
As the SRC can be arbitrarily complex, some shell processing and
modification of the resource can be done first. The output of
this process is then incorporated into the JML definition. JML
can also import whole objects from external sources. This makes it
easy to treat remote documents as if they were local:
<jml:PAGE NAME = "Universe" SRC
= "http://www.universe.org/"/>
At each compiler run, the remote document at the specified URL will be fetched and may be addressed in our local JML definition like a local document. HTTP implicitly provides the compiler with the external document's MIME type; other sources may require an explicit type description.
To assemble large projects, entire packages can be
imported.
<jml:PACKAGE NAME="Definitions"
SRC="file:layout-test.jml"/>
The compiler will put every definition found in layout-test.jml into the package body of Definitions.
Electronic data, however, often already has a structure which can be exploited to import specific information. The most primitive are table-oriented structures, which are organized in rows and columns. JML allows one to import such information on a record-by-record basis.
The following example imports information from the
peoples.lst document
of section 2.1.2 and puts the names mentioned therein into an
unnumbered list:
<UL>
<jml:SNIPPET NAME="loop" SRC="file:peoples.lst <-> jml:line2match">
<LI><jml:SNIPPET SRC="jml:loop.name"/>
</jml:SNIPPET>
</UL>
The named snippet loop
contains a SRC attribute which includes two source references
separated by the matching operator '<->'. The first reference
refers to the information to be imported, the second
to a pattern which has to be declared within the current JML scope:
<jml:PAGE NAME="line2match" SRC="jml:this'body(text/plain)">
<jml:COMPONENT NAME="initials"/> <jml:COMPONENT NAME="name"/>
</jml:PAGE>
line2match defines the line structure of a record in the file peoples.lst, namely 2 fields, separated by TABs with a linefeed for the record end.
Because of the matching operator in the SRC attribute
of loop, the compiler will read both
referenced objects and try to match them against each
other. It will match the first line of peoples.lst
against line2match
and bind the two components initials
and name to the
values pg and
Peel Gehts, respectively.
Once such a match is complete, the compiler will expand the body
of the loop snippet with these values,
rendering "<LI>Peel Gehts" for the first line
of peoples.lst.
This process is repeated until there is no unprocessed data left in
peoples.lst, resulting
in
<UL>
<LI>Peel Gehts
<LI>Murgh Undriesn
<LI>Dim Purnas Li
</UL>
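The matching process described above can be mimicked in a few lines of Python. This is a sketch of the idea only, not of the JML compiler: the pattern line2match is reduced to a tuple of field names, the records are assumed to be TAB-separated as stated for peoples.lst, and the function names are invented.

```python
# Contents of peoples.lst, with TAB-separated fields.
PEOPLES_LST = "pg\tPeel Gehts\nmu\tMurgh Undriesn\ndpl\tDim Purnas Li\n"

# Components of the line2match pattern, in order.
FIELDS = ("initials", "name")


def match_records(text, fields):
    """Bind each TAB-separated line to the pattern's component names."""
    for line in text.splitlines():
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError("line does not match pattern: %r" % line)
        yield dict(zip(fields, values))


def render_list(text):
    """Expand the loop body '<LI>name' once per matched record."""
    items = ["<LI>%s" % record["name"]
             for record in match_records(text, FIELDS)]
    return "\n".join(["<UL>"] + items + ["</UL>"])


print(render_list(PEOPLES_LST))
```

Run on the three records of peoples.lst, the sketch prints the same list as shown above.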
Another option is to import data and generate one
document per record:
<jml:PACKAGE NAME="PeopleCollection" SRC="jml:line2match <->
file:peoples.lst">
<jml:PAGE NAME="jml:PeopleCollection.initials">
... <BODY> ...<jml:SNIPPET SRC="jml:PeopleCollection.name"/>... </BODY>
</jml:PAGE>
</jml:PACKAGE>
Again, line2match is used to iterate over the lines of peoples.lst. This time the loop iterates inside a package and declares one HTML page per record. The contents of each page use the matched components as in the previous loop example. Finally, we end up with a package named PeopleCollection containing three HTML documents named pg, mu and dpl.
Structured data is not necessarily only data organized
in tables. Even HTML itself is regarded as a document structure
description language (if we ignore rendering features). JML allows
one to import complete HTML documents, analyze their structure
and use the analyzed parts in other components.
<jml:PAGE NAME = "SportEvents" SRC
= "http://www.news.com/sports.html"/>
The mere import of an external page opens a data
stream. To structure this HTML stream, a description of the general
layout of the sports.html
page is needed to match against the contents of an actual page.
Such a pattern will consist of static, invariable parts, mainly
concerning layout, and variable information which might be different
every time the page is fetched. Such patterns can be built with
layouts as defined in section 2.1.1:
<jml:LAYOUT NAME="SportsEventsPattern">
<HTML><HEAD>...
<H2>Today's Events</H2>
<jml:COMPONENT NAME="Events"/>
</jml:LAYOUT>
We use SportsEventsPattern
to match against a freshly fetched page:
<jml:PAGE NAME="SportsEventsSection"
SRC="http://www.news.com/sports.html <->
jml:SportsEventsPattern"/>
Specifying two references in SRC will cause the compiler
to match both streams. If this is successful, SportsEventsSection.Events
will contain values which might be used in another document, e.g.
a digest. For real-world applications, however, such patterns
are too primitive. Additional JML elements for declaring alternatives
(<jml:ALT/>, <jml:ALTSET/>) and repeating patterns (<jml:SEQ/>)
are used to describe more complex pattern structures.
<jml:LAYOUT NAME="SportsEventsPattern"> ...
<BODY>
<jml:ALTSET>
<jml:ALT NAME="NoEvents"> No events today. </jml:ALT>
<jml:ALT NAME="Events"> <H2>Today's Events</H2>
<UL> <jml:SEQ NAME="EventList" N="+">
<LI><jml:COMPONENT NAME="SingleEvent"/>
</jml:SEQ>
</UL>
</jml:ALT>
</jml:ALTSET>
</BODY>
</jml:LAYOUT>
SportsEventsPattern now accepts either a completely empty list with the text 'No events today.' or an <H2> header together with a <UL> list of events. Each alternative is enclosed in <jml:ALT/>, and all alternatives are enclosed in a <jml:ALTSET/>.
If the second alternative matches, the list must contain at least one SingleEvent. This is specified by the <jml:SEQ/> element, which carries the attribute N. The value '+' means that the enclosed pattern must occur at least once but may occur arbitrarily often. The value of N can also be '*', which poses no restrictions at all, or a positive number, which specifies exactly how often the enclosed pattern is expected to occur.
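The three forms of N correspond closely to regular-expression quantifiers. The Python sketch below uses that analogy to show the accepted repetition counts; it does not describe how the JML compiler matches patterns, and the helper names and the simplified list-item pattern are assumptions.

```python
import re


def quantifier(n):
    """Translate the N attribute of <jml:SEQ/> into a regex quantifier."""
    if n in ("+", "*"):
        return n                     # at least once / no restriction
    return "{%d}" % int(n)           # exactly n occurrences


def seq_regex(item_pattern, n):
    """Build a regex for a repeated item, grouped without capturing."""
    return "(?:%s)%s" % (item_pattern, quantifier(n))


ITEM = r"<LI>[^<]+"                  # simplified stand-in for one list entry


def event_list(n):
    return re.compile("<UL>" + seq_regex(ITEM, n) + "</UL>")


print(bool(event_list("+").fullmatch("<UL><LI>Match A<LI>Match B</UL>")))
print(bool(event_list("+").fullmatch("<UL></UL>")))
print(bool(event_list("*").fullmatch("<UL></UL>")))
print(bool(event_list("2").fullmatch("<UL><LI>Match A<LI>Match B</UL>")))
```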
Once the complete match is successful, the component
SingleEvent carries
the list of all matched values. To iterate over this list for
republishing, we treat SingleEvent
as a stream and match it against the pattern any,
which accepts everything in its only component, text
(it is one of the predefined objects in JML):
<jml:PAGE NAME="any"><jml:COMPONENT
NAME="text"/></jml:PAGE>
The following object presents a bandwidth conserving
republishing description for handheld media:
<jml:PAGE NAME="MyEvents">
... My event list for today (rendered for Palm III):<BR>
<jml:SNIPPET SRC="jml:SportsEventsSection.EventList.SingleEvent <-> jml:any" NAME="EventLoop"> <jml:SNIPPET SRC="jml:EventLoop.text"/><BR>
</jml:SNIPPET>
</jml:PAGE>
Independently of the way information was imported
into the JML scope, the language provides facilities that define how the information
should be republished. Previous subsections described how layouts
can be used for republishing data. Conceptually, a layout represents
a set of documents, i.e. those documents which are potentially
derivable from this layout. In contrast to the static information
discussed above, this section demonstrates
how a layout can be used at runtime, as is necessary when
writing server-side scripts, e.g. for accessing databases. JML
offers two approaches, integrated and delegated:
Integrated:
Here a document is derived from the layout as if
it were a static document. It will, however, contain scripting
segments in a specified programming language. For each backend
technology, a corresponding embedder produces the code. For CGI,
for example, the following object would result in one Perl CGI
script in which all static text is output via print
statements. For Mason[21], the Perl code
would be enclosed in <%perl>
brackets.
<jml:PAGE NAME="QueryResult" SRC="jml:SomeBeautifulLayout">
... <jml:SCRIPT TYPE="text/perl" SERVER>
unless ( $dbh = Mysql->connect( ) ) {
# not ok, write log and output rest of page, abort
}
unless ( $sth = $dbh->execute("Select * from ") ) {
# not ok, write log and output rest of page, abort
}
</jml:SCRIPT>
<H2>Results</H2>
<jml:SCRIPT TYPE="text/perl" SERVER>
while ( $sth->fetch_row( ) )
{ print "Result: ...."; }
</jml:SCRIPT> ....
</jml:PAGE>
Delegated:
In bigger projects, all design-specific parts are usually delegated to an HTML designer who delivers HTML code. Manually incorporating design information into scripts by programming is tedious. JML supports the integration and manipulation of templates, which the script can use to avoid dealing with layout details while accessing online resources.
The problem with this approach is that the programmer
has no control over the correct use of placeholders inside
the templates, while the scripts typically rely on them. Also,
changes in the scripts might have an impact on templates and vice
versa. Another problem is the multitude of templates one needs
for all realistic situations. These, however, can also be managed
with JML.
In the simplest case the JML compiler generates a
template for every required layout. For HelloPretty
in section 2.1.1 there will be a HelloPretty.tpl
file containing the text
<HEAD><TITLE>$data{title}</TITLE></HEAD>
<BODY>$data{what}</BODY>
as a Perl string, with the components title
and what replaced by $data{title}
and $data{what}, respectively.
Given our preference for Perl, we could use a subroutine from a
package to expand this template with values:
$s = &JML'expand ("HelloPretty.tpl",
(title => "Hi", what => "World"));
Afterwards, $s contains the completely expanded template. Any fields not explicitly provided with values will be left blank. Often, however, one needs several related templates in an application. For a database query the following situations should be covered:
Whereas cases (1) and (2) can be dealt with by simple
layout templates, cases (3) and (4) need more flexibility, since
the number of matches and the page size cannot be hard-coded easily.
The following JML code shows the use of <jml:SEQ/> for repetitions
and <jml:ALT/> for alternatives. The compiler will create
one template for each alternative. The script can use them according
to the number of matches.
<jml:LAYOUT NAME="Matches" SRC="jml:HelloPretty">
...
<jml:ALTSET>
<jml:ALT NAME="error"> The database is currently not available.
</jml:ALT>
<jml:ALT NAME="noresult"> There was no result matching your query.
</jml:ALT>
<jml:ALT NAME="someresult"> <H2>Results</H2> <UL>
<jml:SEQ NAME="matchlist" N="+">
<LI> <jml:COMPONENT NAME="singlematch"/>
</jml:SEQ> </UL> ...
<jml:ALTSET NAME="continuation">
<jml:ALT NAME="nomore"> No further results. </jml:ALT>
<jml:ALT NAME="more"> <A HREF="......">More</A> </jml:ALT>
</jml:ALTSET>
</jml:ALT>
</jml:ALTSET>
</jml:LAYOUT>
Without going into details, the compiler will automatically
generate a template for every possible constellation.
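How a script might pick among such generated templates can be sketched in Python with the standard string.Template class. Everything here is illustrative: the template texts are transcribed by hand from the layout above rather than produced by a JML compiler, the render function and its parameters are invented, and the ?offset= link target is a placeholder for the elided HREF.

```python
from string import Template

# Hypothetical templates, one per alternative of the 'Matches' layout.
TEMPLATES = {
    "error": Template("The database is currently not available."),
    "noresult": Template("There was no result matching your query."),
    "someresult": Template(
        "<H2>Results</H2> <UL>$matchlist</UL> $continuation"),
    "singlematch": Template("<LI> $singlematch"),
    "nomore": Template("No further results."),
    "more": Template('<A HREF="$next">More</A>'),
}


def render(matches, page_size, offset=0, db_ok=True):
    """Choose and expand templates according to the query outcome."""
    if not db_ok:
        return TEMPLATES["error"].substitute()
    if not matches:
        return TEMPLATES["noresult"].substitute()
    page = matches[offset:offset + page_size]
    items = "".join(
        TEMPLATES["singlematch"].substitute(singlematch=m) for m in page)
    if offset + page_size < len(matches):
        # More matches remain than fit on this page.
        cont = TEMPLATES["more"].substitute(
            next="?offset=%d" % (offset + page_size))
    else:
        cont = TEMPLATES["nomore"].substitute()
    return TEMPLATES["someresult"].substitute(
        matchlist=items, continuation=cont)


print(render(["a", "b", "c"], page_size=2))
```

The script contains no layout text of its own; it only decides which alternative applies and supplies the values.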
As we have already seen, JML is less a publishing language
than a republishing language, allowing the transformation
of external resources such as databases, local files, or even
external documents, whatever their structure may be, into other
documents. Focusing on republishing, we understand JML as a powerful
filter converting one set of (hypertext) documents to another.
Figure 3.1 demonstrates the conversion of document sets. Another publisher might reuse a previously generated document set, as it is daily practice with news agencies.
With the introduction of handheld computers and wireless devices (pagers, cellular phones), there is increasing pressure to republish selected information on these channels. Unfortunately, the involved partners not only use incompatible document formats but also ship information completely rendered for inspection by the end user. This applies not only to agencies but to every Web site operator, since any organization which runs a Web site is regarded as a publisher on the net.
Let us consider a local automobile club which publishes
the current traffic congestion warnings onto an HTML page on their
server, say, http://www.wild-drive.org/jam.html:
<DL>
<DT>Wien (11:09)
<DD>Südosttangente, Richtung Verteilerkreis, Stau wegen Bauarbeiten
<DT>Burgenland (10:10)
<DD>Südautobahn, Richtung Wien, Stau wegen Geisterfahrer
</DL>
With the increasing amount of volatile information (news messages become obsolete at some point), automatic processing is mandatory. This was the motivation for XML, which allows one to add application- or domain-specific tags to documents, enabling applications to add semantics later. Furthermore, XML lets information engineers structure content more than was possible with HTML itself. While XML is gaining importance and tools exist to convert non-XML-conforming HTML into XHTML, the bad news is that a typical Web site operator will not have the resources to XML-ify the content. So, for a republishing infrastructure the following challenges exist:
Technology gap
Not every content provider/republisher can handle an XML-based infrastructure. Serving channels like WML/WAP, SMS, CDF[13], Avantgo[15], etc. is definitely out of reach for most. The majority of Web sites are still operated without publishing tools. Even if they were, most publishing tools in use today cannot handle XML.
Coordination gap
Even if all are speaking XML, everyone would have to engage with every other party on a bilateral basis when exchanging information. While this is desirable for private information, it will not be for news messages which by nature are addressed to the open public. These will have to be directed in a correct but also timely manner between the concerned parties. Some information will be pushed by the provider towards the consumer if it is important to notify subscribers. Other information is better pulled from the publisher when the very latest status is relevant.
Neither XML itself nor its descendants cover this
coordination, although promising approaches exist[11].
In particular, applications which have to use information from different
content providers require much attention at high cost.
In the following we suggest a syndication infrastructure which is intended to fill the above gaps:
We define a resource to be a description (meta information),
typically encoded in RDF[9]. Aside from bibliographic
information (author, title, categories, copyright, rating), the
meta information also includes optional relations to other resources
(is-obsoleted-by, extends, is-similar-to) and a time horizon.
This time horizon defines how many copies of instances of this
resource should be kept archived by the infrastructure.
The traffic report of the running example is described in RDF
as follows:
<?xml version="1.0"?>
<rdf:RDF>
<rdf:Description about="http://repository/wilddrive/jam"
s:Publisher="Wild-Drive Club"
s:Agent ="rho@telecoma.net"
s:Title ="Traffic Jams"
s:History ="1 week"
s:Costs ="ATS 0" />
</rdf:RDF>
To allow addressing of a particular resource every
resource has a unique id. The addressing scheme is derived from
the URL space, allowing the repository to be distributed. There
is no immediate need for location transparency[22].
Navigation through the resource repository is either done along
the categories known to the infrastructure or by search engines
over the resource name space. The structure of the information
is, of course, encoded in a DTD, the information itself is a conforming
XML document:
<!ENTITY % jamseq "jam+">
<!ELEMENT jams (%jamseq;)>
<!ELEMENT jam EMPTY>
<!ATTLIST jam
REGION CDATA #REQUIRED
DETAILS CDATA #REQUIRED
DAYTIME CDATA #REQUIRED
>
<!DOCTYPE jams SYSTEM "jams.dtd">
<JAM REGION ="Wien"
DETAILS=" "
DAYTIME=" " />
<JAM REGION=
/>
Additionally, resources are allowed to be dependent
on others, i.e. parts of one resource can also be part of another.
For these shared parts one resource is authoritative, i.e. changes
in the authoritative resource will be reflected in the dependent
resources.
Information can be published into the infrastructure by the content provider (or someone who acts on behalf of them). This upload might trigger republishing actions on dependent resources. As an example, traffic congestion warnings are pushed by automobile clubs. Consumers may have set up SMS messages which will be fired off whenever a relevant warning comes in.
Alternatively, information can be pulled by the repository
whenever the resource is requested directly or by one of its dependents.
A typical application is a personal newspaper on
the Web which contains a reference to, e.g., a stock price. Whenever
the newspaper page is requested, the repository has to
fetch the latest value of the stock. In either case the repository
will hold a specifiable history of values.
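The pull model with a bounded history might look as follows. This is a sketch under simplifying assumptions: the history is limited by the number of values rather than by a time span such as "1 week", and `PulledResource`, `render_newspaper` and the stock quotes are hypothetical.

```python
from collections import deque


class PulledResource:
    """Resource that is fetched on request and archived with a bounded history."""

    def __init__(self, fetch, history_size):
        self.fetch = fetch                         # callable returning the current value
        self.history = deque(maxlen=history_size)  # oldest values drop out automatically

    def request(self):
        """Pull the latest value from the provider and archive it."""
        value = self.fetch()
        self.history.append(value)
        return value


prices = iter([101.5, 102.0, 99.8, 100.4])  # hypothetical stock quotes
stock = PulledResource(lambda: next(prices), history_size=3)


def render_newspaper():
    """A dependent page: requesting it pulls the latest stock value."""
    return f"Stock today: {stock.request()}"


pages = [render_newspaper() for _ in range(4)]
```

After four requests the history holds only the three most recent values.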
Once information is available in XML, it can be downloaded
as a whole or in parts. For the latter XQL[10]
should be used for selection. While this is more or less sufficient
to select relevant portions out of an XML tree, additional selection
criteria will be supported. The following query on the traffic
jam example shows restrictions for a certain region and a time
interval:
pathexpr=/jam[@REGION="Wien"]&from=1999-12-01&until=2000-01-02
For efficiency, modal operators are added for comparison;
for privacy, encryption and signatures are used.
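How such a query could be evaluated is sketched below. Python's ElementTree supports only a small XPath subset, which is used here as a stand-in for XQL. The sample document, the ISO format of the DAYTIME values, the inclusive treatment of `until` and the function name `select` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import parse_qs

# Hypothetical repository content; DAYTIME is assumed to be an ISO timestamp.
JAMS = """<jams>
  <jam REGION="Wien" DETAILS="A23 blocked" DAYTIME="1999-12-24T08:15"/>
  <jam REGION="Wien" DETAILS="Guertel slow" DAYTIME="2000-02-01T17:30"/>
  <jam REGION="Graz" DETAILS="A9 slow" DAYTIME="1999-12-24T09:00"/>
</jams>"""


def select(xml_text, query):
    """Apply the path expression, then the from/until restriction."""
    params = parse_qs(query)
    # ElementTree accepts only relative paths, so drop the leading slash.
    path = params["pathexpr"][0].lstrip("/")
    start = date.fromisoformat(params["from"][0])
    end = date.fromisoformat(params["until"][0])
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.findall(path):
        day = date.fromisoformat(element.get("DAYTIME")[:10])
        if start <= day <= end:  # both bounds treated as inclusive (assumption)
            hits.append(element)
    return hits


query = 'pathexpr=/jam[@REGION="Wien"]&from=1999-12-01&until=2000-01-02'
result = select(JAMS, query)
```

With the sample data, only the first Wien entry falls into the requested interval.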
Problems arise when the information at the content
provider is not available in XML in the first place. Instead of changing
the provider's backend, the pattern matching facilities of JML's
import concept are used to analyze documents (typically some HTML
flavor) and to extract relevant snippets from them. At this
stage appropriate XML tags are added. The layout for the traffic
jam page at www.wild-drive.org is defined in XML as:
<jml:LAYOUT NAME="jam-session">
<DL>
<jml:SEQ NAME="jam-loop">
<DT><jml:COMPONENT NAME="REGION" /> ( <jml:COMPONENT NAME="DAYTIME" /> )
<DD><jml:COMPONENT NAME="DETAILS" />
</jml:SEQ>
</DL>
</jml:LAYOUT>
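The effect of matching such a layout against an incoming page can be approximated with a regular expression. This is only an analogy, not JML's matching algorithm; the sample page and the names `JAM_ENTRY` and `extract_jams` are hypothetical.

```python
import re

# Regular expression playing the role of one "jam-loop" iteration:
# <DT>REGION ( DAYTIME )<DD>DETAILS, up to the next entry or the end of the list.
JAM_ENTRY = re.compile(
    r"<DT>\s*(?P<REGION>.*?)\s*\(\s*(?P<DAYTIME>.*?)\s*\)\s*"
    r"<DD>\s*(?P<DETAILS>.*?)\s*(?=<DT>|</DL>)",
    re.DOTALL | re.IGNORECASE,
)


def extract_jams(html):
    """Return one dict of named components per list entry."""
    return [match.groupdict() for match in JAM_ENTRY.finditer(html)]


# Hypothetical provider page in an HTML flavor with unclosed DT/DD tags.
PAGE = """<html><body><h1>Traffic Jams</h1>
<DL>
<DT>Wien ( 08:15 )
<DD>A23 blocked
<DT>Graz ( 09:00 )
<DD>A9 slow
</DL>
</body></html>"""

jams = extract_jams(PAGE)
```

Each match yields the three named components that the layout declares, one record per loop iteration.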
As the manual process of configuring layout patterns is error-prone and tedious, we suggest HTML editors that:
(a) collect various samples of the information over time,
(b) analyze variations in the content and auto-detect alternatives and loops,
(c) highlight variable portions, and
(d) suggest layout patterns.
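One conceivable way to find the variable portions in step (c) is to diff two samples collected at different times. The sketch below uses Python's `difflib` on whitespace-separated tokens; the two samples and the function name `variable_portions` are hypothetical, and a real editor would need more samples and an HTML-aware tokenizer.

```python
from difflib import SequenceMatcher


def variable_portions(sample_a, sample_b):
    """Return (a_text, b_text) pairs for the spans where two samples differ."""
    a, b = sample_a.split(), sample_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # equal spans are the stable layout
    ]


# Two hypothetical samples of the same page fragment.
monday = "<DT> Wien ( 08:15 ) <DD> A23 blocked"
tuesday = "<DT> Graz ( 09:00 ) <DD> A9 slow"

candidates = variable_portions(monday, tuesday)
```

The stable tokens correspond to the layout, while the differing spans are candidates for components such as REGION, DAYTIME and DETAILS.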
The information engineer only needs to fine-tune the relevant
parts of the document stream and define an appropriate DTD. An
appropriate editor will be particularly helpful whenever the structure
of the document changes significantly, e.g. after a layout redesign.
As long as the structure is stable, the repository can automatically
extract information and correlate it into the XML world:
<jml:PAGE NAME="jamming-in"
SRC="http://www.wild-drive.com/jam.html
<-> jml:jam-session" />
<jml:PAGE NAME="jamming-xml" SRC="jml:this'BODY(text/html)">
<!DOCTYPE jams SYSTEM "jams.dtd">
<jml:SNIPPET NAME="jam-list-iterator"
SRC="jml:jamming-in.jam-loop" >
<JAM REGION="<jml:SNIPPET SRC="jam-list-iterator.REGION" />"
DETAILS="<jml:SNIPPET SRC="jam-list-iterator.DETAILS" />"
DAYTIME="<jml:SNIPPET SRC="jam-list-iterator.DAYTIME" />"
/>
</jml:SNIPPET>
</jml:PAGE>
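Because extracted snippets end up inside attribute values, they have to be escaped properly on their way into the XML world. The following sketch shows this step with Python's ElementTree instead of JML; the record content and the function name `to_jams_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_jams_xml(records):
    """Serialize extracted records as a <jams> document of <JAM/> elements."""
    root = ET.Element("jams")
    for record in records:
        ET.SubElement(
            root,
            "JAM",
            REGION=record["REGION"],
            DETAILS=record["DETAILS"],
            DAYTIME=record["DAYTIME"],
        )
    # The serializer escapes &, <, > and quotes inside attribute values.
    return ET.tostring(root, encoding="unicode")


# Hypothetical extracted record containing characters that need escaping.
records = [
    {"REGION": "Wien", "DETAILS": 'A23 blocked & "very slow"', "DAYTIME": "08:15"},
]
xml_text = to_jams_xml(records)
round_trip = ET.fromstring(xml_text)  # parses again, so the output is well-formed
```

Parsing the result again returns the original attribute values unchanged.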
For the same reasons that motivate the repository, there is a need for a service which allows "XML challenged" organizations and individuals to profit from an XML repository. We propose the following principles:
Multitarget publishing
One and the same information can be published in different contexts, such as in Web pages on remote Web sites, as text-only versions for hand-helds or emails. Aside from content aspects, there is also the technological aspect of how particular application logic is represented in different contexts. Typically, a Web site will use some scripting languages to access databases to deliver results rendered in HTML. To offer a comparable functionality on a hand-held device, e.g. a WAP-enabled GSM phone, a completely different technology has to be used. JML does not cover this.
For the delivery to other targets, standard protocols like FTP, HTTP or DAV can be used.
Timely publishing
Publishing, and especially the compilation, should depend on particular events, such as the arrival of a news update at a specific time.
Separation of concerns
The more complex the new context is, the greater the
need to decouple structure management (webmastering - where
is what, navigation modeling), layout management (designing -
how a particular piece of information is rendered) and content management
(editing - what the information should contain). While proper handling
of these aspects requires more planning, this approach reduces
the long-term costs.
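The first and the last principle can be illustrated together: the same content record is handed to different layouts, one per target. The sketch below is not JML; the record, the target names and the functions `render_html`, `render_sms` and `publish` are hypothetical. The limit of 160 characters is the size of a single SMS in the default GSM alphabet.

```python
from html import escape

# Content: one hypothetical traffic message, independent of any layout.
jam = {"REGION": "Wien", "DAYTIME": "08:15", "DETAILS": "A23 blocked <both directions>"}


def render_html(item):
    """Layout for a Web page: markup characters in the content are escaped."""
    return (
        f"<DT>{escape(item['REGION'])} ( {escape(item['DAYTIME'])} )"
        f"<DD>{escape(item['DETAILS'])}"
    )


def render_sms(item, limit=160):
    """Layout for a text-only target: plain text, cut to one message."""
    text = f"{item['REGION']} {item['DAYTIME']}: {item['DETAILS']}"
    return text[:limit]


# Structure: which layout serves which target.
TARGETS = {"html": render_html, "sms": render_sms}


def publish(item, target):
    """Render one content item for the given target."""
    return TARGETS[target](item)


html_page = publish(jam, "html")
sms_text = publish(jam, "sms")
```

Adding a further target then only requires a new layout function and an entry in `TARGETS`; the content itself stays untouched.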
In a more pragmatic approach we suggest a ready-to-go
interface for non-technical consumers. These users are
provided with prepared publishing solutions, so-called e-clips.
For the traffic messages example, the user only has to provide
a GSM number to get the messages delivered via SMS to his mobile
phone. Fine-tuning by setting up filters can be postponed to a
second step. Other examples include an online magazine prepared
as a CDF channel or the daily sports events delivered via an email newsletter.
While syndication is a well established industrial concept, open syndication [18] based on XML must prove itself in a commercial setting. Our next steps include a proof-of-concept implementation which serves as basis for a generic business model. A working system will help us to understand or adopt other approaches.
In the language sector the relationship between JML and XSL(T)[19] is worth discussing. While JML covers primarily a transformation between document sets, XSLT will become the prominent transformation mechanism when applying formatting objects to XML documents. Recently the W3C adopted XSL(T) as a language to derive (XML) documents from XML documents and XSL:FO as a language to control rendering, positioning XSL somewhere between CSS and DSSSL.
While sharing the declarative nature with XSLT, JML is more biased towards web applications as it directly supports link and template management while offering a class concept and a natural way to embed objects of differing MIME types within each other. XSLT assumes XML source documents only, while JML is open to other document types and can even treat weakly structured documents, allowing a pragmatic migration path into the XML world. As a downside, JML cannot exploit structural elements of XML documents when it comes to detecting specific patterns and applying templates for output. If this is necessary, XSLT transformers can be added to JML's infrastructure. In this sense, JML's functionality is more a generalization of XML processors like Cocoon[23].
Regarding syndication, a deeper comparison of
our operation model with that of ICE[11]
as well as with industrial efforts (e.g. Netscape[20])
is necessary.