Copyright ACM, 2000

Syndication with JML

Robert Barta

Bond University
Gold Coast, Queensland
4229 Australia
+61 7 5595 1121

rho@telecoma.net

Markus Schranz

Ernst&Young Unternehmensberatungsges.m.b.H
Aspernbrückengasse 2
1020 Vienna, Austria
+43 1 21163 / 8416

Markus.Schranz@at.eyi.com

Abstract

Content publishing and republishing has become daily business on the Internet. Zillions of information sources are pushed to the Web by thousands of service providers using hundreds of different publishing and republishing systems. Despite the increasing need, high-quality tools are rare, and content management severely lacks support from sophisticated approaches. To introduce structure and manageability into the publishing and republishing of information on the Web, we created JML, the Jessica Markup Language. JML is a textual language for specifying and implementing complex Web applications. Its language features support the separation of content, structure and layout by defining documents as objects, layouts as classes and complete Web sites as object collections. Since JML is declarative by nature, it can be used not only to generate new documents from more or less structured content, but also to analyze such documents. This makes it possible to republish specific content on other channels, such as handheld computers or GSM devices, from XML-based repositories as part of a syndication effort.

Keywords:

multi-target publishing, triggered republishing, reactive databases, syndication, XML

1 Introduction

The Internet and the World Wide Web[1] have become the largest information resources for the online community. Vast amounts of data are kept all over the globe, and both experts and casual users keep pushing all kinds of information onto the Web. Consequently, not even the best indexing machines can keep track of this enormous distributed knowledge, and individual users often get lost even within a single Web site. Is there a way for content providers to shed light on this information overload?

Managing and maintaining a Web service becomes non-trivial once the size of the service exceeds a certain limit. Keeping the data organization, the mapping of information to WWW pages, and the navigation system manageable while providing a consistent interface in terms of layout and usability are basic requirements for Web services. Publishing new content and republishing portions of the enormous content already on the Internet have become a daily routine for Web site owners. Although the number and complexity of Web presences are increasing, tools for structured publishing are hardly used, even for multimodal sites that deal with several languages, several output channels, or complete Web applications.

Structured publishing separates content from layout and allows the structure of information to be defined independently of the content, which reduces maintenance costs (daily updates and site restructuring). Commercial publishing tools are either positioned at the high end[17], targeted at specific markets (e.g. newspapers)[4], or aimed at the low-end consumer market[14,16].

The authors have begun the development of JML[5], an advanced publishing language based on XML[6] that addresses typical problems of multichannel publishing for Web sites. Following a recent trend in Web engineering research[2,3], JML employs the strength of the object-oriented paradigm to increase the flexibility and manageability of complex Web-based services. The proposed infrastructure concentrates on the reuse of information already published on the Web and abstractly describes the republishing of content on various target platforms such as the Web, XML, WAP[7], WML[8], SMS[12] and others. An XML-based repository stores information gathered from relevant resources and provides access for publishing and republishing services targeted to user needs. Such syndication infrastructures offer means to analyze weakly structured information on the Internet and ways to address this structured information afterwards for reuse on other Web sites or alternative distribution channels, e.g. for handheld or GSM devices.

This article is structured as follows: Section 2 concentrates on the JML language concepts that support the publishing and republishing of information. Section 3 describes the syndication infrastructure and its main components, the repository and the republishing approaches. An example service illustrates the process of importing and republishing information using the JML language. Section 4 gives a short summary and concludes the article with a look at future developments in the republishing area.

2 JML Language Concepts

The Jessica Markup Language (JML) was designed to help developers and administrators manage information and complete applications on Web sites over their entire life cycle. JML helps to separate content from layout and presents a uniform approach to handling static documents, such as HTML pages, as well as dynamically generated objects.

In the following, the most basic JML language concepts are introduced. JML is entirely defined in XML. The description covers documents, layouts and collections. Most of the concepts of JML can be reduced to a single generalized object concept. For better understanding, the components are given descriptive names according to their particular use. The mapping to JML objects is done transparently by the JML compiler or by an optional pre-compilation process.

2.1 Publishing Information

JML provides object-oriented support to abstractly describe information for the Web, including typical OO-benefits like encapsulation, reusability, and inheritance. The most basic components of JML are pages and layouts.

2.1.1 Pages and Layouts

A simple HTML document may be written in JML as

<jml:PAGE NAME="HelloWorld">

<HEAD> <TITLE>Example I</TITLE> </HEAD>

<BODY>An HelloWorld example. </BODY>

</jml:PAGE>

If this definition is stored in the file hello.jml, a compiler run will produce exactly the HTML code defined above and write it to a file named HelloWorld.html. The file name is derived from the component's NAME attribute, or it can be set explicitly with a destination (DST) attribute in the JML document.

The language supports multiple destinations within one DST attribute for a single component, and each destination can specify a different MIME type, as in DST="text/html -> file:hellow.htm ; text/plain -> file:hellow.txt".

The first destination (before the semicolon) is a verbose way to have a hellow.htm file written. The second part will force the compiler to convert the HTML page into some ASCII equivalent before writing to hellow.txt. Which converter to take is configured in the local installation.
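The destination syntax can be illustrated with a short Python sketch. It is not part of JML or its compiler: the function name parse_dst, the returned list of pairs, and the fallback to text/html for a destination without an explicit type are assumptions made for illustration, and only the "type -> target ; type -> target" form shown above is handled.

```python
def parse_dst(dst):
    """Split a DST attribute into (mime_type, target) pairs.

    Hypothetical helper: handles only the 'type -> target ; ...' form.
    """
    pairs = []
    for part in dst.split(";"):
        part = part.strip()
        if not part:
            continue
        if "->" in part:
            mime, target = (s.strip() for s in part.split("->", 1))
        else:
            # Assumption: a bare target uses the page default, text/html.
            mime, target = "text/html", part
        pairs.append((mime, target))
    return pairs


destinations = parse_dst(
    "text/html -> file:hellow.htm ; text/plain -> file:hellow.txt"
)
```

A compiler could then pick a converter for each pair whose MIME type differs from that of the page body.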

One JML file can carry several document definitions. This eases working on Web applications; the manageability of a few JML files is higher than coping with hundreds of Web pages (or a couple of CGI scripts).

Typical Web sites consist of HTML files, some static, i.e. already generated, and some which have to be generated dynamically on user request, such as the results of database queries. Regardless of when a Web page is created, all pages should adhere to a common layout. Bigger Web services, e.g. portal sites, consist of several services, each with its own layout that more or less follows the master layout of the Web site. JML supports the definition of layouts from which specific pages or other layouts can be derived.

<jml:LAYOUT NAME="HelloPretty">

<HEAD><TITLE><jml:COMPONENT NAME="title"/></TITLE></HEAD>

<BODY>

<jml:COMPONENT NAME="what"/>

</BODY>

</jml:LAYOUT>

Two components, title and what, are defined above. They act as placeholders for particular content. Whenever a document is derived from this layout, values for title and what must be provided. The following example declares a document derived from HelloPretty by naming the generic layout component in the SRC attribute.

<jml:PAGE NAME="HelloEurope" SRC="jml:HelloPretty">

<jml:PACKAGE>

<jml:SNIPPET DST="jml:this.title">Another Example</jml:SNIPPET>

<jml:SNIPPET DST="jml:this.what">Hello Europe</jml:SNIPPET>

</jml:PACKAGE>

</jml:PAGE>

HelloEurope now has the same structure as HelloPretty. To provide values for the components HelloEurope.title and HelloEurope.what, anonymous text fragments (snippets) are defined. As these snippets are relevant only to HelloEurope, they are encapsulated locally in an anonymous package. In the above example, the page HelloEurope has no body of its own but inherits it from HelloPretty. The keyword this is used as a reference to the current document. Components can be given default values, so a snippet is not required for every component.
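The derivation step can be pictured as placeholder substitution. The following Python sketch is only an analogy under stated assumptions: the function derive is hypothetical, it works on plain strings rather than on JML objects, and it leaves a component without a value empty instead of applying a declared default.

```python
import re

# Matches a placeholder such as <jml:COMPONENT NAME="title"/>.
COMPONENT = re.compile(r'<jml:COMPONENT\s+NAME="([^"]+)"\s*/>')


def derive(layout, values):
    """Fill each component placeholder; components without a value stay empty."""
    return COMPONENT.sub(lambda m: values.get(m.group(1), ""), layout)


hello_pretty = (
    '<HEAD><TITLE><jml:COMPONENT NAME="title"/></TITLE></HEAD>\n'
    '<BODY>\n<jml:COMPONENT NAME="what"/>\n</BODY>'
)
hello_europe = derive(
    hello_pretty, {"title": "Another Example", "what": "Hello Europe"}
)
```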

When layouts are derived from other layouts, some components can be assigned values, some may be left untouched, and further components may be introduced to act as additional placeholders. Any number of new components can be added during derivation. In any case, a derived layout will have a more specific structure than the layout from which it was derived.

A very useful way of supporting the reuse of commonalities in JML is the concept of macros. A specific piece of information may have to appear in several different places on a Web site. Instead of copying it directly for multiple instances, a macro is defined once with this information and references to it are used at the different places.

A macro is defined as

<jml:MACRO NAME="webmaster">webmaster@example.org</jml:MACRO>

and is referenced by an anonymous snippet somewhere else, e.g.

... and in the case of major earthquakes

please contact <jml:SNIPPET SRC="jml:webmaster"/> ...

During compilation all references to macros will be expanded by their contents. For convenience, macros may contain other macros, but cyclic references are not allowed.
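Macro expansion with the ban on cyclic references can be sketched in a few lines of Python. This is an illustration, not the JML compiler's algorithm: the function expand, the dictionary of macros, and the choice to report a cycle with ValueError are all assumptions.

```python
import re

# Matches an anonymous snippet reference such as <jml:SNIPPET SRC="jml:webmaster"/>.
SNIPPET_REF = re.compile(r'<jml:SNIPPET\s+SRC="jml:([^"]+)"\s*/>')


def expand(text, macros, active=()):
    """Replace macro references in text, rejecting cyclic definitions."""

    def replace(match):
        name = match.group(1)
        if name in active:
            raise ValueError("cyclic macro reference: " + name)
        if name not in macros:
            return match.group(0)  # not a macro; leave the reference alone
        # Macros may contain other macros, so expand the body recursively.
        return expand(macros[name], macros, active + (name,))

    return SNIPPET_REF.sub(replace, text)


macros = {"webmaster": "webmaster@example.org"}
line = expand('please contact <jml:SNIPPET SRC="jml:webmaster"/> ...', macros)
```

The tuple active records the chain of macros currently being expanded, which is enough to detect a reference that leads back to one of them.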

Encapsulation and collections of components are handled in packages. In general, packages can be used to segment a bigger application into smaller pieces to make it easier to manage them independently, maybe by different service administrators. In a typical approach, one package might carry all database functionality of a search engine, while another package only deals with layout and is under the control of a graphic designer.

Packages can contain any JML definition, including other packages. They are used to structure the name space within a JML environment. Documents outside a package do not directly see documents inside. To address those, the package must be named and this package id must be prepended to the document name, like LayoutPackage.MasterLayout. If a package is not named (anonymous) none of its content can be referenced from the outside.
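The name space rule can be modelled with nested dictionaries. In this hypothetical Python sketch, resolve and the sample site are assumptions; an anonymous package would simply have no key under which its contents could be reached.

```python
def resolve(root, dotted_name):
    """Look up a name such as 'Package.Document' in nested dictionaries."""
    node = root
    for part in dotted_name.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_name)
        node = node[part]
    return node


site = {
    "LayoutPackage": {"MasterLayout": "<HTML>...</HTML>"},
    "Home": "<HTML>home</HTML>",
}
master = resolve(site, "LayoutPackage.MasterLayout")
```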

2.1.2 Information Types and Embedding

JML is not restricted to governing HTML documents. Consider a pure ASCII text file like the following:

<jml:PAGE NAME="people" SRC="jml:this'body(text/plain)"

DST="file:peoples.lst">

pg Peel Gehts

mu Murgh Undriesn

dpl Dim Purnas Li

</jml:PAGE>

Here, the SRC attribute explicitly states that the body of the page is to be interpreted as text/plain instead of text/html, which is the default for JML pages. The content of the body consists of TAB-separated entries, organized in lines. Strong structuring of information provides high manageability for republishing, as discussed in Section 3. A compiler will write the list stated above into peoples.lst.

Although most text editors support plain text documents, binary data require some extra treatment. To enforce a textual representation of other MIME types, such as image/gif, the content may be uuencoded within a JML body:

<jml:PAGE NAME="WorldImage"

SRC="image/gif <- jml:this'body(text/x-uuencoded)"

DST="file:world.jpg(image/jpeg)">

begin 644 world.gif

M]P(<@Y+`'#L``````^@;(%1E6"<O=71P=70@,3DY.2XP,BXP-CHQ-#`WBP``

M9G1C;W<Y+4-U<G)E;G1086=E(&1R869T8V]P>2U#=7)R96YT4&%G92`Q(&%`

end

</jml:PAGE>

From the SRC and DST attributes, the compiler automatically converts the textual information into a binary representation of the image. The converters required in the above example (from GIF source to JPEG destination) are orthogonal to the JML language concept; particular instances need to be plugged into the compiler.
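The first step of such a conversion, turning a uuencoded body back into bytes, can be sketched with Python's standard binascii module. The function uudecode_body is a hypothetical stand-in for a converter plug-in, and the payload is made up because the sample in the text is abbreviated; the image format conversion itself is not shown.

```python
import binascii


def uudecode_body(body):
    """Decode the lines between 'begin' and 'end' of a uuencoded body."""
    data = bytearray()
    inside = False
    for line in body.splitlines():
        if line.startswith("begin "):
            inside = True
        elif line.strip() == "end":
            break
        elif inside and line.strip():
            data += binascii.a2b_uu(line)
    return bytes(data)


# Round trip with made-up bytes standing in for real image data.
payload = b"GIF89a example bytes"
encoded = binascii.b2a_uu(payload).decode("ascii")
body = "begin 644 world.gif\n" + encoded + "`\nend\n"
decoded = uudecode_body(body)
```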

Nested documents are not a common notion in HTML, though they are implicit when using inline images or frames. JML allows documents to be nested explicitly within each other. Their interpretation in a target language such as HTML is delegated to specific compiler parts (embedders), which are activated when nested objects are mapped into the target language.

The following example shows an inline image added into the HelloWorld examples from section 2.1.1:

<jml:PAGE NAME="WorldIllustrated">

...<BODY> Hello <jml:PAGE NAME="WorldImage"

SRC="image/gif <- jml:this'body(text/x-uuencoded)">

begin 644 world.gif

M]P(<@Y+`'#L``````^@;(%1E6"<O=71P=70@,3DY.2XP,BXP-CHQ-#`WBP``

M``$```````````````````````````````````````````````#.....H`)Y

M``"-H/VC``"@`C"``(V1/@``H/W:``#R```<'"`A('5S97)D:6-T(&)E9VEN

end

</jml:PAGE>

World </BODY>

</jml:PAGE>

WorldImage is now nested inside WorldIllustrated. A typical installation might generate a WorldIllustrated.html and the default embedder will generate a file WorldIllustrated.WorldImage.gif to make an <IMG> reference to. Some embedders could produce a floating frame with the picture inside, others could convert the image to an ASCII equivalent embedding it inside a <PRE> tag.

Documents to be embedded can also be imported from a remote URL. Depending on the embedder used, the result might be a floating-frames solution or a simple HTML page with hyperlinks to newly created copies of the remote documents. Other embedders might strip the <HTML>, <HEAD> and <BODY> tags from the imported documents and place the remaining code directly into the embedding parent document.

Every installation provides a set of embedders. The appropriate embedder is selected depending on the MIME types of the embedded and the enclosing object.
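Selecting an embedder by the pair of MIME types amounts to a table lookup. The Python sketch below assumes a registry keyed by (embedded type, enclosing type); the registry, both embedder functions and their output are hypothetical and do not reflect an actual JML installation.

```python
def img_reference(name, data_path):
    """Embed an image by reference, as the default embedder in the text does."""
    return '<IMG SRC="%s" ALT="%s">' % (data_path, name)


def inline_text(name, data_path):
    """Placeholder for an embedder that inlines plain text."""
    return "<PRE>[contents of %s]</PRE>" % data_path


# Registry keyed by (MIME type of embedded object, MIME type of parent).
EMBEDDERS = {
    ("image/gif", "text/html"): img_reference,
    ("text/plain", "text/html"): inline_text,
}


def select_embedder(embedded_type, enclosing_type):
    try:
        return EMBEDDERS[(embedded_type, enclosing_type)]
    except KeyError:
        raise LookupError(
            "no embedder for %s inside %s" % (embedded_type, enclosing_type)
        )


embedder = select_embedder("image/gif", "text/html")
tag = embedder("WorldImage", "WorldIllustrated.WorldImage.gif")
```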

2.1.3 Navigation and Hyperlinks

JML provides integrated support for hyperlinks between Web pages. Within a JML object, logical names of documents can be used instead of the physical file names under which the documents will later reside:

<jml:PAGE NAME="HelloVienna">

... And Vienna is part of the <jml:REF DST="jml:HelloWorld'URL">world</jml:REF> ... </jml:PAGE>

The compiler will transform the <jml:REF/> into an HTML anchor, like <A HREF="hellow.htm">world</A>.

Using logical object names instead of physical file names in references enables the compiler to check link consistency very efficiently, thus guaranteeing referential integrity within a JML-governed Web service. This includes the detection of orphan documents within a Web service that are not referenced at all.
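With logical names, both checks reduce to simple set operations over the reference graph. The Python sketch below is an assumption about how such a check could look, not the compiler's implementation; check_links, the sample site and the treatment of a start page that is never counted as an orphan are all hypothetical.

```python
def check_links(pages, start):
    """pages maps a logical page name to the names it references.

    Returns (dangling, orphans): references to undefined pages, and
    defined pages that no other page references (the start page excepted).
    """
    dangling = sorted(
        (src, dst)
        for src, targets in pages.items()
        for dst in targets
        if dst not in pages
    )
    referenced = {
        dst for src, targets in pages.items() for dst in targets if dst != src
    }
    orphans = sorted(
        name for name in pages if name not in referenced and name != start
    )
    return dangling, orphans


site = {
    "HelloWorld": [],
    "HelloVienna": ["HelloWorld"],
    "HelloEurope": ["HelloVienna", "HelloMars"],
    "Forgotten": [],
}
dangling, orphans = check_links(site, start="HelloEurope")
```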

JML provides additional attributes in references that address features that go beyond the capabilities of unidirectional links in HTML. The use of the SRC attribute in

<jml:REF SRC="jml:HelloUniverse'URL" NAME="universe">the universe</jml:REF>

defines an incoming reference, which cannot be expressed in HTML, although such references exist in other technologies like HyperWave[4]. A typical JML compiler will at most be able to add a label to the HTML of the target document, without modifying HelloUniverse, from which the link originates.

2.2 Republishing Information

The abstract object-oriented description of an entire Web service improves the handling and management of both the layout and the content. JML directives define how the imported data is mapped into the target language, e.g. HTML.

2.2.1 Importing and Republishing Static Data

All data imported into the JML environment is static from the JML point of view. Differences in handling arise from the structure of the data the importing system can expect.

2.2.1.1 Importing and Republishing Unstructured Data

Often, information to be published on the Web is already stored in another, external resource (file, database, ...). External content can be easily incorporated within a document using the SRC attribute of a snippet:

<jml:PAGE NAME="...">

... <PRE> <jml:SNIPPET SRC="file:peoples.lst"/> </PRE> ...

</jml:PAGE>

The compiler will access the resource, in this case a local file named peoples.lst, and will place its contents inside the <PRE> element. As the SRC can be arbitrarily complex, some shell processing and modification of the resource can be done first; the output of this process is then incorporated into the JML definition. JML can also import whole objects from external sources. This makes it easy to treat remote documents as if they were local:

<jml:PAGE NAME = "Universe" SRC = "http://www.universe.org/"/>

At each compiler run, the remote document at the specified URL will be fetched and may be addressed in our local JML definition like a local document. HTTP implicitly provides the compiler with the external document's MIME type; other sources may require an explicit type description.

To assemble large projects, entire packages can be imported.

<jml:PACKAGE NAME="Definitions" SRC="file:layout-test.jml"/>

The compiler will put every definition found in layout-test.jml into the package body of Definitions.

2.2.1.2 Importing and Republishing Structured Data

Electronic data, however, often already has a structure that can be exploited to import specific information. The simplest are table-oriented structures, organized in rows and columns. JML allows such information to be imported record by record.

The following example imports information from the peoples.lst document of section 2.1.2 and puts the names mentioned therein into an unnumbered list:

<UL>

<jml:SNIPPET NAME="loop" SRC="file:peoples.lst <-> jml:line2match">

<LI><jml:SNIPPET SRC="jml:loop.name"/>

</jml:SNIPPET>

</UL>

A named snippet loop contains a SRC attribute which includes two source references separated by the matching operator '<->'. The first reference obviously refers to the information to be imported, the second to a pattern which has to be declared within the current JML scope:

<jml:PAGE NAME="line2match" SRC="jml:this'body(text/plain)">

<jml:COMPONENT NAME="initials"/> <jml:COMPONENT NAME="name"/>

</jml:PAGE>

line2match defines the line structure of a record in the file peoples.lst, namely two fields separated by a TAB, with a linefeed marking the end of the record.

Because of the matching operator in the SRC attribute of loop, the compiler will read both referenced objects and try to match them against each other. It will match the first line of peoples.lst against line2match and bind the two components initials and name to the values pg and Peel Gehts, respectively. Once such a match is complete, the compiler will expand the body of the loop snippet with these values, rendering "<LI>Peel Gehts" for the first line of peoples.lst. This process is repeated until there is no unprocessed data left in peoples.lst, resulting in

<UL>

<LI>Peel Gehts

<LI>Murgh Undriesn

<LI>Dim Purnas Li

</UL>
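The matching loop just described can be mimicked in Python. This is a sketch under simplifying assumptions: match_records is a hypothetical function, the pattern is reduced to a list of field names, and records are assumed to be TAB-separated lines as in peoples.lst.

```python
def match_records(text, fields, separator="\t"):
    """Bind each non-empty line of text to the given field names."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = line.split(separator)
        if len(values) != len(fields):
            raise ValueError("line does not match pattern: %r" % line)
        records.append(dict(zip(fields, values)))
    return records


peoples = "pg\tPeel Gehts\nmu\tMurgh Undriesn\ndpl\tDim Purnas Li\n"
loop = match_records(peoples, ["initials", "name"])
# Expand the loop body once per matched record.
html = "<UL>\n" + "".join("<LI>%s\n" % r["name"] for r in loop) + "</UL>"
```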

Another option is to import data and generate one document per record:

<jml:PACKAGE NAME="PeopleCollection" SRC="jml:line2match <->

file:peoples.lst">

<jml:PAGE NAME="jml:PeopleCollection.initials">

... <BODY> ...<jml:SNIPPET SRC="jml:PeopleCollection.name"/>... </BODY>

</jml:PAGE>

</jml:PACKAGE>

Again, line2match is used to iterate over the lines of peoples.lst. This time the loop iterates inside a package, which declares one HTML page per record. The contents of each page use the matched components as in the previous loop example. We end up with a package named PeopleCollection containing three HTML documents named pg, mu and dpl.

2.2.1.3 Importing and Republishing Weakly Structured Data

Structured data is not necessarily only data organized in tables. Even HTML itself is regarded as a document structure description language (if we ignore rendering features). JML allows one to import complete HTML documents, analyze their structure and use the analyzed parts in other components.

<jml:PAGE NAME = "SportEvents" SRC = "http://www.news.com/sports.html"/>

The mere import of an external page opens a data stream. To structure this HTML stream, a description of the general layout of the sports.html page is needed to match against the contents of an actual page. Such a pattern will consist of static, invariable parts, mainly concerning layout, and variable information which might be different every time the page is fetched. Such patterns can be built with layouts as defined in section 2.1.1:

<jml:LAYOUT NAME="SportsEventsPattern">

<HTML><HEAD>...

<H2>Today's Events</H2>

<jml:COMPONENT NAME="Events"/> …

</jml:LAYOUT>

We use SportsEventsPattern to match against a freshly fetched page:

<jml:PAGE NAME="SportsEventsSection"

SRC="http://www.news.com/sports.html <-> jml:SportsEventsPattern"/>

SRC-ing from two references will cause the compiler to match both streams. If this is successful, SportsEventsSection.Events will contain values which might be used in another document, e.g. a digest. For real-world applications, however, such patterns are too primitive. Additional JML elements to declare alternatives (<jml:ALT/>, <jml:ALTSET/>) and repeating (<jml:SEQ/>) patterns are used to describe more complex pattern structures.

<jml:LAYOUT NAME="SportsEventsPattern"> ...

<BODY>

<jml:ALTSET>

<jml:ALT NAME="NoEvents"> No events today. </jml:ALT>

<jml:ALT NAME="Events"> <H2>Today's Events</H2>

<UL> <jml:SEQ NAME="EventList" N="+">

<LI><jml:COMPONENT NAME="SingleEvent"/>

</jml:SEQ>

</UL>

</jml:ALT>

</jml:ALTSET>

</BODY>

</jml:LAYOUT>

SportsEventsPattern now accepts either a page without any event list, carrying only the text 'No events today.', or an <H2> header together with a <UL> list of events. Each alternative is bracketed by <jml:ALT/>, and all alternatives are bracketed inside a <jml:ALTSET/>.

If the second alternative matches, the list must contain at least one SingleEvent. This is specified by the <jml:SEQ/> element, which carries the attribute N. The value '+' means that the enclosed pattern must occur at least once but may occur arbitrarily often. The value of N can also be '*', which poses no restrictions at all, or a positive number, which specifies exactly how often the enclosed pattern is expected to occur.
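The semantics of alternatives and sequences closely resemble regular expressions, which the following Python sketch uses as a stand-in. It is an analogy only: seq_quantifier and the hand-written regular expression are assumptions, and a real JML matcher works on layouts rather than on regex strings.

```python
import re


def seq_quantifier(n):
    """Translate the N attribute of a sequence into a regex quantifier."""
    if n in ("+", "*"):
        return n
    count = int(n)
    if count <= 0:
        raise ValueError("N must be '+', '*' or a positive number")
    return "{%d}" % count


# Regex stand-in for the two alternatives of SportsEventsPattern.
item = r"\s*<LI>[^<]*"
pattern = re.compile(
    r"<BODY>\s*(?:(?P<NoEvents>No events today\.)"
    r"|(?P<Events><H2>Today's Events</H2>\s*<UL>(?:%s)%s\s*</UL>))\s*</BODY>"
    % (item, seq_quantifier("+"))
)

page = "<BODY><H2>Today's Events</H2><UL><LI>Football <LI>Tennis </UL></BODY>"
match = pattern.search(page)
# Collect the values bound to SingleEvent in the matching alternative.
events = [e.strip() for e in re.findall(r"<LI>([^<]*)", match.group("Events"))]
```

With N="+" an empty list fails to match, which mirrors the rule that the second alternative requires at least one event.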

Once the complete match is successful, the component SingleEvent carries the list of all matched values. To iterate over this list for republishing, we treat SingleEvent as a stream and match it against the pattern any, which accepts everything in its only component, text (it is one of the predefined objects in JML):

<jml:PAGE NAME="any"><jml:COMPONENT NAME="text"/></jml:PAGE>

The following object presents a bandwidth conserving republishing description for handheld media:

<jml:PAGE NAME="MyEvents">

... My event list for today (rendered for Palm III):<BR>

<jml:SNIPPET SRC="jml:SportsEventsSection.EventList.SingleEvent <-> jml:any" NAME="EventLoop"> <jml:SNIPPET SRC="jml:EventLoop.text"/><BR>

</jml:SNIPPET>

</jml:PAGE>

2.2.2 Republishing Dynamically Generated Data

Independently of the way information was imported into JML scope, the language provides facilities to control how the information is republished. Previous subsections described how layouts can be used for republishing data. Conceptually, a layout represents a set of documents, i.e. those documents which are potentially derivable from this layout. In contrast to the static information discussed above, this section demonstrates how a layout can be used at runtime, as is necessary when writing server-side scripts, e.g. for accessing databases. JML offers two approaches, integrated and delegated:

Integrated:

Here a document is derived from the layout as if it were a static document. It will, however, contain scripting segments in a specified programming language. For a specific backend technology a corresponding embedder will produce code. For CGI, for example, the following object would result in one Perl-CGI script, where all static texts are output via print statements. For Mason[21] the Perl code would be enclosed by <%perl> - brackets.

<jml:PAGE NAME="QueryResult" SRC="jml:SomeBeautifulLayout">

... <jml:SCRIPT TYPE="text/perl" SERVER>

unless ( $dbh = Mysql->connect(…) ) {

# not ok, write log and output rest of page, abort

}

unless ( $sth = $dbh->execute("Select * from …") ) {

# not ok, write log and output rest of page, abort

}

</jml:SCRIPT>

<H2>Results</H2>

<jml:SCRIPT TYPE="text/perl" SERVER>

while ( $sth->fetch_row(…) )

{ print "Result: ...."; }

</jml:SCRIPT> ....

</jml:PAGE>

Delegated:

In bigger projects, all design-specific parts are usually delegated to an HTML designer, who delivers HTML code. Incorporating design information into scripts manually by programming is tedious. JML supports the integration and manipulation of templates, which the script can use to stay free of layout details while accessing online resources.

The problem with this approach is that the programmer has no control over whether placeholders inside the templates are used correctly, while the scripts typically rely on them. Also, changes in the scripts might affect the templates and vice versa. Another problem is the multitude of templates needed to cover all realistic situations. These, however, can also be managed with JML.

In the simplest case the JML compiler generates a template for every required layout. For HelloPretty in section 2.1.1 there will be a HelloPretty.tpl file containing the text

<HEAD><TITLE>$data{title}</TITLE></HEAD>

<BODY>$data{what}</BODY>

as a Perl string, with the components title and what replaced by $data{title} and $data{what}, respectively. Given our preference for Perl, we could use a subroutine from a package to expand this template with values:

$s = &JML'expand ("HelloPretty.tpl", (title => "Hi", what => "World"));
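For readers who do not use Perl, the same expansion can be sketched in Python. The function expand_template is a hypothetical analogue of the expand subroutine; the source does not specify how that subroutine is implemented, and the template text is held in a string here instead of being read from HelloPretty.tpl.

```python
import re


def expand_template(template, values):
    """Replace each $data{name} by its value; missing names become blank."""
    return re.sub(
        r"\$data\{(\w+)\}",
        lambda m: str(values.get(m.group(1), "")),
        template,
    )


hello_pretty_tpl = (
    "<HEAD><TITLE>$data{title}</TITLE></HEAD>\n<BODY>$data{what}</BODY>"
)
s = expand_template(hello_pretty_tpl, {"title": "Hi", "what": "World"})
```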

Afterwards, $s contains the completely expanded template. Any fields not explicitly provided with values will be left blank. Often, however, one needs several related templates in an application. For a database query the following situations should be covered:

  1. the database connection failed
  2. there is no result
  3. there are results, but all fit into a page
  4. the number of matches exceeds the size of the page, continuation links must be presented.

Whereas cases (1) and (2) can be dealt with by simple layout templates, cases (3) and (4) need more flexibility, since the number of matches and the page size cannot be hard-coded easily. The following JML code shows the use of <jml:SEQ/> for repetitions and <jml:ALT/> for alternatives. The compiler will create one template for each alternative. The script can use them according to the number of matches.

<jml:LAYOUT NAME="Matches" SRC="jml:HelloPretty">

...

<jml:ALTSET>

<jml:ALT NAME="error"> The database is currently not available.

</jml:ALT>

<jml:ALT NAME="noresult"> There was no result matching your query.

</jml:ALT>

<jml:ALT NAME="someresult"> <H2>Results</H2> <UL>

<jml:SEQ NAME="matchlist" N="+">

<LI> <jml:COMPONENT NAME="singlematch"/>

</jml:SEQ> </UL> ...

<jml:ALTSET NAME="continuation">

<jml:ALT NAME="nomore"> No further results. </jml:ALT>

<jml:ALT NAME="more"> <A HREF="......">More</A> </jml:ALT>

</jml:ALTSET>

</jml:ALT>

</jml:ALTSET>

</jml:LAYOUT>

Without going into details, the compiler will automatically generate a template for every possible constellation.
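What "every possible constellation" could mean for the Matches layout can be made concrete by enumerating the choices of nested alternative sets. The Python sketch below is an assumption about that enumeration, not a description of the compiler: the dictionary MATCHES and the function constellations are hypothetical, and sequences are ignored.

```python
from itertools import product

# Nested alternatives of the Matches layout: each key is an alternative,
# each value lists the alternative sets nested inside it.
MATCHES = {
    "error": [],
    "noresult": [],
    "someresult": [{"nomore": [], "more": []}],
}


def constellations(altset):
    """List every combination of choices as a tuple of alternative names."""
    result = []
    for name, nested_sets in altset.items():
        nested_choices = [constellations(s) for s in nested_sets]
        # One entry per way of choosing an alternative in each nested set.
        for combo in product(*nested_choices):
            result.append((name,) + tuple(n for c in combo for n in c))
    return result


templates = constellations(MATCHES)
```

Under these assumptions the layout yields four templates, one for each of the four query situations listed earlier.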

3 Syndication

As we have already seen, JML is less a publishing language than a republishing language: it allows external resources such as databases, local files, or even external documents, whatever their structure may be, to be transformed into other documents. Focusing on republishing, we understand JML as a powerful filter that converts one set of (hypertext) documents into another.

Figure 3.1: Republishing document sets with JML

Figure 3.1 demonstrates the conversion of document sets. Another publisher might reuse a previously generated document set, as is daily practice with news agencies.

With the introduction of handheld computers and wireless devices (pagers, cellular phones), there is increasing pressure to republish selected information on these channels. Unfortunately, the partners involved not only use incompatible document formats, they also ship information completely rendered for inspection by the end user. This applies not only to agencies but to every Web site operator, since any organization that runs a Web site is regarded as a publisher on the net.

Let us consider a local automobile club which publishes the current traffic congestion warnings onto an HTML page on their server, say, http://www.wild-drive.org/jam.html:

<DL>

<DT>Wien (11:09)

<DD>Südosttangente, Richtung Verteilerkreis, Stau wegen Bauarbeiten

<DT>Burgenland (10:10)

<DD>Südautobahn, Richtung Wien, Stau wegen Geisterfahrer

</DL>


With the increasing amount of volatile information (news messages become obsolete at some point), automatic processing is mandatory. This was the motivation for XML, which allows application- or domain-specific tags to be added to documents, enabling applications to add semantics later. Furthermore, XML lets information engineers structure content more than was possible with HTML itself. While XML is gaining importance and tools exist to convert non-conforming HTML into XHTML, the bad news is that a typical Web site operator will not have the resources to XML-ify the content. So, for a republishing infrastructure the following challenges exist:

Technology gap

Not every content provider or republisher can handle an XML-based infrastructure. Serving channels like WML/WAP, SMS, CDF[13], Avantgo[15], etc. is definitely out of reach for most. The majority of Web sites are still operated without publishing tools. Even if they were not, most publishing tools in use today cannot handle XML.

Coordination gap

Even if all are speaking XML, everyone would have to engage with every other party on a bilateral basis when exchanging information. While this is desirable for private information, it will not be for news messages which by nature are addressed to the open public. These will have to be directed in a correct but also timely manner between the concerned parties. Some information will be pushed by the provider towards the consumer if it is important to notify subscribers. Other information is better pulled from the publisher when the very latest status is relevant.

Neither XML itself nor its descendants cover this coordination, although promising approaches exist[11]. In particular, applications which have to use information from different content providers require much attention at high cost.

In the following we suggest a syndication infrastructure which is intended to fill the above gaps:

3.1 Resource Repository

We define a resource to be a description (meta information), typically encoded in RDF[9]. Aside from bibliographic information (author, title, categories, copyright, rating), the meta information also includes optional relations to other resources (is-obsoleted-by, extends, is-similar-to) and a time horizon. This time horizon defines how many instances of this resource the infrastructure should keep archived.

The traffic report of the running example is described in RDF as follows:

<?xml version="1.0"?>

<rdf:RDF>

<rdf:Description about="http://repository/wilddrive/jam"

s:Publisher="Wild-Drive Club"

s:Agent ="rho@telecoma.net"

s:Title ="Traffic Jams"

s:History ="1 week"

s:Costs ="ATS 0" />

</rdf:RDF>

To allow a particular resource to be addressed, every resource has a unique id. The addressing scheme is derived from the URL space, allowing the repository to be distributed. There is no immediate need for location transparency[22]. Navigation through the resource repository is done either along the categories known to the infrastructure or by search engines over the resource name space. The structure of the information is, of course, encoded in a DTD; the information itself is a conforming XML document:

<!ENTITY % jamseq "jam+">
<!ELEMENT jams (%jamseq;)>
<!ELEMENT jam EMPTY>
<!ATTLIST jam
  REGION  CDATA #REQUIRED
  DETAILS CDATA #REQUIRED
  DAYTIME CDATA #REQUIRED
>

<!DOCTYPE jams SYSTEM "jams.dtd">
<jams>
  <jam REGION ="Wien"
       DETAILS="…"
       DAYTIME="…" />
  <jam REGION=… />
</jams>

Additionally, resources are allowed to be dependent on others, i.e. parts of one resource can also be part of another. For these shared parts one resource is authoritative, i.e. changes in the authoritative resource will be reflected in the dependent resources.

Information can be published into the infrastructure by the content provider (or someone who acts on behalf of them). This upload might trigger republishing actions on dependent resources. As an example, traffic congestion warnings are pushed by automobile clubs. Consumers may have set up SMS messages which will be fired off whenever a relevant warning comes in.

Alternatively, information can be pulled by the repository whenever the resource is requested directly or by one of its dependents. A typical application is a personal newspaper on the Web which contains a reference to, e.g., a stock price. Whenever the newspaper page is requested, the repository has to fetch the latest value of the stock. In either case the repository holds a configurable history of values.
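
The push and pull modes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`Resource`, `Repository`, `push`, `pull`) are hypothetical, the history is bounded by a number of instances rather than by a time span, and subscribers are plain callbacks standing in for, e.g., an SMS gateway.

```python
from collections import deque


class Resource:
    """One repository entry; all names here are illustrative."""

    def __init__(self, uri, history_size=10, fetch=None):
        self.uri = uri
        self.fetch = fetch                          # callable for pull resources
        self.history = deque(maxlen=history_size)   # bounded history of instances
        self.subscribers = []                       # callbacks fired on push


class Repository:
    def __init__(self):
        self.resources = {}

    def register(self, resource):
        self.resources[resource.uri] = resource

    def push(self, uri, value):
        """Provider uploads a new instance; subscribers are notified."""
        resource = self.resources[uri]
        resource.history.append(value)
        for notify in resource.subscribers:
            notify(uri, value)

    def pull(self, uri):
        """Consumer requests a resource; pull resources are refreshed first."""
        resource = self.resources[uri]
        if resource.fetch is not None:
            resource.history.append(resource.fetch())
        return resource.history[-1] if resource.history else None


# Usage: a traffic warning is pushed and triggers a (simulated) SMS.
repo = Repository()
jam = Resource("http://repository/wilddrive/jam", history_size=2)
outbox = []
jam.subscribers.append(lambda uri, value: outbox.append("SMS: " + value))
repo.register(jam)
repo.push(jam.uri, "Wien: A23 blocked")
```

A pull resource such as the stock price would be registered with a `fetch` callable instead, so that every request first obtains the latest value and then records it in the history.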

Once information is available in XML, it can be downloaded as a whole or in parts. For the latter, XQL[10] should be used for selection. While this is more or less sufficient to select relevant portions of an XML tree, additional selection criteria will be supported. The following query on the traffic jam example restricts the result to a certain region and a time interval:

pathexpr=/jam[@REGION="Wien"]&from=1999-12-01&until=2000-01-02

For efficiency, modal operators are added for comparison; for privacy, encryption and signatures are used.
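
As a rough illustration of how such a query could be evaluated, the sketch below parses the query string and filters a small document. Several points are assumptions: ElementTree's limited XPath subset stands in for XQL, the path is evaluated relative to the root element, `DAYTIME` is taken to start with an ISO date, and the sample data and the function name `select` are invented for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import parse_qs

# Invented sample data; DAYTIME is assumed to start with YYYY-MM-DD.
SAMPLE = """<jams>
  <jam REGION="Wien" DETAILS="A23 blocked" DAYTIME="1999-12-24T08:15"/>
  <jam REGION="Wien" DETAILS="Guertel slow" DAYTIME="2000-02-01T17:30"/>
  <jam REGION="Graz" DETAILS="A9 roadworks" DAYTIME="1999-12-24T09:00"/>
</jams>"""

QUERY = 'pathexpr=/jam[@REGION="Wien"]&from=1999-12-01&until=2000-01-02'


def select(xml_text, query):
    """Return the elements matching the path and the date interval."""
    params = {key: values[0] for key, values in parse_qs(query).items()}
    start = date.fromisoformat(params["from"])
    end = date.fromisoformat(params["until"])
    root = ET.fromstring(xml_text)
    # ElementTree rejects absolute paths on elements, so the leading
    # slash is dropped and the path is applied below the root.
    path = params["pathexpr"].lstrip("/")
    hits = []
    for element in root.findall(path):
        day = date.fromisoformat(element.get("DAYTIME")[:10])
        if start <= day <= end:
            hits.append(element)
    return hits


hits = select(SAMPLE, QUERY)
```

With the sample above, only the first `jam` entry for Wien falls inside the requested interval.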

Problems arise when the information at the content provider is not available in XML in the first place. Instead of changing the provider's backend, the pattern matching facilities of JML's import concept are used to analyze documents (typically some HTML flavor) and to extract the relevant snippets from them. At this stage appropriate XML tags are added. The layout for the traffic jam page at www.wild-drive.org is defined in XML as:

<jml:LAYOUT NAME="jam-session">
  <DL>
    <jml:SEQ NAME="jam-loop">
      <DT><jml:COMPONENT NAME="REGION" /> ( <jml:COMPONENT NAME="DAYTIME" /> )
      <DD><jml:COMPONENT NAME="DETAILS" />
    </jml:SEQ>
  </DL>
</jml:LAYOUT>


As the manual process of configuring layout patterns is error-prone and tedious, we suggest HTML editors that should:

(a) collect various samples of the information over time,

(b) analyze variations in the content, auto-detecting alternatives and loops,

(c) highlight variable portions and

(d) suggest layout patterns.
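
Steps (b) and (c) could be approximated, for instance, by aligning two collected samples and reporting where they differ. The following minimal sketch uses Python's `difflib` for the alignment; the tokenizer, the function name `variable_spans` and the sample strings are assumptions for illustration, and a real editor would need more than two samples to detect loops and alternatives.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Split markup into tags and whitespace-separated words."""
    return re.findall(r"<[^>]+>|[^<\s]+", text)


def variable_spans(sample_a, sample_b):
    """Return pairs of token runs that differ between two samples."""
    a, b = tokenize(sample_a), tokenize(sample_b)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans


# Invented samples of one list entry collected at two different times.
first = "<DT>Wien ( 08:15 ) <DD>A23 blocked"
second = "<DT>Graz ( 09:00 ) <DD>A9 roadworks"
spans = variable_spans(first, second)
```

The stable tokens (`<DT>`, the parentheses, `<DD>`) form the candidate layout pattern, while each reported span marks a position where a component such as REGION, DAYTIME or DETAILS could be suggested.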

The information engineer only needs to fine-tune the relevant parts of the document stream and define an appropriate DTD. An appropriate editor will be particularly helpful whenever the structure of the document changes significantly, e.g. after a layout redesign. As long as the structure is stable, the repository can automatically extract information and correlate it into the XML world:

<jml:PAGE NAME="jamming-in"
          SRC="http://www.wild-drive.com/jam.html <-> jml:jam-session" />

<jml:PAGE NAME="jamming-xml" SRC="jml:this'BODY(text/html)">
  <!DOCTYPE jams SYSTEM "jams.dtd">
  <jams>
    <jml:SNIPPET NAME="jam-list-iterator"
                 SRC="jml:jamming-in.jam-loop" >
      <jam REGION="<jml:SNIPPET SRC="jam-list-iterator.REGION" />"
           DETAILS="<jml:SNIPPET SRC="jam-list-iterator.DETAILS" />"
           DAYTIME="<jml:SNIPPET SRC="jam-list-iterator.DAYTIME" />"
      />
    </jml:SNIPPET>
  </jams>
</jml:PAGE>
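
For readers unfamiliar with JML, the effect of the import and re-export can be imitated in plain Python. This is not how JML works internally; it is a sketch in which a regular expression plays the role of the layout pattern, and the sample HTML, the pattern and the function names `extract` and `to_xml` are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample of the provider's HTML page body.
SAMPLE_HTML = """<DL>
<DT>Wien ( 08:15 )
<DD>A23 blocked
<DT>Graz ( 09:00 )
<DD>A9 roadworks
</DL>"""

# One loop iteration of the layout: <DT>REGION ( DAYTIME ) <DD>DETAILS
JAM_PATTERN = re.compile(
    r"<DT>\s*(?P<REGION>.+?)\s*\(\s*(?P<DAYTIME>.+?)\s*\)\s*"
    r"<DD>\s*(?P<DETAILS>.+?)\s*(?=<DT>|</DL>)",
    re.IGNORECASE | re.DOTALL,
)


def extract(html_text):
    """Return one dict per list entry matched by the pattern."""
    return [match.groupdict() for match in JAM_PATTERN.finditer(html_text)]


def to_xml(records):
    """Serialize the extracted records as a jams document."""
    root = ET.Element("jams")
    for record in records:
        ET.SubElement(root, "jam", record)
    return ET.tostring(root, encoding="unicode")


records = extract(SAMPLE_HTML)
document = to_xml(records)
```

The pattern breaks as soon as the page layout changes, which is exactly the situation in which the editor support described above becomes useful.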

3.2 Republishing Infrastructure

As with the repository, there is a need for a service which allows "XML challenged" organizations and individuals to profit from an XML repository. We propose the following principles:

Multitarget publishing

One and the same information can be published in different contexts, such as in Web pages on remote Web sites, as text-only versions for hand-helds or emails. Aside from content aspects, there is also the technological aspect of how particular application logic is represented in different contexts. Typically, a Web site will use some scripting languages to access databases to deliver results rendered in HTML. To offer a comparable functionality on a hand-held device, e.g. a WAP-enabled GSM phone, a completely different technology has to be used. JML does not cover this.

For the delivery to other targets, standard protocols such as FTP, HTTP or DAV can be used.
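
The content side of multitarget publishing can be pictured as one record rendered through several target-specific functions. The sketch below covers only this rendering aspect, not the differing application logic mentioned above; the renderer names, the target keys and the record layout are assumptions, while the limit of 160 characters reflects a single GSM short message in the 7-bit alphabet.

```python
from html import escape

SMS_LIMIT = 160  # characters in a single GSM short message (7-bit alphabet)


def render_html(jam):
    """Render one entry of a definition list for a Web page."""
    return "<DT>{} ( {} )\n<DD>{}".format(
        escape(jam["REGION"]), escape(jam["DAYTIME"]), escape(jam["DETAILS"]))


def render_text(jam):
    """Render a text-only line, e.g. for hand-helds or email."""
    return "{} ({}): {}".format(jam["REGION"], jam["DAYTIME"], jam["DETAILS"])


def render_sms(jam):
    """Render the text version, cut to fit one short message."""
    return render_text(jam)[:SMS_LIMIT]


# Hypothetical target names mapped to their renderers.
RENDERERS = {
    "text/html": render_html,
    "text/plain": render_text,
    "sms": render_sms,
}


def publish(jam, target):
    """Render the same record for the requested target."""
    return RENDERERS[target](jam)


jam = {"REGION": "Wien", "DAYTIME": "08:15", "DETAILS": "A23 & A4 blocked"}
page_fragment = publish(jam, "text/html")
message = publish(jam, "sms")
```

Adding a further target, such as a WML deck, would then mean registering one more renderer without touching the content.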

Timely publishing

Publishing, and especially the compilation, should depend on particular events, such as the arrival of a news update at a specific time.

Separation of concerns

The more complex the new context is, the more there is a need to decouple structure management (webmastering - where is what, navigation modeling), layout management (designing - how is a particular information rendered) and content management (editing - what should information contain). While proper handling of these aspects requires more planning, this approach reduces the long-term costs.

In a more pragmatic approach we suggest a ready-to-go interface for non-technical consumers. These users are provided with prepared publishing solutions, so-called e-clips. For the traffic messages example, the user only has to provide a GSM number to have the messages delivered via SMS to a mobile phone. Fine-tuning by setting up filters can be postponed to a second step. Other examples include an online magazine prepared as a CDF channel or the daily sports events delivered via an email newsletter.

Figure 3.2: Leveraging the level of abstraction in information management by using JML

4 Future and Related Work

While syndication is a well established industrial concept, open syndication [18] based on XML must prove itself in a commercial setting. Our next steps include a proof-of-concept implementation which serves as basis for a generic business model. A working system will help us to understand or adopt other approaches.

In the language sector, the relationship between JML and XSL(T)[19] is worth discussing. While JML primarily covers transformations between document sets, XSLT will become the prominent transformation mechanism when applying formatting objects to XML documents. Recently the W3C adopted XSL(T) as a language to derive (XML) documents from XML documents and XSL:FO as a language to control rendering, positioning XSL somewhere between CSS and DSSSL.

While sharing its declarative nature with XSLT, JML is more biased towards web applications, as it directly supports link and template management while offering a class concept and a natural way to embed objects of differing MIME types within each other. XSLT assumes XML documents as its only source, while JML is open to other document types and can even treat weakly structured documents, allowing a pragmatic migration path into the XML world. As a downside, JML cannot exploit structural elements of XML documents when it comes to detecting specific patterns and applying templates for output. If this is necessary, XSLT transformers can be added to JML's infrastructure. In this sense, JML's functionality is more a generalization of XML processors like Cocoon[23].

Regarding syndication, a deeper comparison between our operation model and that of ICE[11], as well as with industrial efforts (e.g. Netscape[20]), is necessary.

5 References

  1. T. Berners-Lee, R. Cailliau, A. Luotonen, H. F. Nielsen, and A. Secret. The World Wide Web, Communications of the ACM, 37(8), August 1994
  2. H.-W. Gellersen, R. Wicke, and M. Gaedke. WebComposition: An Object-Oriented Support System for the Web Engineering Life Cycle, Computer Networks and ISDN Systems 29(8-13), April 1997
  3. D. B. Ingham, S. J. Caughey, and M. C. Little. Supporting Highly Manageable Web Services, Computer Networks and ISDN Systems 29(8-13), April 1997
  4. H. Maurer. Hyper-G now Hyperwave, the next Generation Web Solution, Addison-Wesley, England, 1996
  5. R. A. Barta and M. W. Schranz. JESSICA: An Object-Oriented Hypermedia Publishing Processor, Computer Networks and ISDN Systems 30(1-7), April 1998
  6. S. St. Laurent and E. Cerami. Building XML Applications, McGraw-Hill, 1999
  7. WAP. Wireless Application Protocol, specification. http://www1.wapforum.org/what/technical/SPEC-WAPArch-19980430.pdf
  8. WML. Wireless Markup Language, specification. http://www1.wapforum.org/what/technical/SPEC-WML-19990616.pdf
  9. T. Berners-Lee. Resource Description Framework (RDF), http://www.w3.org/TR/NOTE-rdfarch, Oct 1997
  10. J. Robie, J. Lapp, and D. Schach. XQL: XML Query Language, http://www.w3.org/TandS/QL/QL98/xql.html
  11. N. Webber, C. O'Connell, B. Hunt, R. Levine, L. Popkin, and G. Larose. The Information and Content Exchange (ICE) Protocol, http://www.w3.org/TR/NOTE-ice, Oct 1998
  12. Digital cellular telecommunication system (phase 2+); Technical realization of short messaging system (SMS) point-to-point (PP) (GSM 03.40), http://www.etsi.org/Publications/home.asp?wki_id=4691
  13. CDF: The Channel Definition Format, http://msdn.microsoft.com/workshop/delivery/cdf/reference/CDF.asp
  14. SoftQuad's HoTMetaL Pro, www.sq.com/products/hotmetal/
  15. Avantgo: a mobile interactive service, http://www.avantgo.com
  16. Frontpage, http://www.microsoft.com/frontpage
  17. BroadVision, http://www.broadvision.com
  18. OPENSYN: http://www.edventure.com/release1/abstracts/syndication.html
  19. XSLT: http://www.w3.org/TR/WD-xslt
  20. NETSCAPE: http://my.netscape.com/publish/help/quickstart.html
  21. Mason: http://www.masonhq.com/
  22. Coulouris, Dollimore, Kindberg. Distributed Systems: Concepts and Design, Addison-Wesley, 1994
  23. Cocoon: http://xml.apache.org/

Copyright 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC 2000 March 19-21 Como, Italy
(c) 2000 ACM 1-58113-239-5/00/003...$5.00