eXtensible Markup Language (XML) for Electronic Data Interchange For Administration, Commerce and Transport (EDIFACT)
This document is a NOTE made available to the World Wide Web Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.
Lynx and Mosaic brought the Internet to the masses. Soon online ordering over the Internet was claiming territory that Value Added Networks used to hold. Unfortunately, the web was stitched together in a hurry: HTML did not distinguish between layout and content. XML was born to fill this gap. Nearly as flexible as SGML while being simpler, XML is well suited to order chain management and other classic applications of Electronic Data Interchange.
This note covers an experimental implementation of United Nations EDIFACT (UN/EDIFACT) using XML.
Various approaches to electronic data interchange exist, so a distinction has to be made to focus on the topic of this note.
For download: ftp://ftp.cpan.org/pub/perl/CPAN/modules/by-module/XML/
The following diagram is a simplified version of the one in D422_D.96B:
A MESSAGE contains:
 - UN segments
 - Data segments

SEGMENT:  TAG+SIMPLE DATA ELEMENT+COMPOSITE DATA ELEMENT'
A SEGMENT contains:
 - A segment TAG
 - Simple data elements or composite data elements, or both, as applicable

SEGMENT TAG:  Code[:Value]   (the bracketed part is present only with explicit indication)
A SEGMENT TAG contains:
 - A segment code and, if explicitly indicated, repeating and nesting value(s). See 8.1 and 9.

SIMPLE DATA ELEMENT:  Value
A SIMPLE DATA ELEMENT contains:
 - A single data element value

COMPOSITE DATA ELEMENT:  COMPONENT D/E:COMPONENT D/E
A COMPOSITE DATA ELEMENT contains:
 - Component data elements

COMPONENT DATA ELEMENT:  Value
A COMPONENT DATA ELEMENT contains:
 - A single data element value
This simplified model of UN/EDIFACT is used by XML::Edifact versions 0.3x and 0.4x. Later versions will introduce segment groups and document types that are more constrained than the generic message.
A generic UN/EDIFACT message is therefore just a stream of segments, each containing simple values, coded values, or composites of simple or coded values. Some of these segments need additional care: the segments defined by the United Nations have UN as the first two characters of their tag and are defined in Annex B of D422, where they build the frame around specific messages. The remaining segment codes are defined in trsd.
Any segment starts with a three-character coded TAG. The plus sign usually serves as the DATA ELEMENT SEPARATOR, the colon as the COMPONENT DATA ELEMENT SEPARATOR, and the apostrophe (tick) as the SEGMENT TERMINATOR. If one of these characters has to appear literally, its special meaning is released by prefixing it with the question mark, the RELEASE INDICATOR. Numbers have to be written in the DECIMAL NOTATION, where the dot is most common. The UNA segment, which may optionally head an interchange, can redefine all of these characters.
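To make the separator rules concrete, here is a minimal tokenizer sketch in Python; it is not part of XML::Edifact, which is written in Perl. It assumes the UNA layout of ISO 9735 (six service characters in the order component separator, data element separator, decimal notation, release indicator, reserved, segment terminator). The names ServiceChars, read_una, parse_segments and to_number are invented for this illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ServiceChars:
    """The six UNA service characters; defaults are the usual UN/EDIFACT ones."""
    component: str = ":"    # component data element separator
    element: str = "+"      # data element separator
    decimal: str = "."      # decimal notation
    release: str = "?"      # release indicator
    reserved: str = " "     # reserved for future use
    terminator: str = "'"   # segment terminator


def read_una(text: str) -> tuple[ServiceChars, str]:
    """Split off an optional UNA segment: 'UNA' followed by exactly six characters."""
    if text.startswith("UNA"):
        return ServiceChars(*text[3:9]), text[9:]
    return ServiceChars(), text


def split_released(text: str, sep: str, release: str) -> list[str]:
    """Split on sep unless it is preceded by the release character.
    Release pairs are kept intact so that inner splits still honour them."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            current.append(text[i:i + 2])
            i += 2
            continue
        if text[i] == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(text[i])
        i += 1
    parts.append("".join(current))
    return parts


def unrelease(value: str, release: str) -> str:
    """Drop release characters, keeping the character each one protects."""
    out, i = [], 0
    while i < len(value):
        if value[i] == release and i + 1 < len(value):
            i += 1
        out.append(value[i])
        i += 1
    return "".join(out)


def parse_segments(text: str):
    """Yield (tag, elements); each element is a list of component values."""
    sc, body = read_una(text.strip())
    for raw in split_released(body, sc.terminator, sc.release):
        raw = raw.lstrip()  # whitespace after a terminator is only layout
        if not raw:
            continue
        tag, *elements = split_released(raw, sc.element, sc.release)
        yield tag, [[unrelease(c, sc.release)
                     for c in split_released(e, sc.component, sc.release)]
                    for e in elements]


def to_number(value: str, sc: ServiceChars = ServiceChars()) -> Decimal:
    """Convert a numeric value written with the declared decimal notation."""
    return Decimal(value.replace(sc.decimal, "."))
```

For the CED segment shown next, list(parse_segments("CED+2+:::Linux:1.2:13'")) gives [("CED", [["2"], ["", "", "", "Linux", "1.2", "13"]])]; the empty strings keep the later components at their trsd positions.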
An example segment and its definitions from trsd and uncl:
CED+2+:::Linux:1.2:13'

trsd
----------------------------------------------------------------------
CED  COMPUTER ENVIRONMENT DETAILS

Function: To give a precise definition of all necessary elements belonging to the configuration of a computer system like hardware, firmware, operating system, communication (VANS, network type, protocol, format) and application software.

010  1501  COMPUTER ENVIRONMENT DETAILS QUALIFIER      M  an..3
020  C079  COMPUTER ENVIRONMENT IDENTIFICATION         M
           1511  Computer environment, coded           C  an..3
           1131  Code list qualifier                   C  an..3
           3055  Code list responsible agency, coded   C  an..3
           1510  Computer environment                  C  an..35
           1056  Version                               C  an..9
           1058  Release                               C  an..9
           7402  Identity number                       C  an..35

uncl
----------------------------------------------------------------------
1501  Computer environment details qualifier

Desc: A code to identify the computer environment details.
Repr: an..3

1  Hardware platform
   Code to identify the type of hardware installed in a computer environment e.g. PC, Mac, UNIX-Workstation, Mini, Mainframe.
2  Operating system
   Code to identify the operating system, like DOS, VMS, etc. used in a computer environment.
3  Application software
   Code to identify an application software, like AutoCad, Winword, etc. used in a computer environment.
4  Network
   Code to identify a network like Ethernet, Token Ring, etc. implemented in a computer environment.
5  Sending system
   Code to identify the system, which acts as a sending system in an interchange.
----------------------------------------------------------------------
So this segment carries the "COMPUTER ENVIRONMENT DETAILS". Its second element (counting the tag as the first) is coded and translates to "Operating system". The third element is a composite whose values sit at the positions of "Computer environment", "Version" and "Release". Such a segment might be part of a message specifying the operating system on which something has to be installed.
As stated above, the simplified generic UN/EDIFACT is not organized in segment groups but is just a stream of segments. Therefore a batch of messages as defined in D422 is expressed as a single XML-Edifact message of the following form:
<?xml version="1.0"?>
<!DOCTYPE edifact:message SYSTEM "path_to/edifact_version.dtd">
<!-- Comments -->
<?edifact:processing instructions ?>
<edifact:message ... xmlns definitions >
  ... segments
</edifact:message>
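Processing instructions are not used yet but are intended for authentication and security. They will probably be used from XML::Edifact 0.5 on to instruct how to translate between namespaces using the RDF files.
Any string found in a United Nations document that has to be translated into an XML element tag is handled in the following way: it is converted to lower case, every run of characters other than the letters a to z is replaced by a single underscore, leading and trailing underscores are removed, and the remaining underscores are turned into dots. So "COMPUTER ENVIRONMENT DETAILS" becomes computer.environment.details.
The Perl code for this translation: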
sub recode_mark {
    my ($mark) = @_;
    my $M = lc($mark);
    $M =~ s/[^a-z]+/_/g;   # each run of non-letters becomes one underscore
    $M =~ s/_+$//;         # drop trailing underscores
    $M =~ s/^_+//;         # drop leading underscores
    $M =~ s/_+/./g;        # remaining underscores become dots
    return $M;
}
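For readers who want to reuse the naming rule outside Perl, a hypothetical Python port of recode_mark follows. It is a sketch, not part of XML::Edifact; it should behave like the Perl sub because after the first substitution no two underscores can be adjacent, so stripping and replacing single underscores is enough.

```python
import re


def recode_mark(mark: str) -> str:
    """Turn a UN/EDIFACT directory name into an XML element tag."""
    m = mark.lower()
    m = re.sub(r"[^a-z]+", "_", m)  # each run of non-letters -> one underscore
    m = m.strip("_")                # drop leading and trailing underscores
    return m.replace("_", ".")      # remaining underscores become dots
```

For example, recode_mark("Document/message name, coded") returns "document.message.name.coded", the tag used in the BGM translation of the example message below.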
Because names clash between the different UN/EDIFACT documents, a namespace prefix is allocated to each of them. The XML::Edifact Perl module uses the following namespaces and definitions:
xmlns:edifact='./edifact03.rdf'
xmlns:trsd='./edifact03_trsd.rdf'
xmlns:trcd='./edifact03_trcd.rdf'
xmlns:tred='./edifact03_tred.rdf'
xmlns:uncl='./edifact03_uncl.rdf'
xmlns:anxs='./edifact03_anxe.rdf'
xmlns:anxc='./edifact03_anxc.rdf'
xmlns:anxe='./edifact03_anxe.rdf'
xmlns:unsl='./edifact03_unsl.rdf'
xmlns:unknown='./edifact03_unknown.rdf'
A Resource Description Framework (RDF) file will be used to map to the original UN/EDIFACT documents. This RDF is still vaporware and has yet to be written. Meanwhile XML::Edifact uses the following internal mapping:
xmlns:edifact='./edifact03.rdf';
xmlns:unknown='./edifact03_unknown.rdf';
xmlns:trsd='./un_edifact_d96b/trsd.96b';
xmlns:trcd='./un_edifact_d96b/trcd.96b';
xmlns:tred='./un_edifact_d96b/tred.96b';
xmlns:uncl='./un_edifact_d96b/uncl-1.96b'; and also
xmlns:uncl='./un_edifact_d96b/uncl-2.96b';
xmlns:anxs='./un_edifact_d96b/D422_D.96B#annex_b/segments';
xmlns:anxc='./un_edifact_d96b/D422_D.96B#annex_b/composite';
xmlns:anxe='./un_edifact_d96b/D422_D.96B#annex_b/elements';
xmlns:unsl='./un_edifact_d96b/unsl.96b';
This mapping can hardly be expressed in XML namespace syntax, because several namespaces reference the same document and the uncl namespace is combined from two documents. So an RDF file will provide the glue around the UN/EDIFACT standard documents.
Any segment has a three-character TAG as its first element. The segments starting with UN are defined in Annex B of D422, while the remaining ones are defined in trsd.
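The rule for choosing the source document of a segment can be sketched as below. The function name segment_namespace is hypothetical; the prefixes and paths are taken from the internal mapping above, and splitting off an optional ":" part assumes a tag that carries the repeating and nesting values shown in the structure diagram.

```python
# Prefix -> source document, copied from the internal mapping above.
SOURCES = {
    "anxs": "./un_edifact_d96b/D422_D.96B#annex_b/segments",
    "trsd": "./un_edifact_d96b/trsd.96b",
}


def segment_namespace(tag: str) -> tuple[str, str]:
    """Return (prefix, source document) for a segment tag such as 'UNH' or 'CED'."""
    code = tag.split(":", 1)[0]  # ignore optional nesting/repeat values
    if len(code) != 3 or not code.isalnum():
        raise ValueError(f"not a segment tag: {tag!r}")
    prefix = "anxs" if code.startswith("UN") else "trsd"
    return prefix, SOURCES[prefix]
```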
CED COMPUTER ENVIRONMENT DETAILS
Our example segment starts with CED, so at the segment level it is translated by looking it up in un_edifact_d96b/trsd.96b:
<trsd:computer.environment.details> ... </trsd:computer.environment.details>
The trsd document also tells us the position of each element in the example segment. By looking at tred and uncl we can translate the second element, the "2":
<tred:computer.environment.details.qualifier uncl:code="1501:2"> Operating system </tred:computer.environment.details.qualifier>
Looking up a code list to translate an element is common practice and certainly the reason for the small size of UN/EDIFACT messages. But it is also the greatest mess, as code list extensions are quite common and seldom interoperable. Moreover, a German UN/EDIFACT may give "Betriebssystem" (German for "operating system") for the same code, so the attribute and its namespace matter more than the translated text. If a code cannot be translated using uncl, code list extensions may be tried, if available. This would produce elements like:
<editeur:item.characteristic.coded editeur:code="7081:170"> Date of Publication </editeur:item.characteristic.coded>
Because editeur:code is not defined in the document type definition for tred:item.characteristic.coded, item.characteristic.coded has to migrate to another namespace. The same happens to the enclosing composite, segment and message, so a single code list extension causes a namespace migration from edifact:message to editeur:message.
A code not in a code list should be referenced as unknown:code="....:...", causing the same namespace migration for the element, composite, segment and message, towards unknown:message. The beauty of this approach is that no information is lost in translation, even if a code list lookup is not possible.
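A minimal sketch of this lookup order and of the resulting namespace migration follows, assuming tiny in-memory excerpts of uncl and of an editeur extension list; the real tables would come from the UN/EDIFACT directory files. Two behaviors are assumptions of the sketch, not of XML::Edifact: an untranslatable code gets empty element text, and when several namespaces migrate, the message follows the first one. All function names are hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical excerpts: a few uncl entries and one extension list ("editeur").
UNCL = {
    ("1501", "2"): "Operating system",
    ("7143", "IB"): "ISBN (International Standard Book Number)",
}
EXTENSIONS = {
    "editeur": {("7081", "170"): "Date of Publication"},
}
STANDARD = ("tred", "trcd", "trsd", "anxe", "anxc", "anxs")


def lookup(element: str, value: str) -> tuple[str, str]:
    """Return (namespace, text): uncl first, then extensions, else 'unknown'."""
    if (element, value) in UNCL:
        return "uncl", UNCL[(element, value)]
    for namespace, table in EXTENSIONS.items():
        if (element, value) in table:
            return namespace, table[(element, value)]
    return "unknown", ""  # assumption: no text for untranslatable codes


def render_coded(tag: str, element: str, value: str) -> tuple[str, str]:
    """Render a coded element; return (xml, namespace the element ended up in)."""
    namespace, text = lookup(element, value)
    # uncl-coded elements stay in tred; anything else migrates with its attribute.
    elem_ns = "tred" if namespace == "uncl" else namespace
    attr = escape(f"{element}:{value}", {'"': "&quot;"})
    xml = f'<{elem_ns}:{tag} {namespace}:code="{attr}">{escape(text)}</{elem_ns}:{tag}>'
    return xml, elem_ns


def message_namespace(element_namespaces: list[str]) -> str:
    """edifact:message unless some element migrated; then follow the first one."""
    for ns in element_namespaces:
        if ns not in STANDARD:
            return ns  # assumption: the first migrated namespace wins
    return "edifact"
```

render_coded("item.characteristic.coded", "7081", "170") yields the editeur element shown above, and message_namespace then reports "editeur" for any message containing it.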
The third element in the example segment is a composite containing several simple values that do not require a code list lookup.
<trcd:computer.environment.identification> <tred:computer.environment>Linux</tred:computer.environment> <tred:version>1.2</tred:version> <tred:release>13</tred:release> </trcd:computer.environment.identification>
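The composite translation can be sketched as follows: the component names from the trsd entry for C079 are passed through recode_mark (precomputed here), and empty positions are skipped. Code list lookup for the coded components 1511, 1131 and 3055 is left out because those positions are empty in the example; C079 and render_composite are hypothetical names for this illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical excerpt of the trsd entry for composite C079; the component
# names are already passed through recode_mark.
C079 = ("computer.environment.identification", [
    ("1511", "computer.environment.coded"),
    ("1131", "code.list.qualifier"),
    ("3055", "code.list.responsible.agency.coded"),
    ("1510", "computer.environment"),
    ("1056", "version"),
    ("1058", "release"),
    ("7402", "identity.number"),
])


def render_composite(definition, components):
    """Emit a trcd element with one tred child per non-empty component."""
    name, slots = definition
    lines = [f"<trcd:{name}>"]
    # zip stops at the shorter list, so trailing components may be omitted.
    for (_element_id, tag), value in zip(slots, components):
        if value == "":
            continue  # empty positions only keep later components in place
        lines.append(f"  <tred:{tag}>{escape(value)}</tred:{tag}>")
    lines.append(f"</trcd:{name}>")
    return "\n".join(lines)


print(render_composite(C079, ["", "", "", "Linux", "1.2", "13"]))
```

Skipping empty positions rather than emitting empty elements keeps the output close to the hand-written fragment above.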
TeleOrdering UK is one of the main Value Added Networks for book order routing. The following example message was stripped down from a message they sent me to point out some problems in the XML::Edifact 0.30 code. The original message contained 27 orders and was enriched with IMD+F+BPU segments to specify the order routing of individual line items. I have deleted those IMD segments, as well as the remaining 26 messages, to save bandwidth and to have a clean UN/EDIFACT message as an example. I hope I patched anxs:message.trailer and anxs:functional.group.trailer correctly. I have also added newlines after each tick for readability.
UNA:+.? '
UNB+UNOA:1+7349734757:12+5033075000007:14+990621:2200+00000000000627++B-ORD++1++1'
UNG+ORDERS+7349734757:12+5033075000007:14+990621:2200+00000000000627+UN+D:93A'
UNH+00000000035773+ORDERS:D:93A:UN'
BGM+220'
DTM+4:990621:101'
FTX+DEL+3+DUY'
RFF+ON:70678989'
DTM+153:000000:101'
NAD+OB+0091987:160:16'
LIN+1'
PIA+5+0711213046:IB'
QTY+21:4'
PRI+AAA:4.99:SR:DPR::LBR'
LIN+2'
PIA+5+0711214476:IB'
QTY+21:6'
PRI+AAA:12.99:SR:DPR::LBR'
LIN+3'
PIA+5+0711213496:IB'
QTY+21:8'
PRI+AAA:4.99:SR:DPR::LBR'
UNS+S'
CNT+2:18'
UNT+22+00000000035773'
UNE+1+00000000000627'
UNZ+1+00000000000627'
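Since the trailers had to be patched by hand, a small consistency check is handy. This Python sketch counts the segments from UNH to UNT inclusive, the messages per functional group and the groups per interchange, and compares them with UNT, UNE and UNZ (UNZ counts functional groups when UNG/UNE are used, otherwise messages). It assumes the default service characters declared by the UNA segment of the message above; check_trailers and split_elements are hypothetical helpers, not part of XML::Edifact.

```python
import re

# One segment up to its unreleased terminator; "?x" consumes a released character.
SEGMENT = re.compile(r"(?:\?.|[^?'])*'", re.DOTALL)


def split_elements(segment: str) -> list[str]:
    """Split on unreleased '+' (default service characters assumed)."""
    parts, current, i = [], "", 0
    while i < len(segment):
        if segment[i] == "?" and i + 1 < len(segment):
            current += segment[i:i + 2]
            i += 2
        elif segment[i] == "+":
            parts.append(current)
            current = ""
            i += 1
        else:
            current += segment[i]
            i += 1
    parts.append(current)
    return parts


def check_trailers(interchange: str) -> list[str]:
    """Compare the counts in UNT, UNE and UNZ with what is actually there."""
    text = interchange.strip()
    if text.startswith("UNA"):
        text = text[9:]  # "UNA" plus its six service characters
    problems = []
    seg_count = None  # segments since the last UNH, UNH included
    in_group = total = groups = 0
    for match in SEGMENT.finditer(text):
        elements = split_elements(match.group(0)[:-1].lstrip())
        tag = elements[0]
        if tag == "UNH":
            seg_count = 0
        if seg_count is not None:
            seg_count += 1
        if tag == "UNT":
            if int(elements[1]) != seg_count:
                problems.append(f"UNT counts {elements[1]}, found {seg_count}")
            seg_count = None
            in_group += 1
            total += 1
        elif tag == "UNE":
            if int(elements[1]) != in_group:
                problems.append(f"UNE counts {elements[1]}, found {in_group}")
            in_group = 0
            groups += 1
        elif tag == "UNZ":
            expected = groups or total  # groups if UNG/UNE are used, else messages
            if int(elements[1]) != expected:
                problems.append(f"UNZ counts {elements[1]}, found {expected}")
    return problems
```

Run against the message above, it reports no problems: UNT+22 matches the 22 segments from UNH to UNT, and UNE+1 and UNZ+1 match the single message and the single functional group.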
The translation of this message follows. The message has grown from 614 bytes to 11790 bytes, but it has become far more readable and can now be processed or produced with XML tools.
<?xml version="1.0"?>
<!DOCTYPE edifact:message SYSTEM "./edifact03.dtd">
<!-- XML message produced by edi2xml.pl (c) Kraehe@Bakunin.North.De -->
<edifact:message
  xmlns:edifact='./edifact03.rdf'
  xmlns:trsd='./edifact03_trsd.rdf'
  xmlns:trcd='./edifact03_trcd.rdf'
  xmlns:tred='./edifact03_tred.rdf'
  xmlns:uncl='./edifact03_uncl.rdf'
  xmlns:anxs='./edifact03_anxe.rdf'
  xmlns:anxc='./edifact03_anxc.rdf'
  xmlns:anxe='./edifact03_anxe.rdf'
  xmlns:unsl='./edifact03_unsl.rdf' >
<!-- SEGMENT UNB+UNOA:1+7349734757:12+5033075000007:14+990621:2200+00000000000627++B-ORD++1++1 -->
<anxs:interchange.header>
  <anxc:syntax.identifier>
    <anxe:syntax.identifier unsl:code="0001:UNOA">UN/ECE level A</anxe:syntax.identifier>
    <anxe:syntax.version.number>1</anxe:syntax.version.number>
  </anxc:syntax.identifier>
  <anxc:interchange.sender>
    <anxe:sender.identification>7349734757</anxe:sender.identification>
    <anxe:recipients.identification.qualifer unsl:code="0007:12">Telephone number</anxe:recipients.identification.qualifer>
  </anxc:interchange.sender>
  <anxc:interchange.recipient>
    <anxe:recipient.identification>5033075000007</anxe:recipient.identification>
    <anxe:recipients.identification.qualifer unsl:code="0007:14">EAN (European Article Numbering Association)</anxe:recipients.identification.qualifer>
  </anxc:interchange.recipient>
  <anxc:date.time.of.preparation>
    <anxe:date>990621</anxe:date>
    <anxe:time>2200</anxe:time>
  </anxc:date.time.of.preparation>
  <anxe:interchange.control.reference>00000000000627</anxe:interchange.control.reference>
  <anxe:application.reference>B-ORD</anxe:application.reference>
  <anxe:acknowledgement.request unsl:code="0031:1">Requested</anxe:acknowledgement.request>
  <anxe:test.indicator unsl:code="0035:1">Interchange is a test</anxe:test.indicator>
</anxs:interchange.header>
<!-- SEGMENT UNG+ORDERS+7349734757:12+5033075000007:14+990621:2200+00000000000627+UN+D:93A -->
<anxs:functional.group.header>
  <anxe:functional.group.identification>ORDERS</anxe:functional.group.identification>
  <anxc:application.senders.identification>
    <anxe:application.senders.identification>7349734757</anxe:application.senders.identification>
    <anxe:recipients.identification.qualifer unsl:code="0007:12">Telephone number</anxe:recipients.identification.qualifer>
  </anxc:application.senders.identification>
  <anxc:application.recipients.identification>
    <anxe:recipients.identification>5033075000007</anxe:recipients.identification>
    <anxe:recipients.identification.qualifer unsl:code="0007:14">EAN (European Article Numbering Association)</anxe:recipients.identification.qualifer>
  </anxc:application.recipients.identification>
  <anxc:date.time.of.preparation>
    <anxe:date>990621</anxe:date>
    <anxe:time>2200</anxe:time>
  </anxc:date.time.of.preparation>
  <anxe:functional.group.reference.number>00000000000627</anxe:functional.group.reference.number>
  <anxe:controlling.agency unsl:code="0051:UN">UN/ECE/TRADE/WP.4</anxe:controlling.agency>
  <anxc:message.version>
    <anxe:message.version.number>D</anxe:message.version.number>
    <anxe:message.release.number>93A</anxe:message.release.number>
  </anxc:message.version>
</anxs:functional.group.header>
<!-- SEGMENT UNH+00000000035773+ORDERS:D:93A:UN -->
<anxs:message.header>
  <anxe:message.reference.number>00000000035773</anxe:message.reference.number>
  <anxc:message.identifier>
    <anxe:message.type unsl:code="0065:ORDERS">Purchase order message</anxe:message.type>
    <anxe:message.version.number>D</anxe:message.version.number>
    <anxe:message.release.number>93A</anxe:message.release.number>
    <anxe:controlling.agency unsl:code="0051:UN">UN/ECE/TRADE/WP.4</anxe:controlling.agency>
  </anxc:message.identifier>
</anxs:message.header>
<!-- SEGMENT BGM+220 -->
<trsd:beginning.of.message>
  <trcd:document.message.name>
    <tred:document.message.name.coded uncl:code="1001:220">Order</tred:document.message.name.coded>
  </trcd:document.message.name>
</trsd:beginning.of.message>
<!-- SEGMENT DTM+4:990621:101 -->
<trsd:date.time.period>
  <trcd:date.time.period>
    <tred:date.time.period.qualifier uncl:code="2005:4">Order date/time</tred:date.time.period.qualifier>
    <tred:date.time.period>990621</tred:date.time.period>
    <tred:date.time.period.format.qualifier uncl:code="2379:101">YYMMDD</tred:date.time.period.format.qualifier>
  </trcd:date.time.period>
</trsd:date.time.period>
<!-- SEGMENT FTX+DEL+3+DUY -->
<trsd:free.text>
  <tred:text.subject.qualifier uncl:code="4451:DEL">Delivery information</tred:text.subject.qualifier>
  <tred:text.function.coded uncl:code="4453:3">Text for immediate use</tred:text.function.coded>
  <trcd:text.reference>
    <tred:free.text.identification>DUY</tred:free.text.identification>
  </trcd:text.reference>
</trsd:free.text>
<!-- SEGMENT RFF+ON:70678989 -->
<trsd:reference>
  <trcd:reference>
    <tred:reference.qualifier uncl:code="1153:ON">Order number (purchase)</tred:reference.qualifier>
    <tred:reference.number>70678989</tred:reference.number>
  </trcd:reference>
</trsd:reference>
<!-- SEGMENT DTM+153:000000:101 -->
<trsd:date.time.period>
  <trcd:date.time.period>
    <tred:date.time.period.qualifier uncl:code="2005:153">Cancellation date/time, latest</tred:date.time.period.qualifier>
    <tred:date.time.period>000000</tred:date.time.period>
    <tred:date.time.period.format.qualifier uncl:code="2379:101">YYMMDD</tred:date.time.period.format.qualifier>
  </trcd:date.time.period>
</trsd:date.time.period>
<!-- SEGMENT NAD+OB+0091987:160:16 -->
<trsd:name.and.address>
  <tred:party.qualifier uncl:code="3035:OB">Ordered by</tred:party.qualifier>
  <trcd:party.identification.details>
    <tred:party.id.identification>0091987</tred:party.id.identification>
    <tred:code.list.qualifier uncl:code="1131:160">Party identification</tred:code.list.qualifier>
    <tred:code.list.responsible.agency.coded uncl:code="3055:16">DUNS (Dun &amp; Bradstreet)</tred:code.list.responsible.agency.coded>
  </trcd:party.identification.details>
</trsd:name.and.address>
<!-- SEGMENT LIN+1 -->
<trsd:line.item>
  <tred:line.item.number>1</tred:line.item.number>
</trsd:line.item>
<!-- SEGMENT PIA+5+0711213046:IB -->
<trsd:additional.product.id>
  <tred:product.id.function.qualifier uncl:code="4347:5">Product identification</tred:product.id.function.qualifier>
  <trcd:item.number.identification>
    <tred:item.number>0711213046</tred:item.number>
    <tred:item.number.type.coded uncl:code="7143:IB">ISBN (International Standard Book Number)</tred:item.number.type.coded>
  </trcd:item.number.identification>
</trsd:additional.product.id>
<!-- SEGMENT QTY+21:4 -->
<trsd:quantity>
  <trcd:quantity.details>
    <tred:quantity.qualifier uncl:code="6063:21">Ordered quantity</tred:quantity.qualifier>
    <tred:quantity>4</tred:quantity>
  </trcd:quantity.details>
</trsd:quantity>
<!-- SEGMENT PRI+AAA:4.99:SR:DPR::LBR -->
<trsd:price.details>
  <trcd:price.information>
    <tred:price.qualifier uncl:code="5125:AAA">Calculation net</tred:price.qualifier>
    <tred:price>4.99</tred:price>
    <tred:price.type.coded uncl:code="5375:SR">Suggested retail</tred:price.type.coded>
    <tred:price.type.qualifier uncl:code="5387:DPR">Discount price</tred:price.type.qualifier>
    <tred:measure.unit.qualifier>LBR</tred:measure.unit.qualifier>
  </trcd:price.information>
</trsd:price.details>
<!-- SEGMENT LIN+2 -->
<trsd:line.item>
  <tred:line.item.number>2</tred:line.item.number>
</trsd:line.item>
<!-- SEGMENT PIA+5+0711214476:IB -->
<trsd:additional.product.id>
  <tred:product.id.function.qualifier uncl:code="4347:5">Product identification</tred:product.id.function.qualifier>
  <trcd:item.number.identification>
    <tred:item.number>0711214476</tred:item.number>
    <tred:item.number.type.coded uncl:code="7143:IB">ISBN (International Standard Book Number)</tred:item.number.type.coded>
  </trcd:item.number.identification>
</trsd:additional.product.id>
<!-- SEGMENT QTY+21:6 -->
<trsd:quantity>
  <trcd:quantity.details>
    <tred:quantity.qualifier uncl:code="6063:21">Ordered quantity</tred:quantity.qualifier>
    <tred:quantity>6</tred:quantity>
  </trcd:quantity.details>
</trsd:quantity>
<!-- SEGMENT PRI+AAA:12.99:SR:DPR::LBR -->
<trsd:price.details>
  <trcd:price.information>
    <tred:price.qualifier uncl:code="5125:AAA">Calculation net</tred:price.qualifier>
    <tred:price>12.99</tred:price>
    <tred:price.type.coded uncl:code="5375:SR">Suggested retail</tred:price.type.coded>
    <tred:price.type.qualifier uncl:code="5387:DPR">Discount price</tred:price.type.qualifier>
    <tred:measure.unit.qualifier>LBR</tred:measure.unit.qualifier>
  </trcd:price.information>
</trsd:price.details>
<!-- SEGMENT LIN+3 -->
<trsd:line.item>
  <tred:line.item.number>3</tred:line.item.number>
</trsd:line.item>
<!-- SEGMENT PIA+5+0711213496:IB -->
<trsd:additional.product.id>
  <tred:product.id.function.qualifier uncl:code="4347:5">Product identification</tred:product.id.function.qualifier>
  <trcd:item.number.identification>
    <tred:item.number>0711213496</tred:item.number>
    <tred:item.number.type.coded uncl:code="7143:IB">ISBN (International Standard Book Number)</tred:item.number.type.coded>
  </trcd:item.number.identification>
</trsd:additional.product.id>
<!-- SEGMENT QTY+21:8 -->
<trsd:quantity>
  <trcd:quantity.details>
    <tred:quantity.qualifier uncl:code="6063:21">Ordered quantity</tred:quantity.qualifier>
    <tred:quantity>8</tred:quantity>
  </trcd:quantity.details>
</trsd:quantity>
<!-- SEGMENT PRI+AAA:4.99:SR:DPR::LBR -->
<trsd:price.details>
  <trcd:price.information>
    <tred:price.qualifier uncl:code="5125:AAA">Calculation net</tred:price.qualifier>
    <tred:price>4.99</tred:price>
    <tred:price.type.coded uncl:code="5375:SR">Suggested retail</tred:price.type.coded>
    <tred:price.type.qualifier uncl:code="5387:DPR">Discount price</tred:price.type.qualifier>
    <tred:measure.unit.qualifier>LBR</tred:measure.unit.qualifier>
  </trcd:price.information>
</trsd:price.details>
<!-- SEGMENT UNS+S -->
<anxs:section.control>
  <anxe:section.identification unsl:code="0081:S">Detail/summary section separation</anxe:section.identification>
</anxs:section.control>
<!-- SEGMENT CNT+2:18 -->
<trsd:control.total>
  <trcd:control>
    <tred:control.qualifier uncl:code="6069:2">Number of line items in message</tred:control.qualifier>
    <tred:control.value>18</tred:control.value>
  </trcd:control>
</trsd:control.total>
<!-- SEGMENT UNT+22+00000000035773 -->
<anxs:message.trailer>
  <anxe:number.of.segments.in.the.message>22</anxe:number.of.segments.in.the.message>
  <anxe:message.reference.number>00000000035773</anxe:message.reference.number>
</anxs:message.trailer>
<!-- 17056 lines deleted -->
<!-- SEGMENT UNE+1+00000000000627 -->
<anxs:functional.group.trailer>
  <anxe:number.of.messages>1</anxe:number.of.messages>
  <anxe:functional.group.reference.number>00000000000627</anxe:functional.group.reference.number>
</anxs:functional.group.trailer>
<!-- SEGMENT UNZ+1+00000000000627 -->
<anxs:interchange.trailer>
  <anxe:interchange.control.count>1</anxe:interchange.control.count>
  <anxe:interchange.control.reference>00000000000627</anxe:interchange.control.reference>
</anxs:interchange.trailer>
</edifact:message>
This note mostly describes code that is already implemented. UN/EDIFACT is a complex topic, and as I think pragmatically, I have chosen prototyping as the design method. The Perl 5 source code of this prototype is freely available under the GNU General Public License, so anybody can look at the code and use it for their own development.
The XML::Edifact module is not intended for running many large batches in a short time. To use XML-Edifact in a production environment, an optimized implementation in C would be necessary. This should be postponed until XML-Edifact has matured from a weekly changing prototype into a stable 1.0 version. Several steps remain to be done, so take a look at the road map in the XML::Edifact documentation. Other papers will address specific points.
I would like to invite anyone to participate in XML-Edifact. www.xml-edifact.org will collect those papers. Please feel free to contact me by e-mail.
Michael Koehne