Package org.apache.daffodil.api
Provides the classes necessary to compile DFDL schemas, parse and
unparse files using the compiled objects, and retrieve results and
parsing diagnostics.
Overview
The Daffodil object is a factory object to create a Compiler. The Compiler
provides a method to compile a provided DFDL schema into a ProcessorFactory,
which creates a DataProcessor:
Compiler c = Daffodil.compiler();
ProcessorFactory pf = c.compileFile(file);
DataProcessor dp = pf.onPath("/");
The DataProcessor provides the necessary functions to parse and
unparse data, returning a ParseResult or UnparseResult, respectively.
These contain information about the parse/unparse, such as whether the
processing succeeded, along with any diagnostic information.
The DataProcessor
also provides two functions that can be used to
perform parsing/unparsing via the SAX API. The first creates a
DaffodilParseXMLReader
which is used for parsing, and the
second creates a DaffodilUnparseContentHandler
which is used for
unparsing.
DaffodilParseXMLReader xmlReader = dp.newXMLReaderInstance();
DaffodilUnparseContentHandler unparseContentHandler = dp.newContentHandlerInstance(output);
The DaffodilParseXMLReader has several methods that allow one to
set properties and handlers (such as ContentHandlers or ErrorHandlers) for the reader. Any
content handler or error handler can be used, as long as it implements the
ContentHandler or ErrorHandler interface, respectively. Properties for the
DaffodilParseXMLReader are set using
DaffodilParseXMLReader.setProperty(java.lang.String, java.lang.Object).
The supported properties can be set as shown here.
The constants below have literal values starting with "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:sax:" and ending with "BlobDirectory", "BlobPrefix" and "BlobSuffix" respectively.
xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBDIRECTORY(),
Paths.get(System.getProperty("java.io.tmpdir"))); // value type: java.nio.file.Path
xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBPREFIX(), "daffodil-sax-"); // value type String
xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBSUFFIX(), ".bin"); // value type String
The properties can be retrieved using the same constants with
DaffodilParseXMLReader.getProperty(java.lang.String) and casting
the result to the appropriate type as listed above.
Handlers can be set as follows:
xmlReader.setContentHandler(contentHandler);
xmlReader.setErrorHandler(errorHandler);
The handlers above must implement the following interfaces respectively:
org.xml.sax.ContentHandler
org.xml.sax.ErrorHandler
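As a minimal sketch of what such handlers might look like, the class below extends the JDK's org.xml.sax.helpers.DefaultHandler, which implements both org.xml.sax.ContentHandler and org.xml.sax.ErrorHandler. The names HandlerSketch and RecordingHandler are hypothetical, and the JDK's own XMLReader is used here only as a stand-in so that the example runs without Daffodil on the classpath. An instance of such a handler could equally be passed to setContentHandler and setErrorHandler on a DaffodilParseXMLReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerSketch {

    /** Hypothetical handler that records element names and any reported problems. */
    public static class RecordingHandler extends DefaultHandler {
        public final List<String> elements = new ArrayList<>();
        public final List<String> problems = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(qName);
        }

        @Override
        public void warning(SAXParseException e) {
            problems.add("warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            problems.add("error: " + e.getMessage());
        }
    }

    /** Drives the handler with the JDK's XML reader (a stand-in, not a Daffodil reader). */
    public static RecordingHandler run(String xml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler); // same call as on DaffodilParseXMLReader
        reader.setErrorHandler(handler);   // DefaultHandler also implements ErrorHandler
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        RecordingHandler h = run("<root><a/><b/></root>");
        System.out.println(h.elements); // prints [root, a, b]
    }
}
```

Using one object for both roles keeps the recorded events and the reported problems together, but two separate classes work just as well.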
The ParseResult can be found as a property within the
DaffodilParseXMLReader using this URI:
"urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:sax:ParseResult" or
DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT.
In order for a successful unparse to happen, the SAX API requires the
unparse to be kicked off by a parse call to any XMLReader
implementation that
has the DaffodilUnparseContentHandler
registered as its content
handler. To retrieve the UnparseResult
, one can use
DaffodilUnparseContentHandler.getUnparseResult()
once the
XMLReader.parse run is complete.
Parse
DataProcessor Parse
The DataProcessor.parse(org.apache.daffodil.api.InputSourceDataInputStream, org.apache.daffodil.api.infoset.InfosetOutputter)
method accepts input data to parse in the form
of an InputSourceDataInputStream and an InfosetOutputter
to determine the output representation
of the infoset (e.g. Scala XML Nodes, JDOM2 Documents, etc.):
JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
InputSourceDataInputStream is = new InputSourceDataInputStream(data);
ParseResult pr = dp.parse(is, jdomOutputter);
Document doc = jdomOutputter.getResult();
The DataProcessor.parse(org.apache.daffodil.api.InputSourceDataInputStream, org.apache.daffodil.api.infoset.InfosetOutputter)
method is thread-safe and may be called multiple
times without the need to create other data processors. However,
InfosetOutputters are not thread-safe, requiring a
unique instance per thread. InfosetOutputter.reset() should be called
before an InfosetOutputter is reused (or a new one
should be allocated). For example:
JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
for (File f : inputFiles) {
jdomOutputter.reset();
InputSourceDataInputStream is = new InputSourceDataInputStream(new FileInputStream(f));
ParseResult pr = dp.parse(is, jdomOutputter);
Document doc = jdomOutputter.getResult();
}
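The "unique instance per thread, reset before reuse" rule can be sketched with a ThreadLocal. The example below is a pattern illustration only: Sink is a hypothetical stand-in for a non-thread-safe, resettable outputter and is not a Daffodil class, and the upper-casing step merely takes the place of a real parse so the sketch runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadOutputterSketch {

    /** Hypothetical stand-in for a non-thread-safe, resettable outputter. */
    public static class Sink {
        private final StringBuilder buffer = new StringBuilder();
        public void reset() { buffer.setLength(0); }
        public void append(String s) { buffer.append(s); }
        public String result() { return buffer.toString(); }
    }

    // One Sink per worker thread; an instance is never shared across threads.
    private static final ThreadLocal<Sink> SINK = ThreadLocal.withInitial(Sink::new);

    /** Processes one input on the calling thread, resetting that thread's Sink first. */
    public static String process(String input) {
        Sink sink = SINK.get();
        sink.reset();                      // clear state left by the previous use
        sink.append(input.toUpperCase());  // stand-in for the real parse call
        return sink.result();
    }

    /** Runs all inputs on a fixed pool and returns results in submission order. */
    public static List<String> processAll(List<String> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String in : inputs) {
                Callable<String> task = () -> process(in);
                futures.add(pool.submit(task));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) {
                out.add(f.get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real application the shared, thread-safe object would be the DataProcessor, while each worker thread would hold its own InfosetOutputter in the same way Sink is held here.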
One can repeat calls to parse() using the same InputSourceDataInputStream to continue parsing
where the previous parse ended. For example:
InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
boolean keepParsing = true;
while (keepParsing && is.hasData()) {
jdomOutputter.reset();
ParseResult pr = dp.parse(is, jdomOutputter);
...
keepParsing = !pr.isError();
}
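The idea of resuming on the same stream can be illustrated without Daffodil. The sketch below uses a plain java.io.DataInputStream and a made-up record format (a one-byte length followed by that many ASCII bytes). ContinuedParseSketch and readRecord are hypothetical names, and nothing here is the Daffodil API. The point is only that each call consumes one record and leaves the stream positioned at the start of the next.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ContinuedParseSketch {

    /** Reads one record: a 1-byte length followed by that many ASCII bytes. */
    public static String readRecord(DataInputStream in) throws IOException {
        int length = in.readUnsignedByte();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new String(payload, StandardCharsets.US_ASCII);
    }

    /** Repeatedly reads records from one stream until no data remains. */
    public static List<String> readAll(byte[] data) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Each call resumes exactly where the previous one stopped.
            while (in.available() > 0) {
                records.add(readRecord(in));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {2, 'h', 'i', 3, 'a', 'b', 'c'};
        System.out.println(readAll(data)); // prints [hi, abc]
    }
}
```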
SAX Parse
The DaffodilParseXMLReader.parse(org.apache.daffodil.api.InputSourceDataInputStream)
method accepts input data to parse in
the form of an InputSourceDataInputStream. The output
representation of the infoset, as well as how parse errors are handled, depends on the
content handler and the error handler provided to the DaffodilParseXMLReader.
For example, the SAXHandler provides a JDOM representation, whereas
other ContentHandlers may output directly to an OutputStream or Writer.
SAXHandler contentHandler = new SAXHandler();
xmlReader.setContentHandler(contentHandler);
InputSourceDataInputStream is = new InputSourceDataInputStream(data);
xmlReader.parse(is);
ParseResult pr = (ParseResult) xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT());
Document doc = contentHandler.getDocument();
The DaffodilParseXMLReader.parse(org.apache.daffodil.api.InputSourceDataInputStream)
method is not thread-safe and may
only be called again once the previous parse operation has completed. This can be done multiple
times without the need to create new DaffodilParseXMLReaders, ContentHandlers or ErrorHandlers.
It might be necessary to reset whatever ContentHandler is used (or allocate a new one). A
thread-safe implementation would require unique instances of the DaffodilParseXMLReader and its
components. For example:
SAXHandler contentHandler = new SAXHandler();
xmlReader.setContentHandler(contentHandler);
for (File f : inputFiles) {
contentHandler.reset();
InputSourceDataInputStream is = new InputSourceDataInputStream(new FileInputStream(f));
xmlReader.parse(is);
ParseResult pr = (ParseResult) xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT());
Document doc = contentHandler.getDocument();
}
The value of the supported features cannot be changed during a parse, and the parse will run
with the value of the features as they were when the parse was kicked off. To run a parse with
different feature values, one must wait until the running parse finishes, set the feature values
using the XMLReader's setFeature and run the parse again.
One can repeat calls to parse() using the same InputSourceDataInputStream to continue parsing
where the previous parse ended. For example:
InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
SAXHandler contentHandler = new SAXHandler();
xmlReader.setContentHandler(contentHandler);
boolean keepParsing = true;
while (keepParsing && is.hasData()) {
contentHandler.reset();
xmlReader.parse(is);
ParseResult pr = (ParseResult) xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT());
...
keepParsing = !pr.isError();
}
Unparse
DataProcessor Unparse
The same DataProcessor used for parse can be used to unparse an
infoset via the DataProcessor.unparse(org.apache.daffodil.api.infoset.InfosetInputter, java.nio.channels.WritableByteChannel)
method. An InfosetInputter
provides the infoset to unparse, with the unparsed data written to the
provided WritableByteChannel. For example:
JDOMInfosetInputter jdomInputter = new JDOMInfosetInputter(doc);
UnparseResult ur = dp.unparse(jdomInputter, wbc);
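The wbc argument above is an ordinary java.nio.channels.WritableByteChannel. One way to obtain such a channel, sketched here with JDK classes only, is to wrap an OutputStream with Channels.newChannel. The class name ChannelSketch and the manual write loop are illustrative assumptions. In real use the channel would be handed to unparse rather than written to directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class ChannelSketch {

    /** Writes the payload through a WritableByteChannel and returns what reached the stream. */
    public static byte[] writeThroughChannel(byte[] payload) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (WritableByteChannel wbc = Channels.newChannel(os)) {
            // Stand-in for the unparse call: push bytes into the channel by hand.
            ByteBuffer buf = ByteBuffer.wrap(payload);
            while (buf.hasRemaining()) {
                wbc.write(buf);
            }
        }
        // Everything written to the channel is now available from the backing stream.
        return os.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] out = writeThroughChannel("unparsed data".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.UTF_8)); // prints unparsed data
    }
}
```

Backing the channel with a ByteArrayOutputStream is convenient for tests and small outputs. For large outputs, a channel over a file is likely the better choice.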
SAX Unparse
In order to kick off an unparse via the SAX API, one must register the DaffodilUnparseContentHandler
as the ContentHandler for an
XMLReader implementation. The call to the
DataProcessor.newContentHandlerInstance(java.nio.channels.WritableByteChannel)
method must be provided with the WritableByteChannel to which the unparsed
data is to be written. Any XMLReader implementation is permissible, as long as it has
XML Namespace support.
ByteArrayInputStream is = new ByteArrayInputStream(data);
ByteArrayOutputStream os = new ByteArrayOutputStream();
WritableByteChannel wbc = java.nio.channels.Channels.newChannel(os);
DaffodilUnparseContentHandler unparseContentHandler = dp.newContentHandlerInstance(wbc);
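try {
XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
xmlReader.setContentHandler(unparseContentHandler);
// XMLReader.parse accepts an InputSource (or a system id), not a raw InputStream
xmlReader.parse(new InputSource(is));
} catch (DaffodilUnparseErrorSAXException | DaffodilUnhandledSAXException e) {
// Daffodil-specific exceptions are caught before the more general SAXException
...
} catch (ParserConfigurationException | SAXException | IOException e) {
...
}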
The call to the XMLReader.parse method must be wrapped in a try/catch, as
DaffodilUnparseContentHandler
relies on throwing an exception to
end processing in the case of any errors/failures.
There are two kinds of errors to expect:
DaffodilUnparseErrorSAXException, for the case when
WithDiagnostics.isError() is true, and
DaffodilUnhandledSAXException, for any other errors.
In the case of a DaffodilUnhandledSAXException,
DaffodilUnparseContentHandler.getUnparseResult()
will return null.
try {
xmlReader.parse(new InputSource(is));
} catch (DaffodilUnparseErrorSAXException | DaffodilUnhandledSAXException e) {
...
}
UnparseResult ur = unparseContentHandler.getUnparseResult();
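The mechanism described above, a content handler ending processing by throwing, is standard SAX behavior and can be sketched with the JDK alone. In the example below, AbortSAXException and AbortingHandler are hypothetical stand-ins for the Daffodil exception and handler types, not the real classes. The sketch also calls setNamespaceAware(true), since a JAXP SAXParserFactory is not namespace-aware by default and the text above requires XML Namespace support.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class AbortingHandlerSketch {

    /** Hypothetical exception type used to stop processing from inside a handler. */
    public static class AbortSAXException extends SAXException {
        public AbortSAXException(String message) { super(message); }
    }

    /** Hypothetical handler that aborts when it sees an element named "forbidden". */
    public static class AbortingHandler extends DefaultHandler {
        public int elementsSeen = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if ("forbidden".equals(localName)) {
                throw new AbortSAXException("unexpected element: " + qName);
            }
            elementsSeen++;
        }
    }

    /** Returns null on success, or the abort message if the handler stopped the parse. */
    public static String parse(String xml, AbortingHandler handler)
            throws ParserConfigurationException, IOException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // off by default in JAXP
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (AbortSAXException e) {
            // The handler's exception propagates out of XMLReader.parse.
            return e.getMessage();
        }
    }
}
```

Because the specific exception is a subclass of SAXException, it has to be caught before any broader SAXException handler, and state collected by the handler before the abort remains readable afterwards.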
Failures and Diagnostics
Failures can occur during the creation of the ProcessorFactory, DataProcessor,
ParseResult, or UnparseResult. However,
rather than throwing an exception on error (e.g. invalid DFDL schema, parse error, etc.), these classes extend
WithDiagnostics, which is used to determine whether an error occurred
and to retrieve any diagnostic information (see Diagnostic) related to the step.
Thus, before continuing, one must check WithDiagnostics.isError().
For example:
ProcessorFactory pf = c.compileFile(file);
if (pf.isError()) {
java.util.List<Diagnostic> diags = pf.getDiagnostics();
for (Diagnostic d : diags) {
System.out.println(d.toString());
}
return -1;
}
Saving and Reloading Parsers
In some cases, it may be beneficial to save a parser and reload it. For example, when starting up, it may be quicker to reload an already compiled parser than to compile it from scratch. To save a DataProcessor:
DataProcessor dp = pf.onPath("/");
dp.save(saveFile);
And to restore a saved DataProcessor
:
DataProcessor dp = Daffodil.compiler().reload(saveFile);
The reloaded DataProcessor can then be used for parsing:
ParseResult pr = dp.parse(is, jdomOutputter);
or
DaffodilParseXMLReader xmlReader = dp.newXMLReaderInstance();
... // setting appropriate handlers
xmlReader.parse(is);
ParseResult pr = (ParseResult) xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT());
Class descriptions
Source code generation and compilation is performed with a language-specific CodeGenerator, which must be interrogated for diagnostics to see if each call was successful or not.
Compile DFDL schemas into ProcessorFactorys or reload saved parsers into DataProcessors.
Factory object to create a Compiler.
These are the events that a derived specific InfosetInputter creates.
SAX method of parsing schema and getting the DFDL infoset via a designated org.xml.sax.ContentHandler, based on the org.xml.sax.XMLReader interface.
Accepts SAX callback events from any SAX XMLReader for unparsing.
Returns the EntityResolver used by Daffodil to resolve import/include schemaLocations.
Relevant data location for a diagnostic message.
Compiled version of a DFDL schema, used to parse data and get the DFDL infoset.
An enumeration of all of DFDL's simple types.
Class containing diagnostic information.
Indicates that this API is experimental and may change or be removed in the future.
Provides Daffodil with byte data from an InputStream, ByteBuffer, or byte array.
Relevant schema location for a diagnostic message.
Result of calling DataProcessor.parse, containing any diagnostic information and the final data location.
Factory to create DataProcessors, used for parsing data.
Interface for parse and unparse results.
Result of calling DataProcessor.unparse(InfosetInputter, java.nio.channels.WritableByteChannel), containing diagnostic information.
Interface that adds diagnostic information to classes that extend it.