Package org.apache.daffodil.japi

Provides the classes necessary to compile DFDL schemas, parse and unparse files using the compiled objects, and retrieve results and parsing diagnostics

Overview

The Daffodil object is a factory object to create a Compiler. The Compiler provides a method to compile a provided DFDL schema into a ProcessorFactory, which creates a DataProcessor:
 
 Compiler c = Daffodil.compiler();
 ProcessorFactory pf = c.compileFile(file);
 DataProcessor dp = pf.onPath("/");
 
The DataProcessor provides the necessary functions to parse and unparse data, returning a ParseResult or UnparseResult, respectively. These contain information about the parse/unparse, such as whether or not the processing succeeded, along with any diagnostic information. The DataProcessor also provides two functions that can be used to perform parsing/unparsing via the SAX API. The first creates a DaffodilParseXMLReader, which is used for parsing, and the second creates a DaffodilUnparseContentHandler, which is used for unparsing.
 
 DaffodilParseXMLReader xmlReader = dp.newXMLReaderInstance();
 DaffodilUnparseContentHandler unparseContentHandler = dp.newContentHandlerInstance(output);
 
The DaffodilParseXMLReader has several methods that allow one to set properties and handlers (such as ContentHandlers or ErrorHandlers) for the reader. One can use any content handler or error handler, as long as it implements the ContentHandler or ErrorHandler interface, respectively. One can also set properties for the DaffodilParseXMLReader using DaffodilParseXMLReader.setProperty(java.lang.String, java.lang.Object). The following properties can be set:

The constants below have literal values starting with "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:sax:" and ending with "BlobDirectory", "BlobPrefix" and "BlobSuffix" respectively.

 
 xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBDIRECTORY(),
  Paths.get(System.getProperty("java.io.tmpdir"))); // value type: java.nio.file.Path
 xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBPREFIX(), "daffodil-sax-"); // value type String
 xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBSUFFIX(), ".bin"); // value type String
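
As a rough illustration of the property names and value types used above, the following sketch builds the three blob URNs from their shared prefix and stores example values in a plain Map. The class name BlobPropertySketch, the helpers urn() and exampleValues(), and the Map itself are assumptions made for illustration only; the Map merely stands in for the reader's property store and is not how Daffodil implements it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of the Daffodil API.
class BlobPropertySketch {
  // Shared prefix of the SAX property URNs, as quoted in the text above.
  static final String SAX_URN = "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:sax:";

  // Builds a full property URN from its final component, e.g. "BlobPrefix".
  static String urn(String name) {
    return SAX_URN + name;
  }

  // Stores the same example values as the setProperty calls above.
  static Map<String, Object> exampleValues() {
    Map<String, Object> props = new HashMap<>();
    props.put(urn("BlobDirectory"), Paths.get(System.getProperty("java.io.tmpdir")));
    props.put(urn("BlobPrefix"), "daffodil-sax-");
    props.put(urn("BlobSuffix"), ".bin");
    return props;
  }

  public static void main(String[] args) {
    Map<String, Object> props = exampleValues();
    // Values come back as Object, so each one is cast to its documented type.
    Path dir = (Path) props.get(urn("BlobDirectory"));
    String prefix = (String) props.get(urn("BlobPrefix"));
    String suffix = (String) props.get(urn("BlobSuffix"));
    System.out.println(dir + " " + prefix + " " + suffix);
  }
}
```

The directory value is a java.nio.file.Path because Paths.get returns a Path; the other two values are plain Strings.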
 
 
The properties can be retrieved using the same constants with DaffodilParseXMLReader.getProperty(java.lang.String) and casting to the appropriate type as listed above. The handlers can be set as follows:
 
 xmlReader.setContentHandler(contentHandler);
 xmlReader.setErrorHandler(errorHandler);
 
 
The handlers above must implement the following interfaces respectively:
 
 org.xml.sax.ContentHandler
 org.xml.sax.ErrorHandler
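
To make the two interfaces concrete, here is a minimal sketch of a content handler and an error handler, exercised with the JDK's own SAX parser rather than a DaffodilParseXMLReader. The class names ElementNameCollector, CollectingErrorHandler and HandlerSketch are hypothetical and chosen for illustration; only the org.xml.sax and javax.xml.parsers types are real.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical content handler: records the local name of every element.
// DefaultHandler already implements org.xml.sax.ContentHandler.
class ElementNameCollector extends DefaultHandler {
  final List<String> names = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    names.add(localName);
  }
}

// Hypothetical error handler: collects recoverable problems, rethrows fatal ones.
class CollectingErrorHandler implements ErrorHandler {
  final List<String> messages = new ArrayList<>();

  @Override
  public void warning(SAXParseException e) {
    messages.add("warning: " + e.getMessage());
  }

  @Override
  public void error(SAXParseException e) {
    messages.add("error: " + e.getMessage());
  }

  @Override
  public void fatalError(SAXParseException e) throws SAXParseException {
    throw e;
  }
}

class HandlerSketch {
  // Parses the XML text with a JDK XMLReader and returns the element names seen.
  static List<String> collect(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // needed for localName to be populated
    XMLReader reader = factory.newSAXParser().getXMLReader();
    ElementNameCollector contentHandler = new ElementNameCollector();
    reader.setContentHandler(contentHandler);
    reader.setErrorHandler(new CollectingErrorHandler());
    reader.parse(new InputSource(new StringReader(xml)));
    return contentHandler.names;
  }
}
```

Handlers written this way are registered through the same setContentHandler and setErrorHandler calls shown earlier.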
 
 
The ParseResult can be retrieved as a property of the DaffodilParseXMLReader using this URI: "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:sax:ParseResult" or DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT(). For an unparse to succeed, the SAX API requires it to be kicked off by a parse call on any XMLReader implementation that has the DaffodilUnparseContentHandler registered as its content handler. To retrieve the UnparseResult, one can use DaffodilUnparseContentHandler.getUnparseResult() once the XMLReader.parse run is complete.

Parse

DataProcessor Parse

The DataProcessor.parse(org.apache.daffodil.japi.io.InputSourceDataInputStream, org.apache.daffodil.japi.infoset.InfosetOutputter) method accepts input data to parse in the form of an InputSourceDataInputStream and an InfosetOutputter to determine the output representation of the infoset (e.g. Scala XML Nodes, JDOM2 Documents, etc.):
 
 JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
 InputSourceDataInputStream is = new InputSourceDataInputStream(data);
 ParseResult pr = dp.parse(is, jdomOutputter);
 Document doc = jdomOutputter.getResult();
 
The DataProcessor.parse(org.apache.daffodil.japi.io.InputSourceDataInputStream, org.apache.daffodil.japi.infoset.InfosetOutputter) method is thread-safe and may be called multiple times without the need to create other data processors. However, InfosetOutputters are not thread-safe, so each thread requires its own instance. Before an InfosetOutputter is reused, InfosetOutputter.reset() should be called on it (or a new one should be allocated). For example:
 
 JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
 for (File f : inputFiles) {
   jdomOutputter.reset();
   InputSourceDataInputStream is = new InputSourceDataInputStream(new FileInputStream(f));
   ParseResult pr = dp.parse(is, jdomOutputter);
   Document doc = jdomOutputter.getResult();
 }
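
The "one instance per thread, reset before reuse" rule can be sketched with JDK classes alone. In the example below, Collector is a hypothetical stand-in for a resettable object that is not thread-safe (it is not an InfosetOutputter), and process() stands in for a thread-safe operation that writes into it; both names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadOutputterSketch {
  // Hypothetical stand-in for a resettable object that is not thread-safe.
  static final class Collector {
    private final StringBuilder buffer = new StringBuilder();

    void reset() {
      buffer.setLength(0);
    }

    void write(String s) {
      buffer.append(s);
    }

    String getResult() {
      return buffer.toString();
    }
  }

  // Each worker thread lazily gets its own Collector instance.
  private static final ThreadLocal<Collector> COLLECTOR =
      ThreadLocal.withInitial(Collector::new);

  // Stand-in for one unit of work; resets the thread's Collector before reuse.
  static String process(String input) {
    Collector collector = COLLECTOR.get();
    collector.reset();
    collector.write(input.toUpperCase());
    return collector.getResult();
  }

  // Runs process() for every input on a fixed pool and returns results in input order.
  static List<String> processAll(List<String> inputs, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String input : inputs) {
        Callable<String> task = () -> process(input);
        futures.add(pool.submit(task));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

Without the reset() call, a Collector reused by the same thread would carry over the previous result, which mirrors why an outputter needs resetting between parses.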
 
One can repeat calls to parse() using the same InputSourceDataInputStream to continue parsing where the previous parse ended. For example:
 
 InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
 JDOMInfosetOutputter jdomOutputter = new JDOMInfosetOutputter();
 boolean keepParsing = true;
 while (keepParsing && is.hasData()) {
   jdomOutputter.reset();
   ParseResult pr = dp.parse(is, jdomOutputter);
   ...
   keepParsing = !pr.isError();
 }
 

SAX Parse

The DaffodilParseXMLReader.parse(org.apache.daffodil.japi.io.InputSourceDataInputStream) method accepts input data to parse in the form of an InputSourceDataInputStream. The output representation of the infoset, as well as how parse errors are handled, depends on the content handler and the error handler provided to the DaffodilParseXMLReader. For example, the SAXHandler provides a JDOM representation, whereas other ContentHandlers may output directly to an OutputStream or Writer.
 
 SAXHandler contentHandler = new SAXHandler();
 xmlReader.setContentHandler(contentHandler);
 InputSourceDataInputStream is = new InputSourceDataInputStream(data);
 xmlReader.parse(is);
 ParseResult pr = (ParseResult) xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT());
 Document doc = contentHandler.getDocument();
 
The DaffodilParseXMLReader.parse(org.apache.daffodil.japi.io.InputSourceDataInputStream) method is not thread-safe and may only be called again once the current parse operation has completed. The reader can be reused in this way multiple times without the need to create new DaffodilParseXMLReaders, ContentHandlers or ErrorHandlers. It might be necessary to reset whatever ContentHandler is used (or allocate a new one). A thread-safe implementation would require unique instances of the DaffodilParseXMLReader and its components. For example:
 
 SAXHandler contentHandler = new SAXHandler();
 xmlReader.setContentHandler(contentHandler);
 for (File f : inputFiles) {
   contentHandler.reset();
   InputSourceDataInputStream is = new InputSourceDataInputStream(new FileInputStream(f));
   xmlReader.parse(is);
   ParseResult pr = (ParseResult) xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT());
   Document doc = contentHandler.getDocument();
 }
 
 
The values of the supported features cannot be changed during a parse, and the parse will run with the feature values as they were when the parse was kicked off. To run a parse with different feature values, one must wait until the running parse finishes, set the feature values using the XMLReader's setFeature method, and run the parse again. One can repeat calls to parse() using the same InputSourceDataInputStream to continue parsing where the previous parse ended. For example:
 
 InputSourceDataInputStream is = new InputSourceDataInputStream(dataStream);
 SAXHandler contentHandler = new SAXHandler();
 xmlReader.setContentHandler(contentHandler);
 boolean keepParsing = true;
 while (keepParsing && is.hasData()) {
   contentHandler.reset();
   xmlReader.parse(is);
   ParseResult pr = (ParseResult) xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT());
   ...
   keepParsing = !pr.isError();
 }
 
 

Unparse

DataProcessor Unparse

The same DataProcessor used for parse can be used to unparse an infoset via the DataProcessor.unparse(org.apache.daffodil.japi.infoset.InfosetInputter, java.nio.channels.WritableByteChannel) method. An InfosetInputter provides the infoset to unparse, with the unparsed data written to the provided WritableByteChannel. For example:
 
 JDOMInfosetInputter jdomInputter = new JDOMInfosetInputter(doc);
 UnparseResult ur = dp.unparse(jdomInputter, wbc);
 

SAX Unparse

In order to kick off an unparse via the SAX API, one must register the DaffodilUnparseContentHandler as the contentHandler for an XMLReader implementation. The call to the DataProcessor.newContentHandlerInstance(java.nio.channels.WritableByteChannel) method must be provided with the WritableByteChannel to which the unparsed data should be written. Any XMLReader implementation is permissible, as long as it has XML Namespace support.
 
  ByteArrayInputStream is = new ByteArrayInputStream(data);
  ByteArrayOutputStream os = new ByteArrayOutputStream();
  WritableByteChannel wbc = java.nio.channels.Channels.newChannel(os);
  DaffodilUnparseContentHandler unparseContentHandler = dp.newContentHandlerInstance(wbc);
  try {
   XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
   xmlReader.setContentHandler(unparseContentHandler);
   // XMLReader.parse accepts an InputSource (or a system ID), not a raw InputStream
   xmlReader.parse(new InputSource(is));
  } catch (DaffodilUnparseErrorSAXException | DaffodilUnhandledSAXException e) {
   // Daffodil-specific failures; listed first so the broader clause below cannot shadow them
   ...
  } catch (ParserConfigurationException | SAXException | IOException e) {
   ...
  }
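
Two pieces of the setup above can be tried with the JDK alone: obtaining an XMLReader with namespace support, and wrapping an OutputStream in a WritableByteChannel. The sketch below assumes the JDK's default SAXParserFactory, which is not namespace-aware unless asked; the class name UnparseSetupSketch and its two helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class UnparseSetupSketch {
  // Creates an XMLReader from a factory that has namespace support switched on.
  static XMLReader namespaceAwareReader()
      throws ParserConfigurationException, SAXException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // the factory default is false
    return factory.newSAXParser().getXMLReader();
  }

  // Writes the payload through a channel and returns what reached the stream.
  static byte[] writeThroughChannel(byte[] payload) throws IOException {
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    try (WritableByteChannel wbc = Channels.newChannel(os)) {
      ByteBuffer buffer = ByteBuffer.wrap(payload);
      while (buffer.hasRemaining()) {
        wbc.write(buffer);
      }
    }
    return os.toByteArray();
  }
}
```

Whether a given reader has namespace support can be checked with getFeature("http://xml.org/sax/features/namespaces"). Bytes written to the channel end up in the underlying ByteArrayOutputStream, which is how the unparsed data is collected in the example above.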
 
 
The call to the XMLReader.parse method must be wrapped in a try/catch, as DaffodilUnparseContentHandler relies on throwing an exception to end processing in the case of any errors/failures. There are two kinds of errors to expect: DaffodilUnparseErrorSAXException, for the case when WithDiagnostics.isError() is true, and DaffodilUnhandledSAXException, for any other errors. In the case of a DaffodilUnhandledSAXException, DaffodilUnparseContentHandler.getUnparseResult() will return null.
 
  try {
    xmlReader.parse(new InputSource(is));
  } catch (DaffodilUnparseErrorSAXException | DaffodilUnhandledSAXException e) {
    ...
  }
  UnparseResult ur = unparseContentHandler.getUnparseResult();
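
The general pattern of a content handler ending processing by throwing can be sketched with the JDK's SAX parser. AbortSAXException and AbortOnElement below are hypothetical names, used here only as an analogy for a handler-specific exception; they are not Daffodil classes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class AbortingHandlerSketch {
  // Hypothetical handler-specific exception type.
  static final class AbortSAXException extends SAXException {
    AbortSAXException(String message) {
      super(message);
    }
  }

  // Hypothetical handler that stops the parse when it sees a given element.
  static final class AbortOnElement extends DefaultHandler {
    private final String forbidden;

    AbortOnElement(String forbidden) {
      this.forbidden = forbidden;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
        throws SAXException {
      if (forbidden.equals(localName)) {
        throw new AbortSAXException("saw <" + localName + ">");
      }
    }
  }

  // Runs a parse and reports how it ended.
  static String run(String xml, String forbidden) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      XMLReader reader = factory.newSAXParser().getXMLReader();
      reader.setContentHandler(new AbortOnElement(forbidden));
      reader.parse(new InputSource(new StringReader(xml)));
      return "completed";
    } catch (AbortSAXException e) {
      // The more specific type must be caught before SAXException.
      return "aborted: " + e.getMessage();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      return "failed: " + e.getMessage();
    }
  }
}
```

Because the handler's exception is a SAXException subclass in this sketch, its catch clause has to precede the general one; otherwise the code would not compile.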
 
 

Failures and Diagnostics

It is possible that failures could occur during the creation of the ProcessorFactory, DataProcessor, or ParseResult. However, rather than throwing an exception on error (e.g. invalid DFDL schema, parse error, etc.), these classes extend WithDiagnostics, which is used to determine whether an error occurred and to retrieve any diagnostic information (see Diagnostic) related to the step. Thus, before continuing, one must check WithDiagnostics.isError(). For example:
 
 ProcessorFactory pf = c.compileFile(file);
 if (pf.isError()) {
   java.util.List<Diagnostic> diags = pf.getDiagnostics();
   for (Diagnostic d : diags) {
     System.out.println(d.toString());
   }
   return -1;
 }
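
Since the same check applies after every step, it can be factored into a small helper. The sketch below uses a hypothetical Diagnosable interface and Step class as stand-ins for result types that report errors through diagnostics; they only mimic the isError()/getDiagnostics() shape described above and are not the Daffodil types.

```java
import java.util.List;

class DiagnosticsSketch {
  // Hypothetical stand-in for a result type that reports errors via diagnostics.
  interface Diagnosable {
    boolean isError();

    List<String> getDiagnostics();
  }

  // Hypothetical result of one processing step.
  static final class Step implements Diagnosable {
    private final boolean error;
    private final List<String> diagnostics;

    Step(boolean error, List<String> diagnostics) {
      this.error = error;
      this.diagnostics = diagnostics;
    }

    @Override
    public boolean isError() {
      return error;
    }

    @Override
    public List<String> getDiagnostics() {
      return diagnostics;
    }
  }

  // Prints every diagnostic and returns true only if the step did not fail.
  static boolean reportAndCheck(String stepName, Diagnosable result) {
    for (String diagnostic : result.getDiagnostics()) {
      System.err.println(stepName + ": " + diagnostic);
    }
    return !result.isError();
  }

  public static void main(String[] args) {
    Step compiled = new Step(false, List.of());
    Step parsed = new Step(true, List.of("unexpected end of data"));
    System.out.println(reportAndCheck("compile", compiled));
    System.out.println(reportAndCheck("parse", parsed));
  }
}
```

A caller would stop the pipeline as soon as reportAndCheck returns false, in the same way the example above returns early when pf.isError() is true.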
 

Saving and Reloading Parsers

In some cases, it may be beneficial to save a parser and reload it. For example, when starting up, it may be quicker to reload an already compiled parser than to compile it from scratch. To save a DataProcessor:
 
 DataProcessor dp = pf.onPath("/");
 dp.save(saveFile);
 
And to restore a saved DataProcessor:
 
 DataProcessor dp = Daffodil.reload(saveFile);
 
The reloaded DataProcessor is then used in the same way as a freshly compiled one:
 
 ParseResult pr = dp.parse(is, jdomOutputter);
 
or
 
 DaffodilParseXMLReader xmlReader = dp.newXMLReaderInstance();
 ... // setting appropriate handlers
 xmlReader.parse(data);
 ParseResult pr = (ParseResult) xmlReader.getProperty("...ParseResult");
 