
Thursday, July 4, 2013

Sending large attachments via SOAP and MTOM in Java

     Sometimes you need to pass a large chunk of unstructured (possibly even binary) data via the SOAP protocol; for instance, you may wish to attach a file to a message. The default way to do this is to pass the data in an XML element of type base64Binary. Effectively, this means your data is Base64-encoded and passed inside the message body. Not only does the data grow by about a third, but any client or server that sends or receives such a message also has to parse it in its entirety, which can be time- and memory-consuming for large volumes of data.
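
     To put a number on that overhead: Base64 maps every 3 input bytes to 4 output characters. The following is a minimal sketch that measures it with the standard java.util.Base64 encoder (Java 8+); the class and method names are made up for this illustration and are not part of the sample application.

```java
import java.util.Base64;

// Hypothetical helper, not part of the sample application.
final class Base64Overhead {

    /** Returns the size in bytes of the Base64 encoding of a payload of the given size. */
    static int encodedSize(int rawSize) {
        byte[] raw = new byte[rawSize]; // the content does not matter, only the length
        return Base64.getEncoder().encode(raw).length;
    }

    public static void main(String[] args) {
        int rawSize = 3_000_000;
        int encoded = encodedSize(rawSize);
        double overhead = 100.0 * (encoded - rawSize) / rawSize;
        System.out.printf("raw=%d encoded=%d overhead=%.1f%%%n", rawSize, encoded, overhead);
    }
}
```

     For a 3,000,000-byte payload the encoded form takes 4,000,000 bytes, and that is before the XML parser has to hold it in memory as character data.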

     To solve this problem, the MTOM standard was defined. Basically, it allows you to pass the content of a base64Binary block outside of the SOAP message, leaving a simple reference element in its place. As for the corresponding HTTP binding, the message is transferred in the style of SOAP with Attachments, with a multipart/related content type. I won't go into the details here; you can learn it all straight from the above-mentioned standards and RFCs.

     The tricky part is that, although we've gotten rid of the Base64 size overhead by passing the data outside of the message, the standards themselves don't specify how client and server implementations should process the messages: whether the messages should be read completely into memory along with all their attachments while sending and receiving, or offloaded to external storage. By default, the implementations (including Java's SAAJ) usually read the attachments completely into memory, which makes it possible to run out of memory on large files or heavily loaded systems. In Java, this usually shows up as a "java.lang.OutOfMemoryError: Java heap space" error.

     In this post I will demonstrate a simple client-server application that can transfer SOAP attachments of arbitrary volume with disk offloading, using Apache CXF on the client and Oracle's SAAJ implementation (a part of JDK 6+) on the server. This will require some tuning for the mentioned frameworks. The complete code of the application is available on GitHub.

     First, we will place the common files (XSD and WSDL) in a separate project, as they will be used by both client and server. The WSDL of the service is relatively straightforward: we have a port with a single operation that consists of a SampleRequest request and a SampleResponse response from the server. The file is transferred in the request to the server. The XSD schema of the request and response is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<s:schema elementFormDefault="qualified"
          targetNamespace="http://forketyfork.ru/mtomsoap/schema"
          xmlns:s="http://www.w3.org/2001/XMLSchema"
          xmlns:xmime="http://www.w3.org/2005/05/xmlmime">

    <s:element name="SampleRequest">
        <s:annotation>
            <s:documentation>Service request</s:documentation>
        </s:annotation>
        <s:complexType>
            <s:sequence>
                <s:element name="text" type="s:string" />
                <s:element name="file" type="s:base64Binary" xmime:expectedContentTypes="*/*" />
            </s:sequence>
        </s:complexType>
    </s:element>

    <s:element name="SampleResponse">
        <s:annotation>
            <s:documentation>Service response</s:documentation>
        </s:annotation>
        <s:complexType>
            <s:attribute name="text" type="s:string" />
        </s:complexType>
    </s:element>

</s:schema>
     Take note of the xmime namespace declaration and the use of the xmime:expectedContentTypes="*/*" attribute on the binary data element. This lets us generate the correct JAXB code from the schema: by default, a base64Binary element corresponds to a byte[] field in the JAXB-mapped class, but as we'll see, the expectedContentTypes attribute alters the generated class:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "text",
    "file"
})
@XmlRootElement(name = "SampleRequest")
public class SampleRequest {

    @XmlElement(required = true)
    protected String text;
    @XmlElement(required = true)
    @XmlMimeType("*/*")
    protected DataHandler file;

    ...
     Note that the file field is of type DataHandler, which allows for streaming processing of the data.

     We shall generate the JAXB classes for both client and server, and a service class for the client, using Apache CXF cxf-codegen-plugin for Maven during build-time. The configuration is as follows:

<plugin>
    <groupId>org.apache.cxf</groupId>
    <artifactId>cxf-codegen-plugin</artifactId>
    <version>${cxf.version}</version>
    <executions>
        <execution>
            <id>generate-sources</id>
            <phase>generate-sources</phase>
            <configuration>
                <sourceRoot>${project.build.directory}/generated-sources/cxf</sourceRoot>
                <wsdlOptions>
                    <wsdlOption>
                        <wsdl>${basedir}/src/main/resources/service.wsdl</wsdl>
                        <wsdlLocation>classpath:service.wsdl</wsdlLocation>
                    </wsdlOption>
                </wsdlOptions>
            </configuration>
            <goals>
                <goal>wsdl2java</goal>
            </goals>
        </execution>
    </executions>
</plugin>
     In this Maven plugin configuration we explicitly specify the wsdlLocation property, which will be included in the generated service class. Without it, the generated path to the WSDL file will be a local path on the developer's machine, which we obviously don't want.

     The client (module mtom-soap-client) is quite simple, as it is based on Apache CXF and the generated SampleService class. Here we only enable MTOM for the underlying SOAP binding and specify infinite timeouts, as the transfer of large files may take time:


        // Creating a CXF-generated service
        Sample sampleClient = new SampleService().getSampleSoap12();

        // Setting infinite HTTP timeouts
        HTTPClientPolicy httpClientPolicy = new HTTPClientPolicy();
        httpClientPolicy.setConnectionTimeout(0);
        httpClientPolicy.setReceiveTimeout(0);
        HTTPConduit httpConduit = (HTTPConduit) ClientProxy.getClient(sampleClient).getConduit();
        httpConduit.setClient(httpClientPolicy);

        // Enabling MTOM for the SOAP binding provider
        BindingProvider bindingProvider = (BindingProvider) sampleClient;
        SOAPBinding binding = (SOAPBinding) bindingProvider.getBinding();
        binding.setMTOMEnabled(true);

        // Creating request object
        SampleRequest request = new SampleRequest();
        request.setText("Hello");
        request.setFile(new DataHandler(new FileDataSource(args[0])));

        // Sending request
        SampleResponse response = sampleClient.sample(request);

        System.out.println(String.format("Server responded: \"%s\"", response.getText()));

     The server is based on the Spring WS framework. However, we won't use the typical default <sws:annotation-driven /> configuration here. Instead, we specify a custom DefaultMethodEndpointAdapter configuration, because we need Spring WS to use our custom-configured jaxb2Marshaller bean:

<!-- The service bean -->
<bean class="ru.forketyfork.mtomsoap.server.SampleServiceEndpoint" p:uploadPath="/tmp"/>

<!-- SAAJ message factory configured for SOAP v1.2 -->
<bean id="messageFactory" class="org.springframework.ws.soap.saaj.SaajSoapMessageFactory"
      p:soapVersion="#{T(org.springframework.ws.soap.SoapVersion).SOAP_12}"/>

<!-- JAXB2 Marshaller configured for MTOM -->
<bean id="jaxb2Marshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller"
      p:contextPath="ru.forketyfork.mtomsoap.schema"
      p:mtomEnabled="true"/>

<!-- Endpoint mapping for the @PayloadRoot annotation -->
<bean class="org.springframework.ws.server.endpoint.mapping.PayloadRootAnnotationMethodEndpointMapping" />

<!-- Endpoint adapter to marshal endpoint method arguments and return values as JAXB2 objects -->
<bean class="org.springframework.ws.server.endpoint.adapter.DefaultMethodEndpointAdapter">
    <property name="methodArgumentResolvers">
        <list>
            <ref bean="marshallingPayloadMethodProcessor" />
        </list>
    </property>
    <property name="methodReturnValueHandlers">
        <list>
            <ref bean="marshallingPayloadMethodProcessor" />
        </list>
    </property>
</bean>

<!-- JAXB2 Marshaller/Unmarshaller for method arguments and return values -->
<bean id="marshallingPayloadMethodProcessor" class="org.springframework.ws.server.endpoint.adapter.method.MarshallingPayloadMethodProcessor">
    <constructor-arg ref="jaxb2Marshaller" />
</bean>
     The important thing to notice here is the mtomEnabled property of jaxb2Marshaller; the rest of the configuration is quite typical.

     The SampleServiceEndpoint class is a service that is bound via the @PayloadRoot annotation to process our SampleRequest requests:

    @PayloadRoot(namespace = "http://forketyfork.ru/mtomsoap/schema", localPart = "SampleRequest")
    @ResponsePayload
    public SampleResponse serve(@RequestPayload SampleRequest request) throws IOException {

        // randomly generating file name as a UUID
        String fileName = UUID.randomUUID().toString();
        File file = new File(uploadPath + File.separator + fileName);

        // writing attachment to file
        try(FileOutputStream fos = new FileOutputStream(file)) {
            request.getFile().writeTo(fos);
        }

        // constructing the response
        SampleResponse response = new SampleResponse();
        response.setText(String.format("Hi, just received a %d byte file from ya, saved with id = %s",
                file.length(), fileName));

        return response;
    }

     Notice how we work with the value returned by request.getFile(). Remember, the type of the field is DataHandler. What actually happens is that the DataHandler wraps an InputStream that points to the attachment that SAAJ offloaded to disk when the request was received. So we can copy this file to another location or process it in any other way without loading it completely into memory.
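
     If you need to process the attachment rather than just dump it to a file, the same constant-memory behaviour can be had by reading the stream through a small fixed-size buffer. Below is a minimal sketch of that pattern using only java.io; the StreamCopy class is a hypothetical helper for illustration, not the actual implementation behind DataHandler.writeTo().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the sample application.
final class StreamCopy {

    /** Copies the stream in 8 KB chunks and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // memory use stays constant regardless of input size
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the attachment and the target file here.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(new byte[100_000]), sink);
        System.out.println("Copied " + copied + " bytes");
    }
}
```

     In the endpoint, such a helper would be fed with request.getFile().getInputStream() and the FileOutputStream; a checksum or a virus scan could be hooked into the same loop.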

     A final trick is to enable attachment offloading in Oracle's SAAJ implementation, which is bundled with Oracle's JDK starting from version 6. To do that, we must run our server with the -Dsaaj.use.mimepull=true JVM argument.

     Once again, the complete code for the article is available on GitHub.

Friday, June 21, 2013

How to return a file, a stream or a classpath resource from a Spring MVC controller

     You can use AbstractResource subclasses as return values from the controller methods, combining them with the @ResponseBody method annotation.

     Consequently, as soon as you know the filesystem path of the file or have its URI, returning a file from a Spring MVC controller is as easy as:
    @RequestMapping(value = "/file", method = RequestMethod.GET, 
                produces = MediaType.IMAGE_JPEG_VALUE)
    @ResponseBody
    public Resource getFile() throws FileNotFoundException {
        return new FileSystemResource("/Users/forketyfork/cat.jpg");
    }
     The code to return a classpath resource is quite similar:
    @RequestMapping(value = "/classpath", method = RequestMethod.GET, 
                produces = MediaType.IMAGE_JPEG_VALUE)
    @ResponseBody
    public Resource getFromClasspath() {
        return new ClassPathResource("cat.jpg");
    }
     But how about outputting data from a stream? A common piece of advice is to inject HttpServletResponse as a method parameter and write directly to the output stream of the response. But this badly breaks the abstraction, not to mention the testability. Technically, we can write to a Writer declared as a method parameter, like this:
    @RequestMapping(value = "/writer", method = RequestMethod.GET, 
                produces = MediaType.TEXT_PLAIN_VALUE)
    @ResponseBody
    public void getStream(Writer writer) throws IOException {
        writer.write("Hello World!");
    }
     A seemingly simple one-liner. But if you consider serving a large chunk of binary data, this approach turns out to be quite slow, memory-consuming and not very handy, as it uses a Writer, which deals in chars. Moreover, Spring MVC is not able to set the Content-Length header until the output is finished. Here's a slightly more verbose solution, which, however, does not break the abstraction and is fast and testable:
    @RequestMapping(value = "/stream", method = RequestMethod.GET, 
                produces = MediaType.TEXT_PLAIN_VALUE)
    @ResponseBody
    public Resource getStream() {

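        String string = "Hello World!";
        // encoding the content with an explicit charset
        byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
        // acquiring the stream
        InputStream stream = new ByteArrayInputStream(bytes);
        // counting the length of the data in bytes, not chars
        final long contentLength = bytes.length;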

        return new InputStreamResource(stream){
            @Override
            public long contentLength() throws IOException {
                return contentLength;
            }
        };

    }
     First, we acquire the stream. Then we determine the length of the content we need to output. This may be done in some optimized fashion so as not to process the content entirely. Spring MVC first calls the contentLength() method of the InputStreamResource, sets the Content-Length header and then pipes the stream to the client.
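
     One thing to watch when determining the length: Content-Length is measured in bytes, not characters. When the content originates from a String, String.length() only gives the right answer for pure ASCII text. Here is a minimal sketch of the difference, assuming UTF-8 as the response encoding; the class name is made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any framework.
final class ContentLengthDemo {

    /** Returns the number of bytes the string occupies when encoded as UTF-8. */
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "Hello World!";
        // "Héllo Wörld!" written with Unicode escapes to stay independent of the source file encoding
        String accented = "H\u00e9llo W\u00f6rld!";

        System.out.println(ascii.length() + " chars, " + byteLength(ascii) + " bytes");       // 12 chars, 12 bytes
        System.out.println(accented.length() + " chars, " + byteLength(accented) + " bytes"); // 12 chars, 14 bytes
    }
}
```

     Both strings have 12 characters, but the second one needs 14 bytes in UTF-8, so a Content-Length taken from String.length() would make the client cut off the last two bytes.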

     Here we touch on a bit of an inconsistency in the Spring API. The InputStreamResource class extends AbstractResource, which in turn implements the contentLength() method by reading the whole encapsulated stream to count its length. InputStreamResource does not override contentLength(), but it does override getInputStream() to prohibit calling it more than once, which effectively rules out using this class directly as a controller method return value. In the example above, we override the contentLength() method to provide the correct functionality.
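
     The underlying problem is easy to reproduce without Spring: measuring a stream by reading it to the end leaves nothing to send afterwards. The sketch below uses plain java.io and a made-up class name; it illustrates the effect and is not Spring's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Spring code.
final class DrainedStreamDemo {

    /** Determines the length the only way a generic fallback can: by reading to the end. */
    static long countByReading(InputStream in) throws IOException {
        byte[] buffer = new byte[256];
        long count = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            count += read;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream stream = new ByteArrayInputStream("Hello World!".getBytes(StandardCharsets.US_ASCII));

        long length = countByReading(stream); // 12, but the stream is now exhausted
        int next = stream.read();             // -1: nothing is left for the response body

        System.out.println("length=" + length + ", next read=" + next); // length=12, next read=-1
    }
}
```

     This is why the length has to come from somewhere other than the stream itself, such as file metadata, a database column or, as in the example above, the size of the byte array.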