Saturday, June 11, 2011

Framing the Big Study Problem

Summary: Large studies such as thin slice CT create a performance problem in unoptimized implementations; DICOM provides several means of addressing these problems without throwing out DICOM entirely and reimplementing, as the MINT folks originally proposed; retrospective use of the enhanced multi-frame family of objects may be able to alleviate this problem, even without support in the modalities, by converting legacy single-frame DICOM objects to enhanced multi-frame objects in the PACS for distribution to workstations or other PACS or archives.

Long version:

A group of folks at Johns Hopkins, Harris Corp, and Vital Images have been working on the "large study" problem and have produced a largely "DICOM free" implementation (apart from modality image ingestion) called Medical Imaging Network Transport (MINT). They are now proposing that this become a new "standard" and be blessed by and incorporated in DICOM as a "replacement". Since the MINT implementation is based on HTTP transport, DICOM WG 27 Web Technology has become the home for these discussions. Not surprisingly, the "replace everything" work item proposal was rejected by the DICOM Standards Committee at our last meeting by a large majority - you can read the summary in the minutes of the committee and see the slides presented by the MINT folks.

The rejection by the committee of the proposal should not be interpreted as a rejection of the validity of the use-case, however.

It is accepted that large studies potentially pose a problem for many existing implementations, both for efficient transfer from the central store to the user's desktop for viewing or analysis, and for bulk transfer between two stores (e.g., between a PACS and a "vendor neutral archive" or a regional image repository).

So, to move forward with solving the problem WG 6 and WG 27 met together earlier this week to try to achieve consensus on what the existing DICOM standard has to offer in this respect, and to identify any gaps may exist that could be filled by incremental extensions to the standard.

If one puts aside the assumption that it is necessary to completely replace DICOM (and hence re-solve every problem that DICOM and PACS vendors have spent the last quarter of a century solving), and instead focus narrowly on the key aspects of concern, two essential issues emerge:
  1. transporting large numbers of slices as separate single instances (files) is potentially extremely inefficient
  2. replicating the "meta-data" for the entire patient/study/series/acquisition in every separate single instance is also potentially extremely inefficient, and though the size of the meta-data is trivial by comparison with the bulk data, the effort to repeatedly parse it and sort out what it means as a whole on the receiving end is definitely not trivial
MINT, as currently implemented, tries to "normalize" the entire study, and this is what was initially proposed to DICOM.

Yet this approach ignores the significant effort that has already been put into "normalizing" each acquisition at the modality end, specifically, the "enhanced multi-frame" family of DICOM objects defined for CT, MR and PET as well as XA/XRF, and new applications like 3D X-ray, breast tomosynthesis, ophthalmic optical coherence tomography (OCT), intra-vascular OCT, pathology whole slide imaging (WSI), etc. The following slide (which I simplified and redrew from an early one produced by either Bob Haworth or Kees Verduin for WG 16) illustrates how the enhanced multi-frame family of objects uses the shared and per-frame "functional groups" (as well as the top level DICOM dataset) to factor out the commonality compared to encoding single slices each with its own complete "header":
Now, it is no secret that adoption of the enhanced family of objects has been very slow, especially by the modalities that already have single frame "legacy" DICOM objects, particularly the CT and MR. Currently only Philips offers a commercial MR implementation and Toshiba offers a commercial CT implementation. Many PACS are capable of storing and regurgitating these over a DICOM connection, but may not be capable of viewing them or sorting or annotating them correctly, or performing more sophisticated functions on them like 3D and MPR rendering, nor for that matter are they well supported in many CD-based viewers, etc.

But it is important to distinguish between gaps in implementations, as opposed to gaps in the DICOM standard. If the standard already specifies a means to solve a problem it should be used by implementers; inventing a new "standard" like MINT to solve the problem is not going to encourage implementation (unless it solves other pressing problems as well). The bottom line here seems to be that PACS vendors in particular are not well motivated to solve in an interoperable (standard) way, any problem beyond ingestion of images; many PACS vendors may be quite happy with proprietary implementations between the archive/manager component of their PACS and their image display devices or software. But the last thing we need are multiple competing standard approaches to solving the same problem (or entire competing standards), since that only compromises interoperability.

So, to cut a long story short, the argument was put forth this week that use of the enhanced multi-frame family of objects for encoding a single "acquisition" as a single object should suffice to achieve the vast majority of the benefits of the "study normalization" suggested by MINT.

We explored some of what could be achieved by using enhanced multi-frame objects and observed that:
  1. though not all modalities can create enhanced multi-frame objects, it is possible to "convert" the original legacy single frame objects into such multi-frame objects
  2. the modality-specific enhanced multi-frame objects have many mandatory and coded attributes that are not present in the legacy single frame object, which it is challenging if not impossible to populate during such a conversion
  3. there are "secondary capture" enhanced multi-frame objects that do permit the optional inclusion of position, orientation, temporal and dimension information extracted from legacy single frame objects, and conversion to these might suffice for the vast majority of bulk transfer and viewing and analysis use-cases
  4. it may be desirable to either a) document in the standard how to perform such a conversion, or b) define new IODs and SOP Classes that are somewhere in between the "everything optional" enhanced secondary capture objects and the modality-specific objects in terms of requirements, in order to assure interoperability of archives and viewers using such an approach
  5. it may also be desirable to specify the requirements for full round-trip fidelity conversion from the legacy single frame objects to the converted enhanced multi-frame object and back again, to allow intermediate devices to take advantage of the multi-frame objects but still serve extracted single frame objects to legacy receiving devices, of which there will remain many in the installed base
These ideas are not new. For example, adding informative language to the standard about how to perform the round trip for MR objects was discussed at WG 16 during the later phases of development of Supplement 49, and particularly as we were preparing to promote it and demonstrate it. I was not in favor of adding that informative text at the time, but in retrospect it might have limited subsequent confusion if we had. I for one certainly underestimated the inertia of many of the existing modality vendors. WG 16 discussed at that time the use of negotiating the SOP Class over the DICOM association, and "falling back" to sending legacy single frame images when the SCP does not support the enhanced multi-frame image SOP Classes. When I was with GE in the mid-1990's, we used this "trick" when implementing the DX objects in the digital detector systems, i.e., to fall back to CR or secondary capture if the PACS was unaware of the "new" DX objects, and this approach was discussed with the other participants in WG 2 as a means of mitigating the risk of adopting the "new" SOP Classes. If you look at the inside of a modern Philips enhanced multi-frame MR object, you will see in there a private sequence data element, within each item of which is a complete list of all the attributes necessary to reconstruct a set of legacy single frame images, even though many of these duplicate the information contained in the "proper" place in the enhanced object "functional group" sequences; this simplifies the conversion to legacy process because the converter doesn't need to "understand" the enhanced objects functional groups.

So, the new action item for WG 6 (and more specifically for me, since I volunteered to write it), is to produce a work item proposal for the committee to define a new IOD and SOP Class (or perhaps modality-specific family of them), for "transitional multi-frame converted legacy" images, with the deficiency in the existing standard being the lack of a set of multi-frame images that can be fully populated with only the limited information in the legacy images but with sufficient mandatory position, orientation, temporal and dimension information to satisfy the 3D and 4D viewing and rendering and bulk transfer use cases.

In the interim, now that MINT guys have been encouraged to look at the potential use of the secondary capture multi-frame objects, they have the opportunity to experiment with them to see if they can achieve the necessary performance in their implementation.

The following four slides illustrate graphically the principle of migration from:
  1. a completely proprietary optimized PACS to workstation interface (where the "viewer" is essentially "part of the PACS"), to
  2. a DICOM standard PACS to workstation boundary (possible with current single frame DICOM Query/Retrieve/Store interfaces, but likely not "optimized" for performance by the vendor), to
  3. converting to multi-frame objects (or passing through those from modalities), combined with round-trip de-conversion to support legacy workstations, to
  4. supporting PACS to PACS (or Image Manager/Image Archive or Vendor Neutral Archive) transfers also using legacy objects converted to multi-frame if supported by both sides:





In what ways does this proposal differ from what the MINT implementation has done to date ?
  1. the aggregation of meta-data would occur at the "acquisition" level, and not the entire "study" level; this would seem to be sufficient to capture the vast majority of the performance benefit in that when viewing or performing 3D/4D analysis, the bulk of the pixel data and meta-data for each "set" will be within one object
  2. the enhanced multi-frame objects require that every frame have the same number of rows and columns and mostly the same pixel data characteristics (bit depth, etc.); this means that funky image shapes like localizers will end up in separate objects
  3. the opportunity exists to pre-populate the "dimension" information that is a feature of the enhanced family of objects, e.g., this dimension is space, this is time, etc., rather than have to "figure it out" retrospectively from each vendor's pattern of use of the individual descriptive attributes
The emphasis in this discussion has been on the use case for cross-sectional modality acquisitions, which have been the primary source of performance concerns to date. Another area of "large study" concern is digital mammography, which is characterized by relatively small numbers or relatively large images. Given that there are a small number of these, the network transfer (or database insertion and extraction) performance problems of unoptimized implementations may be less of a factor. Arguably, it might be nice to have access to "normalized meta-data" about the handful of images before transferring the bulk data, but this is probably not a sufficient concern to justify throwing away the whole of DICOM to use MINT. A real problem is going to occur when breast tomosynthesis becomes popular, and for these there is already an enhanced multi-frame DICOM object defined in Supplement 125, and modality vendors seem committed to implementing it, given the fact that a dedicated viewer is going to be required for effective use of these, regardless.

Also discussed at our recent meeting was the availability of mechanisms in DICOM for gaining access to selected frames and to meta-data (the "header") without transferring everything. Those two features are defined in Supplement 119, Instance and Frame Level Retrieve SOP Classes, which was specifically written to address the consequences of putting "everything" in single large objects. For example, if a report references one or two key frames in a very large object, one needs the ability to retrieve just those frames efficiently. Supplement 119 defines a mechanism for doing so, by extracting those frames, and building a small but still valid DICOM object to retrieve and display. The existing WADO HTTP-based DICOM service also supports the retrieval of a selected frame, as do the equivalent SOAP-based Web Services transactions defined in IHE XDS-I, back ported into the DICOM standard in Supplement 148 WADO via Web Services, currently out for ballot. Though Supplement 119 does defined a SOP Class for gaining access to the meta-data without transferring the bulk data, if one uses the JPEG Interactive Protocol (JPIP) to access frames or selected regions of a frame in JPEG 2000, one can also gain access to the meta-data using a specific Transfer Syntax (see Supplement 106).

Unlike IHE (particularly IHE XDS and XDS-I), the MINT guys are also RESTful at heart, and this is reflected in their current implementation. We tried to keep out of the REST versus SOAP religious wars during our most recent discussion, and focus on what DICOM already has to solve the use case. Yet to be resolved is the matter of whether DICOM already has sufficient pure DICOM network protocol support and HTTP-based support to satisfy the use-cases without having to introduce additional RESTful equivalents. On the one hand there is a Committee and WG 6 level desire to not have multiple gratuitously different ways to do the same thing; on the other hand there may be significant advantages to alternative mechanisms if they can take effective advantage of off-the-shelf HTTP infrastructure components. A case in point is the use of HTTP caching that requires some statelessness in the transactions to be effective. The MINT guys were advised to present evidence that such caching is sufficiently beneficial in order to justify the introduction of yet another transport mechanism, and this is something that WG 27 intends to follow up on.

Related religious wars about whether or not DICOM or HTTP should be used "within" an enterprise (i.e., over the LAN), the extent to which DICOM can be used between LANs that are nominally part of the same "enterprise" but are separated by firewalls (i.e., if the Canadians can do DICOM between two places, why can't Johns Hopkins), why XDS-I is not sufficient, etc., were mentioned but essentially deferred for another day. One key aspect mentioned, but not discussed in much detail, was the matter of user authentication and access control, and the IHE direction that uses Kerberos (EUA) within an enterprise and SAML assertions (XUA) across enterprises; this is easy for DICOM and SOAP-based WS like XDS-I, but potentially problematic for RESTful solutions (like WADO). Whether or not Vendor Neutral Archives (VNA), whatever they are, are a good idea or not was also not debated; we simply agreed that the efficient bulk data transfer from one archive to another using a standard protocol is a genuine use case. That said, the IHE radiology guys (myself included) are contemplating considering (again) the question of whether to separate the Image Manager from the Image Archive Actors in the IHE Radiology Technical Framework, so we do have the opportunity to start a whole new war in a whole new forum.

Another interesting use case that we discussed is the so-called "zero-footprint" viewer that can run in a "standard" browser that makes use of no additional technology, whether it be a medical application or generic plug-in like Adobe Flash or whatever, since not everybody has that available (especially on mobile devices like tablets). This essentially requires that the server be able to provide a source of meta-data and bulk pixel data that is amenable to efficient rendering and sufficient interaction within something as simple as JavaScript. The extent to which DICOM, WADO and IHE XDS-I based web services are lacking with respect to this zero footprint use case, and to what extent aspects of MINT offer advantages, remains to be determined. There has already been a lot of work in this area; see for example the dcm4che XERO approach discussed briefly in this article about the Benefits of Using the DCM4CHE DICOM Archive. I am not surely exactly what state the open source XERO project is in, given that Agfa uses it in a commercial implementation now, but obviously many of the principles are generally applicable. The question of whether a JSON or a GBP representation of the DICOM header is required (as opposed to MINT's dislike of WG 23's Supplement 118 XML representation of DICOM attributes) has not yet been explored; certainly JSON would seem like a more natural fit if JavaScript is the primary mechanism of implementation, but I dare say that could become the subject of yet another religious war.

There are many other practical issues that the MINT folks have encountered, such as the lack of uniqueness of UIDs, or inconsistency in some of the patient or study information between individual slices, but many of these can be characterized as "implementation" problems faced by any PACS or archive when populating their databases, and not something that the DICOM standard (or any standard) can really resolve. I.e., if an implementation fails to comply with the standard it is just plain "bad" (or the model of the real-world in the standard does not actually match the real-world). At some point (usually ingestion from the modality, and/or administrative study merges and other "corrections"), any implementation has to deal with this, and will have to regardless of whether the originally proposed MINT approach or the conversion to enhanced multi-frame DICOM approach is used. With respect to the need for change management, the MINT proponents were made aware of the Image Object Change Management (IOCM) profile defined by IHE, which addresses the use-cases and implementation of change in a loosely-coupled multi-archive environment, as well as the IHE Multiple Image Manager/Archive (MIMA) profile, which addresses archives with different patient identity domains and what to do with DICOM identifying attributes when transferring across domains. With respect to modalities or other implementations that create non-unique UIDs, the need to a) detect and correct for this on ingestion, and b) report the defects to the offending vendor, was emphasized.

Finally, the foregoing should not be taken to mean that switching to the use of multi-frame objects is a panacea, nor indeed a prerequisite for the efficient transport of large multi-slice studies. As we were careful to emphasize during the initial roll out of the enhanced multi-frame DICOM CT and MR objects, the primary goal was improved interoperability for advanced applications, not transfer performance improvement, since it was well known at the time that optimized applications transporting single frame objects can achieve very good performance (e.g., through the negotiation and use of DICOM asynchronous operations, or multiple simultaneous associations if asynchronous operations cannot be negotiated, both of which eliminate the impact of delayed acknowledgment of individual C-STORE operations, whether it be due to network latency or application level delays such as waiting for successful database insertion before acknowledgment). Rather, poor observed performance in the real world is often a consequence of applications simply not being optimized or well designed in this respect, and many vendors' engineers are far too quick to switch to a proprietary optimized protocol and ignore opportunities for optimizing standards-based solutions. As a case in point, this old white paper from Oracle on A Performance Evaluation of Storage and Retrieval of DICOM Image Content, which shows quite impressive numbers for single frame DICOM images using JDBC (not DICOM network) based server to client retrieval over five 1 Gigabit Ethernet connections between server and client (over 400 MB/s, 852 images/s, 1497 Cardiac CT studies per hour). MINT performance figures over a single connection as published so far are also impressive (Harris's results and Vital's results) though the hardware is different. The bottom line is probably that the protocol used is less important than the architecture and implementation details on both end, and in comparing performance claims for specific commercial implementations, one needs to be sure one is comparing apples with apples rather than oranges. The lack of a published industry standard benchmark for these use-cases is probably a significant gap that we should try to close.

The following slide is one that I produced for the early enhanced multi-frame demonstration and educational lectures, about where the DICOM protocol critical delay lies:

The C-STORE acknowledgment discussion is separable but related to the discussion of TCP/IP performance in the presence of significant latency (or also significant packet loss). As was emphasized at the recent meeting by the purveyor of a potential proprietary TCP/IP replacement (Aspera), unmodified TCP/IP over wide area networks is not ideal for taking full advantage of the theoretical bandwidth limits, and both DICOM and HTTP (and hence MINT, which is HTTP-based) are potentially at a disadvantage in this respect. The conventional answer to this is to use multiple connections and associations and to swap out the TCP stack at both ends of a slow connection (e.g., a satellite link), and/or to use a "WAN accelerator" box at both ends (such as something from Circadence). I am not recommending or promoting any of these technologies or companies, since I have no experience with them. I will say that the idea of changing DICOM applications and tool kits to use something other than TCP/IP or to add the ability to negotiate something proprietary (as Aspera was suggesting), is superficially way less attractive to me, than putting in a box in between that takes care of the problem, transparent to the applications, if it achieves anything close to the maximum possible "goodput". Anyway, if you are interested in thinking about TCP/IP performance issues, I have found Hassan and Jain's book High Performance TCP/IP Networking to be a good introduction. In this discussion, not for the first time, trying to take advantage of UDP or features of various peer-to-peer network protocols was also discussed, and a quick Google search on DICOM and P2P or UDP file transfer will reveal some interesting articles and experiments.

David

PS. Note that in the foregoing I reference and provide links to numerous DICOM Supplements that introduced various features; since most of these supplements have long since been folded into the body of the DICOM standard, and may have had subsequent corrections applied, implementers need to reference the latest DICOM standard text and not the old supplement text, as appropriate. I reference the supplements only to provide a historical time line and to provide the context for interpreting their scope and use.