Sunday, October 4, 2015

What's that 'mean', or is 'mean' 'meaningless'?

Summary: The current SNOMED code for "mean" used in DICOM is not defined to have a particular meaning of mean, which comes to light when considering adding geometric as opposed to arithmetic mean. Other sources like NCI Thesaurus have unambiguously defined terms. The STATO formal ontology does not help because of its circular and incomplete definitions.

Long Version:

In this production company closing logo for Far Field Productions, a boy points to a tree and says "what's that mean?"

One might well ask the same when reading DICOM PS3.16 and trying to decide when to use the coded "concept" (R-00317, SRT, "Mean") (SCT:373098007).

This question arose when Mathieu Malaterre asked about adding "geometric mean", which means (!) it is now necessary to distinguish "geometric" from "arithmetic" mean.

As you probably know, DICOM prefers not to "make up" its own "concepts" for such things, but to defer to external sources when possible. SNOMED is a preferred such external source (at least for now, pending an updated agreement with IHTSDO that will allow DICOM to continue to add SNOMED terms to PS3.16 and allow implementers to continue to use them without license or royalty payments, like the old agreement). However, when we do this, we do not provide explicit (textual or ontologic) definitions, though we may choose to represent one of multiple possible alternative terms (synonyms) rather than the preferred term, or indeed make up our own "code meaning" (which is naughty, probably, if it subtly alters the interpretation).

So what does "mean" "mean"?

Well, SNOMED doesn't say anything useful about (R-00317, SRT, "Mean") (SCT:373098007). The SNOMED "concept" for "mean" has parents:

> SNOMED CT Concept (SNOMED RT+CTV3)
  > Qualifier value (qualifier value) 
     > Descriptor (qualifier value) 
        > Numerical descriptors (qualifier value)
 

which doesn't help a whole lot. This is pretty much par for the course with SNOMED, even though some SNOMED "concepts" (not this one) have, in addition to their "Is a" hierarchy, a more formal definition produced by other types of relationship (e.g., "Procedure site - direct", "Method"). I believe these are called "fully defined" (as distinct from "primitive").

So one is left to interpret the SNOMED "term" that is supplied as best one can.

UMLS has (lexically) mapped SCT:373098007 to UMLS:C1298794, which is "Mean - numeric estimation technique", and unfortunately has no mappings to other schemes (i.e., it is a dead end). UMLS seems to have either consciously or accidentally not linked the SNOMED-specific meaningless mean with any of (C0444504, UMLS, "Statistical mean"), (C2347634, UMLS, "Population mean") or (C2348143, UMLS, "Sample mean").

There is no UMLS entry for "arithmetic mean" that I could find, but the "statistical mean" that UMLS reports is linked to the "mean" from NCI Thesaurus, (C53319, NCIt, "Mean"), which is defined textually, as one might expect, as "the sum of a set of values divided by the number of values in the set". This is consistent with how Wikipedia, the ultimate albeit evolving source of all knowledge, defines "arithmetic mean".

SNOMED has no "geometric mean" but UMLS and NCI Thesaurus do. UMLS:C2986759 maps to NCIt:C94906.
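To make the distinction concrete, here is a minimal sketch of the two means. The arithmetic mean follows the NCI Thesaurus wording quoted above; the geometric mean uses the usual textbook definition (the n-th root of the product of n positive values), which is my assumption rather than a quotation from any of the terminologies discussed here. The function names are made up for illustration.

```python
import math


def arithmetic_mean(values):
    """Sum of a set of values divided by the number of values in the set."""
    values = list(values)
    if not values:
        raise ValueError("arithmetic mean of an empty set is undefined")
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of n positive values (assumed textbook definition).

    Computed via logarithms to avoid overflow in the running product.
    """
    values = list(values)
    if not values:
        raise ValueError("geometric mean of an empty set is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


samples = [1, 4, 16]
print(arithmetic_mean(samples))  # 7.0
print(geometric_mean(samples))   # approximately 4.0 (floating point)
```

The same three values give quite different answers, which is exactly why a code that just says "mean" is not good enough once both are in play.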

One might expect that one should be able to do better than arbitrary textual definitions for a field as formalized as statistics. Sure enough I managed to find STATO, a general-purpose STATistics Ontology, which looked promising on the face of it. One can poke around in it on-line (hint: look at the classes tab and expand the tree), or download the OWL file and use a tool like Protégé.

If you are diligent, and are willing to wade through the Basic Formal Ontology (BFO) based hierarchy:

entity
> continuant
  > dependent continuant
    > generic dependent continuant
      > information content entity
        > data item
          > measurement data item
            > measure of central tendency
              > average value

one finally gets to a child, "average value", which has an "alternative term" of "arithmetic mean".

Yeah!

But wait, what is its definition? There is a textual annotation "definition" that is "a data item that is produced as the output of an averaging data transformation and represents the average value of the input data".

F..k! After all that work, can you say "circular"? I am sure Mr. Rogers can.

More formally, STATO says "average value" is equivalent to "is_specified_output_of some 'averaging data transformation'". OK, maybe there is hope there, so let's look at the definition of "averaging data transformation" in the "occurrent" hierarchy (don't ask; read the "Building Ontologies with Basic Formal Ontology" book).

Textual definition: "An averaging data transformation is a data transformation that has objective averaging". Equivalent to "(has_specified_output some 'average value') or (achieves_planned_objective some 'averaging objective')".

Aargh!

Shades of lexical semantics (Cruse is a good read, by the way), and about as useful for our purposes :(

At least though, we know that STATO:'average value' is a sub-class of STATO:'measure of central tendency', which has a textual definition of "a measure of central tendency is a data item which attempts to describe a set of data by identifying the value of its centre", so I guess we are doing marginally better than SNOMED in this respect (but that isn't a very high bar). Note that in the previous sentence I didn't show "codes" for the STATO "concepts", because it doesn't seem to define "codes", and just uses the human-readable "labels" (but Cimino-Desiderata-non-compliance is a subject for another day).

In my quest to find a sound ontological source for the "concept" of "geometric mean", I was also thwarted. No such animal in STATO apparently, yet, as far as I could find (maybe I should ask them).

So not only does STATO have useless circular definitions but it is not comprehensive either. Disappointed!

So I guess the best we can do in DICOM for now, given that the installed base (especially of ultrasound devices) probably uses (R-00317, SRT, "Mean") a lot, is to add text that says when we use that code, we really "mean" "mean" in the sense of "arithmetic mean", and not the more generic concept of other things called "mean", and add a new code that is explicitly "geometric mean". Perhaps SNOMED will add a new "concept" for "geometric mean" on request and/or improve their "numerical descriptors" hierarchy, but in the interim either the NCI Thesaurus term NCIt:C94906 or the UMLS entry UMLS:C2986759 would seem to be adequate for our purposes. Sadly, the more formal ontologies have not been helpful in this respect, or at least the one I could find has not.

Maybe we should also be extremely naughty and replace all uses of (R-00317, SRT, "Mean") in the DICOM Standard with (R-00317, SRT, "Arithmetic mean"), just to be sure there is no ambiguity in the DICOM usage (and suggest to SNOMED that they add it as an alternative term). This would be less disruptive to the DICOM installed base than replacing the inadequately defined SNOMED code with the precisely defined NCI Thesaurus code.
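A minimal sketch of why relabeling should be the less disruptive option, assuming receivers match coded entries on code value plus coding scheme designator and treat the code meaning as display text only (which is how DICOM intends codes to be used, though I can't vouch for every implementation). `CodedConcept` and `same_concept` are hypothetical names for illustration, not any toolkit's API, and the meaning string on the NCIt entry is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConcept:
    """A DICOM-style coded triplet: (code value, coding scheme designator, code meaning)."""
    code_value: str
    coding_scheme_designator: str
    code_meaning: str

    def same_concept(self, other: "CodedConcept") -> bool:
        # Match on value and scheme only; the meaning is human-readable text.
        return (self.code_value, self.coding_scheme_designator) == (
            other.code_value,
            other.coding_scheme_designator,
        )


# Existing usage in the installed base.
mean = CodedConcept("R-00317", "SRT", "Mean")
# Same code, relabeled to remove the ambiguity.
arithmetic_mean = CodedConcept("R-00317", "SRT", "Arithmetic mean")
# A new, explicit code for the geometric mean (NCIt:C94906).
geometric_mean = CodedConcept("C94906", "NCIt", "Geometric mean")

print(mean.same_concept(arithmetic_mean))  # True: only the label changed
print(mean.same_concept(geometric_mean))   # False: a different concept
```

Swapping in the NCI Thesaurus code for arithmetic mean, by contrast, would change the value and scheme that such a receiver keys on.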

David

PS. I italicize "concept" because there is debate over what SNOMED historically and currently defines "concept" to be, quite apart from the philosophical distinctions made by "realist" and "idealist" ontologists (or is it "nominalists" and "conceptualists"). I guess you know you are in trouble when you invoke Aristotle. Sort of like invoking Lincoln I suppose (sounds better when James McEachin says it).

1 comment:

Oscar said...

Dear David,

Thanks for bringing this to our attention

A couple of clarifications on 2 points made in your blog post
1. "... STATO doesn't seem to define 'codes', and just uses the human-readable 'labels' ..."

This is incorrect:
STATO does define codes, in the form of PURLs:
http://purl.obolibrary.org/obo/STATO_0000401 is the ID (code)
for an entity/concept whose 'preferred name' is 'sample mean'


Each STATO class has a URI and a set of class metadata, namely 'preferred label', 'definition', 'definition source (with bibliographic reference)', 'curation status', 'example of usage || R command', 'term editor'


2. "STATO have useless circular definitions":

Paying closer attention to that particular class indicates it is an import from OBI, the Ontology for Biomedical Investigations. STATO aims to interoperate with the rest of the OBO Foundry resources and therefore reuses concepts from artefacts that covered some of the ground in previous work. So the not so great definition is on OBI; failure to improve on it is on us.
We have now done our due diligence by providing a clearer textual definition, replacing the OBI definition, and supporting it with a link to the NumPy Python command for computing the values.

The latest release of STATO (https://github.com/ISA-tools/stato/releases/tag/v1.3) now distinguishes the following means:

arithmetic mean (definition extension to OBI_0000679) synonym for 'average value'
geometric mean (http://purl.obolibrary.org/obo/STATO_0000396)
harmonic mean (http://purl.obolibrary.org/obo/STATO_0000397)
weighting arithmetic mean (http://purl.obolibrary.org/obo/STATO_0000398)
interquartile mean (http://purl.obolibrary.org/obo/STATO_0000399), a subtype of trimmed mean (http://purl.obolibrary.org/obo/STATO_0000163)
quadratic mean (http://purl.obolibrary.org/obo/STATO_0000400)



Finally, we'd like to point you to STATO github code repository:

https://github.com/ISA-tools/stato
and its issue tracker:
https://github.com/ISA-tools/stato/issues

This is the most straightforward way to contribute to STATO content and collaborate with our group.

Best wishes

Philippe Rocca-Serra
Alejandra Gonzalez-Beltran