describe

Post ideas for new functionality you'd like to see in McIDAS-V or ideas for new tutorials.
joleenf
Posts: 1123
Joined: Mon Jan 19, 2009 7:16 pm

describe

Post by joleenf

Hi,

I have been looking at the describe() function in McIDAS-V 1.6. It would be nice if describe() both displayed its output and returned a dictionary of the describe statistics, so that users could see the values and also use them when needed. In my case, I would like to use the IQR and quartiles in a report on each image, and I would not necessarily need to see the entire describe output on every run of my script. Perhaps there is already a quick way to do this with the statistics package.
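One way to get both behaviors is a wrapper that prints the statistics and also returns them. The sketch below is plain Python, not McIDAS-V code: `summarize()` is a hypothetical helper, the statistic names are illustrative, and the `Name:: value` print layout is only an assumption modeled on the `::` separators that appear in describe() output. `Length` counts every point, while the other values skip NaN.

```python
import math

def summarize(values, show=True):
    """Hypothetical helper: optionally print describe-style lines and
    always return the statistics in a dictionary."""
    # Statistics other than Length ignore missing (NaN) values.
    good = [v for v in values if not math.isnan(v)]
    stats = {
        "Length": len(values),  # counts every point, including NaN
        "Min": min(good),
        "Max": max(good),
        "Mean": sum(good) / float(len(good)),
    }
    if show:
        # Print in a fixed order so the display looks the same on every run.
        for name in ("Length", "Min", "Max", "Mean"):
            print("%s:: %s" % (name, stats[name]))
    return stats

# Usage: turn the display off and keep only the values the report needs.
result = summarize([270.5, 248.0, float("nan"), 255.5], show=False)
spread = result["Max"] - result["Min"]  # 22.5
```

Returning a dictionary keeps the values addressable by name, so a report script does not depend on the order of the printed lines.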

Thanks,
Joleen
joleenf
Posts: 1123
Joined: Mon Jan 19, 2009 7:16 pm

Re: describe

Post by joleenf

Hi,

It looks like I can use the Statistics package for this:

band2 = loadADDEImage(**goesParms)

import Statistics
band2Stats=Statistics(band2)
Q1=band2Stats.percentile(25)
Q3=band2Stats.percentile(75)
IQR=Q3-Q1
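To check the quartile arithmetic outside McIDAS-V, here is a small pure-Python sketch. It assumes linear interpolation between ranked values (NumPy's default percentile method); the Statistics package's `percentile()` may use a different convention, so small differences are possible. NaN values are dropped before ranking.

```python
import math

def percentile(values, p):
    """Linear-interpolation percentile (0 <= p <= 100), ignoring NaN."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        raise ValueError("no valid (non-NaN) values")
    # Fractional position of the percentile in the sorted data.
    rank = (len(data) - 1) * p / 100.0
    lower = int(math.floor(rank))
    upper = min(lower + 1, len(data) - 1)
    fraction = rank - lower
    return data[lower] + (data[upper] - data[lower]) * fraction

values = [1.0, 2.0, 3.0, 4.0, 5.0]
Q1 = percentile(values, 25)   # 2.0
Q3 = percentile(values, 75)   # 4.0
IQR = Q3 - Q1                 # 2.0
```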

Thanks,
Joleen
JPNIII
Posts: 11
Joined: Tue Mar 24, 2009 5:31 am

Re: describe

Post by JPNIII

Just wanted to add kudos to Jon for this method. Very useful!

Thank you.
joleenf
Posts: 1123
Joined: Mon Jan 19, 2009 7:16 pm

Re: describe

Post by joleenf

Hi,

I am wondering if there is a good way to display some of the describe output on an image. This is what I have done so far...

panel1=buildWindow()[0]
# <display image> (code not shown)

# Starting position of the annotation text
lin=30
ele=150
panel1.annotate('10.7um - 3.9 um difference (10.7 um clouds @ <250K)', line=lin, element=ele,color='yellow', alignment=('right','center'))

# Keep only pixels where the 10.7 um image (lw, loaded earlier) is below 250 K
diffImg=diffImg*mask(lw,'<',250,1)

# Write each line of the describe() output on the image, 20 lines apart
l=((describe(diffImg)).split('\n'))
for index,text in enumerate(l):
   try:
      lin=lin+20
      newText=text.replace('::','')
      panel1.annotate(newText, line=lin, element=ele,color='yellow', alignment=('right','center'))
   except:
      pass



[Attached image: g13_sw_with_diffstats_2016_Mar_26.png]



But this produces about three null-pointer exceptions. Also, I may only need a few of the statistics: histogram, min, max, Q1, Q2, Q3, IQR, skew, and variance.

Also, if I were to create a time series, this could become a big problem, since I would have to remember to keep the statistics sorted.
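For a time series, one way to avoid keeping the statistics sorted by hand is to store each image's values in a dictionary keyed by observation time and sort the keys only when building the report. A minimal sketch, assuming the times are available as `datetime` objects; how they are read from each image is not shown, and the numbers below are illustrative, not real data.

```python
from datetime import datetime

# Illustrative per-image statistics keyed by observation time. Images can be
# processed in any order; the dictionary does not need to be kept sorted.
stats_by_time = {}
stats_by_time[datetime(2016, 3, 26, 18, 0)] = {"min": 248.0, "max": 270.5}
stats_by_time[datetime(2016, 3, 26, 17, 45)] = {"min": 246.5, "max": 268.0}
stats_by_time[datetime(2016, 3, 26, 18, 15)] = {"min": 249.0, "max": 271.0}

# Sorting the keys yields the series in chronological order.
times = sorted(stats_by_time)
series_max = [stats_by_time[t]["max"] for t in times]
for t in times:
    print("%s  min=%.1f  max=%.1f" % (t.strftime("%H:%M"),
                                       stats_by_time[t]["min"],
                                       stats_by_time[t]["max"]))
```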



Advice?

Thanks,
Joleen
joleenf
Posts: 1123
Joined: Mon Jan 19, 2009 7:16 pm

Re: describe

Post by joleenf

It also seems that, in the case above, the histogram retains some information from the values that are NaN because of the mask.

If a is the shortwave image and b is the longwave image, I apply the mask as follows:

a=selectData()
b=selectData()

c=sub(a,b)*mask(b,'<',250,1)

print describe(c[0])

Is there a way to create the histogram so that only the points with valid values are included? Shouldn't the histogram show more variation, and the length of the field change, when the mask is applied compared with when it is not?
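The behavior being asked for, a histogram built only from points that hold valid values, can be sketched in plain Python. This is not how describe() is implemented; `finite_histogram()` is a hypothetical helper that drops NaN values before binning, reports how many points were kept, and returns the bin edges so the histogram's min/max endpoints are visible.

```python
import math

def finite_histogram(values, nbins=10):
    """Histogram of the non-NaN values only.

    Returns (counts, edges, n_good); edges has nbins + 1 entries.
    """
    good = [v for v in values if not math.isnan(v)]
    if not good:
        return [0] * nbins, [], 0
    lo, hi = min(good), max(good)
    width = (hi - lo) / float(nbins) or 1.0  # avoid zero width if all equal
    counts = [0] * nbins
    for v in good:
        # The maximum value goes into the last bin rather than past it.
        i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(nbins + 1)]
    return counts, edges, len(good)

nan = float("nan")
counts, edges, n_good = finite_histogram([1.0, 2.0, nan, 3.0, nan, 4.0], nbins=3)
print("counts=%s range=%s..%s good=%d" % (counts, edges[0], edges[-1], n_good))
```

Here the two NaN points affect neither the bin counts nor the endpoints; only the four valid values are binned.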

Perhaps it should be something like:

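a=selectData()
b=selectData()

# Shortwave minus longwave difference
c=sub(a,b)

# Indices of the points where the longwave image is below 250 K
ind=find(b[0],'<',250)

# getValues() returns the values as [component][point]
data=c[0].getValues()

# Keep only the difference values at the selected points
newDataArray=[]
for index in ind:
   newDataArray.append(data[0][index])

newField=field(newDataArray)

print describe(newField)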
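The same selection can also be written without a separate index list by walking the two value arrays together. A plain-Python sketch, using short illustrative lists as stand-ins for the flattened values of `c` and `b` (in McIDAS-V these would come from `getValues()`), and assuming both images are on the same grid so their points line up one to one.

```python
import math

# Illustrative stand-ins for the flattened difference (c) and longwave (b) values.
diff_values = [2.5, -1.0, 4.0, float("nan"), 3.0]
lw_values = [245.0, 260.0, 249.5, 240.0, 251.0]

# Keep the difference only where the longwave temperature is below 250 K,
# and also drop points whose difference is itself NaN.
selected = [d for d, t in zip(diff_values, lw_values)
            if t < 250.0 and not math.isnan(d)]
print(selected)
```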


Joleen
bobc
Posts: 988
Joined: Mon Nov 15, 2010 5:57 pm

Re: describe

Post by bobc

Hi Joleen -

I just wanted to let you know that I'm looking into all of this. I can replicate your NPEs on an OS X machine, but the same script runs without errors on Windows 7.

Thanks -
Bob
joleenf
Posts: 1123
Joined: Mon Jan 19, 2009 7:16 pm

Re: describe

Post by joleenf

Hi Bob,

Thanks for the reply. As a partial answer to my own question: if I don't need the histogram, one way to plot only a few statistics is to use the statistics package as outlined in the scripting tutorial (http://www.ssec.wisc.edu/mcidas/software/v/docs/Scripting1.5_v2.pdf), but plot the values on an image rather than write them to a file. Perhaps plotting a histogram (or box plot) ;) on an image would require too much development at this time.

I am still wondering how the basic histogram in describe() handles NaN values. It seems to count the NaN points in the total number of points.

Thanks,
Joleen
bobc
Posts: 988
Joined: Mon Nov 15, 2010 5:57 pm

Re: describe

Post by bobc

Hi Joleen -

We spoke about this at our team meeting, and it looks like the reason for the errors is that there are a couple of empty strings at the end of the describe() output. OS X doesn't appear to be handling these empty strings correctly. I wrote this up as Inquiry 2304.

We found that a workaround is to modify your loop like so:

l=((describe(diffImg)).split('\n'))
for index, text in enumerate(l):
    try:
        lin=lin+20
        newText=text.replace('::','')
        # Skip the empty strings at the end of the describe() output
        if newText:
            panel1.annotate(newText,line=lin, element=ele,color='yellow', alignment=('right','center'))
    except:
        pass
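The effect of the `if newText:` check can be seen in plain Python. The sample text below is made up to mimic a `Name:: value` layout ending in blank lines; it is not real describe() output.

```python
# Made-up text in a "Name:: value" style, ending with blank lines.
output = "Min:: 248.0\nMax:: 270.5\n\n"

lines = output.split("\n")
# lines ends with two '' entries: the kind of empty strings identified
# as the cause of the OS X errors.
print(lines)

# Keep only non-blank lines and remove the '::' separators before annotating.
labels = [line.replace("::", "") for line in lines if line.strip()]
print(labels)
```

Using `line.strip()` rather than `line` also skips lines that contain only spaces.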

We came to the consensus that when a field containing NaN values from a mask is passed through describe(), the NaN values shouldn't be included in the histogram output. This matches the behavior of the other statistical parameters: if values outside the mask's threshold are set to 0, those 0 values are included in the histogram and in other statistics such as Min and Mode. If values outside the mask's threshold are set to NaN, then Min and Mode ignore them, and so does the histogram.

We can see how this could confuse users: if missing values are set to 0 they are included in the histogram, and if they are set to NaN they aren't. With this in mind, we think it would be appropriate to print the histogram's minimum and maximum values before and after the histogram to make it easier to interpret. I've added this request, along with a couple of others, to Inquiry 2039. While we believe the NaN values aren't currently included in the histogram output, I agree that they are included in the 'Length' output.

As for returning only certain statistical parameters from describe(), here is one way to do that (without enumerate):

# Positions of the wanted lines in the describe() output
indices = [0,2,3,5,6,7,8,12,14]
newText = describe(diffImg).split('\n')
for i in indices:
    try:
        lin = lin + 20
        currentLine = newText[i].replace('::','')
        if currentLine:
            panel1.annotate(currentLine,line=lin,element=ele,color='yellow',alignment=('right','center'))
    except:
        pass

These indices correspond to the histogram, min, max, q1, q2, q3, iqr, skew, and variance. After speaking with a programmer, we thought it would be nice to add a keyword to describe() for selecting which statistical parameters are returned. That way, a user wouldn't need to know which index corresponds to which parameter. This is included as part of Inquiry 2039.
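Until such a keyword exists, one way to avoid hard-coded indices is to parse the describe() text into a dictionary and pick statistics by name. A minimal sketch, assuming each statistic line has the form `Name:: value`; the sample text and names are made up, and the histogram line would need separate handling because its value is not a single number. `parse_stats()` is a hypothetical helper.

```python
def parse_stats(text):
    """Turn 'Name:: value' lines into a {name: value} dictionary.

    Blank lines and lines without '::' are skipped.
    """
    stats = {}
    for line in text.split("\n"):
        if "::" not in line:
            continue
        name, _, value = line.partition("::")
        stats[name.strip()] = value.strip()
    return stats

sample = "Min:: 248.0\nMax:: 270.5\nQ1:: 252.0\nQ3:: 261.5\n\n"
stats = parse_stats(sample)

# Select by name instead of by position in the output.
wanted = ["Min", "Max", "Q3"]
report = ["%s %s" % (name, stats[name]) for name in wanted if name in stats]
print(report)
```

Selecting by name keeps working even if the order of lines in the output changes between versions.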

Please let me know if this doesn't answer your questions.

Thanks -
Bob
joleenf
Posts: 1123
Joined: Mon Jan 19, 2009 7:16 pm

Re: describe

Post by joleenf

Hi Bob,

You have covered the topic well. As for displaying selected parameters of describe(), either keywords or having describe() return a dictionary while still displaying its output in the current format would probably work. Either approach might remove the need to understand the statistics package and its imports. I have to admit I was so focused on getting this task done that I forgot I could get each of these values independently from the statistics package (all except the histogram).

Finally, I mentioned earlier getting a box plot onto the image. At our team meeting today, we decided to get the statistics from McV and then use another package to display a time series of box plots next to a satellite image.

Thank you for the workaround. Displaying the endpoint values will work fine for the histogram as a quick display tool in this case.

Joleen
joleenf
Posts: 1123
Joined: Mon Jan 19, 2009 7:16 pm

Re: describe

Post by joleenf

Hi,

Would it be possible to add numGoodPoints to the describe() output? The Length value is slightly misleading because it reports the total size of the data object, including missing values.
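The difference between the two counts can be illustrated in plain Python: the length counts every point in the data object, while a good-point count skips missing values. `count_good_points()` is a hypothetical name used only for this illustration; it is not an existing McIDAS-V function.

```python
import math

def count_good_points(values):
    """Count values that are not missing (NaN)."""
    return sum(1 for v in values if not math.isnan(v))

nan = float("nan")
masked = [2.5, nan, 4.0, nan, nan, 3.0]
print("Length:: %d" % len(masked))
print("Good points:: %d" % count_good_points(masked))
```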

Thanks,
Joleen