Notes

From ViRBO


Contents

  1. Output Formats
    1. Text
    2. Text with integer timestamps
    3. Text with ISO 8601 timestamps
    4. MATLAB 6.5+ script
    5. IDL 6.4+ script
  2. Averaging
  3. Fill data
  4. Time grid
  5. Time stamps
  6. Filtering
  7. Versioning
    1. Merged Files
    2. Other Data
  8. Merged files
    1. CDF
      1. IDL
      2. MATLAB
    2. MAT
    3. Zip

1. Output Formats

1.1. Text

The output is

YYYY MM DD HH MM SS MS mS nS p1 p2 p3

If any of MM (minute), SS (second), MS (millisecond), mS (microsecond), or nS (nanosecond) is always zero, that column is omitted.
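A line in this format could be parsed along the following lines. This is a minimal Python sketch, not ViRBO code. It assumes that omitted time columns are always the trailing ones and that the caller knows how many time columns the file contains; the function name `parse_text_line` and the parameter `n_time_fields` are hypothetical.

```python
from datetime import datetime, timedelta

def parse_text_line(line, n_time_fields=9):
    """Split one output line into a datetime and a list of float values.

    n_time_fields is supplied by the caller (assumption): how many of
    YYYY MM DD HH MM SS MS mS nS are present in this particular file.
    """
    fields = line.split()
    t = [int(f) for f in fields[:n_time_fields]]
    t += [0] * (9 - len(t))  # assumed: omitted trailing columns are zero
    year, month, day, hour, minute, sec, ms, us, ns = t
    # datetime resolves microseconds only, so nanoseconds are truncated.
    when = datetime(year, month, day, hour, minute, sec) + timedelta(
        milliseconds=ms, microseconds=us + ns // 1000)
    values = [float(f) for f in fields[n_time_fields:]]  # "NaN" parses to nan
    return when, values

when, values = parse_text_line("2000 07 04 06 00 00 1.5 NaN 3.0", n_time_fields=6)
print(when.isoformat(), values)
```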

1.2. Text with integer timestamps

The numbering is relative to the start day. For example, if the cadence is six hours and the start date is July 4, 2000, then the output would be

0 1
1 2
2 NaN
3 4
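The integer timestamps can be mapped back to calendar time once the start date and cadence are known. The Python sketch below assumes that index 0 corresponds to 00:00 on the start day; the text above does not say whether the index labels the start or the center of an interval, so treat that as an assumption. The function name `index_to_time` is hypothetical.

```python
from datetime import datetime, timedelta

def index_to_time(index, start, cadence):
    """Calendar time of an integer timestamp counted from the start day."""
    return start + index * cadence

start = datetime(2000, 7, 4)    # start date from the example above
cadence = timedelta(hours=6)    # cadence from the example above

for line in ["0 1", "1 2", "2 NaN", "3 4"]:
    index, value = line.split()
    print(index_to_time(int(index), start, cadence), float(value))
```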

1.3. Text with ISO 8601 timestamps

In the form of

2000-01-01T01:01:01.00000Z p1 p2 p3

The timestamp follows the ISO 8601 standard, with a trailing Z indicating UTC.
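Timestamps in this form can be read with the Python standard library. The sketch below assumes that the fractional-second part has at most six digits and that the data columns are numeric; the function name `parse_iso_line` is hypothetical.

```python
from datetime import datetime, timezone

def parse_iso_line(line):
    """Return (UTC datetime, list of floats) for one output line."""
    stamp, *rest = line.split()
    # %f accepts one to six fractional digits; the trailing Z means UTC.
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    return when, [float(v) for v in rest]

when, values = parse_iso_line("2000-01-01T01:01:01.00000Z 1.0 NaN 3.0")
print(when, values)
```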

1.4. MATLAB 6.5+ script

A MATLAB script is returned that loads the selected data into memory.

1.5. IDL 6.4+ script

An IDL script is returned that loads the selected data into memory.

2. Averaging

Averaging is performed by computing the block mean of the valid data points. For example, if the native cadence is 6 hours with timestamp/value pairs of

03:00:00 1.0
09:00:00 2.0
15:00:00 NaN
21:00:00 4.0

then a 1-day average would give (1.0+2.0+4.0)/3, or about 2.33, because the NaN value is excluded from both the sum and the count.

If all values are NaN, the average is NaN.
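The rule described in this section can be written as a short Python sketch. This illustrates the stated behavior only and is not the ViRBO implementation; the function name `nan_mean` is hypothetical.

```python
import math

def nan_mean(values):
    """Mean of the non-NaN entries; NaN if there are none."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan

print(nan_mean([1.0, 2.0, math.nan, 4.0]))  # (1.0 + 2.0 + 4.0) / 3
print(nan_mean([math.nan, math.nan]))       # no valid points
```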

3. Fill data

All data served by ViRBO use IEEE 754 NaN as a uniform fill value. NaN represents either a fill value inserted when the data were placed on a uniform time grid or an invalid data point that was, for example, indicated by -999 in the original data file.
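As an illustration of the second case, a sentinel such as -999 in an original file can be mapped to NaN as follows. This Python sketch assumes a single numeric sentinel per data set; the function name `to_nan` and its `sentinel` parameter are hypothetical.

```python
import math

def to_nan(values, sentinel=-999.0):
    """Replace every occurrence of the sentinel fill value with NaN."""
    return [math.nan if v == sentinel else v for v in values]

cleaned = to_nan([1.0, -999.0, 3.0])
print(cleaned)
# NaN never compares equal to itself, so test for it with math.isnan.
print([math.isnan(v) for v in cleaned])
```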

4. Time grid

Some data are manipulated more quickly if they are on a uniform time grid (this depends on how much the data compresses and the sparseness of the data). If this is the case, then the data served by ViRBO will be on a uniform time grid, but there will be a one-to-one match between the original data and the served data only where there are non-fill data.
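The idea of a uniform grid with fill values can be sketched in Python as follows. The sketch assumes that every original timestamp falls exactly on a grid point; the function name `to_uniform_grid` and its parameters are hypothetical, and this is not the ViRBO implementation.

```python
import math
from datetime import datetime, timedelta

def to_uniform_grid(samples, start, cadence, n):
    """Place samples (dict: datetime -> value) on an n-point uniform grid.

    Grid points with no original sample receive NaN, so original and
    gridded values match one-to-one only where the data are non-fill.
    """
    grid = []
    for i in range(n):
        t = start + i * cadence
        grid.append((t, samples.get(t, math.nan)))
    return grid

start = datetime(2000, 7, 4)
samples = {start: 1.0, start + timedelta(hours=12): 2.0}
for t, v in to_uniform_grid(samples, start, timedelta(hours=6), 4):
    print(t, v)
```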

5. Time stamps

For consistency, timestamps are placed at the center of the averaging interval. This applies when the timestamps in the original file label the start of an interval but the value is an average over the entire interval. For example, if the input file is

YYYY MM DD HH Data
1972 06 23 00  1
1972 06 23 01  2

then the corresponding output file will contain

         Time             Data
1972-06-23 00:30:00.000    1
1972-06-23 01:30:00.000    2

Note that there are exceptions to the convention of a time stamp labeling the start of the averaging interval in a text file with data on a uniform time grid. One example is some 1-minute ground magnetometer data files. The first value in a given file with one day of data may have an averaging interval that starts 30 seconds into the previous day.
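The shift from a start-of-interval label to a center-of-interval label amounts to adding half the averaging interval. A minimal Python sketch, using the hourly example from this section (the function name `center_timestamp` is hypothetical):

```python
from datetime import datetime, timedelta

def center_timestamp(interval_start, interval):
    """Move a start-of-interval timestamp to the center of the interval."""
    return interval_start + interval / 2

hour = timedelta(hours=1)
for start, value in [(datetime(1972, 6, 23, 0), 1), (datetime(1972, 6, 23, 1), 2)]:
    print(center_timestamp(start, hour), value)
```

For data sets that follow a different labeling convention, such as the magnetometer files mentioned above, the offset would have to be chosen per data set.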

6. Filtering

The filtering option is used primarily by Autoplot in order to allow large time ranges to be visualized by loading only the data required to give the same pixel view as if all of the data were loaded.

  • Number Valid - For a data set that has been averaged to a uniform cadence, this is the number of values used in the average. For a data set with uniform cadence with no averaging selected, this is either 1 or 0.
  • Maximum/Minimum - For a data set that has been averaged to a uniform cadence, this is the maximum/minimum value in the averaging interval. For a data set with uniform cadence with no averaging selected, this is the same as no filtering.

The Number Valid filter can be used along with averaging to determine the number of valid data points in a given time range. For example,

[Figure: Unfiltered 1-minute native time step]

[Figure: Max in 1-hour bins]

[Figure: Number valid in each 1-hour bin]
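The Number Valid and Maximum/Minimum filters can be sketched in Python as follows. The sketch assumes input on a uniform grid with NaN as fill, and it returns NaN for the maximum and minimum of a bin with no valid points, which is an assumption rather than documented ViRBO behavior. The function name `bin_filters` is hypothetical.

```python
import math

def bin_filters(values, bin_size):
    """Per-bin number of valid points, maximum, and minimum."""
    out = []
    for i in range(0, len(values), bin_size):
        valid = [v for v in values[i:i + bin_size] if not math.isnan(v)]
        out.append({
            "n_valid": len(valid),
            "max": max(valid) if valid else math.nan,  # assumed: NaN if empty
            "min": min(valid) if valid else math.nan,  # assumed: NaN if empty
        })
    return out

for row in bin_filters([1.0, math.nan, 3.0, math.nan, math.nan, math.nan], 3):
    print(row)
```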

7. Versioning

7.1. Merged Files

Merged files have names of the form NAME_merged_YYYYMMDD-vX. YYYYMMDD is the day of the last timestamp of the data in the file, and vX is the version of the data. When a new merged file is created, the version number changes only if the data in the time range covered by the old merged file differ.
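The naming convention can be parsed with a regular expression. This Python sketch assumes that X in vX is a non-negative integer, as in the file name OMNI_OMNI2_merged_20090112-v0 used elsewhere on this page; the names `MERGED_NAME` and `parse_merged_name` are hypothetical.

```python
import re
from datetime import date, datetime

# NAME_merged_YYYYMMDD-vX, optionally followed by extensions such as .cdf.zip
MERGED_NAME = re.compile(r"^(?P<name>.+)_merged_(?P<date>\d{8})-v(?P<version>\d+)")

def parse_merged_name(filename):
    """Return (NAME, date of last timestamp, version number)."""
    m = MERGED_NAME.match(filename)
    if m is None:
        raise ValueError("not a merged file name: " + filename)
    last_day = datetime.strptime(m.group("date"), "%Y%m%d").date()
    return m.group("name"), last_day, int(m.group("version"))

print(parse_merged_name("OMNI_OMNI2_merged_20090112-v0.cdf.zip"))
```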

7.2. Other Data

Many of the data sets available through ViRBO have been pre-cached. This is for two reasons:

  1. To allow for versioning of data from unversioned databases. For example, some databases use files whose names do not include a version. Data accessed from ViRBO are generally accessed from a URL that contains a version label. The versioning convention is that if a data set only grows, its version does not change. Any other change results in an increment of the ViRBO version label.
  2. To improve access speed. Without pre-caching some of the data files, non-trivial data requests and queries could not be performed or would require response times that are on the order of minutes, even when the size of the response is small.

8. Merged files

Merged files contain all of the data for a particular data subset and were created to simplify data preparation for long-time-scale analysis. All merged files are compressed in the zip format and are updated every 6-12 months if new data are available (or sooner, on request to virbo@virbo.org). If your unzip program reports that the zip file is corrupt, install a zip64-capable decompressor, which can handle zip files that expand to more than 4 GB. See the Zip section (8.3) for more information.

8.1. CDF

This section contains examples and information about reading CDF files, and subsets of large CDF files, in IDL and MATLAB. See the Time stamps section (5) for information on time stamp conventions.

Internally, the CDF variables are not compressed. This choice was made to optimize the access speed of a subset of a variable at the expense of disk space (in order to access a subset of a compressed variable, the entire variable must be extracted to a temporary file). The sparse records feature (see the CDF user's guide, section 1.4.4) was not used, so that the CDF files can be read by older software.

8.1.1. IDL

More software for reading CDF files is available at http://cdf.gsfc.nasa.gov/html/FAQ.html#cdfsw.

Basic CDF read of a single variable from a CDF file:

pro list_and_dump_variables                                                     
 
; Tested on IDL 6.4.  
; IDL 6.2 needs CDF reader patch http://cdf.gsfc.nasa.gov/html/cdf_patch_for_idl6x_new.html
; (merged files were created with CDF 3.2 library and apparently, CDF files are not 
; backwards compatible?!)
; First download and unzip
; ftp://virbo.org/OMNI/OMNI2/merged/OMNI_OMNI2_merged_20090112-v0.cdf.zip                                        
; You may see this non-critical warning:                                        
; CDF_CONTROL: Function completed but: NO_PADVALUE_SPECIFIED: A pad             
; value has not been specified.                                                 
 
file= 'OMNI_OMNI2_merged_20090112-v0.cdf'    
 
cdf= cdf_open( file )                                                           
x  = cdf_inquire( cdf )                                                         
 
nvars= x.nzvars ; assume only z variables are used.
print, 'file ',file,' contains ', nvars, ' variables:'
for i=0,nvars-1 do begin
   cdf_control, cdf, variable=i, /zvar, get_var_info=info1
   info2= cdf_varinq( cdf, i, /zvar )
   ; maxrecs is the largest zero-based record number, so the count is maxrecs+1
   print, ' ', info2.name, ': ', info1.maxrecs+1, ' records found.'
endfor

; After the loop, info2 describes the last variable in the file; read that one.
variable= info2.name

cdf_control, cdf, variable=variable, get_var_info=info1
cdf_varget, cdf, variable, data, rec_start=0, rec_count=info1.maxrecs+1
cdf_close, cdf ; release the file once the data are in memory

print,''
print,''
print,'DATA array is variable' ,' -',info2.name,'- in ',file
help, data

end

8.1.2. MATLAB

Prior to MATLAB 7.5, a number of problems with the MATLAB CDF file readers caused file reads to be 100 times slower than comparable reads using other programs. For these versions of MATLAB, please use the merged MAT binary files (section 8.2) instead of the merged CDF binary files.

The ViRBO developers identified the problems and worked with the MATLAB and CDF developers to implement fixes. (Note that a number of additional improvements to the MATLAB CDF readers are posted at http://cdf.gsfc.nasa.gov.)

To quickly read typical CDF files using MATLAB 7.5 and later, use the following syntax

cdfread('file.cdf','ConvertEpochToDatenum',1,'CombineRecords', 1, ...);

For example, to read the entire file, use

cdfread('file.cdf','ConvertEpochToDatenum',1,'CombineRecords', 1);

and to read a single variable, use

cdfread('file.cdf','ConvertEpochToDatenum',1,'CombineRecords', 1,'Variable',VariableName);

where VariableName is a string such as 'By' or 'Dst'.

Example: Extracting a subset of data in a time range

See demo file in http://virbo.org/svn/virbo/cdf

Example: Inspecting contents of file

% Use this to inspect the contents of a merged CDF file.
FILE = 'OMNI_OMNI2_merged_20090112-v0.cdf.zip';                                 
if ~exist(FILE,'file')  % only download if the zip file is not already present
  fprintf('Downloading %s\n',FILE);                                             
  urlwrite(['ftp://virbo.org/OMNI/OMNI2/merged/',FILE],FILE);        
  unzip(FILE);                                                                  
end                                                                             
FILE = regexprep(FILE,'\.zip$','');  % strip only the trailing .zip extension
 
VARNUM = 41;                                                                    
info   = cdfinfo(FILE);                                                         
 
fprintf('Variables in %s:\n',FILE);                                             
for i = 1:size(info.Variables,1)                                                
  fprintf('%3d %s\n',i,info.Variables{i,1});                                    
end                                                                             
 
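% Compare major and minor version numerically, so that 7.10 is not read as 7.1.
v = sscanf(version,'%d.%d');  % [major; minor] of the running MATLAB
if (v(1) < 7 || (v(1) == 7 && v(2) < 5))
  fprintf('Using slow read method. See http://virbo.org/Notes\n');
  fprintf('Reading time variable ''Epoch''\n');
  epoch = cdfread(FILE,'Variable','Epoch');
  mldn  = todatenum(cat(1,epoch{:}));
  fprintf('Reading variable ''%s''\n',info.Variables{VARNUM,1});
  data = cdfread(FILE,'Variable',info.Variables{VARNUM,1});
  data = cat(1,data{:});  % cell of records -> numeric column (assumes scalar records)
  % Read all data:
  % Data = cdfread(FILE);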
else                                                                            
  mldn = cdfread(FILE,'ConvertEpochToDatenum',1,...                             
                  'CombineRecords', 1, ...                                      
                  'Variable','Epoch');                                          
  data = cdfread(FILE,'ConvertEpochToDatenum',1,...                             
                 'CombineRecords', 1, ...                                       
                 'Variable',info.Variables{VARNUM,1});                          
  % Read all data:                                                              
  % Data = cdfread(FILE,'ConvertEpochToDatenum',1,'CombineRecords', 1); 
end                                                                             
 
varname = info.Variables{VARNUM,1};                                             
eval(sprintf('%s = data;',varname));                                            
whos(varname);                                                                  
fprintf('Variable %d is %s\n\n',VARNUM,varname);                                
fprintf('Start Time = %s\nEnd Time   = %s\n\n',...                              
        datestr(mldn(1)),datestr(mldn(end)));                                   
plot(mldn,data);                                                                
xlabel('MATLAB DATENUM (Days since Jan. 1 0000)');                              
ylabel(varname);

8.2. MAT

MAT files are MATLAB version 6 binary files that can be read using MATLAB 6.0 and later or Octave 2.9 and later. For MATLAB versions prior to 7.5, MAT files may be a better option than CDF files. To see the list of variables in the file, use

whos -file FILENAME.mat

To read a single variable, use

load FILENAME.mat VARIABLENAME

The time variable Time is a MATLAB DATENUM that uses the same time stamp convention as the CDF file (see section 5, Time stamps).
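When the Time variable is used outside MATLAB, the DATENUM values need to be converted. The Python sketch below relies on the fact that a MATLAB DATENUM is a day count that exceeds Python's proleptic Gregorian ordinal by 366. The function name `datenum_to_datetime` is hypothetical, and fractional days are subject to floating-point rounding.

```python
from datetime import datetime, timedelta

def datenum_to_datetime(dn):
    """Convert a MATLAB DATENUM (days, with fraction) to a Python datetime."""
    days = int(dn)     # whole days
    frac = dn - days   # fraction of a day
    # MATLAB day 1 is Jan 1, 0000; Python ordinal 1 is Jan 1, 0001.
    return datetime.fromordinal(days - 366) + timedelta(days=frac)

print(datenum_to_datetime(730486.5))  # MATLAB datenum(2000,1,1) is 730486
```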

8.3. Zip

On Linux, the default unzip program (as of January 2009) does not work for files that uncompress to more than 4 GB. You can use 7zip instead, for example,

sudo apt-get install p7zip p7zip-full ; /usr/bin/7z e MERGE_FILENAME

or compile a version 6 beta version of InfoZip's unzip (version 5.5 is typically used in Linux distributions).

ftp://ftp.info-zip.org/pub/infozip/beta/

On Windows, you can use a WinZip version later than 11.0 or 7zip.
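Another option, not mentioned in the original notes, is Python's standard zipfile module, which can read archives that use the ZIP64 extensions. A minimal sketch (the function name `extract_merged_zip` is hypothetical):

```python
import zipfile

def extract_merged_zip(zip_path, dest_dir="."):
    """Extract every member of a (possibly ZIP64) archive into dest_dir.

    Returns the list of member names found in the archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

For example, extract_merged_zip("OMNI_OMNI2_merged_20090112-v0.cdf.zip") would unpack that merged file into the current directory, provided enough disk space is available for the uncompressed data.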

Retrieved from "http://virbo.org//Notes"