Notes
1. Output Formats
1.1. Text
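The output has the columns
YYYY MM DD HH MM SS MS mS nS p1 p2 p3
where YYYY MM DD is the date, HH the hour, the second MM the minute, SS the second, MS the millisecond, mS the microsecond, nS the nanosecond, and p1, p2, p3 the parameter values. If any of the minute, second, millisecond, microsecond, or nanosecond columns is always zero, it is omitted.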
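The omission rule can be sketched in Python as follows. This is a minimal illustration, not ViRBO's server code: the timestamp tuple layout, the zero-padded column widths, and the names `format_rows` and `fmt_value` are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

# (year, month, day, hour, minute, second, millisecond, microsecond, nanosecond)
Stamp = Tuple[int, int, int, int, int, int, int, int, int]

# Zero-padded width of each time column (an assumption for this sketch).
WIDTHS = (4, 2, 2, 2, 2, 2, 3, 3, 3)
ALWAYS = [0, 1, 2, 3]   # YYYY MM DD HH are always written
OPTIONAL = range(4, 9)  # minute, second, ms, us, ns may be omitted


def fmt_value(v: float) -> str:
    """Write fill values as NaN and other values compactly."""
    return "NaN" if math.isnan(v) else f"{v:g}"


def format_rows(stamps: Sequence[Stamp],
                values: Sequence[Sequence[float]]) -> List[str]:
    """Format one text line per record, dropping any optional time
    column that is zero in every record."""
    keep = [i for i in OPTIONAL if any(s[i] != 0 for s in stamps)]
    cols = ALWAYS + keep
    lines = []
    for s, row in zip(stamps, values):
        time_part = " ".join(str(s[i]).zfill(WIDTHS[i]) for i in cols)
        lines.append(time_part + " " + " ".join(fmt_value(v) for v in row))
    return lines


for line in format_rows([(2000, 1, 1, 0, 30, 0, 0, 0, 0),
                         (2000, 1, 1, 1, 30, 0, 0, 0, 0)],
                        [[1.0, 2.5], [float("nan"), 4.0]]):
    print(line)
```

Here the seconds and sub-second columns are zero in every record, so only the minute column is kept after the hour.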
1.2. Text with integer timestamps
The integer timestamps count cadence steps relative to the start day. For example, if the cadence is six hours and the start date is July 4, 2000, the output would be
0 1
1 2
2 NaN
3 4
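Assuming that integer timestamp n means n cadence steps after 00:00 on the start day (the text does not say whether the label marks the start or the center of each interval), an integer timestamp can be converted back to a calendar time as below. The helper name `index_to_time` is hypothetical.

```python
from datetime import datetime, timedelta


def index_to_time(n: int, start_day: datetime, cadence: timedelta) -> datetime:
    """Convert integer timestamp n to a calendar time.

    Assumes n counts cadence steps from 00:00 on the start day."""
    midnight = start_day.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + n * cadence


# The example above: six-hour cadence starting on July 4, 2000.
start = datetime(2000, 7, 4)
six_hours = timedelta(hours=6)
for n, value in [(0, 1.0), (1, 2.0), (2, float("nan")), (3, 4.0)]:
    print(index_to_time(n, start, six_hours).isoformat(), value)
```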
1.3. Text with ISO 8601 timestamps
Each line has the form
2000-01-01T01:01:01.00000Z p1 p2 p3
See the ISO 8601 standard for a description of the timestamp format.
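A line in this format can be parsed with the Python standard library, as in the sketch below. It assumes every timestamp has a fractional-seconds part and a trailing `Z` (UTC), as in the example line; `parse_iso_line` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def parse_iso_line(line: str) -> Tuple[datetime, List[float]]:
    """Parse 'YYYY-MM-DDThh:mm:ss.fffffZ p1 p2 ...' into (UTC datetime, values)."""
    stamp, *fields = line.split()
    # %f accepts 1 to 6 fractional digits; the trailing 'Z' marks UTC.
    t = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # float() understands 'NaN', the ViRBO fill value.
    return t, [float(f) for f in fields]


print(parse_iso_line("2000-01-01T01:01:01.00000Z 1.5 NaN 3"))
```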
1.4. MATLAB 6.5+ script
A MATLAB script is returned that loads the selected data into memory.
1.5. IDL 6.4+ script
An IDL script is returned that loads the selected data into memory.
2. Averaging
Averaging is performed by computing the block mean of the valid (non-NaN) data points. For example, if the native cadence is 6 hours with timestamp/value pairs of
03:00:00 1.0
09:00:00 2.0
15:00:00 NaN
21:00:00 4.0
then a 1-day average would give (1.0+2.0+4.0)/3 ≈ 2.33; the NaN value is excluded.
If all values are NaN, the average is NaN.
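The averaging rule for a single block can be written as a short function. This is a minimal sketch of a NaN-aware block mean, not ViRBO's own code; `block_mean` is a hypothetical name.

```python
import math
from typing import Sequence


def block_mean(values: Sequence[float]) -> float:
    """Mean of the non-NaN values in one averaging block; NaN if none are valid."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return float("nan")
    return sum(valid) / len(valid)


# The four 6-hour values from the example, averaged to one day: (1.0 + 2.0 + 4.0) / 3
print(block_mean([1.0, 2.0, float("nan"), 4.0]))
```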
3. Fill data
All data served by ViRBO use the IEEE 754 NaN value as a uniform fill value. NaN represents either a fill value inserted when the data were placed on a uniform time grid or an invalid data point, for example one indicated by -999 in the original data file.
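When preparing your own data for comparison with ViRBO data, a file-specific fill value can be mapped to NaN as sketched below. The default of -999 comes from the example above; real files may use other fill values, and `to_nan_fill` is a hypothetical name.

```python
import math
from typing import Iterable, List


def to_nan_fill(values: Iterable[float], fill: float = -999.0) -> List[float]:
    """Replace a file-specific fill value with IEEE 754 NaN."""
    return [float("nan") if v == fill else float(v) for v in values]


cleaned = to_nan_fill([3.2, -999, 4.1])
# NaN never compares equal to itself, so test with math.isnan, not ==.
print([math.isnan(v) for v in cleaned])  # [False, True, False]
```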
4. Time grid
Some data sets can be processed more quickly if they are placed on a uniform time grid (whether this is true depends on how well the data compress and how sparse they are). In that case, ViRBO serves the data on a uniform time grid, and the served data match the original data one-to-one only at non-fill points.
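The one-to-one property can be illustrated with a small sketch that places samples on a uniform grid and fills every other grid point with NaN. It assumes each original sample time falls exactly on a grid point; the function name `to_uniform_grid` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def to_uniform_grid(samples: Dict[datetime, float], start: datetime,
                    cadence: timedelta, n: int) -> List[float]:
    """Place samples on the n-point grid start, start + cadence, ...

    Grid points with no sample get NaN. Assumes every sample time falls
    exactly on a grid point; off-grid or out-of-range samples are ignored."""
    grid = [float("nan")] * n
    for t, v in samples.items():
        k, rem = divmod(t - start, cadence)  # grid index and leftover offset
        if rem == timedelta(0) and 0 <= k < n:
            grid[k] = v
    return grid


print(to_uniform_grid({datetime(2000, 1, 1, 0): 1.0, datetime(2000, 1, 1, 2): 3.0},
                      datetime(2000, 1, 1), timedelta(hours=1), 4))
```

Only the non-NaN grid points correspond to original samples; the NaN points are fill.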
5. Time stamps
For consistency, when the timestamps in the original file label the start of an interval but each value is an average over the entire interval, the served timestamps are placed at the center of the interval. For example, if the input file is
YYYY MM DD HH Data
1972 06 23 00 1
1972 06 23 01 2
then the corresponding output file will contain
Time Data
1972-06-23 00:30:00.000 1
1972-06-23 01:30:00.000 2
Note that some text files with data on a uniform time grid do not follow the convention that a time stamp labels the start of the averaging interval. For example, in some 1-minute ground magnetometer data files containing one day of data, the averaging interval of the first value may start 30 seconds before midnight, in the previous day.
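The shift from a start-of-interval label to a center label is a half-interval offset, sketched below for the hourly example. It applies only when the original stamps label interval starts and the values are averages over the whole interval; `center_stamp` is a hypothetical name.

```python
from datetime import datetime, timedelta


def center_stamp(start_of_interval: datetime, interval: timedelta) -> datetime:
    """Move a start-of-interval timestamp to the center of its averaging interval."""
    return start_of_interval + interval / 2


# The hourly example: 1972-06-23 00:00 -> 00:30 and 01:00 -> 01:30.
for hour in (0, 1):
    t = center_stamp(datetime(1972, 6, 23, hour), timedelta(hours=1))
    print(t.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])  # trim to milliseconds
```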
6. Filtering
The filtering option is used primarily by Autoplot so that large time ranges can be visualized by loading only the data needed to produce the same on-screen image as loading all of the data.
- Number Valid - For a data set that has been averaged to a uniform cadence, this is the number of values used in the average. For a data set with uniform cadence with no averaging selected, this is either 1 or 0.
- Maximum/Minimum - For a data set that has been averaged to a uniform cadence, this is the maximum/minimum value in the averaging interval. For a data set with uniform cadence with no averaging selected, this is the same as no filtering.
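The Number Valid filter can be used along with averaging to determine the number of valid data points in a given time range. For example, averaging data with a 1-minute native time step to 1-hour bins gives between 0 and 60 valid values per bin.
(Figures: unfiltered data at the 1-minute native time step; number of valid values in each 1-hour bin.)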
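The three filters can be sketched for data already on a uniform grid with NaN fill. This assumes bins are consecutive runs of `per_bin` native samples starting at the first sample; `bin_filters` is a hypothetical name, not an Autoplot or ViRBO function.

```python
import math
from typing import List, Sequence, Tuple


def bin_filters(values: Sequence[float], per_bin: int) -> List[Tuple[int, float, float]]:
    """Return (number valid, maximum, minimum) for consecutive bins of
    per_bin native samples; max and min are NaN when a bin has no valid values."""
    out = []
    for i in range(0, len(values), per_bin):
        valid = [v for v in values[i:i + per_bin] if not math.isnan(v)]
        if valid:
            out.append((len(valid), max(valid), min(valid)))
        else:
            out.append((0, float("nan"), float("nan")))
    return out
```

For 1-minute data averaged to 1-hour bins, `per_bin` would be 60.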
7. Versioning
7.1. Merged Files
Merged files have names of the form NAME_merged_YYYYMMDD-vX. YYYYMMDD is the day of the last timestamp of the data in the file, and vX is the version of the data. When a new merged file is created, the version number does not change unless the data in the time range covered by the old merged file have changed.
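A merged-file name can be split into its parts with a regular expression, as in this sketch. It assumes X is a non-negative integer (the example files use v0) and allows trailing extensions such as .cdf.zip; `parse_merged_name` and `MergedName` are hypothetical names.

```python
import re
from datetime import date
from typing import NamedTuple

# NAME_merged_YYYYMMDD-vX, optionally followed by extensions such as .cdf.zip
MERGED_RE = re.compile(r"^(?P<name>.+)_merged_(?P<ymd>\d{8})-v(?P<version>\d+)(?P<ext>\..*)?$")


class MergedName(NamedTuple):
    name: str
    last_day: date  # day of the last timestamp in the file
    version: int


def parse_merged_name(filename: str) -> MergedName:
    m = MERGED_RE.match(filename)
    if m is None:
        raise ValueError(f"not a merged-file name: {filename}")
    ymd = m.group("ymd")
    return MergedName(m.group("name"),
                      date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:])),
                      int(m.group("version")))


print(parse_merged_name("OMNI_OMNI2_merged_20090112-v0.cdf.zip"))
```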
7.2. Other Data
Many of the data sets available through ViRBO have been pre-cached. This is for two reasons:
- To allow versioning of data from unversioned databases. For example, some databases use file names that do not include a version. Data from ViRBO are generally accessed through a URL that contains a version label. The versioning convention is that if a data set grows, its version does not change; any other change increments the ViRBO version label.
- To improve access speed. Without pre-caching some of the data files, non-trivial data requests and queries could not be performed or would require response times that are on the order of minutes, even when the size of the response is small.
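The versioning convention can be expressed as a comparison between the old and new records. This sketch assumes a data set is a sequence of time-ordered values and that "grows" means records are only appended; NaN fill values are treated as equal. The names `same_value` and `next_version` are hypothetical.

```python
import math
from typing import Sequence


def same_value(a: float, b: float) -> bool:
    """Equality that treats two NaN fill values as equal."""
    return (math.isnan(a) and math.isnan(b)) or a == b


def next_version(old: Sequence[float], new: Sequence[float], version: int) -> int:
    """Keep the version if `new` only appends records to `old`; otherwise increment."""
    grew_only = len(new) >= len(old) and all(
        same_value(a, b) for a, b in zip(old, new))
    return version if grew_only else version + 1
```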
8. Merged files
Merged files contain all of the data for a particular data subset and were created to simplify data preparation for long-time-scale analysis. All merged files are compressed in zip format and are updated every 6-12 months if new data are available (or sooner, on request to virbo@virbo.org). If your unzip program reports that a zip file is corrupt, install a zip64-capable decompressor that can handle zip files that expand to more than 4 GB. See section 8.3 (Zip) for more information.
8.1. CDF
This section contains examples and information about reading CDF files, and subsets of large CDF files, in IDL (section 8.1.1) and MATLAB (section 8.1.2). See section 5 (Time stamps) for information on time stamp conventions.
Internally, the CDF variables are not compressed. This choice favors fast access to a subset of a variable at the expense of disk space: to access a subset of a compressed variable, the entire variable must first be decompressed to a temporary file. Sparse records (see the CDF User's Guide, section 1.4.4) were not used, so that the CDF files can be read by older software.
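If a CDF reader returns the raw values of the Epoch variable, they can be converted to calendar time. The sketch below assumes Epoch is stored as CDF_EPOCH (milliseconds since 0000-01-01T00:00:00.000), not CDF_EPOCH16 or TT2000; reading the values themselves requires a CDF library and is not shown. `cdf_epoch_to_datetime` is a hypothetical name.

```python
from datetime import datetime, timedelta

# CDF_EPOCH values are milliseconds since 0000-01-01T00:00:00.000.
# Python's datetime starts at year 1, and year 0 (a leap year in the
# proleptic Gregorian calendar) has 366 days, so shift by that amount.
MS_AT_YEAR_1 = 366 * 86_400_000


def cdf_epoch_to_datetime(epoch_ms: float) -> datetime:
    """Convert a CDF_EPOCH value (ms since year 0) to a naive UTC datetime."""
    return datetime(1, 1, 1) + timedelta(milliseconds=epoch_ms - MS_AT_YEAR_1)


print(cdf_epoch_to_datetime(62167219200000))  # 1970-01-01 00:00:00
```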
8.1.1. IDL
More software for reading CDF files is available at http://cdf.gsfc.nasa.gov/html/FAQ.html#cdfsw.
Basic CDF read of a single variable from a CDF file:
pro list_and_dump_variables
  ; Tested on IDL 6.4.
  ; IDL 6.2 needs the CDF reader patch at
  ; http://cdf.gsfc.nasa.gov/html/cdf_patch_for_idl6x_new.html
  ; (merged files were created with the CDF 3.2 library, and CDF files
  ; apparently are not backward compatible).
  ; First download and unzip
  ; ftp://virbo.org/OMNI/OMNI2/merged/OMNI_OMNI2_merged_20090112-v0.cdf.zip
  ; You may see this non-critical warning:
  ;   CDF_CONTROL: Function completed but: NO_PADVALUE_SPECIFIED: A pad
  ;   value has not been specified.
  file = 'OMNI_OMNI2_merged_20090112-v0.cdf'
  cdf = cdf_open(file)
  x = cdf_inquire(cdf)
  nvars = x.nzvars ; assume only z variables are used.
  print, 'file ', file, ' contains ', nvars, ' variables:'
  ; List every zVariable and its record count.
  for i = 0, x.nzvars-1 do begin
    cdf_control, cdf, variable=i, /zvar, get_var_info=info1
    info2 = cdf_varinq(cdf, i, /zvar)
    print, '  ', info2.name, ': ', info1.maxrecs, ' records found.'
  endfor
  ; Read all records of the last variable listed above.
  variable = info2.name
  cdf_control, cdf, variable=variable, get_var_info=info1
  cdf_varget, cdf, variable, data, rec_start=0, rec_count=info1.maxrecs
  cdf_close, cdf
  print, ''
  print, ''
  print, 'DATA array is variable', ' -', info2.name, '- in ', file
  help, data
end
8.1.2. MATLAB
Prior to MATLAB 7.5, there were a number of problems with the MATLAB CDF file readers that caused file reads to be 100 times slower than comparable reads using other programs. For these versions of MATLAB, please use the merged MAT binary files (section 8.2) instead of the merged CDF binary files.
The ViRBO developers identified the problems and worked with the MATLAB and CDF developers to implement fixes. (Note that a number of additional improvements to the MATLAB CDF readers are posted at http://cdf.gsfc.nasa.gov.)
To quickly read typical CDF files using MATLAB 7.5 and later, use the following syntax
cdfread('file.cdf','ConvertEpochToDatenum',1,'CombineRecords', 1, ...);
For example, to read the entire file, use
cdfread('file.cdf','ConvertEpochToDatenum',1,'CombineRecords', 1);
and to read a single variable, use
cdfread('file.cdf','ConvertEpochToDatenum',1,'CombineRecords', 1,'Variable',VariableName);
where VariableName is a string such as 'By' or 'Dst'.
Example: Extracting a subset of data in a time range
See the demo file in http://virbo.org/svn/virbo/cdf
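The demo file is not reproduced here. As a generic sketch of the idea, because timestamps are sorted, a binary search on the time variable gives the first record and record count to read (the same quantities passed as rec_start and rec_count in the IDL example). `time_range_indices` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def time_range_indices(times: Sequence[float], t0: float, t1: float) -> Tuple[int, int]:
    """Return (i, j) such that times[i:j] are the samples with t0 <= t <= t1.

    `times` must be sorted in increasing order (for example, datenums)."""
    return bisect_left(times, t0), bisect_right(times, t1)


i, j = time_range_indices([1.0, 2.0, 3.0, 4.0], 2.0, 3.0)
print("first record:", i, "record count:", j - i)
```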
Example: Inspecting contents of file
% Use this to inspect the contents of a merged CDF file.
FILE = 'OMNI_OMNI2_merged_20090112-v0.cdf.zip';
if ~exist(FILE,'file')
    fprintf('Downloading %s\n',FILE);
    urlwrite(['ftp://virbo.org/OMNI/OMNI2/merged/',FILE],FILE);
    unzip(FILE);
end
FILE = regexprep(FILE,'\.zip$','');  % name of the unzipped CDF file
VARNUM = 41;                         % index of the variable to read and plot

info = cdfinfo(FILE);
fprintf('Variables in %s:\n',FILE);
for i = 1:size(info.Variables,1)
    fprintf('%3d %s\n',i,info.Variables{i,1});
end

% Parse major.minor from the version string (str2num would read '7.10' as 7.1).
v = sscanf(version,'%d.%d');
if v(1) < 7 || (v(1) == 7 && v(2) < 5)
    fprintf('Using slow read method. See http://virbo.org/Notes\n');
    fprintf('Reading time variable ''Epoch''\n');
    epoch = cdfread(FILE,'Variable','Epoch');
    mldn = todatenum(cat(1,epoch{:}));
    fprintf('Reading variable ''%s''\n',info.Variables{VARNUM,1});
    data = cdfread(FILE,'Variable',info.Variables{VARNUM,1});
    data = cell2mat(data);  % one cell per record -> numeric array
    % Read all data:
    % Data = cdfread(FILE);
else
    mldn = cdfread(FILE,'ConvertEpochToDatenum',1,...
                   'CombineRecords', 1, ...
                   'Variable','Epoch');
    data = cdfread(FILE,'ConvertEpochToDatenum',1,...
                   'CombineRecords', 1, ...
                   'Variable',info.Variables{VARNUM,1});
    % Read all data:
    % Data = cdfread(FILE,'ConvertEpochToDatenum',1,'CombineRecords', 1);
end

varname = info.Variables{VARNUM,1};
eval(sprintf('%s = data;',varname));
whos(varname);
fprintf('Variable %d is %s\n\n',VARNUM,varname);
fprintf('Start Time = %s\nEnd Time = %s\n\n',...
        datestr(mldn(1)),datestr(mldn(end)));
plot(mldn,data);
xlabel('MATLAB DATENUM (days since Jan. 0, 0000)');
ylabel(varname);
8.2. MAT
MAT files are MATLAB version 6 binary files that can be read by MATLAB 6.0 and later or by Octave 2.9 and later. For MATLAB versions prior to 7.5, MAT files may be a better option than CDF files. To see the list of variables in the file, use
whos -file FILENAME.mat
To read a single variable, use
load FILENAME.mat VARIABLENAME
The time variable Time is a MATLAB DATENUM and uses the same time stamp convention as the CDF file (see section 5, Time stamps).
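Outside MATLAB, a DATENUM can be converted with the Python standard library; reading the MAT file itself needs a third-party package such as SciPy's `scipy.io.loadmat`, which is not shown. MATLAB counts days from January 0, 0000, which is 366 days more than Python's proleptic ordinal. `datenum_to_datetime` is a hypothetical name.

```python
from datetime import datetime, timedelta


def datenum_to_datetime(dn: float) -> datetime:
    """Convert a MATLAB DATENUM (days from January 0, 0000) to a datetime.

    MATLAB counts 366 more days than Python's ordinal, because MATLAB
    includes year 0, a leap year in the proleptic Gregorian calendar."""
    day = int(dn // 1)
    frac = dn - day
    return datetime.fromordinal(day - 366) + timedelta(days=frac)


print(datenum_to_datetime(730486.5))  # 2000-01-01 12:00:00
```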
8.3. Zip
On Linux, the default unzip program (as of January 2009) will not work for files that uncompress to more than 4 GB. You can use 7-Zip, for example,
sudo apt-get install p7zip p7zip-full
/usr/bin/7z e MERGE_FILENAME
or compile a beta release of version 6 of Info-ZIP's unzip (version 5.5 is typically used in Linux distributions), available at
ftp://ftp.info-zip.org/pub/infozip/beta/
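If Python is available, its standard `zipfile` module can also be used, since it reads ZIP64 archives and so handles members that expand to more than 4 GB. The sketch below is an alternative to the tools above, not a ViRBO-provided tool; `extract_merged` is a hypothetical name.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_merged(zip_path: str, dest: str = ".") -> List[str]:
    """Extract every member of a merged-file archive into dest and
    return the member names. zipfile reads ZIP64 archives."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

For example, `extract_merged("OMNI_OMNI2_merged_20090112-v0.cdf.zip")` would unpack the CDF file into the current directory.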