HDMC

Agenda for the HDMC Meeting 4-6 August 2010 (still subject to change--suggestions welcome; check [1] for updates)

George Mason University

1. Logistics

The logistics for this meeting (including the meeting room) are the same as those for the ESSI meeting, which runs Monday, Tuesday, and Wednesday morning.

2. Wednesday 4 August

(Less formal discussion)

13:30 Relevance of HDMC activities; "Vision" of what the idea, architecture, roles of various groups, etc. should be. How do we become plausible? Two aspects: (1) to data providers (MOUs?), and (2) to end users. Discussion of longer-term view of the HPDE: what should it look like in 2-3 years? What will a "VxO" mean, for example? How will Final Archives and VxOs interact? How will things be connected? What standards will be needed to make what services?

15:00 Highlights of the ESSI workshop as perceived by any who were present. See http://virbo.org/ESSI

15:40 Achieving completeness of described resources -- are we on track?

16:10 How is it all registered? How are registries administered? What tools are needed/useful?

17:00 (or when everyone is done) Adjourn for the day.

3. Thursday 5 August

09:00 Welcome and logistics (Bob, Tom)

09:10 Brief overview of where we are, logistically, financially, and otherwise. (Aaron/Jeff)

09:30 Recap of Wednesday's discussions. (Aaron, with help)

(Plan for meeting: For each area present and make precise a proposal, with goals, plans to achieve them with milestones and impacts on all relevant groups, and where we are with respect to the plan.)

09:40 The completeness problem: how to come to closure. (Aaron/Todd)

Working sessions based on the Working Groups

3.1. Registries

Come to a clear plan of what to do and what is expected of everyone. Roles of VxOs, git, VSPO. (Todd)

3.1.1. The Role of Registries

A mechanism whereby relevant repository items and metadata about them can be registered such that a pointer to their location, and all their metadata, can be retrieved as a result of a query. [EBXML_2001a]
Registration accomplishes three main goals: identification, provenance, and monitoring quality. [ISO-11179-part1]

3.1.2. Functional View

Functional Overview

3.1.3. Registry Services

Resolver
- id - The resource ID to locate.
- granules - Return a list of URLs for Granules associated with the resource ID.
- startdate - The start date of the interval of interest.
- stopdate - The stop date of the interval of interest.
- size - Determine the size only of a granule look-up.
- tree - Show the items in tree mark-up at given resource ID prefix.
- recursive - Retrieve the description for the given resource ID and for all resources referenced in the description.
Download
- id - The resource ID to locate.
- startdate - The start date of the interval of interest.
- stopdate - The stop date of the interval of interest.
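
As an illustration of how the Resolver options listed above might be combined into a single request, here is a minimal sketch. The endpoint URL, the example resource ID, and the convention of sending flag options as "name=yes" are all assumptions made for the example; the notes do not specify them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the notes do not give a resolver URL.
RESOLVER_URL = "http://example.org/registry/resolver"


def resolver_query(resource_id, startdate=None, stopdate=None,
                   granules=False, size=False, tree=False, recursive=False):
    """Build a resolver request URL from the options listed above."""
    params = [("id", resource_id)]
    if startdate is not None:
        params.append(("startdate", startdate))
    if stopdate is not None:
        params.append(("stopdate", stopdate))
    # Assumed convention: flag options are sent as "name=yes" when enabled.
    for name, enabled in (("granules", granules), ("size", size),
                          ("tree", tree), ("recursive", recursive)):
        if enabled:
            params.append((name, "yes"))
    return RESOLVER_URL + "?" + urlencode(params)


# The resource ID below is invented for illustration.
url = resolver_query("spase://Example/NumericalData/Demo/PT1M",
                     startdate="2010-08-04", stopdate="2010-08-06",
                     granules=True)
print(url)
```

The Download service takes a subset of the same options (id, startdate, stopdate), so the same kind of helper would apply with a different endpoint.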

3.1.4. Other Approaches

ebXML (ISO 15000)
OAI-PMH

10:45 Break

3.2. Data Access

The solution to the formats problem (making formats transparent), and how to get there quickly (Jon V.)

3.2.1. What is the formats problem?

Tasks that are conceptually simple are made prohibitively difficult due to format and layout differences.

It was noted that there are limits to what solving the formats problem entails: more complex requests (filling data for missing days, for example) should be provided by a different service.

I have one or more time ranges of data, and I want to get them onto my computer in format X.
I want a copy of dataset Y (500 GB) so I can study all of it.
I have a model result and I want to compare it with dataset Z.
I want to compare two or more datasets.
I want to find similar events in many datasets (data mining).

3.2.2. What will solve the formats problem?

A uniform way to obtain any amount of any dataset in the format that you prefer.

Must be the same API for every dataset
Must be able to request one time range, or many time ranges, or the whole time range
Must include a comprehensive set of heliophysics resources
Must deliver data in the format you want

There was discussion about the fact that this does not solve every formats problem (non-time-series data, for example), and it was suggested that these limitations be noted when presenting.

There was discussion about where the line gets drawn between solving the science problem versus the formats problem.

There was a discussion about licenses: if it is a library, then some projects may not be able to use it unless the license is OSI-compatible. The plan is to open-source it, but some pieces are problematic.

3.2.3. At the core: a uniform reader library

"Doubly uniform data reader API".

Takes in descriptions for each data set (configuration files).

3.2.4. Explaining the library

Why a library? It can be embedded into services or used directly in clients.
"doubly uniform" API
- uniform function call or calls for request
- uniform data returned by reader

UniformData d = get_data( name, time_range)

result request

dataset descriptors - SPASE-based
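
To make the "doubly uniform" idea concrete, here is a minimal sketch, not the library's actual design. The UniformData fields, the two toy text layouts, and the dictionary standing in for SPASE-based descriptor files are all invented for illustration. The point is only that the call (get_data with a name and a time range) and the returned structure are the same for every dataset, while the descriptor selects the format-specific reader.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

TimeRange = Tuple[str, str]  # (start, stop) as ISO 8601 strings


@dataclass
class UniformData:
    """Hypothetical uniform result: the same shape for every dataset."""
    name: str
    times: List[str]
    values: List[float]


def read_csv_text(source: str, time_range: TimeRange):
    """Toy reader for 'time,value' lines; the stop time is exclusive."""
    start, stop = time_range
    times, values = [], []
    for line in source.splitlines():
        t, v = line.split(",")
        # ISO 8601 strings in one fixed layout sort chronologically.
        if start <= t < stop:
            times.append(t)
            values.append(float(v))
    return times, values


def read_pairs_text(source: str, time_range: TimeRange):
    """Toy reader for a second layout: 'value@time' items separated by ';'."""
    lines = []
    for item in source.split(";"):
        v, t = item.split("@")
        lines.append(t + "," + v)
    return read_csv_text("\n".join(lines), time_range)


# One reader per native format (many-to-one mapping onto UniformData).
FORMAT_READERS: Dict[str, Callable] = {"csv": read_csv_text,
                                       "pairs": read_pairs_text}

# Stand-ins for per-dataset descriptor files; names and contents are invented.
DESCRIPTORS = {
    "demo_a": {"format": "csv",
               "source": "2010-08-04T00:00:00,1.0\n2010-08-05T00:00:00,2.5"},
    "demo_b": {"format": "pairs",
               "source": "4.0@2010-08-04T12:00:00;5.0@2010-08-06T00:00:00"},
}


def get_data(name: str, time_range: TimeRange) -> UniformData:
    """Same call for every dataset; the descriptor selects the reader."""
    descriptor = DESCRIPTORS[name]
    reader = FORMAT_READERS[descriptor["format"]]
    times, values = reader(descriptor["source"], time_range)
    return UniformData(name, times, values)


window = ("2010-08-04T00:00:00", "2010-08-05T00:00:00")
for dataset in ("demo_a", "demo_b"):
    print(get_data(dataset, window))
```

Adding a dataset in this sketch means adding a descriptor entry; adding a format means adding one reader, with no change to callers.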

3.2.5. Library Use in Client and Server

3.2.6. Standards Needed: Client and Server API

Client - basic access using time range only:

get_data(name,
         time_range (1 or more) -or- URLs (1 or more)
         )

Discussion on where the -or- should go, and on the issue of a reverse lookup being required.

Server - include more complex options for data delivery:

get_data( name,
          output options (output format;
             delivery - streaming or bundled files;
             file bundling options - granularity)
          time_range (1 or more) -or- URLs (1 or more)
          filters (0 or more)
         )
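
One possible way to write down the server-side request above as a data structure is sketched here. The class names, field names, and default values are hypothetical, and treating "-or-" as "exactly one of the two" is an assumption, since where the -or- belongs was itself under discussion.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OutputOptions:
    output_format: str = "csv"         # hypothetical default
    delivery: str = "streaming"        # "streaming" or "bundled"
    granularity: Optional[str] = None  # file bundling option, bundled only


@dataclass
class ServerRequest:
    name: str
    output: OutputOptions = field(default_factory=OutputOptions)
    time_ranges: List[Tuple[str, str]] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)  # 0 or more

    def validate(self) -> None:
        # Assumed reading of "time_range (1 or more) -or- URLs (1 or more)":
        # exactly one of the two lists must be non-empty.
        if bool(self.time_ranges) == bool(self.urls):
            raise ValueError("give time ranges or URLs, not both and not neither")
        if self.output.delivery not in ("streaming", "bundled"):
            raise ValueError("delivery must be 'streaming' or 'bundled'")


request = ServerRequest(
    name="demo_dataset",
    output=OutputOptions(output_format="csv", delivery="bundled",
                         granularity="daily"),
    time_ranges=[("2010-08-04", "2010-08-05"), ("2010-08-06", "2010-08-07")],
)
request.validate()
print(request)
```

The client call is the same structure with only the name and the time ranges (or URLs) filled in.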

Note: this is mostly for discussion within the HDMC Data Access group

Discussion of relation to SPASE-QL for the API -

Discussion of relation to DAP and OPeNDAP: DAP is the transport protocol; this library would be viewed as an "I/O Service Provider" in OPeNDAP terminology.

3.2.7. Client API

is very simple (time range only)
- very easy to implement if you need a custom reader
- more complex capabilities are generic and based purely on simple access API
delivers data to an internal model
- many-to-one mapping
- actual data model is not too important
- output mechanisms need to be written separately

3.2.8. Server API

Must include data delivery options (streaming, files, links with notifications)
Include more advanced features (filtering, merging datasets)
Should be based on existing server technology (OPeNDAP?)

3.2.9. Uniform Data Delivery

System components

Clear focus and expertise

uniform reader library
many dataset descriptors in configuration files

Collaboration is important

service that uses the library to provide uniform data
client integration tools for library API
client integration tools for server API

(Following this was a series of slides discussing these elements in more detail.)

3.2.10. Schedule

The idea is to have something working by the Fall 2010 AGU meeting, and to provide periodic updates to interested parties via the HDMC Data Access Google group.

3.2.11. Discussion

IDL save set option? No: it would be a large project, and there are other issues such as the proprietary format. However, Lindholm has a partial implementation, and C libraries exist for this.
An audience member requested a specification of the output.

12:00 Lunch

3.3. Visualization

Identify the most urgent needs, and make a plan to meet them (Bob W.)

Email list: http://groups.google.com/group/hdmc-visualization

Context (I think):

We want to identify cross-VxO needs for visualization and then create a service and/or software project dedicated to meeting those needs.
If a VxO is writing software instead of plumbing (connecting existing tools and services to create higher-order products), there is probably a need for a separate service or software project. We want to "purify" the VxOs (allow them to work on core-VxO tasks).
We have one VxO for each science specialty; should we have ~1-2 software projects for each visualization specialty?
This approach minimizes the software development effort required by VxOs to meet community demands for visualization products.

Survey question:

What are high-priority visualization tools or visualization services that you would like for your VxO or data service?
- What software exists that can be built upon and extended?
- Should it be a service or a software package?
How would you improve existing visualization tools?

Bob's response:

I would like to see an Autoplot3D project. (Discussion: this is a big project, and it would be nice to get support through various NSF programs. It could be proposed to the Heliophysics Data Environment call.)
- Given a URL to a file with data on a 3D grid, produce a sensible plot.
- Work as thick or thin client.
- A user should be able to extend the code base and build extensions.
- Project would be hosted on SourceForge or equivalent and have an OSI-compatible [2] license.
- Potential projects: cismdx, Space Weather Explorer, Visit, VTK [3], ViSBARD, TecPlot, Berrios' project at CCMC.
I would like to see an "image browse" service. (There was agreement that this would be a good idea: a central place or project where all of these tools are collected and integrated.)
- Given a list of images, give me a web interface that allows me to easily browse, sort, and interface with the images. There are many examples of this for photographs, e.g. [4]
- Allow me to request new interfaces or ways of looking at the images (e.g., stitch them into a movie).
- Allow me to overlay images.

3.4. SPASE-QL

What is to be expected of the system, the VxOs, and why/how (Tom)

14:15 Break

3.5. SPDF

SPDF Services

3.6. Event lists

Clear statement of standards and schedule of implementation; when will we have which event lists registered in what uniform way? (Bobby)

Heliophysics Event List Manager (HELM)

HDMC Event Lists working group [5]

3.6.1. Status

Development began in Oct. 2008
Set up <http://helm.gsfc.nasa.gov>
Began discussions with Todd King, Rod Potter and others on HELM capabilities
Bernie Harris (GSFC) developed an initial schema (Eventlist.xsd) and API based on REST ideas, with output in semantic XHTML and JSON
Back end uses the eXist XML database, with source maintained in a Mercurial repository
Developed example clients using java.net.HttpURLConnection, Jersey client API, and curl/sh, and developed a test suite
Created test events for Geotail and IMP-8 bowshock crossings, and for CDAWeb events
Initially developed to have user accounts and authentication for user-supplied event lists and comments, but eventually restructured to remove these accounts because of difficulties in meeting NASA security requirements with an international user base. User-supplied lists will have to go through some review before posting, except where they are generated by trusted sources (such as CDAWeb, SSCweb, VxOs)
Moved the services to the operational server
Next: ingest a number of diverse event lists, further modifying the schema as required

3.6.2. More Status

See all web services available

curl --request OPTIONS http://helm.gsfc.nasa.gov/WS/helm/1/ | xmllint --format -

http://helm.gsfc.nasa.gov/development/HelmWebServices.html
Queries can return XML, JSON, and XHTML <http://helm.gsfc.nasa.gov/WS/helm/1/eventlists> representations
XSLT <http://helm.gsfc.nasa.gov/development/helm2simile.xslt> transform to create a Simile Widget XML Timeline Event Source file from a helm Eventlist
http://helm.gsfc.nasa.gov/RSS/ feeds

3.6.3. Sample Calls

http://helm/WS/helm/1/EventLists/Intersection?listUrl=http://helm/helm/cdawebEvents.xml&listUrl=http://helm/helm/lastCdawebEvent.xml
http://helm/WS/helm/1/EventLists/Intersection?listUrl=http://helm/helm/cdawebEvents.xml&listUrl=http://helm/helm/modLastCdawebEvent.xml
http://helm/WS/helm/1/EventLists/Union?listUrl=http://helm/helm/cdawebEvents.xml&listUrl=http://helm/helm/lastCdawebEvent.xml
http://helm/WS/helm/1/EventLists/Union?listUrl=http://helm/helm/cdawebEvents.xml&listUrl=http://helm/helm/modLastCdawebEvent.xml
http://helm/WS/helm/1/EventLists/ShiftTime?quantity=-P1D&listUrl=http://helm/helm/cdawebEvents.xml
http://helm/WS/helm/1/EventLists/SymmetricallyAdjustTimeSpan?quantity=P1D&listUrl=http://helm/helm/cdawebEvents.xml
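
The sample calls above share one pattern: an operation name, an optional quantity, and one or more repeated listUrl parameters. A small helper that reproduces that pattern might look like the sketch below. The function name is invented, the host is left abbreviated as "helm" exactly as in the calls above, and the reading of the quantity values (-P1D, P1D) as ISO 8601 durations is an inference from their form.

```python
from urllib.parse import urlencode

# Host abbreviated as in the sample calls above.
BASE = "http://helm/WS/helm/1/EventLists"


def eventlist_call(operation, list_urls, quantity=None):
    """Build a call such as Intersection, Union, or ShiftTime."""
    params = []
    if quantity is not None:
        # e.g. "-P1D" or "P1D" (these look like ISO 8601 durations)
        params.append(("quantity", quantity))
    # listUrl may be repeated, so use a list of pairs rather than a dict.
    params.extend(("listUrl", u) for u in list_urls)
    # Keep ':' and '/' unescaped so the output matches the calls above.
    return BASE + "/" + operation + "?" + urlencode(params, safe=":/")


print(eventlist_call("Intersection",
                     ["http://helm/helm/cdawebEvents.xml",
                      "http://helm/helm/lastCdawebEvent.xml"]))
print(eventlist_call("ShiftTime",
                     ["http://helm/helm/cdawebEvents.xml"],
                     quantity="-P1D"))
```

A list of pairs is used because a dictionary cannot hold the same key twice, and the set operations need two or more listUrl values.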

3.6.4. Schema

<!-- HELM Event List schema -->
  <Eventlist>
    <annotation>
      <documentation>outermost container or envelope</documentation>
    </annotation>
    <sequence>
?     <spase:ResourceID/>
      <spase:ResourceHeader/>
?     <spase:AccessInformation/>
?     <spase:ProviderResourceName/>
?     <spase:ProviderVersion/>
       <Name/> ? still need Name in addition to spase:ResourceHeader/spase:ResourceName ? 
       <spase:Description/>
      <spase:PhenomenonType/>
      <spase:Keyword/>
      <KeywordGroup/>
? still need these attributes with spase:ResourceHeader ?
      <created/>
      <updated/>
      <status/>
      <Event/> <Event/> <Event/> <Event/> <Event/> ...
    </sequence>
  </Eventlist>
  <Event>
      <sequence>
        <spase:TimeSpan/>
        <spase:Description/>
        <Details/>
      </sequence>
  </Event>

3.6.5. Example

curl "http://helm.gsfc.nasa.gov/WS/helm/1/eventlists/imp8_bowshock" | xmllint --format -

<ns2:Eventlist xmlns="http://www.spase-group.org/data/schema" xmlns:ns2="http://helm.gsfc.nasa.gov/data/schema">
 
  <ResourceHeader>
    <ResourceName>IMP-8 Bowshock Crossing Events</ResourceName>
    <ReleaseDate>2010-06-15T00:00:00</ReleaseDate>
    <Description>An example eventlist detailing IMP-8 bowshock crossing events.</Description>
    <Contact>
      <PersonID>spase://SMWG/Person/Robert.E.McGuire</PersonID>
      <Role>DataProducer</Role>
      <Role>GeneralContact</Role>
      <Role>MetadataContact</Role>
      <Role>Publisher</Role>
    </Contact>
    <InformationURL>
      <Name>MULTIPLE SPACECRAFT BOW SHOCK CROSSINGS DATABASE</Name>
      <URL>http://ftpbrowser.gsfc.nasa.gov/bowshock.html</URL>
    </InformationURL>
  </ResourceHeader>
 
  <ns2:Name>IMP-8 Bowshock Crossing Events</ns2:Name>
  <PhenomenonType>BowShockCrossing</PhenomenonType>
  <Keyword>test</Keyword>
 
  <ns2:Event>
    <TimeSpan>
      <StartDate>1995-01-03T00:04:00</StartDate>
      <StopDate>1995-01-03T00:04:00</StopDate>
    </TimeSpan>
    <Description>GSE/XYZ (Re) = -21.3, 26.4, 22.6</Description>
    <ns2:Details xmlns="http://helm.gsfc.nasa.gov/data/schema" xmlns:spase="http://www.spase-group.org/data/schema" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <SubDetails>IMP-8 bowshock crossing sub-details</SubDetails>
    </ns2:Details>
  </ns2:Event>
 
   <ns2:Event>
     <TimeSpan>
       <StartDate>1995-01-03T00:07:00</StartDate>
       <StopDate>1995-01-03T00:07:00</StopDate>
     </TimeSpan>
     <Description>GSE/XYZ (Re) = -21.3, 26.4, 22.6</Description>
   </ns2:Event>
 
</ns2:Eventlist>
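
A client reading this response has to deal with two namespaces: the SPASE namespace (the default in the document) and the HELM namespace (prefixed ns2). The sketch below shows one way to pull out the events with Python's standard library. The embedded sample is a trimmed copy of the example above, and the function name and the prefixes "spase" and "helm" are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this script; only the namespace URIs must match.
NS = {
    "spase": "http://www.spase-group.org/data/schema",
    "helm": "http://helm.gsfc.nasa.gov/data/schema",
}

# Trimmed copy of the example response above.
SAMPLE = """<ns2:Eventlist xmlns="http://www.spase-group.org/data/schema"
    xmlns:ns2="http://helm.gsfc.nasa.gov/data/schema">
  <PhenomenonType>BowShockCrossing</PhenomenonType>
  <ns2:Event>
    <TimeSpan>
      <StartDate>1995-01-03T00:04:00</StartDate>
      <StopDate>1995-01-03T00:04:00</StopDate>
    </TimeSpan>
    <Description>GSE/XYZ (Re) = -21.3, 26.4, 22.6</Description>
  </ns2:Event>
  <ns2:Event>
    <TimeSpan>
      <StartDate>1995-01-03T00:07:00</StartDate>
      <StopDate>1995-01-03T00:07:00</StopDate>
    </TimeSpan>
    <Description>GSE/XYZ (Re) = -21.3, 26.4, 22.6</Description>
  </ns2:Event>
</ns2:Eventlist>"""


def read_events(xml_text):
    """Return one dict per Event with its time span and description."""
    root = ET.fromstring(xml_text)
    events = []
    # Event is in the HELM namespace; its children are SPASE elements.
    for ev in root.findall("helm:Event", NS):
        events.append({
            "start": ev.findtext("spase:TimeSpan/spase:StartDate", namespaces=NS),
            "stop": ev.findtext("spase:TimeSpan/spase:StopDate", namespaces=NS),
            "description": ev.findtext("spase:Description", namespaces=NS),
        })
    return events


for event in read_events(SAMPLE):
    print(event["start"], event["stop"], event["description"])
```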

3.6.6. Queries

Get all Bow Shock Crossing events:

curl --globoff "http://helm.gsfc.nasa.gov/WS/helm/1/eventlists?xpathp=//spase:PhenomenonType[.='BowShockCrossing']" | xmllint --format -

Get all CDAWeb events:

curl "http://helm.gsfc.nasa.gov/WS/helm/1/eventlists?keyword=cdaweb" | xmllint --format -
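
When these queries are built in a program rather than typed into a shell, characters such as the square brackets and quotes in an XPath expression are usually percent-encoded, which also avoids the need to switch off curl's URL globbing. The following is a minimal sketch; the function names are invented, the XPath parameter name is copied from the curl example above, and whether the service accepts a percent-encoded expression is an assumption that has not been checked against HELM.

```python
from urllib.parse import quote, unquote, urlencode

EVENTLISTS = "http://helm.gsfc.nasa.gov/WS/helm/1/eventlists"


def keyword_query(keyword):
    """URL for a keyword search, as in the CDAWeb example above."""
    return EVENTLISTS + "?" + urlencode({"keyword": keyword})


def xpath_query(expression, param="xpathp"):
    """URL for an XPath search; the parameter name is copied from the notes."""
    # safe="" percent-encodes '/', ':', '[', ']', '=' and quotes as well.
    return EVENTLISTS + "?" + param + "=" + quote(expression, safe="")


expr = "//spase:PhenomenonType[.='BowShockCrossing']"
print(keyword_query("cdaweb"))
print(xpath_query(expr))
# Decoding the query value gives back the original expression.
print(unquote(xpath_query(expr).split("=", 1)[1]) == expr)
```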

3.6.7. Work schedule and key milestones

Set the requirements of the query and event list service (messages, constraints, prototype queries, other APIs)
Build and test prototype event list manager (middleware for individual queries, user interface, search, testing)
Extend the prototype with data retrieval queries (from VxOs, from archives and others)
Extend the prototype with query service features
- Add simple cross-list correlation (headers and time only)
- Add ability for users to add/update lists
- Add ability to add more search sources
- Release event list service API to VxOs
Extend the prototype with remote event lists
- Publish API for providers to serve event lists
- Connect to existing event list services
Test the prototype (verify that remotely hosted event lists are read correctly and queries can be executed)
Implement releasable version
Add annotation features
Document the system for maintainers, end users, and developers
Advertise / public outreach
Ongoing maintenance

3.7. SPASE

3.7.1. Overview - Past Year

Data Model

Official release: 2.1.0
Current draft: 2.2.0
- Add “Hardcopy” as a format
- Add “Operating Span” to Instrument and Observatory
- Add coordinate systems for solar physics (HCC, HCR, HPC, HPR)
- Update definitions
- More changes under consideration (e.g., S3_Bucket)

Software

Release of SPASE toolkit

Web site

New design for website (ready to implement)

3.7.2. Activities of SPASE

Tools

SPASE Editor
SPASE Validator
SPASE Referential Checker
SPASE Collator

Engineering/Design

SPASE data model. (dictionary, documents, tools)

Services

SMWG (Core entity descriptions)
SPASE services (reference implementation)
- resolver, downloader, render, status
- jetty server, explorer

Content

SPASE publication list
Website

3.7.3. Roles for SPASE

Current focus is the SPASE data model.
Demonstration technologies.
Keep scope or expand?

15:15 Data and Models: what realistic integration can we achieve? (Aaron/Darren)

16:00 How do we become plausible? Two aspects: (1) to data providers, and (2) to end users. (Continuing discussion, hopefully more focused on achievable goals. Specify content of MOUs) (All)

18:00 Adjourn for the day

19:00 Group dinner - Suggestions [6]

4. Friday 6 August

08:30 Priorities for the next 6-12 months. Review of plans from the previous day.

10:30 Break

10:50 Discussion of White Paper for NRC Decadal Survey

11:45 Lunch

13:00 Science meeting attendance/plans. Future telecon/meeting plans.

14:00 Adjourn
