The research community regularly uses GPS data products from multiple data centers, such as global tracking data from the IGS Global Data Centers or regional continuous data from the data centers of networks such as CORS, SCIGN, or BARD. Access to information about the holdings of these various centers is complicated, however, because each center has developed its own data management system. To address this problem, the UNAVCO Boulder Facility, Scripps and CDDIS have embarked on an effort to develop a "seamless" archive: a distributed archive in which each center keeps its own database but all are accessible through a common interface.
The present concept for the seamless archive is shown in Figure 1 of the main report. A standard metadata exchange format, developed by the Boulder Facility staff and proposed to the other archive data centers in FY97, provides the link between the centers. These metadata are extracted using center-specific database query tools, or "middleware", which lets each data center maintain its own internal structure while presenting a common interface to the user. Users will be able to access the data and metadata through interactive Web browsing tools or automated non-interactive software, without directing queries to a specific site.
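The idea of a common exchange format over center-specific middleware can be sketched as follows. The field names, delimiter, and record layout below are illustrative assumptions, not the actual format proposed to the centers in FY97; the point is only that each center serializes its holdings into one agreed record shape that any interface can parse.

```python
# Hypothetical sketch of a seamless-archive metadata exchange record.
# Field names and the pipe-delimited layout are assumptions for
# illustration, not the actual FY97 exchange format.

FIELDS = ["site_code", "data_center", "start_time", "end_time", "file_type", "url"]

def format_record(values):
    """Serialize one holdings record as a delimited exchange line."""
    return "|".join(str(values[f]) for f in FIELDS)

def parse_record(line):
    """Parse an exchange line back into a field dictionary."""
    return dict(zip(FIELDS, line.rstrip("\n").split("|")))

# A center's middleware would emit records like this from its own database:
record = {
    "site_code": "P123", "data_center": "UNAVCO",
    "start_time": "1997-06-01T00:00:00", "end_time": "1997-06-01T23:59:30",
    "file_type": "RINEX-OBS", "url": "ftp://example.org/data/p1231520.97o",
}
line = format_record(record)
assert parse_record(line) == record  # any center can read any center's records
```

Because every center emits the same record shape, the common user interface needs only one parser regardless of which database produced the record.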
The basis of the metadata standards has been agreed upon by the initial participating archive centers, and the details of the definitions will be finalized by the close of FY97 for implementation in FY98. A prototype Web User Interface (WUI), developed by the Boulder Facility, provides map, temporal and tabular reports from the Archive Database using a combination of HTML, CGI-bin and Java methods. The design of UNAVCO's WUI is general enough that information from other data centers can be reported once standard access is implemented. Plans for FY98 include linking the three report types (map, temporal and tabular) so that a data user can step quickly between tabular and graphical views, and developing a standardized data request mechanism.
The Boulder Facility's investment in designing and implementing a relational database model on an Oracle engine, and in establishing standard forms for field metadata collection and data entry, has provided a solid foundation for implementing the seamless archive. Scripps currently uses a hierarchical file system but has recently purchased the hardware and Oracle software needed to implement a true relational database. Goddard's CDDIS has also just purchased Oracle and is migrating its data to a relational model. These improvements will facilitate implementation of seamless reporting in FY98 and eventually support seamless data requests.
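A relational model of the kind described might look like the following minimal sketch, using SQLite in place of the Oracle engine so the example is self-contained. The table and column names are assumptions for illustration, not the Facility's actual schema; the sketch only shows why a relational model helps, namely that sites and data files become separately queryable, joinable tables rather than a fixed file hierarchy.

```python
import sqlite3

# Minimal sketch of a relational model for GPS archive metadata, using
# SQLite in place of Oracle.  Table and column names are illustrative
# assumptions, not the actual UNAVCO schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    code      TEXT UNIQUE NOT NULL,   -- short site code
    name      TEXT
);
CREATE TABLE data_file (
    file_id   INTEGER PRIMARY KEY,
    site_id   INTEGER NOT NULL REFERENCES site(site_id),
    start_utc TEXT NOT NULL,
    format    TEXT NOT NULL           -- e.g. 'RINEX' or a native binary type
);
""")
conn.execute("INSERT INTO site (code, name) VALUES ('P123', 'Example Mesa')")
conn.execute("""INSERT INTO data_file (site_id, start_utc, format)
                VALUES (1, '1997-06-01T00:00:00', 'RINEX')""")

# A holdings query becomes a join rather than a directory walk:
rows = conn.execute("""SELECT s.code, f.start_utc, f.format
                       FROM data_file f JOIN site s USING (site_id)""").fetchall()
print(rows)
```

A query like the final join is also the natural place for the "middleware" layer to hang: the seamless-archive exchange records can be generated directly from such a SELECT.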
DMAG completely rewrote UNAVCO's QC software in FY97 to improve the community's ability to assess data quality and the Facility's ability to supply validated data files. The alpha version of the TEQC software, with quality-checking and metadata-editing features, appeared at the beginning of FY97, followed in April 1997 by the beta version with extensive documentation and an on-line manual. The beta release included RINEX translators for several binary formats (Trimble and Ashtech RS-232 real-time data streams and high-rate TurboBinary) for which no publicly available translators previously existed. Further enhancements in FY97 brought translator support to a total of 11 different binary formats, all of which are used to store native binary files in the UNAVCO Archive.
A growing number of institutions use TEQC for data quality checking and pre-processing. The Boulder Facility Archive uses it to extract metadata from binary and RINEX files, to inject metadata into the On-line Repository and Archive Database, and to build RINEX files from archived binary data files. TEQC is also used for similar purposes by the Southern California Integrated GPS Network (SCIGN) Data Center, University of California at San Diego, University of Colorado at Boulder, Texas A&M University, University of Texas at Austin, Rensselaer Polytechnic Institute, and the UCAR GPS Research Group. Many non-NSF-funded groups also use TEQC, such as the National Geodetic Survey (NGS) in support of the North American CORS network and the Goddard Space Flight Center for quality control of data from the ~150-station global IGS network, both sources of data used by UNAVCO investigators.
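The metadata-extraction step is possible because RINEX files carry their metadata in a fixed-format header: each header line places its label in columns 61-80 and its value in columns 1-60, ending at END OF HEADER. The sketch below is not TEQC's implementation, only a minimal illustration of harvesting that header; the sample header fragment is fabricated.

```python
def rinex_header_metadata(lines):
    """Extract metadata from RINEX header lines.

    In the RINEX format, each header line carries its label in
    columns 61-80 and its value in columns 1-60; the header ends
    at the END OF HEADER record.  (Repeated labels such as COMMENT
    would overwrite earlier ones in this simplified sketch.)
    """
    meta = {}
    for line in lines:
        label = line[60:80].strip()
        if label == "END OF HEADER":
            break
        if label:
            meta[label] = line[:60].rstrip()
    return meta

# A fabricated two-record header fragment for illustration:
header = [
    "P123".ljust(60) + "MARKER NAME",
    "ASHTECH Z-XII3".ljust(60) + "REC # / TYPE / VERS",
    " " * 60 + "END OF HEADER",
]
print(rinex_header_metadata(header)["MARKER NAME"])  # -> P123
```

Harvested fields like these are what an archive can inject into its database, so the holdings catalog stays consistent with the data files themselves.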
An automatic paper scanning capability was implemented in FY97 and will be used in FY98 to begin scanning the thousands of one-of-a-kind paper records that accompany the GPS data sets in the Boulder Physical Repository. These records include occupation logs, monument site descriptions, sketches and site access notes; scanning them will ensure the preservation of this important information. The capability to generate CD-R data sets was also developed in 1997. Raw data, RINEX data, scanned logsheets and site descriptions can now be stored along with metadata on a disc readable on UNIX, PC or Apple platforms. Platform independence is further enhanced by a CD index that can be accessed with a Web browser such as Netscape, which can also be used to view or print the scanned records. This capability will make delivery and use of project data more efficient, convenient and complete.
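A browser-readable CD index of the kind described amounts to a plain HTML page on the disc linking into its directories. The directory layout and entry names below are illustrative assumptions; the sketch shows only why such an index is platform independent: any browser on any operating system can render it.

```python
import html

# Hypothetical sketch of generating a browser-readable CD-R index page.
# Directory names and entries are illustrative assumptions.

def build_index(entries):
    """Render a minimal index.html from (description, relative path) pairs."""
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(html.escape(path, quote=True),
                                              html.escape(desc))
        for desc, path in entries
    )
    return ("<html><head><title>Project CD Index</title></head>\n"
            "<body><h1>Project CD Index</h1>\n<ul>\n{}\n</ul>\n</body></html>"
            .format(items))

page = build_index([
    ("Raw binary data", "raw/"),
    ("RINEX observation files", "rinex/"),
    ("Scanned occupation logsheets", "logsheets/"),
])
print(page)
```

Writing the result to `index.html` at the root of the disc is all a user needs: opening that one file in Netscape (or any browser) gives point-and-click access to the data and scanned records.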
In FY97, the Boulder Facility worked on improving data transfer methods in order to facilitate the regular transfer of files to the archive from regional data collection centers. With the number of contributing permanent stations expected to quadruple over the next year, new methods are needed to improve the robustness and efficiency of continuous data transfers. UNAVCO has adopted a system from the atmospheric sciences community: the Local Data Manager/Internet Data Distribution (LDM/IDD) system. The UCAR-managed Unidata Program developed LDM to provide near-real-time Internet access to meteorological data for 130 universities. DMAG identified the specific needs of the GPS community and coordinated with Unidata to add a standard GPS data type to LDM. The Facility then reconfigured LDM and designed new data management software to provide a topology for robust and reliable delivery of GPS data. LDM also has the potential to provide regular, automated delivery of GPS data and data products to multiple end users; this feature will be tested with the Basin and Range data by the end of FY97. Data transfer using LDM has proven quite robust and overcomes many of the problems associated with ftp transfers, such as lack of security and data corruption. A parallel effort is underway by IRIS for seismological data transfers.
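The data-corruption problem with plain ftp comes from the lack of any end-to-end integrity check. LDM carries per-product checksums internally; the standalone sketch below is not LDM code, only an illustration of the underlying idea: compare a digest computed at the sending side against one computed on the received bytes, and treat any mismatch as a failed transfer to be retried.

```python
import hashlib

# Standalone sketch of the integrity check that plain ftp transfers
# lack.  This is not LDM's implementation; LDM carries its own
# per-product checksums internally.

def md5_digest(data: bytes) -> str:
    """Digest of a file's bytes, computed independently at each end."""
    return hashlib.md5(data).hexdigest()

def verify_transfer(sent: bytes, received: bytes) -> bool:
    """True only if the received bytes match the sender's digest."""
    return md5_digest(sent) == md5_digest(received)

payload = b"GPS observation file contents"
assert verify_transfer(payload, payload)                  # intact transfer
assert not verify_transfer(payload, payload + b"\x00")    # corrupted in transit
```

In a relay topology like LDM/IDD's, this check matters at every hop: a corrupted product is detected and re-requested immediately rather than discovered later in the archive.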