Please use this identifier to cite or link to this item:
https://hdl.handle.net/11681/40203
Title: | Data Lake Ecosystem Workflow |
Authors: | Salter, R. Cody; Dong, Quyen T.; Coleman, Cody A.; Seale, Maria A.; Ruvinsky, Alicia I.; Walker, LaKenya K.; Bond, W. Glenn |
Keywords: | Big data; Datasets; Electronic data processing--Workflow; Data curation--Workflow |
Publisher: | Information Technology Laboratory (U.S.); Engineer Research and Development Center (U.S.) |
Series/Report no.: | Technical Report (Engineer Research and Development Center (U.S.)) ; no. ERDC/ITL TR-21-2 |
Abstract: | The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL. |
Description: | Technical Report |
Gov't Doc #: | ERDC/ITL TR-21-2 |
Rights: | Approved for Public Release; Distribution is Unlimited |
URI: | https://hdl.handle.net/11681/40203; http://dx.doi.org/10.21079/11681/40203 |
Size: | 48 pages / 2.48 MB |
Appears in Collections: | Technical Report |
Files in This Item:
File | Description | Size | Format
---|---|---|---
ERDC-ITL TR-21-2.pdf | | 2.48 MB | Adobe PDF