Please use this identifier to cite or link to this item: https://hdl.handle.net/11681/41960
Title: Natural language indexing for pedoinformatics
Authors: Furey, John S.
Davis, Austin V.
Seiter-Moser, Jennifer M.
Keywords: Soil science
Classification
Taxonomy
Databases
Text mining
Publisher: Environmental Laboratory (U.S.)
Engineer Research and Development Center (U.S.)
Series/Report no.: Miscellaneous Paper (Engineer Research and Development Center (U.S.)) ; no. ERDC/EL MP-21-12
Is Version Of: Furey, John, Austin Davis, and Jennifer Seiter-Moser. "Natural language indexing for pedoinformatics." Geoderma 334 (2019): 49-54. https://doi.org/10.1016/j.geoderma.2018.07.050
Abstract: The multiple schemas for the classification of soils rely on differing criteria, but the major soil science systems, including the United States Department of Agriculture (USDA) and the internationally harmonized World Reference Base for Soil Resources soil classification systems, are primarily based on inferred pedogenesis. These classifications are largely compiled from individual observations of soil characteristics within soil profiles, and the vast majority of this pedologic information is contained in nonquantitative text descriptions. We present initial text mining analyses of parsed text in the digitally available USDA soil taxonomy documentation and the Soil Survey Geographic database. Previous research has shown that latent information structure can be extracted from scientific literature using Natural Language Processing techniques, and we show that this latent information can be used to expedite query performance by using syntactic elements and part-of-speech tags as indices. Technical vocabulary often poses a text mining challenge because its terms are rare in broader usage. We introduce an extension to the common English vocabulary that allows for nearly complete indexing of USDA Soil Series Descriptions.
Description: Miscellaneous Paper
Gov't Doc #: ERDC/EL MP-21-12
Rights: Approved for Public Release; Distribution is Unlimited
URI: https://hdl.handle.net/11681/41960
http://dx.doi.org/10.21079/11681/41960
Size: 11 pages / 879.67 kB
Types of Materials: PDF/A
Appears in Collections: Miscellaneous Paper

Files in This Item:
File: ERDC-EL MP-21-12.pdf
Size: 879.67 kB
Format: Adobe PDF