A study of address matching using text matching techniques: A case study of telecommunication company
Poster
Publication Details
Author list: สุทธิพงศ์ อริยมงคลชัย, ปรมี วิทยานุพงศ์, พรทิพย์ เดชพิชัย, ฐิติมา โฆษิตพันธวงศ์
Publication year: 2023
Start page: 83
End page: 84
Number of pages: 2
Languages: Thai (TH)
Abstract
The purpose of this project was to prepare and clean customer address data and to study text-matching methods for separating postpaid mobile phone users from clients who had never used fixed broadband internet service. Three customer databases were used: the postpaid mobile phone services database (mobile: 9,958,474 clients); the database covering broadband internet, high-speed internet, fixed telephone, exchange, and internet partner services (non-mobile: 1,996,765 clients); and the fixed broadband internet database (fbb: 2,100,957 clients). The performance of the text-matching methods was evaluated by accuracy computed from a confusion matrix.
Data preparation and cleaning (applying regexp_replace functions and handling missing and duplicate data) and complete (exact) address text matching reduced the data size by 23.33% and 12.8%, respectively. Conditional one-to-many address matching on identical house numbers, sub-districts, and districts yielded about 6,533,241 pairs, which were then compared using text-matching techniques such as Levenshtein distance, cosine similarity, and Jaccard similarity. Of these pairs, 6,351,338 (97.22%) were found not to be the same address. The accuracy of the text-matching techniques, estimated using stratified sampling, was between 87.60% and 93.36% with a 95% confidence interval.
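As a rough illustration of the three measures named above, the following Python sketch computes Levenshtein distance, Jaccard similarity, and cosine similarity for a pair of address strings. It is not the authors' implementation: the abstract does not say how addresses were tokenized or which thresholds were applied, so the character-bigram tokenization, the `normalize` helper (a stand-in for regexp_replace-style cleaning), and all function names here are assumptions made for the example.

```python
import math
import re
from collections import Counter


def normalize(address: str) -> str:
    """Collapse runs of whitespace, trim, and lowercase (assumed cleaning step)."""
    return re.sub(r"\s+", " ", address).strip().lower()


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def bigrams(text: str) -> list[str]:
    """Character bigrams; the tokenization is an assumption for this sketch."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def jaccard(a: str, b: str) -> float:
    """Size of the bigram-set intersection divided by the size of the union."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two bigram-count vectors."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical address strings, not taken from the study's data.
    left = normalize("12/3  Moo 4 Soi Sukhumvit 21")
    right = normalize("12/3 Moo 4 Sukhumvit Soi 21")
    print(levenshtein(left, right), jaccard(left, right), cosine(left, right))
```

In a setup like the one described, such scores would only be computed for pairs that already share a house number, sub-district, and district, which keeps the number of comparisons manageable.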
Keywords
Data preparation, Cosine similarity, Jaccard similarity, Accuracy, Function, Levenshtein distance, Broadband internet