A study of address matching using text-matching techniques: A case study of a telecommunication company

Poster


Publication Details

Author list: สุทธิพงศ์ อริยมงคลชัย, ปรมี วิทยานุพงศ์, พรทิพย์ เดชพิชัย, ฐิติมา โฆษิตพันธวงศ์

Publication year: 2023

Start page: 83

End page: 84

Number of pages: 2

Languages: Thai (TH)


Abstract

The purpose of this project was to prepare and clean customer address data from the customer databases and to study text-matching methods for separating out postpaid mobile phone customers who had never used the fixed broadband internet service. Three customer databases were used: postpaid mobile phone service customers (mobile: 9,958,474 customers); broadband internet, high-speed internet, fixed telephone, exchange, and internet-partner customers (non-mobile: 1,996,765 customers); and fixed broadband internet customers (fbb: 2,100,957 customers). The performance of the text-matching methods was evaluated by accuracy computed from a confusion matrix.
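As a rough illustration of the evaluation step, the sketch below computes accuracy from a 2x2 confusion matrix (same address vs. different address) and, because the results are reported as a 95% confidence interval from stratified sampling, also shows one common way to form such an interval. The poster does not state how strata were defined or which interval method was used; the function names, the (population size, sample size, correct count) layout of each stratum, and the normal-approximation (Wald) interval without finite-population correction are all assumptions.

```python
import math

def accuracy_from_confusion(tp: int, fp: int, fn: int, tn: int) -> float:
    """Accuracy = (TP + TN) / (TP + FP + FN + TN) for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

def stratified_accuracy_ci(strata, z: float = 1.96):
    """Stratified accuracy estimate with a normal-approximation CI (assumed method).

    strata: hypothetical list of (N_h, n_h, correct_h) tuples, i.e. the number of
    candidate pairs in the stratum, the number sampled and manually checked, and
    how many of the sampled pairs the matcher classified correctly.
    Finite-population correction is ignored (assumption).
    """
    population = sum(N for N, _, _ in strata)
    estimate = 0.0
    variance = 0.0
    for N, n, correct in strata:
        weight = N / population          # stratum share of all pairs
        p = correct / n                  # sample accuracy in this stratum
        estimate += weight * p
        variance += weight * weight * p * (1 - p) / n
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate, estimate + half_width
```

For example, with purely illustrative counts (not from the poster), `accuracy_from_confusion(40, 5, 3, 52)` gives 0.92, and `stratified_accuracy_ci([(100, 100, 90)])` returns roughly (0.841, 0.9, 0.959).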

Data preparation and cleaning (applying regexp_replace functions and handling missing and duplicate data) and exact address text matching reduced the data size by 23.33% and 12.8%, respectively. Conditional one-to-many address matching on identical house number, sub-district, and district then produced about 6,533,241 candidate pairs, which were compared using text-matching techniques: Levenshtein distance, cosine similarity, and Jaccard similarity. Of these, 6,351,338 pairs (97.22%) were found not to be the same address. The accuracy of the text-matching techniques, estimated with stratified sampling, was between 87.60% and 93.36% (95% confidence interval).
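A minimal sketch of the cleaning, one-to-many blocking, and similarity steps described above, under several assumptions. Python's `re.sub` stands in for the regexp_replace functions (which were presumably run in SQL or Spark; the actual patterns are not given). Cosine and Jaccard similarity are computed over character bigrams, a choice made here because Thai text has no spaces between words; the poster does not say how addresses were tokenized. The record fields `house_no`, `subdistrict`, and `district`, and all function names, are hypothetical, and no decision thresholds are implied.

```python
import math
import re
from collections import Counter, defaultdict

def normalize(addr: str) -> str:
    # Lower-case, turn common punctuation into spaces, collapse whitespace.
    # "/" is kept because house numbers such as "12/3" use it.
    # An explicit punctuation class is used instead of \w, which would strip
    # Thai combining vowel and tone marks.
    addr = addr.lower()
    addr = re.sub(r"[.,;:()\"'\-]", " ", addr)
    return re.sub(r"\s+", " ", addr).strip()

def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    # Scale the distance to [0, 1]; 1.0 means identical strings.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def char_ngrams(s: str, n: int = 2) -> list:
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    # Cosine of the angle between character n-gram count vectors.
    va, vb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    # |intersection| / |union| of the character n-gram sets.
    sa, sb = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def candidate_pairs(mobile, fbb):
    # One-to-many blocking: a mobile record is compared only with fbb records
    # that share the same house number, sub-district, and district.
    index = defaultdict(list)
    for rec in fbb:
        index[(rec["house_no"], rec["subdistrict"], rec["district"])].append(rec)
    for m in mobile:
        for f in index.get((m["house_no"], m["subdistrict"], m["district"]), []):
            yield m, f
```

Each candidate pair would then be scored with all three measures on its normalized address strings, for example `jaccard_similarity(normalize(m["address"]), normalize(f["address"]))`, where `address` is again a hypothetical field name. Blocking keeps the comparison count manageable, since the quadratic Levenshtein computation is only run on pairs that already agree on house number and administrative area.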


Keywords

Data preparation, cosine similarity, Jaccard similarity, accuracy, Levenshtein distance function, broadband internet


Last updated on 2023-08-15 at 17:08