{"id":358,"date":"2023-10-14T08:45:11","date_gmt":"2023-10-14T15:45:11","guid":{"rendered":"http:\/\/improdango.com\/?page_id=358"},"modified":"2023-10-31T22:51:58","modified_gmt":"2023-11-01T05:51:58","slug":"analytical-approaches","status":"publish","type":"page","link":"http:\/\/improdango.com\/?page_id=358","title":{"rendered":"Analytical Approaches"},"content":{"rendered":"\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:25%\">\n<p><a href=\"http:\/\/improdango.com\/?page_id=362\" data-type=\"page\" data-id=\"362\">&#x2196; Malicious URLs Dataset<\/a><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:25%\">\n<p class=\"has-text-align-center has-extra-small-font-size\"><a href=\"http:\/\/improdango.com\/?page_id=360\" data-type=\"page\" data-id=\"360\">Visualizations &#x2197;<\/a><\/p>\n<\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"phish\"><em>Phishing Email Detection Using Classification<\/em><\/h4>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<ul class=\"wp-block-list\">\n<li><strong>Why:<\/strong> This Kaggle dataset is important for banking because it lets us focus on predicting whether a received email is a phishing email.<\/li>\n\n\n\n<li><strong>Value:<\/strong> Banking organizations can improve security by strategically vetting large volumes of email to reduce their exposure to phishing attacks.<\/li>\n<\/ul>\n\n\n\n<p><strong>Data 
Source:<\/strong> Data obtained from Kaggle contains over 28,000 rows of email body text, each classified as either a phishing email or a safe email.<\/p>\n\n\n\n<p><strong>Analytical Process<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Data obtained from Kaggle would be stored in a SQL table via an API.<\/li>\n\n\n\n<li>The data would then be used to train an algorithm to vet future incoming emails.<\/li>\n\n\n\n<li>Python would be used to implement the algorithms.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<div class=\"wp-block-media-text is-stacked-on-mobile\" style=\"grid-template-columns:37% auto\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"364\" height=\"142\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p61Attributes.png\" alt=\"\" class=\"wp-image-366 size-full\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p61Attributes.png 364w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p61Attributes-300x117.png 300w\" sizes=\"auto, (max-width: 364px) 100vw, 364px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p><strong>Product Attributes<\/strong> were collected using Python. Because the main attribute (the email body) is not tabular, the data was put through text analysis. 
This preprocessing reduced the training data to 18,634 instances.<\/p>\n<\/div><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"932\" height=\"559\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p61AttributesPython.png\" alt=\"\" class=\"wp-image-365\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p61AttributesPython.png 932w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p61AttributesPython-300x180.png 300w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p61AttributesPython-768x461.png 768w\" sizes=\"auto, (max-width: 932px) 100vw, 932px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p><strong>Product Attributes<\/strong> were collected using Python. This summary shows that the text dataset (emails) was classified into two types: Safe and Phishing emails. 
The fields are also described in the accompanying figures.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"992\" height=\"408\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Attributes2.png\" alt=\"\" class=\"wp-image-375\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Attributes2.png 992w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Attributes2-300x123.png 300w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Attributes2-768x316.png 768w\" sizes=\"auto, (max-width: 992px) 100vw, 992px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"690\" height=\"551\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Attributes.png\" alt=\"\" class=\"wp-image-376\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Attributes.png 690w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Attributes-300x240.png 300w\" sizes=\"auto, (max-width: 690px) 100vw, 690px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<p><strong>K-Fold Random Forest: <\/strong>A popular algorithm for classification. 
This tree-based machine learning algorithm combines the outputs of multiple decision trees to calculate its prediction; at each node, a tree considers only a random subset of the features when choosing its split (Sharma 2020).<\/p>\n\n\n\n<div class=\"wp-block-media-text has-media-on-the-right is-stacked-on-mobile is-vertically-aligned-center\"><div class=\"wp-block-media-text__content\">\n<p><strong>Process<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The data went through the following preprocessing steps:\n<ul class=\"wp-block-list\">\n<li>Removal of instances with missing values.<\/li>\n\n\n\n<li>Replacement of trailing spaces and characters.<\/li>\n\n\n\n<li>Conversion of all characters to lowercase and removal of duplicate records.<\/li>\n\n\n\n<li>Removal of extra columns and export to ASCII format.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<\/div><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"759\" height=\"783\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Really-1.png\" alt=\"\" class=\"wp-image-385 size-full\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Really-1.png 759w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p63Really-1-291x300.png 291w\" sizes=\"auto, (max-width: 759px) 100vw, 759px\" \/><\/figure><\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<div class=\"wp-block-media-text is-stacked-on-mobile\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"818\" height=\"555\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p64Kfold2.png\" alt=\"\" class=\"wp-image-382 size-full\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p64Kfold2.png 818w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p64Kfold2-300x204.png 
300w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p64Kfold2-768x521.png 768w\" sizes=\"auto, (max-width: 818px) 100vw, 818px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<ol class=\"wp-block-list\" start=\"2\">\n<li>The model was trained on 2\/3 of the data with a forest of 500 trees.<\/li>\n\n\n\n<li>The fraction of features considered when looking for the best split was set to 0.15.<\/li>\n\n\n\n<li>The job was configured not to run in parallel.<\/li>\n\n\n\n<li>The random seed was set to 42.<\/li>\n<\/ol>\n<\/div><\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<p>A <strong>Confusion Matrix<\/strong> was used to measure the accuracy of the k-fold random forest model, which achieved an accuracy of 95.5%.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"636\" height=\"599\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p65A.png\" alt=\"\" class=\"wp-image-390\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p65A.png 636w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p65A-300x283.png 300w\" sizes=\"auto, (max-width: 636px) 100vw, 636px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"606\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p65B.png\" alt=\"\" class=\"wp-image-391\" 
srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p65B.png 624w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p65B-300x291.png 300w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<p><strong>Decision Tree Classifier:<\/strong> A supervised machine learning algorithm, used in our case for classification. A \u201cdecision tree is a simple series of sequential decisions made to reach a specific result\u201d (Sharma 2000).<\/p>\n\n\n\n<div class=\"wp-block-media-text has-media-on-the-right is-stacked-on-mobile is-vertically-aligned-center\"><div class=\"wp-block-media-text__content\">\n<p><strong>Process<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Because our dataset is semi-structured, text mining was applied to classify the data.<\/li>\n\n\n\n<li>The dataset was preprocessed by cleaning it and converting it to tab-delimited UTF-8.<\/li>\n\n\n\n<li>Using Python, the data was prepared for pattern recognition by creating a column for each free-text item and then assigning a class label.<\/li>\n<\/ol>\n<\/div><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"519\" height=\"432\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p66DecisionTreeClassifier.png\" alt=\"\" class=\"wp-image-393 size-full\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p66DecisionTreeClassifier.png 519w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p66DecisionTreeClassifier-300x250.png 300w\" sizes=\"auto, (max-width: 519px) 100vw, 519px\" \/><\/figure><\/div>\n<\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"network\"><em>Network Intrusion Detection Using 
Classification<\/em><\/h4>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<ul class=\"wp-block-list\">\n<li><strong>Why:<\/strong> This dataset is important to the banking industry because it can be used to simulate whether a network is under attack and whether an intrusion into the network has occurred.<\/li>\n\n\n\n<li><strong>Value:<\/strong> Financial institutions can use this analysis to distinguish benign traffic from network intrusions so that they can respond quickly.<\/li>\n<\/ul>\n\n\n\n<p><strong>Data Source:<\/strong> Data obtained from Kaggle contains over 170,000 rows of network traffic recorded during the morning of a working day.<\/p>\n\n\n\n<p><strong>Analytical Process<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Data obtained from Kaggle would be stored in a SQL table via an API.<\/li>\n\n\n\n<li>The data would then be used to train an algorithm to vet future incoming network traffic.<\/li>\n\n\n\n<li>Python would be used for the algorithms and visualizations.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p><strong>Product Attributes<\/strong> were imported into Python, where the data was preprocessed and the models were trained for the y_train output.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"599\" height=\"197\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkA.png\" alt=\"\" 
class=\"wp-image-395\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkA.png 599w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkA-300x99.png 300w\" sizes=\"auto, (max-width: 599px) 100vw, 599px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"622\" height=\"305\" src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkB.png\" alt=\"\" class=\"wp-image-396\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkB.png 622w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkB-300x147.png 300w\" sizes=\"auto, (max-width: 622px) 100vw, 622px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<p><strong>Random Forest: <\/strong>A machine learning classification method that predicts the class, or category, of an input record (benign or intrusion).<\/p>\n\n\n\n<div class=\"wp-block-media-text has-media-on-the-right is-stacked-on-mobile is-vertically-aligned-center\"><div class=\"wp-block-media-text__content\">\n<p><strong>Process<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The dataset was loaded into Python and preprocessed to remove null values and large values not pertinent to the analysis.<\/li>\n\n\n\n<li>A RandomForestClassifier was trained on x and y to classify the records.<\/li>\n\n\n\n<li>A Series was created holding the value counts of benign and intrusion records.<\/li>\n\n\n\n<li>A bar chart visualization was created with updated x and y labels (see Network Intrusion Visualization).<\/li>\n<\/ol>\n<\/div><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"563\" height=\"367\" 
src=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkC.png\" alt=\"\" class=\"wp-image-399 size-full\" srcset=\"http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkC.png 563w, http:\/\/improdango.com\/wp-content\/uploads\/2023\/10\/p69networkC-300x196.png 300w\" sizes=\"auto, (max-width: 563px) 100vw, 563px\" \/><\/figure><\/div>\n<\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"urls\"><em>Analysis of Malicious URLs Data<\/em><\/h4>\n\n\n\n<div class=\"wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<p><strong>URL Country Identification: <\/strong>URLs and IPs were run through an API to determine their country location (<a href=\"http:\/\/api.ipstack.com\">http:\/\/api.ipstack.com<\/a>). Because the volume of data accessible via the API is limited unless a significant fee is paid, the returned data covers foreign domains only.<\/p>\n\n\n\n<p>The results of this analysis were placed in a geospatial visualization that draws a larger dot on countries identified as having more malicious URLs in the dataset. This visualization gives a quick but valuable snapshot of the countries with the most malicious URLs.<\/p>\n\n\n\n<p>This data is useful for identifying countries and domains that may be high risk and warrant additional scrutiny or countermeasures.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>&#x2196; Malicious URLs Dataset Visualizations &#x2197; Phishing Email Detection Using Classification Data Source: Data obtained from Kaggle contains over 28,000 rows of email body text, each classified as either a phishing email or a safe email. Analytical Process Product Attributes were collected using Python. Because the main attribute (the email body) is not tabular, the data was put through text analysis. 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"parent":0,"menu_order":5,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-358","page","type-page","status-publish","hentry"],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/improdango.com\/index.php?rest_route=\/wp\/v2\/pages\/358","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/improdango.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/improdango.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/improdango.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"http:\/\/improdango.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=358"}],"version-history":[{"count":37,"href":"http:\/\/improdango.com\/index.php?rest_route=\/wp\/v2\/pages\/358\/revisions"}],"predecessor-version":[{"id":474,"href":"http:\/\/improdango.com\/index.php?rest_route=\/wp\/v2\/pages\/358\/revisions\/474"}],"wp:attachment":[{"href":"http:\/\/improdango.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=358"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}