TCS3393 DATA MINING: Assignment Help, Python/Java Programming

Date: 2024-03-24  Source: Hefei Net (hfw.cc)  Author: hfw.cc



FACULTY OF ENGINEERING, BUILT-ENVIRONMENT, AND INFORMATION
TECHNOLOGY (FOEBEIT)
BACHELOR OF INFORMATION TECHNOLOGY (HONS)
JANUARY-MAY 2024 INTAKE
TCS3393 DATA MINING
GROUP ASSIGNMENT [2-3 members per group]
This assignment is worth 25% of the overall marks available for this module. This assignment
aims to help the student explore and analyse a set of data and reconstruct it into meaningful
representations for decision-making.
The online landscape is ever-evolving, with websites serving as crucial assets for businesses,
organizations, and individuals. As the internet continues to grow, the need for accurate and
efficient website classification becomes paramount. Understanding the nature of websites, their
content, and the user experience they provide is vital for various purposes, including online
security, marketing strategies, and content filtering.
Embarking on a data science project, you collaborate with a cybersecurity firm dedicated to
enhancing web security measures. The firm provides you with a rich dataset encompassing
various attributes of websites, including their URLs, user comments, and assigned categories.
Your objective is to develop a classification model capable of accurately categorizing websites
based on these variables.
The dataset includes information on the URLs of different websites, user comments associated
with those websites, and pre-existing categories assigned to them. The challenge lies in creating
a model that not only accurately classifies websites but also adapts to the dynamic nature of the
online environment, where new types of websites constantly emerge.
Introduction
Your goal is to implement advanced data analysis techniques to train a model that enhances the
efficiency of web classification.
Techniques
The data exploration, manipulation, transformation, and visualization techniques needed to
explore the dataset are covered in the course. As an additional feature, you must explore
further concepts that can improve the retrieval effects. The dataset provided for this
assignment relates to website classification.
Dataset
This dataset contains information on 1,407 website URLs. It includes 3 variables that describe
various categories of websites. The dataset will be analyzed using subsets of these variables for
descriptive and quantitative analyses, depending on the specific models used.
Objective:
Develop a classification model to categorize websites using advanced data science techniques. The
model should robustly classify the website based on comments stated in the dataset.
Tasks:
1. Data Exploration:
• Conduct an initial exploration of the dataset to understand its structure, size, and
variables.
• Examine the distribution of website categories to identify any imbalances in the
dataset.
• Explore the distribution of URLs and user comments length to gain insights into the
data.
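The exploration steps above can be sketched with the Python standard library alone. The rows below are made-up stand-ins for the real dataset, and `website_url` is an assumed column name; only `Category` and `cleaned_website_text` are named in this brief.

```python
from collections import Counter

# Hypothetical sample rows (the real dataset has 1,407 rows).
# 'website_url' is an assumed column name.
rows = [
    {"website_url": "https://example-travel.com",
     "cleaned_website_text": "cheap flights hotel booking", "Category": "Travel"},
    {"website_url": "https://example-news.org/world",
     "cleaned_website_text": "breaking news world politics", "Category": "News"},
    {"website_url": "https://example-trips.net",
     "cleaned_website_text": "holiday packages beach resort deals", "Category": "Travel"},
    {"website_url": "https://example-shop.com/cart",
     "cleaned_website_text": "buy shoes online free delivery", "Category": "E-Commerce"},
]


def category_distribution(rows):
    """Return (category, count, share) tuples, most common first."""
    counts = Counter(row["Category"] for row in rows)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]


def imbalance_ratio(rows):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(row["Category"] for row in rows)
    return max(counts.values()) / min(counts.values())


print(f"{len(rows)} observations, {len(rows[0])} variables")
for cat, n, share in category_distribution(rows):
    print(f"{cat:<12} {n:>3} {share:.0%}")
print("imbalance ratio:", imbalance_ratio(rows))
```

A ratio well above 1 suggests that plain accuracy may be misleading later on, so per-class metrics are worth reporting as well.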
Assignment Task: Websites Classification
2. Descriptive Analysis:
A. Basic Exploration:
• Describe the structure of the dataset. How many observations and variables
does it contain?
• What are the data types of the variables in the dataset?
B. Statistical Summary:
• Provide a statistical summary of the 'Category' variable. What are the most
common website categories?
• Calculate basic descriptive statistics (mean, median, standard deviation) for
relevant numeric variables.
C. URL Analysis:
• Analyze the distribution of website URLs. Are there any patterns or
commonalities?
• Are there any outlier URLs that need special attention?
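One possible way to approach the statistical summary and the URL questions is shown below. It is a sketch only: the URLs are invented, URL length is used as an example of a "relevant numeric variable", and the 1.5 x IQR fence is just one common convention for flagging outliers, not a rule set by this brief.

```python
import statistics
from collections import Counter
from urllib.parse import urlparse

# Hypothetical URLs standing in for the dataset's URL column.
urls = [
    "https://example.com",
    "https://shop.example.org/cart",
    "http://news.example.net/world/today",
    "https://example.edu",
    "https://example.io/docs",
    "https://example.gov/forms",
    "https://blog.example.com/2024/03/a-very-long-post-title-about-data-mining-and-website-classification",
]


def length_summary(values):
    """Mean, median and sample standard deviation of string lengths."""
    lengths = [len(v) for v in values]
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "stdev": statistics.stdev(lengths),
    }


def tld_counts(urls):
    """Count the last label of each hostname (a naive top-level domain)."""
    return Counter(urlparse(u).hostname.rsplit(".", 1)[-1] for u in urls)


def length_outliers(values, k=1.5):
    """Return values whose length falls outside the k * IQR fences."""
    lengths = [len(v) for v in values]
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if not low <= len(v) <= high]


print(length_summary(urls))
print(tld_counts(urls).most_common())
print(length_outliers(urls))
```

The same summary function can be applied to the text column to compare URL lengths with comment lengths.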
3. Data Preprocessing:
A. Cleaning Text Data:
• Explore the 'cleaned_website_text' variable. What preprocessing steps would
you take to clean text data for analysis?
• Implement text cleaning techniques and explain their importance in preparing
data for text-based analysis.
B. Handling Missing Values:
• Identify if there are any missing values in the dataset. Propose strategies for
handling missing values, specifically in the 'cleaned_website_text' column.
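The preprocessing questions above could be approached along these lines. This is a minimal sketch under stated assumptions: the stop-word list is a tiny illustrative one, the cleaning rules are typical choices rather than required ones, and dropping rows with empty text is only one of several strategies (imputing a placeholder string is another).

```python
import re

# A deliberately small, illustrative stop-word list (an assumption);
# a real project would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on"}


def clean_text(text):
    """Lower-case, strip HTML tags, URLs and non-letters, drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # embedded URLs
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS and len(t) > 1]
    return " ".join(tokens)


def handle_missing(rows, column="cleaned_website_text"):
    """Split rows into (kept, dropped) by whether the text is usable."""
    kept, dropped = [], []
    for row in rows:
        value = row.get(column)
        if value is None or not str(value).strip():
            dropped.append(row)
        else:
            kept.append(row)
    return kept, dropped


sample = "<p>Visit the BEST hotel in Paris! http://x.com 2024</p>"
print(clean_text(sample))
```

Reporting how many rows end up in `dropped` helps judge whether removal is acceptable or whether another strategy is needed.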
4. Visualization:
A. Category Distribution Visualization:
• Create a bar chart or pie chart to visually represent the distribution of website
categories.
• How does the visualization help in understanding the balance or imbalance of
the dataset?
B. Text Data Visualization:
• Generate word clouds or frequency plots for the 'cleaned_website_text'
variable. What insights can be gained from these visualizations?
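Both word clouds and frequency plots start from the same word counts, so a sketch of that counting step may be useful. The documents below are invented, and the text-only bar chart is a stand-in for a plotting library, used here so the example runs without extra packages.

```python
from collections import Counter, defaultdict

# Hypothetical (category, cleaned text) pairs.
docs = [
    ("Travel", "cheap flights hotel booking hotel deals"),
    ("Travel", "beach resort hotel holiday"),
    ("News", "breaking news world politics news"),
]


def top_words_by_category(docs, n=3):
    """Return the n most frequent words for each category."""
    counters = defaultdict(Counter)
    for category, text in docs:
        counters[category].update(text.split())
    return {cat: c.most_common(n) for cat, c in counters.items()}


def text_bar_chart(pairs, width=20):
    """Render (label, count) pairs as a simple horizontal bar chart."""
    top = max(count for _, count in pairs)
    lines = []
    for label, count in pairs:
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{label:<10} {bar} {count}")
    return "\n".join(lines)


for category, pairs in top_words_by_category(docs).items():
    print(category)
    print(text_bar_chart(pairs))
```

Words that dominate one category but are rare in others are the ones most likely to help a classifier.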
5. Model Development
A. Data Mining Analysis:
• Split the dataset into training and testing sets for model evaluation.
• Implement various machine learning algorithms for classification, such as logistic
regression, decision trees, or random forests.
B. Training and Evaluation
• Evaluate the performance of each model using metrics like accuracy, precision, recall,
and F1-score.
• Discuss the challenges and considerations specific to evaluating a model for website
classification.
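To make the split and the metrics concrete, here is a from-scratch sketch of a stratified train/test split and of per-class precision, recall and F1. The function names, the 20% test fraction and the sample labels are assumptions for illustration; in practice a library such as scikit-learn offers equivalent, better-tested routines.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), sampling each class separately.

    Note: every class contributes at least one test item, so a class
    with a single example ends up only in the test set.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def evaluate(y_true, y_pred):
    """Return (accuracy, macro_f1, per-class precision/recall/F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(r["f1"] for r in report.values()) / len(classes)
    return accuracy, macro_f1, report


# Hypothetical true and predicted labels.
y_true = ["News", "News", "Travel", "Travel", "Shop"]
y_pred = ["News", "Travel", "Travel", "Travel", "News"]
accuracy, macro_f1, report = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
```

In this toy case accuracy hides the fact that the "Shop" class is never predicted, which is why macro-averaged F1 is often reported alongside accuracy for imbalanced categories.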
6. Advanced Techniques:
i. Feature Engineering:
• Propose additional features that could enhance the model's performance.
How might these features capture more nuanced information about websites?
ii. Dynamic Nature of Websites:
• Given the dynamic nature of the online environment, how could the model
adapt to newly emerging website types? Discuss strategies for model
adaptation.
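As one illustration of feature engineering, structural features can be derived from the URL itself. The feature names below are hypothetical suggestions, not features required by this brief, and the subdomain count is naive (it does not handle multi-part suffixes such as `.co.uk`).

```python
from urllib.parse import urlparse


def url_features(url):
    """Derive simple structural features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        # Last hostname label; empty when the host has no dot.
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        # Naive count: labels in front of "name.tld".
        "num_subdomains": max(0, host.count(".") - 1),
        "path_depth": len([p for p in parsed.path.split("/") if p]),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_query": bool(parsed.query),
    }


print(url_features("https://blog.example.com/2024/03/post?id=7"))
```

Features like these could be combined with text features, and their contribution can be checked by comparing model scores with and without them.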
7. Create Dashboard, Report and Conclusions:
• Summarize the findings, including insights gained from exploratory data analysis and
the performance of the classification model.
• How interpretable is the chosen model? Can you explain the decision-making process
of the model in the context of website classification?
• Provide recommendations for further improvements or considerations in the dynamic
landscape of web classification.
• Reflect on the challenges encountered during the analysis. What potential
improvements or future work would you recommend to enhance the model's
performance?
This assignment allows students to apply knowledge of data exploration, preprocessing, data
modelling, and model building to solve a real-world problem in the business domain. It also
encourages them to explore additional concepts for improving model performance.
• The complete Python program (source code (ipynb)) and report must be submitted to
Blackboard.
• Python Script (Program Code):
o Name the file under your name and SUKD number.
o Start the first two lines in your program by typing your name and SUKD
number. For example:
# Nor Anis Sulaiman
#SUKD20231234
o For each question, give an ID and explain what you want to discover. For example:
a. Explore the distribution of website categories in the dataset. Are there any specific
categories that are more prevalent than others?
b. Visualize the distribution of URL lengths and user comments lengths. Are there patterns
or outliers that could be informative for the classification model?
c. What steps would you take to clean and preprocess the URLs and user comments for
effective analysis?
d. How might you handle any missing values in the dataset, and what impact could they
have on the classification model?
e. Provide descriptive statistics for key variables such as URL lengths and user comments
lengths. What insights can be derived from these statistics?
f. Explore potential additional features that could enhance the model's ability to classify
websites accurately.
g. How might the inclusion of features derived from URLs or user comments contribute
to the overall model performance?
h. Choose a classification algorithm suitable for website classification. Explain your
choice.
i. Implement the chosen algorithm using Python and relevant libraries. What
considerations should be taken into account during the model implementation phase?
j. Split the dataset into training and testing sets. How would you assess the performance
of the model using metrics like accuracy, precision, recall, and F1-score?
k. Discuss potential challenges in evaluating the model's effectiveness and generalization
to new websites.
l. Create visualizations to interpret the model's predictions and showcase its classification
performance.
Deliverables
As part of the assessment, you must submit the project report in printed and softcopy form,
which should have the following format:
A) Cover Page:
All reports must be prepared with a front cover. A protective transparent plastic sheet can be
placed in front of the report to protect the front cover. The front cover should be presented with
the following details:
o Module
o Coursework Title
o Intake
o Student name and ID
o Date Assigned (the date the report was handed out).
o Date Completed (the date the report is due to be handed in).
B) Contents:
• Introduction and assumptions (if any)
• Data import / Cleaning / pre-processing / transformation
• Each question must start on a separate page and contain:
o Analysis Techniques - data exploration / manipulation / visualization
o Screenshot of source code with the explanation.
o Screenshot of output/plot with the explanation.
o Outline the findings based on the results obtained.
• The extra feature explanation must be on a separate page and contain:
Documents: Coursework Report
o Screenshot of source code with the explanation.
o Screenshot of output/plot with the explanation.
o Explain how adding this extra feature can improve the results.
C) Conclusion
• Depth and breadth of analysis
• Quality and depth of feedback on the analysis process
• Reflection on learning and areas for improvement
D) References
• The font size used in the report must be 12pt, and the font is Times New Roman. Full
source code is not allowed to be included in the report. The report must be typed and
clearly printed.
• You may source algorithms and information from the Internet or books. Proper
referencing of the resources should be evident in the document.
• All references must be made using the APA (American Psychological Association)
referencing style as shown below:
o The theory was first propounded in 1970 (Larsen, A.E. 1971), but since then has
been refuted; M.K. Larsen (1983) is among those most energetic in their
opposition……….
o /**Following source code obtained from (Danang, S.N. 2002)*/
int noshape=2;
noshape=GetShape();
• A list of references at the end of your document or source code must be specified in the
following format:
Larsen, A.E. 1971, A Guide to the Aquatic Science Literature, McGraw-Hill, London.
Larsen, M.K. 1983, British Medical Journal [Online], Available from
http://libinfor.ume.maine.edu/acquatic.htm (Accessed 19 November 1995)
Danang, S.N., 2002, Finding Similar Images [Online], The Code Project, Available
from http://www.codeproject.com/bitmap/cbir.asp, [Accessed 14th September 2006]
Further information on other types of citation is available in Petrie, A., 2003, UWE
Library Services Study Skills: How to reference [online], England, University of