Date: 2024-10-13  Source: 合肥網 hfw.cc  Author: hfw.cc



CS439: Introduction to Data Science Fall 2024 
 
Problem Set 1 
 
Due: 11:59pm Friday, October 11, 2024 
 
Late Policy: The homework is due on 10/11 (Friday) at 11:59pm. We will release the solutions 
to the homework on Canvas on 10/16 (Wednesday) at 11:59pm. If your homework is submitted to 
Canvas before 10/11 11:59pm, there will be no late penalty. If you submit to Canvas after 10/11 
11:59pm and before 10/16 11:59pm (i.e., before we release the solutions), your score will be 
penalized by a factor of 0.9^k, where k is the number of days of late submission. For example, if you 
submitted on 10/14 and your original score is 80, then your final score will be 80 × 0.9^3 = 58.32 
for 14 − 11 = 3 days of late submission. If you submit to Canvas after 10/16 11:59pm (i.e., after we 
release the solutions), then you will earn no score for the homework.  
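
The penalty arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `late_score` is hypothetical, and it does not model the cutoff after the solutions are released.

```python
def late_score(original_score, days_late):
    """Multiply the score by 0.9 once per late day (0.9 ** k for k late days)."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return original_score * 0.9 ** days_late


# Worked example from the policy: submitted on 10/14, i.e. 3 days late.
print(round(late_score(80, 3), 2))  # 58.32
```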
 
General Instructions 
 
Submission instructions: These questions require thought but do not require long answers. 
Please be as concise as possible. You should submit your answers as a writeup in PDF format. 
For those questions that require coding, write your code for a question in a single source code 
file, and name the file after the question number (e.g., question_1.java or question_1.py). Finally, 
put your PDF answer file and all the code files in a folder named with your name and NetID (i.e., 
Firstname-Lastname-NetID.pdf), compress the folder as a zip file (e.g., Firstname-Lastname-NetID.zip), 
and submit the zip file via Canvas. 
 
For the answer writeup PDF file, we have provided both a Word template and a LaTeX template 
for you. After you have finished writing, save the file as a PDF file, and submit both the original 
file (Word or LaTeX) and the PDF file. 
 
Questions 
 
1. Map-Reduce (35 pts) 
 
Write a MapReduce program in Hadoop that implements a simple “People You Might Know” 
social network friendship recommendation algorithm. The key idea is that if two people have a 
lot of mutual friends, then the system should recommend that they connect with each other. 
 
Input: Use the provided input file hw1q1.zip. 
 
The input file contains the adjacency list and has multiple lines in the following format: 
<User><TAB><Friends> 
Here, <User> is a unique integer ID corresponding to a unique user and <Friends> is a comma-separated 
list of unique IDs corresponding to the friends of the user with the unique ID <User>. 
Note that the friendships are mutual (i.e., edges are undirected): if A is a friend of B, then B is 
also a friend of A. The data provided is consistent with that rule, as there is an explicit entry for 
each side of each edge. 
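
As a small illustration of the input format, one adjacency-list line could be parsed as follows. This is a sketch under assumptions: the helper name `parse_line` is hypothetical, and it treats a missing or empty friend list as "no friends", which the assignment text does not spell out.

```python
def parse_line(line):
    """Split '<User><TAB><Friends>' into (user_id, [friend_ids])."""
    user, _, friends = line.rstrip("\n").partition("\t")
    # An empty <Friends> field (or a line without a TAB) yields an empty list.
    friend_ids = [int(f) for f in friends.split(",") if f.strip()]
    return int(user), friend_ids


print(parse_line("0\t1,2,3\n"))  # (0, [1, 2, 3])
```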
 
Algorithm: Let us use a simple algorithm such that, for each user U, the algorithm recommends 
N = 10 users who are not already friends with U, but have the largest number of mutual friends 
in common with U. 
 
Output: The output should contain one line per user in the following format: 
 
<User><TAB><Recommendations> 
 
where <User> is a unique ID corresponding to a user and <Recommendations> is a comma-separated 
list of unique IDs corresponding to the algorithm’s recommendation of people that 
<User> might know, ordered by decreasing number of mutual friends. Even if a user has 
fewer than 10 second-degree friends, output all of them in decreasing order of the number of 
mutual friends. If a user has no friends, you can provide an empty list of recommendations. If 
there are multiple users with the same number of mutual friends, ties are broken by ordering 
them in a numerically ascending order of their user IDs. 
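
The counting and ordering rules above can be sketched in plain Python before moving to Hadoop. This is an in-memory illustration only, not a MapReduce job and not necessarily the intended solution; the function name `recommend` and the dictionary-based input are assumptions made for the example.

```python
from collections import Counter
from itertools import permutations


def recommend(adjacency, n=10):
    """adjacency maps user -> list of friend IDs; returns user -> recommendations."""
    mutual = {user: Counter() for user in adjacency}
    for friends in adjacency.values():
        # "Map"-like step: any two friends of the same user share that user
        # as a mutual friend, so emit both ordered pairs.
        for a, b in permutations(friends, 2):
            mutual.setdefault(a, Counter())[b] += 1
    result = {}
    for user, counts in mutual.items():
        # "Reduce"-like step: drop existing friends, then sort by decreasing
        # mutual-friend count with ties broken by ascending user ID.
        already = set(adjacency.get(user, ()))
        candidates = [(c, k) for c, k in counts.items() if c not in already]
        candidates.sort(key=lambda item: (-item[1], item[0]))
        result[user] = [c for c, _ in candidates[:n]]
    return result


toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0], 4: []}
print(recommend(toy)[3])  # [1, 2]
```

In a real MapReduce job the pair emission would live in the mapper and the filtering and sorting in the reducer; the sketch only shows the logic on a toy graph.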
 
Also, please provide a description of how you are going to use MapReduce jobs to solve this 
problem. We only need a very high-level description of your strategy to tackle this problem. 
 
Note: It is possible to solve this question with a single MapReduce job. But if your solution 
requires multiple MapReduce jobs, then that is fine too. 
 
What to submit: 
 
(i) The source code as a single source code file named as the question number (e.g., 
question_1.java). 
 
(ii) Include in your writeup a short paragraph describing your algorithm to tackle this problem. 
 
(iii) Include in your writeup the recommendations for the users with following user IDs: 
924, 8941, 8942, **19, **20, **21, **22, 99**, 9992, 9993. 
 
 
2. Association Rules (35 pts) 
 
Association Rules are frequently used for Market Basket Analysis (MBA) by retailers to 
understand the purchase behavior of their customers. This information can then be used for 
many different purposes such as cross-selling and up-selling of products, sales promotions, 
loyalty programs, store design, discount plans and many others. 
 
Evaluation of item sets: Once you have found the frequent itemsets of a dataset, you need to 
choose a subset of them as your recommendations. Commonly used metrics for measuring 
significance and interest for selecting rules for recommendations are: 
 
1. Confidence (denoted as conf(A → B)): Confidence is defined as the probability of 
occurrence of B in the basket if the basket already contains A: 
 
conf(A → B) = Pr(B|A), 
 
where Pr(B|A) is the conditional probability of finding item set B given that item set A is 
present. 
 
2. Lift (denoted as lift(A → B)): Lift measures how much more “A and B occur together” than 
“what would be expected if A and B were statistically independent”: 
 
lift(A → B) = conf(A → B) / S(B), 
 
where S(B) = Support(B) / N and N is the total number of transactions (baskets). 
 
3. Conviction (denoted as conv(A → B)): Conviction compares the “probability that A appears without B if 
they were independent” with the “actual frequency of the appearance of A without B”: 
 
conv(A → B) = (1 − S(B)) / (1 − conf(A → B)). 
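
To make the three measures concrete, here is a small sketch that computes them on a toy list of baskets. It assumes the standard definitions lift(A → B) = conf(A → B) / S(B) and conv(A → B) = (1 − S(B)) / (1 − conf(A → B)) with S(B) = Support(B) / N. The function names and the toy data are illustrative, and item sets must be passed as sets (not bare strings).

```python
def support(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket))


def confidence(baskets, a, b):
    # Pr(B | A); assumes A occurs in at least one basket.
    return support(baskets, set(a) | set(b)) / support(baskets, a)


def lift(baskets, a, b):
    return confidence(baskets, a, b) / (support(baskets, b) / len(baskets))


def conviction(baskets, a, b):
    conf = confidence(baskets, a, b)
    s_b = support(baskets, b) / len(baskets)
    # A rule that always holds has conf == 1; treat its conviction as infinite.
    return float("inf") if conf == 1 else (1 - s_b) / (1 - conf)


baskets = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk", "eggs"}]
print(confidence(baskets, {"bread"}, {"milk"}))
print(lift(baskets, {"bread"}, {"milk"}))
print(conviction(baskets, {"bread"}, {"milk"}))
```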
 
(a) [5 pts] 
 
A drawback of using confidence is that it ignores Pr(B). Why is this a drawback? Explain why lift 
and conviction do not suffer from this drawback. 
 
(b) [5 pts] 
 
A measure is symmetrical if measure(A → B) = measure(B → A). Which of the measures 
presented here are symmetrical? For each measure, please provide either a proof that the 
measure is symmetrical, or a counterexample that shows the measure is not symmetrical. 
 
(c) [5 pts] 
 
A measure is desirable if its value is maximal for rules that hold 100% of the time (such rules are 
called perfect implications). This makes it easy to identify the best rules. Which of the above 
measures have this property? Explain why. 
 
 
Product Recommendations: The action or practice of selling additional products or services to 
existing customers is called cross-selling. Giving product recommendations is one example of 
cross-selling that is frequently used by online retailers. One simple method of giving product 
recommendations is to recommend products that are frequently browsed together by the 
customers. 
 
Suppose we want to recommend new products to the customer based on the products they 
have already browsed on the online website. Write a program using the A-priori algorithm to 
find products which are frequently browsed together. Fix the support to s = 100 (i.e. product 
pairs need to occur together at least 100 times to be considered frequent) and find itemsets of 
size 2 and 3. 
 
Use the provided browsing behavior dataset browsing.txt. Each line represents a browsing 
session of a customer. On each line, each string of 8 characters represents the id of an item 
browsed during that session. The items are separated by spaces. 
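
The first two passes of A-priori on such session data can be sketched as below. This is a minimal in-memory illustration, not a tuned implementation: the function name `frequent_pairs` is hypothetical, and sessions are assumed to be already split into lists of item IDs (for example with `line.split()`).

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Return (item_counts, frequent_pair_counts) for a list of sessions."""
    # Pass 1: count individual items, once per session.
    item_counts = Counter()
    for session in sessions:
        item_counts.update(set(session))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose members are both frequent items.
    pair_counts = Counter()
    for session in sessions:
        kept = sorted(set(session) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
    return item_counts, frequent


sessions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B"], ["A", "B", "D"]]
print(frequent_pairs(sessions, 2)[1])  # {('A', 'B'): 3, ('A', 'C'): 2}
```

Pairs are stored as sorted tuples so that (X, Y) and (Y, X) are counted together.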
 
Note: for the following questions (d) and (e), the writeup will require a specific rule ordering 
but the program need not sort the output. 
 
(d) [10 pts] 
 
Identify pairs of items (X, Y) such that the support of {X, Y} is at least 100. For all such pairs, 
compute the confidence scores of the corresponding association rules: X ⇒ Y, Y ⇒ X. Sort the 
rules in decreasing order of confidence scores and list the top 5 rules in the writeup. Break ties, 
if any, by lexicographically increasing order on the left hand side of the rule. 
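
One way to turn pair supports into ranked rules is sketched here. The inputs are assumed to be dictionaries of item counts and frequent-pair counts (names hypothetical); the sort key follows the ordering described above, decreasing confidence and then increasing left-hand side.

```python
def pair_rules(item_counts, pair_counts):
    """Build rules (lhs, rhs, confidence) for every frequent pair, sorted."""
    rules = []
    for (x, y), both in pair_counts.items():
        rules.append((x, y, both / item_counts[x]))  # X => Y
        rules.append((y, x, both / item_counts[y]))  # Y => X
    # Decreasing confidence; ties broken by lexicographic left-hand side.
    rules.sort(key=lambda rule: (-rule[2], rule[0]))
    return rules


item_counts = {"A": 4, "B": 4, "C": 2}
pair_counts = {("A", "B"): 3, ("A", "C"): 2}
for lhs, rhs, conf in pair_rules(item_counts, pair_counts):
    print(f"{lhs} => {rhs}: {conf:.2f}")
```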
 
(e) [10 pts] 
 
Identify item triples (X, Y, Z) such that the support of {X, Y, Z} is at least 100. For all such triples, 
compute the confidence scores of the corresponding association rules: (X, Y) ⇒ Z, (X, Z) ⇒ Y, 
and (Y, Z) ⇒ X. Sort the rules in decreasing order of confidence scores and list the top 5 rules in 
the writeup. Order the left-hand-side pair lexicographically and break ties, if any, by 
lexicographical order of the first then the second item in the pair. 
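
The third A-priori pass and the triple rules could be sketched along the following lines. The function names are hypothetical, pairs and triples are assumed to be stored as sorted tuples, and enumerating every 3-combination of a session is only practical for short sessions.

```python
from collections import Counter
from itertools import combinations


def frequent_triples(sessions, frequent_pairs, min_support):
    """Count triples whose three 2-subsets are all frequent pairs."""
    frequent_pairs = set(frequent_pairs)
    counts = Counter()
    for session in sessions:
        for triple in combinations(sorted(set(session)), 3):
            # A-priori pruning: skip triples containing an infrequent pair.
            if all(p in frequent_pairs for p in combinations(triple, 2)):
                counts[triple] += 1
    return {t: c for t, c in counts.items() if c >= min_support}


def triple_rules(pair_counts, triple_counts):
    """Build rules ((lhs pair), rhs, confidence), sorted as described above."""
    rules = []
    for triple, count in triple_counts.items():
        for lhs in combinations(triple, 2):
            rhs = next(item for item in triple if item not in lhs)
            rules.append((lhs, rhs, count / pair_counts[lhs]))
    rules.sort(key=lambda rule: (-rule[2], rule[0]))
    return rules


sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B"]]
pairs = {("A", "B"): 3, ("A", "C"): 2, ("B", "C"): 2}
print(triple_rules(pairs, frequent_triples(sessions, pairs, 2)))
```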
 
What to submit: 
 
Include your properly named code file (e.g., question_2.java or question_2.py), and include the 
answers to the following questions in your writeup: 
 
(i) Explanation for 2(a). 
 
(ii) Proofs and/or counterexamples for 2(b). 
 
(iii) Explanation for 2(c). 
 
(iv) Top 5 rules with confidence scores for 2(d). 
 
(v) Top 5 rules with confidence scores for 2(e). 
 
3. Locality-Sensitive Hashing (30 pts) 
 
When simulating a random permutation of rows, as described in Sec. 3.3.5 of the MMDS textbook, 
we could save a lot of time if we restricted our attention to a randomly chosen k of the n rows, 
rather than hashing all the row numbers. The downside of doing so is that if none of the k rows 
contains a 1 in a certain column, then the result of the min-hashing is “don’t know,” i.e., we get 
no row number as a min-hash value. It would be a mistake to assume that two columns that 
both min-hash to “don’t know” are likely to be similar. However, if the probability of getting 
“don’t know” as a min-hash value is small, we can tolerate the situation, and simply ignore such 
min-hash values when computing the fraction of min-hashes in which two columns agree. 
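
The idea of min-hashing over only k sampled rows, and ignoring “don’t know” values when comparing two columns, can be sketched as follows. This is an illustrative simulation under assumptions: columns are represented as sets of row indices that contain a 1, the k rows are drawn without replacement with `random.sample`, and the function names are hypothetical.

```python
import random


def minhash_on_sample(column_rows, sampled_order):
    """First sampled row (in sampled order) where the column has a 1, else None."""
    for row in sampled_order:
        if row in column_rows:
            return row
    return None  # "don't know": none of the k rows has a 1 in this column


def estimate_similarity(col_a, col_b, n, k, trials, seed=0):
    """Fraction of usable trials in which both columns get the same min-hash."""
    rng = random.Random(seed)
    agree = usable = 0
    for _ in range(trials):
        order = rng.sample(range(n), k)  # k distinct rows in random order
        h_a = minhash_on_sample(col_a, order)
        h_b = minhash_on_sample(col_b, order)
        if h_a is None or h_b is None:
            continue  # ignore trials with a "don't know" value
        usable += 1
        agree += h_a == h_b
    return agree / usable if usable else None


print(estimate_similarity({1, 2, 3}, {2, 3, 4}, n=100, k=60, trials=2000))
```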
 
(a) [10 pts] 
 
Suppose a column has m 1’s and therefore (n − m) 0’s. Prove that the probability we get 
“don’t know” as the min-hash value for this column is at most ((n − m)/n)^k. 
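
A quick numerical sanity check (not a proof) can compare the exact “don’t know” probability with the bound ((n − m)/n)^k. The sketch assumes the k rows are chosen uniformly without replacement, so the exact value is C(n − m, k) / C(n, k); the function names are illustrative.

```python
from math import comb


def dont_know_probability(n, m, k):
    """Probability that none of k rows sampled without replacement hits a 1."""
    return comb(n - m, k) / comb(n, k)


def upper_bound(n, m, k):
    return ((n - m) / n) ** k


for n, m, k in [(10, 3, 2), (100, 5, 20), (1000, 10, 100)]:
    exact = dont_know_probability(n, m, k)
    print(n, m, k, exact, upper_bound(n, m, k), exact <= upper_bound(n, m, k))
```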
 
(b) [10 pts] 
 
Suppose we want the probability of “don’t know” to be at most e^(−10). Assuming n and m are 
both very large (but n is much larger than m or k), give a simple approximation to the smallest 
value of k that will assure this probability is at most e^(−10). Hints: (1) You can use ((n − m)/n)^k as the 
exact value of the probability of “don’t know.” (2) Remember that for large x, (1 − 1/x)^x ≈ 1/e. 
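
The approximation in hint (2) is easy to check numerically. The snippet below only illustrates how quickly (1 − 1/x)^x approaches 1/e as x grows; the helper name `approx_error` is hypothetical.

```python
import math


def approx_error(x):
    """Absolute gap between (1 - 1/x)**x and 1/e."""
    return abs((1 - 1 / x) ** x - 1 / math.e)


for x in (10, 100, 10_000):
    print(x, (1 - 1 / x) ** x, 1 / math.e, approx_error(x))
```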
 
(c) [10 pts] 
 
Note: This question should be considered separate from the previous two parts, in that we are 
no longer restricting our attention to a randomly chosen subset of the rows. 
 
When min-hashing, one might expect that we could estimate the Jaccard similarity without 
using all possible permutations of rows. For example, we could only allow cyclic permutations, 
i.e., start at a randomly chosen row r, which becomes the first in the order, followed by rows 
r+1, r+2, and so on, down to the last row, and then continuing with the first row, second row, 
and so on, down to row r−1. There are only n such permutations if there are n rows. However, 
these permutations are not sufficient to estimate the Jaccard similarity correctly. 
 
Give an example of two columns such that the probability (over cyclic permutations only) that 
their min-hash values agree is not the same as their Jaccard similarity. In your answer, please 
provide (a) an example of a matrix with two columns (let the two columns correspond to sets 
denoted by S1 and S2) (b) the Jaccard similarity of S1 and S2, and (c) the probability that a 
random cyclic permutation yields the same min-hash value for both S1 and S2. 
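
When searching for such an example, it can help to compute both quantities mechanically for candidate columns. The sketch below assumes rows are numbered 0 to n − 1, columns are given as non-empty sets of row indices containing a 1, and the function names are hypothetical.

```python
def jaccard(s1, s2):
    """Jaccard similarity of two sets of row indices."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def cyclic_agreement(s1, s2, n):
    """Fraction of the n cyclic permutations giving both columns the same min-hash."""
    agree = 0
    for r in range(n):
        # Cyclic order starting at row r: r, r+1, ..., n-1, 0, ..., r-1.
        order = [(r + i) % n for i in range(n)]
        h1 = next(row for row in order if row in s1)
        h2 = next(row for row in order if row in s2)
        agree += h1 == h2
    return agree / n


# Identical columns agree under every permutation, cyclic or not.
print(jaccard({0, 2}, {0, 2}), cyclic_agreement({0, 2}, {0, 2}, 4))  # 1.0 1.0
```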
 
What to submit: 
 
Include the following in your writeup: 
 
(i) Proof for 3(a) 
 
(ii) Derivation and final answer for 3(b) 
 
(iii) Example for 3(c) 
 