

ECE 498 Assignment Help: Python Design and Programming

Date: 2024-11-15  Source: hfw.cc (Hefei Net)  Author: hfw.cc



ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture with SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide the
example notebook recall.ipynb, which contains a reference implementation using a 2-layer
transformer. You will adapt this code to incorporate different positional encodings, use
Mamba layers, or modify dataset generation.
Background: As you recall from class, associative recall (AR) assesses two abilities
of the model: the ability to locate relevant information and the ability to retrieve the
context around that information. The AR task can be understood via the following question:
given the input prompt X = [a 1 b 2 c 3 b], we want the model to locate where the last
token b occurs earlier and output the associated value Y = 2. This is crucial for
memory-related tasks or bigram retrieval (e.g. ‘Baggins’ should follow ‘Bilbo’).
To proceed, let us formally define the associative recall task we will study in the HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with cardinality
|Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q] where the
query q appears exactly twice in the sequence and the value v follows the first appearance
of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head problem is a special case of the definition above where the query q is
fixed (i.e. Q is a singleton). It is visualized in Figure 1. At the other extreme, we can ask
the model to solve AR for all queries in the vocabulary.
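As a reference point for what a model has to learn, the example prompt from the background paragraph can be solved by a direct lookup. The snippet below is only an illustrative sketch; the function name `recall` is hypothetical and not part of the provided recall.ipynb.

```python
def recall(tokens):
    """Return the token that follows the first occurrence of the final query token."""
    query = tokens[-1]            # the query is the last token of the prompt
    first = tokens.index(query)   # earliest position where the query appears
    return tokens[first + 1]      # the associated value sits right after it

# The prompt X = [a 1 b 2 c 3 b] from the text above.
example = ["a", 1, "b", 2, "c", 3, "b"]
answer = recall(example)
```

A trained model is expected to reproduce this mapping from the embedded sequence alone, without access to an explicit lookup.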
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embedding of
the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension. The embedding of
the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
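The embedding construction can be sketched in a few lines of NumPy. The function name `make_embeddings` is hypothetical, and the mapping of token i in {1, ..., K} to row i − 1 of the array is an assumption made here because NumPy arrays are zero-indexed.

```python
import numpy as np

def make_embeddings(K, d, seed=0):
    """Return a K x d matrix whose rows are unit-norm Gaussian vectors."""
    np.random.seed(seed)                           # numpy.random.seed(0) as requested
    V = np.random.randn(K, d)                      # IID N(0, 1) entries
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # normalize each row to unit length
    return V

# Problem 1 and 2 settings: K = 16, d = 8.
V = make_embeddings(K=16, d=8)
# Assumption: token i (1-based) is embedded as V[i - 1].
```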
Experimental variables: Finally, for the AR task, Q will simply be the first M elements
of the vocabulary. During experiments, K, d, M are under our control. Besides this we will
also play with two other variables:
• Context length: We will train these models up to context length L. However, we
will evaluate with up to 3L. This is to test the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable where v will appear τ tokens after q. τ = 1 is the standard.
Models: The motivation behind this HW is reproducing the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both
transformer and Mamba architectures. Specifically, we will consider the following models in our
evaluations:
Figure 1: We will work on the associative recall (AR) problem. AR problem requires the
model to retrieve the value associated with all queries whereas the induction head requires
the same for a specific query. Thus, the latter is an easier problem. The figure above is
directly taken from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding)
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
Hybrid architectures are inspired by the Mamba paper as well as [2] which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g. RoPE encoding or Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
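To make the rotary option concrete, here is a minimal NumPy sketch of RoPE applied to a single sequence of vectors. It is not taken from the provided code or from any particular repository; the function name `apply_rope` and the default `base=10000.0` are assumptions for illustration. In an attention layer the same rotation would be applied to the queries and keys before computing scores.

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate consecutive feature pairs of x (shape: seq_len x d, d even)
    by angles that grow linearly with the position index."""
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)               # one frequency per pair
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                          # the two halves of each pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Toy check with random float vectors of dimension d = 8.
rng = np.random.default_rng(0)
queries = apply_rope(rng.standard_normal((6, 8)))
```

Because each pair is rotated, vector norms are unchanged and the inner product between a rotated query at position m and a rotated key at position n depends only on n − m, which is the property that distinguishes RoPE from a learned absolute encoding.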
Generating training dataset: During training, you train with minibatch SGD (e.g. with
batch size 64) until satisfactory convergence. You can generate the training sequences for
AR as follows given (K, d, M, L, τ):
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
that size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample other tokens IID from [K] \ {q}, i.e. the other tokens are drawn uniformly at
random but are not equal to q.
6. Set label token Y = v.
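The steps above can be sketched as a small sampler. The name `sample_ar_sequence` and the use of `numpy.random.default_rng` are choices made for this illustration, not part of the provided code. One further assumption is labeled in the code: the first query index i is drawn from 1 to L − τ − 1 rather than 1 to L − τ, so that the value at index i + τ never lands on the final query slot.

```python
import numpy as np

def sample_ar_sequence(K, M, L, tau, rng):
    """Sample one (X, Y) pair with integer tokens in 1..K (requires L >= tau + 2)."""
    q = int(rng.integers(1, M + 1))    # query from Q = {1, ..., M}
    v = int(rng.integers(1, K + 1))    # value from [K], independent of q
    # Filler tokens are uniform over [K] \ {q}.
    others = np.array([t for t in range(1, K + 1) if t != q])
    X = rng.choice(others, size=L)
    # Assumption: i ranges over 1..L-tau-1 so the value cannot overwrite the last slot.
    i = int(rng.integers(1, L - tau))  # upper bound is exclusive
    X[i - 1] = q                       # first occurrence of the query (1-based index i)
    X[i + tau - 1] = v                 # value placed tau tokens later
    X[L - 1] = q                       # query repeated at the end
    return X, v

def sample_batch(K, M, L, tau, rng, batch_size=64):
    """Stack independent samples into arrays of shape (batch, L) and (batch,)."""
    pairs = [sample_ar_sequence(K, M, L, tau, rng) for _ in range(batch_size)]
    return np.stack([p[0] for p in pairs]), np.array([p[1] for p in pairs])

rng = np.random.default_rng(0)
X_batch, Y_batch = sample_batch(K=16, M=1, L=64, tau=1, rng=rng)
```

With M = 1 this produces induction-head data; raising M up to K gives the full AR setting. The integer tokens would still need to be mapped to rows of the embedding matrix V before being fed to a model.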
Test evaluation: The test dataset is generated the same way as above. However, we will
evaluate on all sequence lengths from τ + 1 to 3L. Note that τ + 2 is the shortest possible sequence.
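A length sweep of this kind might be organized as below. This is only a sketch: `accuracy_by_length`, `toy_sample`, and `oracle` are hypothetical names, and the "model" here is a lookup stand-in so the snippet runs on its own. In the actual homework, `predict_fn` would wrap a trained network. The sweep starts at τ + 2 because that is the shortest sequence that can hold both queries and the value.

```python
import numpy as np

def accuracy_by_length(predict_fn, sample_fn, lengths, n_trials=100):
    """Return {length: accuracy}, drawing fresh test samples at every length."""
    results = {}
    for length in lengths:
        correct = 0
        for _ in range(n_trials):
            X, y = sample_fn(length)
            correct += int(predict_fn(X) == y)
        results[length] = correct / n_trials
    return results

# Toy stand-ins (hypothetical): induction-head data with tau = 1 and a lookup "model".
rng = np.random.default_rng(0)
K, tau, L = 16, 1, 64

def toy_sample(length, q=1):
    X = rng.choice(np.arange(2, K + 1), size=length)  # fillers never equal q
    i = int(rng.integers(0, length - tau - 1))         # 0-based slot of the first query
    v = int(rng.integers(1, K + 1))
    X[i], X[i + tau], X[-1] = q, v, q
    return X, v

def oracle(X):
    first = int(np.argmax(X == X[-1]))                 # first occurrence of the query
    return int(X[first + tau])

acc = accuracy_by_length(oracle, toy_sample, range(tau + 2, 3 * L + 1), n_trials=20)
```

Plotting `acc` against length for each trained model, with a vertical marker at L, would give the accuracy-versus-context-length curves the problems ask for.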
Empirical Evidence from Mamba Paper: Table 2 of [1] demonstrates that Mamba can do
a good job on the induction head problem i.e. AR with single query. Additionally, Mamba
is the only model that exhibits length generalization, that is, even if you train it up to context
length L, it can still solve AR for context length beyond L. On the other hand, since Mamba
is inherently a recurrent model, it may not solve the AR problem in its full generality. This
motivates the question: What are the tradeoffs between Mamba and transformer, and can
hybrid models help improve performance over both?
Your assignments are as follows. For each problem, make sure to submit the associated
code. The code can be organized as separate, clearly commented cells in a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact the results?
• Train a Mamba model for M = 16 and τ = 10. Comment on any differences.










 

