ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture and SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide example
code, recall.ipynb, which contains an example implementation using a 2-layer transformer.
You will adapt this code to incorporate different positional encodings, use Mamba layers,
or modify the dataset generation.
Background: As you recall from class, associative recall (AR) assesses two abilities of the
model: the ability to locate relevant information and the ability to retrieve the context around
that information. The AR task can be understood via the following question: given the input
prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last token b occurs earlier
and output the associated value Y = 2. This is crucial for memory-related tasks and bigram
retrieval (e.g. 'Baggins' should follow 'Bilbo').
To proceed, let us formally define the associative recall task we will study in the HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with cardinality
|Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q] where the
query q appears exactly twice in the sequence and the value v follows the first appearance
of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head is a special case of the definition above where the query q is fixed (i.e. Q
is a singleton). The induction head is visualized in Figure 1. At the other extreme, we can ask
the model to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embeddings of
the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension. The embedding of
the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
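A minimal sketch of this embedding construction is given below (the helper name make_embeddings is illustrative and not part of the provided code):

import numpy as np

def make_embeddings(K: int, d: int) -> np.ndarray:
    # K x d matrix with IID N(0, 1) entries, rows normalized to unit length.
    np.random.seed(0)                                   # reproducibility, as required
    V = np.random.randn(K, d)                           # random Gaussian entries
    V = V / np.linalg.norm(V, axis=1, keepdims=True)    # normalize each row
    return V

V = make_embeddings(K=16, d=8)   # the embedding of the i-th token is V[i]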
Experimental variables: For the AR task, Q will simply be the first M elements of the
vocabulary. During the experiments, K, d, and M are under our control. Besides these, we will
also play with two other variables:
• Context length: We will train these models with context length up to L. However, we
will evaluate on lengths up to 3L. This tests the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable τ, where v appears τ tokens after q. τ = 1 is the standard setting.
Models: The motivation behind this HW is reproducing the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both the
transformer and Mamba architectures. Specifically, we will consider the following models in our
evaluations:
Figure 1: We will work on the associative recall (AR) problem. The AR problem requires the
model to retrieve the value associated with any query, whereas the induction head requires
the same for a specific query. Thus, the latter is an easier problem. The figure above is
taken directly from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding).
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
Hybrid architectures are inspired by the Mamba paper as well as [2], which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g. RoPE encoding or the Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
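For concreteness, one possible way to assemble the hybrid model (an initial Mamba layer followed by a single attention layer, with no positional encoding) is sketched below. It assumes the mamba_ssm package from the state-spaces/mamba repo is installed (which typically requires a CUDA build); the class name HybridModel and hyperparameters such as n_heads are illustrative, and the learnable nn.Embedding can be replaced by the fixed matrix V from the vocabulary section.

import torch
import torch.nn as nn
from mamba_ssm import Mamba   # state-spaces/mamba

class HybridModel(nn.Module):
    # Layer 1: Mamba (SSM). Layer 2: self-attention. No positional encoding (NoPE).
    def __init__(self, vocab_size: int, d_model: int, n_heads: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # or load and freeze the fixed V
        self.mamba = Mamba(d_model=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)       # logits over the vocabulary

    def forward(self, x):                                # x: (batch, seq_len) token ids
        h = self.embed(x)
        h = h + self.mamba(self.norm1(h))                # residual around the Mamba layer
        q = self.norm2(h)
        a, _ = self.attn(q, q, q, need_weights=False)
        h = h + a                                        # residual around the attention layer
        return self.head(h[:, -1])                       # predict the value from the last position

The same skeleton can be reused for the pure Mamba model (two Mamba layers) and, with positional encodings added to the embeddings, for the two-layer transformer.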
Generating the training dataset: During training, train with minibatch SGD (e.g. with
batch size 64) until satisfactory convergence. Given (K, d, M, L, τ), you can generate the
training sequences for AR as follows:
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random and independently. Recall
that the size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place the value token at index i + τ.
5. Sample the other tokens IID from [K] − {q}, i.e. the remaining tokens are drawn uniformly
at random but are never equal to q.
6. Set label token Y = v.
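A minimal sketch of this sampling procedure, with 0-indexed positions, is given below (the helper name sample_sequence is illustrative; the range of the first index is shifted by one so that the value token never overwrites the trailing q):

import numpy as np

def sample_sequence(K, M, L, tau, rng):
    # One AR training example following steps 1-6; token ids are 0 .. K-1.
    q = rng.integers(0, M)                           # step 2: query from the first M tokens
    v = rng.integers(0, K)                           # step 2: value from the full vocabulary
    others = np.array([t for t in range(K) if t != q])
    seq = rng.choice(others, size=L)                 # step 5: IID filler tokens, never equal to q
    i = rng.integers(0, L - tau - 1)                 # step 3: position of the first q
    seq[i] = q
    seq[i + tau] = v                                 # step 4: value tau tokens after q
    seq[-1] = q                                      # step 3: q again at the end
    return seq, v                                    # step 6: label Y = v

rng = np.random.default_rng(0)
batch = [sample_sequence(K=16, M=1, L=64, tau=1, rng=rng) for _ in range(64)]  # one minibatch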
Test evaluation: The test dataset is generated in the same way as above; however, we will
evaluate on all sequence lengths from τ + 2 (the shortest possible sequence) up to 3L.
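A sketch of this length-generalization evaluation loop is given below; it assumes the sample_sequence helper from the previous sketch and a model that returns vocabulary logits for the last position (the function name accuracy_vs_length is illustrative):

import numpy as np
import torch

@torch.no_grad()
def accuracy_vs_length(model, K, M, tau, L, n_per_len=256, device="cpu"):
    # AR accuracy for every test length from tau + 2 up to 3L.
    rng = np.random.default_rng(1)                   # separate stream for test data
    model.eval()
    accs = {}
    for length in range(tau + 2, 3 * L + 1):
        seqs, labels = zip(*[sample_sequence(K, M, length, tau, rng)
                             for _ in range(n_per_len)])
        x = torch.as_tensor(np.stack(seqs), device=device)
        y = torch.as_tensor(labels, device=device)
        preds = model(x).argmax(dim=-1)              # predicted value tokens
        accs[length] = (preds == y).float().mean().item()
    return accs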
Empirical Evidence from the Mamba Paper: Table 2 of [1] demonstrates that Mamba can do
a good job on the induction head problem, i.e. AR with a single query. Additionally, Mamba
is the only model that exhibits length generalization: even if you train it up to context
length L, it can still solve AR for context lengths beyond L. On the other hand, since Mamba
is inherently a recurrent model, it may not solve the AR problem in its full generality. This
motivates the question: what are the tradeoffs between Mamba and transformers, and can
hybrid models improve performance over both?
Your assignments are as follows. For each problem, make sure to return the associated
code. The code can be organized as separate, clearly commented cells in a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]; a plotting sketch is provided after this
problem). In total, you will plot 5 curves (3 Transformers, 1 Mamba, 1 Hybrid).
Comment on the findings and compare the performance of the models, including their
length-generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
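A possible plotting helper for the accuracy-versus-context-length figure requested above is sketched below; it consumes per-model accuracy dictionaries such as those returned by the accuracy_vs_length sketch earlier (both names are assumptions, not part of the provided code):

import matplotlib.pyplot as plt

def plot_accuracy_curves(results, L, title):
    # results: {model name -> {context length -> accuracy}}
    plt.figure(figsize=(7, 4))
    for name, accs in results.items():
        lengths = sorted(accs)
        plt.plot(lengths, [accs[l] for l in lengths], label=name)
    plt.axvline(L, linestyle="--", color="gray", label="training length L")
    plt.xlabel("context length")
    plt.ylabel("AR accuracy on a test batch")
    plt.title(title)
    plt.legend()
    plt.show()

# e.g. plot_accuracy_curves({"Mamba": accs_mamba, "Hybrid": accs_hybrid}, L=64,
#                           title="Induction head (M = 1)")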
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact the results?
• Train a Mamba model with M = 16 and τ = 10. Comment on any differences.




請(qǐng)加QQ:99515681  郵箱:99515681@qq.com   WX:codinghelp






 

掃一掃在手機(jī)打開當(dāng)前頁
  • 上一篇:IEMS5731代做、代寫java設(shè)計(jì)編程
  • 下一篇:ENGG1110代做、R編程語言代寫
  • 無相關(guān)信息
    合肥生活資訊

    合肥圖文信息
    急尋熱仿真分析?代做熱仿真服務(wù)+熱設(shè)計(jì)優(yōu)化
    急尋熱仿真分析?代做熱仿真服務(wù)+熱設(shè)計(jì)優(yōu)化
    出評(píng) 開團(tuán)工具
    出評(píng) 開團(tuán)工具
    挖掘機(jī)濾芯提升發(fā)動(dòng)機(jī)性能
    挖掘機(jī)濾芯提升發(fā)動(dòng)機(jī)性能
    海信羅馬假日洗衣機(jī)亮相AWE  復(fù)古美學(xué)與現(xiàn)代科技完美結(jié)合
    海信羅馬假日洗衣機(jī)亮相AWE 復(fù)古美學(xué)與現(xiàn)代
    合肥機(jī)場巴士4號(hào)線
    合肥機(jī)場巴士4號(hào)線
    合肥機(jī)場巴士3號(hào)線
    合肥機(jī)場巴士3號(hào)線
    合肥機(jī)場巴士2號(hào)線
    合肥機(jī)場巴士2號(hào)線
    合肥機(jī)場巴士1號(hào)線
    合肥機(jī)場巴士1號(hào)線
  • 短信驗(yàn)證碼 豆包 幣安下載 AI生圖 目錄網(wǎng)

    關(guān)于我們 | 打賞支持 | 廣告服務(wù) | 聯(lián)系我們 | 網(wǎng)站地圖 | 免責(zé)聲明 | 幫助中心 | 友情鏈接 |

    Copyright © 2025 hfw.cc Inc. All Rights Reserved. 合肥網(wǎng) 版權(quán)所有
    ICP備06013414號(hào)-3 公安備 42010502001045

    99爱在线视频这里只有精品_窝窝午夜看片成人精品_日韩精品久久久毛片一区二区_亚洲一区二区久久

          9000px;">

                亚洲成人精品影院| 亚洲精品免费一二三区| 综合欧美亚洲日本| 99久久久无码国产精品| 亚洲色图欧洲色图| 日本道在线观看一区二区| 亚洲一区二区三区四区五区中文 | 国产91精品欧美| 国产精品久久久久久一区二区三区 | 欧美高清性hdvideosex| 蜜桃在线一区二区三区| 国产精品污污网站在线观看| 色噜噜狠狠成人网p站| 免费欧美日韩国产三级电影| 国产三级精品在线| 欧美丰满少妇xxxxx高潮对白| 国精产品一区一区三区mba桃花| 亚洲欧洲综合另类| 久久久九九九九| 91超碰这里只有精品国产| 国产精品亚洲一区二区三区在线| 成人丝袜高跟foot| 午夜精品久久久久久久久久久 | 欧美日韩高清在线| 国产白丝网站精品污在线入口| 亚洲成年人网站在线观看| 国产精品乱码一区二三区小蝌蚪| 欧美精品aⅴ在线视频| 一本色道亚洲精品aⅴ| 国产成人亚洲精品青草天美| 热久久一区二区| 亚洲欧美另类小说| 国产日韩精品久久久| 91精品国产日韩91久久久久久| 一本到三区不卡视频| 国产成人免费视| 久久99国内精品| 免费在线看一区| 五月天精品一区二区三区| 亚洲一卡二卡三卡四卡 | 麻豆精品久久精品色综合| 亚洲黄色尤物视频| 亚洲色图在线看| 国产精品人妖ts系列视频| 精品国产123| 精品sm在线观看| 精品国产乱码久久久久久浪潮| 欧美一级日韩免费不卡| 这里只有精品免费| 欧美成人艳星乳罩| 久久香蕉国产线看观看99| 久久亚洲精精品中文字幕早川悠里| 日韩免费电影一区| 337p粉嫩大胆色噜噜噜噜亚洲| 日韩一区二区三区观看| 日韩精品中文字幕一区二区三区 | 一区二区在线电影| 一区二区三区在线免费视频| 亚洲黄色av一区| 亚洲超碰精品一区二区| 视频一区二区不卡| 精品影视av免费| 久久精品国产亚洲a| 亚洲视频你懂的| 亚洲一二三区视频在线观看| 五月天网站亚洲| 久久99精品国产.久久久久| 国产成人精品免费| 色偷偷久久人人79超碰人人澡 | 在线中文字幕一区二区| 色综合久久中文综合久久97| 欧美日韩一区二区欧美激情 | 精品毛片乱码1区2区3区| 久久网站最新地址| 日本一区二区不卡视频| 亚洲精品成a人| 老鸭窝一区二区久久精品| 国产成人高清在线| 欧美三级电影精品| 国产偷国产偷亚洲高清人白洁| 综合久久给合久久狠狠狠97色| 亚洲最快最全在线视频| 久久99热狠狠色一区二区| 一本一道综合狠狠老| 精品国产乱码久久| 亚洲福利一区二区三区| 国产精品亚洲视频| 欧美中文字幕一区| 国产偷国产偷精品高清尤物| 亚洲国产sm捆绑调教视频 | 成人久久视频在线观看| 91成人免费电影| 久久久久88色偷偷免费| 一区二区三区四区在线免费观看| 久久aⅴ国产欧美74aaa| 欧美嫩在线观看| 久久男人中文字幕资源站| 伊人婷婷欧美激情| 成人影视亚洲图片在线| www亚洲一区| 日韩精品国产精品| 91成人在线精品| 综合久久一区二区三区| 成人激情小说网站| 久久综合色天天久久综合图片| 偷窥国产亚洲免费视频| 99精品在线观看视频| 欧美激情中文字幕| 国产91精品免费| 欧美国产欧美亚州国产日韩mv天天看完整| 久久精品国产一区二区三区免费看| 欧美色爱综合网| 洋洋av久久久久久久一区| 91在线国产观看| 中文字幕一区二区三区色视频| 国产乱国产乱300精品| 日韩欧美一二三四区| 日韩一区欧美二区| 欧美年轻男男videosbes| 香蕉av福利精品导航| 欧美日韩国产影片| 丝袜亚洲精品中文字幕一区| 精品视频123区在线观看| 亚洲精品成人天堂一二三| 91免费观看在线| 最新热久久免费视频| 成人免费视频一区| 国产精品不卡在线观看| 色综合天天在线| 亚洲乱码国产乱码精品精的特点| 一本到不卡免费一区二区| 亚洲高清三级视频| 日韩精品在线一区二区| 国模娜娜一区二区三区| 国产精品萝li| 欧美视频中文字幕| 日韩电影免费在线| 国产亚洲综合av| 99久久精品国产导航| 亚洲一区二区四区蜜桃| 91精品久久久久久久久99蜜臂| 精品一区二区综合| 亚洲人成网站在线| 欧美日韩国产小视频在线观看| 精品一区二区在线免费观看| 亚洲欧洲精品一区二区精品久久久| 色香色香欲天天天影视综合网| 日韩黄色一级片| 欧美激情中文字幕| 欧美日韩一二区| 国产剧情在线观看一区二区| 国产精品福利在线播放| 欧美色图一区二区三区| 国产在线不卡视频| 亚洲激情图片小说视频| 精品国产91久久久久久久妲己| av亚洲精华国产精华精华| 午夜a成v人精品| 国产精品免费网站在线观看| 欧美美女黄视频| 成人av中文字幕| 蜜臀99久久精品久久久久久软件| 亚洲视频一区二区免费在线观看| 日韩视频免费观看高清完整版在线观看 | 中文字幕亚洲视频| 欧美一区二区网站| 99久久精品国产一区| 国产一区二区三区视频在线播放| 亚洲国产中文字幕| 国产精品久久毛片av大全日韩| 欧美成人精精品一区二区频| 色欧美日韩亚洲| eeuss鲁片一区二区三区在线观看| 精品一区二区三区不卡| 亚洲狠狠爱一区二区三区| 国产精品伦理一区二区| 久久婷婷国产综合国色天香| 91精品一区二区三区久久久久久 | 亚洲三级免费电影| 精品美女一区二区| 91精品国产色综合久久ai换脸| 色婷婷亚洲精品| 成人一级视频在线观看| 国产一区二区电影| 蜜桃精品视频在线观看| 亚洲成人tv网| 亚洲成人精品一区二区| 亚洲国产精品一区二区久久恐怖片 | 一个色妞综合视频在线观看| 欧美高清在线视频| 中文字幕二三区不卡| 国产视频911| 国产精品你懂的| 国产精品二三区| 中文字幕一区二区5566日韩| 国产精品私人影院| 中文字幕在线一区| 中文字幕一区免费在线观看| 亚洲欧洲精品成人久久奇米网| 亚洲蜜臀av乱码久久精品|