ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture and SSM-type architectures (specifically Mamba [1]) on the associative recall problem. We provide an example notebook, recall.ipynb, which contains an example implementation using a 2-layer transformer. You will adapt this code to incorporate different positional encodings, use Mamba layers, or modify dataset generation.
Background: As you recall from class, associative recall (AR) assesses two abilities of the model: the ability to locate relevant information and the ability to retrieve the context around that information. The AR task can be understood via the following question: given the input prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last token b occurred earlier and output the associated value Y = 2. This is crucial for memory-related tasks and bigram retrieval (e.g., ‘Baggins’ should follow ‘Bilbo’).
To proceed, let us formally define the associative recall task we will study in this HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with cardinality |Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q], where the query q appears exactly twice in the sequence and the value v follows the first appearance of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head is a special case of the definition above where the query q is fixed (i.e., Q is a singleton); it is visualized in Figure 1. At the other extreme, we can ask the model to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embedding of the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then normalizing its rows to unit length. Here d is the embedding dimension. The embedding of the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
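A minimal sketch of this construction (the function name below is our own choice, not part of the assignment):

    import numpy as np

    def build_vocab_embedding(K, d):
        # Fixed seed so every run (and every model) sees the same embeddings.
        np.random.seed(0)
        V = np.random.randn(K, d)                         # K x d matrix with IID N(0, 1) entries
        V = V / np.linalg.norm(V, axis=1, keepdims=True)  # normalize each row to unit length
        return V                                          # embedding of token i is V[i]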
Experimental variables: Finally, for the AR task, Q will simply be the first M elements of the vocabulary. During experiments, K, d, and M are under our control. Besides these, we will also play with two other variables:
• Context length: We will train these models up to context length L. However, we
will evaluate with up to 3L. This is to test the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable where v will appear τ tokens after q. τ = 1 is the standard.
Models: The motivation behind this HW is to reproduce the results in the Mamba paper. However, we will also go beyond their evaluations and identify weaknesses of both the transformer and Mamba architectures. Specifically, we will consider the following models in our evaluations:
Figure 1: We will work on the associative recall (AR) problem. The AR problem requires the model to retrieve the value associated with any query, whereas the induction head requires the same for a specific query. Thus, the latter is an easier problem. The figure above is directly taken from the Mamba paper [1]. The yellow-shaded regions highlight the focus of this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no MLP). We will try the following positional encodings: (i) learned PE (provided code), (ii) Rotary PE (RoPE; a minimal sketch is given right after this list), (iii) NoPE (no positional encoding).
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
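As referenced in the Transformer bullet above, here is one common way to apply RoPE to the query/key tensors before computing attention scores (a sketch only; the function name and the pairing of the first and second halves of each head dimension are our own choices, and you may instead use a public implementation as suggested below):

    import torch

    def apply_rope(x):
        # x: (batch, seq_len, n_heads, head_dim) queries or keys; head_dim must be even.
        b, L, h, d = x.shape
        half = d // 2
        # Per-pair rotation frequencies theta_i = 10000^(-2i/d).
        theta = 10000.0 ** (-2.0 * torch.arange(half, dtype=torch.float32) / d)
        pos = torch.arange(L, dtype=torch.float32)
        angles = pos[:, None] * theta[None, :]            # (seq_len, head_dim/2)
        cos = angles.cos()[None, :, None, :]              # broadcast over batch and heads
        sin = angles.sin()[None, :, None, :]
        x1, x2 = x[..., :half], x[..., half:]
        # Rotate each (x1_i, x2_i) coordinate pair by its position-dependent angle.
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

Apply the same function to both queries and keys; the attention scores then depend only on relative positions.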
Hybrid architectures are inspired by the Mamba paper as well as [2], which observes the benefit of starting the model with a Mamba layer. You should use public GitHub repos to find implementations (e.g., RoPE encoding or a Mamba layer). As a suggestion, you can use this GitHub Repo for the Mamba model.
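To make the hybrid architecture concrete, here is a minimal sketch of a 2-layer hybrid model (one Mamba layer followed by one causal attention layer, no positional encoding). It assumes the mamba_ssm package from a public Mamba implementation (which typically requires a CUDA GPU); the class name HybridModel and all hyperparameter values are our own choices:

    import torch
    import torch.nn as nn
    from mamba_ssm import Mamba  # assumes the mamba-ssm package is installed

    class HybridModel(nn.Module):
        # One Mamba layer followed by one causal attention layer; no positional encoding.
        def __init__(self, d_model=8, n_heads=1, vocab_size=16):
            super().__init__()
            self.mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, x):
            # x: (batch, seq_len, d_model) embedded tokens.
            h = self.mamba(x)
            L = h.size(1)
            # Boolean causal mask: True entries are blocked (no attending to future positions).
            mask = torch.triu(torch.ones(L, L, dtype=torch.bool, device=h.device), diagonal=1)
            h, _ = self.attn(h, h, h, attn_mask=mask)
            # Read out the prediction from the final position (the repeated query q).
            return self.out(h[:, -1, :])

The transformer-only and Mamba-only baselines can follow the same skeleton, swapping the two layers accordingly.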
Generating training dataset: During training, you train with minibatch SGD (e.g., with batch size 64) until satisfactory convergence. You can generate the training sequences for AR as follows, given (K, d, M, L, τ); a sketch of this sampling procedure is given after the list:
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall that the size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample the other tokens IID from [K] \ {q}, i.e., the other tokens are drawn uniformly at random but are not equal to q.
6. Set label token Y = v.
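A minimal sketch of the sampling procedure above, using 0-based token indices (the function name and the use of a numpy Generator are our own choices; the range for i is chosen so that the value token never lands on the final position, which is reserved for the repeated query):

    import numpy as np

    def sample_ar_sequence(K, M, L, tau, rng):
        # Queries are the first M tokens of the vocabulary; the value can be any token.
        q = int(rng.integers(0, M))
        v = int(rng.integers(0, K))
        # Step 5: fill every position with a token != q (uniform over the other K - 1 tokens).
        seq = rng.integers(0, K - 1, size=L)
        seq[seq >= q] += 1
        # Steps 3-4: first q at index i, the value tau positions later, repeated q at the end.
        i = int(rng.integers(0, L - tau - 1))
        seq[i] = q
        seq[i + tau] = v
        seq[L - 1] = q
        return seq, v  # Step 6: the label is Y = v

    # Example: rng = np.random.default_rng(); X, Y = sample_ar_sequence(K=16, M=1, L=64, tau=1, rng=rng)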
Test evaluation: The test dataset is generated in the same way as above. However, we will evaluate on all sequence lengths from τ + 1 up to 3L. Note that τ + 2 is the shortest possible sequence length.
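One way to organize this sweep is sketched below. It reuses the sample_ar_sequence and build_vocab_embedding sketches above (passed in as sample_fn and V) and assumes the model maps embedded sequences of shape (batch, length, d) to vocabulary logits; all names are placeholders for your own code:

    import numpy as np
    import torch

    def length_sweep_accuracy(model, V, sample_fn, K, M, tau, L_train, batch_size=64):
        # Sweep test lengths from tau + 2 (the shortest feasible length) up to 3 * L_train.
        rng = np.random.default_rng(1)
        accs = {}
        model.eval()
        with torch.no_grad():
            for L in range(tau + 2, 3 * L_train + 1):
                pairs = [sample_fn(K, M, L, tau, rng) for _ in range(batch_size)]
                seqs = np.stack([V[s] for s, _ in pairs])           # (batch, L, d) embeddings
                labels = torch.tensor([y for _, y in pairs])
                logits = model(torch.tensor(seqs, dtype=torch.float32))
                accs[L] = (logits.argmax(dim=-1) == labels).float().mean().item()
        return accs  # plot length vs. accuracy to obtain the curves requested below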
Empirical Evidence from the Mamba Paper: Table 2 of [1] demonstrates that Mamba can do a good job on the induction head problem, i.e., AR with a single query. Additionally, Mamba is the only model that exhibits length generalization; that is, even if you train it up to context length L, it can still solve AR for context lengths beyond L. On the other hand, since Mamba is inherently a recurrent model, it may not solve the AR problem in its full generality. This motivates the question: What are the tradeoffs between Mamba and the transformer, and can hybrid models help improve performance over both?
Your assignments are as follows. For each problem, make sure to return the associated code. This code can be organized as separate, clearly commented cells in a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training, evaluate the test performance and plot the accuracy of all models as a function of the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves (3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the performance of the models, including their length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate where the x-axis is the number of iterations and the y-axis is the AR accuracy over a test batch (a minimal training-loop sketch that records this accuracy is given after this list). Make sure to specify the batch size you are using (ideally use ** or 64).
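The sketch below shows one way to record test-batch accuracy during training for the convergence plot. The helpers make_batch and test_batch, as well as the choice of the Adam optimizer, are our own placeholders and choices rather than requirements of the assignment:

    import torch

    def train_with_tracking(model, make_batch, test_batch, n_iters=2000, eval_every=50, lr=1e-3):
        # make_batch() returns a fresh (x, y) training batch of embedded sequences and value labels;
        # test_batch is a fixed (x_test, y_test) pair used only to track convergence.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        x_test, y_test = test_batch
        history = []
        for it in range(n_iters):
            x, y = make_batch()
            loss = loss_fn(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if it % eval_every == 0:
                model.eval()
                with torch.no_grad():
                    acc = (model(x_test).argmax(dim=-1) == y_test).float().mean().item()
                model.train()
                history.append((it, acc))
        return history  # list of (iteration, accuracy) pairs; plot one curve per model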
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the corresponding results of Problem 2. How does the embedding dimension d impact the results?
• Train a Mamba model with M = 16 and τ = 10. Comment on any differences.




請(qǐng)加QQ:99515681  郵箱:99515681@qq.com   WX:codinghelp






 

掃一掃在手機(jī)打開當(dāng)前頁
  • 上一篇:IEMS5731代做、代寫java設(shè)計(jì)編程
  • 下一篇:ENGG1110代做、R編程語言代寫
  • 無相關(guān)信息
    合肥生活資訊

    合肥圖文信息
    急尋熱仿真分析?代做熱仿真服務(wù)+熱設(shè)計(jì)優(yōu)化
    急尋熱仿真分析?代做熱仿真服務(wù)+熱設(shè)計(jì)優(yōu)化
    出評(píng) 開團(tuán)工具
    出評(píng) 開團(tuán)工具
    挖掘機(jī)濾芯提升發(fā)動(dòng)機(jī)性能
    挖掘機(jī)濾芯提升發(fā)動(dòng)機(jī)性能
    海信羅馬假日洗衣機(jī)亮相AWE  復(fù)古美學(xué)與現(xiàn)代科技完美結(jié)合
    海信羅馬假日洗衣機(jī)亮相AWE 復(fù)古美學(xué)與現(xiàn)代
    合肥機(jī)場巴士4號(hào)線
    合肥機(jī)場巴士4號(hào)線
    合肥機(jī)場巴士3號(hào)線
    合肥機(jī)場巴士3號(hào)線
    合肥機(jī)場巴士2號(hào)線
    合肥機(jī)場巴士2號(hào)線
    合肥機(jī)場巴士1號(hào)線
    合肥機(jī)場巴士1號(hào)線
  • 短信驗(yàn)證碼 豆包 幣安下載 AI生圖 目錄網(wǎng)

    關(guān)于我們 | 打賞支持 | 廣告服務(wù) | 聯(lián)系我們 | 網(wǎng)站地圖 | 免責(zé)聲明 | 幫助中心 | 友情鏈接 |

    Copyright © 2025 hfw.cc Inc. All Rights Reserved. 合肥網(wǎng) 版權(quán)所有
    ICP備06013414號(hào)-3 公安備 42010502001045

    99爱在线视频这里只有精品_窝窝午夜看片成人精品_日韩精品久久久毛片一区二区_亚洲一区二区久久

          亚洲国产精品一区二区www在线| 国产精品日韩欧美| 久久久久国产精品www | 中文国产成人精品久久一| 亚洲高清久久| 亚洲国产欧美一区二区三区同亚洲| 狠狠色综合播放一区二区| 国产亚洲福利| 国产区精品在线观看| 国产专区欧美专区| 影音先锋亚洲一区| 一区二区三区在线免费观看| 国产精品日韩欧美一区二区三区| 欧美日韩一区三区| 国产精品伦理| 国产一区二区三区精品欧美日韩一区二区三区| 国产亚洲亚洲| 亚洲电影激情视频网站| 亚洲国产精品电影在线观看| 亚洲日本欧美日韩高观看| 一区二区国产日产| 欧美一区二区三区免费看| 久久香蕉国产线看观看av| 欧美男人的天堂| 国产日韩欧美a| 亚洲成色777777女色窝| av成人毛片| 欧美在线视频a| 欧美国产综合视频| 国产乱码精品一区二区三区不卡| 一区精品久久| 亚洲夜间福利| 免费观看久久久4p| 国产精品成人免费| 亚洲国产一区二区三区青草影视 | 久久久久五月天| 欧美激情国产精品| 国产精品不卡在线| 在线成人av| 亚洲欧美一区二区精品久久久| 久久久久久久久久看片| 欧美日韩性视频在线| 一区二区在线不卡| 午夜综合激情| 欧美日韩一区高清| 亚洲国产精品悠悠久久琪琪| 欧美一区二视频在线免费观看| 欧美人妖另类| 精品电影在线观看| 欧美在线视频免费| 国产精品美女久久久久aⅴ国产馆| 在线成人激情视频| 久久黄色小说| 国产伦精品一区二区三区四区免费 | 午夜精品福利一区二区三区av | 免费亚洲电影在线观看| 国产欧美日本一区二区三区| 99在线观看免费视频精品观看| 久久久九九九九| 国产女人aaa级久久久级| 亚洲乱码国产乱码精品精可以看 | 欧美大片一区二区| 亚洲国产高潮在线观看| 久久精品道一区二区三区| 国产精品福利在线观看| 亚洲另类自拍| 欧美日韩成人综合在线一区二区 | 亚洲电影观看| 香蕉视频成人在线观看| 国产精品久久久久永久免费观看| 一区二区高清视频| 欧美日韩在线一区二区| 99热精品在线| 欧美偷拍一区二区| 亚洲图片在线观看| 国产精品久久久久天堂| 亚洲欧美在线网| 国产女人精品视频| 亚洲欧美日韩在线不卡| 国产美女诱惑一区二区| 欧美亚洲综合另类| 国外成人免费视频| 美女脱光内衣内裤视频久久网站| 亚洲丁香婷深爱综合| 免费久久精品视频| 亚洲高清影视| 久久久免费精品视频| 影音先锋亚洲电影| 欧美国产一区在线| 亚洲小说欧美另类婷婷| 国产女人水真多18毛片18精品视频| 久久超碰97中文字幕| 国内精品久久久久久| 你懂的国产精品永久在线| 一区二区三区精品在线| 国产欧美一区二区三区视频| 久久亚洲综合色| 激情五月婷婷综合| 久久免费偷拍视频| 亚洲欧洲日韩在线| 欧美性猛交xxxx乱大交退制版| 午夜视频在线观看一区二区| 激情婷婷久久| 欧美天天视频| 久久琪琪电影院| 这里只有精品电影| 黄色综合网站| 国产精品爱啪在线线免费观看| 欧美一站二站| 99精品国产高清一区二区| 国产视频不卡| 欧美视频一区二区三区…| 久久婷婷亚洲| 亚洲欧美伊人| 一本色道久久| 国产一区二区三区不卡在线观看| 欧美老女人xx| 久久久久久自在自线| 99视频一区二区| 国产主播喷水一区二区| 欧美日本一道本| 欧美中在线观看| 亚洲一区二区三区四区中文 | 欧美视频在线观看| 久久婷婷av| 欧美在线精品一区| 中日韩高清电影网| 亚洲每日在线| 亚洲国产网站| 在线观看一区视频| 国产麻豆精品视频| 欧美日韩一区二区在线播放| 欧美国产日本韩| 老司机久久99久久精品播放免费| 欧美一区三区二区在线观看| 亚洲欧美日韩高清| 9i看片成人免费高清| 亚洲精品日韩久久| 国产午夜精品全部视频在线播放| 欧美日韩国产电影| 免费不卡中文字幕视频| 亚洲欧美影音先锋| 亚洲视频一二区| 99亚洲视频| 亚洲高清色综合| 一区二区三区在线高清| 国产美女一区二区| 国产性天天综合网| 国产日韩欧美综合在线| 国产精品影院在线观看| 国产农村妇女精品一区二区| 国产伦精品一区二区三区高清版| 国产精品区一区二区三| 国产精品区一区| 国产一区二区三区久久悠悠色av| 国产人成精品一区二区三| 国产一区二区久久| 在线精品亚洲| 91久久久精品| 亚洲视频导航| 欧美在线91| 欧美va日韩va| 国产精品久久二区二区| 国产亚洲在线| 国外成人在线视频| 欧美日韩调教| 国产精品手机视频| 黄色一区二区在线观看| 国产精品一区久久久久| 黄色精品一区| 亚洲精品美女91| 亚洲免费观看高清完整版在线观看| 亚洲电影激情视频网站| 亚洲九九精品| 亚洲亚洲精品在线观看 | 国产精品视频久久| 国产一区二区精品久久99| 亚洲国产精品久久久久秋霞不卡| 亚洲人成小说网站色在线| 99精品免费视频| 久久精品国产精品亚洲精品| 欧美视频在线播放| 亚洲第一精品夜夜躁人人躁| 亚洲一区精品在线| 欧美日韩1080p| 亚洲第一中文字幕| 久久er精品视频| 国产精品午夜在线| 99国产精品久久久久久久成人热| 久久精品人人做人人综合| 国产精品女同互慰在线看| 日韩视频在线一区二区三区| 久久一本综合频道| 国产视频久久久久| 亚洲欧美日韩国产综合在线| 欧美精品一区二区精品网| 亚洲第一成人在线| 久久久久久久网| 极品av少妇一区二区| 久久婷婷综合激情|