Common Hooks
The most common hooking functions are supported out of the box by Unseal.
Some of these methods can be used directly as the hooking function of a hook, others are factories that return such a function, and some return a complete hook object. Each docstring indicates which is the case.
Saving Outputs
This method can be used directly in the construction of a hook.
- common_hooks.save_output(detach: bool = True) Callable
Basic hooking function for saving the output of a module to the global context object
- Parameters
cpu (bool) – Whether to move the output to CPU before saving.
detach (bool) – Whether to detach the output from the computation graph before saving.
- Returns
Function that saves the output to the context object.
- Return type
Callable
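The general pattern can be sketched in plain Python. This is a minimal sketch assuming a PyTorch-style forward-hook signature hook(module, inputs, output) and a plain dict as the context object; Unseal's actual implementation may differ in both respects.

```python
def save_output(ctx, key, detach=True, cpu=False):
    """Sketch of a save-output hook factory.

    Returns a PyTorch-style forward hook that stores a module's
    output in the shared context dict under `key`.
    """
    def hook(module, inputs, output):
        # With a real torch.Tensor one would call output.detach()
        # (and optionally output.cpu()) here before storing it;
        # this sketch simply saves the value as-is.
        ctx[key] = output
    return hook

# Usage sketch: simulate a module firing the hook.
ctx = {}
hook = save_output(ctx, "layer0")
hook(None, (), "dummy-activation")
```

The factory closes over the context dict, so the returned function has the hook signature expected by the framework while still knowing where to save.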
Replacing Activations
This method is a factory and returns a function that can be used in a hook to replace the activation of a layer.
- common_hooks.replace_activation(indices: str, replacement_tensor: torch.Tensor, tuple_index: Optional[int] = None) Callable
Creates a hook which replaces a module’s activation (output) with a replacement tensor. If there is a dimension mismatch, the replacement tensor is copied along the leading dimensions of the output.
Example: If the activation has shape (B, T, D) and the replacement tensor has shape (D,), and you want to plug it in at position t of the T dimension for every element in the batch, then indices should be ":,t,:".
- Parameters
indices (str) – Indices at which to insert the replacement tensor.
replacement_tensor (torch.Tensor) – Tensor that is filled in.
tuple_index (int) – Index of the tuple in the output of the module.
- Returns
Function that replaces part of a given tensor with replacement_tensor
- Return type
Callable
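The indices string reads like a comma-separated slice expression. Below is a minimal sketch of how such a string might be translated into an indexing tuple; the name parse_indices and the exact parsing rules are assumptions for illustration, not Unseal's actual code.

```python
def parse_indices(indices: str) -> tuple:
    """Turn a string like ":,2,:" into a tuple usable for tensor indexing."""
    parsed = []
    for part in indices.split(","):
        part = part.strip()
        if part == ":":
            parsed.append(slice(None))          # full slice along this dim
        elif ":" in part:
            start, stop = part.split(":")
            parsed.append(slice(int(start) if start else None,
                                int(stop) if stop else None))
        else:
            parsed.append(int(part))            # single position
    return tuple(parsed)

# tensor[parse_indices(":,2,:")] would then select position 2 along the
# T dimension for every batch element, matching the (B, T, D) example above.
```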
Saving Attention
- common_hooks.transformers_get_attention(heads: Optional[Union[int, Iterable[int], str]] = None, output_idx: Optional[int] = None) Callable
Creates a hooking function to get the attention patterns of a given layer.
- Parameters
heads (Optional[Union[int, Iterable[int], str]], optional) – The heads for which to save the attention, defaults to None
output_idx (Optional[int], optional) – If the attention module returns a tuple, use this argument to index it, defaults to None
- Returns
func, hooking function that saves attention of the specified heads
- Return type
Callable
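Attention modules in libraries like HuggingFace Transformers often return a tuple (e.g. attention output plus attention weights), which is why the hook may need to index into the output first. A sketch of that pattern; the dict context and the hook signature are assumptions:

```python
def get_attention(ctx, output_idx=None):
    """Sketch: forward hook that saves a module's attention output,
    optionally indexing into a tuple-valued output first."""
    def hook(module, inputs, output):
        attn = output if output_idx is None else output[output_idx]
        ctx["attention"] = attn
    return hook

ctx = {}
# Simulate a module that returns (hidden_states, attention_weights):
get_attention(ctx, output_idx=1)(None, (), ("hidden", "weights"))
```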
Creating an Attention Hook
- common_hooks.create_attention_hook(layer: int, key: str, output_idx: Optional[int] = None, attn_name: Optional[str] = 'attn', layer_key_prefix: Optional[str] = None, heads: Optional[Union[int, Iterable[int], str]] = None) unseal.hooks.commons.Hook
Creates a hook which saves the attention patterns of a given layer.
- Parameters
layer (int) – The layer to hook.
key (str) – The key to use for saving the attention patterns.
output_idx (Optional[int], optional) – If the module output is a tuple, index it with this. GPT-like models need this to be equal to 2, defaults to None
attn_name (Optional[str], optional) – The name of the attention module in the transformer, defaults to ‘attn’
layer_key_prefix (Optional[str], optional) – The prefix in the model structure before the layer idx, e.g. ‘transformer->h’, defaults to None
heads (Optional[Union[int, Iterable[int], str]], optional) – Which heads to save the attention pattern for. Can be int, tuple of ints or string like ‘1:3’, defaults to None
- Returns
Hook which saves the attention patterns
- Return type
Hook
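The heads argument accepts several formats. Here is a sketch of how an int, an iterable of ints, or a string like '1:3' might be normalized to an explicit list of head indices; normalize_heads is a hypothetical helper, not part of Unseal's API.

```python
from typing import Iterable, List, Optional, Union

def normalize_heads(heads: Optional[Union[int, Iterable[int], str]],
                    num_heads: int) -> List[int]:
    """Normalize a heads specification to an explicit list of indices."""
    if heads is None:
        return list(range(num_heads))      # default: all heads
    if isinstance(heads, int):
        return [heads]                     # a single head
    if isinstance(heads, str):             # a slice string, e.g. "1:3" -> [1, 2]
        start, stop = heads.split(":")
        return list(range(int(start) if start else 0,
                          int(stop) if stop else num_heads))
    return list(heads)                     # any iterable of ints
```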
Creating a Logit Hook
- common_hooks.create_logit_hook(layer: int, model: unseal.hooks.commons.HookedModel, unembedding_key: str, layer_key_prefix: Optional[str] = None, target: Optional[Union[int, List[int]]] = None, position: Optional[Union[int, List[int]]] = None, key: Optional[str] = None, split_heads: Optional[bool] = False, num_heads: Optional[int] = None) unseal.hooks.commons.Hook
Create a hook that saves the logits of a layer’s output. Outputs are saved to save_ctx[key][‘logits’].
- Parameters
layer (int) – The number of the layer
model (HookedModel) – The model.
unembedding_key (str) – The key/name of the embedding matrix, e.g. ‘lm_head’ for causal LM models
layer_key_prefix (str) – The prefix of the key of the layer, e.g. ‘transformer->h’ for GPT-like models
target (Union[int, List[int]]) – The target token(s) to extract logits for. Defaults to all tokens.
position (Union[int, List[int]]) – The position for which to extract logits for. Defaults to all positions.
key (str) – The key of the hook. Defaults to {layer}_logits.
split_heads (bool) – Whether to split the heads. Defaults to False.
num_heads (int) – The number of heads to split. Defaults to None.
- Returns
The hook.
- Return type
Hook
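Conceptually this is a "logit lens": the layer's hidden states are multiplied by the unembedding matrix to obtain per-token logits, which can then be narrowed to particular positions and target tokens. A pure-Python sketch with nested lists standing in for tensors; it is illustrative only, and all names are assumptions rather than Unseal's implementation.

```python
def lens_logits(hidden, unembedding, target=None, position=None):
    """hidden: T x D hidden states; unembedding: V x D unembedding matrix.
    Returns T x V logits, optionally restricted to one position / one token."""
    logits = [[sum(h * w for h, w in zip(h_vec, w_vec)) for w_vec in unembedding]
              for h_vec in hidden]                      # project to vocab space
    if position is not None:
        logits = [logits[position]]                     # keep one position
    if target is not None:
        logits = [[row[target]] for row in logits]      # keep one target token
    return logits

hidden = [[1.0, 0.0], [0.0, 2.0]]                    # T=2, D=2
unembedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # V=3, D=2
```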
GPT _attn Wrapper
- common_hooks.gpt_attn_wrapper(func: Callable, save_ctx: Dict, c_proj: torch.Tensor, vocab_embedding: torch.Tensor, target_ids: torch.Tensor, batch_size: Optional[int] = None) Tuple[Callable, Callable]
Wraps around the [AttentionBlock]._attn function to save the individual heads’ logits. This is necessary because the individual heads’ logits are not available on a module level and thus not accessible via a hook.
- Parameters
func (Callable) – original _attn function
save_ctx (Dict) – context to which the logits will be saved
c_proj (torch.Tensor) – projection matrix, this is W_O in Anthropic’s terminology
vocab_embedding (torch.Tensor) – vocabulary/unembedding matrix, this is W_U in Anthropic’s terminology
target_ids (torch.Tensor) – indices of the target tokens for which the logits are computed
batch_size (Optional[int]) – batch size to reduce memory footprint, defaults to None
- Returns
inner, func, the wrapped function and the original function
- Return type
Tuple[Callable, Callable]
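Because _attn is a plain method rather than a submodule, it cannot be reached with a forward hook and has to be monkey-patched instead; returning both the wrapper and the original function lets the caller restore the module afterwards. A generic sketch of this wrap-and-restore pattern; the names are illustrative, not Unseal's API.

```python
def wrap_method(obj, name, save_ctx):
    """Replace obj.<name> with a wrapper that records its output,
    returning (wrapper, original) so the patch can be undone."""
    original = getattr(obj, name)
    def inner(*args, **kwargs):
        out = original(*args, **kwargs)
        save_ctx["last_output"] = out      # save the intermediate result
        return out
    setattr(obj, name, inner)              # monkey-patch the method
    return inner, original

class Dummy:
    def _attn(self, x):
        return x * 2

ctx = {}
mod = Dummy()
inner, original = wrap_method(mod, "_attn", ctx)
result = mod._attn(3)                      # goes through the wrapper
setattr(mod, "_attn", original)            # restore the original method
```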