logit_lense module

unseal.logit_lense.generate_logit_lense(model: unseal.hooks.commons.HookedModel, tokenizer: transformers.models.auto.tokenization_auto.AutoTokenizer, sentence: str, layers: Optional[List[int]] = None, ranks: Optional[bool] = False, kl_div: Optional[bool] = False, include_input: Optional[bool] = False, layer_key_prefix: Optional[str] = None)

Generates the data needed to reproduce the plots from the logit lens blog post.

Returns None in place of ranks and kl_div if the corresponding flags are not set.

Parameters
  • model (HookedModel) – Model that is investigated.

  • tokenizer (AutoTokenizer) – Tokenizer of the model.

  • sentence (str) – Sentence to be analyzed.

  • layers (Optional[List[int]]) – List of layers to be investigated.

  • ranks (Optional[bool], optional) – Whether to return the rank of the correct token at each layer, defaults to False.

  • kl_div (Optional[bool], optional) – Whether to return the KL divergence between each layer's intermediate probabilities and the final output probabilities, defaults to False.

  • include_input (Optional[bool], optional) – Whether to also include the logits/ranks/KL divergence immediately after embedding the input, defaults to False.

  • layer_key_prefix (Optional[str], optional) – Prefix for the layer keys, e.g. 'transformer->h' for GPT-like models, defaults to None.

Returns

logits, ranks, kl_div

Return type

Tuple[torch.Tensor]
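The computation behind these return values can be illustrated with a small standalone sketch (this is a hypothetical toy example, not the unseal implementation; the random hidden states and unembedding matrix stand in for a real model's activations): each layer's hidden state is projected through the unembedding matrix to get per-layer logits, from which the rank of the correct token and the KL divergence to the final layer's distribution follow.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_layers, seq_len, d_model, vocab = 4, 6, 16, 50

# Stand-ins for per-layer hidden states and the unembedding matrix.
hidden_states = torch.randn(num_layers, seq_len, d_model)
W_U = torch.randn(d_model, vocab)
target_ids = torch.randint(0, vocab, (seq_len,))  # "correct" next tokens

logits = hidden_states @ W_U                      # (layers, seq, vocab)
log_probs = F.log_softmax(logits, dim=-1)

# Rank of the correct token at every layer/position (0 = top prediction).
sorted_ids = logits.argsort(dim=-1, descending=True)
ranks = (sorted_ids == target_ids.view(1, -1, 1)).int().argmax(dim=-1)

# KL divergence of each intermediate distribution from the final one;
# it is zero for the final layer itself.
final = log_probs[-1].unsqueeze(0).expand_as(log_probs)
kl_div = F.kl_div(log_probs, final, log_target=True, reduction="none").sum(-1)

print(logits.shape, ranks.shape, kl_div.shape)
```

With `ranks=True` and `kl_div=True`, `generate_logit_lense` returns tensors shaped like `logits`, `ranks`, and `kl_div` above (one entry per layer and token position), ready to be plotted as heatmaps.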