Basic Components API Reference

This reference documents the basic components of Torch-RecHub: feature processing, data transformation, and fundamental model components such as activation functions and attention mechanisms.

Feature Processing

Feature Columns

DenseFeature

  • Introduction: Process continuous numerical features.
  • Parameters:
    • name (str): Feature name
    • dimension (int): Feature dimension
    • dtype (str): Data type, default 'float32'

SparseFeature

  • Introduction: Process discrete categorical features.
  • Parameters:
    • name (str): Feature name
    • vocabulary_size (int): Size of category vocabulary
    • embedding_dim (int): Embedding vector dimension
    • dtype (str): Data type, default 'int32'
    • embedding_name (str): Embedding layer name, default None
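
To illustrate how these parameters relate, here is a minimal sketch that mirrors them in a dataclass and builds a plain torch.nn.Embedding from it. SparseFeatureSpec and build_embedding are hypothetical names invented for this example, not Torch-RecHub classes, and the fallback of embedding_name to name is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

import torch
import torch.nn as nn


@dataclass
class SparseFeatureSpec:
    """Hypothetical stand-in that mirrors the parameters listed above."""
    name: str
    vocabulary_size: int
    embedding_dim: int
    dtype: str = "int32"
    embedding_name: Optional[str] = None

    def __post_init__(self):
        # Assumption: an unnamed embedding falls back to the feature name.
        if self.embedding_name is None:
            self.embedding_name = self.name


def build_embedding(spec: SparseFeatureSpec) -> nn.Embedding:
    # One row per category ID, one column per embedding dimension.
    return nn.Embedding(spec.vocabulary_size, spec.embedding_dim)


spec = SparseFeatureSpec(name="item_id", vocabulary_size=1000, embedding_dim=16)
emb = build_embedding(spec)
ids = torch.tensor([3, 17, 999])   # category IDs must lie in [0, vocabulary_size)
vectors = emb(ids)                 # shape: (3, 16)
```

Features that should share one embedding table (for example an item ID and a history of item IDs) would plausibly be given the same embedding_name.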

VarLenSparseFeature

  • Introduction: Process variable-length discrete features.
  • Parameters:
    • name (str): Feature name
    • vocabulary_size (int): Size of category vocabulary
    • embedding_dim (int): Embedding vector dimension
    • maxlen (int): Maximum sequence length
    • dtype (str): Data type, default 'int32'
    • embedding_name (str): Embedding layer name, default None
    • combiner (str): Sequence pooling method, options: 'sum', 'mean', 'max', default 'mean'
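
The three combiner options can be sketched in plain PyTorch as masked pooling over the sequence dimension. pool_sequence and its mask argument are hypothetical, written only to show what 'sum', 'mean', and 'max' compute on a padded sequence. How the library itself handles padding is not stated here.

```python
import torch


def pool_sequence(embeddings, mask, combiner="mean"):
    """Pool (batch, maxlen, dim) embeddings into (batch, dim).

    mask is (batch, maxlen) with 1 for real tokens and 0 for padding.
    Hypothetical helper, not a Torch-RecHub function.
    """
    mask = mask.unsqueeze(-1).to(embeddings.dtype)   # (batch, maxlen, 1)
    if combiner == "sum":
        return (embeddings * mask).sum(dim=1)
    if combiner == "mean":
        lengths = mask.sum(dim=1).clamp(min=1.0)     # avoid division by zero
        return (embeddings * mask).sum(dim=1) / lengths
    if combiner == "max":
        # Push padded positions to -inf so they never win the max.
        masked = embeddings.masked_fill(mask == 0, float("-inf"))
        return masked.max(dim=1).values
    raise ValueError(f"unknown combiner: {combiner!r}")


emb = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])  # one sequence, maxlen=3
mask = torch.tensor([[1, 1, 0]])                             # last position is padding
mean_pooled = pool_sequence(emb, mask, "mean")  # tensor([[2., 3.]])
sum_pooled = pool_sequence(emb, mask, "sum")    # tensor([[4., 6.]])
max_pooled = pool_sequence(emb, mask, "max")    # tensor([[3., 4.]])
```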

Data Transformation

Data Preprocessing

MinMaxScaler

  • Introduction: Rescale numerical features linearly into a fixed range (min-max normalization).
  • Parameters:
    • feature_range (tuple): Normalization range, default (0, 1)
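
The arithmetic behind min-max scaling can be sketched with NumPy. min_max_scale is a hypothetical function for illustration and does not reproduce the class's fit/transform interface. Mapping constant columns to the lower bound is an assumption.

```python
import numpy as np


def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Rescale each column of x linearly into feature_range (illustrative only)."""
    x = np.asarray(x, dtype=np.float64)
    lo, hi = feature_range
    col_min = x.min(axis=0)
    span = x.max(axis=0) - col_min
    # Assumption: a constant column (span 0) maps to the lower bound.
    span = np.where(span == 0, 1.0, span)
    return (x - col_min) / span * (hi - lo) + lo


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = min_max_scale(data)               # each column spans [0, 1]
scaled_sym = min_max_scale(data, (-1, 1))  # each column spans [-1, 1]
```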

StandardScaler

  • Introduction: Standardize numerical features to zero mean and unit variance.
  • Parameters:
    • with_mean (bool): Whether to remove mean, default True
    • with_std (bool): Whether to scale by standard deviation, default True
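
A minimal NumPy sketch of what the two flags control. standardize is a hypothetical function, and the use of the population standard deviation (ddof=0) is an assumption rather than documented behaviour.

```python
import numpy as np


def standardize(x, with_mean=True, with_std=True):
    """Column-wise standardization (illustrative only)."""
    x = np.asarray(x, dtype=np.float64)
    if with_mean:
        x = x - x.mean(axis=0)
    if with_std:
        std = x.std(axis=0)                 # assumption: population std, ddof=0
        std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
        x = x / std
    return x


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
z = standardize(data)                       # zero mean, unit variance per column
centered = standardize(data, with_std=False)
```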

LabelEncoder

  • Introduction: Encode categorical values as integer indices.
  • Methods:
    • fit(values): Fit the encoder
    • transform(values): Transform data
    • fit_transform(values): Fit and transform
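
The three methods can be illustrated with a small pure-Python stand-in. SimpleLabelEncoder is hypothetical. Assigning IDs in sorted order and raising on unseen values are assumptions about behaviour that the list above does not specify.

```python
class SimpleLabelEncoder:
    """Hypothetical stand-in exposing the three methods listed above."""

    def fit(self, values):
        # Assumption: IDs follow the sorted order of the distinct values.
        self.classes_ = sorted(set(values))
        self.index_ = {v: i for i, v in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Raises KeyError for a value that was not seen during fit.
        return [self.index_[v] for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


encoder = SimpleLabelEncoder()
train_ids = encoder.fit_transform(["shoes", "books", "shoes", "toys"])  # [1, 0, 1, 2]
test_ids = encoder.transform(["toys", "books"])                         # [2, 0]
```

Fitting on the training split and only calling transform on later data keeps the ID assignment consistent with the vocabulary size used for the embedding table.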

Data Format Conversion

pandas_to_torch

  • Introduction: Convert Pandas data to PyTorch tensors.
  • Parameters:
    • df (pd.DataFrame): Input DataFrame
    • dense_cols (list): List of continuous feature column names
    • sparse_cols (list): List of discrete feature column names
    • device (str): Device type, 'cpu' or 'cuda'
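
A rough sketch of such a conversion, assuming dense columns become float32 tensors and sparse columns become int64 tensors returned in a dictionary keyed by column name. The return structure and the name pandas_to_torch_sketch are assumptions, since the return type is not documented above.

```python
import pandas as pd
import torch


def pandas_to_torch_sketch(df, dense_cols, sparse_cols, device="cpu"):
    """Return {column name: tensor}; hypothetical, not the library function."""
    tensors = {}
    for col in dense_cols:
        # Continuous features are stored as float32.
        tensors[col] = torch.tensor(df[col].to_numpy(), dtype=torch.float32, device=device)
    for col in sparse_cols:
        # Category IDs are stored as int64 so they can index an embedding table.
        tensors[col] = torch.tensor(df[col].to_numpy(), dtype=torch.long, device=device)
    return tensors


df = pd.DataFrame({"price": [9.5, 20.0], "item_id": [3, 7]})
batch = pandas_to_torch_sketch(df, dense_cols=["price"], sparse_cols=["item_id"])
```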

numpy_to_torch

  • Introduction: Convert NumPy arrays to PyTorch tensors.
  • Parameters:
    • arrays (list): List of NumPy arrays
    • device (str): Device type, 'cpu' or 'cuda'
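
One plausible way to implement this conversion, shown as a hypothetical numpy_to_torch_sketch that returns a list of tensors in the same order as the input. The return type is an assumption.

```python
import numpy as np
import torch


def numpy_to_torch_sketch(arrays, device="cpu"):
    """Convert each array to a tensor on the target device (illustrative only)."""
    # torch.from_numpy reuses the memory of a contiguous array;
    # .to(device) then copies only if the target is not the CPU.
    return [torch.from_numpy(np.ascontiguousarray(a)).to(device) for a in arrays]


features = np.zeros((2, 3), dtype=np.float32)
labels = np.array([1, 0], dtype=np.int64)
x, y = numpy_to_torch_sketch([features, labels])
```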

Model Components

Activation Functions

Dice

  • Introduction: Dice activation function, proposed in Deep Interest Network (DIN).
  • Parameters:
    • epsilon (float): Small constant added to the variance for numerical stability, default 1e-3
    • device (str): Device type, default 'cpu'
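
Dice gates each input with a sigmoid of its batch-standardized value and applies a learnable slope alpha where the gate is closed. The sketch below follows that formulation. DiceSketch is a hypothetical module, it omits the device parameter, and it uses batch statistics in every forward pass, which is a simplification.

```python
import torch
import torch.nn as nn


class DiceSketch(nn.Module):
    """Illustrative Dice activation; not the library implementation."""

    def __init__(self, epsilon=1e-3):
        super().__init__()
        self.epsilon = epsilon
        # Learnable slope used where the gate p is close to 0.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # x: (batch, features); statistics are taken over the batch dimension.
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, unbiased=False, keepdim=True)
        p = torch.sigmoid((x - mean) / torch.sqrt(var + self.epsilon))
        return p * x + (1 - p) * self.alpha * x


dice = DiceSketch()
x = torch.tensor([[-2.0, 0.5], [2.0, 1.5]])
y = dice(x)   # same shape as x
```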

Attention Mechanisms

ScaledDotProductAttention

  • Introduction: Scaled dot-product attention mechanism.
  • Parameters:
    • temperature (float): Value the attention scores are divided by before the softmax, typically sqrt(d_k)
    • attn_dropout (float): Attention dropout rate
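
A minimal sketch of scaled dot-product attention using the two parameters above. ScaledDotProductAttentionSketch, its forward signature, and the optional mask argument are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledDotProductAttentionSketch(nn.Module):
    """Illustrative only; the forward signature is an assumption."""

    def __init__(self, temperature, attn_dropout):
        super().__init__()
        self.temperature = temperature
        self.dropout = nn.Dropout(attn_dropout)

    def forward(self, q, k, v, mask=None):
        # q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v)
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.temperature
        if mask is not None:
            # Positions where mask == 0 receive zero attention weight.
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(F.softmax(scores, dim=-1))
        return torch.matmul(attn, v), attn


d_k = 8
attention = ScaledDotProductAttentionSketch(temperature=d_k ** 0.5, attn_dropout=0.0)
q = torch.randn(2, 5, d_k)
k = torch.randn(2, 7, d_k)
v = torch.randn(2, 7, 4)
out, attn = attention(q, k, v)   # out: (2, 5, 4), attn: (2, 5, 7)
```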

MultiHeadAttention

  • Introduction: Multi-head attention mechanism.
  • Parameters:
    • d_model (int): Model dimension
    • n_heads (int): Number of attention heads
    • d_k (int): Key vector dimension
    • d_v (int): Value vector dimension
    • dropout (float): Dropout rate
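
The sketch below shows how the five parameters typically fit together: inputs of width d_model are projected to n_heads heads of width d_k (queries, keys) and d_v (values), attended per head, then projected back to d_model. MultiHeadAttentionSketch is hypothetical, and it leaves out the residual connection and layer normalization that some implementations include.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttentionSketch(nn.Module):
    """Illustrative only; not the library implementation."""

    def __init__(self, d_model, n_heads, d_k, d_v, dropout):
        super().__init__()
        self.n_heads, self.d_k, self.d_v = n_heads, d_k, d_v
        self.w_q = nn.Linear(d_model, n_heads * d_k, bias=False)
        self.w_k = nn.Linear(d_model, n_heads * d_k, bias=False)
        self.w_v = nn.Linear(d_model, n_heads * d_v, bias=False)
        self.w_o = nn.Linear(n_heads * d_v, d_model, bias=False)
        self.dropout = nn.Dropout(dropout)

    def _split(self, x, d_head):
        # (batch, length, n_heads * d_head) -> (batch, n_heads, length, d_head)
        batch, length, _ = x.shape
        return x.reshape(batch, length, self.n_heads, d_head).transpose(1, 2)

    def forward(self, q, k, v):
        q = self._split(self.w_q(q), self.d_k)
        k = self._split(self.w_k(k), self.d_k)
        v = self._split(self.w_v(v), self.d_v)
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_k ** 0.5
        attn = self.dropout(F.softmax(scores, dim=-1))
        context = torch.matmul(attn, v)           # (batch, n_heads, len_q, d_v)
        batch, _, len_q, _ = context.shape
        # Concatenate the heads, then project back to d_model.
        context = context.transpose(1, 2).reshape(batch, len_q, self.n_heads * self.d_v)
        return self.w_o(context)


mha = MultiHeadAttentionSketch(d_model=32, n_heads=4, d_k=8, d_v=8, dropout=0.0)
x = torch.randn(2, 5, 32)
out = mha(x, x, x)   # self-attention; output shape (2, 5, 32)
```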