phoenix.Inferences
Parameters
- dataframe (pandas.DataFrame): The data to be analyzed or compared.
- schema (Schema): A schema that assigns the columns of the dataframe to the appropriate model dimensions (features, predictions, actuals, etc.).
- name (Optional[str]): The name used to identify the inferences in the application. If not provided, a random name will be generated.
Attributes
- dataframe (pandas.DataFrame): The pandas dataframe of the inferences.
- schema (Schema): The schema of the inferences.
- name (str): The name of the inferences.
The input dataframe and schema are lightly processed during inference initialization and are not necessarily identical to the corresponding dataframe and schema attributes.
Usage
Define inferences ds from a pandas dataframe df and a schema object schema by calling phoenix.Inferences with df and schema as arguments.
ds is then passed as the primary or reference argument to launch_app.
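The two steps above could be sketched as follows. The dataframe contents, the column names, the inferences name "production", and the helper launch_with_inferences are hypothetical placeholders. The Phoenix import sits inside the helper so that the data preparation can run even where the arize-phoenix package is not installed.

```python
import pandas as pd

# Hypothetical example data; the column names are placeholders.
df = pd.DataFrame(
    {
        "age": [34, 51, 27],
        "income": [48000.0, 72000.0, 39000.0],
        "predicted": ["approve", "deny", "approve"],
        "target": ["approve", "approve", "approve"],
    }
)


def launch_with_inferences(dataframe: pd.DataFrame):
    """Wrap the dataframe in Inferences and pass it to launch_app as primary."""
    import phoenix as px  # requires the arize-phoenix package

    # The schema assigns dataframe columns to model dimensions.
    schema = px.Schema(
        feature_column_names=["age", "income"],
        prediction_label_column_name="predicted",
        actual_label_column_name="target",
    )
    ds = px.Inferences(dataframe=dataframe, schema=schema, name="production")
    return px.launch_app(primary=ds)
```

Calling launch_with_inferences(df) would start the app with ds as the primary inferences. A second Inferences object could be passed as the reference argument in the same call.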
phoenix.Schema
Parameters
- prediction_id_column_name (Optional[str]): The name of the dataframe’s prediction ID column, if one exists. Prediction IDs are strings that uniquely identify each record in Phoenix inferences (equivalently, each row in the dataframe). If no prediction ID column name is provided, Phoenix will automatically generate unique UUIDs for each record of the inferences upon Inferences initialization.
- timestamp_column_name (Optional[str]): The name of the dataframe’s timestamp column, if one exists. Timestamp columns must be pandas Series with numeric, datetime or object dtypes.
  - If the timestamp column has numeric dtype (int or float), the entries of the column are interpreted as Unix timestamps, i.e., the number of seconds since midnight UTC on January 1st, 1970.
  - If the column has datetime dtype and contains timezone-naive timestamps, Phoenix assumes those timestamps belong to the local timezone and converts them to UTC.
  - If the column has datetime dtype and contains timezone-aware timestamps, those timestamps are converted to UTC.
  - If the column has object dtype and contains ISO 8601 formatted timestamp strings, those entries are converted to datetime dtype UTC timestamps; timezone-naive strings are assumed to belong to the local timezone.
  - If no timestamp column is provided, each record in the inferences is assigned the current timestamp upon Inferences initialization.
- feature_column_names (Optional[List[str]]): The names of the dataframe’s feature columns, if any exist. If no feature column names are provided, all dataframe column names that are not included elsewhere in the schema and are not explicitly excluded in excluded_column_names are assumed to be features.
- tag_column_names (Optional[List[str]]): The names of the dataframe’s tag columns, if any exist. Tags, like features, are attributes that can be used for filtering records of the inferences while using the app. Unlike features, tags are not model inputs and are not used for computing metrics.
- prediction_label_column_name (Optional[str]): The name of the dataframe’s predicted label column, if one exists. Predicted labels are used for classification problems with categorical model output.
- prediction_score_column_name (Optional[str]): The name of the dataframe’s predicted score column, if one exists. Predicted scores are used for regression problems with continuous numerical model output.
- actual_label_column_name (Optional[str]): The name of the dataframe’s actual label column, if one exists. Actual (i.e., ground truth) labels are used for classification problems with categorical model output.
- actual_score_column_name (Optional[str]): The name of the dataframe’s actual score column, if one exists. Actual (i.e., ground truth) scores are used for regression problems with continuous numerical output.
- prompt_column_names (Optional[EmbeddingColumnNames]): An instance of EmbeddingColumnNames delineating the column names of an LLM model’s prompt embedding vector, prompt text, and optionally links to external resources.
- response_column_names (Optional[EmbeddingColumnNames]): An instance of EmbeddingColumnNames delineating the column names of an LLM model’s response embedding vector, response text, and optionally links to external resources.
- embedding_feature_column_names (Optional[Dict[str, EmbeddingColumnNames]]): A dictionary mapping the name of each embedding feature to an instance of EmbeddingColumnNames if any embedding features exist, otherwise, None. Each instance of EmbeddingColumnNames associates one or more dataframe columns containing vector data, image links, or text with the same embedding feature. Note that the keys of the dictionary are user-specified names that appear in the Phoenix UI and do not refer to columns of the dataframe.
- excluded_column_names (Optional[List[str]]): The names of the dataframe columns to be excluded from the implicitly inferred list of feature column names. This field should only be used for implicit feature discovery, i.e., when feature_column_names is unused and the dataframe contains feature columns not explicitly included in the schema.
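The timestamp rules listed under timestamp_column_name can be illustrated with plain pandas. This is a sketch of the described behavior, not Phoenix's own implementation. The function to_utc and its local_tz parameter are hypothetical; local_tz stands in for the machine's local timezone so that the example gives the same result everywhere.

```python
from datetime import timedelta, timezone

import pandas as pd
from pandas.api import types as pdt


def to_utc(ts: pd.Series, local_tz=timezone.utc) -> pd.Series:
    """Normalize a timestamp column to UTC following the rules listed above."""
    if pdt.is_numeric_dtype(ts):
        # Numeric entries are Unix timestamps in seconds.
        return pd.to_datetime(ts, unit="s", utc=True)
    if pdt.is_datetime64_any_dtype(ts):
        if ts.dt.tz is None:
            # Timezone-naive: assume local time, then convert to UTC.
            return ts.dt.tz_localize(local_tz).dt.tz_convert("UTC")
        # Timezone-aware: convert to UTC.
        return ts.dt.tz_convert("UTC")
    # Otherwise treat the entries as ISO 8601 strings and parse them first.
    parsed = pd.to_datetime(ts)
    if not pdt.is_datetime64_any_dtype(parsed):
        raise ValueError("could not parse timestamps to a single datetime dtype")
    return to_utc(parsed, local_tz)


# A fixed UTC-5 offset used as a stand-in for the local timezone.
eastern = timezone(timedelta(hours=-5))
naive = pd.Series(pd.to_datetime(["2023-01-01 00:00:00"]))
print(to_utc(naive, local_tz=eastern).iloc[0])  # 2023-01-01 05:00:00+00:00
```

Converting a column explicitly before building the schema, rather than relying on the local-timezone assumption, avoids results that differ between machines.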
Usage
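As an illustration of implicit feature discovery, the sketch below mimics the rule described for feature_column_names and excluded_column_names in plain Python, then shows a matching Schema. The helper infer_feature_columns, the helper build_schema, and all column names are hypothetical; the actual inference happens inside Phoenix.

```python
def infer_feature_columns(dataframe_columns, schema_columns, excluded_column_names):
    """Return the columns that are neither named in the schema nor excluded."""
    taken = set(schema_columns) | set(excluded_column_names)
    return [name for name in dataframe_columns if name not in taken]


# Hypothetical dataframe columns.
columns = [
    "prediction_id", "timestamp", "age", "income",
    "predicted", "target", "debug_notes",
]
features = infer_feature_columns(
    columns,
    schema_columns=["prediction_id", "timestamp", "predicted", "target"],
    excluded_column_names=["debug_notes"],
)
print(features)  # ['age', 'income']


def build_schema():
    """Build a Schema that leaves feature_column_names unset."""
    import phoenix as px  # requires the arize-phoenix package

    return px.Schema(
        prediction_id_column_name="prediction_id",
        timestamp_column_name="timestamp",
        prediction_label_column_name="predicted",
        actual_label_column_name="target",
        excluded_column_names=["debug_notes"],
    )
```

With this schema, age and income would be treated as features, while debug_notes is left out.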
See the guide on how to create your own dataset for examples.
phoenix.EmbeddingColumnNames
Instances of this class are used as values in the dictionary passed to the embedding_feature_column_names field of Schema.
Parameters
- vector_column_name (str): The name of the dataframe column containing the embedding vector data. Each entry in the column must be a list, one-dimensional NumPy array, or pandas Series containing numeric values (floats or ints) and must have equal length to all the other entries in the column.
- raw_data_column_name (Optional[str]): The name of the dataframe column containing the raw text associated with an embedding feature, if such a column exists. This field is used when an embedding feature describes a piece of text, for example, in the context of NLP.
- link_to_data_column_name (Optional[str]): The name of the dataframe column containing links to images associated with an embedding feature, if such a column exists. This field is used when an embedding feature describes an image, for example, in the context of computer vision.
See here for recommendations on handling local image files.
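The requirements on the vector column can be checked before the dataframe is handed to Phoenix. The sketch below is one possible pre-check, followed by a Schema that uses EmbeddingColumnNames. The helper check_vector_column, the helper build_embedding_schema, the feature name "product_review", and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def check_vector_column(column: pd.Series) -> int:
    """Return the shared vector length, or raise if the column is invalid."""
    lengths = set()
    for entry in column:
        if not isinstance(entry, (list, np.ndarray, pd.Series)):
            raise TypeError(f"unsupported entry type: {type(entry).__name__}")
        arr = np.asarray(entry)
        is_numeric = np.issubdtype(arr.dtype, np.integer) or np.issubdtype(
            arr.dtype, np.floating
        )
        if arr.ndim != 1 or not is_numeric:
            raise ValueError("each entry must be a one-dimensional numeric vector")
        lengths.add(arr.shape[0])
    if len(lengths) != 1:
        raise ValueError(f"vectors must share one length, found {sorted(lengths)}")
    return lengths.pop()


# Hypothetical dataframe with a text column and its embedding vectors.
df = pd.DataFrame(
    {
        "text": ["great product", "arrived late"],
        "text_vector": [[0.1, 0.2, 0.3], np.array([0.4, 0.5, 0.6])],
    }
)
print(check_vector_column(df["text_vector"]))  # 3


def build_embedding_schema():
    """Associate the vector and raw text columns with one embedding feature."""
    import phoenix as px  # requires the arize-phoenix package

    return px.Schema(
        embedding_feature_column_names={
            "product_review": px.EmbeddingColumnNames(
                vector_column_name="text_vector",
                raw_data_column_name="text",
            )
        }
    )
```

For image embeddings, link_to_data_column_name would take the place of raw_data_column_name.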
Usage
See the guide on how to create embedding features for examples.
phoenix.TraceDataset
TraceDataset is mostly used for loading trace data that has been previously saved to file.
Parameters
- dataframe (pandas.DataFrame): A dataframe in which each row is a flattened representation of a span. See LLM Traces for more on traces and spans.
- name (Optional[str]): The name used to identify the dataset in the application. If not provided, a random name will be generated.
Attributes
- dataframe (pandas.DataFrame): A dataframe in which each row is a flattened representation of a span. See LLM Traces for more on traces and spans.
- name (str): The name used to identify the dataset in the application.
Usage
To load saved trace data, read a trace.jsonl file into a TraceDataset and then pass the dataset to Phoenix through launch_app. Each line of the trace.jsonl file is a JSON string representing a span.
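The loading step described above could be sketched as follows. The helpers read_span_lines and launch_with_traces are hypothetical. The sketch also assumes that phoenix.trace.utils provides a json_lines_to_df function that turns a list of JSON lines into the flattened span dataframe, and that launch_app accepts the dataset through a trace argument; check both against the installed Phoenix version.

```python
import json


def read_span_lines(path: str) -> list:
    """Read non-empty lines and verify that each one parses as JSON."""
    with open(path, "r", encoding="utf-8") as f:
        lines = [line for line in f if line.strip()]
    for line in lines:
        json.loads(line)  # raises json.JSONDecodeError on a malformed line
    return lines


def launch_with_traces(path: str = "trace.jsonl"):
    """Load spans from a JSON Lines file and pass them to the app."""
    import phoenix as px  # requires the arize-phoenix package
    from phoenix.trace.utils import json_lines_to_df  # assumed helper

    trace_ds = px.TraceDataset(json_lines_to_df(read_span_lines(path)))
    return px.launch_app(trace=trace_ds)
```

Validating the lines first means a truncated or corrupted file fails early with a clear JSON error instead of inside the dataframe conversion.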

