Result analysis

Usage

The following examples show how to use the qad package to train and test quantum models and to reproduce the results of the paper.

Training the unsupervised quantum kernel machine

The unsupervised kernel machine is trained and tested with the train.py and test.py scripts in scripts/kernel_machines/, respectively. The model configuration, e.g., quantum or classical version, feature map, number of training samples, and backend used for the quantum computation, is set through the command-line arguments of these scripts. For instance, to train the model:

python train.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --unsup --nqubits 8 --feature_map u_dense_encoding --run_type ideal --output_folder quantum_test --nu_param 0.01 --ntrain 600 --quantum

For details on the available arguments, see the documentation: TODO READTHEDOCS. To test the same model:

python test.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --model trained_qsvms/quantum_test_nu\=0.01_ideal/

Producing figures

After the unsupervised quantum and classical kernel machines have been trained and their test scores saved, their performance can be summarised with a ROC curve plot. First, following our convention, the test scores are prepared for plotting with `scripts/kernel_machines/prepare_plot_scores.py <https://github.com/vbelis/latent-ad-qml/blob/docs-reformat/scripts/kernel_machines/prepare_plot_scores.py>`__ by running:

python prepare_plot_scores.py --classical_folder trained_qsvms/c_test_nu\=0.01/ --quantum_folder trained_qsvms/q_test_nu\=0.01_ideal/ --out_path test_plot --name_suffix n<n_test>_k<k_folds>

Then, the score values are loaded from the saved files following the same convention. For example, for three different signals, an eight-dimensional latent space, 600 training data points, 100k test data points, and k=5 folds:

import h5py

read_dir = '/path/to/data'
n_folds = 5
latent_dim = '8'
n_samples_train = 600
mass = ['35', '35', '15']
br_na = ['NA', '', 'BR']  # narrow (NA) or broad (BR); '' adds no width suffix to the file name
signal_name = ['RSGraviton_WW', 'AtoHZ_to_ZZZ', 'RSGraviton_WW']
ntest = ['100', '100', '100']  # test-set size in thousands (100k)

q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig = [], [], [], []
for i in range(len(signal_name)):
    # Both halves of the path must be f-strings, otherwise the placeholders
    # in the second half are not substituted.
    file_path = (f'{read_dir}/Latent_{latent_dim}_trainsize_{n_samples_train}_{signal_name[i]}'
                 f'{mass[i]}{br_na[i]}_n{ntest[i]}k_kfold{n_folds}.h5')
    with h5py.File(file_path, 'r') as file:
        # Quantum and classical model scores on the background (QCD) and signal test data
        q_loss_qcd.append(file['quantum_loss_qcd'][:])
        q_loss_sig.append(file['quantum_loss_sig'][:])
        c_loss_qcd.append(file['classic_loss_qcd'][:])
        c_loss_sig.append(file['classic_loss_sig'][:])
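The file-naming convention used in the loading loop can be factored into a small helper, which makes it easier to check a path before opening the file. score_file_path is a hypothetical name, not part of the qad package; the pattern is copied from the loop above.

```python
def score_file_path(read_dir: str, latent_dim: str, n_samples_train: int, signal_name: str,
                    mass: str, width: str, ntest_k: str, n_folds: int) -> str:
    """Path of the saved k-fold scores, following the convention of the loading loop.

    width is 'NA' (narrow), 'BR' (broad) or '' (no width suffix);
    ntest_k is the test-set size in thousands.
    """
    # Both pieces are f-strings, so every placeholder is substituted.
    return (f'{read_dir}/Latent_{latent_dim}_trainsize_{n_samples_train}_{signal_name}'
            f'{mass}{width}_n{ntest_k}k_kfold{n_folds}.h5')
```

For example, score_file_path('/path/to/data', '8', 600, 'RSGraviton_WW', '35', 'NA', '100', 5) returns '/path/to/data/Latent_8_trainsize_600_RSGraviton_WW35NA_n100k_kfold5.h5'.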

The final ROC plot, as it appears in Fig. 3 of the paper, can then be obtained with:

# `pl` is assumed to be the qad plotting module documented under Visualization below (import it first)
colors = ['forestgreen', '#EC4E20', 'darkorchid']
legend_signal_names = [r'Narrow G $\to$ WW 3.5 TeV', r'A $\to$ HZ $\to$ ZZZ 3.5 TeV', r'Broad G $\to$ WW 1.5 TeV']
pl.plot_ROC_kfold_mean(q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig, legend_signal_names, n_folds,
                       legend_title=r'Anomaly signature', save_dir='../jupyter_plots', pic_id='test',
                       palette=colors, xlabel=r'$TPR$', ylabel=r'$FPR^{-1}$')

Figure: example of the unsupervised kernel machine performance on different anomalies.

Visualization

get_roc_data(qcd: ndarray, bsm: ndarray, fix_tpr: bool = False) → Tuple[ndarray]

Compute the ROC curve given the background and anomaly datasets.

Parameters:

qcd (np.ndarray) – Background QCD dataset.
bsm (np.ndarray) – Anomaly, Beyond the Standard Model (BSM) dataset.
fix_tpr (bool, optional) – Constant threshold selection for ROC curve calculation, by default False

Returns:

np.ndarray – False Positive Rate (FPR) array.
np.ndarray – True Positive Rate (TPR) array.

Return type:

Tuple
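A minimal NumPy sketch of what a function like get_roc_data computes, assuming higher scores mean more anomalous events and that every distinct score is used as a threshold. The helper name roc_from_scores is hypothetical, the fix_tpr option is not modelled, and the package itself may rely on a library routine such as scikit-learn's roc_curve instead.

```python
import numpy as np
from typing import Tuple

def roc_from_scores(qcd: np.ndarray, bsm: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Return (FPR, TPR) arrays, assuming higher scores are more anomalous."""
    qcd = np.asarray(qcd, dtype=float)
    bsm = np.asarray(bsm, dtype=float)
    # Every distinct score is a candidate threshold, scanned from high to low.
    thresholds = np.unique(np.concatenate([qcd, bsm]))[::-1]
    # Events with a score >= threshold are flagged as anomalous: flagged background
    # events are false positives, flagged signal events are true positives.
    fpr = np.array([(qcd >= t).mean() for t in thresholds])
    tpr = np.array([(bsm >= t).mean() for t in thresholds])
    # Prepend the (0, 0) point reached by a threshold above every score.
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])
```

Because the figures in this section plot 1/FPR against TPR, the points with zero FPR at the start of the curve have to be dropped before inverting.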

get_FPR_for_fixed_TPR(tpr_window: float, fpr_loss: ndarray, tpr_loss: ndarray, tolerance: float) → float

Get FPR for a fixed value of TPR.

Since the ROC curve is computed in discrete steps, the exact TPR working point is generally not sampled. Instead, a tolerance window is defined around the desired TPR working point, and the FPR is averaged over the ROC points that fall inside it.

Parameters:

tpr_window (float) – TPR working point, typically 0.6 or 0.8
fpr_loss (np.ndarray) – FPR array of the ROC curve.
tpr_loss (np.ndarray) – TPR array of the ROC curve.
tolerance (float) – Tolerance around the working point, typically a 0.1-1% window.

Returns:

Mean FPR within the tolerance window around the TPR working point.

Return type:

float
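The windowed average described above can be sketched as follows, assuming tolerance is an absolute half-width in TPR units (so 0.01 gives a ±1% window) and that fpr_loss and tpr_loss come from the same ROC curve. The function name is hypothetical, and raising an error on an empty window is a design choice of this sketch rather than documented behaviour.

```python
import numpy as np

def fpr_at_fixed_tpr(tpr_window: float, fpr_loss: np.ndarray, tpr_loss: np.ndarray,
                     tolerance: float) -> float:
    """Mean FPR over the ROC points whose TPR lies within +/- tolerance of tpr_window."""
    fpr_loss = np.asarray(fpr_loss, dtype=float)
    tpr_loss = np.asarray(tpr_loss, dtype=float)
    # The ROC curve is sampled at discrete thresholds, so the exact working point
    # is usually missing; average over every point inside the window instead.
    in_window = np.abs(tpr_loss - tpr_window) <= tolerance
    if not in_window.any():
        raise ValueError(f'no ROC point within {tolerance} of TPR={tpr_window}')
    return float(fpr_loss[in_window].mean())
```

An empty window usually means the tolerance is too tight for the number of test events, since fewer events give a coarser ROC curve.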

get_mean_and_error(data: ndarray) → Tuple[float]

Compute the mean and std of an array.

Parameters:

data (np.ndarray) – The input array.

Returns:

float – The mean.
float – The standard deviation.

Return type:

Tuple
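A minimal NumPy equivalent, for instance to summarise per-fold AUC values. Whether the package uses the population (ddof=0) or the sample (ddof=1) standard deviation is not stated here, so this sketch exposes it as a parameter and defaults to NumPy's ddof=0. The name mean_and_error is hypothetical.

```python
import numpy as np
from typing import Tuple

def mean_and_error(data: np.ndarray, ddof: int = 0) -> Tuple[float, float]:
    """Return (mean, standard deviation) of data; ddof=0 gives the population std."""
    data = np.asarray(data, dtype=float)
    return float(np.mean(data)), float(np.std(data, ddof=ddof))
```

With only k=5 folds the two conventions differ by a factor of sqrt(5/4) ≈ 1.12, so it is worth stating which one an error band uses.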

plot_ROC_kfold_mean(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, pic_id: Optional[str] = None, xlabel: str = 'TPR', ylabel: str = '1/FPR', legend_title: str = '$ROC$', save_dir: Optional[str] = None, palette: List[str] = ['#3E96A1', '#EC4E20', '#FF9505'])

Calculate the mean ROC curve and its std uncertainty band.

Using the scores of the classical and quantum models, the ROC curves are computed for each of the k folds. The mean and std of these curves are computed, and the mean ROC curve is plotted with its error band. The mean and std of the AUC are also calculated and shown in the figure legend.

Parameters:

quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.
quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.
classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.
classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.
ids (List[str]) – Identifiers corresponding to the lists of scores, e.g., 3 different anomalies, 3 different latent dimensions, or 3 different training sizes.
n_folds (int) – Number of k-folds.
pic_id (str, optional) – Name of the output figure, by default None
xlabel (str, optional) – Label for the x-axis of the figure, by default “TPR”
ylabel (str, optional) – Label for the y-axis of the figure, by default “1/FPR”
legend_title (str, optional) – Title of the main legend, by default “$ROC$”
save_dir (str, optional) – Output directory for the produced figure, by default None
palette (List[str], optional) – Colors for the 3 ROC curves per plot based on the ids, by default [“#3E96A1”, “#EC4E20”, “#FF9505”]
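The fold-averaging step behind plot_ROC_kfold_mean can be sketched with NumPy alone. This sketch assumes each score array has shape (n_folds, n_events_per_fold) and that higher scores are more anomalous, and it averages the FPR on a shared TPR grid, which is one common way to average ROC curves but not necessarily the package's. The function names are hypothetical and the plotting itself is left out.

```python
import numpy as np

def roc_curve_np(qcd, bsm):
    """FPR and TPR arrays, assuming higher scores are more anomalous."""
    qcd, bsm = np.asarray(qcd, dtype=float), np.asarray(bsm, dtype=float)
    thresholds = np.unique(np.concatenate([qcd, bsm]))[::-1]  # high to low
    fpr = np.array([0.0] + [(qcd >= t).mean() for t in thresholds])
    tpr = np.array([0.0] + [(bsm >= t).mean() for t in thresholds])
    return fpr, tpr

def kfold_roc_summary(loss_qcd, loss_sig, tpr_grid):
    """Mean/std of the AUC, and of the FPR on a common TPR grid, over k folds.

    loss_qcd, loss_sig: arrays of shape (n_folds, n_events_per_fold) (assumed layout).
    """
    aucs, fprs_on_grid = [], []
    for qcd, bsm in zip(loss_qcd, loss_sig):
        fpr, tpr = roc_curve_np(qcd, bsm)
        # Trapezoidal area under TPR(FPR).
        aucs.append(float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)))
        # np.interp needs strictly increasing x values: keep the first (lowest-FPR)
        # point for each distinct TPR, then interpolate FPR onto the shared grid.
        tpr_u, first = np.unique(tpr, return_index=True)
        fprs_on_grid.append(np.interp(tpr_grid, tpr_u, fpr[first]))
    fprs_on_grid = np.array(fprs_on_grid)
    return (float(np.mean(aucs)), float(np.std(aucs)),
            fprs_on_grid.mean(axis=0), fprs_on_grid.std(axis=0))
```

In this sketch, auc_mean and auc_std play the role of the AUC values the docstring says are shown in the legend, and 1/fpr_mean gives the 1/FPR y-axis wherever fpr_mean is non-zero, with fpr_std setting the error band.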

create_table_for_fixed_TPR(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, tpr_windows: List[float] = [0.4, 0.6, 0.8], tolerance: float = 0.01) → DataFrame

Compute the mean and std of the FPR at fixed TPR working points.

Parameters:

quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.
quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.
classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.
classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.
ids (List[str]) – Identifiers corresponding to the lists of scores, e.g., 3 different anomalies, 3 different latent dimensions, or 3 different training sizes.
n_folds (int) – Number of k-folds.
tpr_windows (List[float]) – TPR working points, by default [0.4, 0.6, 0.8]
tolerance (float) – Tolerance around working point, by default 0.01

Returns:

LaTeX table of the results.

Return type:

pd.DataFrame
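A sketch of how such a table could be assembled with pandas: for every identifier and model, the windowed FPR is computed per fold and reported as mean ± std. The interface (a dict mapping a model name to its background and signal score lists), the (n_folds, n_events_per_fold) array layout, the column names, and the higher-score-is-more-anomalous convention are all assumptions of this sketch, not the package's API.

```python
import numpy as np
import pandas as pd

def _roc(qcd, bsm):
    # FPR/TPR with thresholds scanned from high to low (higher score = more anomalous, assumed).
    qcd, bsm = np.asarray(qcd, dtype=float), np.asarray(bsm, dtype=float)
    thr = np.unique(np.concatenate([qcd, bsm]))[::-1]
    fpr = np.array([0.0] + [(qcd >= t).mean() for t in thr])
    tpr = np.array([0.0] + [(bsm >= t).mean() for t in thr])
    return fpr, tpr

def _fpr_at(tpr_point, fpr, tpr, tolerance):
    # Mean FPR over ROC points within +/- tolerance of the TPR working point; NaN if none.
    mask = np.abs(tpr - tpr_point) <= tolerance
    return fpr[mask].mean() if mask.any() else np.nan

def fixed_tpr_table(scores, ids, tpr_windows=(0.4, 0.6, 0.8), tolerance=0.01):
    """scores: {model_name: (loss_qcd_list, loss_sig_list)}, one (n_folds, n_events) array per id."""
    rows = []
    for model, (loss_qcd, loss_sig) in scores.items():
        for name, qcd_folds, sig_folds in zip(ids, loss_qcd, loss_sig):
            row = {'id': name, 'model': model}
            rocs = [_roc(q, s) for q, s in zip(qcd_folds, sig_folds)]
            for w in tpr_windows:
                vals = np.array([_fpr_at(w, fpr, tpr, tolerance) for fpr, tpr in rocs])
                row[f'FPR@TPR={w}'] = f'{np.nanmean(vals):.3g} ± {np.nanstd(vals):.3g}'
            rows.append(row)
    return pd.DataFrame(rows).set_index(['id', 'model'])
```

The resulting DataFrame can then be exported with DataFrame.to_latex to obtain a LaTeX table.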