Result analysis
Usage
The following are some examples of how to use the qad package to
train and test quantum models, and to reproduce the results from the paper.
Training the unsupervised quantum kernel machine
The training and testing of the unsupervised kernel machine are done with the
train.py and test.py scripts in scripts/kernel_machines/,
respectively. The configuration parameters of the model (e.g., quantum
or classical version, feature map, number of training samples, backend
used for the quantum computation) are defined through the arguments
of the train.py and test.py scripts. For instance, to train the
model:
python train.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --unsup --nqubits 8 --feature_map u_dense_encoding --run_type ideal --output_folder quantum_test --nu_param 0.01 --ntrain 600 --quantum
For details on the different arguments, check the documentation: TODO READTHEDOCS. To test the same model:
python test.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --model trained_qsvms/quantum_test_nu\=0.01_ideal/
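The `--model` directory passed to test.py appears to be derived from the training flags: `--output_folder quantum_test`, `--nu_param 0.01` and `--run_type ideal` give `trained_qsvms/quantum_test_nu=0.01_ideal/`, while the classical folder `trained_qsvms/c_test_nu=0.01/` used for plotting has no run-type suffix. The sketch below rebuilds that name with a hypothetical `model_dir` helper that is not part of qad; the pattern is inferred only from the example commands on this page. The backslash in `nu\=0.01` is a shell escape and is not part of the directory name.

```python
from pathlib import PurePosixPath
from typing import Optional


def model_dir(output_folder: str, nu_param: float, run_type: Optional[str] = None,
              root: str = "trained_qsvms") -> str:
    """Hypothetical helper: rebuild the model directory name used in the examples.

    Assumed pattern: <root>/<output_folder>_nu=<nu_param>[_<run_type>]/
    where run_type (e.g. "ideal") only appears for quantum runs.
    """
    name = f"{output_folder}_nu={nu_param}"
    if run_type is not None:
        name += f"_{run_type}"
    return f"{PurePosixPath(root) / name}/"


print(model_dir("quantum_test", 0.01, "ideal"))  # trained_qsvms/quantum_test_nu=0.01_ideal/
print(model_dir("c_test", 0.01))                 # trained_qsvms/c_test_nu=0.01/
```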
Producing figures
After the unsupervised quantum and classical kernel machines have been
trained and their test scores saved, their performance can be summarised
in a ROC curve plot. First, following our convention, the test scores are
prepared for plotting with
`scripts/kernel_machines/prepare_plot_scores.py <https://github.com/vbelis/latent-ad-qml/blob/docs-reformat/scripts/kernel_machines/prepare_plot_scores.py>`__
by running:
python prepare_plot_scores.py --classical_folder trained_qsvms/c_test_nu\=0.01/ --quantum_folder trained_qsvms/q_test_nu\=0.01_ideal/ --out_path test_plot --name_suffix n<n_test>_k<k_folds>
Then, the score values are loaded from the saved files following our naming convention. For example, for three different signals, a latent dimension of eight, 600 training data points, 100k test data points and k=5 folds:
import h5py

read_dir = '/path/to/data'
n_folds = 5
latent_dim = '8'
n_samples_train = 600
mass = ['35', '35', '15']  # resonance mass, e.g. '35' = 3.5 TeV
br_na = ['NA', '', 'BR']  # narrow (NA), broad (BR), or empty (no width label in the file name)
signal_name = ['RSGraviton_WW', 'AtoHZ_to_ZZZ', 'RSGraviton_WW']
ntest = ['100', '100', '100']  # number of test samples, in thousands

q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig = [], [], [], []
for i in range(len(signal_name)):
    # Both parts of the file name need the f prefix so every placeholder is filled in.
    file_name = (f'{read_dir}/Latent_{latent_dim}_trainsize_{n_samples_train}_{signal_name[i]}'
                 f'{mass[i]}{br_na[i]}_n{ntest[i]}k_kfold{n_folds}.h5')
    with h5py.File(file_name, 'r') as file:
        # Quantum and classical scores on the background (QCD) and signal test samples.
        q_loss_qcd.append(file['quantum_loss_qcd'][:])
        q_loss_sig.append(file['quantum_loss_sig'][:])
        c_loss_qcd.append(file['classic_loss_qcd'][:])
        c_loss_sig.append(file['classic_loss_sig'][:])
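Because the score files are located purely by name, it can help to build the file name in one place and check it before opening the file. The sketch below uses a hypothetical `score_file_name` helper (not part of qad) that reproduces the naming pattern of the loading loop above. Both halves of the split string need the `f` prefix: a plain string literal placed next to an f-string is concatenated verbatim, so its placeholders are not filled in.

```python
def score_file_name(read_dir: str, latent_dim: str, n_samples_train: int, signal_name: str,
                    mass: str, br_na: str, ntest: str, n_folds: int) -> str:
    """Hypothetical helper mirroring the score-file naming used in the loading loop."""
    return (f"{read_dir}/Latent_{latent_dim}_trainsize_{n_samples_train}_{signal_name}"
            f"{mass}{br_na}_n{ntest}k_kfold{n_folds}.h5")


print(score_file_name('/path/to/data', '8', 600, 'RSGraviton_WW', '35', 'NA', '100', 5))
# /path/to/data/Latent_8_trainsize_600_RSGraviton_WW35NA_n100k_kfold5.h5

# Pitfall: a plain literal next to an f-string is not formatted.
signal, mass = 'RSGraviton_WW', '35'
print(f"{signal}" "{mass}")  # RSGraviton_WW{mass}
```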
The final ROC plot, as it appears in Fig. 3 of the paper, can be obtained with:
# `pl` is assumed to be the qad visualization module documented below
# (it provides plot_ROC_kfold_mean); its import is not shown on this page.
colors = ['forestgreen', '#EC4E20', 'darkorchid']
legend_signal_names = [r'Narrow G $\to$ WW 3.5 TeV', r'A $\to$ HZ $\to$ ZZZ 3.5 TeV', r'Broad G $\to$ WW 1.5 TeV']
pl.plot_ROC_kfold_mean(q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig, legend_signal_names, n_folds,
                       legend_title=r'Anomaly signature', save_dir='../jupyter_plots', pic_id='test',
                       palette=colors, xlabel=r'$TPR$', ylabel=r'$FPR^{-1}$')
Figure: Example of the unsupervised kernel machine performance on different anomalies.
Visualization
- get_roc_data(qcd: ndarray, bsm: ndarray, fix_tpr: bool = False) → Tuple[ndarray]
Compute ROC curves given the background and anomaly datasets.
- Parameters:
qcd (np.ndarray) – Background QCD dataset.
bsm (np.ndarray) – Anomaly, Beyond the Standard Model (BSM) dataset.
fix_tpr (bool, optional) – Constant threshold selection for ROC curve calculation, by default False
- Returns:
- np.ndarray:
False Positive Rate array.
- np.ndarray:
True Positive Rate array.
- Return type:
Tuple
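As an illustration of what a ROC computation over anomaly scores involves, here is a NumPy-only sketch; it is a stand-in, not the qad implementation of `get_roc_data`. It assumes that a higher score means "more anomalous", uses every distinct score as a threshold, and does not reproduce the `fix_tpr` option. `sklearn.metrics.roc_curve` is a common alternative that produces the same kind of curve.

```python
import numpy as np


def roc_from_scores(qcd: np.ndarray, bsm: np.ndarray):
    """Return (fpr, tpr) for background (qcd) and anomaly (bsm) scores.

    Assumes higher scores are more anomalous; one ROC point per distinct score.
    """
    scores = np.concatenate([qcd, bsm])
    labels = np.concatenate([np.zeros(qcd.size), np.ones(bsm.size)])
    # Sort by decreasing score (a stable sort keeps the result deterministic for ties).
    order = np.argsort(-scores, kind="mergesort")
    scores, labels = scores[order], labels[order]
    # Last index of each group of tied scores: one threshold per distinct score.
    idx = np.r_[np.flatnonzero(np.diff(scores)), scores.size - 1]
    tps = np.cumsum(labels)[idx]   # anomalies with score >= threshold
    fps = (idx + 1) - tps          # background events with score >= threshold
    # Prepend (0, 0) for a threshold above every score.
    return np.r_[0.0, fps / qcd.size], np.r_[0.0, tps / bsm.size]


fpr, tpr = roc_from_scores(np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5]))
print(tpr.tolist())  # [0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
```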
- get_FPR_for_fixed_TPR(tpr_window: float, fpr_loss: ndarray, tpr_loss: ndarray, tolerance: float) → float
Get FPR for a fixed value of TPR.
The ROC curve is computed in discrete steps, so a tolerance window is defined around the desired TPR working point and the mean FPR within that window is taken.
- Parameters:
tpr_window (float) – TPR working point, typically 0.6 or 0.8
fpr_loss (np.ndarray) – FPR array of the ROC curve.
tpr_loss (np.ndarray) – TPR array of the ROC curve.
tolerance (float) – Tolerance around the working point, typically a 0.1-1% window.
- Returns:
Mean FPR within the tolerance window around the TPR working point.
- Return type:
float
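The windowed average described above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the qad source: the tolerance is taken as an absolute half-width in TPR units (0.01 meaning ±1%), and the function name `fpr_at_fixed_tpr` and the choice to raise an error on an empty window are hypothetical.

```python
import numpy as np


def fpr_at_fixed_tpr(tpr_window: float, fpr: np.ndarray, tpr: np.ndarray,
                     tolerance: float) -> float:
    """Mean FPR over ROC points whose TPR lies within +/- tolerance of tpr_window."""
    in_window = np.abs(tpr - tpr_window) <= tolerance
    if not np.any(in_window):
        # Discrete ROC curves can skip a narrow window entirely.
        raise ValueError("no ROC point within the tolerance window; increase tolerance")
    return float(np.mean(fpr[in_window]))


tpr = np.array([0.0, 0.59, 0.6, 0.61, 0.9, 1.0])
fpr = np.array([0.0, 0.01, 0.02, 0.03, 0.5, 1.0])
print(fpr_at_fixed_tpr(0.6, fpr, tpr, tolerance=0.015))  # ≈ 0.02, mean of 0.01, 0.02, 0.03
```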
- get_mean_and_error(data: ndarray) → Tuple[float]
Compute the mean and std of an array.
- Parameters:
data (np.ndarray) – The input array.
- Returns:
- float:
The mean.
- float:
The standard deviation.
- Return type:
Tuple
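A minimal NumPy equivalent of `get_mean_and_error` is sketched below. The docstring does not say whether the standard deviation is the population (`ddof=0`, NumPy's default) or the sample (`ddof=1`) estimate, so the sketch exposes `ddof` and defaults to NumPy's convention; that default is an assumption. With only k=5 folds the two estimates differ by a factor of √(5/4) ≈ 1.12.

```python
from typing import Tuple

import numpy as np


def mean_and_error(data: np.ndarray, ddof: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of data (hypothetical stand-in for get_mean_and_error)."""
    return float(np.mean(data)), float(np.std(data, ddof=ddof))


values = np.array([1.0, 2.0, 3.0])  # toy values
print(mean_and_error(values))          # population std (ddof=0)
print(mean_and_error(values, ddof=1))  # (2.0, 1.0)
```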
- plot_ROC_kfold_mean(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, pic_id: Optional[str] = None, xlabel: str = 'TPR', ylabel: str = '1/FPR', legend_title: str = '$ROC$', save_dir: Optional[str] = None, palette: List[str] = ['#3E96A1', '#EC4E20', '#FF9505'])
Calculate the mean ROC curve and its std uncertainty band.
Using the scores of the classical and quantum models, the ROC curves are computed for each of the k folds. The mean and std are computed, and the mean ROC curve is plotted with its error band. The mean and std of the AUC are also calculated and shown in the legend of the figure.
- Parameters:
quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.
quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.
classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.
classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.
ids (List[str]) – Identifiers of the different score sets, one per entry of the score lists, e.g., 3 different anomalies, 3 different latent dimensions or 3 different training sizes.
n_folds (int) – Number of k-folds.
pic_id (str, optional) – Name of the output figure, by default None
xlabel (str, optional) – Label for the x-axis of the figure, by default “TPR”
ylabel (str, optional) – Label for the y-axis of the figure, by default “1/FPR”
legend_title (str, optional) – Title of the main legend, by default “$ROC$”
save_dir (str, optional) – Output directory for the produced figure, by default None
palette (List[str], optional) – Colors for the 3 ROC curves per plot based on the ids, by default [“#3E96A1”, “#EC4E20”, “#FF9505”]
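The fold averaging described above can be sketched without the plotting step. The sketch assumes each score array has shape `(n_folds, n_events)`, interpolates every fold's ROC curve onto a common TPR grid (a choice made here; the binning used by qad is not documented on this page), and averages 1/FPR and the AUC over folds. All function names are hypothetical. Where a fold reaches FPR = 0, 1/FPR is infinite, so the grid should start at a TPR where every fold still has a nonzero FPR. To mirror `plot_ROC_kfold_mean`, it would be called once per entry of `ids`, and the mean drawn with a ±std band, e.g. with matplotlib's `fill_between`.

```python
from typing import Optional

import numpy as np


def roc_from_scores(bkg: np.ndarray, sig: np.ndarray):
    """(fpr, tpr) with one point per distinct score; higher score = more anomalous."""
    scores = np.concatenate([bkg, sig])
    labels = np.concatenate([np.zeros(bkg.size), np.ones(sig.size)])
    order = np.argsort(-scores, kind="mergesort")
    scores, labels = scores[order], labels[order]
    idx = np.r_[np.flatnonzero(np.diff(scores)), scores.size - 1]
    tps = np.cumsum(labels)[idx]
    fps = (idx + 1) - tps
    return np.r_[0.0, fps / bkg.size], np.r_[0.0, tps / sig.size]


def kfold_mean_roc(loss_qcd: np.ndarray, loss_sig: np.ndarray, n_folds: int,
                   tpr_grid: Optional[np.ndarray] = None):
    """Mean/std of 1/FPR on a common TPR grid, and mean/std of the AUC over k folds."""
    if tpr_grid is None:
        tpr_grid = np.linspace(0.01, 1.0, 100)
    fpr_on_grid, aucs = [], []
    for k in range(n_folds):
        fpr, tpr = roc_from_scores(loss_qcd[k], loss_sig[k])
        # AUC with the trapezoidal rule along the FPR axis.
        aucs.append(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
        # tpr is non-decreasing, so np.interp can map the TPR grid to FPR values.
        fpr_on_grid.append(np.interp(tpr_grid, tpr, fpr))
    with np.errstate(divide="ignore", invalid="ignore"):
        inv_fpr = 1.0 / np.array(fpr_on_grid)  # FPR = 0 gives inf
        return (tpr_grid, inv_fpr.mean(axis=0), inv_fpr.std(axis=0),
                float(np.mean(aucs)), float(np.std(aucs)))


rng = np.random.default_rng(0)
qcd = rng.normal(0.0, 1.0, size=(5, 2000))  # toy background scores, 5 folds
sig = rng.normal(2.0, 1.0, size=(5, 2000))  # toy anomaly scores, 5 folds
grid, mean_inv_fpr, std_inv_fpr, auc_mean, auc_std = kfold_mean_roc(qcd, sig, n_folds=5)
print(f"AUC = {auc_mean:.3f} ± {auc_std:.3f}")
```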
- create_table_for_fixed_TPR(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, tpr_windows: List[float] = [0.4, 0.6, 0.8], tolerance: float = 0.01) → DataFrame
Compute the mean and std of the FPR at fixed TPR working points.
- Parameters:
quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.
quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.
classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.
classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.
ids (List[str]) – Identifiers of the different score sets, one per entry of the score lists, e.g., 3 different anomalies, 3 different latent dimensions or 3 different training sizes.
n_folds (int) – Number of k-folds.
tpr_windows (List[float], optional) – TPR working points, by default [0.4, 0.6, 0.8]
tolerance (float, optional) – Tolerance around each working point, by default 0.01
- Returns:
LaTeX table of the results.
- Return type:
pd.DataFrame
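A sketch of how such a table can be assembled from per-fold ROC curves follows. It assumes the FPR at each working point is the mean over a ±tolerance TPR window (as described for `get_FPR_for_fixed_TPR`), then takes the mean and standard deviation across folds. The input layout (a dict mapping a model name to a list of per-fold `(fpr, tpr)` arrays), the helper names and the NaN returned for an empty window are hypothetical. The resulting rows can be turned into a DataFrame with `pandas.DataFrame(rows)` and exported with `DataFrame.to_latex()`.

```python
import numpy as np


def fpr_at_tpr(tpr_window: float, fpr: np.ndarray, tpr: np.ndarray, tolerance: float) -> float:
    """Mean FPR over ROC points with |TPR - tpr_window| <= tolerance (NaN if none)."""
    in_window = np.abs(tpr - tpr_window) <= tolerance
    return float(np.mean(fpr[in_window])) if np.any(in_window) else float("nan")


def fixed_tpr_rows(rocs_by_model, tpr_windows=(0.4, 0.6, 0.8), tolerance=0.01):
    """Hypothetical layout: rocs_by_model = {model_name: [(fpr, tpr), ...] one per fold}."""
    rows = []
    for model, rocs in rocs_by_model.items():
        for tpr_window in tpr_windows:
            per_fold = np.array([fpr_at_tpr(tpr_window, f, t, tolerance) for f, t in rocs])
            rows.append({"model": model, "TPR": tpr_window,
                         "FPR mean": float(np.mean(per_fold)),
                         "FPR std": float(np.std(per_fold))})
    return rows


# Toy per-fold ROC curves (fpr, tpr), for illustration only.
fold_1 = (np.array([0.0, 0.1, 0.2, 1.0]), np.array([0.0, 0.4, 0.6, 1.0]))
fold_2 = (np.array([0.0, 0.3, 0.4, 1.0]), np.array([0.0, 0.4, 0.6, 1.0]))
for row in fixed_tpr_rows({"quantum": [fold_1, fold_2]}):
    print(row)
```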