API Reference

CLI

Command-line entry point for ALMA Search.

This module turns user input files into ALMA archive queries, collects the raw matches, applies deduplication and cleaner-row selection, and finally writes the public CSV output.

alma_search.cli.parse_args(argv=None)

Parse command-line arguments for the alma_search executable.

Parameters: argv (sequence[str] | None, optional) – Optional argument list. When None, argparse reads from the live command line.
Returns: Parsed options and positional arguments used by main().
Return type: argparse.Namespace

alma_search.cli.main(argv=None)

Execute the end-to-end ALMA search workflow.

The workflow is:

Parse command-line options.
Load and normalize target coordinates.
Query the ALMA TAP service once per target.
Convert raw archive rows into output rows.
Deduplicate, optionally apply the cleaner filter, and write CSV.

Parameters: argv (sequence[str] | None, optional) – Optional command-line arguments to parse.
Returns: Exit status code. 0 means success, 1 means a local validation or file-processing error, and 2 means every remote ALMA query failed so no trustworthy output could be produced.
Return type: int

Input and Output Helpers

Input parsing and output-table helpers.

This module is responsible for normalizing user-supplied target catalogs, combining and cleaning intermediate result rows, computing the final observed-species flag, and writing the canonical CSV schema.

alma_search.io.get_output_columns(species)

Return the exported column order for a specific observed-species label.

Parameters: species (Any) – User-supplied species name, for example "CO" or "HCN".
Returns: Public CSV columns in the exact order used for export, with the final internal flag column renamed to the user-facing label.
Return type: list[str]

alma_search.io.compute_observed_species_flag(target_lines, distance_arcsec, fov_arcsec, observed_species, observed_distance_threshold_arcsec, observed_fov_threshold_arcsec)

Compute the internal observed-species score for one result row.

The internal score is later converted to a simple Yes or No in the exported CSV. A positive value means the selected species is considered observed for that source. The score values are:

1.0 when the inferred line is present and the ALMA pointing is within the distance threshold.
0.5 when the inferred line is present, the pointing is farther away, but the field of view is large enough to still count as coverage.
0.0 otherwise.

Parameters:

target_lines (Any) – Comma-separated inferred line names for the row.
distance_arcsec (Any) – Angular separation between the input target and ALMA pointing center.
fov_arcsec (Any) – Approximate ALMA field of view in arcseconds.
observed_species (Any) – Species to test, such as "CO" or "HCN".
observed_distance_threshold_arcsec (float) – Distance threshold for a definite match.
observed_fov_threshold_arcsec (float) – FOV threshold for the looser coverage case.

Returns:

Internal score used during post-processing.

Return type:

float
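
The three score cases above can be sketched in a few lines. This is an illustration only, not the package's implementation: the function names are hypothetical, the species test is a plain prefix match (a simplification of the helpers in the Line Matching section), and treating both thresholds as inclusive is an assumption. The default threshold values are borrowed from the finalize_results() signature.

```python
def observed_score(target_lines, distance_arcsec, fov_arcsec, species="CO",
                   distance_threshold_arcsec=30.0, fov_threshold_arcsec=100.0):
    """Return 1.0, 0.5, or 0.0 following the three documented cases."""
    # Assumption: a line counts for the species when its label starts with it.
    lines = [item.strip() for item in str(target_lines).split(",") if item.strip()]
    has_line = any(line.upper().startswith(species.upper()) for line in lines)
    if not has_line:
        return 0.0
    # Assumption: both thresholds are inclusive.
    if distance_arcsec <= distance_threshold_arcsec:
        return 1.0
    if fov_arcsec >= fov_threshold_arcsec:
        return 0.5
    return 0.0


def score_to_label(score):
    """Map a positive score to "Yes" and everything else to "No"."""
    return "Yes" if score > 0 else "No"
```

Under these assumptions, a CO line 45 arcsec from the pointing center with a 120 arcsec field of view scores 0.5 and is exported as "Yes".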

alma_search.io.observed_species_flag_to_label(value)

Convert an internal observed-species score into a CSV label.

Parameters: value (Any) – Numeric score produced by compute_observed_species_flag().
Returns: "Yes" when the score is greater than zero, otherwise "No".
Return type: str

alma_search.io.build_no_match_row(input_name, input_ra_deg, input_dec_deg)

Create a placeholder row for a target with no returned ALMA matches.

Parameters:

input_name (str) – Original target name from the input catalog.
input_ra_deg (float) – Input right ascension in decimal degrees.
input_dec_deg (float) – Input declination in decimal degrees.

Returns:

Output row with coordinate fields filled and science metadata set to missing values.

Return type:

dict[str, Any]

alma_search.io.load_targets_from_table(df)

Normalize a tabular input catalog into Name, ra_deg, dec_deg.

Supported schemas are:

Name, ra_deg, dec_deg for decimal-degree coordinates.
Name, ra, dec for sexagesimal or decimal text coordinates.

Parameters: df (pandas.DataFrame) – Raw table read from a CSV-like input file.
Returns: Normalized target table with decimal-degree coordinates.
Return type: pandas.DataFrame
Raises: ValueError – If the required columns are missing or any coordinate row cannot be parsed.

alma_search.io.load_targets_from_text(path)

Load targets from a plain-text coordinate list.

Each non-comment line must have the form Name,RA DEC where the coordinate tokens can be decimal degrees or sexagesimal text.

Parameters: path (str) – Path to the plain-text input file.
Returns: Normalized target table with columns Name, ra_deg, and dec_deg.
Return type: pandas.DataFrame
Raises: ValueError – If any line cannot be parsed into a valid name and coordinate pair.
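
As a rough illustration of the Name,RA DEC line format, the sketch below parses text that is already in memory. It is not the package's parser: the function name is hypothetical, only decimal-degree tokens are handled, plain dictionaries stand in for the pandas table, and the assumption that comment lines start with "#" is not confirmed by this reference.

```python
def parse_target_lines(text):
    """Parse 'Name,RA DEC' lines that use decimal-degree coordinates."""
    targets = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' are skipped.
        if not line or line.startswith("#"):
            continue
        # Split on the first comma only, so the name is everything before it.
        name, sep, coords = line.partition(",")
        tokens = coords.split()
        if not sep or not name.strip() or len(tokens) != 2:
            raise ValueError(f"line {number}: expected 'Name,RA DEC', got {raw!r}")
        try:
            ra_deg, dec_deg = float(tokens[0]), float(tokens[1])
        except ValueError:
            raise ValueError(
                f"line {number}: cannot parse coordinates {coords!r}"
            ) from None
        targets.append({"Name": name.strip(), "ra_deg": ra_deg, "dec_deg": dec_deg})
    return targets
```

Reporting the offending line number in the ValueError makes a malformed catalog easier to repair.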

alma_search.io.load_targets(path, logger=None)

Load a target catalog from CSV-like or plain-text input.

The function first attempts structured CSV parsing. If that fails, it falls back to the plain-text parser used for line-based coordinate lists.

Parameters:

path (str) – Input file path.
logger (Any | None, optional) – Optional logger used to report fallback decisions.

Returns:

Normalized target catalog in decimal degrees.

Return type:

pandas.DataFrame

alma_search.io.combine_arrays(values)

Combine classified array labels into the canonical order.

Parameters: values (sequence[str]) – Array classifications such as "12m", "7m", or comma-separated combinations from multiple rows.
Returns: Unique array labels joined in 12m,7m,TP order.
Return type: str
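
The canonical ordering can be illustrated with a short sketch. The function name is hypothetical, and silently dropping labels other than 12m, 7m, and TP is an assumption; the reference does not say how unexpected labels are handled.

```python
CANONICAL_ORDER = ("12m", "7m", "TP")


def combine_array_labels(values):
    """Join the unique array labels found in values in 12m,7m,TP order."""
    seen = set()
    for value in values:
        # Each entry may itself be a comma-separated combination.
        for label in str(value).split(","):
            label = label.strip()
            if label:
                seen.add(label)
    # Assumption: labels outside the canonical set are ignored.
    return ",".join(label for label in CANONICAL_ORDER if label in seen)
```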

alma_search.io.combine_bands(values)

Combine ALMA band metadata into a sorted comma-separated string.

Parameters: values (sequence[Any]) – Raw band_list field values, potentially containing repeated values or multiple delimiters.
Returns: Unique bands sorted numerically when possible.
Return type: str

alma_search.io.combine_lines(values)

Combine inferred line labels from multiple rows.

Parameters: values (sequence[str]) – Comma-separated line lists, often from multiple observations being merged into one output row.
Returns: Unique line names in first-seen order. Returns "Unknown" only when no explicit line name is available but at least one source row was marked as unknown.
Return type: str
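
The "Unknown" rule is easier to see in code. The sketch below is a hypothetical re-creation, not the package source; in particular, returning an empty string when the input contains neither line names nor "Unknown" is an assumption, since the reference does not cover that case.

```python
def combine_line_labels(values):
    """Merge comma-separated line lists, keeping first-seen order."""
    names = []
    saw_unknown = False
    for value in values:
        for name in str(value).split(","):
            name = name.strip()
            if not name:
                continue
            if name == "Unknown":
                saw_unknown = True
            elif name not in names:
                names.append(name)
    if names:
        # Explicit line names always win over "Unknown".
        return ",".join(names)
    # Assumption: with no input at all, return an empty string.
    return "Unknown" if saw_unknown else ""
```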

alma_search.io.blank_string_to_na(value)

Convert blank strings to pandas.NA.

Parameters: value (Any) – Scalar value to normalize.
Returns: pandas.NA for empty strings, otherwise the original value.
Return type: Any

alma_search.io.finalize_results(df, observed_species='CO', observed_distance_threshold_arcsec=30.0, observed_fov_threshold_arcsec=100.0)

Apply final cleanup and derive the observed-species flag column.

Parameters:

df (pandas.DataFrame) – Intermediate result table using the internal schema.
observed_species (Any, optional) – Species used for the final observed-in-ALMA decision.
observed_distance_threshold_arcsec (float, optional) – Distance threshold for a definite observed flag.
observed_fov_threshold_arcsec (float, optional) – FOV threshold for the looser observed flag.

Returns:

Cleaned result table with normalized missing values and a groupwise observed-species score propagated across rows with the same source Name.

Return type:

pandas.DataFrame

alma_search.io.select_cleaner_rows(df, observed_species='CO', max_observed_rows_per_name=5)

Reduce the final table to a smaller human-review subset.

Rules

Keep one unmatched row when a source has no ALMA match.
Keep up to max_observed_rows_per_name closest rows when the selected observed species exists.
Otherwise keep the single closest row for that source.

Parameters:

df (pandas.DataFrame) – Finalized result table.
observed_species (Any, optional) – Species used to decide whether a source has relevant rows.
max_observed_rows_per_name (int, optional) – Maximum number of closest species-matching rows to keep per source.

Returns:

Filtered table sorted by source name and distance.

Return type:

pandas.DataFrame
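
The three selection rules can be sketched with plain dictionaries instead of a DataFrame. Everything named here is illustrative: the function name, the distance_arcsec key (None for an unmatched placeholder row), and the per-row has_species boolean are assumptions, not the package's internal schema.

```python
from collections import defaultdict


def select_rows(rows, max_rows=5):
    """Apply the three selection rules to a list of row dictionaries."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Name"]].append(row)

    kept = []
    for name in sorted(groups):
        group = groups[name]
        matched = [r for r in group if r["distance_arcsec"] is not None]
        if not matched:
            # Rule 1: the source has no ALMA match, keep one placeholder row.
            kept.append(group[0])
            continue
        matched.sort(key=lambda r: r["distance_arcsec"])
        with_species = [r for r in matched if r["has_species"]]
        if with_species:
            # Rule 2: keep the closest species-matching rows, up to max_rows.
            kept.extend(with_species[:max_rows])
        else:
            # Rule 3: otherwise keep only the single closest row.
            kept.append(matched[0])
    return kept
```

Because groups are visited in sorted name order and each group is sorted by distance, the output is already ordered by source name and then distance.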

alma_search.io.deduplicate_results(df, dedup_level, observed_species='CO', observed_distance_threshold_arcsec=30.0, observed_fov_threshold_arcsec=100.0)

Deduplicate raw result rows before final export.

Parameters:

df (pandas.DataFrame) – Raw per-observation result rows in the internal schema.
dedup_level (str) – Deduplication mode. Supported values are "none", "project", and "project_target".
observed_species (Any, optional) – Species used for the final observed-species flag.
observed_distance_threshold_arcsec (float, optional) – Distance threshold for a definite observed flag.
observed_fov_threshold_arcsec (float, optional) – FOV threshold for the looser observed flag.

Returns:

Deduplicated and finalized output rows.

Return type:

pandas.DataFrame

Raises:

ValueError – If dedup_level is not one of the supported values.

alma_search.io.write_csv(df, path, observed_species='CO')

Write the public CSV output file.

Parameters:

df (pandas.DataFrame) – Final result table in the internal schema.
path (str) – Destination CSV file path.
observed_species (Any, optional) – Species name used to label the final Observed ... in ALMA? column.

Return type:

None

Archive Query and Search Logic

ALMA archive querying and result-row construction.

This module contains the logic that talks to the ALMA TAP service, interprets frequency metadata, infers likely spectral lines from frequency coverage, and translates archive rows into the package’s internal output schema.

alma_search.search.create_tap_service(tap_url='https://almascience.eso.org/tap')

Create a TAP client for the ALMA science archive.

Parameters: tap_url (str, optional) – TAP endpoint URL. The default points to the public ALMA archive.
Returns: pyvo.dal.TAPService instance.
Return type: Any
Raises: ImportError – If pyvo is not installed in the current Python environment.

alma_search.search.build_adql_query(ra_deg, dec_deg, radius_deg)

Build the ADQL cone-search query used against ALMA ObsCore.

Parameters:

ra_deg (float) – Cone center right ascension in decimal degrees.
dec_deg (float) – Cone center declination in decimal degrees.
radius_deg (float) – Search radius in decimal degrees.

Returns:

ADQL query string selecting the ObsCore fields needed by this package.

Return type:

str
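
For orientation, an ADQL cone search against an ObsCore table typically looks like the sketch below. The function name and the selected columns are assumptions (only standard ObsCore column names are used); the field list that build_adql_query() actually selects is not documented here.

```python
def cone_query(ra_deg, dec_deg, radius_deg):
    """Build an ADQL cone search over the standard ivoa.obscore table."""
    return (
        # Assumption: an illustrative subset of standard ObsCore columns.
        "SELECT obs_id, target_name, s_ra, s_dec, em_min, em_max "
        "FROM ivoa.obscore "
        "WHERE CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg:.6f}, {dec_deg:.6f}, {radius_deg:.6f})) = 1"
    )


# A radius given in arcminutes must be converted to degrees first.
query = cone_query(11.888, -25.288, 1.0 / 60.0)
```

The CIRCLE radius is in degrees, which is why query_alma_cone() accepts arcminutes while this function takes radius_deg.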

alma_search.search.query_alma_cone(service, ra_deg, dec_deg, radius_arcmin)

Query the ALMA archive around one target position.

Parameters:

service (Any) – TAP service client, typically created by create_tap_service().
ra_deg (float) – Cone center right ascension in decimal degrees.
dec_deg (float) – Cone center declination in decimal degrees.
radius_arcmin (float) – Cone-search radius in arcminutes.

Returns:

Query results as a pandas table. Returns an empty frame when the query completes successfully but finds no rows.

Return type:

pandas.DataFrame

alma_search.search.parse_frequency_support(frequency_support)

Parse the ALMA frequency_support metadata field into GHz intervals.

The field often contains text fragments like:

[87.30..89.17GHz, …]
1.23456E+11..1.24567E+11Hz
230.1 .. 232.0 GHz U 234.0 .. 236.0 GHz

The parser is intentionally permissive and extracts every interval that looks like number .. number unit.

Parameters: frequency_support (Any) – Raw ObsCore frequency_support value.
Returns: List of (low_ghz, high_ghz) intervals. The list is empty when no recognizable interval is present.
Return type: list[tuple[float, float]]
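
A permissive "number .. number unit" extractor can be written with one regular expression. The sketch below is an assumption about how such a parser might look, not the package's code; the function name, the set of accepted units, and the reordering of reversed bounds are all illustrative choices.

```python
import re

_NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_INTERVAL = re.compile(
    rf"({_NUMBER})\s*\.\.\s*({_NUMBER})\s*(GHz|MHz|kHz|Hz)",
    re.IGNORECASE,
)
# Assumption: these four units are the only ones worth recognizing.
_TO_GHZ = {"ghz": 1.0, "mhz": 1e-3, "khz": 1e-6, "hz": 1e-9}


def parse_intervals(text):
    """Extract every 'low .. high unit' fragment as a (low_ghz, high_ghz) pair."""
    intervals = []
    for low, high, unit in _INTERVAL.findall(str(text)):
        factor = _TO_GHZ[unit.lower()]
        a, b = float(low) * factor, float(high) * factor
        # Reorder reversed bounds so that low <= high always holds.
        intervals.append((min(a, b), max(a, b)))
    return intervals
```

Only the second number in each pair carries a unit, so the same factor is applied to both bounds.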

alma_search.search.coarse_frequency_interval_from_em(em_min, em_max)

Convert wavelength bounds into a coarse frequency interval.

Parameters:

em_min (Any) – Minimum wavelength in meters from ObsCore.
em_max (Any) – Maximum wavelength in meters from ObsCore.

Returns:

One-element list holding a single coarse (low_ghz, high_ghz) interval, or an empty list when the wavelength bounds are unavailable or invalid.

Return type:

list[tuple[float, float]]
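
The conversion follows from frequency = c / wavelength, so the longest wavelength gives the lowest frequency. The sketch below uses a hypothetical function name, and rejecting non-positive, non-finite, or reversed bounds is an assumption about what "invalid" means.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def coarse_interval(em_min, em_max):
    """Convert wavelength bounds in meters into one (low_ghz, high_ghz) pair."""
    try:
        shortest, longest = float(em_min), float(em_max)
    except (TypeError, ValueError):
        return []
    # Assumption: non-finite, non-positive, or reversed bounds are invalid.
    if not (math.isfinite(shortest) and math.isfinite(longest)):
        return []
    if not (0.0 < shortest <= longest):
        return []
    low_ghz = SPEED_OF_LIGHT_M_S / longest / 1e9
    high_ghz = SPEED_OF_LIGHT_M_S / shortest / 1e9
    return [(low_ghz, high_ghz)]
```

For example, wavelengths of 1.3 mm to 1.4 mm map to roughly 214 GHz to 231 GHz.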

alma_search.search.infer_lines(frequency_support, em_min, em_max, line_velocity_tolerance_kms, line_catalog_ghz=None)

Infer likely spectral lines covered by an ALMA observation.

Parameters:

frequency_support (Any) – Raw spectral-coverage metadata string from ObsCore.
em_min (Any) – Minimum wavelength in meters, used as a fallback when frequency_support is absent or unparsable.
em_max (Any) – Maximum wavelength in meters, used alongside em_min.
line_velocity_tolerance_kms (float) – Velocity tolerance used to widen the rest-frequency matching window.
line_catalog_ghz (dict[str, float] | None, optional) – Optional replacement catalog mapping line labels to rest frequencies in GHz.

Returns:

Comma-separated matched line names, or "Unknown" when no reliable coverage interval could be inferred.

Return type:

str
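
One way to read the velocity tolerance is as a Doppler width: a tolerance v widens the match window around a rest frequency f0 by f0 * v / c on each side. The sketch below applies that idea to already-parsed intervals. The function name and the three-line catalog are illustrative (the rest frequencies are approximate), and returning an empty string when coverage is known but no line matches is an assumption.

```python
SPEED_OF_LIGHT_KMS = 299_792.458

# Illustrative catalog with approximate rest frequencies in GHz.
CATALOG_GHZ = {"HCN(1-0)": 88.632, "CO(1-0)": 115.271, "CO(2-1)": 230.538}


def match_lines(intervals_ghz, tolerance_kms, catalog_ghz=CATALOG_GHZ):
    """Return comma-separated catalog lines covered by the given intervals."""
    if not intervals_ghz:
        # No usable coverage information at all.
        return "Unknown"
    matched = []
    for name, rest_ghz in catalog_ghz.items():
        # Doppler half-width of the matching window for this line.
        half_width = rest_ghz * tolerance_kms / SPEED_OF_LIGHT_KMS
        if any(low - half_width <= rest_ghz <= high + half_width
               for low, high in intervals_ghz):
            matched.append(name)
    return ",".join(matched)
```

With a 300 km/s tolerance the window around CO(1-0) is about 0.115 GHz on each side, so an interval starting at 115.3 GHz still counts as covering the line.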

alma_search.search.classify_array(instrument_name)

Classify an ALMA observation using only the instrument name field.

Parameters: instrument_name (Any) – Raw ALMA instrument metadata.
Returns: Comma-separated array labels such as "12m" or "7m,TP".
Return type: str

alma_search.search.classify_array_from_metadata(instrument_name, antenna_arrays)

Classify ALMA array usage from instrument and antenna metadata.

Parameters:

instrument_name (Any) – Raw ObsCore instrument metadata.
antenna_arrays (Any) – Raw antenna-array metadata, often containing antenna IDs that reveal the array type.

Returns:

Canonical array classification assembled from the available metadata.

Return type:

str

alma_search.search.rows_from_query_results(input_name, input_ra_deg, input_dec_deg, query_df, line_velocity_tolerance_kms)

Transform raw ALMA query rows into the package’s internal row schema.

Parameters:

input_name (str) – Source name from the user’s input catalog.
input_ra_deg (float) – Input source right ascension in decimal degrees.
input_dec_deg (float) – Input source declination in decimal degrees.
query_df (pandas.DataFrame) – Raw ALMA query results for this source.
line_velocity_tolerance_kms (float) – Velocity tolerance passed through to infer_lines().

Returns:

One internal output row per valid ALMA archive row.

Return type:

list[dict[str, Any]]

Line Matching

Line-catalog and observed-species helpers.

This module stores the small built-in spectral-line catalog used for coarse coverage inference and provides string-matching helpers for the final observed-species flag.

alma_search.lines.normalize_observed_species_label(species)

Normalize a user-supplied observed-species label.

Parameters: species (Any) – Raw species name from CLI or API input.
Returns: Cleaned species label, or the package default when the input is blank.
Return type: str

alma_search.lines.observed_species_column_name(species)

Return the exported column header for the observed-species flag.

Parameters: species (Any) – Species label selected by the user.
Returns: Human-readable column title such as "Observed CO in ALMA?".
Return type: str

alma_search.lines.extract_line_species_token(line_name)

Extract the species part of a line label.

Parameters: line_name (str) – Full line label such as "HCN(1-0)".
Returns: Prefix before the first (, used for species comparison.
Return type: str

alma_search.lines.normalize_species_token(token)

Normalize a species token for comparison.

Parameters: token (str) – Raw species token.
Returns: Upper-case token with whitespace and square brackets removed.
Return type: str

alma_search.lines.line_matches_observed_species(line_name, observed_species)

Return whether an inferred line should count for the chosen species.

Parameters:

line_name (str) – Inferred line label from the built-in catalog.
observed_species (Any) – User-selected species label.

Returns:

True when the line belongs to the requested species family. The special query CO matches common isotopologues such as 12CO, 13CO, C18O, and C17O.

Return type:

bool
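
Taken together, the token extraction, normalization, and CO-family rule described in this section could look like the sketch below. The function names are shortened stand-ins for the documented helpers. Limiting the CO family to exactly the four isotopologues named above, plus CO itself, is an assumption, as is the exact-match rule for every other species.

```python
# Assumption: the CO family is exactly CO plus the four named isotopologues.
CO_FAMILY = {"CO", "12CO", "13CO", "C18O", "C17O"}


def species_token(line_name):
    """Return the part of a line label before the first '('."""
    return line_name.split("(", 1)[0]


def normalize_token(token):
    """Upper-case a token and drop whitespace and square brackets."""
    return "".join(
        ch for ch in token.upper() if not ch.isspace() and ch not in "[]"
    )


def line_matches(line_name, observed_species):
    """Return True when the line belongs to the requested species family."""
    line_species = normalize_token(species_token(line_name))
    wanted = normalize_token(str(observed_species))
    if wanted == "CO":
        return line_species in CO_FAMILY
    # Assumption: every other species requires an exact token match.
    return line_species == wanted
```

Comparing whole tokens rather than substrings keeps HCO+(1-0) from being counted as a CO line.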

alma_search.lines.has_observed_species_line(target_lines, observed_species)

Test whether a row’s inferred line list includes the chosen species.

Parameters:

target_lines (Any) – Comma-separated inferred line list.
observed_species (Any) – User-selected species label.

Returns:

True when at least one inferred line matches the requested species.

Return type:

bool

Utilities

Shared utility helpers for ALMA archive search workflows.

These helpers are intentionally small and reusable. They normalize missing values, parse coordinate strings, and combine repeated metadata values into stable CSV-friendly text.

alma_search.utils.configure_logging(verbose)

Configure the package-wide logging format and level.

Parameters: verbose (bool) – When True, enable debug logging. Otherwise use info-level logging.
Return type: None

alma_search.utils.safe_get(record, key, default='')

Read a dictionary-like value while normalizing null-like entries.

Parameters:

record (dict[str, Any]) – Mapping to read from.
key (str) – Key to retrieve.
default (Any, optional) – Fallback value used when the key is missing or null-like.

Returns:

Stored value or the supplied default.

Return type:

Any

alma_search.utils.is_blank(value)

Return whether a value should be treated as missing text/data.

Parameters: value (Any) – Value to test.
Returns: True for None, pandas missing values, and empty strings.
Return type: bool

alma_search.utils.normalize_whitespace(value)

Collapse repeated whitespace in a scalar value.

Parameters: value (Any) – Value to normalize.
Returns: String with internal whitespace collapsed to single spaces, or an empty string when the value is blank.
Return type: str

alma_search.utils.unique_preserve_order(items)

Return unique items while preserving first-seen order.

Parameters: items (iterable[str]) – Candidate string values.
Returns: Non-blank unique values in their original encounter order.
Return type: list[str]

alma_search.utils.stable_sort_numeric_strings(values)

Sort string values numerically when possible, otherwise lexically.

Parameters: values (iterable[str]) – String values to sort.
Returns: Unique values sorted with numeric strings before non-numeric ones.
Return type: list[str]
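
A "numeric first, then lexical" ordering is commonly built with a tuple sort key. The sketch below is one hypothetical way to get the documented behavior; skipping blank entries and breaking ties between equal numbers by their text form are assumptions.

```python
def sort_numeric_first(values):
    """Sort unique strings with numeric values ahead of non-numeric ones."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    # Assumption: blank entries are dropped.
    unique = list(dict.fromkeys(v.strip() for v in values if v and v.strip()))

    def sort_key(text):
        try:
            # Numeric strings sort first, ordered by value.
            return (0, float(text), text)
        except ValueError:
            # Everything else sorts afterwards, ordered lexically.
            return (1, 0.0, text)

    return sorted(unique, key=sort_key)
```

Sorting by value places band "10" after band "7", which a plain string sort would get wrong.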

alma_search.utils.parse_ra_dec_to_degrees(ra_value, dec_value)

Parse RA and Dec values into decimal degrees.

Parameters:

ra_value (Any) – Right ascension value in decimal degrees or sexagesimal text.
dec_value (Any) – Declination value in decimal degrees or sexagesimal text.

Returns:

Parsed (ra_deg, dec_deg) pair.

Return type:

tuple[float, float]

Raises:

ValueError – If either coordinate is blank or cannot be parsed.
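
To make the two accepted input styles concrete, here is a standard-library sketch. It is not the package's parser, which may well rely on an astronomy library: the function names are hypothetical, only colon- or space-separated sexagesimal text is handled, and reading sexagesimal RA as hours (multiplied by 15) while leaving decimal RA in degrees is an assumption.

```python
import re


def _sexagesimal_to_decimal(text):
    """Convert 'dd:mm:ss' or 'dd mm ss' text into a signed decimal value."""
    parts = re.split(r"[:\s]+", text.strip())
    if len(parts) != 3:
        raise ValueError(f"cannot parse coordinate {text!r}")
    # Read the sign from the text so that '-00:30:00' stays negative.
    sign = -1.0 if parts[0].startswith("-") else 1.0
    whole, minutes, seconds = abs(float(parts[0])), float(parts[1]), float(parts[2])
    return sign * (whole + minutes / 60.0 + seconds / 3600.0)


def to_degrees(ra_value, dec_value):
    """Parse an RA/Dec pair given as decimal degrees or sexagesimal text."""
    def parse(value, sexagesimal_is_hours):
        text = "" if value is None else str(value).strip()
        if not text:
            raise ValueError("coordinate is blank")
        try:
            return float(text)  # already decimal degrees
        except ValueError:
            pass
        # Assumption: sexagesimal RA is in hours, so multiply by 15.
        scale = 15.0 if sexagesimal_is_hours else 1.0
        return _sexagesimal_to_decimal(text) * scale

    return parse(ra_value, True), parse(dec_value, False)
```

Under these assumptions, "00:47:33.1" and "-25:17:18" give approximately (11.8879, -25.2883).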

alma_search.utils.format_ra_dec_strings(ra_deg, dec_deg)

Format decimal-degree coordinates as sexagesimal strings.

Parameters:

ra_deg (float) – Right ascension in decimal degrees.
dec_deg (float) – Declination in decimal degrees.

Returns:

(ra_text, dec_text) formatted with colon separators.

Return type:

tuple[str, str]

alma_search.utils.to_optional_float(value, scale=1.0, digits=3)

Convert a scalar to a rounded float when possible.

Parameters:

value (Any) – Input value to convert.
scale (float, optional) – Multiplicative scale factor applied before rounding.
digits (int, optional) – Number of decimal places to keep.

Returns:

Rounded float result, or pandas.NA when conversion fails.

Return type:

float | pandas.NA

alma_search.utils.format_float_text(value, digits=3)

Format a scalar value as compact text for merged CSV fields.

Parameters:

value (Any) – Input scalar value.
digits (int, optional) – Number of decimal places used when formatting numeric values.

Returns:

Blank string for missing input, a cleaned text value for non-numeric input, or a trimmed numeric string.

Return type:

str

alma_search.utils.combine_scalar_values(values, digits=3)

Combine repeated scalar values into a unique CSV-friendly string.

Parameters:

values (sequence[Any]) – Scalar values collected across rows.
digits (int, optional) – Number of decimal places for numeric formatting.

Returns:

Comma-separated unique values, or pandas.NA when nothing usable is available.

Return type:

str | pandas.NA