Set-of-Mark
Reference for the current version of the Set-of-Mark library.
Overview
The SOM library provides visual element detection and interaction capabilities. It is based on the Set-of-Mark research paper and the OmniParser model.
API Documentation
OmniParser Class
class OmniParser:
def __init__(self, device: str = "auto"):
"""Initialize the parser with automatic device detection"""
def parse(
self,
image: PIL.Image,
box_threshold: float = 0.3,
iou_threshold: float = 0.1,
use_ocr: bool = True,
ocr_engine: str = "easyocr"
) -> ParseResult:
"""Parse UI elements from an image"""
ParseResult Object
@dataclass
class ParseResult:
elements: List[UIElement] # Detected elements
visualized_image: PIL.Image # Annotated image
processing_time: float # Time in seconds
def to_dict(self) -> dict:
"""Convert to JSON-serializable dictionary"""
def filter_by_type(self, elem_type: str) -> List[UIElement]:
"""Filter elements by type ('icon' or 'text')"""
UIElement
class UIElement(BaseModel):
id: Optional[int] = Field(None) # Element ID (1-indexed)
type: Literal["icon", "text"] # Element type
bbox: BoundingBox # Bounding box coordinates { x1, y1, x2, y2 }
interactivity: bool = Field(default=False) # Whether the element is interactive
confidence: float = Field(default=1.0) # Detection confidence