
Getting Started with Hugging Face

Here is how I explain Hugging Face (https://huggingface.co) when friends ask: it is the open library and social hub for modern AI. You get a giant catalog of models and datasets that you can pull into your code with a couple of imports, plus a place to publish and version the things you build. It matters because of the network effect: a standard way to package models, a standard API to run them, clear model cards, reproducibility via commit hashes, and a sharing flow that feels like Git for ML. The result is faster experiments, clearer governance, and a portfolio you can point to when someone asks what you have shipped.
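To make "a couple of imports" concrete, here is a minimal sketch using the transformers pipeline API. The default model it downloads is picked by the library, and the revision pin in the comment is how you would lock to an exact commit:

from transformers import pipeline

# First call downloads a small default sentiment model from the Hub and
# caches it under ~/.cache/huggingface; later calls reuse the cache.
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes sharing models feel like Git for ML"))

# Reproducibility: pin any model to an exact commit hash with revision=, e.g.
# pipeline("sentiment-analysis",
#          model="distilbert-base-uncased-finetuned-sst-2-english",
#          revision="<commit-hash-from-the-model-page>")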

Let’s get a hands-on feel on a Mac using Terminal. First, sign in locally so you can pull models now and push your own work later. Create a free Hugging Face account, then in Terminal run:

brew install git-lfs
python3 -m pip install --upgrade pip
pip install "huggingface_hub[cli]" transformers datasets torch accelerate safetensors diffusers

huggingface-cli login
...
 To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: fineGrained).
The token `Personal` has been saved to /Users/doronkatz/.cache/huggingface/stored_tokens
Your token has been saved in your configured git credential helpers (osxkeychain).
Your token has been saved to /Users/doronkatz/.cache/huggingface/token
Login successful.
The current active token is: `Personal`
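You can confirm the login took by asking the Hub who you are:

huggingface-cli whoami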

Now let's try text-to-image with the diffusers library, which generates images from text prompts using models like Stable Diffusion. I used this model: https://huggingface.co/CompVis/stable-diffusion-v1-4 (the script below defaults to the faster stabilityai/sd-turbo; the --model-id flag lets you choose either).

It feels magical: you type a description, and a model paints it for you. Create a new Python file titled cool_t2i_share.py:

import argparse
import time
from pathlib import Path

import torch
from diffusers import AutoPipelineForText2Image

def slugify(text):
    """Convert text to a filename-safe string."""
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in text)[:80]

def load_pipeline(model_id, device, torch_dtype):
    """Load and configure the text-to-image pipeline."""
    pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch_dtype)
    if device == "cuda":
        pipe = pipe.to("cuda")
        pipe.enable_attention_slicing()
        try:
            pipe.enable_xformers_memory_efficient_attention()
        except Exception:
            pass
    return pipe

def generate(pipe, prompt, steps, guidance, width, height, seed):
    """Generate an image using the pipeline."""
    generator = None
    if seed is not None:
        generator = torch.Generator(device=pipe.device.type).manual_seed(int(seed))
    result = pipe(
        prompt=prompt,
        num_inference_steps=int(steps),
        guidance_scale=float(guidance),
        width=int(width),
        height=int(height),
        generator=generator,
    )
    # nsfw_content_detected is a list of booleans (or None), so check its
    # contents; a bare truthiness test would fire even on [False].
    if getattr(result, "nsfw_content_detected", None) and any(result.nsfw_content_detected):
        print("Warning: NSFW content was detected by the safety checker")
    return result.images[0]

def main():
    parser = argparse.ArgumentParser(description="Text to image generator (local only)")
    parser.add_argument("--prompt", type=str, required=True, help="Text description of the image")
    parser.add_argument("--model-id", type=str, default="stabilityai/sd-turbo", help="Hugging Face model ID")
    parser.add_argument("--num-inference-steps", type=int, default=4, help="Number of denoising steps")
    parser.add_argument("--guidance-scale", type=float, default=0.0, help="Guidance scale")
    parser.add_argument("--width", type=int, default=512, help="Image width")
    parser.add_argument("--height", type=int, default=512, help="Image height")
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument("--output-dir", type=str, default="outputs", help="Output directory")
    args = parser.parse_args()

    if args.width % 8 != 0 or args.height % 8 != 0:
        raise ValueError("Width and height must be multiples of 8")

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    
    print(f"Using device: {device}")
    print(f"Loading model: {args.model_id}")
    
    pipe = load_pipeline(args.model_id, device, dtype)

    print(f"Generating image for prompt: '{args.prompt}'")
    start_time = time.time()
    image = generate(
        pipe=pipe,
        prompt=args.prompt,
        steps=args.num_inference_steps,
        guidance=args.guidance_scale,
        width=args.width,
        height=args.height,
        seed=args.seed,
    )
    generation_time = time.time() - start_time

    # Create timestamped output directory
    ts = time.strftime("%Y%m%d-%H%M%S")
    run_dir = Path(args.output_dir) / ts
    run_dir.mkdir(parents=True, exist_ok=True)
    
    # Generate filename
    name = f"{slugify(args.prompt)}.png" if args.prompt else f"image_{ts}.png"
    out_path = run_dir / name
    image.save(out_path)
    print(f"Saved image to {out_path}")
    print(f"Generation completed in {generation_time:.2f} seconds")

if __name__ == "__main__":
    main()

We then run it:

python3 cool_t2i_share.py --prompt "YOUR_PROMPT"

The first run downloads the model weights (a few GB); after that, the script produces an image from your prompt. You can swap the prompt for anything: "a toy robot surfing a wave," whatever comes to mind.
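Because the script exposes --seed and --num-inference-steps, runs can be made reproducible. For example (the prompt and seed here are arbitrary choices of mine):

python3 cool_t2i_share.py --prompt "a toy robot surfing a wave" --seed 42 --num-inference-steps 4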

I tried the following prompt: "an impressionist painting of Seattle". That yielded:

[Generated image: an impressionist painting of Seattle]

Sharing Your Work

Finally, let's share what we did by pushing your output to Hugging Face so others can see and reuse it. You'll set a token once, run one script, and end up with a Hub repo containing your generated image; the code itself goes to GitHub at the end.

pip install huggingface_hub

# Create a token at https://huggingface.co/settings/tokens with "write" scope
export HF_TOKEN="hf_xxx_your_token_here"
export HF_USERNAME="your-hf-username"   

Update your Python file as follows:

import os
import argparse
import time
from pathlib import Path

import torch
from diffusers import AutoPipelineForText2Image
from huggingface_hub import HfApi, create_repo

def slugify(text):
    """Convert text to a filename-safe string."""
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in text)[:80]

def load_pipeline(model_id, device, torch_dtype):
    """Load and configure the text-to-image pipeline."""
    pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch_dtype)
    if device == "cuda":
        pipe = pipe.to("cuda")
        pipe.enable_attention_slicing()
        try:
            pipe.enable_xformers_memory_efficient_attention()
        except Exception:
            pass
    return pipe

def generate(pipe, prompt, steps, guidance, width, height, seed):
    """Generate an image using the pipeline."""
    generator = None
    if seed is not None:
        generator = torch.Generator(device=pipe.device.type).manual_seed(int(seed))
    result = pipe(
        prompt=prompt,
        num_inference_steps=int(steps),
        guidance_scale=float(guidance),
        width=int(width),
        height=int(height),
        generator=generator,
    )
    # nsfw_content_detected is a list of booleans (or None), so check its
    # contents; a bare truthiness test would fire even on [False].
    if getattr(result, "nsfw_content_detected", None) and any(result.nsfw_content_detected):
        print("Warning: NSFW content was detected by the safety checker")
    return result.images[0]

def push_folder_to_hub(local_folder, repo_id, repo_type, public):
    """Upload a folder to Hugging Face Hub."""
    token = os.environ.get("HUGGINGFACE_TOKEN") or os.environ.get("HF_TOKEN")
    create_repo(repo_id=repo_id, repo_type=repo_type, private=(not public), exist_ok=True, token=token)
    api = HfApi()
    api.upload_folder(
        folder_path=str(local_folder),
        repo_id=repo_id,
        repo_type=repo_type,
        path_in_repo="",
        commit_message="Upload from cool_t2i_share.py",
        token=token,
    )

def main():
    parser = argparse.ArgumentParser(description="Text to image with optional Hub upload")
    parser.add_argument("--prompt", type=str, required=True, help="Text description of the image")
    parser.add_argument("--model-id", type=str, default="stabilityai/sd-turbo", help="Hugging Face model ID")
    parser.add_argument("--num-inference-steps", type=int, default=4, help="Number of denoising steps")
    parser.add_argument("--guidance-scale", type=float, default=0.0, help="Guidance scale")
    parser.add_argument("--width", type=int, default=512, help="Image width")
    parser.add_argument("--height", type=int, default=512, help="Image height")
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument("--output-dir", type=str, default="outputs", help="Output directory")
    parser.add_argument("--push-to-hub", action="store_true", help="Upload to Hugging Face Hub")
    parser.add_argument("--repo-id", type=str, default=None, help="Hub repository ID (USERNAME/repo-name)")
    parser.add_argument("--repo-type", type=str, choices=["model", "dataset"], default="model", help="Repository type")
    parser.add_argument("--public", action="store_true", help="Make the repo public")
    args = parser.parse_args()

    if args.width % 8 != 0 or args.height % 8 != 0:
        raise ValueError("Width and height must be multiples of 8")

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    
    print(f"Using device: {device}")
    print(f"Loading model: {args.model_id}")
    
    pipe = load_pipeline(args.model_id, device, dtype)

    print(f"Generating image for prompt: '{args.prompt}'")
    image = generate(
        pipe=pipe,
        prompt=args.prompt,
        steps=args.num_inference_steps,
        guidance=args.guidance_scale,
        width=args.width,
        height=args.height,
        seed=args.seed,
    )

    # Create timestamped output directory
    ts = time.strftime("%Y%m%d-%H%M%S")
    run_dir = Path(args.output_dir) / ts
    run_dir.mkdir(parents=True, exist_ok=True)
    
    # Generate filename
    name = f"{slugify(args.prompt)}.png" if args.prompt else f"image_{ts}.png"
    out_path = run_dir / name
    image.save(out_path)
    print(f"Saved image to {out_path}")

    # Optionally upload to Hub
    if args.push_to_hub:
        if not args.repo_id:
            raise ValueError("When --push-to-hub is set, you must pass --repo-id USERNAME/repo-name")
        print(f"Pushing folder {run_dir} to {args.repo_id} as {args.repo_type}")
        push_folder_to_hub(run_dir, args.repo_id, args.repo_type, args.public)
        print("Upload completed")

if __name__ == "__main__":
    main()

Let's try running our changes:

python3 cool_t2i_share.py --prompt "a serene mountain landscape with a lake"

If everything looks good, we can proceed with sharing.
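To actually publish, add the upload flags. The repo name below is my choice; anything under your username works, and the script will create the repo on the Hub if it does not exist yet:

python3 cool_t2i_share.py --prompt "a serene mountain landscape with a lake" --push-to-hub --repo-id "$HF_USERNAME/cool-t2i-share" --public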

  1. Initialize your project for Git (git init), add your files, and make an initial commit.
  2. Create a new repository on GitHub and push your code up there; a sketch of the commands follows below. I shared mine here: https://github.com/doronkatz/cool-t2i-share
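A minimal sketch of those two steps; the remote URL is a placeholder for your own repository:

git init
git add cool_t2i_share.py
git commit -m "Text-to-image script with Hugging Face Hub upload"
git remote add origin https://github.com/YOUR_USERNAME/cool-t2i-share.git
git push -u origin main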

And that's it: your first foray into Hugging Face.