{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a0047772",
   "metadata": {},
   "source": [
    "# Profiling and Speeding Up Python Code \n",
    "*Python Meeting @IA*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51b5f8d4",
   "metadata": {},
   "source": [
    "## Why Profiling?\n",
    "\n",
    "- Sometimes code feels *slow*, but it's not obvious why.  \n",
    "- Guessing is dangerous — we should **measure first**.  \n",
    "- Profiling tools show us **where the time goes**, so we can fix the real bottlenecks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae876646",
   "metadata": {},
   "source": [
    "## Timing Small Pieces of Code\n",
    "\n",
    "- For quick checks, we can use:\n",
    "  - `time` (very rough timing)  \n",
    "  - `timeit` (more precise, runs multiple times)  \n",
    "\n",
    "This is useful for comparing short code snippets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "630bd10a-c362-42b8-81cc-49cee51f9580",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "499999500000\n",
      "The summation using method 1 took 0.03472304344177246 s\n"
     ]
    }
   ],
   "source": [
    "import time \n",
    "\n",
    "def slow_sum():\n",
    "    total = 0\n",
    "    for i in range(1000000):\n",
    "        total += i\n",
    "    return total\n",
    "\n",
    "# Now let's measure how long this method takes\n",
    "\n",
    "start = time.time()\n",
    "print(slow_sum())\n",
    "end = time.time()\n",
    "\n",
    "print(\"The summation using method 1 took {} s\".format(end - start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "bf45007b-6273-4ccd-8f1d-97550accbd1e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "499999500000\n",
      "The summation using method 2 took 0.0011777877807617188 s\n"
     ]
    }
   ],
   "source": [
    "import numpy as np \n",
    "\n",
    "def fast_sum():\n",
    "    return np.arange(1000000).sum()\n",
    "\n",
    "# Now let's measure how long this method takes\n",
    "\n",
    "start = time.time()\n",
    "print(fast_sum())\n",
    "end = time.time()\n",
    "\n",
    "print(\"The summation using method 2 took {} s\".format(end - start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d5e387ca-f1f2-41ef-b77c-2ab3a9591f5f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Timing method 1 with timeit:\n",
      "0.04047145799268037 seconds\n",
      "\n",
      "Timing method 2 with timeit:\n",
      "0.0016550000000279397 seconds\n"
     ]
    }
   ],
   "source": [
    "import timeit\n",
    "\n",
    "print(\"Timing method 1 with timeit:\")\n",
    "print(timeit.timeit(slow_sum, number=1), \"seconds\")\n",
    "\n",
    "print(\"\\nTiming method 2 with timeit:\")\n",
    "print(timeit.timeit(fast_sum, number=1), \"seconds\")"
   ]
  },
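  {
   "cell_type": "markdown",
   "id": "added-timeit-repeat-sketch",
   "metadata": {},
   "source": [
    "A single run (`number=1`) is sensitive to whatever else the machine is doing at that moment. As a minimal sketch, assuming we are happy to report the best of several repeats, we could use `timeit.repeat`. The helper name `best_time` and the repeat counts are illustrative choices, not part of the cells above.\n",
    "\n",
    "```python\n",
    "import timeit\n",
    "\n",
    "def slow_sum():\n",
    "    total = 0\n",
    "    for i in range(1000000):\n",
    "        total += i\n",
    "    return total\n",
    "\n",
    "def best_time(func, repeat=5, number=3):\n",
    "    # Hypothetical helper: best per-call time in seconds over several repeats.\n",
    "    # timeit.repeat returns one total per repeat; each total covers `number` calls.\n",
    "    totals = timeit.repeat(func, repeat=repeat, number=number)\n",
    "    return min(totals) / number\n",
    "\n",
    "print('slow_sum, best of 5 repeats:', best_time(slow_sum), 's')\n",
    "```\n",
    "\n",
    "Taking the minimum rather than the mean is a common choice here, because slower repeats usually reflect interference from other processes rather than the code itself."
   ]
  },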
  {
   "cell_type": "markdown",
   "id": "1435f717-421f-42e9-be92-a916bfda09ab",
   "metadata": {},
   "source": [
    "### Magic commands\n",
    "\n",
    "Magic commands are special commands provided by IPython (the kernel behind Jupyter notebooks) and are prefixed with `%` or `%%`. They act as shortcuts for tasks that go beyond plain Python, such as timing code, writing files, or running shell commands.\n",
    "\n",
    "There are two kinds:\n",
    "\n",
    "**Line magic commands** start with a single `%` and apply to the rest of that line. Examples: `%time`, `%timeit`, `%load`, `%reset`, `%who`. (`%memit` is also a line magic, but it comes from the `memory_profiler` extension, which must be loaded first.)\n",
    "\n",
    "**Cell magic commands** start with `%%` and apply to the entire cell. Examples: `%%time`, `%%timeit`, `%%writefile`, `%%html`, `%%latex`, `%%bash`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0c550979-0c99-4b3a-b061-8375ca6e558e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 55.4 ms, sys: 2.21 ms, total: 57.6 ms\n",
      "Wall time: 57.7 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "499999500000"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time slow_sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0d7a37a9-5c4b-4c53-8eb5-9ed41e0d6b8e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.65 ms, sys: 4.74 ms, total: 7.39 ms\n",
      "Wall time: 4.92 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "499999500000"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time fast_sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "ad29795a-f40a-469f-998c-178f1635c7a5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "22.8 ms ± 302 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit slow_sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "e2cd1e26-6247-4f1a-94be-910f54d20c20",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "828 µs ± 2.94 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit fast_sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "df1a5de6-ca72-4abe-b270-7709129ea79c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "499999500000\n",
      "CPU times: user 87 ms, sys: 1.54 ms, total: 88.6 ms\n",
      "Wall time: 91.8 ms\n"
     ]
    }
   ],
   "source": [
    "%%time \n",
    "\n",
    "# Time method 1 \n",
    "\n",
    "total = 0\n",
    "for i in range(1000000):\n",
    "    total += i\n",
    "print(total)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "30a11474-3117-49cd-87ba-31358cb7717a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "499999500000\n",
      "CPU times: user 3.5 ms, sys: 2.35 ms, total: 5.85 ms\n",
      "Wall time: 7.43 ms\n"
     ]
    }
   ],
   "source": [
    "%%time \n",
    "\n",
    "# Time method 2 \n",
    "\n",
    "print(np.arange(1000000).sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96af6c81-56f8-4460-b228-de8e51b0ee35",
   "metadata": {},
   "source": [
    "## Comparing different implementations of the same code: Which is faster? \n",
    "\n",
    "Now that we know how to time operations, we can compare some implementations of simple tasks. Many small optimizations add up to significant savings in real-world software. Let's look at a few of the non-obvious ones."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "946c523c-3a49-43ba-af72-2f6eb666d67d",
   "metadata": {},
   "source": [
    "### Joining strings\n",
    "\n",
    "What is the best way to join a bunch of strings into a larger string? There are several ways of doing this, but some are clearly superior to others. Let's use `%%timeit` to test things out.\n",
    "\n",
    "We will use three different methods:\n",
    "\n",
    "- Using the built-in `+` operator to add the strings together one at a time.\n",
    "- Using the `join` method, as in `\"\".join(list)`.\n",
    "- Iteratively combining the strings from the list using `\"%s %s\"` string formatting. Note that this variant also inserts an extra space between items, so its result is not identical to the other two.\n",
    "\n",
    "Which method do you think will be fastest?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0dc8d1ef-b506-45f2-bb43-be73bb979403",
   "metadata": {},
   "outputs": [],
   "source": [
    "string_list = [\"let's \", 'test ', 'how ', 'fast ', 'all ', 'these ', 'methods ', 'work ']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "a454bed8-1471-4d91-bb55-5a41eeb5ee60",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "206 ns ± 0.635 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 1: built-in + operator \n",
    "\n",
    "output = \"\"\n",
    "for string in string_list:\n",
    "    output += string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "366e55fe-7707-4e4f-b64f-58da13872c8e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "69.7 ns ± 0.167 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 2: join method \n",
    "\n",
    "\"\".join(string_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "ea810c07-7bc4-4638-86f4-b6db2a7c9355",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "504 ns ± 2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 3: %-formatting in a loop (also adds a space between items)\n",
    "\n",
    "output = \"\"\n",
    "for word in string_list:\n",
    "    output = \"%s %s\" % (output, word)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5fd65aca-4bd7-4383-9a7c-2c6ef5ef3cef",
   "metadata": {},
   "source": [
    "### Building big lists \n",
    "\n",
    "What about building big lists or list-like structures (like numpy arrays)? We know how to construct lists in a variety of ways, so let's see which is fastest. Let's make a list of ascending perfect squares (i.e. 1, 4, 9, ...) for the first 1 million integers. We will use these methods:\n",
    "\n",
    "--Iteratively appending x**2 values on to an empty list\n",
    "\n",
    "--A for loop with the built in python range command\n",
    "\n",
    "--A for loop with the numpy arange command\n",
    "\n",
    "--Use the numpy arange command directly, and then take the square of it\n",
    "\n",
    "--Use map to map a lambda squaring function to a numpy array constructed with numpy arange\n",
    "\n",
    "Which method do you think will be fastest?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "62a556fa-4987-43f2-88bc-0c563f79d8ec",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "37.4 ms ± 605 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 1: for loop with append\n",
    "\n",
    "output = []\n",
    "for x in range(1000000):\n",
    "    output.append(x**2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "5008bbb9-7094-40c4-8ab3-4f56918c8c4c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "36.6 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 2: list comprehension over range\n",
    "\n",
    "[x**2 for x in range(1000000)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "bce09164-4e08-47d6-9f61-d1b5cdc0f98c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "56 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 3: list comprehension over np.arange\n",
    "\n",
    "[x**2 for x in np.arange(1000000)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "ec544fba-acea-4e7c-997d-ad31ad5a207b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.53 ms ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 4: vectorized NumPy operation on the whole array\n",
    "\n",
    "np.arange(1000000)**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "267a24ed-a28d-4fa0-86c2-e751ddfc3406",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "739 µs ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
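   ],
   "source": [
    "%%timeit\n",
    "\n",
    "# Method 5: map with a lambda\n",
    "# Caution: in Python 3, map() is lazy, so this times only np.arange() and\n",
    "# the creation of the map object; no squares are computed until it is iterated.\n",
    "\n",
    "map(lambda x: x**2, np.arange(1000000))"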
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22e9e91b-7862-41c5-9e4e-3b7be898bf21",
   "metadata": {},
   "source": [
    "Explicit `for` loops in Python are slow. You can generally speed things up with NumPy, which runs whole-array operations in optimized C under the hood (Method 4). Method 5 only looks fastest: `map` is lazy in Python 3, so that cell never actually computes the squares. "
   ]
  },
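  {
   "cell_type": "markdown",
   "id": "added-map-lazy-sketch",
   "metadata": {},
   "source": [
    "In Python 3, `map` returns a lazy iterator, which matters when reading the timing of Method 5 above. The sketch below (the names `square`, `calls` and `lazy` are illustrative) counts how often the function really runs: creating the `map` object does no work, and the squares are only computed once the iterator is consumed, for example with `list(...)`.\n",
    "\n",
    "```python\n",
    "calls = 0\n",
    "\n",
    "def square(x):\n",
    "    # Count how many times the function actually runs\n",
    "    global calls\n",
    "    calls += 1\n",
    "    return x**2\n",
    "\n",
    "lazy = map(square, range(5))\n",
    "print(calls)            # prints 0: nothing has been computed yet\n",
    "\n",
    "squares = list(lazy)    # consuming the iterator triggers the work\n",
    "print(calls, squares)   # prints 5 [0, 1, 4, 9, 16]\n",
    "```\n",
    "\n",
    "To compare Method 5 fairly with the others, the timed expression would need to be wrapped in `list(...)` so that every element is actually computed."
   ]
  },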
  {
   "cell_type": "markdown",
   "id": "08afc2b6-332d-4ff2-9050-9394fabf62de",
   "metadata": {},
   "source": [
    "## Profiling code to pinpoint what's slow \n",
    "\n",
    "Profiling lets you see exactly which functions are called, how often, and how long each one takes. We will create a function that calls other functions and find out which of them takes the longest to run. We will use the standard-library [`cProfile`](https://docs.python.org/3/library/profile.html) module and the third-party [`line_profiler`](https://kernprof.readthedocs.io/en/latest/) package. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "9630415b-7e7d-4df0-89be-12586ee857cc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         32 function calls in 4.601 seconds\n",
      "\n",
      "   Ordered by: internal time\n",
      "\n",
      "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
      "       25    3.593    0.144    3.593    0.144 2755188138.py:3(calc_big_number)\n",
      "        1    1.005    1.005    1.005    1.005 {built-in method time.sleep}\n",
      "        1    0.002    0.002    3.595    3.595 2755188138.py:13(<listcomp>)\n",
      "        1    0.002    0.002    4.601    4.601 2755188138.py:15(build_list_and_sleep)\n",
      "        1    0.000    0.000    4.601    4.601 {built-in method builtins.exec}\n",
      "        1    0.000    0.000    4.601    4.601 <string>:1(<module>)\n",
      "        1    0.000    0.000    3.595    3.595 2755188138.py:9(build_list)\n",
      "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import cProfile\n",
    "\n",
    "def calc_big_number():\n",
    "    \"\"\"\n",
    "    Calculates 5 ** 1000000 (a very large integer)\n",
    "    \"\"\"\n",
    "    return 5 ** 1000000\n",
    "\n",
    "def build_list():\n",
    "    \"\"\"\n",
    "    Creates a list of 25 values, each equal to 5 ** 1000000\n",
    "    \"\"\"\n",
    "    return [calc_big_number() for _ in range(25)]\n",
    "\n",
    "def build_list_and_sleep():\n",
    "    \"\"\"\n",
    "    Pause for 1 second  \n",
    "    \"\"\"\n",
    "    build_list()\n",
    "    time.sleep(1)\n",
    "\n",
    "cProfile.run(\"build_list_and_sleep()\", sort=\"tottime\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "5beb2367-0196-4482-89a9-c3f406076464",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " "
     ]
    },
    {
     "data": {
      "text/plain": [
       "         32 function calls in 4.571 seconds\n",
       "\n",
       "   Ordered by: internal time\n",
       "\n",
       "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
       "       25    3.563    0.143    3.563    0.143 2755188138.py:3(calc_big_number)\n",
       "        1    1.005    1.005    1.005    1.005 {built-in method time.sleep}\n",
       "        1    0.002    0.002    3.565    3.565 2755188138.py:13(<listcomp>)\n",
       "        1    0.001    0.001    4.571    4.571 2755188138.py:15(build_list_and_sleep)\n",
       "        1    0.000    0.000    4.571    4.571 {built-in method builtins.exec}\n",
       "        1    0.000    0.000    4.571    4.571 <string>:1(<module>)\n",
       "        1    0.000    0.000    3.565    3.565 2755188138.py:9(build_list)\n",
       "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%prun build_list_and_sleep()  #Using magic command in notebook "
   ]
  },
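  {
   "cell_type": "markdown",
   "id": "added-pstats-sketch",
   "metadata": {},
   "source": [
    "For larger programs the full report can get long. As a rough sketch, the standard-library `pstats` module can sort the collected statistics and print only the top entries. The function `work` below is a stand-in for whatever is being profiled, and the choice of sorting by cumulative time and showing 5 entries is only an example.\n",
    "\n",
    "```python\n",
    "import cProfile\n",
    "import io\n",
    "import pstats\n",
    "import time\n",
    "\n",
    "def work():\n",
    "    # Stand-in for the function being profiled\n",
    "    time.sleep(0.05)\n",
    "    return sum(i * i for i in range(100000))\n",
    "\n",
    "profiler = cProfile.Profile()\n",
    "profiler.enable()\n",
    "work()\n",
    "profiler.disable()\n",
    "\n",
    "# Write the report into a string buffer, sorted by cumulative time\n",
    "buffer = io.StringIO()\n",
    "stats = pstats.Stats(profiler, stream=buffer)\n",
    "stats.sort_stats('cumulative').print_stats(5)  # keep only the top 5 entries\n",
    "report = buffer.getvalue()\n",
    "print(report)\n",
    "```\n",
    "\n",
    "Sorting by cumulative time highlights the functions whose whole call tree is expensive, while sorting by `tottime` (as in the cells above) highlights where the time is spent directly."
   ]
  },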
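  {
   "cell_type": "markdown",
   "id": "1a02deba-1125-40da-a93c-66d82eb3c663",
   "metadata": {},
   "source": [
    "We can also time individual lines of a function to see which ones take the longest. This uses the `%lprun` magic from the `line_profiler` package, which has to be installed and loaded first."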
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "14233132-7b3b-4f8d-8266-d84f21a23a5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext line_profiler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "7afa86c7-64a1-4861-9112-036b0dde9eb6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timer unit: 1e-09 s\n",
       "\n",
       "Total time: 4.55653 s\n",
       "File: /var/folders/cq/m81fshbs7ql8qty_fp08bv640000gn/T/ipykernel_17848/2755188138.py\n",
       "Function: build_list_and_sleep at line 15\n",
       "\n",
       "Line #      Hits         Time  Per Hit   % Time  Line Contents\n",
       "==============================================================\n",
       "    15                                           def build_list_and_sleep():\n",
       "    16                                               \"\"\"\n",
       "    17                                               Pause for 1 second  \n",
       "    18                                               \"\"\"\n",
       "    19         1 3551416000.0 3.55e+09     77.9      build_list()\n",
       "    20         1 1005111000.0 1.01e+09     22.1      time.sleep(1)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%lprun -f build_list_and_sleep build_list_and_sleep()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "418db7bc-aeef-4878-aac0-8af7c9f19252",
   "metadata": {},
   "source": [
    "### Using caching to speed things up \n",
    "\n",
    "As you may have noticed in the example above, most of the run time comes from the repeated calls to the `calc_big_number` function. Since it always returns the same value, and that value doesn't take much memory, it is a good candidate for caching (or, in this case, for replacing with a constant). \n",
    "\n",
    "We can use the [`functools.cache`](https://docs.python.org/3/library/functools.html) decorator (available since Python 3.9) for this. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "6e52f9db-494e-4ef4-bee6-fd8322ac4c6e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         8 function calls in 1.169 seconds\n",
      "\n",
      "   Ordered by: internal time\n",
      "\n",
      "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
      "        1    1.005    1.005    1.005    1.005 {built-in method time.sleep}\n",
      "        1    0.163    0.163    0.163    0.163 4156761090.py:5(calc_big_number)\n",
      "        1    0.000    0.000    0.164    0.164 4156761090.py:10(<listcomp>)\n",
      "        1    0.000    0.000    1.169    1.169 4156761090.py:12(build_list_and_sleep)\n",
      "        1    0.000    0.000    1.169    1.169 {built-in method builtins.exec}\n",
      "        1    0.000    0.000    1.169    1.169 <string>:1(<module>)\n",
      "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}\n",
      "        1    0.000    0.000    0.164    0.164 4156761090.py:9(build_list)\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Example using caching \n",
    "\n",
    "from functools import cache\n",
    "\n",
    "@cache\n",
    "def calc_big_number():\n",
    "    return 5 ** 1_000_000\n",
    "\n",
    "def build_list():\n",
    "    return [calc_big_number() for _ in range(25)]\n",
    "\n",
    "def build_list_and_sleep():\n",
    "    build_list()\n",
    "    time.sleep(1)\n",
    "\n",
    "cProfile.run(\"build_list_and_sleep()\", sort=\"tottime\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b4d41b8-48e4-491c-96ab-cf8ea050abbf",
   "metadata": {},
   "source": [
    "Now the body of `calc_big_number` runs only once (the profile lists a single call; the other 24 results come from the cache), which saves about 3.4 seconds. \n",
    "\n",
    "Let's see another example of this. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "4db7f4d9-cf1b-46d0-8109-1ce49cd697f7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fibonacci with caching:\n",
      "         1004 function calls (4 primitive calls) in 0.002 seconds\n",
      "\n",
      "   Ordered by: internal time\n",
      "\n",
      "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
      "   1001/1    0.001    0.000    0.001    0.001 1626660552.py:3(fib)\n",
      "        1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}\n",
      "        1    0.000    0.000    0.001    0.001 <string>:1(<module>)\n",
      "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Fibonacci example \n",
    "\n",
    "@cache\n",
    "def fib(n):\n",
    "    if n < 2:\n",
    "        return n\n",
    "    return fib(n-1) + fib(n-2)\n",
    "\n",
    "\n",
    "print(\"Fibonacci with caching:\")\n",
    "cProfile.run(\"fib(1000)\", sort=\"tottime\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "55520ec5-6692-483b-9492-5ac2cddc7f82",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         3 function calls in 0.000 seconds\n",
      "\n",
      "   Ordered by: internal time\n",
      "\n",
      "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
      "        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}\n",
      "        1    0.000    0.000    0.000    0.000 <string>:1(<module>)\n",
      "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "cProfile.run(\"fib(1000)\", sort=\"tottime\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "203a9f57-269e-43e5-85a0-cdf3c8175332",
   "metadata": {},
   "source": [
    "Because the results of the recursive calls are stored instead of recomputed, the function runs far faster. The second profile shows only 3 function calls: `fib(1000)` was already in the cache, so the body of `fib` never ran again.\n",
    "\n",
    "So why not cache everything? Some of the biggest drawbacks of caching are:\n",
    "* If a cached result can change, the cache has to be invalidated, which makes the code more complex and bug-prone.\n",
    "* Memory or disk space can be limited or expensive.\n",
    "* With `functools.cache`, the function arguments must be hashable."
   ]
  },
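  {
   "cell_type": "markdown",
   "id": "added-lru-cache-sketch",
   "metadata": {},
   "source": [
    "Both concerns can be partly addressed with `functools.lru_cache`, which bounds the number of stored results and exposes `cache_clear()` for manual invalidation. This is only a sketch: the `maxsize=128` value is an arbitrary choice for illustration, and the right bound depends on the workload.\n",
    "\n",
    "```python\n",
    "from functools import lru_cache\n",
    "\n",
    "@lru_cache(maxsize=128)   # keep at most 128 results (arbitrary example value)\n",
    "def fib(n):\n",
    "    if n < 2:\n",
    "        return n\n",
    "    return fib(n - 1) + fib(n - 2)\n",
    "\n",
    "print(fib(100))\n",
    "print(fib.cache_info())   # hits, misses, maxsize and current size\n",
    "\n",
    "fib.cache_clear()         # manual invalidation: drop every stored result\n",
    "print(fib.cache_info())\n",
    "```\n",
    "\n",
    "When the cache is full, `lru_cache` discards the least recently used entry, so memory use stays bounded at the cost of occasionally recomputing a value."
   ]
  },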
  {
   "cell_type": "markdown",
   "id": "6bbf758f",
   "metadata": {},
   "source": [
    "## Profiling an Example from Astronomy\n",
    "\n",
    "Imagine we have a simulation with millions of particles (stars, dark matter, gas).  \n",
    "A common task is to compute the **center of mass**:\n",
    "\n",
    "\n",
    "$\\vec{r}_{\\text{CoM}} = \\frac{\\sum_i m_i \\vec{r}_i}{\\sum_i m_i}$\n",
    "\n",
    "\n",
    "This involves looping over all particles and summing their contributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "f8398b73",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.4997803691635228, 0.49988440484572855, 0.5000043045204186]\n"
     ]
    }
   ],
   "source": [
    "# Generate synthetic particle data\n",
    "\n",
    "N = 5000000\n",
    "masses = np.random.rand(N)   # particle masses\n",
    "positions = np.random.rand(N, 3)  # x, y, z positions\n",
    "\n",
    "def center_of_mass_python(masses, positions):\n",
    "    total_mass = 0.0\n",
    "    com = [0.0, 0.0, 0.0]\n",
    "    for i in range(len(masses)):\n",
    "        total_mass += masses[i]\n",
    "        com[0] += masses[i] * positions[i, 0]\n",
    "        com[1] += masses[i] * positions[i, 1]\n",
    "        com[2] += masses[i] * positions[i, 2]\n",
    "    return [c / total_mass for c in com]\n",
    "\n",
    "# Quick test\n",
    "print(center_of_mass_python(masses, positions))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "4eb0b5e0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Timing the Python loop version:\n",
      "CPU times: user 2.86 s, sys: 3.52 ms, total: 2.86 s\n",
      "Wall time: 2.89 s\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[0.4997803691635228, 0.49988440484572855, 0.5000043045204186]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Timing the Python loop version:\")\n",
    "%time center_of_mass_python(masses, positions)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4dd1fe7",
   "metadata": {},
   "source": [
    "### Speeding it Up with NumPy\n",
    "\n",
    "Instead of looping in Python, we can let **NumPy** handle the math in fast C code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "1c4da701",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python loop: 2.9273802919778973 s\n",
      "NumPy      : 0.07335975000751205 s\n"
     ]
    }
   ],
   "source": [
    "# Numpy version\n",
    "def center_of_mass_numpy(masses, positions):\n",
    "    total_mass = np.sum(masses)\n",
    "    com = np.sum(masses[:, None] * positions, axis=0) / total_mass\n",
    "    return com\n",
    "\n",
    "# Compare timings\n",
    "print(\"Python loop: {} s\".format(timeit.timeit(lambda: center_of_mass_python(masses, positions), number=1)))\n",
    "print(\"NumPy      : {} s\".format(timeit.timeit(lambda: center_of_mass_numpy(masses, positions), number=1)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8955979d",
   "metadata": {},
   "source": [
    "### Even Faster with [Numba](https://numba.pydata.org/numba-doc/dev/user/5minguide.html)\n",
    "\n",
    "Python, as implemented by CPython, is interpreted, which offers benefits like dynamic typing, platform independence, and rapid prototyping. Raw performance is not one of those benefits. Numba and Cython are two different ways to compile Python code and potentially improve performance.\n",
    "Numba is a JIT (just-in-time) compiler, which means that code is compiled at run time, the first time a function is called, rather than ahead of time.  \n",
    "\n",
    "We can add the `@jit` decorator to our original Python loop so that Numba compiles the function to machine code. The only other change is that `com` becomes a NumPy array instead of a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "c3742f91",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Numba: 0.3687928329745773\n"
     ]
    }
   ],
   "source": [
    "from numba import jit\n",
    "\n",
    "@jit(nopython=True)\n",
    "def center_of_mass_numba(masses, positions):\n",
    "    total_mass = 0.0\n",
    "    com = np.zeros(3)\n",
    "    for i in range(len(masses)):\n",
    "        total_mass += masses[i]\n",
    "        com[0] += masses[i] * positions[i, 0]\n",
    "        com[1] += masses[i] * positions[i, 1]\n",
    "        com[2] += masses[i] * positions[i, 2]\n",
    "    return com / total_mass\n",
    "\n",
    "print(\"Numba:\", timeit.timeit(lambda: center_of_mass_numba(masses, positions), number=1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d02339ce-7b8e-4f3c-8ae3-dd579683c898",
   "metadata": {},
   "source": [
    "The first call includes the compilation time. If we run it again, we will see the real speedup. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "b8a8dd15-87d5-4608-9271-519e857e19aa",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python loop: 2.94738166697789 s\n",
      "NumPy      : 0.0634402499999851 s\n",
      "Numba      : 0.005656832974636927 s\n"
     ]
    }
   ],
   "source": [
    "# Compare timings\n",
    "print(\"Python loop: {} s\".format(timeit.timeit(lambda: center_of_mass_python(masses, positions), number=1)))\n",
    "print(\"NumPy      : {} s\".format(timeit.timeit(lambda: center_of_mass_numpy(masses, positions), number=1)))\n",
    "print(\"Numba      : {} s\".format(timeit.timeit(lambda: center_of_mass_numba(masses, positions), number=1)))"
   ]
  },
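  {
   "cell_type": "markdown",
   "id": "c5d33d88-7e9c-4152-acf1-dcd742ed8276",
   "metadata": {},
   "source": [
    "### Other options \n",
    "\n",
    "* Multi-threading or multi-processing (for CPU-bound pure-Python code, threads are limited by CPython's global interpreter lock, so separate processes are usually the better fit)\n",
    "* Theano, for symbolic math (note that its original developers stopped major development in 2017)\n",
    "* Other Python compilers and runtimes, such as Cython (mentioned above) or PyPy"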
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f671b071",
   "metadata": {},
   "source": [
    "## Wrap-Up\n",
    "\n",
    "- **Don’t optimize blindly**: always profile first.  \n",
    "- Small changes (like using `str.join`, caching, or NumPy) can make a *huge* difference.  \n",
    "- For even more speed, try Numba or other specialized tools.  \n",
    "\n",
    "Takeaway: Write code that not only *works*, but works *faster*. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language": "python",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}