Category: Databases

  • How to reset your WordPress password via SQL

    You may have the need to reset your wordpress password and here I’ll show you how to do that, assuming you have access to your sites MySQL database.

    WordPress stores the password for your user in the wp_users table. The password itself is assumed to be hashed with MD5. You can set a new password with the following query:

    UPDATE `wp_users` SET `user_pass` = MD5('mynewpassword')
    WHERE `user_login` = 'user_x';

    Just replace user_login value ("user_x" in the query above) with your own username and replace the password ("mynewpassword in the query above) with the password you want.

    To list users that exist in the database, you can use this simple query:

    SELECT * FROM wp_users

    That’s all there is to it.

  • How to create a versioned table from a crud table in Postgres with triggers

    Example

    You have a user table in Postgres:

    CREATE SCHEMA tutorial;
    
    CREATE TABLE tutorial.crud_users (
        id INT NOT NULL,
        username VARCHAR NOT NULL,
        CONSTRAINT crud_table_pkey PRIMARY KEY (id)
    );

    You insert new users and update usernames for existing users. Update is a destructive operation that overwrites data, so how do you keep track of what usernames a particular id has had over time?

    Use a trigger

    You can use triggers in Postgres to remember the historical values of your rows.

    First create a new table to store the history, which you can think of as a logical stream of updates to the "crud_user" table. Each row in "stream_users" corresponds to a current or historical state of a row in "crud_users".

    CREATE TABLE tutorial.stream_users (
        id INT,
        created_ts TIMESTAMP,
        username VARCHAR NOT NULL,
        CONSTRAINT stream_table_pkey PRIMARY KEY (id, created_ts)
    );

    Next, create a trigger on "crud_users".

    CREATE OR REPLACE FUNCTION crud_users_insert_trigger_fnc()
      RETURNS trigger AS
    $$
    BEGIN
        INSERT INTO "tutorial"."stream_users" ( "id", "created_ts", "username")
        VALUES(NEW."id", now(), NEW."username");
    RETURN NEW;
    END;
    $$
    LANGUAGE 'plpgsql';
    
    CREATE TRIGGER user_insert_trigger
      AFTER INSERT OR UPDATE
      ON "tutorial"."crud_users"
      FOR EACH ROW
      EXECUTE PROCEDURE crud_users_insert_trigger_fnc();

    The trigger inserts a new row in "stream_users" every time you insert or update a row in "crud_users". Thus, we keep track of all historical and current values of users.

    Try it out

    select * from tutorial.crud_users; -- should be empty
    select * from tutorial.stream_users; -- should be empty
    
    insert into tutorial.crud_users(id, username) values (1, 'joe');
    
    update tutorial.crud_users set username='joe69' where id=1;

    Now, If you delete a user from "crud_users" we will not insert anything into the "stream_users" table. To improve the design, we could add a state column to stream_users, which reflects the kind of change, e.g. created, updated or deleted.

  • How to export CSV file to database with Python

    Pandas and SQLAlchemy offer powerful conversions between CSV files and tables in databases. Here is a small example:

    import pandas as pd
    from sqlalchemy import create_engine
    
    df = pd.read_csv('mydata.csv')
    
    engine = create_engine('sqlite:///mydata.db')
    df.to_sql('mytable', engine)
    

    Read more:

  • How to work with spatial data in Amazon Redshift

    While Redshift does not offer native support for spatial data, indexes and functions, there exists a partial workaround. Redshift supports Python UDFs and can also load custom Python libraries. Out of the box, Redshift has numpy, scipy, pandas and many other useful Python libraries. For spatial functionality, one saving grace is the high quality spatial libraries that exist for Python, such as shapely. Of course, the alternative is to simply implement useful spatial functions in Python directly, which we will do here. The drawback is that this does not provide the means for spatial indexes or native spatial types in Redshift. As long as you are working mainly with point data, this should not be a huge obstacle. While polygons and operations on them are useful in many cases, a properly utilized GeoHash can usually do the trick.

    So, let’s get into it! Connect to your Redshift cluster using a client of your choosing. I prefer SQLWorkbench/J. Properly connected, attempt to create the following UDF in Python, which implements the haversine formula using NumPy (thanks to jterrace for the solution).

    CREATE OR REPLACE FUNCTION haversine (lat1 float, lon1 float, lat2 float, lon2 float)
    RETURNS float IMMUTABLE AS
    $$
    
        from math import radians, sin, cos, asin, sqrt, pi, atan2
        import numpy as np
    
        earth_radius_miles = 3956.0
    
        def haversine(lat1, lon1, lat2, lon2):
            """Gives the distance between two points on earth.
            """
            lat1, lon1 = radians(lat1), radians(lon1)
            lat2, lon2 = radians(lat2), radians(lon2)
            dlat, dlon = (lat2 - lat1, lon2 - lon1)
            a = sin(dlat/2.0)**2 + cos(lat1) * cos(lat2) * sin(dlon/2.0)**2
            great_circle_distance = 2 * asin(min(1,sqrt(a)))
            return earth_radius_miles * great_circle_distance
     
        return haversine(lat1, lon1, lat2, lon2)
    $$ LANGUAGE plpythonu;
    

    Now, let’s use our new UDF to calculate the great-circle distance between a pair of points.

    SELECT haversine(37.160316546736745, -78.75, 39.095962936305476, -121.2890625)
    -- 2293.1324218790523
    

    One very big drawback is that it is incredibly slow (an understatement). The following query computes the function just 100 times, which on my cluster took over 17.21 seconds (jeez!):

    SELECT count(haversine(37.160316546736745, -78.75, 39.095962936305476, lon2 % 360 - 180)) FROM generate_series(1, 100) lon2
    

    Because the speed is so slow, I will investigate another way to achieve this goal with Redshift. Expect updates to this post.

  • How to compute Fibonacci sequence in SQL

    Inspired and simplified from a set of slides on using RDBMS for storing, managing, and querying graphs:

    with recursive fib(i,j) as (
        select 0,1
        union all
        select j, i+j from fib where j<1000
    )
    select i from fib