Getting to know (My)SQL

For a brief summary, see this[1] post.

[1] entry://20111111-104648


Entry: Queries
Date: Fri Nov  4 00:19:08 EDT 2011

select date,ip,req from entry NATURAL JOIN ip NATURAL JOIN req where 
  id_ip = (select (id_ip) from ip 
             where ip = '201.39.162.161');


Entry: SQL cheat sheet
Date: Fri Nov  4 00:35:12 EDT 2011

# apt-get install mysql-server
# mysql

CREATE database <database>;
SELECT * from <table>;
USE <database>;
CREATE TABLE agent (
     agent_id MEDIUMINT NOT NULL AUTO_INCREMENT,
     agent TEXT NOT NULL,
     PRIMARY KEY (agent_id)
);
INSERT INTO agent (agent) values ('me'), ('my dog');
SELECT COUNT(*) FROM <table>;


Entry: Regular expressions
Date: Mon Nov  7 20:28:18 EST 2011


 select * from ref where ref regexp 'ycombinator';
 select count(*) from ref where ref regexp 'pic18';

[1] http://www.go4expert.com/forums/showthread.php?t=2337


Entry: Histograms
Date: Mon Nov  7 22:38:51 EST 2011

Simple histograms need a "truncate" function like year, month, day
below.  Then the summary is a GROUP BY, optionally followed bya an
ORDER BY.

  SELECT year(date) AS year, count(*) AS hits FROM entry
  GROUP BY year(date)
  ORDER BY year;

  SELECT month(date) AS month, count(*) AS hits FROM entry
  GROUP BY month(date)
  ORDER BY month;

  SELECT day(date) AS day, count(*) AS hits FROM entry
  GROUP BY day(date)
  ORDER BY day;


[1] http://stackoverflow.com/questions/830814/sql-server-non-standard-date-based-histogram
[2] http://stackoverflow.com/questions/6201992/optimal-way-to-create-a-histogram-frequency-distribution-in-oracle


Entry: Known bots
Date: Mon Nov  7 22:55:19 EST 2011

[1] http://www.user-agents.org/


Entry: The SQL notes
Date: Fri Nov 11 10:46:48 EST 2011

I'm starting these notes as a SQL/DB beginner. 

I was never interested in databases or SQL until I read a chapter in
CTM[1] a couple of years ago.  There the relational model of
programming is developed in a way that is closer linked to declarative
& logic programming, a connection I never saw before but which caught
my attention, being interested in language design, interpreters,
compilers, ...

Recently I re-read that chapter in [1] when a practical need has come
up to write reports for a couple of years of Apache log files.  At
first, I attempted to write an ad-hoc database query engine in Scheme
(Racket) but soon realized that what I wanted to do was perfectly
possible in SQL.  Hence the Scheme part is limited to a parser that
translates a collection of log files to an .sql file that can be fed
into MySQL to produce a DB schema.

From there on the idea is to get to know SQL, to see what's possible
and what's reasonably (fast) and possibly move back to Racket bindings
for the reporting layer.

[1] http://www.info.ucl.ac.be/~pvr/book.html
[2] entry://../plt


Entry: Regexp and apache logs
Date: Sat Nov 12 10:06:09 EST 2011

I put all my logs together so I can share some information across
vhosts, such as bots.  Trouble is that I don't have vhost infor
recorded for each entry which might complicate things a bit.

Anyway, it seems that because requests need to be fully qualified,
there is no real problem here.  What might be the case is that I need
some precomputed request sets that cache the regular expression
matches for certain requests into separate tables or views.

Hmm.. actually it's not.  For zwizwa.be I can see:

 255209 | GET /darcs/darcs/zwizwa.be/ HTTP/1.0  


Entry: Views
Date: Sat Nov 12 10:28:35 EST 2011

DROP VIEW v_entry;
CREATE VIEW v_entry AS select * from 
       entry NATURAL JOIN
       vhost NATURAL JOIN
       ip    NATURAL JOIN
       stat  NATURAL JOIN
       req   NATURAL JOIN
       ref   NATURAL JOIN
       agent;

DROP VIEW v_blog;       
CREATE VIEW v_blog AS
       select * from req where req regexp 'GET (.*)/blog/\\?p=';


Entry: Getting at wordpress blog posts
Date: Sat Nov 12 11:20:38 EST 2011

This view is a subset of the req table which refer to blog posts.

DROP VIEW v_blog; 
CREATE VIEW v_blog AS
       select * from req where req regexp 'GET (http://.*)*/blog/\\?p=';

How to create a table from that that identifies per blog post?


Entry: Dumping SQL inside mysql binary
Date: Thu Feb 23 11:00:07 EST 2012

I'd like to do something that mysqldump does, but from inside the
mysql binary.  I have access to a remote (huge) DB table but I'd like
to work on part of the table locally, downloading the data only once.

Looking around online, the simplest seems to be a tab-separated value
file (TSV).

#!/bin/bash
cat <<EOF | mysql --host $HOST --port $PORT --batch -u$USER -p$PASS >out.tsv
SELECT * FROM $TABLE where ...;
EOF


Entry: Expanding text field into table using regular expressions
Date: Thu Feb 23 14:48:40 EST 2012

I have a table with a text field that has data in it that I want to
extract into table.  I.e. suppose the text field has 'AA:BB:CC' then I
want to separate this out into 3 fields 'AA','BB',CC'.

The way to do this is something like:

   SELECT fn(column) where ...;

where the fn is a function that parses the string.


Entry: Regexps
Date: Thu Feb 23 15:39:38 EST 2012

Seems that regex_replace is quite a hassle in mysql, so let's just do
it externally using sed.

echo "Debug : Sender device 48859 - '357005032445536 ' - id 129CE8 - flags 0002 - data 85510100"

echo "foo1" | sed -e 's/foo\(\.*\)/\1/g'


Entry: Representatives
Date: Thu Feb 23 23:55:52 EST 2012

I have a table of (x,y,z) where I'm mostly interested in unique (y,z)
combinations.  I know how to do unique single columns (group by) but
does this also work with multiple columns?


Entry: Query DB in bash
Date: Tue Feb 28 15:06:29 EST 2012

What's the simplest way to map a bash function over a table?  It turns
out to be simplest to just put the function in a script and use xargs.
However this makes maintaining state a bit hard.

Bottom line: use a real programming language, or implement the
procedure in straight SQL.


Entry: Some things I learned these days
Date: Wed Feb 29 09:40:07 EST 2012

Playing with MySQL logfiles for work.  Note that I'm an amateur at
this and some approaches might be convoluted.  Basic idea: in bash
it's simplest to work with TSV (tab separated value) files, iterate
over them with a "while read RECORD; do ... ; done" and use the
sed/awk/xargs duct tape tools.

* Adding data to the database.

  Basically, just generate SQL from any record stream and pipe it into
  mysql command line client.

  Also useful for custom dumps (write a script that does TSV -> SQL
  conversion using "while read RECORD; do ... ; done".

* Mirroring a database + regexp conversion.

  I wanted a sandbox to create more tables so didn't work in the
  read-only production DB.  Copying databases can usually be done
  using "mysqldump".  However I did not find a way to do a selective
  dump, so I use the TSV->SQL approach.

* Put individual SQL queries in a .sql files and load them in the
  mysql command line client using the "source" command.

* Use views to combine different tables.  I.e. I could find 3
  different kinds of data:

   - log files

   - log file annotation from constant data (i.e. ELF symbol tables)

   - manual annotation

  These are fundamentally different.  The first is the original raw
  data, second is data derved from original raw data in a different
  form (ELF binaries in this case).  

  The latter is data updated during the manual log processing, which
  is dynamic in nature and reflects the state of the current analysis.


Entry: Database cursors
Date: Wed Feb 29 10:04:32 EST 2012

So I've done queries 101 (data filtering), next is updates (for
loops / database cursors).

I want to run an UPDATE procedure for all the results in a SELECT.

Looks like this is possible within the UPDATE clause itself:

  UPDATE table
  SET column = value, ...
  WHERE ...

Though that does an iteration over the table itself (the iterated
table is the same as the updated one).  That's not what I want.  I
want to iterate over table1 and insert/update values in table2.

  for rec in table1; do
     if predicate(rec); then
         replace(table2, field, "value")
     end
  end


Entry: Apache logs in database
Date: Fri Mar  2 20:25:46 EST 2012

After de-noobing a bit at work this week, I'm thikning that there
might be some simpler ways to do the key indirection I'm using a
Racket script for.

Basic problem: the logs are one big table, but the entries in the
table have a lot of duplicates.  Find a way to do automatic key
generation for the entries: if an entry doesn't exist yet, make up a
new key, otherwise reuse old.

The problem is that while (I think) I have some grasp about the
functional part of SQL (queries), I have no clue about the procedural
ones (updates).

The first thing is probably to find out how procedures work.


Wait..

This can be done functionally by creating a view.  ( As Gerard says:
Can't you create a view? )

1.  Just add the entry.
2.  Insert if not exists each column.
3.  Create a view that joins the hashed/indexed entries

( Which is the *inverse* of the operation I'm using to build the
indirect table in the first place! )

Then of course, this can be done simpler.

1.  INSERT IF NOT EXISTS each entry into its respective table.
2.  SELECT to find id
3.  insert main entry


Entry: MySQL procedures
Date: Fri Mar  2 20:48:09 EST 2012

DELIMITER //GetAllProducts()
CREATE PROCEDURE GetAllProducts()
  BEGIN
  SELECT *  FROM products;
  END //
DELIMITER ;

From [1], the delimiter is necessary to be able to store multiple
statements in a procedure.  (weird!).

So simple sequencing is simple.  Next: variable binding.

First thing that comes to mind: what's the difference between

procedures and functions?
DELIMITER //
CREATE PROCEDURE addsome()
  BEGIN
  insert if not exists SELECT *  FROM products;
  END //
DELIMITER ;


[1] http://www.mysqltutorial.org/getting-started-with-mysql-stored-procedures.aspx
[2] http://dev.mysql.com/doc/refman/5.0/en/create-procedure.html


Entry: Playing with INSERT
Date: Sat Mar  3 09:15:21 EST 2012

   DROP TABLE if exists foo;
   CREATE TABLE foo(
          id   MEDIUMINT NOT NULL AUTO_INCREMENT,
          val  TEXT,
          PRIMARY KEY (id)
          );

   INSERT IGNORE foo (val) VALUES ("lalala");


From [1]: The logic is that the SELECT statement generates a single
row of data with the required values, but only when the values are not
already found.  That wasn't immediately useful.  Dicking around a bit
I come to this:

   INSERT INTO foo(val)
     (SELECT (val) from foo 
      WHERE NOT EXISTS (SELECT * FROM foo WHERE val = "lalala"));

   INSERT INTO foo(val)
     (SELECT "lalala" from DUAL
      WHERE NOT EXISTS (SELECT * FROM foo WHERE val = "lalala"));

Turning this into a procedure doesn't work..  I asked Yvonne, she
probably knows.


[1] http://stackoverflow.com/questions/913841/mysql-conditional-insert


Entry: Understanding SQL
Date: Sat Mar  3 09:39:33 EST 2012

I still don't quite get it..  Especially not the single value vs table
stuff.  Representing tables as lists, in Haskell syntax:

        f :  (a,b,c)   -> (d,e)
        g :  [(a,b,c)] -> [(e,e)]
        h :  [(a,b,c)] -> (d,e)

Here g = map f, where this mapping seems to happen automatically in
SQL depending on context?  And h is a projection from a table to a
single record, something like COUNT or EXISTS.  COUNT is called an
aggregate function.

First find of the day: SELECT without FROM returns a 1 element table:

  mysql> SELECT 123;
  +-----+
  | 123 |
  +-----+
  | 123 |
  +-----+
  1 row in set (0.00 sec)

which is enough to pass a value to a function that expects a table.
This is useful for testing out predicates, i.e.:

  SELECT ("lalala" IN (select (val) from foo));
  SELECT (EXISTS (select (val) from foo where val = "lalala"));

"SELECT ... FROM <ref>" requires a table reference.  It's possible to
abstract queries as tables (views).  How to add a query directly?  In
SqlServer this requires an alias[1].  Also works in Mysql:

  SELECT * from (SELECT * from foo) alias1;

How to save the result of a query as a table?  This is SELECT
... INTO, but that's not standard.  Mysql supports INSERT INTO[2]
which requires an existing table.


[1] http://bytes.com/topic/sql-server/answers/142990-select-subquery
[2] http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-select-into-table.html


Entry: MySQL regexp replace or just regexp match
Date: Sat Mar  3 13:14:27 EST 2012

It seems that I need access to regexp replace, but on second thought
there isn't really any "real" datamining going on (no processing on
the data embedded *inside* a REFERRER or a REQUEST field: the embedded
data just serves as input to predicate functions used for selecting
and grouping.  This seems like a very important point.

It seems plausible to think that this is the reason why MySQL has
matching but not replacing: don't store what can be computed.

Some useful views:

 - google searches
 - specific blog page hits
 - 


Though... I have some trouble with things like this:

GET /blog/?p=1089&cpage=1 HTTP/1.0
GET /blog/?p=1089 HTTP/1.0
GET /blog/?p=1089 HTTP/1.1

How to project them down to one representative?

Maybe it's a good idea to parse the REQ and REF fields a bit more.

Looks like I do need a regexp/replace function, but probably can do
without saving tables (i.e. keeping just views).


Entry: Regexp-replace
Date: Sat Mar  3 16:19:58 EST 2012

sudo apt-get install libpcre3-dev libmysqlclient-dev

Get library source[3], untar and
./configure && make && sudo make install

Libraries have been installed in:
   /usr/lib/mysql/plugin


[1] http://www.mysqludf.org/lib_mysqludf_preg/
[2] http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql
[3] http://www.mysqludf.org/lib_mysqludf_preg/lib_mysqludf_preg-1.0.1.tar.gz


Entry: External processing
Date: Sat Mar  3 16:29:34 EST 2012

Considering the size of the tables, it might make sense to just split
them externally using regexps.  Or even to do this in the
preprocessing step directly.


mysql> select count(*) from req;
+----------+
| count(*) |
+----------+
|   354134 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from ref;
+----------+
| count(*) |
+----------+
|    42338 |
+----------+
1 row in set (0.00 sec)

Anyways, I installed the PCRE wrapper for mysql and I find that the
next step is to weed out the pletora of hack attempts ;)

I found there are about 30 ways to request the same page, most of them
look like wordpress breakin attempts.


Entry: Nested queries
Date: Sat Mar  3 20:20:25 EST 2012

So.. With the regular expression matcher it's possible to create
partitions in the set of requests (i.e. projecting onto blog number).

  (remember from math class: a projection defines a partition!)

So what I want to do now is to map each request to a blog number, and
count the number of occurrences.

  select * from entry 
                natural join req_blog           # project request to id_blog
                group by id_blog
                order by id_blog limit 10;

Queries start to take a little longer.


Entry: Caching predicates / membership
Date: Sat Mar  3 22:20:22 EST 2012

Products (subqueries, joins) are expensive.  However in many cases the
subquery is an "IN" constraint.  Isn't it simpler to cache this
membership function into an explicit (t, bool) table and join with
that?


Entry: Mysql clone database
Date: Thu Jul  5 12:19:07 EDT 2012

Problem: production database is too large for efficient queries.

Solution: copy schema to local database + transfer part of the data.
I.e. create a local cache.


Entry: Emacs MySQL
Date: Thu Jul  5 12:24:54 EDT 2012

(require 'sql)
(setq sql-mysql-program "/home/tom/dbb_trailer/doc/mysql/mysql.sh")
(sql-mysql)


Entry: Joins
Date: Sat Jul  7 10:14:10 EDT 2012

Inner joins:  filtered carthesian product
------------------------------------------
INNER JOIN:   subset of carthesian product that matches predicate
JOIN:         like INNER JOIN but only equality predicates (equi join)
NATURAL JOIN: (equi) JOIN with automatic column name matching
CROSS JOIN:   unrestricted carthesian product


Outer joins:  filtered product + all non-matching rows
------------------------------------------------------
LEFT  OUTER JOIN
RIGHT OUTER JOIN


[1] http://en.wikipedia.org/wiki/Join_(SQL)


Entry: SQL in OpenOffice Calc
Date: Sun Mar 17 12:14:08 EDT 2013

sudo apt-get install libmysql-java

Tools -> Java -> Class Path -> add: /usr/share/java/mysql-connector-java.jar


[1] http://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=35708
[2] http://diwt.wordpress.com/2011/08/03/connect-to-mysql-database-from-libreoffice-base/

Entry: Function based on table
Date: Sat Mar 30 11:29:32 EDT 2013

Given a currency conversion table, I want to make a function:

  (date, currencty) -> rate 
or

  (date, currency, amount) -> amount

where e.g. the conversion table contains monthly data, so the input
date can be rounded from that.

After some struggle with SQL syntax, this is where I got at.  The main
remark is that to convert a set to a value, use "SELECT INTO".  It
will generate a warning when the set is not a singleton.

DROP FUNCTION IF EXISTS cconv;
DELIMITER |
CREATE FUNCTION cconv(date_in DATE)
RETURNS DECIMAL(10,6)
BEGIN
        DECLARE rv DECIMAL(10,6) default 1;
        SELECT rate INTO rv FROM exchange
        WHERE   year(date)=year(date_in) and
                month(date)=month(date_in) and
                den="EUR";
        RETURN rv;
END
|
DELIMITER ;


[1] http://stackoverflow.com/questions/8073404/function-complain-about-not-allowed-to-return-a-result-set-from-a-function-whe


Entry: Simplest way to print/pdf a table?
Date: Sun Mar 31 20:41:39 EDT 2013

Output to CSV[1].
Use --table [2] for tabular format.


[1] http://stackoverflow.com/questions/356578/how-to-output-mysql-query-results-in-csv-format
[2] http://dev.mysql.com/doc/refman/4.1/en/mysql-command-options.html


Entry: Mysql create database
Date: Sun Apr  7 11:03:41 EDT 2013


mysql> CREATE DATABASE databasename;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON databasename.* TO "wordpressusername"@"hostname"
    -> IDENTIFIED BY "password";
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.01 sec)


Entry: Mysql data import
Date: Sun Apr  7 15:06:57 EDT 2013

[1] http://dev.mysql.com/doc/refman/5.0/en/load-data.html


Entry: Convert arbitrary CSV file to MySQL table
Date: Tue Aug  6 15:47:28 EDT 2013

# start mysql as root to create DB.
mysql> CREATE DATABASE tp;
mysql> GRANT ALL PRIVILEGES ON tp.* TO "tom"@"localhost" IDENTIFIED BY "password";
mysql> FLUSH PRIVILEGES;


# start mysql as user
mysql --local-infile -utom -ppassword tp


# Create an SQL file like this:
CREATE TABLE tp (
       brand   TEXT NOT NULL,
       model   TEXT NOT NULL,
       tac     TEXT NOT NULL,
       info    TEXT NOT NULL,
       profile TEXT NOT NULL,
       date    TEXT NOT NULL,
);
LOAD DATA LOCAL INFILE '/home/tom/beep/SIM/tools/tp_db.csv' INTO TABLE tp FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
# And source it


Entry: table as function
Date: Sat Sep 21 16:18:20 EDT 2013

Is there a quick way to abstract a table with a functional dependency
into a function?

I guess a natural join can do most of what i'm looking for, though it
would be nice to find that out..

Ha, I did this before:

DROP FUNCTION IF EXISTS cconv;
DELIMITER |
CREATE FUNCTION cconv(date_in DATE, den_in VARCHAR(3))
RETURNS DECIMAL(10,6)
DETERMINISTIC
BEGIN
        DECLARE rv DECIMAL(10,6) default 1;
        IF NOT den_in = 'USD' THEN
           SELECT rate INTO rv FROM exchange
           WHERE   year(date)=year(date_in) and
                   month(date)=month(date_in) and
                   den=den_in;
        END IF;      
        RETURN rv;
END
|
DELIMITER ;


For the apachelogs id<->text maps it is:

DROP FUNCTION IF EXISTS ip;
DELIMITER |
CREATE FUNCTION ip(id MEDIUMINT)
RETURNS TEXT
DETERMINISTIC
BEGIN
        DECLARE rv TEXT default '';
        SELECT ip INTO rv FROM ip WHERE ip.id_ip = id;
        RETURN rv;
END
|
DELIMITER ;


Entry: Fresh mysql install
Date: Mon Dec 23 10:32:42 EST 2013

Some questions that keep coming back:

- Where are users stored?
- How to create a database associated to a user

See Apr 7 2013 post.


Entry: Mysql root password
Date: Mon Dec 23 10:37:41 EST 2013

I have a debian mysq install but forgot the root password.

- how to fix, see [1].

- why is it important to have separate root pw auth?
In [1] it is mentioned this is insecure.  Why?


So I don't understand why it is insecure, but let's just go with the
flow here and follow directions.  As a matter of convention, just
create a root-readable/executable file /root/mysql containing the
password.

In [1] it says to use these commands in the init file specified as
mysqld_safe --init-file=....  This did not work for me.

UPDATE mysql.user SET Password=PASSWORD('MyNewPass') WHERE User='root';
FLUSH PRIVILEGES;

Using this instead:
mysqld --skip-grant-tables


[1] http://dev.mysql.com/doc/refman/5.0/en/resetting-permissions.html


Entry: Binning
Date: Tue Dec 24 17:44:20 EST 2013

Basic idea: create a table with a binning relation (recur) and fish
out the NULL references from a left outer join like this:

select flow.txinfo,txclass from
   (flow left outer join recur
    on    flow.txinfo = recur.txinfo)
   group by txinfo
   order by txclass ;


Entry: Naming intermediates
Date: Thu Dec 26 21:23:57 EST 2013

Is there a logic to SQL syntax?


Entry: Splitting regexp -> view?
Date: Fri Dec 27 13:14:23 EST 2013

Many real world tables contain "fields within fields" that make more
sense when they are separated out.  Otherwise queries involve many
regular expressions.

What is bettter: to leave the somewhat inconsistent relations burried
inside the fields, or to separate them out into separate fields.


Maybe the real question is: is it possible to JOIN on a regular
expression relation?

And if the answer is yes, is it too inefficient?

I see no reason why it would not be possible.  Joining should work on
any predicate.  If so then it might not be necessary to do a lot of
splitting (= premature optimization?).

Is it possible to store regexes in tables?


Entry: Annotation joins
Date: Mon Mar 31 08:59:36 EDT 2014

- JOIN:              only matches
- LEFT OUTER JOIN:   matches + non-matches, keep left

if the join is performed on a 1-1/0 map, the number of rows of the
result is the same as the left table.


Entry: Editable views
Date: Mon Mar 31 09:20:59 EDT 2014

Mysql supports editable views.  Does this work together with mysql
workbench?

Goal is to create a mapping table.  Create a 2-column table and fill
the primary keys from another table.  The simplest way to do this
seems to be to just copy a table?


INSERT admin.balance SELECT id,NULL FROM admin.flow;


Then create an editable view:

select expense_id, trans_id from admin.flow left join admin.balance on admin.flow.id = admin.balance.trans_id;


Hmmm... Mysql workbench doesn't let me edit the view.

I think the editable (edible?) views idea is a good one, so how to
bring it to completion?

What I really want is:
- flow table + extra field
- drop box that gives a selection of possible matches


Entry: Evidence based approach
Date: Mon Mar 31 09:58:37 EDT 2014

Start with the ebay expenses table, then iteratively find evidence in
the flow table.  Evidence is is the minimal match of amount + date
proximity.

Hmm... going crazy again.


The simplest approach seems to be to add a human-readable annotation
to each transaction id.  Something that is enough to cross-ref with
other tables.  Don't bother with id?


So let's just use temporary tables.
Create a copy of a usable view and edit it.  Then copy into mapping table.


Entry: 2013 taxes - what did we learn?
Date: Tue Apr  1 21:59:20 EDT 2014

- mysql workbench is a nice tool for low-level editing

- for most things though, you want a custom editor:
  i made a "row connect" tool using python pyside qt sql tableview

- didn't figure out how to make html table forms using racket.  harder
  than it seems..

- don't edit generated / converted data.  it's still a pain, but much
  better to put manual entries in a separate table and join it into a
  view.

- when making complex projections, also check complements as:
  select * from tab where key not in (select key from tab_proj);

- writing reports is a pain: take some time for it!

- still didn't figure out a good way to print a simple table to a pdf
  document.  main problem seems to be setting the column widths when
  using wrap-around text.  should be easy to compute by allowing a
  "max lines" wraparound.


Entry: Setting up mysql with SSL access
Date: Tue May 20 14:17:11 EDT 2014

I'm doing this from a fresh debian install.
- Make sure to config root user with a password (autogen, long)
- in /etc/mysql/keys:

# ca certificate
openssl genrsa 2048 >ca-key.pem
openssl req -new -x509 -nodes -days 3600 -key ca-key.pem >ca-cert.pem

# server key
openssl req -newkey rsa:2048 -days 3600 -nodes -keyout server-key.pem >server-req.pem
openssl x509 -req -in server-req.pem -days 3600 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 >server-cert.pem

# client key
openssl req -newkey rsa:2048 -days 3600 -nodes -keyout client-key.pem >client-req.pem
openssl x509 -req -in client-req.pem -days 3600 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 >client-cert.pem

chmod 400 *.pem

- in mysql command line:
GRANT ALL PRIVILEGES ON *.* TO 'tom'@'%' IDENTIFIED BY '<pw>' REQUIRE SSL;


# then connect:
mysql -ussluser -pssluser -P3430 --ssl-key=


[1] http://askubuntu.com/questions/194074/enabling-ssl-in-mysql
[2] http://www.chriscalender.com/?p=448
[3] http://stackoverflow.com/questions/5016505/mysql-grant-all-privileges-on-database


Entry: Can a table refer to a table?
Date: Wed May 21 10:13:11 EDT 2014

It seems that in SQL this doesn't make sense, as this would require
polymorphy.  Seems this gets into ORM territory[1].

Basically, my application understands subclasses, but how to I stick
them into the DB?  Essentially I want a table that refers to different
tables (types / classes), each with a set of rows (data/object
instances).

Maybe the main question is: how to stay relational and avoid ORM?

[1] http://en.wikipedia.org/wiki/Object-relational_mapping
[2] http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch


Entry: Editable queries
Date: Wed May 21 12:05:37 EDT 2014

As long as each identifier is unique and keys are present.  See
here[1] around 6min.

[1] https://www.youtube.com/watch?v=X_umYKqKaF0


Entry: Data Models in MySQL Workbench
Date: Wed May 21 14:03:05 EDT 2014

I can think in terms of relations, functions, ...
A collection of tables and views with keys is quite clear to me.
So what does the "Data Model" in workbench add to all this?

[1] http://dev.mysql.com/doc/workbench/en/wb-creating-eer-diagram.html


Entry: Foreign keys
Date: Wed May 21 14:05:46 EDT 2014

[1] http://dev.mysql.com/doc/refman/5.6/en/create-table-foreign-keys.html
[2] http://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_referential_integrity


Entry: Parameterize over tables?
Date: Sat May 24 11:32:14 EDT 2014

Is it possible in MySQL to parameterize something over a table?
I.e. have a table refer to another table in a field, then use that
name to perform a (generic) query in the table?

This is a bit like duck typing inheritance.

Is it possible?

Is it avoidable?


It seems possible using EXEC[1] or EXECUTE[2] in MySQL.  And probably
opens up a can of worms...

[1] http://www.techrepublic.com/forums/questions/doing-an-indirect-sql-select/#.
[2] http://dev.mysql.com/doc/refman/5.1/en/execute.html


Entry: CHAR vs VARCHAR: speed?
Date: Sat May 24 12:28:29 EDT 2014

Rule of thumb seems to be that speed wrt. key lookup/indexing isn't
much of an issue.  Storage impact is much larger.

[1] http://dba.stackexchange.com/questions/2640/what-is-the-performance-impact-of-using-char-vs-varchar-on-a-fixed-size-field


Entry: Editable views
Date: Mon May 26 12:51:35 EDT 2014

See also view algorithms[2].

[1] http://dev.mysql.com/doc/refman/5.0/en/view-updatability.html
[2] http://dev.mysql.com/doc/refman/5.6/en/view-algorithms.html


Entry: Normal forms
Date: Tue May 27 10:18:49 EDT 2014

1NF:

 - all columns contain atomic values

 - values of each attribute contain only a single value

 Meaning: don't aggregate what should be separate and don't duplicate.
 Solve these issues by introducing new tables.

2NF:

 - no non-prime attribute is dependent on any proper subset of any
   candidate key of the table

 Meaning: keys should be minimal, i.e. non-redundant.

3NF:

 - all the attributes in a table are dependent on the primary key and
   only the primary key.

 Meaning: every non-key is a proper function of the primary key.


[1] http://en.wikipedia.org/wiki/First_normal_form
[2] http://en.wikipedia.org/wiki/Second_normal_form
[3] http://en.wikipedia.org/wiki/Third_normal_form


Entry: Indirection through eval
Date: Thu May 29 15:11:16 EDT 2014

So I still want to find a way to deal with type abstraction in a
relational database.  There has to be a clever way...

Problem: I have a bunch of tables that contain "dynamically typed"
where a 'type' column contains a symbolic foreign type key referencing
a table of types, and a value which is just a string.


Entry: Outer product
Date: Wed Jun  4 15:01:13 CDT 2014

How to get to an outer product table from a collection of
(axis,coordinate) pairs?

Didn't find how to do this.  Seems to be a "higher order" union: the
number of union operators depends on the data.  I don't think SQL can
do this..   I solved it in Python for now.


Entry: Embedding nested structures
Date: Thu Jun 12 09:46:36 EDT 2014

I need to embed a nested C structure in a database, i.e. record the
byte offsets.  This seems to be similar to the previous problem of
outer product in that it is higher order: number of nestings depends
on the content.  However in practice in my problem there are only two
levels.

So what is a structure?

An invertible map from names to byte offsets.

I guess the key part here is the invertible nature: it should not be
lost when introducting the nesting operation.

Looks like this is the same trick as I've been using before: abstract
it as functions first.  See next post.


Entry: Encoding structure in databases
Date: Thu Jun 12 10:01:06 EDT 2014

It's hard to encode data structures into a data base once there is
some hierarchy involved.  There's the field of ORM which I don't know
much about and frankly find a bit smelly, and then there's the idea
that database tables are actually just finite functions.

This approach seems to work well:

  1. Encode the structure as a composition of pure functions

  2. Implement functions using tables with primary keys


Entry: Set difference
Date: Wed Sep 10 17:10:15 CEST 2014

Time to build some more intution...

Is set difference an expensive operation?

Look at EXCEPT and INTERSECT.


Entry: growing sqlite3
Date: Fri Sep 12 18:56:37 CEST 2014

Changing the structure of a DB without rebuilding is new to me!
Some notes.  I really start to like SQL, but the syntax is hard to remember.

  DELETE FROM storage where host = 'zoo';


sqlite has a limited form of 'ALTER TABLE', however adding a column is
and setting a default supported:

  ALTER TABLE storage ADD mtime INTEGER;
  UPDATE storage SET mtime = 0;

I'm not sure if it's possible to change the key structure.


Entry: Partitions
Date: Fri Sep 12 19:02:14 CEST 2014

select mtime>0,count(*) from storage group by mtime>0;


Entry: Database workstation
Date: Thu Sep 25 11:24:23 CEST 2014

Starting to like using sqlite to quickly whip up a database to do
certain simple queries.  One problem is microcontroller pin assignment
tracking.  Put all data in *.csv files, make one schema.sql file that has:

CREATE TABLE conn { ... }
.separator ","
.import conn.csv conn

Then create some reports:

CREATE VIEW ....


And run it as:
sqlite3 --init schema.sql :memory:


No .sqlite file required.


Entry: Taxes
Date: Sat Apr  4 12:01:55 EDT 2015

Two operations keep coming back:
- Data conversion for expense records (from anything to table)
- Classification into tax categories

For classification I was thinking about building a gui.  I did make
something last year to match up records.  But really, it seems easier
to use emacs or a spreadsheet to edit the csv files.

The problem is that these are one-time things: it is actually more
productive to do them mostly manually, and automate or facilitate only
the repetitive parts.


Entry: Path of least resistance: SQL + React
Date: Sat Sep  5 19:05:15 CEST 2015

Might be able to do this through Erlang.  Essentially what I need is
display of tables + annotation -> commit to different table.


Entry: Low level database edits: the problem
Date: Tue Sep  8 15:59:34 CEST 2015

I know what I want: editable views/queries.
Basically, this means that the presentation is separated from the view.
Most of the edits I do are either:
  - adding an annotaiton column (as a separate table to main table + "info join")
  - changing a column

Solutions:
- Present an ODB interface to libreoffice
- Web interface on top of react
- Through export/inmport of CSV + libreoffice edits
- Emacs table editing?

  
Entry: finite functions
Date: Mon Mar 28 12:47:38 EDT 2016

Since functional modeling (keys) is such a big deal in databases, why
is there no syntax to turn a table into a function?

E.g.  select  tab1(x) from tab2

Where the function tab1 corresponds to:

x -> select val from tab1 where key=x;


Maybe because it is hard to optimize?  You likely don't want to write
the function such that it does a single sub-select for each value.
Likely you want it as a join, so the query engine can do its magic.  

What would be nice maybe is to have some syntactic sugar to turn
functional references like that into joins...  Or to just learn to
live with the limitation?  It is nice as an abstraction anyway.


Entry: Normalization
Date: Mon Mar 28 12:56:20 EDT 2016

1NF:

- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key

2NF:

- 1NF + every non-prime attribute of the table is dependent on the
  whole of every candidate key.

3NF:

- 2NF + all the attributes in a table are determined only by the
  candidate keys of that table and not by any non-prime attributes.

BCNF: (similar to 3NF)

- redundancy based on functional dependency has been removed, although
  other types of redundancy may still exist.

4NF:

- BCNF + eliminate multivalued dependency

5NF:

- every non-trivial join dependency in it is implied by the candidate
  keys.

6NF:

- no nontrivial join dependencies at all, or alternatively:
- 6NF + every JD [Join Dependency] of R is trivial

DKNF:

- database contains no constraints other than domain constraints and
  key constraints.


https://en.wikipedia.org/wiki/Database_normalization
https://en.wikipedia.org/wiki/First_normal_form
https://en.wikipedia.org/wiki/Second_normal_form
https://en.wikipedia.org/wiki/Third_normal_form
https://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form
https://en.wikipedia.org/wiki/Fourth_normal_form
https://en.wikipedia.org/wiki/Fifth_normal_form
https://en.wikipedia.org/wiki/Sixth_normal_form
https://en.wikipedia.org/wiki/Domain-key_normal_form


Entry: sqlite creates table when importing a csv file, using first line as table names
Date: Wed Mar 30 11:20:50 EDT 2016

cool feature for quick&dirty ram-based scripting work, but confusing!
best to create schema separately.


Entry: index
Date: Thu Aug 10 00:19:22 EDT 2017

I think i just now understand how an index works: create an ordered
list that can be binary-searched, mapping to some other thing that can
be binary-searched in the original table.

https://www.youtube.com/watch?v=gpxnbly9bz4


Entry: SQLite virtual tables
Date: Wed Feb 28 11:27:39 CET 2018

https://sqlite.org/vtab.html

Maybe it is possible to flatten data structures?  E.g. using a
built-in function with multiple returns?

https://sqlite.org/c3ref/create_function.html


Example:
http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html

void msignFunc(sqlite3_context *context, int argc, sqlite3_value **argv){
  assert( argc==1 );
  switch( sqlite3_value_type(argv[0]) ){
    case SQLITE_INTEGER: {
      long long int iVal = sqlite3_value_int64(argv[0]);
      iVal = ( iVal > 0) ? 1: ( iVal < 0 ) ? -1: 0;
      sqlite3_result_int64(context, iVal);
      break;
    }
    case SQLITE_NULL: {
      sqlite3_result_null(context);
      break;
    }
    default: {
      double rVal = sqlite3_value_double(argv[0]);
      rVal = ( rVal > 0) ? 1: ( rVal < 0 ) ? -1: 0;
      sqlite3_result_double(context, rVal);
      break;
    }
  }
}

see sqlite3_result_* functions
https://sqlite.org/c3ref/result_blob.html


Use case: storing erlang term data.  To then expose internal structure
would require the generic fold over an eterm to fish out a particular
atom.


Entry: Categorizer
Date: Sat Mar 24 08:22:51 EDT 2018

I've been looking for a while to create "editable views" in Emacs.
Basically, this would create a 1-1 map between a table and an
alternative textual representation that can be edited.

One example of this would be partitioning:

The raw table:

item1,cata
item2,catb
item3,cata

Would be represented in human-readable form as:

-- cata
item1
item2
-- catb
item3

Navigating into that representation and performing edits would update
the table and re-generate the representation.

I've found that making this file-backed is convenient: this way Emacs
can be instructed to revert a buffer to a file, while other tools have
access to the file through the filesystem.

Alternatively, to do this in a buffer directly would likely mean to
create "edit differences", and require more code to implement.

The dumb solution is to write import and print scripts.
I.e. "bidirectional programming"
https://www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf
http://www.seas.upenn.edu/~harmony/


The emacs automation is then:
- given buffer's file location, find directory
- run make, passing in the notification continuation
- in continuation, revert buffer


Turn this into a generic Emacs library?


OK, works well.


Entry: Editable views
Date: Sun Apr 15 13:22:51 EDT 2018

I'd like to explore this "lenses" idea a bit more.  I like SQL for
storage -- essentially, finite functions, but in some cases it makes
sense to temporarily impose some more structure.

To do this in side a file in a certain syntactic format makes sense,
as it allows using a normal code editor to perform the edit.

I'd like to explore this for my hours DB.

What to learn?

- Kmett's lenses

- A nice pretty-printer.  Maybe stick to Racket?

- erl_tools/src/diff.erl


Entry: SQL and functional deps
Date: Thu Feb 28 22:47:48 EST 2019

a "left join" on a key in the right table is essentially a function
evaluation, where NULL is returned if there is no image.


Entry: Storing currency amounts
Date: Mon Mar 11 07:59:38 EDT 2019

Store it as integer multiple of cents.  Do this as part of the import
parser.


Entry: sqlite3 and decimal
Date: Sun Mar 17 10:12:22 EDT 2019

Probably best to use cents and integer, and apply it at import.
Problem is that migrating this is error prone, so should it be done?

Alternative is to stick to decimal.

sqlite3 supports decimal and gives it NUMERIC affinity.

https://www.sqlite.org/datatype3.html


Entry: sqlite3 and functions
Date: Wed Apr  3 09:13:40 EDT 2019

Basically, it is too limited at the single function level.  But it is
extensible, so make an extension.

It is also very different wrt. data types.  Nothing that can't be
fixed, but it is a pain to redo things.


Entry: A user interface for creating projections
Date: Wed Apr  3 10:54:12 EDT 2019

This is the main obstacle.  I still don't have a good solution for it.
Better next time.


Entry: sqlite procedures
Date: Wed Apr  3 11:22:17 EDT 2019

It doesn't seem to be able to do that.  Maybe the idea is to leave any
procedural work to the application?  Sqlite is supposed to be
embedded.


Entry: sqlite3 subqueries
Date: Wed Apr  3 18:45:51 EDT 2019

These are supported.  Apparently I just got the syntax wrong.
https://www.sqlite.org/syntax/select-stmt.html
https://www.sqlite.org/syntax/table-or-subquery.html