Getting to know (My)SQL

For a brief summary, see this[1] post.

[1] entry://20111111-104648

Entry: Queries
Date: Fri Nov 4 00:19:08 EDT 2011

select date,ip,req from entry
NATURAL JOIN ip NATURAL JOIN req
where id_ip = (select id_ip from ip where ip = '201.39.162.161');

Entry: SQL cheat sheet
Date: Fri Nov 4 00:35:12 EDT 2011

# apt-get install mysql-server
# mysql

CREATE database <name>;
SELECT * from <table>;
USE <name>;

CREATE TABLE agent (
  agent_id MEDIUMINT NOT NULL AUTO_INCREMENT,
  agent TEXT NOT NULL,
  PRIMARY KEY (agent_id)
);

INSERT INTO agent (agent) values ('me'), ('my dog');

SELECT COUNT(*) FROM <table>;
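With the placeholders above filled in, a complete throwaway session looks like this (a minimal sketch assuming a scratch database called test; all names are just for illustration):

  CREATE DATABASE test;
  USE test;

  CREATE TABLE agent (
    agent_id MEDIUMINT NOT NULL AUTO_INCREMENT,
    agent TEXT NOT NULL,
    PRIMARY KEY (agent_id)
  );
  INSERT INTO agent (agent) values ('me'), ('my dog');

  SHOW TABLES;                  -- list tables in the current database
  DESCRIBE agent;               -- column definitions
  SELECT * from agent;
  SELECT COUNT(*) FROM agent;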
Entry: Regular expressions
Date: Mon Nov 7 20:28:18 EST 2011

select * from ref where ref regexp 'ycombinator';
select count(*) from ref where ref regexp 'pic18';

[1] http://www.go4expert.com/forums/showthread.php?t=2337

Entry: Histograms
Date: Mon Nov 7 22:38:51 EST 2011

Simple histograms need a "truncate" function like year, month, day below.  Then the summary is a GROUP BY, optionally followed by an ORDER BY.

SELECT year(date) AS year, count(*) AS hits FROM entry GROUP BY year(date) ORDER BY year;

SELECT month(date) AS month, count(*) AS hits FROM entry GROUP BY month(date) ORDER BY month;

SELECT day(date) AS day, count(*) AS hits FROM entry GROUP BY day(date) ORDER BY day;

[1] http://stackoverflow.com/questions/830814/sql-server-non-standard-date-based-histogram
[2] http://stackoverflow.com/questions/6201992/optimal-way-to-create-a-histogram-frequency-distribution-in-oracle

Entry: Known bots
Date: Mon Nov 7 22:55:19 EST 2011

[1] http://www.user-agents.org/

Entry: The SQL notes
Date: Fri Nov 11 10:46:48 EST 2011

I'm starting these notes as a SQL/DB beginner.  I was never interested in databases or SQL until I read a chapter in CTM[1] a couple of years ago.  There the relational model of programming is developed in a way that is more closely linked to declarative & logic programming, a connection I never saw before but which caught my attention, being interested in language design, interpreters, compilers, ...

Recently I re-read that chapter in [1] when a practical need came up to write reports for a couple of years of Apache log files.  At first I attempted to write an ad-hoc database query engine in Scheme (Racket), but soon realized that what I wanted to do was perfectly possible in SQL.  Hence the Scheme part is limited to a parser that translates a collection of log files to an .sql file that can be fed into MySQL to produce a DB schema.

From there on the idea is to get to know SQL, to see what is possible and what is reasonably fast, and possibly move back to Racket bindings for the reporting layer.

[1] http://www.info.ucl.ac.be/~pvr/book.html
[2] entry://../plt

Entry: Regexp and apache logs
Date: Sat Nov 12 10:06:09 EST 2011

I put all my logs together so I can share some information across vhosts, such as bots.  Trouble is that I don't have vhost info recorded for each entry, which might complicate things a bit.  Anyway, it seems that because requests need to be fully qualified, there is no real problem here.  What might be the case is that I need some precomputed request sets that cache the regular expression matches for certain requests into separate tables or views.

Hmm.. actually it's not.  For zwizwa.be I can see:

255209 | GET /darcs/darcs/zwizwa.be/ HTTP/1.0

Entry: Views
Date: Sat Nov 12 10:28:35 EST 2011

DROP VIEW v_entry;
CREATE VIEW v_entry AS
select * from entry
NATURAL JOIN vhost NATURAL JOIN ip NATURAL JOIN stat
NATURAL JOIN req NATURAL JOIN ref NATURAL JOIN agent;

DROP VIEW v_blog;
CREATE VIEW v_blog AS
select * from req where req regexp 'GET (.*)/blog/\\?p=';

Entry: Getting at wordpress blog posts
Date: Sat Nov 12 11:20:38 EST 2011

This view is the subset of the req table that refers to blog posts.

DROP VIEW v_blog;
CREATE VIEW v_blog AS
select * from req where req regexp 'GET (http://.*)*/blog/\\?p=';

How to create a table from that which identifies individual blog posts?
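MySQL's REGEXP only tests for a match, it does not capture, so pulling the post number out of the request needs string functions (or external processing, as in later entries).  A sketch using SUBSTRING_INDEX, assuming requests look like "GET /blog/?p=1089&cpage=1 HTTP/1.0"; the view name v_blog_post and the id_req column are assumptions:

  CREATE VIEW v_blog_post AS
  select id_req, req,
         SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(req, '?p=', -1), '&', 1), ' ', 1) AS post
  from req
  where req regexp '/blog/\\?p=[0-9]+';

Grouping that view by post then gives hit counts per blog post.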
Entry: Dumping SQL inside mysql binary
Date: Thu Feb 23 11:00:07 EST 2012

I'd like to do something that mysqldump does, but from inside the mysql binary.  I have access to a remote (huge) DB table but I'd like to work on part of the table locally, downloading the data only once.  Looking around online, the simplest seems to be a tab-separated value file (TSV):

#!/bin/bash
cat <<EOF | mysql >out.tsv
SELECT * FROM $TABLE where ...;
EOF

Entry: Expanding text field into table using regular expressions
Date: Thu Feb 23 14:48:40 EST 2012

I have a table with a text field containing data that I want to extract into separate columns.  I.e. suppose the text field has 'AA:BB:CC', then I want to separate this out into 3 fields 'AA', 'BB', 'CC'.  The way to do this is something like:

SELECT fn(column) FROM ... WHERE ...;

where fn is a function that parses the string.

Entry: Regexps
Date: Thu Feb 23 15:39:38 EST 2012

Seems that regexp replace is quite a hassle in mysql, so let's just do it externally using sed:

echo "Debug : Sender device 48859 - '357005032445536 ' - id 129CE8 - flags 0002 - data 85510100"
echo "foo1" | sed -e 's/foo\(.*\)/\1/g'

Entry: Representatives
Date: Thu Feb 23 23:55:52 EST 2012

I have a table of (x,y,z) where I'm mostly interested in unique (y,z) combinations.  I know how to do unique single columns (GROUP BY), but does this also work with multiple columns?

Entry: Query DB in bash
Date: Tue Feb 28 15:06:29 EST 2012

What's the simplest way to map a bash function over a table?  It turns out to be simplest to just put the function in a script and use xargs.  However this makes maintaining state a bit hard.  Bottom line: use a real programming language, or implement the procedure in straight SQL.

Entry: Some things I learned these days
Date: Wed Feb 29 09:40:07 EST 2012

Playing with MySQL logfiles for work.  Note that I'm an amateur at this and some approaches might be convoluted.  Basic idea: in bash it's simplest to work with TSV (tab separated value) files, iterate over them with a "while read RECORD; do ... ; done" and use the sed/awk/xargs duct tape tools.

* Adding data to the database.  Basically, just generate SQL from any record stream and pipe it into the mysql command line client.  Also useful for custom dumps (write a script that does TSV -> SQL conversion using "while read RECORD; do ... ; done").

* Mirroring a database + regexp conversion.  I wanted a sandbox to create more tables, so didn't work in the read-only production DB.  Copying databases can usually be done using "mysqldump".  However I did not find a way to do a selective dump, so I use the TSV -> SQL approach.

* Put individual SQL queries in .sql files and load them in the mysql command line client using the "source" command.

* Use views to combine different tables.  I.e. I could find 3 different kinds of data:

  - log files
  - log file annotation from constant data (i.e. ELF symbol tables)
  - manual annotation

  These are fundamentally different.  The first is the original raw data, the second is data derived from other raw data in a different form (ELF binaries in this case), and the third is data updated during the manual log processing, which is dynamic in nature and reflects the state of the current analysis.

Entry: Database cursors
Date: Wed Feb 29 10:04:32 EST 2012

So I've done queries 101 (data filtering), next is updates (for loops / database cursors).  I want to run an UPDATE procedure for all the results in a SELECT.  Looks like this is possible within the UPDATE clause itself:

UPDATE table SET column = value, ... WHERE ...

Though that does an iteration over the table itself (the iterated table is the same as the updated one).  That's not what I want.  I want to iterate over table1 and insert/update values in table2:

for rec in table1; do
  if predicate(rec); then
    replace(table2, field, "value")
  end
end
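That kind of loop can usually be expressed as a single set-oriented statement instead of a cursor.  A sketch in MySQL, with hypothetical table and column names (REPLACE needs a PRIMARY KEY or UNIQUE index on table2 to decide what "already exists" means):

  -- insert-or-overwrite rows in table2 from a filtered scan of table1
  REPLACE INTO table2 (key_col, field)
  SELECT key_col, 'value'
  FROM table1
  WHERE some_predicate_column = 1;

  -- or update existing rows of table2 from matching rows of table1
  UPDATE table2
  JOIN table1 ON table1.key_col = table2.key_col
  SET table2.field = 'value'
  WHERE table1.some_predicate_column = 1;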
Entry: Apache logs in database
Date: Fri Mar 2 20:25:46 EST 2012

After de-noobing a bit at work this week, I'm thinking that there might be some simpler ways to do the key indirection I'm using a Racket script for.

Basic problem: the logs are one big table, but the entries in the table have a lot of duplicates.  Find a way to do automatic key generation for the entries: if an entry doesn't exist yet, make up a new key, otherwise reuse the old one.

The problem is that while (I think) I have some grasp of the functional part of SQL (queries), I have no clue about the procedural part (updates).  The first thing is probably to find out how procedures work.

Wait.. This can be done functionally by creating a view.  ( As Gerard says: Can't you create a view? )

1. Just add the entry.
2. Insert if not exists each column.
3. Create a view that joins the hashed/indexed entries ( which is the *inverse* of the operation I'm using to build the indirect table in the first place! )

Then of course, this can be done simpler:

1. INSERT IF NOT EXISTS each entry into its respective table.
2. SELECT to find the id.
3. Insert the main entry.

Entry: MySQL procedures
Date: Fri Mar 2 20:48:09 EST 2012

DELIMITER //
CREATE PROCEDURE GetAllProducts()
BEGIN
  SELECT * FROM products;
END //
DELIMITER ;

From [1], the delimiter is necessary to be able to store multiple statements in a procedure.  (weird!)  So simple sequencing is simple.  Next: variable binding.  First thing that comes to mind: what's the difference between procedures and functions?

DELIMITER //
CREATE PROCEDURE addsome()
BEGIN
  insert if not exists
  SELECT * FROM products;
END //
DELIMITER ;

[1] http://www.mysqltutorial.org/getting-started-with-mysql-stored-procedures.aspx
[2] http://dev.mysql.com/doc/refman/5.0/en/create-procedure.html

Entry: Playing with INSERT
Date: Sat Mar 3 09:15:21 EST 2012

DROP TABLE if exists foo;
CREATE TABLE foo(
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  val TEXT,
  PRIMARY KEY (id)
);
INSERT IGNORE foo (val) VALUES ("lalala");

From [1]: "The logic is that the SELECT statement generates a single row of data with the required values, but only when the values are not already found."

That wasn't immediately useful.  Dicking around a bit I come to this:

INSERT INTO foo(val)
  (SELECT (val) from foo
   WHERE NOT EXISTS (SELECT * FROM foo WHERE val = "lalala"));

INSERT INTO foo(val)
  (SELECT "lalala" from DUAL
   WHERE NOT EXISTS (SELECT * FROM foo WHERE val = "lalala"));

Turning this into a procedure doesn't work..  I asked Yvonne, she probably knows.

[1] http://stackoverflow.com/questions/913841/mysql-conditional-insert
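For reference, a commonly used MySQL idiom for "insert if missing and get the key either way" relies on a UNIQUE index plus INSERT ... ON DUPLICATE KEY UPDATE with the LAST_INSERT_ID(expr) trick.  A sketch against the foo table above; the UNIQUE index is an extra assumption (a TEXT column needs an explicit prefix length):

  ALTER TABLE foo ADD UNIQUE KEY uk_val (val(255));

  INSERT INTO foo (val) VALUES ("lalala")
  ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

  SELECT LAST_INSERT_ID();   -- id of the row, whether just inserted or already there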
Entry: Understanding SQL
Date: Sat Mar 3 09:39:33 EST 2012

I still don't quite get it..  Especially not the single value vs. table stuff.  Representing tables as lists, in Haskell syntax:

f :  (a,b,c)  ->  (d,e)
g : [(a,b,c)] -> [(d,e)]
h : [(a,b,c)] ->  (d,e)

Here g = map f, where this mapping seems to happen automatically in SQL depending on context?  And h is a projection from a table to a single record, something like COUNT or EXISTS.  COUNT is called an aggregate function.

First find of the day: SELECT without FROM returns a 1-element table:

mysql> SELECT 123;
+-----+
| 123 |
+-----+
| 123 |
+-----+
1 row in set (0.00 sec)

which is enough to pass a value to a function that expects a table.  This is useful for testing out predicates, i.e.:

SELECT ("lalala" IN (select (val) from foo));
SELECT (EXISTS (select (val) from foo where val = "lalala"));

"SELECT ... FROM <table>" requires a table reference.  It's possible to abstract queries as tables (views).  How to add a query directly?  In SqlServer this requires an alias[1].  Also works in Mysql:

SELECT * from (SELECT * from foo) alias1;

How to save the result of a query as a table?  This is SELECT ... INTO, but that's not standard.  Mysql supports INSERT INTO[2], which requires an existing table.

[1] http://bytes.com/topic/sql-server/answers/142990-select-subquery
[2] http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-select-into-table.html

Entry: MySQL regexp replace or just regexp match
Date: Sat Mar 3 13:14:27 EST 2012

It seems that I need access to regexp replace, but on second thought there isn't really any "real" datamining going on (no processing on the data embedded *inside* a REFERRER or a REQUEST field): the embedded data just serves as input to predicate functions used for selecting and grouping.

This seems like a very important point.  It seems plausible to think that this is the reason why MySQL has matching but not replacing: don't store what can be computed.

Some useful views:
- google searches
- specific blog page hits

Though... I have some trouble with things like this:

GET /blog/?p=1089&cpage=1 HTTP/1.0
GET /blog/?p=1089 HTTP/1.0
GET /blog/?p=1089 HTTP/1.1

How to project them down to one representative?  Maybe it's a good idea to parse the REQ and REF fields a bit more.  Looks like I do need a regexp/replace function, but probably can do without saving tables (i.e. keeping just views).

Entry: Regexp-replace
Date: Sat Mar 3 16:19:58 EST 2012

sudo apt-get install libpcre3-dev libmysqlclient-dev

Get the library source[3], untar and ./configure && make && sudo make install

Libraries have been installed in: /usr/lib/mysql/plugin

[1] http://www.mysqludf.org/lib_mysqludf_preg/
[2] http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql
[3] http://www.mysqludf.org/lib_mysqludf_preg/lib_mysqludf_preg-1.0.1.tar.gz

Entry: External processing
Date: Sat Mar 3 16:29:34 EST 2012

Considering the size of the tables, it might make sense to just split them externally using regexps.  Or even to do this in the preprocessing step directly.

mysql> select count(*) from req;
+----------+
| count(*) |
+----------+
|   354134 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from ref;
+----------+
| count(*) |
+----------+
|    42338 |
+----------+
1 row in set (0.00 sec)

Anyways, I installed the PCRE wrapper for mysql and I find that the next step is to weed out the plethora of hack attempts ;)  I found there are about 30 ways to request the same page; most of them look like wordpress break-in attempts.

Entry: Nested queries
Date: Sat Mar 3 20:20:25 EST 2012

So.. With the regular expression matcher it's possible to create partitions in the set of requests (i.e. projecting onto blog number).  (Remember from math class: a projection defines a partition!)  So what I want to do now is to map each request to a blog number, and count the number of occurrences.

select * from entry
natural join req_blog     # project request to id_blog
group by id_blog
order by id_blog
limit 10;

Queries start to take a little longer.

Entry: Caching predicates / membership
Date: Sat Mar 3 22:20:22 EST 2012

Products (subqueries, joins) are expensive.  However in many cases the subquery is an "IN" constraint.  Isn't it simpler to cache this membership function into an explicit (t, bool) table and join with that?
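A sketch of that caching idea, reusing the blog regexp from earlier; the table name req_is_blog and the id_req key column are assumptions:

  -- materialize the membership predicate once
  CREATE TABLE req_is_blog AS
  SELECT id_req, (req REGEXP '/blog/\\?p=') AS is_blog FROM req;
  ALTER TABLE req_is_blog ADD PRIMARY KEY (id_req);

  -- the IN / EXISTS subquery then becomes a plain join
  SELECT count(*) FROM entry
  NATURAL JOIN req_is_blog
  WHERE is_blog = 1;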
Entry: Mysql clone database
Date: Thu Jul 5 12:19:07 EDT 2012

Problem: production database is too large for efficient queries.
Solution: copy the schema to a local database + transfer part of the data.  I.e. create a local cache.

Entry: Emacs MySQL
Date: Thu Jul 5 12:24:54 EDT 2012

(require 'sql)
(setq sql-mysql-program "/home/tom/dbb_trailer/doc/mysql/mysql.sh")
(sql-mysql)

Entry: Joins
Date: Sat Jul 7 10:14:10 EDT 2012

Inner joins: filtered cartesian product
----------------------------------------
INNER JOIN:   subset of the cartesian product that matches a predicate
JOIN:         synonym for INNER JOIN; with an equality predicate it is an equi join
NATURAL JOIN: equi join with automatic column name matching
CROSS JOIN:   unrestricted cartesian product

Outer joins: filtered product + all non-matching rows
------------------------------------------------------
LEFT OUTER JOIN
RIGHT OUTER JOIN

[1] http://en.wikipedia.org/wiki/Join_(SQL)

Entry: SQL in OpenOffice Calc
Date: Sun Mar 17 12:14:08 EDT 2013

sudo apt-get install libmysql-java

Tools -> Java -> Class Path -> add: /usr/share/java/mysql-connector-java.jar

[1] http://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=35708
[2] http://diwt.wordpress.com/2011/08/03/connect-to-mysql-database-from-libreoffice-base/

Entry: Function based on table
Date: Sat Mar 30 11:29:32 EDT 2013

Given a currency conversion table, I want to make a function:

(date, currency) -> rate

or

(date, currency, amount) -> amount

where e.g. the conversion table contains monthly data, so the input date can be rounded down to the month.  After some struggle with SQL syntax, this is where I got.  The main remark is that to convert a set to a value, use SELECT ... INTO.  It will generate a warning when the set is not a singleton.

DROP FUNCTION IF EXISTS cconv;
DELIMITER |
CREATE FUNCTION cconv(date_in DATE)
RETURNS DECIMAL(10,6)
BEGIN
  DECLARE rv DECIMAL(10,6) default 1;
  SELECT rate INTO rv FROM exchange
  WHERE year(date)=year(date_in)
    and month(date)=month(date_in)
    and den="EUR";
  RETURN rv;
END
|
DELIMITER ;

[1] http://stackoverflow.com/questions/8073404/function-complain-about-not-allowed-to-return-a-result-set-from-a-function-whe

Entry: Simplest way to print/pdf a table?
Date: Sun Mar 31 20:41:39 EDT 2013

Output to CSV[1].  Use --table [2] for tabular format.

[1] http://stackoverflow.com/questions/356578/how-to-output-mysql-query-results-in-csv-format
[2] http://dev.mysql.com/doc/refman/4.1/en/mysql-command-options.html
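The two usual routes from [1]: client side, mysql --batch already prints tab-separated output that can be post-processed into CSV; server side, SELECT ... INTO OUTFILE writes the file directly.  A sketch of the latter (file path and columns are illustrative; the server writes the file, so it needs the FILE privilege and write access to the path):

  SELECT id, val
  FROM foo
  INTO OUTFILE '/tmp/foo.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n';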
Entry: Mysql create database
Date: Sun Apr 7 11:03:41 EDT 2013

mysql> CREATE DATABASE databasename;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON databasename.* TO "wordpressusername"@"hostname"
    -> IDENTIFIED BY "password";
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.01 sec)

Entry: Mysql data import
Date: Sun Apr 7 15:06:57 EDT 2013

[1] http://dev.mysql.com/doc/refman/5.0/en/load-data.html

Entry: Convert arbitrary CSV file to MySQL table
Date: Tue Aug 6 15:47:28 EDT 2013

# start mysql as root to create the DB
mysql> CREATE DATABASE tp;
mysql> GRANT ALL PRIVILEGES ON tp.* TO "tom"@"localhost" IDENTIFIED BY "password";
mysql> FLUSH PRIVILEGES;

# start mysql as user
mysql --local-infile -utom -ppassword tp

# Create an SQL file like this:
CREATE TABLE tp (
  brand TEXT NOT NULL,
  model TEXT NOT NULL,
  tac TEXT NOT NULL,
  info TEXT NOT NULL,
  profile TEXT NOT NULL,
  date TEXT NOT NULL
);
LOAD DATA LOCAL INFILE '/home/tom/beep/SIM/tools/tp_db.csv' INTO TABLE tp
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

# And source it

Entry: table as function
Date: Sat Sep 21 16:18:20 EDT 2013

Is there a quick way to abstract a table with a functional dependency into a function?  I guess a natural join can do most of what I'm looking for, though it would be nice to find that out..

Ha, I did this before:

DROP FUNCTION IF EXISTS cconv;
DELIMITER |
CREATE FUNCTION cconv(date_in DATE, den_in VARCHAR(3))
RETURNS DECIMAL(10,6) DETERMINISTIC
BEGIN
  DECLARE rv DECIMAL(10,6) default 1;
  IF NOT den_in = 'USD' THEN
    SELECT rate INTO rv FROM exchange
    WHERE year(date)=year(date_in)
      and month(date)=month(date_in)
      and den=den_in;
  END IF;
  RETURN rv;
END
|
DELIMITER ;

For the apachelogs id<->text maps it is:

DROP FUNCTION IF EXISTS ip;
DELIMITER |
CREATE FUNCTION ip(id MEDIUMINT)
RETURNS TEXT DETERMINISTIC
BEGIN
  DECLARE rv TEXT default '';
  SELECT ip INTO rv FROM ip WHERE ip.id_ip = id;
  RETURN rv;
END
|
DELIMITER ;

Entry: Fresh mysql install
Date: Mon Dec 23 10:32:42 EST 2013

Some questions that keep coming back:
- Where are users stored?
- How to create a database associated to a user?

See the Apr 7 2013 post.

Entry: Mysql root password
Date: Mon Dec 23 10:37:41 EST 2013

I have a debian mysql install but forgot the root password.
- how to fix: see [1].
- why is it important to have separate root pw auth?  In [1] it is mentioned this is insecure.  Why?

So I don't understand why it is insecure, but let's just go with the flow here and follow directions.  As a matter of convention, just create a root-readable/executable file /root/mysql containing the password.

In [1] it says to use these commands in the init file specified as mysqld_safe --init-file=....  This did not work for me.

UPDATE mysql.user SET Password=PASSWORD('MyNewPass') WHERE User='root';
FLUSH PRIVILEGES;

Using this instead:

mysqld --skip-grant-tables

[1] http://dev.mysql.com/doc/refman/5.0/en/resetting-permissions.html

Entry: Binning
Date: Tue Dec 24 17:44:20 EST 2013

Basic idea: create a table with a binning relation (recur) and fish out the NULL references from a left outer join like this:

select flow.txinfo, txclass
from (flow left outer join recur on flow.txinfo = recur.txinfo)
group by txinfo
order by txclass;

Entry: Naming intermediates
Date: Thu Dec 26 21:23:57 EST 2013

Is there a logic to SQL syntax?

Entry: Splitting regexp -> view?
Date: Fri Dec 27 13:14:23 EST 2013

Many real-world tables contain "fields within fields" that make more sense when they are separated out.  Otherwise queries involve many regular expressions.  What is better: to leave the somewhat inconsistent relations buried inside the fields, or to separate them out into separate fields?

Maybe the real question is: is it possible to JOIN on a regular expression relation?  And if the answer is yes, is it too inefficient?

I see no reason why it would not be possible.  Joining should work on any predicate.  If so then it might not be necessary to do a lot of splitting (= premature optimization?).

Is it possible to store regexes in tables?
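Both seem to be possible: REGEXP is just a predicate, so it can sit in a join condition, and its pattern can come from a column.  A sketch with a made-up pattern table (names are illustrative); each pattern forces a scan of req, so caching the result in a table may still be worth it:

  CREATE TABLE req_class (
    pattern VARCHAR(255) NOT NULL,
    class   VARCHAR(32)  NOT NULL
  );
  INSERT INTO req_class VALUES
    ('/blog/\\?p=',  'blog'),
    ('^GET /darcs/', 'darcs');

  -- join on the regexp relation
  SELECT c.class, count(*) AS hits
  FROM req r
  JOIN req_class c ON r.req REGEXP c.pattern
  GROUP BY c.class;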
Entry: Annotation joins
Date: Mon Mar 31 08:59:36 EDT 2014

- JOIN: only matches
- LEFT OUTER JOIN: matches + non-matches, keep left

If the join is performed on a 1-1/0 map, the number of rows of the result is the same as the left table.

Entry: Editable views
Date: Mon Mar 31 09:20:59 EDT 2014

Mysql supports editable views.  Does this work together with mysql workbench?

Goal is to create a mapping table.  Create a 2-column table and fill the primary keys from another table.  The simplest way to do this seems to be to just copy a table?

INSERT admin.balance SELECT id,NULL FROM admin.flow;

Then create an editable view:

select expense_id, trans_id
from admin.flow
left join admin.balance on admin.flow.id = admin.balance.trans_id;

Hmmm... Mysql workbench doesn't let me edit the view.

I think the editable (edible?) views idea is a good one, so how to bring it to completion?  What I really want is:
- flow table + extra field
- drop box that gives a selection of possible matches

Entry: Evidence based approach
Date: Mon Mar 31 09:58:37 EDT 2014

Start with the ebay expenses table, then iteratively find evidence in the flow table.  Evidence is the minimal match of amount + date proximity.

Hmm... going crazy again.  The simplest approach seems to be to add a human-readable annotation to each transaction id.  Something that is enough to cross-ref with other tables.  Don't bother with id?

So let's just use temporary tables.  Create a copy of a usable view and edit it.  Then copy into the mapping table.

Entry: 2013 taxes - what did we learn?
Date: Tue Apr 1 21:59:20 EDT 2014

- mysql workbench is a nice tool for low-level editing

- for most things though, you want a custom editor: i made a "row connect" tool using python pyside qt sql tableview

- didn't figure out how to make html table forms using racket.  harder than it seems..

- don't edit generated / converted data.  it's still a pain, but much better to put manual entries in a separate table and join it into a view.

- when making complex projections, also check complements as:
  select * from tab where key not in (select key from tab_proj);

- writing reports is a pain: take some time for it!

- still didn't figure out a good way to print a simple table to a pdf document.  main problem seems to be setting the column widths when using wrap-around text.  should be easy to compute by allowing a "max lines" wraparound.

Entry: Setting up mysql with SSL access
Date: Tue May 20 14:17:11 EDT 2014

I'm doing this from a fresh debian install.

- Make sure to config the root user with a password (autogen, long)

- in /etc/mysql/keys:

  # ca certificate
  openssl genrsa 2048 >ca-key.pem
  openssl req -new -x509 -nodes -days 3600 -key ca-key.pem >ca-cert.pem

  # server key
  openssl req -newkey rsa:2048 -days 3600 -nodes -keyout server-key.pem >server-req.pem
  openssl x509 -req -in server-req.pem -days 3600 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 >server-cert.pem

  # client key
  openssl req -newkey rsa:2048 -days 3600 -nodes -keyout client-key.pem >client-req.pem
  openssl x509 -req -in client-req.pem -days 3600 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 >client-cert.pem

  chmod 400 *.pem

- in the mysql command line:

  GRANT ALL PRIVILEGES ON *.* TO 'tom'@'%' IDENTIFIED BY '' REQUIRE SSL;

  # then connect:
  mysql -ussluser -pssluser -P3430 --ssl-key=

[1] http://askubuntu.com/questions/194074/enabling-ssl-in-mysql
[2] http://www.chriscalender.com/?p=448
[3] http://stackoverflow.com/questions/5016505/mysql-grant-all-privileges-on-database

Entry: Can a table refer to a table?
Date: Wed May 21 10:13:11 EDT 2014

It seems that in SQL this doesn't make sense, as it would require polymorphism.  Seems this gets into ORM territory[1].  Basically, my application understands subclasses, but how do I stick them into the DB?  Essentially I want a table that refers to different tables (types / classes), each with a set of rows (data / object instances).

Maybe the main question is: how to stay relational and avoid ORM?

[1] http://en.wikipedia.org/wiki/Object-relational_mapping
[2] http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

Entry: Editable queries
Date: Wed May 21 12:05:37 EDT 2014

As long as each identifier is unique and keys are present.  See here[1] around 6min.

[1] https://www.youtube.com/watch?v=X_umYKqKaF0

Entry: Data Models in MySQL Workbench
Date: Wed May 21 14:03:05 EDT 2014

I can think in terms of relations, functions, ...  A collection of tables and views with keys is quite clear to me.  So what does the "Data Model" in workbench add to all this?

[1] http://dev.mysql.com/doc/workbench/en/wb-creating-eer-diagram.html

Entry: Foreign keys
Date: Wed May 21 14:05:46 EDT 2014

[1] http://dev.mysql.com/doc/refman/5.6/en/create-table-foreign-keys.html
[2] http://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_referential_integrity

Entry: Parameterize over tables?
Date: Sat May 24 11:32:14 EDT 2014

Is it possible in MySQL to parameterize something over a table?  I.e. have a table refer to another table in a field, then use that name to perform a (generic) query on that table?  This is a bit like duck typing inheritance.  Is it possible?  Is it avoidable?

It seems possible using EXEC[1] or EXECUTE[2] in MySQL.  And it probably opens up a can of worms...

[1] http://www.techrepublic.com/forums/questions/doing-an-indirect-sql-select/#.
[2] http://dev.mysql.com/doc/refman/5.1/en/execute.html

Entry: CHAR vs VARCHAR: speed?
Date: Sat May 24 12:28:29 EDT 2014

Rule of thumb seems to be that speed wrt. key lookup/indexing isn't much of an issue.  Storage impact is much larger.

[1] http://dba.stackexchange.com/questions/2640/what-is-the-performance-impact-of-using-char-vs-varchar-on-a-fixed-size-field

Entry: Editable views
Date: Mon May 26 12:51:35 EDT 2014

See also view algorithms[2].

[1] http://dev.mysql.com/doc/refman/5.0/en/view-updatability.html
[2] http://dev.mysql.com/doc/refman/5.6/en/view-algorithms.html

Entry: Normal forms
Date: Tue May 27 10:18:49 EDT 2014

1NF:
- all columns contain atomic values
- values of each attribute contain only a single value
Meaning: don't aggregate what should be separate and don't duplicate.  Solve these issues by introducing new tables.

2NF:
- no non-prime attribute is dependent on any proper subset of any candidate key of the table
Meaning: keys should be minimal, i.e. non-redundant.

3NF:
- all the attributes in a table are dependent on the primary key and only the primary key
Meaning: every non-key is a proper function of the primary key.

[1] http://en.wikipedia.org/wiki/First_normal_form
[2] http://en.wikipedia.org/wiki/Second_normal_form
[3] http://en.wikipedia.org/wiki/Third_normal_form

Entry: Indirection through eval
Date: Thu May 29 15:11:16 EDT 2014

So I still want to find a way to deal with type abstraction in a relational database.  There has to be a clever way...

Problem: I have a bunch of tables that contain "dynamically typed" values, where a 'type' column contains a symbolic foreign type key referencing a table of types, and a value which is just a string.
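One way to stay relational with such a table is to keep the string storage but expose one typed view per type, with CAST doing the string-to-value conversion.  A sketch with hypothetical names (value_store, v_int_value, ...), not tied to the actual schema:

  CREATE TABLE value_store (
    id   MEDIUMINT NOT NULL AUTO_INCREMENT,
    type VARCHAR(16) NOT NULL,    -- symbolic key into the table of types
    val  TEXT NOT NULL,
    PRIMARY KEY (id)
  );

  -- one typed view per type
  CREATE VIEW v_int_value AS
    SELECT id, CAST(val AS SIGNED) AS val FROM value_store WHERE type = 'int';

  CREATE VIEW v_date_value AS
    SELECT id, CAST(val AS DATE) AS val FROM value_store WHERE type = 'date';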
Entry: Outer product
Date: Wed Jun 4 15:01:13 CDT 2014

How to get to an outer product table from a collection of (axis, coordinate) pairs?

Didn't find how to do this.  Seems to be a "higher order" union: the number of union operators depends on the data.  I don't think SQL can do this..  I solved it in Python for now.

Entry: Embedding nested structures
Date: Thu Jun 12 09:46:36 EDT 2014

I need to embed a nested C structure in a database, i.e. record the byte offsets.  This seems to be similar to the previous problem of the outer product in that it is higher order: the number of nestings depends on the content.  However, in practice in my problem there are only two levels.

So what is a structure?  An invertible map from names to byte offsets.  I guess the key part here is the invertible nature: it should not be lost when introducing the nesting operation.

Looks like this is the same trick as I've been using before: abstract it as functions first.  See the next post.

Entry: Encoding structure in databases
Date: Thu Jun 12 10:01:06 EDT 2014

It's hard to encode data structures into a database once there is some hierarchy involved.  There's the field of ORM which I don't know much about and frankly find a bit smelly, and then there's the idea that database tables are actually just finite functions.  This approach seems to work well:

1. Encode the structure as a composition of pure functions
2. Implement the functions using tables with primary keys

Entry: Set difference
Date: Wed Sep 10 17:10:15 CEST 2014

Time to build some more intuition...  Is set difference an expensive operation?  Look at EXCEPT and INTERSECT.

Entry: growing sqlite3
Date: Fri Sep 12 18:56:37 CEST 2014

Changing the structure of a DB without rebuilding is new to me!  Some notes.  I really start to like SQL, but the syntax is hard to remember.

DELETE FROM storage where host = 'zoo';

sqlite has a limited form of 'ALTER TABLE', however adding a column and setting a default is supported:

ALTER TABLE storage ADD mtime INTEGER;
UPDATE storage SET mtime = 0;

I'm not sure if it's possible to change the key structure.

Entry: Partitions
Date: Fri Sep 12 19:02:14 CEST 2014

select mtime>0, count(*) from storage group by mtime>0;

Entry: Database workstation
Date: Thu Sep 25 11:24:23 CEST 2014

Starting to like using sqlite to quickly whip up a database to do certain simple queries.  One problem is microcontroller pin assignment tracking.  Put all data in *.csv files, make one schema.sql file that has:

CREATE TABLE conn ( ... );
.separator ","
.import conn.csv conn

Then create some reports:

CREATE VIEW ....

And run it as:

sqlite3 --init schema.sql :memory:

No .sqlite file required.

Entry: Taxes
Date: Sat Apr 4 12:01:55 EDT 2015

Two operations keep coming back:
- Data conversion for expense records (from anything to table)
- Classification into tax categories

For classification I was thinking about building a gui.  I did make something last year to match up records.  But really, it seems easier to use emacs or a spreadsheet to edit the csv files.

The problem is that these are one-time things: it is actually more productive to do them mostly manually, and automate or facilitate only the repetitive parts.

Entry: Path of least resistance: SQL + React
Date: Sat Sep 5 19:05:15 CEST 2015

Might be able to do this through Erlang.  Essentially what I need is display of tables + annotation -> commit to a different table.

Entry: Low level database edits: the problem
Date: Tue Sep 8 15:59:34 CEST 2015

I know what I want: editable views/queries.
Basically, this means that the presentation is separated from the view.  Most of the edits I do are either:

- adding an annotation column (as a separate table next to the main table + "info join")
- changing a column

Solutions:

- Present an ODB interface to libreoffice
- Web interface on top of react
- Through export/import of CSV + libreoffice edits
- Emacs table editing?

Entry: finite functions
Date: Mon Mar 28 12:47:38 EDT 2016

Since functional modeling (keys) is such a big deal in databases, why is there no syntax to turn a table into a function?  E.g.

select tab1(x) from tab2

where the function tab1 corresponds to:

x -> select val from tab1 where key=x;

Maybe because it is hard to optimize?  You likely don't want to write the function such that it does a single sub-select for each value.  Likely you want it as a join, so the query engine can do its magic.

What would be nice maybe is to have some syntactic sugar to turn functional references like that into joins...  Or to just learn to live with the limitation?  It is nice as an abstraction anyway.

Entry: Normalization
Date: Mon Mar 28 12:56:20 EDT 2016

1NF:
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key.

2NF:
- 1NF + every non-prime attribute of the table is dependent on the whole of every candidate key.

3NF:
- 2NF + all the attributes in a table are determined only by the candidate keys of that table and not by any non-prime attributes.

BCNF: (similar to 3NF)
- redundancy based on functional dependency has been removed, although other types of redundancy may still exist.

4NF:
- BCNF + eliminate multivalued dependencies.

5NF:
- every non-trivial join dependency is implied by the candidate keys.

6NF:
- no nontrivial join dependencies at all; equivalently, every JD [Join Dependency] of R is trivial.

DKNF:
- the database contains no constraints other than domain constraints and key constraints.

https://en.wikipedia.org/wiki/Database_normalization
https://en.wikipedia.org/wiki/First_normal_form
https://en.wikipedia.org/wiki/Second_normal_form
https://en.wikipedia.org/wiki/Third_normal_form
https://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form
https://en.wikipedia.org/wiki/Fourth_normal_form
https://en.wikipedia.org/wiki/Fifth_normal_form
https://en.wikipedia.org/wiki/Sixth_normal_form
https://en.wikipedia.org/wiki/Domain-key_normal_form

Entry: sqlite creates table when importing a csv file, using the first line as column names
Date: Wed Mar 30 11:20:50 EDT 2016

cool feature for quick&dirty ram-based scripting work, but confusing!  best to create the schema separately.

Entry: index
Date: Thu Aug 10 00:19:22 EDT 2017

I think I just now understand how an index works: keep a separate ordered copy of the key column that can be binary-searched, with each entry pointing back to the corresponding row in the original table.

https://www.youtube.com/watch?v=gpxnbly9bz4
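In SQL that lookup structure is created explicitly with CREATE INDEX.  A small sketch against the entry table from the apache-log notes (the index name is made up, and whether such an index already exists there is an assumption):

  -- secondary index so date lookups no longer scan the whole table
  CREATE INDEX idx_entry_date ON entry (date);

  -- EXPLAIN shows whether a given query actually uses it
  EXPLAIN SELECT count(*) FROM entry WHERE date >= '2013-01-01';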
Entry: SQLite virtual tables
Date: Wed Feb 28 11:27:39 CET 2018

https://sqlite.org/vtab.html

Maybe it is possible to flatten data structures?  E.g. using a built-in function with multiple returns?

https://sqlite.org/c3ref/create_function.html

Example: http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html

void msignFunc(sqlite3_context *context, int argc, sqlite3_value **argv){
  assert( argc==1 );
  switch( sqlite3_value_type(argv[0]) ){
  case SQLITE_INTEGER: {
    long long int iVal = sqlite3_value_int64(argv[0]);
    iVal = ( iVal > 0) ? 1 : ( iVal < 0 ) ? -1 : 0;
    sqlite3_result_int64(context, iVal);
    break;
  }
  case SQLITE_NULL: {
    sqlite3_result_null(context);
    break;
  }
  default: {
    double rVal = sqlite3_value_double(argv[0]);
    rVal = ( rVal > 0) ? 1 : ( rVal < 0 ) ? -1 : 0;
    sqlite3_result_double(context, rVal);
    break;
  }
  }
}

see the sqlite3_result_* functions: https://sqlite.org/c3ref/result_blob.html

Use case: storing erlang term data.  To then expose the internal structure would require a generic fold over an eterm to fish out a particular atom.

Entry: Categorizer
Date: Sat Mar 24 08:22:51 EDT 2018

I've been looking for a while to create "editable views" in Emacs.  Basically, this would create a 1-1 map between a table and an alternative textual representation that can be edited.  One example of this would be partitioning.  The raw table:

item1,cata
item2,catb
item3,cata

would be represented in human-readable form as:

-- cata
item1
item2

-- catb
item3

Navigating into that representation and performing edits would update the table and re-generate the representation.

I've found that making this file-backed is convenient: this way Emacs can be instructed to revert a buffer to a file, while other tools have access to the file through the filesystem.  Alternatively, to do this in a buffer directly would likely mean creating "edit differences", and require more code to implement.

The dumb solution is to write import and print scripts.  I.e. "bidirectional programming":

https://www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf
http://www.seas.upenn.edu/~harmony/

The emacs automation is then:
- given the buffer's file location, find the directory
- run make, passing in the notification continuation
- in the continuation, revert the buffer

Turn this into a generic Emacs library?  OK, works well.

Entry: Editable views
Date: Sun Apr 15 13:22:51 EDT 2018

I'd like to explore this "lenses" idea a bit more.  I like SQL for storage -- essentially, finite functions -- but in some cases it makes sense to temporarily impose some more structure.  To do this inside a file in a certain syntactic format makes sense, as it allows using a normal code editor to perform the edit.

I'd like to explore this for my hours DB.  What to learn?

- Kmett's lenses
- A nice pretty-printer.  Maybe stick to Racket?
- erl_tools/src/diff.erl

Entry: SQL and functional deps
Date: Thu Feb 28 22:47:48 EST 2019

A "left join" on a key in the right table is essentially a function evaluation, where NULL is returned if there is no image.

Entry: Storing currency amounts
Date: Mon Mar 11 07:59:38 EDT 2019

Store it as an integer multiple of cents.  Do this as part of the import parser.

Entry: sqlite3 and decimal
Date: Sun Mar 17 10:12:22 EDT 2019

Probably best to use cents and integer, and apply it at import.  The problem is that migrating this is error-prone, so should it be done?  The alternative is to stick to decimal.  sqlite3 accepts DECIMAL column declarations and gives them NUMERIC affinity.

https://www.sqlite.org/datatype3.html

Entry: sqlite3 and functions
Date: Wed Apr 3 09:13:40 EDT 2019

Basically, it is too limited at the single function level.  But it is extensible, so make an extension.  It is also very different wrt. data types.  Nothing that can't be fixed, but it is a pain to redo things.

Entry: A user interface for creating projections
Date: Wed Apr 3 10:54:12 EDT 2019

This is the main obstacle.  I still don't have a good solution for it.  Better next time.

Entry: sqlite procedures
Date: Wed Apr 3 11:22:17 EDT 2019

sqlite doesn't seem to be able to do that.  Maybe the idea is to leave any procedural work to the application?  Sqlite is supposed to be embedded.
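Without procedures, the earlier "insert if not exists, then look up the id" pattern has to be plain SQL or application code.  In SQLite plain SQL is enough; a sketch reusing the foo example from the MySQL entries above (the UNIQUE constraint is the added assumption that makes OR IGNORE work):

  CREATE TABLE IF NOT EXISTS foo (
    id  INTEGER PRIMARY KEY,
    val TEXT UNIQUE
  );
  INSERT OR IGNORE INTO foo (val) VALUES ('lalala');
  SELECT id FROM foo WHERE val = 'lalala';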
Entry: sqlite3 subqueries
Date: Wed Apr 3 18:45:51 EDT 2019

These are supported.  Apparently I just got the syntax wrong.

https://www.sqlite.org/syntax/select-stmt.html
https://www.sqlite.org/syntax/table-or-subquery.html
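For the record, a minimal pair of working sqlite3 subqueries, reusing the storage table from the earlier sqlite notes (a subquery in FROM is parenthesized and given an alias):

  -- subquery in FROM
  SELECT recent, count(*) AS n
  FROM (SELECT host, (mtime > 0) AS recent FROM storage) AS sub
  GROUP BY recent;

  -- scalar subquery in WHERE
  SELECT host FROM storage WHERE mtime = (SELECT max(mtime) FROM storage);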