Learning Cypher: Neo4j 2.1.2: performance improvements using indexes with WHERE IN

About the book

Learning Cypher is a practical, hands-on guide to designing, implementing, and querying a Neo4j database quickly and painlessly. Through a number of practical examples, the book uncovers the behaviors that will help you take advantage of Neo4j effectively, with tips and tricks along the way. It starts with the basic clauses and patterns for performing read-only queries with Cypher, then moves on to the clauses required to modify a graph. Once the basics are understood, it covers tools and practices for improving query performance and shows how to migrate a database to Neo4j from the ground up. To finish off, the book covers Cypher operators and functions in detail.

This book is useful for anyone who wants to learn how to create, query, and maintain a graph database, or who wants to migrate to a graph database from SQL.

Friday, June 13, 2014

Neo4j 2.1.2: performance improvements using indexes with WHERE IN

Good news for Cypher users. In earlier versions of Cypher, such as the one shipped with Neo4j 2.0, queries suffered from a performance problem because the IN predicate didn't use indexes.

Consider the following query, elaborated from a similar query in Chapter 4 of the book:

MATCH(n:User) WHERE n.email IN {emailQuery} RETURN n.userId
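
The {emailQuery} placeholder is a query parameter that is expected to hold a list of email addresses. As a rough sketch, assuming the query is sent through the transactional HTTP endpoint available in Neo4j 2.x, the request body could be built in Python as follows. The helper name build_payload and the sample addresses are made up for illustration.

```python
import json

# The query from the post, with {emailQuery} left as a parameter placeholder.
QUERY = "MATCH(n:User) WHERE n.email IN {emailQuery} RETURN n.userId"


def build_payload(emails):
    """Build the JSON body for one parameterized Cypher statement.

    `emails` is any iterable of strings; it becomes the list parameter
    that `IN {emailQuery}` is tested against.
    """
    return {
        "statements": [
            {"statement": QUERY, "parameters": {"emailQuery": list(emails)}}
        ]
    }


if __name__ == "__main__":
    # Hypothetical addresses, used only to show the shape of the request.
    payload = build_payload(["alice@example.com", "bob@example.com"])
    print(json.dumps(payload, indent=2))
```

In a default Neo4j 2.x installation, a body like this would typically be POSTed to the /db/data/transaction/commit endpoint. Passing the list as a parameter, rather than inlining it in the query text, keeps the statement text identical across calls.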

Profiling the query in Neo4j 2.0, you get the following plan:

  "plan": {
    "name": "ColumnFilter",
    "rows": 1,
    "dbHits": 0,
    "children": [
      {
        "name": "Extract",
        "rows": 1,
        "dbHits": 1,
        "children": [
          {
            "name": "Filter",
            "args": {
              "pred": "any(-_-INNER-_- in {emailQuery} where Property(n,email(5)) == -_-INNER-_-)",
              "_rows": 1,
              "_db_hits": 1002
            },
            "rows": 1,
            "dbHits": 1002,
            "children": [
              {
                "name": "NodeByLabel",
                "rows": 1002,
                "dbHits": 0,
                "children": []
              }
            ]
          }
        ]
      }
    ]
  }

Using Neo4j 2.1.2, instead, you get the following plan:

  "plan": {
    "name": "ColumnFilter",
    "args": {
      "ColumnsLeft": "keep columns n.userId",
      "Rows": "Rows(1)",
      "DbHits": "DbHits(0)"
    },
    "rows": 1,
    "dbHits": 0,
    "children": [
      {
        "name": "Extract",
        "rows": 1,
        "dbHits": 2,
        "children": [
          {
            "name": "SchemaIndex",
            "args": {
              "DbHits": "DbHits(2)",
              "Rows": "Rows(1)",
              "LegacyExpression": "{emailQuery}",
              "IntroducedIdentifier": "IntroducedIdentifier(n)",
              "Index": ":User(email)"
            },
            "rows": 1,
            "dbHits": 2,
            "children": []
          }
        ]
      },
      {
        "name": "SchemaIndex",
        "args": {
          "DbHits": "DbHits(2)",
          "Rows": "Rows(1)",
          "LegacyExpression": "{emailQuery}",
          "IntroducedIdentifier": "IntroducedIdentifier(n)",
          "Index": ":User(email)"
        },
        "rows": 1,
        "dbHits": 2,
        "children": []
      }
    ]
  }

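Clearly, the number of DB hits is much lower. In Neo4j 2.0, NodeByLabel returns all 1002 User nodes and the Filter step then spends 1002 DB hits checking the email property of each one. In Neo4j 2.1.2, the SchemaIndex step looks the value up directly in the :User(email) index and needs only 2 DB hits.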
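
To put a number on the difference, the dbHits values of every operator in each plan can be added up. The following Python sketch does this for trimmed-down copies of the two plans above, keeping only the name, dbHits, and children fields. It assumes dbHits is reported per operator rather than cumulatively, which the 2.0 plan suggests; the function names walk and total_db_hits are illustrative and not part of any Neo4j tooling.

```python
# Trimmed copies of the two plans above: only name, dbHits and children kept.
PLAN_2_0 = {
    "name": "ColumnFilter", "dbHits": 0, "children": [
        {"name": "Extract", "dbHits": 1, "children": [
            {"name": "Filter", "dbHits": 1002, "children": [
                {"name": "NodeByLabel", "dbHits": 0, "children": []},
            ]},
        ]},
    ],
}

PLAN_2_1_2 = {
    "name": "ColumnFilter", "dbHits": 0, "children": [
        {"name": "Extract", "dbHits": 2, "children": [
            {"name": "SchemaIndex", "dbHits": 2, "children": []},
        ]},
        # The plan as printed above lists SchemaIndex a second time here,
        # so it is counted twice in the total.
        {"name": "SchemaIndex", "dbHits": 2, "children": []},
    ],
}


def walk(plan):
    """Yield (operator name, dbHits) for a plan node and all its descendants."""
    yield plan["name"], plan["dbHits"]
    for child in plan.get("children", []):
        yield from walk(child)


def total_db_hits(plan):
    """Sum dbHits over every operator in the plan tree."""
    return sum(hits for _, hits in walk(plan))


if __name__ == "__main__":
    for label, plan in (("2.0", PLAN_2_0), ("2.1.2", PLAN_2_1_2)):
        print(label, list(walk(plan)), "total:", total_db_hits(plan))
```

With these inputs the totals come to 1003 for the Neo4j 2.0 plan and 6 for the Neo4j 2.1.2 plan.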
